Hacking Go Compiler Internals / GoCon 2014 Autumn

Hacking Go Compiler Internals Moriyoshi Koizumi <[email protected]>


Transcript of Hacking Go Compiler Internals / GoCon 2014 Autumn

Page 1: Hacking Go Compiler Internals / GoCon 2014 Autumn

Hacking Go Compiler Internals

Moriyoshi Koizumi <[email protected]>

Page 2: Hacking Go Compiler Internals / GoCon 2014 Autumn

Intended Audience

• An eccentric Go programmer who happens to want to add feature XX to the language, knowing her patch will never be merged.

• A keen-minded programmer who wants to know how the compiler works.

Page 3: Hacking Go Compiler Internals / GoCon 2014 Autumn

Overall Architecture

[Diagram of the compiler's stages: Lexer, Parser, Escape Analysis, Typegen, GCproggen, Codegen]

Page 4: Hacking Go Compiler Internals / GoCon 2014 Autumn

Phase 1. Lexer

Page 5: Hacking Go Compiler Internals / GoCon 2014 Autumn

Lexer

• A lexer scans over the source code and cuts it into a bunch of meaningful chunks, called tokens (the first abstraction).

• Example:

a := b + c()

LNAME LCOLAS LNAME + LNAME ( )
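The same idea can be tried from Go itself. The following is a minimal sketch that uses the standard library's go/scanner package, which is a separate implementation from the C lexer discussed in these slides, so the token names differ (IDENT instead of LNAME, and so on). The helper name tokenNames is made up for this illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// tokenNames scans src with the standard library scanner and returns the
// token names separated by spaces. These are go/token names, not the
// LNAME-style names used inside the C compiler.
func tokenNames(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var names []string
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		names = append(names, tok.String())
	}
	return strings.Join(names, " ")
}

func main() {
	// The trailing ";" is the semicolon the scanner inserts automatically.
	fmt.Println(tokenNames("a := b + c()"))
}
```

The final token in the output is a semicolon that never appears in the source; the scanner inserts it at the end of the statement.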

Page 6: Hacking Go Compiler Internals / GoCon 2014 Autumn

Lexer

• src/cmd/gc/lex.c

static int32
_yylex(void)
{
    ...
l0:
    c = getc();
    if(yy_isspace(c)) {
        if(c == '\n' && curio.nlsemi) {
            ungetc(c);
            DBG("lex: implicit semi\n");
            return ';';
        }
        goto l0;
    }
    ...
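The excerpt above turns a newline into ';' when a flag (curio.nlsemi) is set. Below is a minimal sketch of that rule in Go, under simplifying assumptions: tokens are separated by whitespace, and the flag is set after any token that is not in a small hard-coded list of operators and opening brackets. The function names and the list are made up for this illustration and do not come from the compiler.

```go
package main

import (
	"fmt"
	"strings"
)

// endsStatement reports whether a newline right after tok should become an
// implicit semicolon. Simplified assumption: anything except a few operator
// and opening-bracket tokens may end a statement.
func endsStatement(tok string) bool {
	switch tok {
	case "+", "-", "*", "/", "=", ":=", ",", "(", "{", "[", ";":
		return false
	}
	return true
}

// insertSemis splits src on whitespace and returns the tokens with a ";"
// inserted wherever a line ends right after a token that can end a
// statement. The end of the input is treated like a newline.
func insertSemis(src string) string {
	var out []string
	nlsemi := false // plays the role of curio.nlsemi in the excerpt
	for _, line := range strings.Split(src, "\n") {
		for _, tok := range strings.Fields(line) {
			out = append(out, tok)
			nlsemi = endsStatement(tok)
		}
		if nlsemi { // line ended while the flag was set
			out = append(out, ";")
			nlsemi = false
		}
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(insertSemis("x := a +\nb\ny := 1"))
}
```

No semicolon is produced after the dangling "+" on the first line, which is why a Go expression can continue on the next line only if the break comes after an operator.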

Page 7: Hacking Go Compiler Internals / GoCon 2014 Autumn

Lexer

    ...
    switch(c) {
    ...
    case '+':
        c1 = getc();
        if(c1 == '+') {
            c = LINC;
            goto lx;
        }
        if(c1 == '=') {
            c = OADD;
            goto asop;
        }
        break;
    ...
    }
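The switch above decides between "+", "++" and "+=" by reading one more character. A minimal sketch of the same one-character lookahead in Go follows; the function name and the token names are illustrative and are not the compiler's constants.

```go
package main

import "fmt"

// lexPlus classifies the operator starting at src[i], which must be '+',
// using one character of lookahead. It returns a token name and the number
// of bytes consumed.
func lexPlus(src string, i int) (string, int) {
	if i+1 < len(src) {
		switch src[i+1] {
		case '+':
			return "INC", 2 // "++"
		case '=':
			return "ADD_ASSIGN", 2 // "+="
		}
	}
	return "ADD", 1 // plain "+"; the lookahead character is not consumed
}

func main() {
	for _, s := range []string{"++", "+=", "+x", "+"} {
		tok, n := lexPlus(s, 0)
		fmt.Println(s, tok, n)
	}
}
```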

Page 8: Hacking Go Compiler Internals / GoCon 2014 Autumn

When do you want to hack the lexer

• Modify a keyword or predeclared name, such as func or make.

• Modify an operator only cosmetically (e.g. != → ~=).

• Modify how literals and identifiers are represented.

• Add a new keyword or operator to the language to later use in the parser.

Page 9: Hacking Go Compiler Internals / GoCon 2014 Autumn

Example: Emojis for identifiers

• http://moriyoshi.hatenablog.com/entry/2014/06/03/121728

• Go doesn’t treat emojis as part of identifiers.

• But I wanted to have 🍣 (sushi) in the source:

./sushi.go:8: invalid identifier character U+1f363

Page 10: Hacking Go Compiler Internals / GoCon 2014 Autumn

Example: Emojis for identifiers

• Patched the following place to let it accept emojis:

    if(c >= Runeself) {
        ungetc(c);
        rune = getr();
        // 0xb7 · is used for internal names
        if(!isalpharune(rune) && !isdigitrune(rune) &&
           (importpkg == nil || rune != 0xb7))
            yyerror("invalid identifier character U+%04x",
                rune);
        cp += runetochar(cp, &rune);
    } else if(!yy_isalnum(c) && c != '_')
        break;
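The check above rejects any non-ASCII rune that is neither a letter nor a digit. The sketch below reproduces that kind of test in Go with the unicode package and adds a relaxed variant. The relaxed rule (accepting the Miscellaneous Symbols and Pictographs block, U+1F300 to U+1F5FF, which contains U+1F363) is an assumption chosen for illustration; it is not the patch from the talk, and unicode.IsLetter only roughly corresponds to isalpharune.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentRune accepts letters, digits and '_', similar in spirit to the
// check in the excerpt.
func isIdentRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isIdentRuneRelaxed additionally accepts runes in the Miscellaneous
// Symbols and Pictographs block. This range is a hypothetical choice.
func isIdentRuneRelaxed(r rune) bool {
	return isIdentRune(r) || (r >= 0x1F300 && r <= 0x1F5FF)
}

func main() {
	const sushi = 0x1F363
	fmt.Printf("strict:  U+%04x accepted = %v\n", sushi, isIdentRune(sushi))
	fmt.Printf("relaxed: U+%04x accepted = %v\n", sushi, isIdentRuneRelaxed(sushi))
}
```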

Page 11: Hacking Go Compiler Internals / GoCon 2014 Autumn

Phase 2. Parser

Page 12: Hacking Go Compiler Internals / GoCon 2014 Autumn

Parser

• Parser repeatedly calls the lexer to fetch the tokens and builds an abstract syntax tree (AST) that represents the source code.

• The AST is retouched during the type inference and assertion phase (the “typecheck” and “walk” sub-phases) so that it becomes less verbose and carries information helpful for the later stages.

• src/cmd/gc/go.y, src/cmd/gc/dcl.c, src/cmd/gc/typecheck.c, src/cmd/gc/walk.c, src/cmd/gc/reflect.c

Page 13: Hacking Go Compiler Internals / GoCon 2014 Autumn

Parser

Tokens:

LNAME LCOLAS LNAME + LNAME ( )

AST:

OAS
    ONAME (a)
    OADD
        ONAME (b)
        OCALL
            ONAME (c)
            ∅ (no arguments)
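A comparable tree can be inspected from Go with the standard library's go/parser and go/ast packages. This is a minimal sketch; these packages are a separate implementation from the C compiler's parser, so the node names differ (AssignStmt instead of OAS, and so on). The helper nodeKinds and the wrapping of the statement in a dummy function are choices made for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// nodeKinds parses a single statement and lists the types of its AST nodes
// in depth-first order.
func nodeKinds(stmt string) (string, error) {
	// A statement cannot be parsed on its own, so wrap it in a function.
	src := "package p\nfunc f() {\n" + stmt + "\n}\n"
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return "", err
	}
	body := file.Decls[0].(*ast.FuncDecl).Body

	var kinds []string
	ast.Inspect(body.List[0], func(n ast.Node) bool {
		if n != nil {
			kinds = append(kinds, fmt.Sprintf("%T", n))
		}
		return true
	})
	return strings.Join(kinds, " "), nil
}

func main() {
	kinds, err := nodeKinds("a := b + c()")
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}
```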

Page 14: Hacking Go Compiler Internals / GoCon 2014 Autumn

Parser

• src/cmd/gc/go.y

…
/*
 * expressions
 */
expr:
    uexpr
|   expr LOROR expr
    {
        $$ = nod(OOROR, $1, $3);
    }
|   expr LANDAND expr
    {
        $$ = nod(OANDAND, $1, $3);
    }
…
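Each grammar action above builds a node from its two operands. The sketch below shows the same node-building step in Go. It swaps the yacc-generated parser for a small hand-written recursive-descent parser, and the Node type, the parser type and the string operator names are hypothetical stand-ins, not the compiler's data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for the compiler's AST node.
type Node struct {
	Op          string // "OOROR", "OANDAND" or "ONAME"
	Left, Right *Node
	Name        string // set for ONAME only
}

// nod plays the role of nod() in the grammar actions: it allocates a node
// with the given operator and children.
func nod(op string, l, r *Node) *Node {
	return &Node{Op: op, Left: l, Right: r}
}

// String renders the tree in prefix form.
func (n *Node) String() string {
	if n.Op == "ONAME" {
		return n.Name
	}
	return fmt.Sprintf("%s(%v, %v)", n.Op, n.Left, n.Right)
}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at the end of the input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parseOr handles `expr || expr`, the lowest-precedence rule here.
func (p *parser) parseOr() *Node {
	n := p.parseAnd()
	for p.peek() == "||" {
		p.pos++
		n = nod("OOROR", n, p.parseAnd())
	}
	return n
}

// parseAnd handles `expr && expr`, which binds tighter than ||.
func (p *parser) parseAnd() *Node {
	n := p.parseName()
	for p.peek() == "&&" {
		p.pos++
		n = nod("OANDAND", n, p.parseName())
	}
	return n
}

// parseName consumes one identifier; a simplified stand-in for uexpr.
func (p *parser) parseName() *Node {
	n := &Node{Op: "ONAME", Name: p.peek()}
	p.pos++
	return n
}

// parse builds a tree from a whitespace-separated expression.
func parse(src string) *Node {
	p := &parser{toks: strings.Fields(src)}
	return p.parseOr()
}

func main() {
	fmt.Println(parse("a || b && c"))
}
```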

Page 15: Hacking Go Compiler Internals / GoCon 2014 Autumn

Example: Bracket operator overload!

• Let the following code (A) expand to (B)

• https://gist.github.com/moriyoshi/c0e2b2f9be6883e33251

(A)

a := &struct{}{}
fmt.Println(a[1])
a[1] = "test2"

(B)

fmt.Println(a.__getindex(1))
a.__setindex(1, "test2")

Page 16: Hacking Go Compiler Internals / GoCon 2014 Autumn

Example: Bracket operator overload!

• Things to do:

  • Introduce a new AST node type (e.g. OINDEXINTER).

  • Add a branch point in “typecheck” to handle the case where the indexed target is not a string, array, slice, or map type.

  • Supply code in “walk” to give special treatment to assignments and dereferences that involve that kind of node. The code synthesizes nodes that invoke the special functions, then typechecks and walks those new nodes recursively.

  • Don’t forget to take care of the evaluation order corrector.
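The node-synthesis step can be imitated outside the compiler with go/ast. The sketch below rewrites an index expression into a call of __getindex and prints the result. It is only an analogy under stated assumptions: it works on the standard library's AST rather than the compiler's, it covers reads only (not assignments), and it skips the type check that would decide whether the rewrite applies. The function name rewriteIndex is made up.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// rewriteIndex parses expr and, if it is an index expression x[i], returns
// the source text of the synthesized call x.__getindex(i). Any other
// expression is returned as parsed.
func rewriteIndex(expr string) (string, error) {
	e, err := parser.ParseExpr(expr)
	if err != nil {
		return "", err
	}
	if idx, ok := e.(*ast.IndexExpr); ok {
		// Synthesize the replacement node from the parts of the old one.
		e = &ast.CallExpr{
			Fun:  &ast.SelectorExpr{X: idx.X, Sel: ast.NewIdent("__getindex")},
			Args: []ast.Expr{idx.Index},
		}
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, token.NewFileSet(), e); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := rewriteIndex("a[1]")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```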

Page 17: Hacking Go Compiler Internals / GoCon 2014 Autumn

Helpful functions to debug your hack

• print(const char *, …)

  • This works like printf() of the standard libc.

  • Accepts the following extra format specifiers:

    • %N (node)

    • %T (type)

    • %E, %J, %H, %L, %O, %S, %V, %Z, %B, %F
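Go's own fmt package offers a comparable hook: a type that implements fmt.Formatter receives every verb, including non-standard ones. The sketch below uses that to support a %N verb on a made-up Node type. It is an analogy only and says nothing about how the compiler's print() registers its specifiers.

```go
package main

import "fmt"

// Node is a hypothetical AST node used only for this illustration.
type Node struct {
	Op   string
	Name string
}

// Format implements fmt.Formatter so that the non-standard verb %N prints
// a compact description of the node. Any other verb prints a marker that
// names the unsupported verb.
func (n Node) Format(f fmt.State, verb rune) {
	switch verb {
	case 'N':
		fmt.Fprintf(f, "%s(%s)", n.Op, n.Name)
	default:
		fmt.Fprintf(f, "%%!%c(Node)", verb)
	}
}

// describe formats a node with the custom verb.
func describe(n Node) string {
	return fmt.Sprintf("node: %N", n)
}

func main() {
	fmt.Println(describe(Node{Op: "ONAME", Name: "a"}))
}
```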

Page 18: Hacking Go Compiler Internals / GoCon 2014 Autumn

Roll-up

• Go’s compiler internals may look complex at first glance, but they turn out to be pretty straightforward and hacker-friendly ;)

Page 19: Hacking Go Compiler Internals / GoCon 2014 Autumn