Getting Started with ANTLR
Chapter 1
Domain Specific Languages
• DSLs are high-level languages designed for specific tasks
• DSLs include data formats, configuration file formats, text-processing languages, …
• DSLs make their users effective in a specific domain
The Big Picture
• Translators map input sentences to output sentences
• Translators have to recognize many different sentences
• We break recognition into two similar but distinct tasks: lexical analysis and parsing
Lexical Analysis
• Lexical analysis consists of reading the input stream, character by character.
• Characters are combined and output as “tokens”
• if (x > 312){ System.out.println("Hi");}
• Tokens: if, WS, (, x, WS, >, WS, 312, ), {, WS, System.out.println, (, "Hi", ), ;, }
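The character-by-character grouping described above can be sketched by hand in plain Java. This is only an illustration of the idea, not the code ANTLR generates: the class name `TinyLexer` and its rules are assumptions made for this example, and it follows the slide's simplification of treating a dotted name such as System.out.println as a single token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of a lexer; ANTLR would generate one from a grammar. */
final class TinyLexer {

    /** Reads the input one character at a time and groups characters into token texts. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                // A run of whitespace becomes one WS token.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                tokens.add("WS");
            } else if (Character.isLetter(c)) {
                // Keyword or (dotted) name; dots are allowed to mirror the slide.
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '.')) i++;
                tokens.add(input.substring(start, i));
            } else if (Character.isDigit(c)) {
                // Integer literal.
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == '"') {
                // String literal, quotes included.
                i++;
                while (i < input.length() && input.charAt(i) != '"') i++;
                if (i < input.length()) i++; // consume the closing quote if present
                tokens.add(input.substring(start, i));
            } else {
                // Any other character is a one-character token.
                i++;
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (x > 312){ System.out.println(\"Hi\");}"));
    }
}
```

A real lexer would usually also record a token type and position for each token rather than returning bare strings.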
Lexical Analysis
• Tokens carry information beyond the characters they represent, such as a token type and a position in the input
• ANTLR generates a lexical analyser (a lexer) from the grammar it is given
• We will be building grammars and having ANTLR generate the lexer code for us
Parsing
• Parsing consists of reading tokens and trying to organize them into a valid sentence in the language
• The parser can generate output immediately, based on the sentences it recognizes, or preserve their structure as an abstract syntax tree (AST) that can be processed further
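As a rough illustration of organizing tokens into a sentence and keeping the structure as a tree, here is a hand-written recursive-descent sketch for a single hypothetical rule, `assignment : ID '=' INT ';'`. The names `AssignParser` and `Node` are invented for this example; ANTLR would generate a parser from a grammar instead, and whitespace tokens are assumed to have been dropped already.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written sketch of a parser for: assignment : ID '=' INT ';' ; */
final class AssignParser {

    /** Minimal tree node: a text label plus zero or more children. */
    static final class Node {
        final String text;
        final Node[] children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = children;
        }

        /** LISP-style rendering of the subtree rooted at this node. */
        String toStringTree() {
            if (children.length == 0) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) sb.append(' ').append(child.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens; // token texts, whitespace already removed
    private int pos = 0;               // index of the next token to consume

    AssignParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Recognizes one assignment and returns a tree with '=' at the root. */
    Node assignment() {
        Node id = new Node(match("[A-Za-z_]\\w*"));
        match("=");
        Node value = new Node(match("\\d+"));
        match(";");
        return new Node("=", id, value);
    }

    /** Consumes the next token if it matches the pattern, otherwise reports an error. */
    private String match(String regex) {
        if (pos >= tokens.size() || !tokens.get(pos).matches(regex)) {
            throw new IllegalArgumentException("unexpected token at index " + pos);
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        AssignParser parser = new AssignParser(Arrays.asList("WIDTH", "=", "200", ";"));
        System.out.println(parser.assignment().toStringTree());
    }
}
```

The tree keeps only what later phases need: the `;` is checked during parsing but does not appear in the result.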
Translation Data Flow
Characters → Lexer → Tokens → Parser → AST → Tree Walker → Output
Finally
• An emitter can take the output of the parser and emit output based on all computations of the previous phases
• An emitter can use templates (documents with holes) that are filled in with computed values
• ANTLR uses the StringTemplate engine to make it easier to build emitters
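The "document with holes" idea can be sketched in plain Java as follows. This is not the StringTemplate API: the class `TinyTemplate`, the `render` method, and the `<name>` hole syntax are assumptions chosen only to illustrate filling holes with attribute values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a template with holes; illustrative only, not StringTemplate. */
final class TinyTemplate {

    // A hole is written as <name> in this sketch.
    private static final Pattern HOLE = Pattern.compile("<(\\w+)>");

    /** Replaces every hole in the template with the attribute of the same name. */
    static String render(String template, Map<String, String> attributes) {
        Matcher m = HOLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = attributes.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for hole: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("name", "WIDTH");
        attributes.put("value", "200");
        System.out.println(render("int <name> = <value>;", attributes));
    }
}
```

Keeping the output text in a template separates what the emitter computes from how the result is laid out.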
Characters, Tokens, ASTs
• Lexers consume characters from a CharStream such as ANTLRStringStream or ANTLRFileStream
• These streams assume that the entire input fits into memory and, as a result, can buffer all characters in memory
• Tokens point directly to character strings in the buffer rather than creating String objects for each token
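A minimal sketch of a token that points into a shared character buffer, assuming the whole input is already in memory. The class `BufferToken`, its fields, and the type codes `ID` and `INT` are hypothetical names for this illustration, not ANTLR's own classes.

```java
/** Sketch of a token that records offsets into a shared buffer instead of copying text. */
final class BufferToken {

    // Hypothetical token type codes for this example.
    static final int ID = 1;
    static final int INT = 2;

    final int type;       // what kind of token this is
    final char[] buffer;  // the shared input buffer (a reference, not a copy)
    final int start;      // index of the first character, inclusive
    final int stop;       // index of the last character, inclusive

    BufferToken(int type, char[] buffer, int start, int stop) {
        this.type = type;
        this.buffer = buffer;
        this.start = start;
        this.stop = stop;
    }

    /** Builds the token text only when someone asks for it. */
    String getText() {
        return new String(buffer, start, stop - start + 1);
    }

    public static void main(String[] args) {
        char[] input = "WIDTH = 200;\n".toCharArray();
        BufferToken id = new BufferToken(ID, input, 0, 4);
        BufferToken value = new BufferToken(INT, input, 8, 10);
        System.out.println(id.getText() + " " + value.getText());
    }
}
```

Each token costs a few integers and one reference, and no String is allocated until `getText()` is called.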
Characters, Tokens, ASTs
Characters (CharStream): … W I D T H = 2 0 0 ; \n …
Tokens (Token): … ID WS = WS INT ; WS … (each WS token is marked with an x in the figure)
AST (CommonTree): = is the root, with children ID and INT
Characters, Tokens, ASTs
• AST nodes point at token objects rather than copying token data into a tree node
• CommonTree is a predefined node containing a Token payload.
• ANTLR treats AST nodes as type Object, so there are no restrictions on tree data types
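The idea of a tree node that points at its token, rather than copying the token's data, might look like the sketch below. `Tok` and `TreeNode` are hypothetical stand-ins invented for this example; they are not ANTLR's Token or CommonTree classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of tree nodes that reference token objects instead of copying them. */
final class PayloadTreeDemo {

    /** Hypothetical stand-in for a token produced by the lexer. */
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Tree node whose payload is a reference to an existing token. */
    static final class TreeNode {
        final Tok payload;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(Tok payload) {
            this.payload = payload;
        }

        TreeNode add(TreeNode child) {
            children.add(child);
            return this;
        }

        /** The node has no text of its own; it asks its token. */
        String getText() {
            return payload.text;
        }
    }

    /** Builds the tree for an assignment: '=' at the root, ID and INT as children. */
    static TreeNode build(Tok id, Tok assign, Tok value) {
        return new TreeNode(assign).add(new TreeNode(id)).add(new TreeNode(value));
    }

    public static void main(String[] args) {
        TreeNode root = build(new Tok("ID", "WIDTH"), new Tok("ASSIGN", "="), new Tok("INT", "200"));
        System.out.println(root.getText() + " has " + root.children.size() + " children");
    }
}
```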
A-mazing Analogy
• The book focuses on discovering the implicit tree structure in input sentences and on generating structured text
• A maze can be thought of as a language recognizer. Imagine a maze with words written on the floor
• Any sentence that guides you from the entrance to the exit is “valid”
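The maze analogy can be sketched as a small graph walk: rooms are connected by passages, each passage has a word written on its floor, and a sentence is accepted only if following its words leads from the entrance to the exit. The class `MazeRecognizer`, the room names, and the example words below are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a maze used as a language recognizer. */
final class MazeRecognizer {

    // room -> (word on the floor -> room that passage leads to)
    private final Map<String, Map<String, String>> passages = new HashMap<>();
    private final String entrance;
    private final String exit;

    MazeRecognizer(String entrance, String exit) {
        this.entrance = entrance;
        this.exit = exit;
    }

    /** Adds a passage from one room to another with a word written on its floor. */
    MazeRecognizer passage(String from, String word, String to) {
        passages.computeIfAbsent(from, k -> new HashMap<>()).put(word, to);
        return this;
    }

    /** Returns true if the words of the sentence lead from the entrance to the exit. */
    boolean accepts(String sentence) {
        String room = entrance;
        String trimmed = sentence.trim();
        if (!trimmed.isEmpty()) {
            for (String word : trimmed.split("\\s+")) {
                Map<String, String> out = passages.get(room);
                room = (out == null) ? null : out.get(word);
                if (room == null) return false; // no passage carries this word
            }
        }
        return room.equals(exit);
    }

    /** A tiny hypothetical maze used by the demo in main. */
    static MazeRecognizer demo() {
        return new MazeRecognizer("entrance", "exit")
                .passage("entrance", "the", "hall")
                .passage("hall", "cat", "garden")
                .passage("hall", "dog", "garden")
                .passage("garden", "sleeps", "exit");
    }

    public static void main(String[] args) {
        MazeRecognizer maze = demo();
        System.out.println(maze.accepts("the cat sleeps"));
        System.out.println(maze.accepts("cat the sleeps"));
    }
}
```

This sketch allows only one passage per word out of each room, so every step is decided by the next word alone; real languages often need more lookahead than that.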