Lecture 4: Lexical Analysis I
![Page 1: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/1.jpg)
CSCI-GA.2130-001
Compiler Construction
Lecture 4: Lexical Analysis I
Mohamed Zahran (aka Z)
![Page 2: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/2.jpg)
Role of the Lexical Analyzer
• Remove comments and white space (aka scanning)
• Macro expansion
• Read input characters from the source program
• Group them into lexemes
• Produce as output a sequence of tokens
• Interact with the symbol table
• Correlate error messages generated by the compiler with the source program
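The duties listed above can be seen together in a small scanner. This is a minimal Python sketch that assumes a hypothetical token set (`COMMENT`, `WS`, `NUM`, `ID`, `OP`) for a toy C-like language. It drops comments and white space, groups characters into lexemes, emits tokens, and records line numbers so that errors can be reported against the source program.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    name: str    # token name, e.g. "ID" or "NUM"
    lexeme: str  # the characters that matched
    line: int    # source line, kept so errors can be traced back to the program


class LexicalError(Exception):
    pass


# Hypothetical token set for a toy C-like language (not taken from the lecture).
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NEWLINE", r"\n"),
    ("WS", r"[ \t\r]+"),
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=;()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source: str) -> Iterator[Token]:
    line, pos = 1, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise LexicalError(f"line {line}: unexpected character {source[pos]!r}")
        pos = m.end()
        if m.lastgroup == "NEWLINE":
            line += 1  # newlines only advance the line counter
        elif m.lastgroup not in ("WS", "COMMENT"):
            yield Token(m.lastgroup, m.group(), line)  # comments and white space are dropped
```

Because `COMMENT` is listed before `OP`, the alternation sees `//` as the start of a comment and a lone `/` as an operator.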
![Page 3: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/3.jpg)
Scanner-Parser Interaction
![Page 4: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/4.jpg)
Why Separate Lexical and Syntactic Analysis?
• Simplicity of design
• Improved compiler efficiency – allows us to use specialized techniques for the lexer that are not suitable for the parser
• Higher portability – input-device-specific peculiarities are restricted to the lexer
![Page 5: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/5.jpg)
Some Definitions
• Token: a pair consisting of
– Token name: an abstract symbol representing a lexical unit [affects parsing decisions]
– Optional attribute value [influences translations after parsing]
• Pattern: a description of the form that the lexemes of a token may take
• Lexeme: a sequence of characters in the source program that matches the pattern for a token
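To make the three definitions concrete, here is a minimal Python sketch. The two patterns (`number`, `id`) and the attribute chosen for each (the numeric value, the identifier's spelling) are illustrative assumptions; a real lexer would usually store a symbol-table pointer as the attribute of an identifier.

```python
import re
from typing import NamedTuple, Optional


class Token(NamedTuple):
    name: str                    # token name: the abstract symbol the parser sees
    attribute: Optional[object]  # optional attribute value, used after parsing


# Each pattern describes the form that the lexemes of one token name may take.
PATTERNS = [
    ("number", re.compile(r"[0-9]+")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]


def token_for(lexeme: str) -> Token:
    """Find the pattern that the lexeme matches and build the corresponding token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(lexeme):
            attribute = int(lexeme) if name == "number" else lexeme
            return Token(name, attribute)
    raise ValueError(f"no pattern matches the lexeme {lexeme!r}")
```

For instance, the lexeme `60` matches the `number` pattern and yields the token `("number", 60)`: the parser only looks at the name, while later phases use the value.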
![Page 6: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/6.jpg)
Pattern
Token classes:
• One token per keyword
• Tokens for the operators
• One token representing all identifiers
• Tokens representing constants (e.g. numbers)
• Tokens for punctuation symbols
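One way to map lexemes onto these classes is sketched below. The keyword, operator, and punctuation sets are hypothetical, and naming keyword tokens by their upper-case spelling is an assumed convention.

```python
import re

KEYWORDS = {"if", "else", "while"}  # hypothetical keyword set
OPERATORS = {"+", "-", "*", "/", "=", "<", "<=", "=="}
PUNCTUATION = {"(", ")", "{", "}", ";", ","}


def classify(lexeme: str) -> tuple:
    """Return (token name, attribute) for a lexeme, following the classes above."""
    if lexeme in KEYWORDS:
        return (lexeme.upper(), None)   # one token per keyword
    if lexeme in OPERATORS or lexeme in PUNCTUATION:
        return (lexeme, None)           # one token per operator / punctuation symbol
    if re.fullmatch(r"[0-9]+", lexeme):
        return ("NUM", int(lexeme))     # constants carry their value as attribute
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", lexeme):
        return ("ID", lexeme)           # a single token name for all identifiers
    raise ValueError(f"unknown lexeme {lexeme!r}")
```

Keywords are checked before the identifier pattern because every keyword also has the form of an identifier.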
![Page 7: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/7.jpg)
Example
![Page 8: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/8.jpg)
Dealing With Errors
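The lexical analyzer is unable to proceed when none of the patterns matches a prefix of the remaining input.
• Panic mode recovery: delete successive characters from the remaining input until a well-formed token can be found
• Other recovery actions, each a single edit to the remaining input:
– Insert a missing character
– Delete a character
– Replace a character by another
– Transpose two adjacent characters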
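Panic-mode recovery can be sketched as follows: whenever no pattern matches at the current position, the scanner deletes one character and tries again. The combined pattern is a hypothetical stand-in for a real token set, and the deleted characters are collected so that they could be reported to the user.

```python
import re

# Hypothetical patterns: words, integers, a few operators, and white space.
PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[-+*/=;]|\s+")


def scan_with_panic_mode(text: str):
    """Return (lexemes, deleted): the lexemes found and the characters discarded."""
    lexemes, deleted, pos = [], [], 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            # Panic mode: delete the offending character and retry at the next one.
            deleted.append(text[pos])
            pos += 1
            continue
        if not m.group().isspace():
            lexemes.append(m.group())
        pos = m.end()
    return lexemes, deleted
```

On `x = 4@@2;` this returns the lexemes `x`, `=`, `4`, `2`, `;`, which shows the weakness of the method: the deletion silently splits what may have been meant as the single number `42`.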
![Page 9: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/9.jpg)
Example
What tokens will be generated from the above C++ program?
![Page 10: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/10.jpg)
Buffering Issue
• The lexical analyzer may need to look at least one character ahead before it can decide which token it has seen (e.g., to tell < apart from <=).
• Buffering: reading the input in large blocks reduces the overhead of processing one character at a time
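A simplified sketch of buffered input with a `lexeme_begin` pointer and a `forward` pointer follows. It reads the source in fixed-size blocks and keeps the current lexeme in memory even when it spans two blocks. It is not the textbook's buffer pair with sentinels, and the block size, class, and method names are assumptions.

```python
import io


class InputBuffer:
    """Block-buffered input with lexeme_begin and forward pointers (simplified:
    one growing buffer instead of a buffer pair with sentinels)."""

    def __init__(self, stream, block_size: int = 4096):
        self.stream = stream
        self.block_size = block_size
        self.buf = ""
        self.lexeme_begin = 0  # start of the current lexeme
        self.forward = 0       # next character to examine

    def _fill(self) -> bool:
        """Drop text before lexeme_begin, read one more block; False at end of input."""
        self.buf = self.buf[self.lexeme_begin:]
        self.forward -= self.lexeme_begin
        self.lexeme_begin = 0
        block = self.stream.read(self.block_size)
        self.buf += block
        return block != ""

    def next_char(self) -> str:
        """Return the character at forward and advance; '' at end of input."""
        if self.forward >= len(self.buf) and not self._fill():
            return ""
        ch = self.buf[self.forward]
        self.forward += 1
        return ch

    def retract(self) -> None:
        """Move forward back one character (only after next_char returned one)."""
        self.forward -= 1

    def accept(self) -> str:
        """Return the current lexeme and start the next one at forward."""
        lexeme = self.buf[self.lexeme_begin:self.forward]
        self.lexeme_begin = self.forward
        return lexeme


def read_identifiers(text: str, block_size: int = 4) -> list:
    """Collect identifiers (a letter, then letters/digits), skipping other characters."""
    buf = InputBuffer(io.StringIO(text), block_size)
    found = []
    while True:
        ch = buf.next_char()
        if ch == "":
            return found
        if ch.isalpha():
            ch = buf.next_char()
            while ch.isalnum():
                ch = buf.next_char()
            if ch != "":
                buf.retract()  # one character of lookahead was read past the lexeme
            found.append(buf.accept())
        else:
            buf.accept()  # discard a character that cannot start an identifier
```

The tiny default block size in `read_identifiers` is deliberate, so that identifiers such as `abcdefgh` straddle block boundaries; a real scanner would use a block of several kilobytes.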
![Page 11: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/11.jpg)
![Page 12: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/12.jpg)
Tokens Specification
• We need a formal way to specify patterns: regular expressions
• Alphabet: any finite set of symbols
• String over an alphabet: a finite sequence of symbols drawn from that alphabet
• Language: any countable set of strings over some fixed alphabet
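With languages modeled as Python sets of strings, the usual operations on languages can be tried on small examples. Since L* is infinite whenever L contains a nonempty string, the closure below is cut off at a maximum length, an assumption made only so that the result fits in memory.

```python
def union(L: set, M: set) -> set:
    """L | M: strings that are in L or in M."""
    return L | M


def concat(L: set, M: set) -> set:
    """LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}


def closure(L: set, max_len: int) -> set:
    """The strings of L* (zero or more concatenations) with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more string of L, keeping only new, short ones.
        frontier = {s for s in concat(frontier, L) if len(s) <= max_len} - result
        result |= frontier
    return result
```

For example, with the alphabet {a, b} and L = {a, b}, `closure(L, 2)` contains the empty string, the two strings of length 1, and the four strings of length 2.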
![Page 13: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/13.jpg)
![Page 14: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/14.jpg)
Operations
Zero or one instance: r? is equivalent to r|ε
Character class: a|b|c|…|z can be replaced by [a-z], and a|c|d|h can be replaced by [acdh]
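Both notations are shorthands that can be expanded mechanically. The hypothetical helpers below rewrite a character-class body such as `a-z` or `acdh` into an explicit union, and `r?` into a union with the empty string; Python's `re` module is used only to check that the shorthand and the expanded form match the same strings.

```python
import re


def expand_class(body: str) -> str:
    """Rewrite the body of a character class ('a-z', 'acdh', ...) as an explicit union."""
    members, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-e stands for every character from a to e.
            members += [chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1)]
            i += 3
        else:
            members.append(body[i])
            i += 1
    return "|".join(re.escape(m) for m in members)


def expand_optional(r: str) -> str:
    """r? is shorthand for r|ε; in Python's re an empty alternative plays the role of ε."""
    return f"(?:{r}|)"
```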
![Page 15: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/15.jpg)
Operations
Trailing context: exp1/exp2
• Match exp1 only if it is followed by exp2
• exp2 is NOT consumed; it remains in the input to be returned in subsequent tokens
• Only one “/” is permitted per pattern
• Example: a/b matches a in the string ab, but matches nothing in a or ac
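Python's `re` has no `/` operator, but a lookahead assertion `(?=...)` behaves like trailing context: it checks what follows without consuming it. This is a minimal sketch assuming `exp1` and `exp2` are Python regex strings; unlike Lex, nothing here enforces the one-“/”-per-pattern rule.

```python
import re


def trailing_context(exp1: str, exp2: str):
    """Compile exp1/exp2: match exp1 only when exp2 follows, leaving exp2 unconsumed."""
    return re.compile(f"(?:{exp1})(?={exp2})")


def match_at(pattern, text: str, pos: int = 0):
    """Return (lexeme, new position) or None; the new position stops before exp2."""
    m = pattern.match(text, pos)
    return None if m is None else (m.group(), m.end())


a_before_b = trailing_context("a", "b")
```

On `ab`, `match_at(a_before_b, "ab")` gives `("a", 1)`: the lexeme is `a` and scanning resumes at the `b`, which is still available for the next token.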
![Page 16: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/16.jpg)
Examples
Which language is denoted by each of the following regular expressions?
• (a|b)(a|b)
• a*
• (a|b)*
• a|a*b
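One way to check your answers is to enumerate, for each expression, every string over {a, b} up to some length that matches it. This brute-force sketch uses Python's `re.fullmatch`; the length bound of 3 is arbitrary, so it shows only a finite slice of each (possibly infinite) language.

```python
import re
from itertools import product


def language(regex: str, alphabet: str = "ab", max_len: int = 3) -> list:
    """Strings over `alphabet` of length <= max_len that the whole regex matches."""
    pattern = re.compile(regex)
    words = (
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
    )
    return [w for w in words if pattern.fullmatch(w)]


for r in ["(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(f"{r:12} {language(r)}")
```

Since `|` has the lowest precedence in both regular-expression notation and Python's `re`, `a|a*b` is read as `a | (a*b)`: the string a together with all strings of zero or more a's followed by one b.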
![Page 17: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/17.jpg)
Example
How can we represent numbers that consist of an integer part with an optional fractional part and an optional exponent part?
![Page 18: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/18.jpg)
Example
How can we represent numbers that consist of an integer part with an optional fractional part and an optional exponent part?
Let’s analyze some possible solutions
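One candidate solution, the form the course textbook uses for unsigned numbers, is the regular definition digit → [0-9], digits → digit+, number → digits (. digits)? (E [+-]? digits)?. It translates directly into Python's regex syntax. The sketch assumes an upper-case E and no leading sign or leading/trailing dot; some languages accept those forms too.

```python
import re

DIGITS = r"[0-9]+"  # digits -> digit+, with digit -> [0-9]
NUMBER = re.compile(rf"{DIGITS}(?:\.{DIGITS})?(?:E[+-]?{DIGITS})?")


def is_number(s: str) -> bool:
    """True if the whole string is an integer with optional fraction and exponent."""
    return NUMBER.fullmatch(s) is not None
```

Note that `1.` and `.5` are rejected: each optional part must contain at least one digit after the dot or the E.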
![Page 19: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/19.jpg)
Examples
Write a regular definition for all strings of lowercase letters in which the letters are in ascending order
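Reading "ascending" as non-decreasing (letters may repeat), one answer is a* b* c* … z*. The sketch below builds that expression from `string.ascii_lowercase`; if repeated letters should be excluded, replacing each `*` with `?` gives the strictly ascending version.

```python
import re
import string

# a*b*c*...z*: each letter may appear zero or more times, in alphabetical order.
ASCENDING = re.compile("".join(f"{c}*" for c in string.ascii_lowercase))
# a?b?c?...z?: each letter at most once, so the order is strictly ascending.
STRICTLY_ASCENDING = re.compile("".join(f"{c}?" for c in string.ascii_lowercase))


def is_ascending(word: str, strict: bool = False) -> bool:
    pattern = STRICTLY_ASCENDING if strict else ASCENDING
    return pattern.fullmatch(word) is not None
```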
![Page 20: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/20.jpg)
Tokens Recognition
![Page 21: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/21.jpg)
Implementation: Transition Diagrams
• Intermediate step in constructing a lexical analyzer
• Convert patterns into flowcharts called transition diagrams
– Nodes or circles: called states
– Edges: directed from one state to another, labeled by symbols
![Page 22: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/22.jpg)
![Page 23: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/23.jpg)
Figure: transition diagram notation, marking the initial state, an accepting (final) state, and the actions associated with the final state.
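A transition diagram can also be stored as data: a table of edges keyed by (state, symbol class), plus a table of actions attached to the accepting states. The hypothetical diagram below recognizes digits ( . digits )?; the simulator follows edges as far as it can, then runs the action of the last accepting state it passed through.

```python
def char_class(ch: str) -> str:
    """Map a character to the label used on diagram edges."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# Edges: (state, label) -> next state. State 0 is the initial state.
EDGES = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}

# Accepting states 1 and 3, each with the action run when the diagram accepts there.
ACTIONS = {
    1: lambda lexeme: ("NUM", int(lexeme)),
    3: lambda lexeme: ("NUM", float(lexeme)),
}


def run(text: str, start: int = 0):
    """Return (token, end position) for the longest accepted prefix, or None."""
    state, pos, last_accept = 0, start, None
    while True:
        if state in ACTIONS:
            last_accept = (state, pos)  # remember the most recent accepting state
        if pos >= len(text):
            break
        nxt = EDGES.get((state, char_class(text[pos])))
        if nxt is None:
            break
        state, pos = nxt, pos + 1
    if last_accept is None:
        return None
    final_state, end = last_accept
    return ACTIONS[final_state](text[start:end]), end
```

Remembering the last accepting state lets `12.` fall back to the integer `12`, leaving the dot for the next token instead of failing outright.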
![Page 24: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/24.jpg)
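Figure annotation: the marked accepting states mean "retract the forward pointer", i.e. give back the lookahead character before the token is returned.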
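A hand-coded version of the relational-operator diagram (the `relop` example in the course textbook, with attributes LT, LE, EQ, NE, GT, GE) shows where the retraction happens. After reading `<` or `>` the diagram must look at one more character; if that character is not part of the operator, the forward pointer is moved back. The function name and token representation are assumptions.

```python
def relop(text: str, start: int):
    """Simulate the relop diagram from `start`; return (token, forward) or None."""
    state, forward = 0, start
    while True:
        ch = text[forward] if forward < len(text) else ""  # "" marks end of input
        forward += 1
        if state == 0:
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("relop", "EQ"), forward
            elif ch == ">":
                state = 2
            else:
                return None  # no relational operator starts here
        elif state == 1:  # seen "<"
            if ch == "=":
                return ("relop", "LE"), forward
            if ch == ">":
                return ("relop", "NE"), forward
            return ("relop", "LT"), forward - 1  # retract: ch is not part of the token
        else:  # state 2: seen ">"
            if ch == "=":
                return ("relop", "GE"), forward
            return ("relop", "GT"), forward - 1  # retract
```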
![Page 25: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/25.jpg)
![Page 26: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/26.jpg)
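Annotations on the actions of the transition diagram for identifiers:
• Place the ID in the symbol table if it is not already there, and return a pointer to its symbol-table entry
• Examine the symbol table for the lexeme found, and return whatever token name is stored there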
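The two actions can be sketched with a dictionary-based symbol table. The method names loosely echo the textbook's installID() and getToken() but are hypothetical here, and the reserved words (if, then, else) are assumed to be installed before scanning starts, so that the token name looked up for a keyword differs from ID.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lexeme: str
    token: str  # token name stored for this lexeme


class SymbolTable:
    def __init__(self, reserved=("if", "then", "else")):
        self.entries = {}
        for word in reserved:
            # Reserved words are installed up front with their own token names.
            self.entries[word] = Entry(word, word.upper())

    def install_id(self, lexeme: str) -> Entry:
        """Place the lexeme in the table if it is not there; return its entry."""
        if lexeme not in self.entries:
            self.entries[lexeme] = Entry(lexeme, "ID")
        return self.entries[lexeme]

    def get_token(self, lexeme: str) -> str:
        """Return whatever token name the table holds (install_id must run first)."""
        return self.entries[lexeme].token
```

Because the entry object itself is returned, repeated occurrences of the same identifier share one entry, which is the role the "pointer to the symbol-table entry" plays.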
![Page 27: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/27.jpg)
![Page 28: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/28.jpg)
Reserved Words and Identifiers
• Install reserved words in symbol table initially
OR
• Create a transition diagram for each keyword, then prioritize the tokens so that keywords take precedence over identifiers
![Page 29: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/29.jpg)
Implementation of Transition Diagram
![Page 30: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/30.jpg)
Using All Transition Diagrams: The Big Picture
• Arrange for the transition diagrams for each token to be tried sequentially
• Run transition diagrams in parallel
• Combine all transition diagrams into one
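A common way to combine the diagrams is to run every pattern from the current position, keep the longest match, and break ties in favor of the pattern listed first. This is also how Lex resolves conflicts, and listing keywords before the identifier pattern gives keywords the higher preference mentioned earlier. The rule list is hypothetical.

```python
import re

# Hypothetical rules in priority order: on equal-length matches the earlier rule wins,
# so the keyword rules are listed before the identifier rule.
RULES = [
    ("IF", re.compile(r"if")),
    ("ELSE", re.compile(r"else")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
    ("RELOP", re.compile(r"<=|>=|<>|<|>|=")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text: str) -> list:
    """Longest match wins; ties go to the rule that appears first in RULES."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None  # (rule name, end position) of the best match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        if name != "WS":  # white space is matched but not returned
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

On `ifx`, the `IF` rule matches only two characters while `ID` matches three, so the longest-match rule correctly yields an identifier; on `if`, both match two characters and the tie goes to `IF`.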
![Page 31: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/31.jpg)
The First Part of the Project
![Page 32: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/32.jpg)
The First Part of the Project
A Lex program has the form:
declarations
%%
translation rules
%%
auxiliary functions
![Page 33: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/33.jpg)
Figure: the three sections of a Lex program (declarations, translation rules, auxiliary functions).
![Page 34: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/34.jpg)
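• Anything between the %{ and %} marks is copied as is into lex.yy.c
• A name in braces within a pattern refers to a regular definition given in the declarations section
• Each translation rule pairs a pattern with an action, the code executed when the pattern matches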
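Names in braces can be resolved by textual substitution before any matching happens. The sketch below assumes the definitions are given as a Python dict and that no definition refers to itself, directly or indirectly (regular definitions are not allowed to). Each expansion is wrapped in a non-capturing group so that an operator such as `+` applies to the whole definition.

```python
import re

# A name in braces, e.g. {digit}; repetition counts such as {2} do not match.
REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(pattern: str, defs: dict) -> str:
    """Replace every {name} in pattern with the fully expanded definition of name."""
    return REF.sub(lambda m: "(?:" + expand(defs[m.group(1)], defs) + ")", pattern)


# Hypothetical regular definitions in the style of a Lex declarations section.
DEFS = {
    "digit": "[0-9]",
    "digits": "{digit}+",
    "number": r"{digits}(\.{digits})?(E[+-]?{digits})?",
}
NUMBER = re.compile(expand("{number}", DEFS))
```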
![Page 35: Lecture 4: Lexical Analysis I](https://reader033.fdocuments.net/reader033/viewer/2022050803/58a0392d1a28ab7c4a8c71dd/html5/thumbnails/35.jpg)
Lecture of Today
• Sections 3.1 to 3.5
• First part of the project assigned