Parsing Tools
Introduction to Bison and Flex
Scanning/parsing tools
lex - the original UNIX lexical analyzer generator (Lesk, 1975)
creates a C function that splits input into tokens according to a set of regular expressions
yacc - "yet another compiler compiler", the UNIX parser generator (Johnson, 1975)
generates a C program for a parser from BNF rules
bison and flex ("fast lex") - more powerful, free replacements for yacc and lex; bison comes from the GNU Project (Free Software Foundation)
JFlex - generates Java code for a scanner
CUP - generates Java code for a parser
Bison Overview
Purpose: automatically write a parser program for a grammar written in BNF.
Usage: you write a bison source file containing rules that look like BNF.
Bison creates a C program that parses according to the rules
term   : term '*' factor   { $$ = $1 * $3; }
       | term '/' factor   { $$ = $1 / $3; }
       | factor            { $$ = $1; }
       ;
factor : ID                { $$ = valueof($1); }
       | NUMBER            { $$ = $1; }
       ;
Bison Overview (2)
myparser.y: BNF rules and actions for your grammar.
> bison myparser.y
myparser.tab.c: parser source code
yylex.c: tokenizer function in C
> gcc -o myprog myparser.tab.c yylex.c
myprog: executable program
The programmer puts the BNF rules and token declarations for the desired parser in a bison source file, myparser.y.
Run bison to create a C program (*.tab.c) containing a parser function.
The programmer must also supply a tokenizer named yylex( ).
Bison Overview (3)
In operation:
your main program calls yyparse( ), the parser created by bison.
yyparse( ) calls yylex( ) when it wants a token.
yylex( ) reads the input file to be parsed and returns the type of the next token.
yylex( ) puts the value of the token in a global variable named yylval.
yyparse( ) produces a parse tree or other result.
Bison source file
/* declarations go here */
%%
/* grammar rules go here */
%%
/* additional C code goes here */
The file has 3 sections, separated by "%%" lines.
Note: format for "yacc" is the same as for bison.
Bison source file with C declarations
%{
/* C declarations and #define statements go here */
#include <stdio.h>
#define YYSTYPE double
%}
/* bison declarations go here */
%%
/* grammar rules go here */
%%
/* additional C code goes here */
You usually include C code in the declarations section.
Declare that yylval will be a "double" type.
Bison Example
Create a parser for this grammar:
expression => expression + term
            | expression - term
            | term
term       => term * factor
            | term / factor
            | factor
factor     => ( expression )
            | NUMBER
Bison/Yacc file for example (1)
Structure of Bison or Yacc input:
%{
/* C declarations and #define statements go here */
#include <stdio.h>
#define YYSTYPE double
%}
/* Bison/Yacc declarations go here */
%token NUMBER        /* define token type NUMBER */
%left '+' '-'        /* + and - are left associative */
%left '*' '/'        /* * and / are left associative */
%%
/* grammar rules go here */
%%
/* additional C code goes here */
Bison/Yacc example (2)
%% /* Bison grammar rules */
input  : /* empty production to allow an empty input */
       | input line
       ;
line   : expr '\n'         { printf("Result is %f\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | expr '-' term     { $$ = $1 - $3; }
       | term              { $$ = $1; }
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | term '/' factor   { $$ = $1 / $3; }
       | factor            { $$ = $1; }
       ;
factor : '(' expr ')'      { $$ = $2; }
       | NUMBER            { $$ = $1; }
       ;
Bison/Yacc example (3)
$1, $2, ... represent the values of the tokens or non-terminals that match the right-hand side of the production, numbered from left to right.
$$ is the result (the value of the left-hand side).
expr   : expr '+' term     { $$ = $1 + $3; }
       | expr '-' term     { $$ = $1 - $3; }
       | term              { $$ = $1; }
       ;
Each rule consists of a pattern to match followed by an action in braces.
Example: if the input matches expr '+' term, then set the result ($$) to the sum of expr and term ($1 + $3).
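The action { $$ = $1 + $3; } can be pictured as an operation on the parser's value stack. The sketch below is plain C written for illustration, not the code Bison generates; the names value_stack, push_value and reduce_add are made up for this example.

```c
#include <assert.h>

/* Hypothetical value stack; Bison's generated parser keeps a similar one internally. */
#define STACK_MAX 32
typedef struct {
    double values[STACK_MAX];
    int top;                    /* number of values currently on the stack */
} value_stack;

/* Push one semantic value; returns 0 if the stack is full. */
static int push_value(value_stack *s, double v) {
    if (s->top >= STACK_MAX) return 0;
    s->values[s->top++] = v;
    return 1;
}

/* Reduce by "expr : expr '+' term". The three right-hand-side values are
 * $1, $2, $3 counted from the left; they are replaced by the single result $$. */
static double reduce_add(value_stack *s) {
    assert(s->top >= 3);
    double d1 = s->values[s->top - 3];   /* $1: value of expr */
    double d3 = s->values[s->top - 1];   /* $3: value of term ($2 is the '+' token) */
    double result = d1 + d3;             /* $$ = $1 + $3 */
    s->top -= 3;                         /* pop the right-hand side */
    s->values[s->top++] = result;        /* push $$ */
    return result;
}
```

Pushing 3, a placeholder for '+', and 5, then calling reduce_add, leaves a single value on the stack: the sum.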
Bison/Yacc example (4)
Q: why can we write "$$ = $1 + $3"?
A: because we declared "#define YYSTYPE double", so the values of all tokens and non-terminals are doubles.
expr   : expr '+' term     { $$ = $1 + $3; }
       | expr '-' term     { $$ = $1 - $3; }
       | term              { $$ = $1; }
       ;
Scanner function: yylex( )
You must supply a scanner named yylex. yylex returns the token TYPE (not the token value).
int yylex( void ) {
    int c = getchar();                 /* read from stdin */
    if (c < 0) return 0;               /* end of input */
    if ( c == '+' || c == '-' )
        return c;                      /* for character tokens, TYPE = character itself */
    if ( isdigit(c) ) {
        yylval = c - '0';              /* yylval is a global var */
        while ( isdigit( c = getchar() ) )
            yylval = 10*yylval + (c - '0');
        if (c >= 0) ungetc(c, stdin);
        return NUMBER;                 /* token type is NUMBER */
    }
    ...
}
Where is the token value?
The value of the token is stored in a global variable named yylval.
int yylex( void ) {
    int c = getchar();                 /* read from stdin */
    if (c < 0) return 0;               /* end of input */
    if ( c == '+' || c == '-' ) return c;
    /* read number and store as yylval */
    if ( isdigit(c) ) {
        yylval = c - '0';
        /* get each digit */
        while ( isdigit( c = getchar() ) )
            yylval = 10*yylval + (c - '0');
        /* push next character back into input stream */
        if (c >= 0) ungetc(c, stdin);
        return NUMBER;                 /* token type is NUMBER */
    }
    ...
}
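The slide elides part of the scanner. One way the type/value split could look in a complete function is sketched below. It is an assumption-laden stand-in, not the deck's yylex: it reads from a FILE* argument instead of stdin so it can be tried on its own, and next_token, token_value and TOKEN_NUMBER are hypothetical names playing the roles of yylex, yylval and NUMBER.

```c
#include <ctype.h>
#include <stdio.h>

enum { TOKEN_NUMBER = 258 };     /* hypothetical token type; Bison assigns real codes above 255 */
static int token_value;          /* plays the role of yylval (type int here) */

/* Returns the token TYPE; for numbers the VALUE is left in token_value. */
static int next_token(FILE *in) {
    int c = getc(in);
    while (c == ' ' || c == '\t')          /* skip blanks between tokens */
        c = getc(in);
    if (c == EOF) return 0;                /* 0 tells the parser: end of input */
    if (isdigit(c)) {
        token_value = c - '0';
        while (isdigit(c = getc(in)))      /* accumulate the remaining digits */
            token_value = 10 * token_value + (c - '0');
        if (c != EOF) ungetc(c, in);       /* push back the first non-digit */
        return TOKEN_NUMBER;
    }
    return c;                              /* one-character token: type is the character */
}
```

Calling next_token repeatedly on the input "12 + 7" followed by a newline yields TOKEN_NUMBER (value 12), '+', TOKEN_NUMBER (value 7), the newline character, and finally 0.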
Useful C functions for yylex
int c = getchar( );      read next char from stdin; returns EOF (usually -1) at end of input
ungetc( c, stdin );      put character back in input
#include <ctype.h>
isdigit(c)               true if c is a digit
isalpha(c)               true if c is a letter
isalnum(c)               isdigit(c) || isalpha(c)
islower(c)               true if c is lowercase
isupper(c)               true if c is uppercase
isspace(c)               space, tab, newline, formfeed (also carriage return and vertical tab)
scanf("%d", &num)        read an integer from input
scanf("%lf", &dnum)      read a double from input
Scanner function for double values
Suppose we want the data type of all tokens to be double.
In the Bison input: #define YYSTYPE double
This changes yylval to be a double!
%{
/* C declarations and #define statements go here */
#include <stdio.h>
#define YYSTYPE double
%}
/* bison declarations go here */
%%
Scanner function for double
Now yylex must know that yylval is "extern double".
Here is an example of using scanf to parse numbers.
int yylex( void ) {
    int c = getchar();                      /* read from stdin */
    while ( c == ' ' || c == '\t' )         /* skip spaces and tabs */
        c = getchar( );
    if (c < 0) return 0;                    /* end of the input (checked after skipping blanks) */
    if ( isdigit(c) || c == '.' ) {
        ungetc(c, stdin);                   /* put c back into input */
        scanf("%lf", &yylval);              /* get value using scanf */
        return NUMBER;                      /* return the token type */
    }
    return c;                               /* anything else... return char itself */
}
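A lone '.' passes the isdigit/'.' test but is not a number, so the conversion can fail and leave the value unset. The sketch below shows one way to notice that by checking the return value of fscanf. It reads from a FILE* argument so it can be tried standalone; scan_double, TOKEN_NUMBER and TOKEN_ERROR are hypothetical names, not part of Bison.

```c
#include <ctype.h>
#include <stdio.h>

enum { TOKEN_NUMBER = 258, TOKEN_ERROR = 259 };   /* hypothetical token types */

/* Scan one token from `in`; a number's value is stored in *value. */
static int scan_double(FILE *in, double *value) {
    int c = getc(in);
    while (c == ' ' || c == '\t')          /* skip spaces and tabs */
        c = getc(in);
    if (c == EOF) return 0;                /* end of input */
    if (isdigit(c) || c == '.') {
        ungetc(c, in);                     /* let fscanf see the whole number */
        /* fscanf returns the number of items converted: 1 on success */
        if (fscanf(in, "%lf", value) != 1)
            return TOKEN_ERROR;
        return TOKEN_NUMBER;
    }
    return c;                              /* anything else: the character itself */
}
```

On the input "3.25*2" this returns TOKEN_NUMBER with value 3.25, then '*', then TOKEN_NUMBER with value 2.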
Other C functions: yyerror
Bison requires an error routine named yyerror.
yyerror is called by the parser when there is an error.
You can include yyerror( ) in your Bison source file.
/* display error message */
void yyerror( char const *errmsg ) {
    fprintf(stderr, "%s\n", errmsg);
}
Other C functions: main
You need to write a main() function that starts the parser.
For a simple parser, main() calls yyparse().
/* main function to run the program */
int main( void ) {
    printf("Type some input. Enter ? for help.\n");
    return yyparse( );    /* yyparse returns 0 if parsing succeeded */
}
Running Bison
Compile the example file simple.y
CMD> bison simple.y
The output is "simple.tab.c", the C code for the parser.
To compile the parser by itself using gcc:
CMD> gcc -o simple.exe simple.tab.c
To run the program at the command prompt:
CMD> simple.exe
Simple example: definitions
File: simple.y
/* The Bison declarations section */
%{
/* C declarations and #define statements go here */
#include <stdio.h>
#include <math.h>
#define YYSTYPE double
%}
%token NUMBER     /* define token type for numbers */
%token '+' '-'    /* no left/right associativity specified */
Simple example: grammar rules
%% /* Simple grammar rules */
input : /* allow empty input */
      | input line
      ;
line  : expr '\n'        { printf("answer: %g\n", $1); }   /* $1 is a double, so %g rather than %d */
      ;
expr  : expr '+' term    { $$ = $1 + $3; }
      | expr '-' term    { $$ = $1 - $3; }
      | term             { $$ = $1; }
      ;
term  : NUMBER           { $$ = $1; }
      ;
yyerror and main
%% /* extra C code */
/* display error message */
void yyerror( char const *errmsg ) { fprintf(stderr, "%s\n", errmsg); }
/* main */
int main( void ) {
    printf("type an expression:\n");
    return yyparse( );
}
Simple example: does it work?
cmd> bison simple.y
cmd> gcc -o calc.exe simple.tab.c
cmd> calc
Test the grammar rules:
3 + 5
3 - 4 - 5 + 6
How about this: -3
Simple example: exploring BNF
%token NUMBER
%right '+' '-'
%% /* Simple grammar rules */
input : /* empty input */
      | input line
      ;
line  : expr '\n'        { printf("answer: %g\n", $1); }   /* $1 is a double, so %g rather than %d */
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '-' expr    { $$ = $1 - $3; }
      | term             { $$ = $1; }
      ;
term  : NUMBER           { $$ = $1; }
      | '-' NUMBER       { $$ = -$2; }
      ;
Exercise
Expand the grammar to include these operations:
4 * 5                 multiplication
2 / 3                 division
10 + 3 * 4 - 1 / 2    correct order of operations
2 * ( 3 + 4 )         grouping
Complete example: definitions
File: simple.y
/* The Bison declarations section */
%{
/* C declarations and #define statements go here */
#include <stdio.h>
#include <math.h>
#define YYSTYPE double
%}
%token NUMBER     /* define token type for numbers */
%left '+' '-'     /* + and - are left associative */
%left '*' '/'     /* * and / are left associative */
Complete example: grammar rules
%% /* Bison grammar rules */
input  : /* allow empty input */
       | input line
       ;
line   : expr '\n'         { printf("Result is %f\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | expr '-' term     { $$ = $1 - $3; }
       | term              { $$ = $1; }
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | term '/' factor   { $$ = $1 / $3; }
       | factor            { $$ = $1; }
       ;
factor : '(' expr ')'      { $$ = $2; }
       | NUMBER            { $$ = $1; }
       | '-' NUMBER        { $$ = -$2; }
       ;
Common Errors in Bison
1. Forgetting to quote literals: term '+' term
2. Not skipping space where space is allowed
3. Ambiguity in grammar rules
4. Forgetting ';' at the end of rule
Example containing errors 1 and 4 (unquoted literals, missing ';'):
%token NUMBER
%token + -
%% /* grammar rules */
expr : term + term      { $$ = $1 + $3; }
     | term - term      { $$ = $1 - $3; }
     | term             { $$ = $1; }
term : NUMBER           { $$ = $1; }
     | - NUMBER         { $$ = -$2; }
Shift / Reduce and operator order
Bison uses token look-ahead and a token stack. It shifts tokens onto the stack until it can choose a rule to reduce (replace) tokens with a non-terminal. Example:
expr ::= expr + expr
| expr - expr
| expr * expr
| expr / expr
| term
term ::= NUMBER | ( expr )
Suppose the input read so far is "10 -". Bison shifts these tokens onto the stack because no rule matches yet. Suppose the next token is 2. What should Bison do?
Shift / Reduce and operator order
STACK: 10 -    CURRENT TOKEN: 2
This grammar is ambiguous. After shifting 2, Bison could reduce using "expr - expr", or it could shift the next token and decide later... the next token may be *, as in: 10 - 2 * 3
This is called a "shift / reduce conflict". Unless you specify a disambiguating rule (next slide), Bison chooses "shift" over "reduce". Meaning: if it's not clear how to resolve a conflict, wait.
INPUT   ACTION   STACK
10      shift    10
-       shift    10 -
2       shift    10 - 2
*       shift    10 - 2 *
3       shift    10 - 2 * 3
        reduce   10 - 6
        reduce   4
done
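The shift-or-reduce decision can be tried out in a few lines of C. The sketch below is not Bison's LALR algorithm: it is a small two-stack evaluator that assumes well-formed input made of non-negative integers and the four binary operators (no parentheses), and it assumes the usual left-associative precedence levels. The names precedence, reduce_top and evaluate are made up for this example.

```c
#include <ctype.h>

/* Precedence of a binary operator; higher binds tighter. 0 = not an operator. */
static int precedence(char op) {
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;
}

/* Apply one reduction "expr op expr -> expr" to the top of the stacks. */
static void reduce_top(double *values, int *nvalues, char *ops, int *nops) {
    double right = values[--*nvalues];
    double left  = values[--*nvalues];
    char op = ops[--*nops];
    double result = op == '+' ? left + right
                  : op == '-' ? left - right
                  : op == '*' ? left * right
                  :             left / right;
    values[(*nvalues)++] = result;
}

/* Evaluate well-formed input such as "10 - 2 * 3". */
static double evaluate(const char *input) {
    double values[64];
    char ops[64];
    int nvalues = 0, nops = 0;
    for (const char *p = input; *p != '\0'; ) {
        if (isdigit((unsigned char)*p)) {
            double number = 0;
            while (isdigit((unsigned char)*p))
                number = 10 * number + (*p++ - '0');
            values[nvalues++] = number;            /* shift a NUMBER */
        } else if (precedence(*p) > 0) {
            /* Lookahead is an operator: reduce while the operator on the stack
             * binds at least as tightly (equal precedence reduces because all
             * four operators are treated as left associative), otherwise shift. */
            while (nops > 0 && precedence(ops[nops - 1]) >= precedence(*p))
                reduce_top(values, &nvalues, ops, &nops);
            ops[nops++] = *p++;                    /* shift the operator */
        } else {
            p++;                                   /* skip blanks and anything else */
        }
    }
    while (nops > 0)                               /* end of input: reduce what is left */
        reduce_top(values, &nvalues, ops, &nops);
    return nvalues == 1 ? values[0] : 0;
}
```

For "10 - 2 * 3" the '*' is shifted rather than reducing "10 - 2" first, so the result is 4, matching the trace on the slide.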
The declarations section
The declarations section can contain Bison directives and C directives.
We already saw the use of %left, %right, etc.
%{
/* semantic value of tokens, as C datatype */
#define YYSTYPE double
#include <math.h>
/* why do we have to declare these? */
int yylex (void);
void yyerror (char const *);
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
Specifying operator order
In the declarations section you can write:
%left '+' '-'
%left '*' '/' '%'
%nonassoc NEG    /* unary minus sign: - 3 */
%right '^'
'+' and '-' are left associative and have the same precedence.
'*', '/', and '%' are left associative and have the same precedence; they have higher precedence than '+' and '-' (later declarations are higher).
NEG has no associativity and binds tighter than * / %. It is not a real token: it is a name that gives the unary-minus rule its precedence, attached to that rule with %prec NEG.
'^' is right associative and has higher precedence than everything declared above it.
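How these declarations settle a shift/reduce conflict can be written down as a small table lookup. The sketch below follows the rule Bison documents (compare the precedence of the rule with that of the lookahead token; on a tie, associativity decides), but the code itself is illustrative: op_info, lookup and resolve are hypothetical names, the level numbers only encode declaration order, and NEG is left out because it never appears as a lookahead token.

```c
#include <stddef.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } assoc_kind;

typedef struct {
    char token;
    int level;          /* declaration order: later declarations get larger numbers */
    assoc_kind assoc;   /* ASSOC_NONE would correspond to %nonassoc */
} op_info;

/* Mirrors:  %left '+' '-'   %left '*' '/' '%'   %right '^'  */
static const op_info table[] = {
    { '+', 1, ASSOC_LEFT }, { '-', 1, ASSOC_LEFT },
    { '*', 2, ASSOC_LEFT }, { '/', 2, ASSOC_LEFT }, { '%', 2, ASSOC_LEFT },
    { '^', 3, ASSOC_RIGHT },
};

static const op_info *lookup(char token) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].token == token) return &table[i];
    return NULL;
}

/* The rule "expr : expr rule_op expr" could be reduced and `lookahead` is the
 * next token. Returns 'r' (reduce), 's' (shift), or 'e' (error / unknown). */
static char resolve(char rule_op, char lookahead) {
    const op_info *rule = lookup(rule_op);
    const op_info *next = lookup(lookahead);
    if (rule == NULL || next == NULL) return 'e';
    if (rule->level > next->level) return 'r';   /* rule binds tighter: reduce */
    if (rule->level < next->level) return 's';   /* lookahead binds tighter: shift */
    switch (next->assoc) {                       /* same level: associativity decides */
    case ASSOC_LEFT:  return 'r';
    case ASSOC_RIGHT: return 's';
    default:          return 'e';
    }
}
```

With "10 - 2" on the stack and '*' as lookahead, resolve('-', '*') returns 's' (shift); with "2 ^ 3" on the stack and '^' as lookahead, resolve('^', '^') also returns 's', which is what makes '^' group to the right.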
Scanner function main points
Tokens have a TYPE and a VALUE.
The scanner (yylex) returns the TYPE of the next token.
For one-character tokens like '+', '=', '(' the character itself can be used as the type.
Define symbolic names for token types in Bison using %token NAME
Return the VALUE of a token using the global variable yylval.
By default, yylval is type "int". Change it using: #define YYSTYPE double
Rather than parse numbers yourself, use scanf.
Scanner function main points (1)
%{
#include <ctype.h>
#include <math.h>
#include <stdio.h>
#define YYSTYPE double       /* token values are double */
%}
%token NUMBER
%token SQRT                  /* token for sqrt function */
%left '+' '-'
%left '*' '/'
%%
line : expr '\n'             { printf("%g\n", $1); }
     ;
expr : expr '*' expr         { $$ = $1 * $3; }
     | SQRT '(' expr ')'     { $$ = sqrt( $3 ); }    /* $3 is the expr; $2 is '(' */
     ... /* more rules */
Scanner function main points (2)
%%
int yylex(void) {
    int c = getchar( );
    while ( c == ' ' || c == '\t' ) c = getchar( );
    if ( isdigit(c) ) {                      /* token is a number */
        ungetc(c, stdin);
        scanf("%lf", &yylval);               /* yylval is a double, so use %lf */
        return NUMBER;
    }
    if ( isalpha(c) ) {
        char *word = getword(c);             /* get next word */
        if ( strcmp(word, "sqrt") == 0 )     /* strcmp returns 0 on a match: token is sqrt function */
            return SQRT;
        ...
    }
}
For a more general way of handling identifiers, see the Bison User's Guide, "mfcalc" example.
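The slide calls getword(c) without showing it. The sketch below is one guess at what such a helper could look like. It is hypothetical: the signature differs from the slide's (it takes the input stream as a second argument so it can be tried on its own), and words longer than the buffer are silently truncated.

```c
#include <ctype.h>
#include <stdio.h>

#define WORD_MAX 64

/* Hypothetical getword(): `first` is the letter the scanner already read;
 * the rest of the word is read from `in`. The result lives in a static
 * buffer and is overwritten by the next call. */
static char *getword(int first, FILE *in) {
    static char word[WORD_MAX];
    int length = 0;
    int c = first;
    while (c != EOF && isalnum(c)) {
        if (length < WORD_MAX - 1)         /* silently truncate overlong words */
            word[length++] = (char)c;
        c = getc(in);
    }
    if (c != EOF) ungetc(c, in);           /* first character after the word */
    word[length] = '\0';
    return word;
}
```

If the scanner has read 's' and the remaining input is "qrt(2)", getword('s', in) returns "sqrt" and leaves '(' as the next character to read.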
Handling Multiple Data Types
For a more general grammar, the scanner should be able to return different data types as the value (yylval).
In the definitions section, define "%union" as the union of all data types that yylval (and hence $1, $2, ...) can have.
%union {                 /* define all the data types that token values can have */
    double number;
    char* string;
}
%token <number> NUMBER   /* for each token type, define the data type of its value */
%token <string> IDENT
%type <number> expr
%type <number> term
%left '+' '-'            /* tokens that represent their own value need no data type */
%left '*' '/'
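In C terms, the %union declaration becomes an ordinary union, and the token type tells the parser which member currently holds the value. The sketch below illustrates that idea without Bison; semantic_value, lval, make_number, make_ident and describe are made-up names, and the token codes are only illustrative.

```c
#include <stdio.h>

/* What "%union { double number; char* string; }" amounts to in C.
 * Bison names the real type YYSTYPE; semantic_value is a stand-in. */
typedef union {
    double number;
    char *string;
} semantic_value;

enum token_type { NUMBER = 258, IDENT = 259 };   /* illustrative codes */

static semantic_value lval;                      /* stand-in for yylval */

/* A union stores only one member at a time, so the token TYPE returned by
 * the scanner tells the parser which member is valid. */
static int make_number(double x)  { lval.number = x;    return NUMBER; }
static int make_ident(char *name) { lval.string = name; return IDENT; }

/* Format a token for display, reading the member that matches its type. */
static void describe(int type, char *out, size_t size) {
    if (type == NUMBER) snprintf(out, size, "NUMBER %g", lval.number);
    else                snprintf(out, size, "IDENT %s", lval.string);
}
```

This is why the %token <number> and %token <string> tags matter: they let Bison pick the right member when an action mentions $1 or $$.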
Handling Multiple Data Types (2)
%%
int yylex(void) {
    int c = getchar( );
    while ( c == ' ' || c == '\t' ) c = getchar( );
    if ( isdigit(c) ) {
        ungetc(c, stdin);
        double x;
        scanf("%lf", &x);                    /* x is a double, so use %lf */
        yylval.number = x;                   /* token value is a double */
        return NUMBER;
    }
    if ( isalpha(c) ) {
        char *word = getword(c);             /* get next word */
        yylval.string = word;                /* token value is a string */
        return IDENT;
    }
    ...
}
Using a Separate File for yylex
You can put the scanner (yylex) in a separate file.
BUT, yylex needs values that are defined by Bison, such as NUMBER, IDENT, yylval.
Solution: add the "%defines" option to the declarations in your Bison source file. Bison will create a header file named "simple.tab.h".
%defines
%{
#include <math.h>
#define YYSTYPE double
%}
%token NUMBER
...
Using a Makefile
make is an automatic build facility. Just type "make"
Reads a Makefile of "rules" for how to make things
# Makefile for simple calculator
calc.exe: simple.tab.c yylex.o
	gcc -o calc.exe simple.tab.c yylex.o -lm
# how to make simple.tab.c from simple.y
simple.tab.c: simple.y
	bison simple.y
# yylex.o (obj. file) depends on simple.tab.h
yylex.o: yylex.c simple.tab.h
	gcc -c yylex.c
Running Make
C:/calculator> make
bison -d simple.y
gcc -c simple.tab.c
gcc -o simple.exe simple.tab.o yylex.c -lm
Advantages of Make
A Makefile records all the dependencies in a project
make only builds the parts that are missing or out of date
can manage complex projects with multiple Makefiles in subdirectories
Introduction to flex
Flex is a program that automatically creates a scanner in C, using rules for tokens as regular expressions.
Format of the input file is like Bison.
%{
/* C definitions for scanner */
%}
flex definitions
%%
rules
%%
user code (extra C code)
Flex Example
Reads console input and describes each token read.
%{
inline int yywrap(void) { return 1; }
%}
/* flex definitions */
DIGIT   [0-9]
LETTER  [A-Za-z]
%%
{LETTER}+     printf("Word: %s\n", yytext); return 1;
-?{DIGIT}+    printf("Number: %s\n", yytext); return 2;
[[:punct:]]   printf("Punct: %s\n", yytext); return 3;
\n            printf("End of line\n"); return 0;
%%
/* main function calls yylex to tokenize input */
int main( ) {
    printf("Type something: ");
    while( yylex() > 0 ) { };
}
Flex Example (2)
Run flex to create yylex.c (on Linux: lex.yy.c).
cmd> flex mylex.fl
Compile lex.yy.c and create myscanner.exe
cmd> gcc -o myscanner.exe yylex.c
Run the program:
cmd> myscanner
Type something: hello, it's 9:00 o'clock.
Word: hello
Punct: ,
Word: it
Punct: '
Flex Explained: definitions
%{
/* C definitions and includes */
#include "myparser.tab.h"
/* this fixes an unknown symbol "yywrap" */
#ifdef yywrap
#undef yywrap
#endif
inline int yywrap(void) { return 1; }   /* 1 = no more input files */
%}
/* flex definitions */
DIGIT   [0-9]
LETTER  [A-Za-z]
%%
Flex Explained: Parsing Rules (1)
Each rule has the form: pattern action
pattern is a regular expression or a literal value.
action is zero or more C statements to execute.
The current matched input is (char *) yytext
The length of matched input is yyleng
{LETTER}+    printf("Found a word: %s", yytext);
-?[0-9]+     printf("Found an int: %s", yytext);
yytext is a string containing the current token.
Flex Explained: Parsing Rules (2)
Keep matching input until a return statement.
Flex chooses the rule that produces the longest match.
If there is a tie, it chooses the first rule that matches.
Use yylval (global variable) for the value of the token.
%{
/* Bison header file defines INT, IDENT... */
#include "myparser.tab.h"
%}
%%
[0-9]+       yylval.itype = atoi(yytext); return INT;
{LETTER}+    yylval.ctype = yytext; return IDENT;
"+"|"-"      yylval.ctype = yytext; return OP;
"="          return ASSIGN;
Note: yytext is reused for the next match, so copy the text (for example with strdup) if the parser needs it later.
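The "longest match, then first rule" policy can be sketched in plain C. This is not how flex is implemented (flex compiles all patterns into a single automaton); the two matcher functions below are hypothetical stand-ins for the patterns "if" and [A-Za-z]+, listed in that order.

```c
#include <ctype.h>
#include <string.h>

/* Two hypothetical "rules", each returning how many characters it matches
 * at the start of `s` (0 = no match). */
static size_t match_keyword_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}
static size_t match_identifier(const char *s) {
    size_t n = 0;
    while (isalpha((unsigned char)s[n])) n++;
    return n;
}

typedef size_t (*matcher)(const char *);
static const matcher rules[] = { match_keyword_if, match_identifier };  /* rule order matters */

/* Longest match wins; on a tie the rule listed first wins. Returns the rule
 * index, or -1 if nothing matches; *length receives the match length
 * (the analogue of yyleng). */
static int choose_rule(const char *input, size_t *length) {
    int best = -1;
    *length = 0;
    for (int i = 0; i < (int)(sizeof rules / sizeof rules[0]); i++) {
        size_t n = rules[i](input);
        if (n > *length) {        /* strictly longer only: earlier rules keep ties */
            best = i;
            *length = n;
        }
    }
    return best;
}
```

The input "if" matches both rules with length 2, so the keyword rule (listed first) wins; the input "iffy" is matched as an identifier of length 4 because that match is longer.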
Flex yywrap
Flex can read input from multiple input files.
When flex reaches the end of a file, it calls yywrap().
If yywrap() returns 0, it means that there is another input file to read; yywrap() must first point yyin at that file.
If yywrap() returns 1, it means that there are no more input files.
%{
inline int yywrap(void) { return 1; }
%}
/* same thing */
%option noyywrap
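The 0-or-1 contract of yywrap() can be illustrated without flex. In the sketch below, current_input stands in for flex's yyin, and pending_inputs, pending_count and wrap_inputs are hypothetical names; a real yywrap() would assign to yyin instead.

```c
#include <stdio.h>

/* Hypothetical stand-ins for flex's yyin and a list of inputs to scan in order. */
static FILE *current_input;
static FILE **pending_inputs;
static int pending_count;

/* Same contract as yywrap(): return 0 after switching to another input,
 * return 1 when nothing is left and scanning should stop. */
static int wrap_inputs(void) {
    if (pending_count == 0) return 1;
    current_input = *pending_inputs++;
    pending_count--;
    return 0;
}
```

With two pending inputs, the first two calls return 0 and switch current_input; the third call returns 1.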
Using flex with bison
1. Run "bison -d file.y" to create file.tab.h
2. #include "file.tab.h" in your flex source.
3. Set yylval to the token value.
4. Run flex.
5. Compile and link lex.yy.c and file.tab.c
%{
#include <math.h>
#include "file.tab.h"
%}
/* LETTER is the same as [[:alpha:]] */
LETTER [A-Za-z]
%%
-?[0-9]+     yylval.itype = atoi(yytext); return INT;
{LETTER}+    yylval.ctype = yytext; return IDENT;
"+"|"-"      yylval.ctype = yytext; return OP;
"="          return ASSIGN;
Using Flex and Bison
mygrammar.y: BNF rules for your grammar.
> bison -d mygrammar.y
mygrammar.tab.c: parser source code
mylex.fl: regular expressions that match and return tokens
> flex mylex.fl
lex.yy.c: tokenizer function in C
> gcc -o myprog mygrammar.tab.c lex.yy.c
myprog: executable program
Learning Bison, Flex, CUP
On Linux or most UNIX systems type:
info bison
Using "info": SPACEBAR = next page, ALT-V = previous page, CTRL-C = quit
Documentation on the Web:
GNU Free Software Foundation
http://www.gnu.org/software/bison/bison.html
... or many tutorials online (search Google)
Similar documentation for flex.
CUP manual: http://www2.cs.tum.edu/projects/cup
includes many easy examples!
Getting the Software
bison and flex are included with Linux distributions
MS Windows version of GNU programs
http://gnuwin32.sourceforge.net/
http://gnuwin32.sourceforge.net/packages.html
The packages include excellent user's guides with examples.
CUP:
http://www2.cs.tum.edu/projects/cup/
software (jar file) and excellent manual; links to resources
JFlex (lex that creates Java code):
http://jflex.sourceforge.net/
GNU Tools and C/C++ compiler
Cygwin: www.cygwin.com
MinGW: www.mingw.org
free GNU toolkit.
Dev-C++
C++ integrated development environment
two downloads: with gcc or without gcc