SEG 2106 SOFTWARE CONSTRUCTION INSTRUCTOR: HUSSEIN AL OSMAN THE COURSE MATERIAL IS BASED ON THE...

  • Slide 1
  • SEG 2106 SOFTWARE CONSTRUCTION INSTRUCTOR: HUSSEIN AL OSMAN THE COURSE MATERIAL IS BASED ON THE COURSE CONSTRUCTED BY PROFS: GREGOR V. BOCHMANN ( HTTPS://WWW.SITE.UOTTAWA.CA/~BOCHMANN/) JIYING ZHAO (HTTP://WWW.SITE.UOTTAWA.CA/~JYZHAO/)
  • Slide 2
  • COURSE SECTIONS
      • Section 0: Introduction
      • Section 1: Software development processes + Domain Analysis
      • Section 2: Requirements + Behavioral Modeling (Activity Diagrams)
      • Section 3: More Behavioral Modeling (State Machines)
      • Section 4: More Behavioral Modeling (Case Study)
      • Section 5: Petri Nets
      • Section 6: Introduction to Compilers
      • Section 7: Lexical Analysis
      • Section 8: Finite State Automata
      • Section 9: Practical Regular Expressions
      • Section 10: Introduction to Syntax Analysis
      • Section 11: LL(1) Parser
      • Section 12: More on LL Parsing (Error Recovery and non LL(1) Parsers)
      • Section 13: LR Parsing
      • Section 14: Introduction to Concurrency
      • Section 15: More on Concurrency
      • Section 16: Java Concurrency
      • Section 17: Process Scheduling
      • Section 18: Web Services
  • Slide 3
  • SECTION 0 SYLLABUS
  • Slide 4
  • WELCOME TO SEG2106
  • Slide 5
  • COURSE INFORMATION
      • Instructor: Hussein Al Osman
      • E-mail: [email protected]@uottawa.ca
      • Office: SITE4043
      • Office Hours: TBD
      • Web Site: Virtual Campus (https://maestro.uottawa.ca)
      • Lectures: Wednesday 13:00 - 14:30, STE C0136; Friday 13:00 - 14:30, MRT 211
      • Labs: Monday 17:30 - 20:30, STE 2052; Tuesday 19:00 - 22:00, STE 0131
  • Slide 6
  • EVALUATION SCHEME
      • Assignments (4): 25%
      • Labs (7): 15%
      • Midterm Exam: 20%
      • Final Exam: 40%
      • Late assignments are accepted for a maximum of 24 hours and they will receive a 30% penalty.
  • Slide 7
  • LABS Seven labs in total. Three formal labs (with a report), worth between 3% and 4%. The other labs are informal (without a report), worth 1% each; you show your work to the TA at the end of the session.
  • Slide 8
  • INFORMAL LABS Your mark will be proportional to the number of tasks successfully completed. All the tasks are completed: 1%. More than half completed: 0.75%. Almost half completed: 0.5%. You at least tried (given that you attended the whole session): 0.25%.
  • Slide 9
  • MAJOR COURSE TOPICS Chapter 1: Introduction and Behavioral Modeling. Introduction to software development processes: Waterfall model; Iterative (or incremental) model; Agile model. Behavioral modeling: UML use case models (seen previously); UML sequence diagrams (seen previously); UML activity diagrams (very useful to model concurrent behavior); UML state machines (model the behavior of a single object); Petri Nets; SDL.
  • Slide 10
  • MAJOR COURSE TOPICS Chapter 2: Compilers, formal languages and grammars. Lexical analysis (convert a sequence of characters into a sequence of tokens): formal languages; regular expressions (method to describe strings); deterministic and non-deterministic finite automata. Syntax analysis: context-free grammar (describes the syntax of a programming language); syntactic analysis; syntax trees.
  • Slide 11
  • MAJOR COURSE TOPICS Chapter 3: Concurrency. Logical and physical concurrency; process scheduling; mutual exclusion for access to shared resources; concurrency and Java programming; design patterns and performance considerations.
  • Slide 12
  • MAJOR COURSE TOPICS Chapter 4: Cool topics! We will vote on one or more of these topics to cover (provided that we have completed the material described above with some time to spare): mobile programming (mostly Android); web services; J2EE major components; Spring framework; Agile programming (especially SCRUM); other suggestions.
  • Slide 13
  • CLASS POLICIES Late Assignments: Late assignments are accepted for a maximum of 24 hours and they will receive a 30% penalty.
  • Slide 14
  • CLASS POLICIES Plagiarism: Plagiarism is a serious academic offence that will not be tolerated. Note that the person providing solutions to be copied is also committing an offence as they are an active participant in the plagiarism. The person copying and the person copied from will be reprimanded equally according to the regulations set by the University of Ottawa. Please refer to this link for more information: www.uottawa.ca/academic/info/regist/crs/0305/home_5_ENG.htm
  • Slide 15
  • CLASS POLICIES Attendance: Class attendance is mandatory. As per academic regulations, students who do not attend at least 80% of the classes will not be allowed to write the final examination. All components of the course (i.e., laboratory reports, assignments, etc.) must be fulfilled; otherwise students may receive an INC as a final mark (equivalent to an F). Absence from a laboratory session or an examination because of illness will be excused only if you provide a certificate from Health Services (100 Marie Curie, 3rd Floor) within the week following your absence.
  • Slide 16
  • SECTION 1 SOFTWARE DEVELOPMENT PROCESS AND DOMAIN ANALYSIS
  • Slide 17
  • LECTURE TOPICS This lecture will briefly touch on the following topics: Software Development Process Domain Analysis
  • Slide 18
  • TOPIC 1 SOFTWARE DEVELOPMENT PROCESS
  • Slide 19
  • LIFE CYCLE The life cycle of a software product runs from the inception of an idea for a product through: domain analysis; requirements gathering; architecture design and specification; coding and testing; delivery and deployment; maintenance and evolution; retirement.
  • Slide 20
  • MODELS ARE NEEDED Symptoms of inadequacy (the software crisis): scheduled time and cost exceeded; user expectations not met; poor quality. The size and economic value of software applications required appropriate "process models".
  • Slide 21
  • PROCESS AS A "BLACK BOX"
  • Slide 22
  • PROBLEMS The assumption is that requirements can be fully understood prior to development. Unfortunately, the assumption almost never holds. Interaction with the customer occurs only at the beginning (requirements) and end (after delivery).
  • Slide 23
  • PROCESS AS A "WHITE BOX"
  • Slide 24
  • ADVANTAGES Reduce risks by improving visibility Allow project changes as the project progresses based on feedback from the customer
  • Slide 25
  • THE MAIN ACTIVITIES They must be performed independently of the model The model simply affects the flow among activities
  • Slide 26
  • WATERFALL MODELS Invented in the late 1950s for large air defense systems, popularized in the 1970s They organize activities in a sequential flow Standardize the outputs of the various activities (deliverables) Exist in many variants, all sharing sequential flow style
  • Slide 27
  • A WATERFALL MODEL Domain analysis and feasibility study; Requirements; Design; Coding and module testing; Integration and system testing; Delivery, deployment, and maintenance
  • Slide 28
  • WATERFALL STRENGTHS Easy to understand, easy to use Provides structure to inexperienced staff Milestones are well understood Sets requirements stability
  • Slide 29
  • WATERFALL WEAKNESSES All requirements must be known upfront. Deliverables created for each phase are considered frozen, which inhibits flexibility. Can give a false impression of progress. Does not reflect the problem-solving nature of software development (iterations of phases). Integration is one big bang at the end. Little opportunity for the customer to preview the system (until it may be too late).
  • Slide 30
  • WHEN TO USE WATERFALL Requirements are very well known Product definition is stable Technology is very well understood New version of an existing product (maybe!) Porting an existing product to a new platform High risk for new systems because of specification and design problems. Low risk for well-understood developments using familiar technology.
  • Slide 31
  • WATERFALL WITH FEEDBACK Domain analysis and feasibility study Requirements Design Coding and module testing Integration and system testing Delivery, deployment, and maintenance
  • Slide 32
  • ITERATIVE DEVELOPMENT PROCESS Also referred to as incremental development process. Develop the system through repeated cycles (iterations). Each cycle is responsible for the development of a small portion of the solution (slice of functionality). Contrast with waterfall: waterfall is a special iterative process with only one cycle.
  • Slide 33
  • ITERATIVE DEVELOPMENT PROCESS [Figure labels: Domain Analysis and Initial Planning; Cycle: Iteration Planning, Requirements Update, Architecture and Design, Implementation, Testing, Evaluation (involving end user); Deployment]
  • Slide 34
  • AGILE METHODS Dissatisfaction with the overheads involved in software design methods of the 1980s and 1990s led to the creation of agile methods. These methods: Focus on the code rather than the design Are based on an iterative approach to software development Are intended to deliver working software quickly and evolve this quickly to meet changing requirements The aim of agile methods is to reduce overheads in the software process (e.g. by limiting documentation) and to be able to respond quickly to changing requirements without excessive rework.
  • Slide 35
  • AGILE MANIFESTO We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: Individuals and interactions over processes and tools Working software over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan That is, while there is value in the items on the right, we value the items on the left more.
  • Slide 36
  • THE PRINCIPLES OF AGILE METHODS (Principle: Description)
      • Customer involvement: Customers should be closely involved throughout the development process. Their role is to provide and prioritize new system requirements and to evaluate the iterations of the system.
      • Incremental delivery: The software is developed in increments with the customer specifying the requirements to be included in each increment.
      • People not process: The skills of the development team should be recognized and exploited. Team members should be left to develop their own ways of working without prescriptive processes.
      • Embrace change: Expect the system requirements to change and so design the system to accommodate these changes.
      • Maintain simplicity: Focus on simplicity in both the software being developed and in the development process. Wherever possible, actively work to eliminate complexity from the system.
  • Slide 37
  • SCRUM PROCESS
  • Slide 38
  • PROBLEMS WITH AGILE METHODS It can be difficult to keep the interest of customers who are involved in the process Team members may be unsuited to the intense involvement that characterizes agile methods Prioritizing changes can be difficult where there are multiple stakeholders Minimizing documentation: almost nothing is captured, the code is the only authority
  • Slide 39
  • TOPIC 2 DOMAIN ANALYSIS
  • Slide 40
  • DOMAIN MODELING The aim of domain analysis is to understand the problem domain independently of the particular system we intend to develop. We do not try to draw the borderline between the system and the environment. We focus on the concepts and the terminology of the application domain with a wider scope than the future system.
  • Slide 41
  • ACTIVITIES AND RESULTS OF DOMAIN ANALYSIS
      1. A dictionary of terms defining the common terminology and concepts of the problem domain.
      2. A description of the problem domain from a conceptual modeling viewpoint. We normally use UML class diagrams (with as little detail as possible). Remember, we are not designing, but just establishing the relationships between entities.
      3. A brief description of the main interactions between the user and the system.
  • Slide 42
  • EXAMPLE PROBLEM DEFINITION
  • Slide 43
  • We want to design the software for a simple Point of Sale Terminal that operates as follows: Displays the amount of money to pay for the goods to be purchased. Asks the user to insert a financial card (debit or credit). If the user inserts a debit card, he or she is asked to choose the account type. Asks the user to enter a pin number. Verifies the pin number against the one stored on the chip. Contacts the bank associated with the card in order to perform the transaction.
  • Slide 44
  • EXAMPLE DICTIONARY OF TERMS (1) Point of Sale Terminal: machine that allows a retail transaction to be completed using a financial card Credit card: payment card issued to users as a system of payment. It allows the cardholder to pay for goods and services based on the holder's promise to pay for them Debit card: plastic payment card that provides the cardholder electronic access to his or her bank account
  • Slide 45
  • EXAMPLE DICTIONARY OF TERMS (2) Bank: financial institution that issues financial cards and where the user has at least one account from which he or she can withdraw money or into which he or she can deposit money. Bank Account: a financial account between a user and a financial institution. User: client of a bank who possesses a debit card and benefits from the use of a point of sale terminal. Pin number: personal identification number (PIN, pronounced "pin"; often erroneously PIN number), a secret numeric password shared between a user and a system that can be used to authenticate the user to the system.
  • Slide 46
  • EXAMPLE PROBLEM DOMAIN [Class diagram with classes PosTerminal, User, FinancialCard, DebitCard, CreditCard, PinNumber, Bank, and BankAccount; the association multiplicities shown include *, 1, 1..2, and 1..*]
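The domain classes named on this slide could be written down roughly as follows. This is only a sketch: the class names come from the slide, but the attributes, the inheritance of DebitCard and CreditCard from FinancialCard, and the direction of each association are assumptions, since the diagram itself did not survive in the transcript. PinNumber is flattened to a string field and PosTerminal is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bank:
    name: str


@dataclass
class BankAccount:
    number: str
    bank: Bank  # assumed: each account belongs to one bank


@dataclass
class FinancialCard:
    card_number: str
    issuer: Bank      # the dictionary of terms says banks issue financial cards
    pin_number: str   # PinNumber class flattened to a plain string here


@dataclass
class CreditCard(FinancialCard):
    pass


@dataclass
class DebitCard(FinancialCard):
    # assumed: a debit card gives access to one or more bank accounts
    accounts: List[BankAccount] = field(default_factory=list)


@dataclass
class User:
    name: str
    cards: List[FinancialCard] = field(default_factory=list)
```

As in the slide, nothing here is a design decision about the POS software; the classes only record which domain concepts exist and how they relate.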
  • Slide 47
  • EXAMPLE MAIN INTERACTIONS Inputs to POS Terminal: Insertion of Financial Card, Pin Number, Specify Account, Confirm Purchase. Outputs from POS Terminal: Error Message (regarding pin or funds), Confirmation of Purchase.
  • Slide 48
  • SECTION 2 REQUIREMENTS BEHAVIORAL MODELING
  • Slide 49
  • TOPICS Review of some notions regarding requirements Client requirements Functional requirements Non-functional requirements Introduction to Behavioral Modeling Activity Diagrams
  • Slide 50
  • TOPIC 1 REQUIREMENTS
  • Slide 51
  • We will describe three types of requirements: Customer requirements (a.k.a informal or business requirements) Functional requirements Non-functional requirements
  • Slide 52
  • CUSTOMER REQUIREMENTS We have completed the domain analysis, so we are ready to get our hands dirty. We need to figure out exactly what the customer wants: the Customer Requirements. This is where the expectations of the customer are captured. Typically composed of high-level, non-technical statements. Example: Requirement 1: We need to develop an online customer portal. Requirement 2: The portal must list all our products.
  • Slide 53
  • FUNCTIONAL REQUIREMENTS Capture the intended behavior of the system May be expressed as services, tasks or functions the system performs Use cases have quickly become a widespread practice for capturing functional requirements This is especially true in the object-oriented community where they originated Their applicability is not limited to object-oriented systems
  • Slide 54
  • USE CASES A use case defines a goal-oriented set of interactions between external actors and the system under consideration Actors are parties outside the system that interact with the system An actor may be a class of users or other systems A use case is initiated by a user with a particular goal in mind, and completes successfully when that goal is satisfied It describes the sequence of interactions between actors and the system necessary to deliver the service that satisfies the goal
  • Slide 55
  • USE CASE DIAGRAMS
  • Slide 56
  • Include relationship: a use case fragment that is duplicated in multiple use cases. Extend relationship: a use case that conditionally adds steps to another first-class use case. Example:
  • Slide 57
  • USE CASE ATM EXAMPLE Actors: ATM Customer ATM Operator Use Cases: The customer can withdraw funds from a checking or savings account query the balance of the account transfer funds from one account to another The ATM operator can Shut down the ATM Replenish the ATM cash dispenser Start the ATM
  • Slide 58
  • USE CASE ATM EXAMPLE
  • Slide 59
  • Validate PIN is an Inclusion Use Case: it cannot be executed on its own and must be executed as part of a Concrete Use Case. A Concrete Use Case, on the other hand, can be executed on its own.
  • Slide 60
  • USE CASE VALIDATE PIN (1) Use case name: Validate PIN Summary: System validates customer PIN Actor: ATM Customer Precondition: ATM is idle, displaying a Welcome message.
  • Slide 61
  • USE CASE VALIDATE PIN (2) Main sequence:
      1. Customer inserts the ATM card into the card reader.
      2. If system recognizes the card, it reads the card number.
      3. System prompts customer for PIN.
      4. Customer enters PIN.
      5. System checks the card's expiration date and whether the card has been reported as lost or stolen.
      6. If card is valid, system then checks whether the user-entered PIN matches the card PIN maintained by the system.
      7. If the PINs match, system checks what accounts are accessible with the ATM card.
      8. System displays customer accounts and prompts customer for transaction type: withdrawal, query, or transfer.
  • Slide 62
  • USE CASE VALIDATE PIN (3) Alternative sequences: Step 2: If the system does not recognize the card, the system ejects the card. Step 5: If the system determines that the card date has expired, the system confiscates the card. Step 5: If the system determines that the card has been reported lost or stolen, the system confiscates the card. Step 7: If the customer-entered PIN does not match the PIN number for this card, the system re-prompts for the PIN. Step 7: If the customer enters the incorrect PIN three times, the system confiscates the card. Steps 4-8: If the customer enters Cancel, the system cancels the transaction and ejects the card. Postcondition: Customer PIN has been validated.
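The main and alternative sequences of Validate PIN can be traced in a few lines of code. The sketch below is illustrative only: the `Card` fields, the `"CANCEL"` sentinel, and the outcome strings are hypothetical names that do not appear in the slides. The limit of three attempts comes from the alternative sequence.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_PIN_ATTEMPTS = 3  # alternative sequence: three incorrect PINs


@dataclass
class Card:
    recognized: bool
    expired: bool
    reported_lost_or_stolen: bool
    pin: str


def validate_pin(card: Card, entries: Iterable[str]) -> str:
    """Return 'validated', 'ejected' or 'confiscated' (hypothetical outcomes)."""
    if not card.recognized:                 # alternative to step 2
        return "ejected"
    attempts = 0
    for entry in entries:                   # steps 3-4: prompt, customer enters PIN
        if entry == "CANCEL":               # alternative to steps 4-8
            return "ejected"
        if card.expired or card.reported_lost_or_stolen:  # alternatives to step 5
            return "confiscated"
        if entry == card.pin:               # steps 6-7: PIN matches
            return "validated"
        attempts += 1                       # wrong PIN: re-prompt
        if attempts == MAX_PIN_ATTEMPTS:
            return "confiscated"
    return "ejected"  # assumption: running out of input is treated like Cancel
```

Each branch maps to one line of the use case, which makes it easy to check that no alternative sequence was forgotten.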
  • Slide 63
  • USE CASE WITHDRAW FUNDS (1) Use case name: Withdraw Funds Summary: Customer withdraws a specific amount of funds from a valid bank account. Actor: ATM Customer Dependency: Include Validate PIN use case. Precondition: ATM is idle, displaying a Welcome message.
  • Slide 64
  • USE CASE WITHDRAW FUNDS (2) Main sequence:
      1. Include Validate PIN use case.
      2. Customer selects Withdrawal, enters the amount, and selects the account number.
      3. System checks whether customer has enough funds in the account and whether the daily limit will not be exceeded.
      4. If all checks are successful, system authorizes dispensing of cash.
      5. System dispenses the cash amount.
      6. System prints a receipt showing transaction number, transaction type, amount withdrawn, and account balance.
      7. System ejects card.
      8. System displays Welcome message.
  • Slide 65
  • USE CASE WITHDRAW FUNDS (3) Alternative sequences: Step 3: If the system determines that the account number is invalid, then it displays an error message and ejects the card. Step 3: If the system determines that there are insufficient funds in the customer's account, then it displays an apology and ejects the card. Step 3: If the system determines that the maximum allowable daily withdrawal amount has been exceeded, it displays an apology and ejects the card. Step 5: If the ATM is out of funds, the system displays an apology, ejects the card, and shuts down the ATM. Postcondition: Customer funds have been withdrawn.
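The checks in step 3 and the out-of-funds alternative in step 5 can be sketched as a single function. The `Account` fields, the `withdraw` signature, and the outcome strings are assumptions made for illustration; the order of the checks follows the alternative sequences listed above.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: float
    withdrawn_today: float
    daily_limit: float


def withdraw(accounts, account_number, amount, atm_cash):
    """Return (outcome, message); the account changes only on success."""
    account = accounts.get(account_number)
    if account is None:                                   # step 3: invalid account
        return "ejected", "invalid account number"
    if amount > account.balance:                          # step 3: insufficient funds
        return "ejected", "insufficient funds"
    if account.withdrawn_today + amount > account.daily_limit:  # step 3: daily limit
        return "ejected", "daily limit exceeded"
    if amount > atm_cash:                                 # step 5: ATM out of funds
        return "shut down", "ATM out of funds"
    account.balance -= amount                             # steps 4-5: dispense cash
    account.withdrawn_today += amount
    return "dispensed", f"balance {account.balance:.2f}"  # step 6: receipt data
```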
  • Slide 66
  • NON-FUNCTIONAL REQUIREMENTS Functional requirements define what a system is supposed to do Non-functional requirements define how a system is supposed to be Usually describe system attributes such as security, reliability, maintainability, scalability, usability
  • Slide 67
  • NON-FUNCTIONAL REQUIREMENTS Non-Functional requirements can be specified in a separate section of the use case description In the previous example, for the Validate PIN use case, there could be a security requirement that the card number and PIN must be encrypted Non-Functional requirements can be specified for a group of use cases or the whole system Security requirement: System shall encrypt ATM card number and PIN. Performance requirement: System shall respond to actor inputs within 5 seconds.
  • Slide 68
  • TOPIC 2 BEHAVIORAL MODELING
  • Slide 69
  • SOFTWARE MODELING UML 2.0 defines thirteen basic diagram types, divided into two general sets: Structural Modeling and Behavioral Modeling. Structural Models define the static architecture of a model. They are used to model the things that make up a model: the classes, objects, interfaces and physical components. In addition, they are used to model the relationships and dependencies between elements.
  • Slide 70
  • BEHAVIORAL MODELING Behavior Models capture the dynamic behavior of a system as it executes over time They provide a view of a system in which control and sequencing are considered Either within an object (by means of a finite state machine) or between objects (by analysis of object interactions).
  • Slide 71
  • UML ACTIVITY DIAGRAMS In UML an activity diagram is used to display the sequence of actions They show the workflow from start to finish Detail the many decision paths that exist in the progression of events contained in the activity Very useful when parallel processing may occur in the execution of some activities
  • Slide 72
  • UML ACTIVITY DIAGRAMS An example of an activity diagram is shown below (We will come back to that diagram)
  • Slide 73
  • ACTIVITY An activity is the specification of a parameterized sequence of behavior Shown as a round-cornered rectangle enclosing all the actions and control flows
  • Slide 74
  • ACTIONS AND CONSTRAINTS An action represents a single step within an activity. Constraints can be attached to actions.
  • Slide 75
  • CONTROL FLOW Shows the flow of control from one action to the next. Its notation is a line with an arrowhead. Initial Node. Final Node, two types: Activity Final Node and Flow Final Node.
  • Slide 76
  • OBJECT FLOW An object flow is a path along which objects or data can pass. An object is shown as a rectangle. [Figure: a shorthand for the above notation]
  • Slide 77
  • DECISION AND MERGE NODES Decision nodes and merge nodes have the same notation: a diamond shape The control flows coming away from a decision node will have guard conditions
  • Slide 78
  • FORK AND JOIN NODES Forks and joins have the same notation: either a horizontal or vertical bar They indicate the start and end of concurrent threads of control Join synchronizes two inflows and produces a single outflow The outflow from a join cannot execute until all inflows have been received
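In code, a fork and a join correspond to starting concurrent tasks and then waiting for all of them. The sketch below uses Python's standard `concurrent.futures` module; the order-processing actions are hypothetical and only stand in for two actions between a fork bar and a join bar.

```python
from concurrent.futures import ThreadPoolExecutor


def pack_order(order_id):
    return f"order {order_id} packed"


def send_invoice(order_id):
    return f"invoice for order {order_id} sent"


def process_order(order_id):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: both actions start as concurrent threads of control.
        packing = pool.submit(pack_order, order_id)
        invoicing = pool.submit(send_invoice, order_id)
        # Join: result() blocks, so the outflow waits for both inflows.
        results = [packing.result(), invoicing.result()]
    return results  # single outflow after the join
```

The list is built in a fixed order, so the result does not depend on which thread finishes first.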
  • Slide 79
  • PARTITION Shown as horizontal or vertical swim lane Represents a group of actions that have some common characteristic
  • Slide 80
  • UML ACTIVITY DIAGRAMS Coming back to our initial example
  • Slide 81
  • ISSUE HANDLING IN SOFTWARE PROJECTS Courtesy of uml-diagrams.org
  • Slide 82
  • MORE ON ACTIVITY DIAGRAMS Interruptible Activity Regions Expansion Regions Exception Handlers
  • Slide 83
  • INTERRUPTIBLE ACTIVITY REGION Surrounds a group of actions that can be interrupted Example below: Process Order action will execute until completion, when it will pass control to the Close Order action, unless a Cancel Request interrupt is received, which will pass control to the Cancel Order action.
  • Slide 84
  • EXPANSION REGION An expansion region is an activity region that executes multiple times to consume all elements of an input collection. Example of books checkout at a library modeled using an expansion region: [Figure: activity Checkout Books with actions Find Books to Borrow, Checkout Book, Show Due Date, Place Books in Bags]
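An expansion region behaves much like applying a function to every element of a collection. In the sketch below, `checkout_book` plays the role of the region body for the library example; the 21-day loan period and the function names are assumptions, not part of the slide.

```python
from datetime import date, timedelta

LOAN_PERIOD = timedelta(days=21)  # hypothetical loan period


def checkout_book(title, today):
    """Region body: runs once per element of the input collection."""
    return title, today + LOAN_PERIOD


def checkout_books(books, today):
    # The expansion region consumes every element of the input collection.
    return [checkout_book(title, today) for title in books]
```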
  • Slide 85
  • EXPANSION REGION Another example: Encoding Video [Figure: activity Encode Video with actions Capture Video, Extract Audio from Frame, Encode Video Frame, Attach Audio to Frame, Save Encoded Video]
  • Slide 86
  • EXCEPTION HANDLERS An exception handler is an element that specifies what to execute in case the specified exception occurs during the execution of the protected node In Java Try block corresponds to Protected Node Catch block corresponds to the Handler Body Node
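The slide maps the protected node and handler body onto Java's try and catch blocks. The same correspondence holds for `try`/`except` in Python, which is used for the sketches in this transcript. The payment example, including `PaymentDeclined` and `charge_card`, is hypothetical.

```python
class PaymentDeclined(Exception):
    """Hypothetical exception raised inside the protected node."""


def charge_card(amount, limit):
    if amount > limit:
        raise PaymentDeclined(f"{amount} exceeds limit {limit}")
    return "charged"


def process_payment(amount, limit=100):
    try:
        # Protected node: the actions that may raise the exception.
        return charge_card(amount, limit)
    except PaymentDeclined:
        # Handler body node: executed only if the exception occurs.
        return "declined"
```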
  • Slide 87
  • SECTION 3 BEHAVIORAL MODELING
  • Slide 88
  • TOPICS We will continue with the subject of Behavioral Modeling Introduce the various components of UML state machines
  • Slide 89
  • ACTIVITY DIAGRAMS VS STATE MACHINES In Activity Diagrams Vertices represent Actions Edges (arrows) represent transition that occurs at the completion of one action and before the start of another one (control flow) Vertex representing an Action Arrow implying transition from one action to another
  • Slide 90
  • ACTIVITY DIAGRAMS VS STATE MACHINES In State Machines Vertices represent states of a process Edges (arrows) represent occurrences of events Vertex representing a State Arrow representing an event
  • Slide 91
  • UML STATE MACHINES Used to model the dynamic behaviour of a process. Can be used to model the high-level behaviour of an entire system. Can be used to model the detailed behaviour of a single object. All levels of detail in between these extremes are also possible.
  • Slide 92
  • UML STATE MACHINE EXAMPLE Example of a garage door state machine (We will come back to this example later)
  • Slide 93
  • STATES Symbol for a state A system in a state will remain in it until the occurrence of an event that will cause it to transition to another one Being in a state means that a system will behave in a predetermined way in response to a given event Symbols for the initial and final states
  • Slide 94
  • STATES Numerous types of events can cause the system to transition from one state to another. In every state, the system behaves in a different manner. Names for states are usually chosen as: Adjectives: open, closed, ready. Present continuous verbs: opening, closing, waiting.
  • Slide 95
  • TRANSITIONS Transitions are represented with arrows
  • Slide 96
  • TRANSITIONS Transitions represent a change in state in response to an event. Theoretically, a transition is supposed to occur in an instantaneous manner (it does not take time to execute). A transition can have: Trigger: causes the transition; can be an event or simply the passage of time. Guard: a condition that must evaluate to true for the transition to occur. Effect: an action that will be invoked directly on the system or the object being modeled (if we are modeling an object, the effect would correspond to a specific method).
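One way to make trigger, guard, and effect concrete is to store them as fields of a transition record and let a small dispatcher pick the first enabled transition. This is a minimal sketch, not a UML-conformant engine: the `Transition` class, the `fire` function, and the door-lock example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def always_true(ctx: Dict) -> bool:
    return True


def no_effect(ctx: Dict) -> None:
    return None


@dataclass
class Transition:
    source: str
    trigger: str                                  # event that causes the transition
    target: str
    guard: Callable[[Dict], bool] = always_true   # must hold for the transition to occur
    effect: Callable[[Dict], None] = no_effect    # action run when the transition fires


def fire(transitions, state, event, ctx):
    for t in transitions:
        if t.source == state and t.trigger == event and t.guard(ctx):
            t.effect(ctx)
            return t.target
    return state  # no enabled transition: remain in the current state


# Hypothetical door lock: a correct code unlocks, a wrong code counts an attempt.
transitions = [
    Transition("Locked", "enterCode", "Unlocked",
               guard=lambda ctx: ctx["code"] == "42",
               effect=lambda ctx: ctx.update(attempts=0)),
    Transition("Locked", "enterCode", "Locked",
               effect=lambda ctx: ctx.update(attempts=ctx["attempts"] + 1)),
    Transition("Unlocked", "lock", "Locked"),
]
```

The second entry is a self transition with an effect, and the list order encodes an implicit "else" for the guard on the first entry.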
  • Slide 97
  • STATE ACTIONS An effect can also be associated with a state. If a destination state has numerous incident transitions (transitions arriving at that state), and every transition defines the same effect, the effect can be associated with the state instead of the transitions (to avoid duplication). This can be achieved using an On Entry effect (we can have multiple entry effects). We can also add one or more On Exit effects.
  • Slide 98
  • SELF TRANSITION A state can also have self transitions. These self transitions are more useful when they have an effect associated with them. Timer events are commonly used with self transitions. Below is a typical example:
  • Slide 99
  • COMING BACK TO OUR INITIAL EXAMPLE Example of a garage door state machine
  • Slide 100
  • DECISIONS Just like in activity diagrams, we can use decision nodes (although we usually call them decision pseudo-states). Decision pseudo-states are represented with a diamond. We always have one input transition and multiple outputs. The branch of execution is decided by the guards associated with the transitions coming out of the decision pseudo-state.
  • Slide 101
  • DECISIONS
  • Slide 102
  • COMPOUND STATES A state machine can include several sub-machines. Below is an example of a sub-machine included in the compound state Connected. [Figure: states Disconnected and Connected; inside Connected, substates Waiting and ProcessingByte; events connect, disconnect, receiveByte, byteProcessed, closeSession]
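A compound state can be coded as an outer state plus a substate that only matters while the outer state is active. The sketch below uses the state and event names from the slide, but the wiring between them is an assumption because the figure is not available; the `closeSession` event is left out since its source and target cannot be recovered from the transcript.

```python
class Connection:
    def __init__(self):
        self.state = "Disconnected"
        self.substate = None  # only meaningful while in Connected

    def handle(self, event):
        if self.state == "Disconnected":
            if event == "connect":
                # Entering the compound state starts its sub-machine
                # at the initial substate.
                self.state, self.substate = "Connected", "Waiting"
        elif self.state == "Connected":
            if event == "disconnect":
                # A transition on the compound state applies in any substate.
                self.state, self.substate = "Disconnected", None
            elif self.substate == "Waiting" and event == "receiveByte":
                self.substate = "ProcessingByte"
            elif self.substate == "ProcessingByte" and event == "byteProcessed":
                self.substate = "Waiting"
        return self.state, self.substate
```

The benefit of the compound state shows in the `disconnect` branch: it is written once instead of once per substate.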
  • Slide 103
  • COMPOUND STATES EXAMPLE
  • Slide 104
  • Same example, with an alternative notation. The link symbol in the Check Pin state indicates that the details of the sub-machine associated with Check Pin are specified in another state machine.
  • Slide 105
  • ALTERNATIVE ENTRY POINTS Sometimes, in a sub-machine, we do not want to start the execution from the initial state. We want to start the execution from a named alternative entry point. [Figure: PerformActivity]
  • Slide 106
  • ALTERNATIVE ENTRY POINTS Here's the same system, from a higher level. Transition from the No Already Initialized state leads to the standard initial state in the sub-machine. Transition from the Already Initialized state is connected to the named alternative entry point Skip Initializing.
  • Slide 107
  • ALTERNATIVE EXIT POINTS It is also possible to have alternative exit points for a compound state. Transition from the Processing Instructions state takes the regular exit. Transition from the Reading Instructions state takes an alternative named exit point.
  • Slide 108
  • USE CASE VALIDATE PIN (1) Use case name: Validate PIN Summary: System validates customer PIN Actor: ATM Customer Precondition: ATM is idle, displaying a Welcome message.
  • Slide 109
  • USE CASE VALIDATE PIN (2) Main sequence:
      1. Customer inserts the ATM card into the card reader.
      2. If system recognizes the card, it reads the card number.
      3. System prompts customer for PIN.
      4. Customer enters PIN.
      5. System checks the card's expiration date and whether the card has been reported as lost or stolen.
      6. If card is valid, system then checks whether the user-entered PIN matches the card PIN maintained by the system.
      7. If the PINs match, system checks what accounts are accessible with the ATM card.
      8. System displays customer accounts and prompts customer for transaction type: withdrawal, query, or transfer.
  • Slide 110
  • USE CASE VALIDATE PIN (3) Alternative sequences: Step 2: If the system does not recognize the card, the system ejects the card. Step 5: If the system determines that the card date has expired, the system confiscates the card. Step 5: If the system determines that the card has been reported lost or stolen, the system confiscates the card. Step 7: If the customer-entered PIN does not match the PIN number for this card, the system re-prompts for the PIN. Step 7: If the customer enters the incorrect PIN three times, the system confiscates the card. Steps 4-8: If the customer enters Cancel, the system cancels the transaction and ejects the card. Postcondition: Customer PIN has been validated.
  • Slide 111
  • ATM MACHINE EXAMPLE Validate PIN:
  • Slide 112
  • ATM MACHINE EXAMPLE Funds withdrawal:
  • Slide 113
  • SECTION 4 BEHAVIORAL MODELING
  • Slide 114
  • TOPICS We will continue to talk about UML State Machine We will go through a complete example of a simple software construction case study with emphasis on UML State Machines End this section with some final words of wisdom!
  • Slide 115
  • LAST LECTURE We have talked about UML State Machines States and transitions State effects Self Transition Decision pseudo-states Compound states Alternative entry and exit points Today, we will tackle more advanced UML State Machines Concepts
  • Slide 116
  • HISTORY STATES A state machine describes the dynamic aspects of a process whose current behavior depends on its past A state machine in effect specifies the legal ordering of states a process may go through during its lifetime When a transition enters a compound state, the action of the nested state machine starts over again at its initial state Unless an alternative entry point is specified There are times you'd like to model a process so that it remembers the last substate that was active prior to leaving the compound state
  • Slide 117
  • HISTORY STATES Simple washing machine state diagram: Power Cut event: transition to the Power Off state Restore Power event: transition to the active state before the power was cut off to proceed in the cycle
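A history state amounts to remembering the active substate when the compound state is left and resuming there on re-entry. In the sketch below, the substates Washing, Rinsing, and Spinning and the `stepDone` event are assumptions (the slide's diagram is not available); the Power Cut and Restore Power events come from the slide.

```python
class WashingMachine:
    CYCLE = ["Washing", "Rinsing", "Spinning"]  # assumed substates of the running cycle

    def __init__(self):
        self.state = "Washing"
        self.history = None

    def handle(self, event):
        if event == "powerCut" and self.state != "PowerOff":
            self.history = self.state   # remember the active substate
            self.state = "PowerOff"
        elif event == "restorePower" and self.state == "PowerOff":
            self.state = self.history   # history state: resume where we left off
        elif event == "stepDone" and self.state in self.CYCLE[:-1]:
            self.state = self.CYCLE[self.CYCLE.index(self.state) + 1]
        return self.state
```

Without the `history` field, restoring power would have to restart the cycle at Washing, which is the default behaviour described on the previous slide.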
  • Slide 118
  • CONCURRENT REGIONS Sequential sub state machines are the most common kind of sub machines In certain modeling situations, concurrent sub machines might be needed (two or more sub state machines executing in parallel) Brakes example:
  • Slide 119
  • CONCURRENT REGIONS Example of modeling system maintenance using concurrent regions. [State diagram: states Idle and Maintenance; inside Maintenance, the concurrent regions Testing (Testing devices, Self diagnosing) and Commanding (Waiting, Processing Command); events: maintain, testingCompleted, diagnosisCompleted, command, commandProcessed [continue], commandProcessed [not continue], shutDown]
  • Slide 120
  • ORTHOGONAL REGIONS Concurrent Regions are also called Orthogonal Regions. These regions allow us to model an AND relationship between states (as opposed to the default OR relationship). This means that in a sub state machine, the system can be in several states simultaneously, one per region. Let us analyse this phenomenon using the example of a computer keyboard state machine.
  • Slide 121
  • KEYBOARD EXAMPLE (1) Keyboard example without Orthogonal Regions
  • Slide 122
  • KEYBOARD EXAMPLE (2) Keyboard example with Orthogonal Regions
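
A possible Java sketch of the keyboard with orthogonal regions. The region and state names (a main keypad toggled by Caps Lock, a numeric keypad toggled by Num Lock) are assumptions, since the diagrams themselves are not reproduced here. Each region is stored in its own field, so two regions with two states each need 2 + 2 states, while a flat machine would need all 2 x 2 combinations.

```java
class Keyboard {
    // Hypothetical region 1: main keypad.
    enum MainKeypad { DEFAULT, CAPS_LOCKED }
    // Hypothetical region 2: numeric keypad.
    enum NumericKeypad { NUMBERS, ARROWS }

    // The keyboard is in one state of EACH region at the same time (AND relationship).
    private MainKeypad main = MainKeypad.DEFAULT;
    private NumericKeypad numeric = NumericKeypad.NUMBERS;

    // Events of one region never touch the other region.
    void capsLockPressed() {
        main = (main == MainKeypad.DEFAULT) ? MainKeypad.CAPS_LOCKED : MainKeypad.DEFAULT;
    }

    void numLockPressed() {
        numeric = (numeric == NumericKeypad.NUMBERS) ? NumericKeypad.ARROWS : NumericKeypad.NUMBERS;
    }

    String state() { return main + "+" + numeric; }

    public static void main(String[] args) {
        Keyboard k = new Keyboard();
        k.capsLockPressed();
        k.numLockPressed();
        System.out.println(k.state());
    }
}
```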
  • Slide 123
  • GARAGE DOOR CASE STUDY Background: Company DOORS Inc. manufactures garage door components. However, they have been struggling with the embedded software running on the automated garage opener Motor Unit that they developed in house. This is causing them to lose business. They decided to scrap the existing software and hire a professional software company to deliver bug-free software.
  • Slide 124
  • CLIENT REQUIREMENTS Client (informal) requirements: Requirement 1: When the garage door is closed, it must open whenever the user presses on the button of the wall mounted door control or the remote control Requirement 2: When the garage door is open, it must close whenever the user presses on the button of the wall mounted door control or the remote control Requirement 3: The garage door should not close on an obstacle Requirement 4: There should be a way to leave the garage door half open Requirement 5: System should run a self diagnosis test before performing any command (open or close) to make sure all components are functional
  • Slide 125
  • CLIENT REQUIREMENTS Motor Unit (includes a microcontroller where the software will be running) Wall Mounted Controller (a remote controller is also supported) Sensor Unit(s) (detects obstacles, when the door is fully open and when it is fully closed)
  • Slide 126
  • USE CASE DIAGRAM [Use case diagram: actor Garage Door User; system boundary Garage Door System; use cases Open Door, Close Door and Run Diagnosis; Open Door and Close Door each include Run Diagnosis]
  • Slide 127
  • RUN DIAGNOSIS USE CASE Use Case Name: Run Diagnosis Summary: The system runs a self diagnosis procedure Actor: Garage door user Pre-Condition: User has pressed the remote or wall mounted control button Sequence: 1. Check if the sensor is operating correctly 2. Check if the motor unit is operating correctly 3. If all checks are successful, system authorizes the command to be executed Alternative Sequence: Step 3: One of the checks fails and therefore the system does not authorize the execution of the command Postcondition: Self diagnosis ensured that the system is operational
  • Slide 128
  • OPEN DOOR USE CASE Use Case Name: Open Door Summary: Open the garage door Actor: Garage door user Dependency: Include Run Diagnosis use case Pre-Condition: Garage door system is operational and ready to take a command Sequence: 1. User presses the remote or wall mounted control button 2. Include Run Diagnosis use case 3. If the door is currently closing or is already closed, system opens the door Alternative Sequence: Step 3: If the door is open, system closes the door Step 3: If the door is currently opening, system stops the door (leaving it half open) Postcondition: Garage door is open
  • Slide 129
  • CLOSE DOOR USE CASE Use Case Name: Close Door Summary: Close the garage door Actor: Garage door user Dependency: Include Run Diagnosis use case Pre-Condition: Garage door system is operational and ready to take a command Sequence: 1. User presses the remote or wall mounted control button 2. Include Run Diagnosis use case 3. If the door is currently open, system closes the door Alternative Sequence: Step 3: If the door is currently closing or is already closed, system opens the door Step 3: If the door is currently opening, system stops the door (leaving it half open) Postcondition: Garage door is closed
  • Slide 130
  • HIGH LEVEL BEHAVIORAL MODELING
  • Slide 131
  • HIGH LEVEL STRUCTURAL MODEL
  • Slide 132
  • REFINED STRUCTURAL MODEL
  • Slide 133
  • REFINE BEHAVIORAL MODEL: MOTOR UNIT [State diagram: compound state Running with substates Open, Closing, Closed, Opening and HalfOpen, plus state WaitingForRepair; transition labels: buttonPressed() [isFunctioning()], obstacleDetected() [isFunctioning()], buttonPressed(), doorOpen(), doorClosed(), buttonPressed() [!isFunctioning()], Timer (180 s) [!isFunctioning()], Timer (180 s) [isFunctioning()]]
  • Slide 134
  • REFINE BEHAVIORAL MODEL: SENSOR UNIT [State diagram: states CheckingForObstacles, CheckingIfDoorOpen, CheckingIfDoorClosed, Sleeping, SendingObstacleEvent, SendingOpenDoorEvent and SendingDoorClosedEvent; guards: [isObstacleDetected()], [!isObstacleDetected()], [isDoorOpen()], [!isDoorOpen()], [isDoorClosed()], [!isDoorClosed()]; timed transition: Time (20 ms)]
  • Slide 135
  • DO NOT FALL ASLEEP YET!
  • Slide 136
  • CODING Whenever we are satisfied with the level of detail in our behavioral models, we can proceed to coding Some of the code can be generated directly by tools from the behavioral model Some tweaking might be necessary (do not use the code blindly) Humans are still the smartest programmers
  • Slide 137
  • EVENT GENERATOR CLASS
  • Slide 138
  • SENSOR CLASS
  • Slide 139
  • Sensor State machine Implementation
  • Slide 140
  • UMPLE ONLINE DEMO UMPLE is a modeling tool that enables what we call Model-Oriented Programming. This is what we do in this course. You can use it to create class diagrams (structural models) and state machines (behavioral models). The tool was developed at the University of Ottawa. The online version can be found at: http://cruise.eecs.uottawa.ca/umpleonline/ There is also an Eclipse plugin for the tool.
  • Slide 141
  • UMPLE CODE FOR MOTOR UNIT STATE MACHINE
    class Motor {
      status {
        Running {
          Open { buttonPressed() [isFunctioning()] -> Closing; }
          Closing {
            buttonPressed() [isFunctioning()] -> Opening;
            obstacleDetected() [isFunctioning()] -> Opening;
            doorClosed() -> Closed;
          }
          Closed { buttonPressed() [isFunctioning()] -> Opening; }
          Opening {
            buttonPressed() -> HalfOpen;
            doorOpen() -> Open;
          }
          HalfOpen { buttonPressed() -> Opening; }
          buttonPressed() [!isFunctioning()] -> WaitingForRepair;
        }
        WaitingForRepair {
          timer() [!isFunctioning()] -> WaitingForRepair;
          timer() [isFunctioning()] -> Running;
        }
      }
    }
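
For comparison, the same state machine can be hand-written in Java roughly as follows. This is only a sketch and not the code that Umple generates: the enum, the `functioning` flag that stands in for the isFunctioning() guard, and the `setFunctioning` helper are assumptions made for illustration. It assumes that the first listed state (Open) is the initial one and that a transition defined on a substate takes priority over the one defined on the enclosing Running state.

```java
class Motor {
    enum Status { OPEN, CLOSING, CLOSED, OPENING, HALF_OPEN, WAITING_FOR_REPAIR }

    private Status status = Status.OPEN;   // assumed initial state
    private boolean functioning = true;    // stand-in for the isFunctioning() guard

    void setFunctioning(boolean value) { functioning = value; }  // hypothetical test hook
    Status getStatus() { return status; }

    void buttonPressed() {
        switch (status) {
            case OPEN:
                if (functioning) { status = Status.CLOSING; return; }
                break;
            case CLOSING:
            case CLOSED:
                if (functioning) { status = Status.OPENING; return; }
                break;
            case OPENING:
                status = Status.HALF_OPEN;
                return;
            case HALF_OPEN:
                status = Status.OPENING;
                return;
            default:
                return;  // WaitingForRepair ignores the button
        }
        // Only reached from a Running substate whose guarded transition did not fire,
        // i.e. the transition of the enclosing Running state applies.
        status = Status.WAITING_FOR_REPAIR;
    }

    void obstacleDetected() {
        if (status == Status.CLOSING && functioning) status = Status.OPENING;
    }

    void doorClosed() {
        if (status == Status.CLOSING) status = Status.CLOSED;
    }

    void doorOpen() {
        if (status == Status.OPENING) status = Status.OPEN;
    }

    // Fired by the 180 s timer while waiting for repair.
    void timer() {
        // Re-entering Running starts at its initial substate in this sketch.
        if (status == Status.WAITING_FOR_REPAIR && functioning) status = Status.OPEN;
    }

    public static void main(String[] args) {
        Motor motor = new Motor();
        motor.buttonPressed();     // Open -> Closing
        motor.obstacleDetected();  // Closing -> Opening
        System.out.println(motor.getStatus());
    }
}
```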
  • Slide 142
  • MOTOR CLASS SNIPPETS Switching between high level states. Switching between nested states inside the Running compound state.
  • Slide 143
  • WHEN TO USE STATE MACHINES? When an object or a system progresses through various stages of execution (states), and the behavior of the system differs from one stage to another. When you can identify clear events that change the status of the system. They are ideal for event-driven programming (fewer loops and branches, more events generated and exchanged), where lots of events are exchanged between objects. When using event-driven programming, make sure you follow the Observable or Event Notifier patterns. Both are pretty simple (similar to what we have done for the garage door example).
  • Slide 144
  • BEHAVIORAL OVER-MODELING Please model responsibly!! Do not get carried away with modeling every single detail to the point where you run behind schedule. You sell code, not models.
  • Slide 145
  • BEHAVIORAL OVER-MODELING Now, be careful: you do not want to over-model. Modern software development processes are all about doing just enough modeling for a successful product. Therefore, start with a high level model of the behavior. This model should give a clear overview of some (not necessarily all) of the important functionality of the system. This would be similar to the first garage door state machine we created.
  • Slide 146
  • BEHAVIORAL OVER-MODELING Identify potentially complex areas that require further understanding. We minimize the risk if we understand these components well before we start programming. Model these complex areas in more detail until you are satisfied that they are well understood. Use tools to generate code from your existing models. Do not rely blindly on tools (at least not yet!).
  • Slide 147
  • DESIGNING CLASSES WITH STATE DIAGRAMS Keep the state diagram simple. State diagrams can very quickly become extremely complex and confusing. At all times, you should follow the aesthetic rule: Less is More. If the state diagram gets too complex, consider splitting it into smaller classes. Think about compound states instead of a flat design.
  • Slide 148
  • EXAMPLE OF A CD PLAYER WITH A RADIO [State diagram: states Off and On; inside On, the states Displaying Current Time, Displaying Alarm Time, Playing Radio and Playing CD, with a history state (H); events: On, off, Display Alarm, Timer (3 s), Play CD, Play Radio]
  • Slide 149
  • MORE UML STATE MACHINES EXAMPLES Flight State Machine
  • Slide 150
  • MORE UML STATE MACHINES EXAMPLES Flight State Machine Nested
  • Slide 151
  • SECTION 5 PETRI NETS THESE SLIDES ARE BASED ON LECTURE NOTES FROM: DR. CHRIS LING (HTTP://WWW.CSSE.MONASH.EDU.AU/~SLING/) 151 SEG2106 Winter 2014 Hussein Al Osman
  • Slide 152
  • TOPICS Today we will discuss another type of state machine: Petri nets (this will be just an introduction) This will be the last behavioral modeling topic we cover We will start the next section of the course next week
  • Slide 153
  • OK, LET'S START
  • Slide 154
  • INTRODUCTION First introduced by Carl Adam Petri in 1962. A diagrammatic tool to model concurrency and synchronization in systems. They allow us to quickly simulate complex concurrent behavior (which is faster than prototyping!). Fairly similar to the UML State Machines that we have seen so far. Used as a visual communication aid to model the system behavior. Based on a strong mathematical foundation.
  • Slide 155
  • EXAMPLE: POS TERMINAL (UML STATE MACHINE) (POS = Point of Sale) [State diagram: states idle, d1, d2, d3, d4, OK pressed, Approved and Rejected; transition labels: 1 digit, OK, approve, Reject]
  • Slide 156
  • EXAMPLE: POS TERMINAL (PETRI NET) [Petri net diagram; labels: Initial, 1 digit, d1, d2, d3, d4, OK, OK pressed, approve, approved, Reject, Rejected!]
  • Slide 157
  • POS TERMINAL Scenario 1 (normal): the user enters all 4 digits and presses OK. Scenario 2 (exceptional): the user enters only 3 digits and presses OK.
  • Slide 158
  • EXAMPLE: POS SYSTEM (TOKEN GAMES) [Petri net diagram; labels: Initial, 1 digit, d1, d2, d3, d4, OK, OK pressed, approve, approved, Reject, Rejected!]
  • Slide 159
  • PETRI NET COMPONENTS The terms are a bit different from those of UML state machines. Petri nets consist of three types of components: places (circles), transitions (rectangles) and arcs (arrows). Places represent possible states of the system. Transitions are events or actions which cause the change of state (be careful, transitions are no longer arrows here). Every arc simply connects a place with a transition or a transition with a place.
  • Slide 160
  • CHANGE OF STATE A change of state is denoted by a movement of token(s) (black dots) from place(s) to place(s). It is caused by the firing of a transition. The firing represents an occurrence of the event or an action taken. The firing is subject to the input conditions, denoted by token availability.
  • Slide 161
  • CHANGE OF STATE A transition is firable or enabled when there are sufficient tokens in its input places. After firing, tokens will be transferred from the input places (old state) to the output places, denoting the new state
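
The enabling and firing rule can be sketched in a few lines of Java. This is an illustration under simplifying assumptions: every arc carries exactly one token, the input places of a transition are distinct, and a marking is stored as an array with one token count per place. The class and method names are made up for this example.

```java
import java.util.Arrays;

class TokenGame {
    // A transition is enabled when each of its input places holds at least one token.
    static boolean isEnabled(int[] marking, int[] inputPlaces) {
        for (int place : inputPlaces) {
            if (marking[place] < 1) return false;
        }
        return true;
    }

    // Firing removes one token from every input place and adds one to every output place.
    // The original marking is left untouched; the new marking is returned.
    static int[] fire(int[] marking, int[] inputPlaces, int[] outputPlaces) {
        if (!isEnabled(marking, inputPlaces)) {
            throw new IllegalStateException("transition is not enabled");
        }
        int[] next = marking.clone();
        for (int place : inputPlaces) next[place]--;
        for (int place : outputPlaces) next[place]++;
        return next;
    }

    public static void main(String[] args) {
        int[] marking = {1, 0, 0};                                    // token in place 0
        int[] afterFiring = fire(marking, new int[]{0}, new int[]{1});
        System.out.println(Arrays.toString(afterFiring));             // token moved to place 1
    }
}
```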
  • Slide 162
  • EXAMPLE: VENDING MACHINE The machine dispenses two kinds of snack bars 20c and 15c Only two types of coins can be used 10c coins and 5c coins (ah the old days!!) The machine does not return any change
  • Slide 163
  • EXAMPLE: VENDING MACHINE (UML STATE MACHINE) [State diagram: states 0 cent inserted, 5 cents inserted, 10 cents inserted, 15 cents inserted and 20 cents inserted; transition labels: Deposit 5c, Deposit 10c, Take 15c snack bar, Take 20c snack bar]
  • Slide 164
  • EXAMPLE: VENDING MACHINE (A PETRI NET) [Petri net diagram: places 0c, 5c, 10c, 15c and 20c; transitions Deposit 5c (four of them), Deposit 10c (three of them), Take 15c bar and Take 20c bar]
  • Slide 165
  • EXAMPLE: VENDING MACHINE (3 SCENARIOS) Scenario 1: Deposit 5c, deposit 5c, deposit 5c, deposit 5c, take 20c snack bar. Scenario 2: Deposit 10c, deposit 5c, take 15c snack bar. Scenario 3: Deposit 5c, deposit 10c, deposit 5c, take 20c snack bar.
  • Slide 166
  • EXAMPLE: VENDING MACHINE (TOKEN GAMES) [Petri net diagram: places 0c, 5c, 10c, 15c and 20c; transitions Deposit 5c (four of them), Deposit 10c (three of them), Take 15c bar and Take 20c bar]
  • Slide 167
  • MULTIPLE LOCAL STATES In the real world, events happen at the same time A system may have many local states to form a global state. There is a need to model concurrency and synchronization
  • Slide 168
  • EXAMPLE: IN A RESTAURANT (A PETRI NET) [Petri net diagram; labels: Waiter free, Customer 1, Customer 2, Take order (one per customer), Order taken, Tell kitchen, wait, Serve food, eating]
  • Slide 169
  • EXAMPLE: IN A RESTAURANT (TWO SCENARIOS) Scenario 1: the waiter (1) takes the order from customer 1, (2) serves customer 1, (3) takes the order from customer 2, (4) serves customer 2. Scenario 2: the waiter (1) takes the order from customer 1, (2) takes the order from customer 2, (3) serves customer 2, (4) serves customer 1.
  • Slide 170
  • EXAMPLE: IN A RESTAURANT (SCENARIO 2) [Petri net diagram; labels: Waiter free, Customer 1, Customer 2, Take order (one per customer), Order taken, Tell kitchen, wait, Serve food, eating]
  • Slide 171
  • EXAMPLE: IN A RESTAURANT (SCENARIO 1) [Petri net diagram; labels: Waiter free, Customer 1, Customer 2, Take order (one per customer), Order taken, Tell kitchen, wait, Serve food, eating]
  • Slide 172
  • NET STRUCTURES A sequence of events/actions: [Petri net diagram with transitions e1, e2, e3] Concurrent executions: [Petri net diagram with transitions e1, e2, e3, e4, e5]
  • Slide 173
  • NET STRUCTURES Non-deterministic events (conflict, choice or decision): a choice of either e1, e2 or e3, e4... [Petri net diagram with transitions e1, e2, e3, e4]
  • Slide 174
  • NET STRUCTURES Synchronization [Petri net diagram with transition e1]
  • Slide 175
  • NET STRUCTURES Synchronization and Concurrency [Petri net diagram with transition e1]
  • Slide 176
  • ANOTHER EXAMPLE A producer-consumer system, consisting of: one producer, two consumers and one storage buffer. With the following conditions: the storage buffer may contain at most 5 items; the producer sends 3 items in each production; at most one consumer is able to access the storage buffer at one time; each consumer removes two items when accessing the storage buffer.
  • Slide 177
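  • A PRODUCER-CONSUMER SYSTEM [Petri net diagram: Producer part and Consumers part joined by the Storage place; places p1 to p5 with labels ready, idle, Storage, ready and accepted; transitions t1 to t4 with labels produce, send, accept and consume; place capacities k=1, k=5 and k=2; arc weights 3 and 2]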
  • Slide 178
  • A PRODUCER-CONSUMER EXAMPLE In this Petri net, every place has a capacity and every arc has a weight. This allows multiple tokens to reside in a place to model more complex behavior.
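
A hedged Java sketch of how capacities and arc weights change the enabling rule. It models only a fragment of the producer-consumer net, reduced to two places (the producer's loaded place and the storage buffer) and one transition (send); the array layout and all names are assumptions for illustration. A transition is now enabled only if every input place holds at least as many tokens as the arc weight and no output place would exceed its capacity.

```java
import java.util.Arrays;

class WeightedNet {
    // consume[p] = weight of the arc from place p into the transition (0 if none).
    // produce[p] = weight of the arc from the transition into place p (0 if none).
    static boolean isEnabled(int[] marking, int[] capacity, int[] consume, int[] produce) {
        for (int p = 0; p < marking.length; p++) {
            if (marking[p] < consume[p]) return false;                            // not enough tokens
            if (marking[p] - consume[p] + produce[p] > capacity[p]) return false; // would overflow
        }
        return true;
    }

    static int[] fire(int[] marking, int[] capacity, int[] consume, int[] produce) {
        if (!isEnabled(marking, capacity, consume, produce)) {
            throw new IllegalStateException("transition is not enabled");
        }
        int[] next = marking.clone();
        for (int p = 0; p < next.length; p++) next[p] += produce[p] - consume[p];
        return next;
    }

    public static void main(String[] args) {
        int[] capacity = {1, 5};   // place 0: producer loaded (k=1), place 1: storage (k=5)
        int[] sendIn = {1, 0};     // send takes the producer's token...
        int[] sendOut = {0, 3};    // ...and puts 3 items into storage
        int[] marking = fire(new int[]{1, 0}, capacity, sendIn, sendOut);
        System.out.println(Arrays.toString(marking));
        // With 3 items already stored, a second send would need room for 6 items.
        System.out.println(isEnabled(new int[]{1, 3}, capacity, sendIn, sendOut));
    }
}
```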
  • Slide 179
  • SHORT BREAK? Are you here yet?
  • Slide 180
  • BEHAVIORAL PROPERTIES Reachability: can we reach one particular state from another? Boundedness: will a storage place overflow? Liveness: will the system die in a particular state?
  • Slide 181
  • RECALLING THE VENDING MACHINE (TOKEN GAME) [Petri net diagram: places 0c, 5c, 10c, 15c and 20c; transitions Deposit 5c (four of them), Deposit 10c (three of them), Take 15c bar and Take 20c bar]
  • Slide 182
  • A MARKING IS A STATE... [Petri net diagram: places p1 to p5, transitions t1 to t9] Initial marking: M0. M0 = (1,0,0,0,0), M1 = (0,1,0,0,0), M2 = (0,0,1,0,0), M3 = (0,0,0,1,0), M4 = (0,0,0,0,1)
  • Slide 183
  • REACHABILITY [Petri net diagram: places p1 to p5, transitions t1 to t9] Initial marking: M0. Firing sequence: M0 → M1 → M2 → M3 → M0 → M2 → M4, with arcs labeled t3, t1, t5, t8, t2, t6. M0 = (1,0,0,0,0), M1 = (0,1,0,0,0), M2 = (0,0,1,0,0), M3 = (0,0,0,1,0), M4 = (0,0,0,0,1)
  • Slide 184
  • REACHABILITY A firing or occurrence sequence: M0 → M1 → M2 → M3 → M0 → M2 → M4, with arcs labeled t3, t1, t5, t8, t2, t6. M2 is reachable from M1 and M4 is reachable from M0. In fact, in the vending machine example, all markings are reachable from every marking.
  • Slide 185
  • BOUNDEDNESS A Petri net is said to be k-bounded or simply bounded if the number of tokens in each place does not exceed a finite number k for any marking reachable from M0. The Petri net for vending machine is 1-bounded.
  • Slide 186
  • LIVENESS A Petri net with initial marking M0 is live if, no matter what marking has been reached from M0, it is possible to ultimately fire any transition by progressing through some further firing sequence. A live Petri net guarantees deadlock-free operation, no matter what firing sequence is chosen.
  • Slide 187
  • LIVENESS The vending machine is live and the producer-consumer system is also live. A transition is dead if it can never be fired in any firing sequence.
  • Slide 188
  • AN EXAMPLE A bounded but non-live Petri net [Petri net diagram: places p1 to p4, transitions t1 to t4] M0 = (1,0,0,1), M1 = (0,1,0,1), M2 = (0,0,1,0), M3 = (0,0,0,1)
  • Slide 189
  • ANOTHER EXAMPLE An unbounded but live Petri net [Petri net diagram: places p1 to p5, transitions t1 to t4] M0 = (1,0,0,0,0), M1 = (0,1,1,0,0), M2 = (0,0,0,1,1), M3 = (1,1,0,0,0), M4 = (0,2,1,0,0)
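
The markings listed above can be replayed with a small Java sketch. The arc structure below is an assumption: it is one net that is consistent with the listed markings (t1 moves the token of p1 to p2 and p3, t2 moves p2 to p4, t3 moves p3 to p5, t4 takes p4 and p5 and returns a token to both p1 and p2); the original diagram may differ. Under that assumption, every round t1, t2, t3, t4 leaves one extra token in p2, which is why no bound k exists.

```java
import java.util.Arrays;

class UnboundedNet {
    // Assumed net structure. Rows are t1..t4, columns are p1..p5.
    static final int[][] IN = {
        {1, 0, 0, 0, 0},   // t1 consumes from p1
        {0, 1, 0, 0, 0},   // t2 consumes from p2
        {0, 0, 1, 0, 0},   // t3 consumes from p3
        {0, 0, 0, 1, 1}    // t4 consumes from p4 and p5
    };
    static final int[][] OUT = {
        {0, 1, 1, 0, 0},   // t1 produces into p2 and p3
        {0, 0, 0, 1, 0},   // t2 produces into p4
        {0, 0, 0, 0, 1},   // t3 produces into p5
        {1, 1, 0, 0, 0}    // t4 produces into p1 and p2
    };

    // Fires transition t (0-based index) and returns the new marking.
    static int[] fire(int[] marking, int t) {
        int[] next = marking.clone();
        for (int p = 0; p < marking.length; p++) {
            if (marking[p] < IN[t][p]) {
                throw new IllegalStateException("t" + (t + 1) + " is not enabled");
            }
            next[p] += OUT[t][p] - IN[t][p];
        }
        return next;
    }

    // Fires a whole sequence of transitions, starting from the given marking.
    static int[] run(int[] marking, int... transitions) {
        int[] current = marking;
        for (int t : transitions) current = fire(current, t);
        return current;
    }

    public static void main(String[] args) {
        int[] marking = {1, 0, 0, 0, 0};   // M0
        for (int round = 1; round <= 3; round++) {
            marking = run(marking, 0, 1, 2, 3);   // t1, t2, t3, t4
            System.out.println("after round " + round + ": " + Arrays.toString(marking));
        }
    }
}
```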
  • Slide 190
  • OTHER TYPES OF PETRI NETS Object-Oriented Petri nets Tokens can either be instances of classes, or states of objects. Net structure models the inner behaviour of objects.
  • Slide 191
  • AN O-O PETRI NET [Petri net diagram: Producer and Consumer parts joined by the Storage place; labels: ready, produce, send, Storage, accept, accepted, consume, ready] Producer class: attribute state: ProducerState; operations produce(): Item and send(i: Item): void. Consumer class: attribute state: ConsumerState; operations accept(i: Item): void and consume(i: Item): void.
  • Slide 192
  • PETRI NET REFERENCES Murata, T. (1989, April). Petri nets: properties, analysis and applications. Proceedings of the IEEE, 77(4), 541-80. Peterson, J.L. (1981). Petri Net Theory and the Modeling of Systems. Prentice-Hall. Reisig, W and G. Rozenberg (eds) (1998). Lectures on Petri Nets 1: Basic Models. Springer-Verlag. The World of Petri nets: http://www.daimi.au.dk/PetriNets/
  • Slide 193
  • SECTION 6 INTRODUCTION TO COMPILERS
  • Slide 194
  • TOPICS Natural languages Lexemes or lexical entities Syntax and semantics Computer languages Lexical analysis Syntax analysis Semantic analysis Compilers Compilers basic requirements Compilation process
  • Slide 195
  • NATURAL LANGUAGES BASICS In a (natural) language: A sentence is a sequence of words. A word (also called a lexeme or lexical unit) is a sequence of characters (possibly a single one). The set of characters used in a language is finite (known as the alphabet). The set of possible sentences in a language is infinite. A dictionary lists all the words (lexemes) of a language. The words are classified into different lexical categories: verb, noun, pronoun, preposition.
  • Slide 196
  • NATURAL LANGUAGES BASICS A grammar (also considered the set of syntax rules) determines which sequences of words are well formed. Sequences must have a structure that obeys the grammatical rules. Well-formed sentences usually have a meaning that humans understand. We are trying to teach our natural languages to machines, with mixed results!!
  • Slide 197
  • ANALYSIS OF SENTENCES Lexical Analysis: identification of words made up of characters Words are classified into several categories: articles, nouns, verbs, adjectives, prepositions, pronouns Syntax analysis: rules for combining words to form sentences Analysis of meaning: difficult to formalize Easily done by humans Gives machines a hard time (although natural language processing is evolving) Big research field for those interested in graduate studies
  • Slide 198
  • COMPUTER LANGUAGE PROCESSING In computer (or programming) languages, one speaks about a program (corresponding to a long sentence or paragraph) Sequence of lexical units or lexemes Lexical units are sequences of characters Lexical rules of the language determine what the valid lexical units of the language are There are various lexical categories: identifier, number, character string, operator Lexical categories are also known as tokens
  • Slide 199
  • COMPUTER LANGUAGE PROCESSING Syntax rules of the language determine what sequences of lexemes are well-formed programs Meaning of a well-formed program is also called its semantics A program can be well-formed, but its statements are nonsensical Example: int x = 0; x = 1; x = 0; Syntactically, the above code is valid, but what does it mean??
  • Slide 200
  • COMPUTER LANGUAGE PROCESSING Compilers should catch and complain about lexical and syntax errors. Compilers might complain about common semantic errors: public boolean test (int x){ boolean result; if (x > 100) result = true; return result; } Error message: The local variable result may not have been initialized. Your coworkers or the client will complain about the rest!!
  • Slide 201
  • COMPILERS What is a compiler? Program that translates an executable program in one language into an executable program in another language We expect the program produced by the compiler to be better, in some way, than the original What is an interpreter? Program that reads an executable program and produces the results of running that program We will focus on compilers in this course (although many of the concepts apply to both)
  • Slide 202
  • BASIC REQUIREMENTS FOR COMPILERS Must-Dos: Produce correct code (byte code in the case of Java). Run fast. Output must run fast. Achieve a compile time proportional to the size of the program. Work well with debuggers (absolute must). Must-Haves: Good diagnostics for lexical and syntax errors. Support for cross language calls (check out the Java Native Interface if you are interested).
  • Slide 203
  • ABSTRACT VIEW OF COMPILERS A compiler usually realizes the translation in several steps; correspondingly, it contains several components. Usually, a compiler includes (at least) separate components for verifying the lexical and syntax rules:
  • Slide 204
  • COMPILATION PROCESS Source Program → Lexical Analyser → Syntax Analyser → Semantic Analyser → Intermediate Code Generator → Code Optimizer → Code Generator → Machine Code
  • Slide 205
  • COMPILATION PROCESS More than one course is required to cover the details of the various phases In this course, we will scratch the surface We will focus on lexical and syntax analysis
  • Slide 206
  • SOME IMPORTANT DEFINITIONS These definitions, although sleep inducing, are important in order to understand the concepts that will be introduced in the next lectures So here we go
  • Slide 207
  • ALPHABET Recall from the beginning of the lecture (or kindergarten): an alphabet is the set of characters that can be used to form a sentence. Since mathematicians love fancy Greek symbols, we will refer to an alphabet as Σ
  • Slide 208
  • ALPHABET Σ is an alphabet, or set of terminals. It is a finite set and consists of all the input characters or symbols that can be arranged to form sentences in the language. English: A to Z, punctuation and space symbols. Programming language: usually some well-defined computer character set such as ASCII
  • Slide 209
  • STRINGS OF TERMINALS IN AN ALPHABET Σ = {a, b, c, d} Possible strings of terminals from Σ include: aaa, aabbccdd, d, cba, abab, ccccccccccacccc. Although this is fun, I think you get the idea
  • Slide 210
  • FORMAL LANGUAGES Σ: alphabet, a finite set consisting of all input characters or symbols. Σ*: closure of the alphabet, the set of all possible strings over Σ, including the empty string ε. A (formal) language is some specified subset of Σ*
  • Slide 211
  • SECTION 7 LEXICAL ANALYSIS
  • Slide 212
  • TOPICS The role of the lexical analyzer. Specification of tokens. Finite state machines. From a regular expression to an NFA
  • Slide 213
  • THE ROLE OF THE LEXICAL ANALYZER The lexical analyzer is the first phase of a compiler. Task: read input characters and produce a sequence of tokens that the parser uses for syntax analysis; remove white space. [Diagram: Source Program → Lexical Analyser (scanner) → Syntax Analyser (parser); the parser sends "Get next token" to the scanner, which answers with a token]
  • Slide 214
  • LEXICAL ANALYSIS There are several reasons for separating the analysis phase of compiling into lexical analysis and syntax analysis (parsing): Simpler (layered) design Compiler efficiency Specialized tools have been designed to help automate the construction of both separately
  • Slide 215
  • LEXEMES Lexeme: sequence of characters in the source program that is matched by the pattern for a token A lexeme is a basic lexical unit of a language Lexemes of a programming language include its Identifiers: names of variables, methods, classes, packages and interfaces Literals: fixed values (e.g. 1, 17.56, 0xFFE ) Operators: for Maths, Boolean and logical operations (e.g. +, -, &&, | ) Special words: keywords (e.g. if, for, public )
  • Slide 216
  • TOKENS, PATTERNS, LEXEMES Token: category of lexemes. A pattern is a rule describing the set of lexemes that can represent a particular token in the source program
  • Slide 217
  • EXAMPLES OF TOKENS double pi = 3.1416; The substring pi is a lexeme for the token identifier.
    Token | Sample lexemes | Informal description of pattern
    type | double |
    if | |
    boolean_operator | ..., >= | ... or >=
    id | pi, count, d2 | Letter followed by letters and digits
    literal | 3.1414, test | Any alphanumeric string of characters
  • Slide 218
  • LEXEME AND TOKEN (MORE DETAILED CATEGORIES) Index = 2 * count + 17;
    Token | Lexeme
    variable | Index
    equal_sign | =
    int_literal | 2
    multi_op | *
    variable | count
    plus_op | +
    int_literal | 17
    semicolon | ;
  • Slide 219
  • LEXICAL ERRORS Few errors are discernible at the lexical level alone Lexical analyzer has a very localized view of a source program Let some other phase of compiler handle any error
  • Slide 220
  • SPECIFICATION OF TOKENS We need a powerful notation to specify the patterns for the tokens. Regular expressions to the rescue!! In the process of studying regular expressions, we will discuss: operations on languages, regular definitions and notational shorthands
  • Slide 221
  • RECALL: LANGUAGES Σ: alphabet, a finite set consisting of all input characters or symbols. Σ*: closure of the alphabet, the set of all possible strings over Σ, including the empty string ε. A (formal) language is some specified subset of Σ*
  • Slide 222
  • OPERATIONS ON LANGUAGES
  • Slide 223
  • Non-mathematical format: Union of languages L and M: the set of strings that belong to at least one of the two languages. Concatenation of languages L and M: the set of all strings of the form st where s is a string from L and t is a string from M. Intersection of languages L and M: the set of all strings which are contained in both languages. Kleene closure (named after Stephen Kleene): the set of all strings that are concatenations of 0 or more strings from the original language. Positive closure: the set of all strings that are concatenations of 1 or more strings from the original language
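
These operations can be tried out on small finite languages with a short Java sketch. Languages are represented here as sorted sets of strings; the class name, the method names and the `maxParts` cut-off are assumptions made for this illustration. Because the Kleene closure is infinite for any language that contains a non-empty string, the sketch only builds the concatenations of at most `maxParts` strings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class LanguageOps {
    static Set<String> union(Set<String> l, Set<String> m) {
        Set<String> result = new TreeSet<>(l);
        result.addAll(m);
        return result;
    }

    static Set<String> intersection(Set<String> l, Set<String> m) {
        Set<String> result = new TreeSet<>(l);
        result.retainAll(m);
        return result;
    }

    // All strings st with s taken from l and t taken from m.
    static Set<String> concat(Set<String> l, Set<String> m) {
        Set<String> result = new TreeSet<>();
        for (String s : l) {
            for (String t : m) result.add(s + t);
        }
        return result;
    }

    // Finite approximation of the Kleene closure: concatenations of 0..maxParts strings of l.
    static Set<String> closure(Set<String> l, int maxParts) {
        Set<String> result = new TreeSet<>();
        Set<String> power = new TreeSet<>();
        power.add("");                 // zero strings concatenated: the empty string
        result.addAll(power);
        for (int i = 1; i <= maxParts; i++) {
            power = concat(power, l);  // concatenations of exactly i strings
            result.addAll(power);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> l = new TreeSet<>(Arrays.asList("a", "b"));
        Set<String> m = new TreeSet<>(Arrays.asList("b", "c"));
        System.out.println(union(l, m));
        System.out.println(intersection(l, m));
        System.out.println(concat(l, m));
        System.out.println(closure(l, 2));
    }
}
```

The positive closure is obtained the same way by starting the loop's accumulation at one string instead of zero.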
  • Slide 224
  • REGULAR EXPRESSIONS A regular expression is a compact notation for describing a set of strings. In Java, an identifier is a letter followed by zero or more letters or digits: letter (letter | digit)*, where | means "or" and * means "zero or more instances of"
  • Slide 225
  • RULES ε is a regular expression that denotes {ε}, the set containing the empty string. If a is a symbol in Σ, then a is a regular expression that denotes {a}, the set containing the string a. Suppose r and s are regular expressions denoting the languages L and M; then (r)|(s) is a regular expression denoting L ∪ M, (r)(s) is a regular expression denoting LM, and (r)* is a regular expression denoting (L)*.
  • Slide 226
  • PRECEDENCE CONVENTIONS The unary operator * has the highest precedence and is left associative. Concatenation has the second highest precedence and is left associative. | has the lowest precedence and is left associative. With these conventions, (a)|(b)*(c) can be written as a|b*c
  • Slide 227
  • EXAMPLE OF REGULAR EXPRESSIONS
  • Slide 228
  • PROPERTIES OF REGULAR EXPRESSION
  • Slide 229
  • REGULAR DEFINITIONS If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form: $d_1 \to r_1,\ d_2 \to r_2,\ \dots,\ d_n \to r_n$ where each $d_i$ is a distinct name, and each $r_i$ is a regular expression over the symbols in $\Sigma \cup \{d_1, d_2, \dots, d_{i-1}\}$, i.e., the basic symbols and the previously defined names.
  • Slide 230
  • EXAMPLE OF REGULAR DEFINITIONS
  • Slide 231
  • NOTATIONAL SHORTHANDS Certain constructs occur so frequently in regular expressions that it is convenient to introduce notational shorthands for them. We have already seen some of these shorthands: 1. One or more instances: a+ denotes the set of all strings of one or more a's 2. Zero or more instances: a* denotes the set of all strings of zero or more a's 3. Character classes: the notation [abc], where a, b and c are alphabet symbols, denotes the regular expression a | b | c 4. Abbreviated character classes: the notation [a-z] denotes the regular expression a | b | ... | z
  • Slide 232
  • NOTATIONAL SHORTHANDS Using character classes, we can describe identifiers as being strings described by the following regular expression: [A-Za-z][A-Za-z0-9]*
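
The expression above can be tried directly with Java's standard `java.util.regex` package. This is a small sketch; the class and method names are chosen for illustration. Note that the slide's pattern is a simplification: real Java identifiers may also contain `_` and `$`.

```java
import java.util.regex.Pattern;

class IdentifierPattern {
    // The regular expression from the slide, compiled once.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // matches() succeeds only if the WHOLE string fits the pattern.
    static boolean isIdentifier(String lexeme) {
        return IDENTIFIER.matcher(lexeme).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[]{"pi", "count", "d2", "2d", ""}) {
            System.out.println("\"" + s + "\" -> " + isIdentifier(s));
        }
    }
}
```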
  • Slide 233
  • FINITE STATE AUTOMATA Now that we have learned about regular expressions, how can we tell if a string (or lexeme) follows a regular expression pattern or not? We will again use state machines! This time, they are not UML state machines or Petri nets. We will call them Finite Automata. The program that executes such state machines is called a Recognizer
  • Slide 234
  • SHORT BREAK
  • Slide 235
  • FINITE AUTOMATA A recognizer for a language is a program that takes as input a string x and answers Yes if x is a lexeme of the language, and No otherwise. We compile a regular expression into a recognizer by constructing a generalized transition diagram called a finite automaton. A finite automaton can be deterministic or nondeterministic. Nondeterministic means that more than one transition out of a state may be possible on the same input symbol
  • Slide 236
  • NONDETERMINISTIC FINITE AUTOMATA (NFA) An NFA consists of: a set of states S; a set of input symbols that belong to the alphabet Σ; a set of transitions that are triggered by the processing of a character; a single state s0 that is distinguished as the start (initial) state; a set of states F distinguished as accepting (final) states.
  • Slide 237
  • EXAMPLE OF AN NFA The following regular expression (a|b)*abb Can be described using an NFA with the following diagram:
  • Slide 238
  • EXAMPLE OF AN NFA The previous diagram can be described using the following table as well Remember the regular expression was: (a|b)*abb
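
One way to run such a table is to track the set of all states the NFA could be in after each character. The Java sketch below assumes the usual four-state NFA for (a|b)*abb (state 0 loops on a and b, then 0 → 1 on a, 1 → 2 on b, 2 → 3 on b, with 3 accepting); the table on the slide may number its states differently.

```java
import java.util.Set;
import java.util.TreeSet;

class NfaDemo {
    // Assumed transition table: TABLE[state][symbol] is the SET of possible next states.
    // Symbol index 0 stands for 'a', index 1 for 'b'.
    static final int[][][] TABLE = {
        { {0, 1}, {0} },   // state 0: on a -> {0, 1}, on b -> {0}
        { {},     {2} },   // state 1: on b -> {2}
        { {},     {3} },   // state 2: on b -> {3}
        { {},     {}  }    // state 3: no outgoing transitions
    };
    static final int ACCEPTING = 3;

    static boolean accepts(String input) {
        Set<Integer> current = new TreeSet<>();
        current.add(0);                                   // start state
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') return false;       // symbol outside the alphabet
            Set<Integer> next = new TreeSet<>();
            for (int state : current) {
                for (int target : TABLE[state][c - 'a']) next.add(target);
            }
            current = next;
        }
        // Accept if at least one of the possible states is accepting.
        return current.contains(ACCEPTING);
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabb"));
        System.out.println(accepts("abba"));
    }
}
```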
  • Slide 239
  • ANOTHER NFA EXAMPLE NFA accepting the following regular expression: aa*|bb*
  • Slide 240
  • DETERMINISTIC FINITE AUTOMATA (DFA) A DFA is a special case of an NFA in which: no state has an ε-transition; for each state s and input symbol a, there is at most one edge labeled a leaving s
  • Slide 241
  • ANOTHER DFA EXAMPLE For the same regular expression we have seen before (a|b)*abb
  • Slide 242
  • NFA VS DFA Always with the regular expression: (a|b)*abb NFA: DFA:
  • Slide 243
  • EXAMPLE OF A DFA Recognizer for identifier:
  • Slide 244
  • TABLES FOR THE RECOGNIZER To change regular expression, we can simply change tables
  • Slide 245
  • CODE FOR THE RECOGNIZER
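The slide's code is not reproduced in this transcript. The following is a minimal sketch of a table-driven recognizer for identifiers, assuming a two-state DFA; the state names, the `char_class` helper and the table layout are all assumptions made for illustration. The driver loop never mentions identifiers, so changing the regular expression only requires changing the tables.

```python
def char_class(ch: str) -> str:
    """Map a character to the column of the transition table it belongs to."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Transition table for [A-Za-z][A-Za-z0-9]*: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}

def recognize(lexeme, transitions=TRANSITIONS, start="start", accepting=ACCEPTING):
    """Generic DFA driver: all knowledge of the pattern lives in the tables."""
    state = start
    for ch in lexeme:
        state = transitions.get((state, char_class(ch)))
        if state is None:        # missing entry means the DFA rejects
            return False
    return state in accepting

print(recognize("x1"), recognize("1x"))
```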
  • Slide 246
  • SECTION 8 FINITE STATE AUTOMATA
  • Slide 247
  • TOPICS Algorithm to create NFAs from regular expressions Algorithm to convert from NFA to DFA Algorithm to minimize DFA Many examples.
  • Slide 248
  • CREATING DETERMINISTIC FINITE AUTOMATA (DFA) In order to create a DFA, we have to perform the following: Create a Non-deterministic Finite Automata (NFA) out of the regular expression Convert the NFA into a DFA
  • Slide 249
  • NFA CREATION RULES Construction rules for A | B, AB and A* (diagrams not reproduced)
  • Slide 250
  • NFA CREATION EXAMPLES x | yz According to precedence rules, this is equivalent to: x | (yz). This has the same form as A | B, where B = yz is built with the concatenation rule. Putting it all together gives the NFA for x | yz (diagrams not reproduced)
  • Slide 251
  • NFA CREATION EXAMPLES (x | y)* We have seen the rule for A*. Applying it with A = x | y gives the NFA for (x | y)*, with states 1 to 8 (diagram not reproduced)
  • Slide 252
  • NFA CREATION EXAMPLES abb
  • Slide 253
  • NFA CREATION EXAMPLES a*bb (diagram not reproduced)
  • Slide 254
  • NFA CREATION EXAMPLES (a|b)*bc (diagram not reproduced)
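The creation rules can be applied mechanically. The sketch below builds an NFA from a small regular expression tree and then simulates it. It is an illustration under stated assumptions: the tuple-based tree format, the function names and the state numbering are hypothetical, and concatenation is linked with an extra ε-edge for simplicity, so the state count differs from the diagrams on the slides.

```python
from itertools import count

EPSILON = None  # label used for ε-edges

def build(node, edges, fresh):
    """Add the NFA fragment for a tree node to edges; return (start, accept)."""
    kind = node[0]
    if kind == "sym":                      # single symbol: start --a--> accept
        start, accept = next(fresh), next(fresh)
        edges.append((start, node[1], accept))
    elif kind == "cat":                    # AB: fragment A, then fragment B
        start, a_accept = build(node[1], edges, fresh)
        b_start, accept = build(node[2], edges, fresh)
        edges.append((a_accept, EPSILON, b_start))
    elif kind == "alt":                    # A | B: new start and accept states
        start, accept = next(fresh), next(fresh)
        for branch in node[1:]:
            b_start, b_accept = build(branch, edges, fresh)
            edges.append((start, EPSILON, b_start))
            edges.append((b_accept, EPSILON, accept))
    elif kind == "star":                   # A*: skip edge plus loop-back edge
        start, accept = next(fresh), next(fresh)
        b_start, b_accept = build(node[1], edges, fresh)
        edges.extend([(start, EPSILON, b_start), (start, EPSILON, accept),
                      (b_accept, EPSILON, b_start), (b_accept, EPSILON, accept)])
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept

def closure(states, edges):
    """All states reachable from the given states through ε-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPSILON and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(tree, text):
    edges = []
    start, accept = build(tree, edges, count(1))
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

# (a|b)*bc written as a tree
TREE = ("cat", ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))),
                ("sym", "b")), ("sym", "c"))
print(accepts(TREE, "abbc"), accepts(TREE, "bcb"))
```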
  • Slide 255
  • CONVERSION OF AN NFA INTO DFA The subset construction algorithm is useful for simulating an NFA by a computer program. In the transition table of an NFA, each entry is a set of states. In the transition table of a DFA, each entry is just a single state. General idea behind the NFA-to-DFA conversion: each DFA state corresponds to a set of NFA states
  • Slide 256
  • SUBSET CONSTRUCTION ALGORITHM Algorithm: Subset Construction - Used to construct a DFA from an NFA Input: An NFA N Output: A DFA D accepting the same language
  • Slide 257
  • SUBSET CONSTRUCTION ALGORITHM Method: Let s be a state in N and T be a set of states, and using the following operations:
  • Slide 258
  • SUBSET CONSTRUCTION (MAIN ALGORITHM)
  • Slide 259
  • SUBSET CONSTRUCTION (ε-CLOSURE COMPUTATION)
  • Slide 260
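  • CONVERSION EXAMPLE Regular Expression: (x | y)* Dstates = {A, B, C}, where A = (1,2,3,5,8) B = (2,3,4,5,7,8) C = (2,3,5,6,7,8) DFA transition table (state: next state on x, next state on y): A: B, C; B: B, C; C: B, C (NFA diagram with states 1 to 8 not reproduced)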
  • Slide 261
  • CONVERSION EXAMPLE Regular Expression: (x | y)* Resulting DFA (diagram not reproduced), with transition table (state: next state on x, next state on y): A: B, C; B: B, C; C: B, C
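The conversion example can be checked with a short sketch of the subset construction. The NFA edges below are an assumption: they are the edges of the standard construction for (x | y)* with states 1 to 8, chosen so that they reproduce the sets A, B and C listed on the slide. The function names follow the usual ε-closure and move operations, but the code itself is illustrative only.

```python
EPSILON = ""  # key used for ε-transitions

# Assumed NFA for (x | y)*: state -> label -> set of next states.
NFA = {
    1: {EPSILON: {2, 8}},
    2: {EPSILON: {3, 5}},
    3: {"x": {4}},
    4: {EPSILON: {7}},
    5: {"y": {6}},
    6: {EPSILON: {7}},
    7: {EPSILON: {2, 8}},
    8: {},
}

def epsilon_closure(states):
    """States reachable from the given set using ε-transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        state = stack.pop()
        for target in NFA[state].get(EPSILON, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def move(states, symbol):
    """States reachable from the given set on one occurrence of symbol."""
    return {t for s in states for t in NFA[s].get(symbol, ())}

def subset_construction(start, alphabet):
    start_set = epsilon_closure({start})
    dstates, dtran, unmarked = [start_set], {}, [start_set]
    while unmarked:
        current = unmarked.pop(0)
        for symbol in alphabet:
            target = epsilon_closure(move(current, symbol))
            if not target:
                continue                     # no transition on this symbol
            if target not in dstates:
                dstates.append(target)       # a new DFA state was discovered
                unmarked.append(target)
            dtran[(current, symbol)] = target
    return dstates, dtran

dstates, dtran = subset_construction(1, "xy")
names = {s: chr(ord("A") + i) for i, s in enumerate(dstates)}
table = {names[s]: tuple(names[dtran[(s, sym)]] for sym in "xy") for s in dstates}
for name, row in table.items():
    print(name, row)
```

Each DFA state is a frozen set of NFA states, which is the "general idea" stated earlier: one DFA state per reachable set of NFA states.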
  • Slide 262
  • ANOTHER CONVERSION EXAMPLE Regular Expression: (a | b)*abb
  • Slide 263
  • ANOTHER CONVERSION EXAMPLE Regular Expression: (a | b)*abb
  • Slide 264
  • ANOTHER CONVERSION EXAMPLE Regular Expression: (a | b)*abb
  • Slide 265
  • MINIMIZING THE NUMBER OF STATES IN DFA Minimize the number of states of a DFA by finding all groups of states that can be distinguished by some input string Each group of states that cannot be distinguished is then merged into a single state
  • Slide 266
  • MINIMIZING THE NUMBER OF STATES IN DFA Algorithm: Minimizing the number of states of a DFA Input: A DFA D with a set of states S Output: A DFA M accepting the same language as D yet having as few states as possible
  • Slide 267
  • MINIMIZING THE NUMBER OF STATES IN DFA Method: 1. Construct an initial partition Π of the set of states with two groups: the accepting states group and the group of all other states 2. Partition Π to Πnew (using the procedure shown on the next slide) 3. If Πnew != Π, let Π := Πnew and repeat step (2). Otherwise, go to step (4) 4. Choose one state in each group of the partition Π as the representative of the group 5. Remove dead states
  • Slide 268
  • CONSTRUCT NEW PARTITION PROCEDURE for each group G of Π do begin Partition G into subgroups such that two states s and t of G are in the same subgroup if and only if for all input symbols a, states s and t have transitions on a to states in the same group of Π; /* at worst, a state will be in a subgroup by itself */ Replace G in Πnew by the set of all subgroups formed end
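The partition procedure can be sketched in a few lines. The example DFA below is an assumption: it is the usual five-state DFA for (a|b)*abb, not the six-state automaton of the slide example, whose diagram is not recoverable from the transcript. The sketch also assumes that every state has a transition on every symbol, and it stops at the final partition, leaving out the choice of representatives and the removal of dead states.

```python
# Assumed DFA for (a|b)*abb: state -> symbol -> next state.
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}

def refine(partition, dfa, alphabet):
    """One pass of the 'construct new partition' procedure."""
    group_of = {s: i for i, group in enumerate(partition) for s in group}
    new_partition = []
    for group in partition:
        subgroups = {}
        for state in sorted(group):
            # States stay together only if every symbol leads to the same group.
            signature = tuple(group_of[dfa[state][sym]] for sym in alphabet)
            subgroups.setdefault(signature, set()).add(state)
        new_partition.extend(frozenset(sub) for sub in subgroups.values())
    return new_partition

def minimize(dfa, accepting, alphabet):
    """Refine {non-accepting, accepting} until the partition stops changing."""
    partition = [g for g in (frozenset(set(dfa) - accepting), frozenset(accepting)) if g]
    while True:
        new_partition = refine(partition, dfa, alphabet)
        if set(new_partition) == set(partition):
            return partition
        partition = new_partition

groups = minimize(DFA, {"E"}, "ab")
print(sorted(sorted(g) for g in groups))
```

For this DFA the states A and C cannot be distinguished by any input string, so they end up in the same group and can be merged into one state.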
  • Slide 269
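  • EXAMPLE OF DFA MINIMIZATION DFA with states A, B, C, D, E, F and input symbols a, b, c (diagram not reproduced). Initial partition: {A, B, C, D, E} {F}. After the first refinement: {A} {B, C, D, E} {F}. A further pass leaves the partition unchanged: {A} {B, C, D, E} {F}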
  • Slide 270
  • EXAMPLE OF DFA MINIMIZATION Minimized DFA (diagram not reproduced), where: 1: A 2: B, C, D, E 3: F
  • Slide 271
  • SECTION 9 PRACTICAL REGULAR EXPRESSIONS
  • Slide 272
  • TOPICS Practical notations that are often used with regular expressions A few practice exercises
  • Slide 273
  • PRACTICAL REGULAR EXPRESSIONS TRICKS We will see practical regular expression tricks that are supported by most regex libraries. Remember, regular expressions are not only used in the context of compilers: we often use them to extract information from text. Example: imagine searching a log file that has been accumulating entries for the past two months for a particular error pattern. Without regular expressions, this would be a tedious job. Sooner or later, when you work in the industry, you will encounter such issues, and regular expressions will come in handy
  • Slide 274
  • MATCHING DIGITS To match a single digit, as we have seen before, we can use the following regular expression: [0-9] Nonetheless, since matching a digit is a common operation, we can use the following notation: \d The backslash is an escape character used to distinguish \d from the letter d. Similarly, to match a non-digit character, we can use the notation: \D
  • Slide 275
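  • ALPHANUMERIC CHARACTERS To match an alphanumeric character, we can use the notation: [a-zA-Z0-9] Or we can use the following shortcut: \w (in most regex libraries \w also matches the underscore, which makes it equivalent to [a-zA-Z0-9_]) Similarly, we can represent any character that \w does not match as follows: \W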
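A quick way to get a feel for these shorthands is to run them against a sample string. The sketch below uses Python's standard `re` module; the sample text is made up for illustration.

```python
import re

text = "Room 42, floor_3"

digits = re.findall(r"\d", text)          # every single digit
non_digit_runs = re.findall(r"\D+", text) # maximal runs of non-digits
word_runs = re.findall(r"\w+", text)      # maximal runs of word characters
non_word_runs = re.findall(r"\W+", text)  # everything between those runs

print(digits)          # ['4', '2', '3']
print(non_digit_runs)  # ['Room ', ', floor_']
print(word_runs)       # ['Room', '42', 'floor_3']
print(non_word_runs)   # [' ', ', ']
```

The run `floor_3` stays in one piece because, in Python as in most regex libraries, `\w` also accepts the underscore.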
  • Slide 276
  • WILDCARD A wildcard is defined to match any single character (letter, digit, whitespace, ...) It is represented by the . (dot) character. Therefore, in order to match a literal dot, you have to use the escape character: \.
  • Slide 277
  • EXCLUSION We have seen that [abc] is equivalent to (a | b | c) But sometimes we want to match everything except a set of characters. To achieve this, we can use the notation: [^abc] This matches any single character other than a, b or c. This notation can also be used with abbreviated character classes: [^a-z] matches any character other than a lowercase letter
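The wildcard and exclusion notations can be compared side by side. This is a small sketch with Python's `re` module; the sample strings are made up for illustration.

```python
import re

sample = "abc a-c a.c ac"

any_middle = re.findall(r"a.c", sample)    # dot matches any single character
literal_dot = re.findall(r"a\.c", sample)  # escaped dot matches only '.'

not_abc = re.findall(r"[^abc]", "cabbage")             # characters other than a, b, c
not_lowercase = re.findall(r"[^a-z]+", "snake_Case42") # runs without lowercase letters

print(any_middle)     # ['abc', 'a-c', 'a.c']
print(literal_dot)    # ['a.c']
print(not_abc)        # ['g', 'e']
print(not_lowercase)  # ['_C', '42']
```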
  • Slide 278
  • REPETITIONS How can we match a letter or a string that repeats several times in a row: E.g. ababab So far, we have implemented repetitions through three mechanisms: Concatenation: simply concatenate the string or character with itself (does not work if you do not know the exact number of repetitions) Kleene star closure: to match letters or strings repeated 0 or more times Positive closure: to match letters or strings repeated 1 or more times
  • Slide 279
  • REPETITIONS We can also specify a range of how many times a letter or string can be repeated. Example: if we want to match strings in which the letter a is repeated between 1 and 3 times, we can use the notation: a{1,3} Therefore, a{1,3} matches the strings a, aa and aaa. We can also specify an exact number of repetitions instead of a range: (ab){3} matches the string ababab. Note that the parentheses are required: ab{3} repeats only the b and matches abbb
  • Slide 280
  • OPTIONAL CHARACTERS The concept of the optional character is somewhat similar to that of the Kleene star. The star operator matches 0 or more instances of the operand. The optional operator, denoted as ? (question mark), matches 0 or 1 instances of the operand. Example: the pattern ab?c will match either of the strings "abc" or "ac" because the b is considered optional.
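Bounded repetition and the optional operator can be verified with a few whole-string matches. The sketch below uses Python's `re.fullmatch`; the helper name `matches` is an assumption made for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Range of repetitions: between 1 and 3 occurrences of a.
print(matches(r"a{1,3}", "aaa"), matches(r"a{1,3}", "aaaa"))

# Exact repetition: the braces apply to the item immediately before them.
print(matches(r"(ab){3}", "ababab"), matches(r"ab{3}", "abbb"))

# Optional operator: zero or one b.
print(matches(r"ab?c", "abc"), matches(r"ab?c", "ac"), matches(r"ab?c", "abbc"))
```

The second pair of calls shows why grouping matters: without parentheses, the quantifier binds only to the single character before it.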
  • Slide 281
  • WHITE SPACE Often, we want to easily detect white spaces, either to remove them or to detect the beginning or end of words. Most common forms of whitespace used with regular expressions: the space character, the tab \t, the new line \n and the carriage return \r The whitespace special character \s will match any of the specific whitespaces above. Similarly, you can match any non-whitespace character using the notation \S
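As a brief illustration of \s and \S, the sketch below normalizes a line that mixes spaces, a tab and a Windows-style line ending. It uses Python's `re` module; the sample line is made up.

```python
import re

line = "shut \t down\r\n"

words = re.findall(r"\S+", line)                # runs of non-whitespace characters
pieces = re.split(r"\s+", line.strip())         # split on any run of whitespace
normalized = re.sub(r"\s+", " ", line).strip()  # collapse whitespace to one space

print(words)       # ['shut', 'down']
print(pieces)      # ['shut', 'down']
print(normalized)  # shut down
```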
  • Slide 282
  • FEW EXERCISES Given the sentence: Error, computer will not shut down Provide a regular expression that will match all the words in the sentence Answer: \w* (in practice \w+ is preferable, because \w* also matches the empty string)
  • Slide 283
  • FEW EXERCISES Given the sentence: Error, computer will not shut down Provide a regular expression that will match all the non-alphanumeric characters Answer: \W* (in practice \W+ is preferable, because \W* also matches the empty string)
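The two sentence exercises can be tried with Python's `re.findall`. The sketch below uses the + quantifier so that only non-empty matches are returned, and then shows that the * version also reports empty strings.

```python
import re

sentence = "Error, computer will not shut down"

words = re.findall(r"\w+", sentence)       # the words of the sentence
separators = re.findall(r"\W+", sentence)  # the text between the words
with_star = re.findall(r"\w*", sentence)   # same words, plus empty matches

print(words)            # ['Error', 'computer', 'will', 'not', 'shut', 'down']
print(separators)       # [', ', ' ', ' ', ' ', ' ']
print("" in with_star)  # True
```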
  • Slide 284
  • FEW EXERCISES Given the log file: [Sunday Feb. 2 2014] Program starting up [Monday Feb. 3 2014] Entered initialization phase [Tuesday Feb. 4 2014] Error 5: cannot open XML file [Thursday Feb. 6 2014] Warning 5: response time is too slow [Friday Feb. 7 2014] Error 9: major error occurred, system will shut down Match any error or warning message that ends with the term shut down Answer: (Error|Warning).*(shut down)
  • Slide 285
  • FEW EXERCISES Given the log file: [Sunday Feb. 2 2014] Program starting up [Monday Feb. 3 2014] Entered initialization phase [Tuesday Feb. 4 2014] Error 5: cannot open XML file [Thursday Feb. 6 2014] Warning 5: response time is too slow [Friday Feb. 7 2014] Error 9: major error occurred, system will shut down Match any Error or Warning logged between the 1st and the 6th of February Answer: \[\w* Feb\. [1-6] 2014\] (Error|Warning)
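Both log file answers can be checked against the five sample entries. The sketch below uses Python's `re` module. One assumption is added and should be read as such: the first pattern gets a trailing `$` so that "shut down" really has to be at the end of the line, which the slide's answer does not enforce.

```python
import re

LOG_LINES = [
    "[Sunday Feb. 2 2014] Program starting up",
    "[Monday Feb. 3 2014] Entered initialization phase",
    "[Tuesday Feb. 4 2014] Error 5: cannot open XML file",
    "[Thursday Feb. 6 2014] Warning 5: response time is too slow",
    "[Friday Feb. 7 2014] Error 9: major error occurred, system will shut down",
]

# Slide answer plus an end-of-line anchor (the anchor is an addition).
ENDS_WITH_SHUT_DOWN = re.compile(r"(Error|Warning).*(shut down)$")
# Slide answer, unchanged.
EARLY_FEBRUARY = re.compile(r"\[\w* Feb\. [1-6] 2014\] (Error|Warning)")

shutdown_hits = [i for i, line in enumerate(LOG_LINES) if ENDS_WITH_SHUT_DOWN.search(line)]
early_hits = [i for i, line in enumerate(LOG_LINES) if EARLY_FEBRUARY.search(line)]

print(shutdown_hits)  # [4]
print(early_hits)     # [2, 3]
```

The square brackets and the dot in the date have to be escaped because they are regex metacharacters.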
  • Slide 286
  • SECTION 10 INTRODUCTION TO SYNTAX ANALYSIS
  • Slide 287
  • TOPICS Context free grammars Derivations Parse Trees Ambiguity Top-down parsing Left recursion
  • Slide 288
  • THE ROLE OF PARSER
  • Slide 289
  • CONTEXT FREE GRAMMARS A Context Free Grammar (CFG) consists of Terminals Nonterminals Start symbol Productions A language that can be generated by a grammar is said to be a context-free language
  • Slide 290
  • CONTEXT FREE GRAMMARS Terminals: are the basic symbols from which strings are formed. These are the tokens that were produced by the Lexical Analyser. Nonterminals: are syntactic variables that denote sets of strings. One nonterminal is distinguished as the start symbol. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings
  • Slide 291
  • EXAMPLE OF GRAMMAR The grammar with the following productions defines simple arithmetic expressions expr ::= expr op expr expr ::= id expr ::= num op ::= + op ::= - op ::= * op ::= / In this grammar, the terminal symbols are num, id + - * / The nonterminal symbols are expr and op , and expr is the start symbol
  • Slide 292
  • DERIVATIONS expr ⇒ expr op expr is read "expr derives expr op expr". expr ⇒ expr op expr ⇒ id op expr ⇒ id * expr ⇒ id * id is called a derivation of id*id from expr.
  • Slide 293
  • DERIVATIONS If A ::= γ is a production and α and β are arbitrary strings of grammar symbols, we can say: αAβ ⇒ αγβ If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn.
  • Slide 294
  • DERIVATIONS ⇒ means derives in one step. ⇒* means derives in zero or more steps: if α ⇒* β and β ⇒ γ then α ⇒* γ. ⇒+ means derives in one or more steps. If S ⇒* α, where α may contain nonterminals, then we say that α is a sentential form. If α does not contain any nonterminals, we say that α is a sentence
  • Slide 295
  • DERIVATIONS G: grammar S: start symbol L(G): the language generated by G Strings in L(G) may contain only terminal symbols of G. A string of terminals w is said to be in L(G) if and only if S ⇒+ w. The string w is called a sentence of G. A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent
  • Slide 296
  • DERIVATIONS We have already seen the following production rules: expr ::= expr op expr | id | num op ::= + | - | * | / The string id+id is a sentence of the above grammar because expr ⇒ expr + expr ⇒ id + expr ⇒ id + id We write expr ⇒* id+id
  • Slide 297
  • PARSE TREE (parse tree diagram not reproduced) This is called: Leftmost derivation
  • Slide 298
  • TWO PARSE TREES Let us again consider the arithmetic expression grammar. For the line of code: x+z*y (we are not considering the semicolon for now) Grammar: expr ::= expr op expr | id | num op ::= + | - | * | / The Lexical Analyser turns x+z*y into the tokens id+id*id, and the Syntax Analyser turns id+id*id into a parse tree
  • Slide 299
  • TWO PARSE TREES Let us again consider the arithmetic expression grammar. The sentence id + id * id has two distinct leftmost derivations. First: expr ⇒ expr op expr ⇒ id op expr ⇒ id + expr ⇒ id + expr op expr ⇒ id + id op expr ⇒ id + id * expr ⇒ id + id * id Second: expr ⇒ expr op expr ⇒ expr op expr op expr ⇒ id op expr op expr ⇒ id + expr op expr ⇒ id + id op expr ⇒ id + id * expr ⇒ id + id * id Grammar: expr ::= expr op expr | id | num op ::= + | - | * | /
  • Slide 300
  • TWO PARSE TREES (parse tree diagrams not reproduced) The first tree is equivalent to: id+(id*id) The second tree is equivalent to: (id+id)*id Grammar: expr ::= expr op expr | id | num op ::= + | - | * | /
  • Slide 301
  • PRECEDENCE The previous example highlights a problem in the grammar: It does not enforce precedence. It has no implied order of evaluation. We can expand the production rules to add precedence
  • Slide 302
  • APPLYING PRECEDENCE UPDATE The sentence id + id * id has only one leftmost derivation now: expr ⇒ expr + term ⇒ term + term ⇒ factor + term ⇒ id + term ⇒ id + term * factor ⇒ id + factor * factor ⇒ id + id * factor ⇒ id + id * id Grammar: expr ::= expr + term | expr - term | term term ::= term * factor | term / factor | factor factor ::= num | id (parse tree diagram not reproduced)
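The difference between the two grammars can be made concrete by counting parse trees. The sketch below is an illustration, not a parser from the course: the function name `count_parse_trees` and the dictionary encoding of the grammars are assumptions, and the counting scheme only works for grammars without ε-productions and without cyclic unit productions, which holds for both expression grammars.

```python
from functools import lru_cache

AMBIGUOUS = {
    "expr": [("expr", "op", "expr"), ("id",), ("num",)],
    "op": [("+",), ("-",), ("*",), ("/",)],
}
WITH_PRECEDENCE = {
    "expr": [("expr", "+", "term"), ("expr", "-", "term"), ("term",)],
    "term": [("term", "*", "factor"), ("term", "/", "factor"), ("factor",)],
    "factor": [("num",), ("id",)],
}

def count_parse_trees(grammar, start, tokens):
    """Count the distinct parse trees that derive tokens from start."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:            # terminal: must be exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        # The head covers tokens[i:k]; every remaining symbol needs >= 1 token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(head, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))

TOKENS = ["id", "+", "id", "*", "id"]
print(count_parse_trees(AMBIGUOUS, "expr", TOKENS))        # 2
print(count_parse_trees(WITH_PRECEDENCE, "expr", TOKENS))  # 1
```

The first grammar yields two trees, id+(id*id) and (id+id)*id, while the grammar with term and factor yields only the tree in which the multiplication is grouped first.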
  • Slide 303
  • AMBIGUITY A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Example: Consider the following statement: It has two derivations It is a context free ambiguity
  • Slide 304
  • AMBIGUITY A grammar that produces more than one parse tree for some sentence is said to be ambiguous
  • Slide 305
  • ELIMINATING AMBIGUITY Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. E.g. match each else with the closest unmatched then This is most likely the intention of the programmer
  • Slide 306
  • MAPPING THIS TO A JAVA EXAMPLE In Java, the grammar rules are slightly different than in the previous example. Below is a (very simplified) version of these rules: stmnt ::= matched | unmatched matched ::= if ( expr ) matched else matched | other stmnts unmatched ::= if ( expr ) stmnt | if ( expr ) matched else unmatched
  • Slide 307
  • MAPPING THIS TO A JAVA EXAMPLE For the following piece of code if (x==0) if (y==0) z = 0; else z = 1; After running the lexical analyser, we get the following list of tokens: if ( id == num ) if (id == num) id = num ; else id = num ;
  • Slide 308
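  • MAPPING THIS TO A JAVA EXAMPLE Token input string: if ( id == num ) if (id == num) id = num ; else id = num ; Building the parse tree (diagram not reproduced): stmnt derives unmatched, which expands to if ( expr ) stmnt; the inner stmnt derives matched, which expands to if ( expr ) matched else matched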
  • Slide 309
  • MAPPING THIS TO A JAVA (ANOTHER) EXAMPLE Token input string: if ( id == num ) else id = num ; Building the parse tree (diagram not reproduced): stmnt derives matched, which expands to if ( expr ) matched else matched
  • Slide 310
  • TOP DOWN PARSING A top-down parser starts with the root of the parse tree, labelled with the start or goal symbol of the grammar. To build a parse tree, we repeat the following steps until the leaves of the parse tree match the input string: 1. At a node labelled A, select a production A ::= α and construct the appropriate child for each symbol of α 2. When a terminal is added to the parse tree that does not match the input string, backtrack 3. Find the next nonterminal to be expanded
  • Slide 311
  • TOP DOWN PARSING Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. Example: Input string: cad Grammar: S ::= c A d A ::= ab | a We need to backtrack!
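The backtracking in this example can be observed with a small top-down parser. The nonterminal names are lost in the transcript, so the sketch below assumes the grammar S ::= c A d, A ::= a b | a; the names S and A, the function names and the trace format are all hypothetical. Each production is tried in order, and a failed match simply lets the search fall through to the next alternative.

```python
# Assumed grammar: S ::= c A d, A ::= a b | a
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def parse(symbol, text, pos, trace):
    """Yield every input position reachable after matching symbol from pos."""
    if symbol not in GRAMMAR:                # terminal symbol
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for production in GRAMMAR[symbol]:       # try the alternatives in order
        trace.append(f"try {symbol} ::= {' '.join(production)} at {pos}")
        yield from parse_sequence(production, text, pos, trace)

def parse_sequence(symbols, text, pos, trace):
    """Match a sequence of grammar symbols one after the other."""
    if not symbols:
        yield pos
        return
    for next_pos in parse(symbols[0], text, pos, trace):
        yield from parse_sequence(symbols[1:], text, next_pos, trace)

def accepts(text):
    trace = []
    ok = any(end == len(text) for end in parse("S", text, 0, trace))
    return ok, trace

ok, trace = accepts("cad")
print(ok)
for step in trace:
    print(step)
```

For the input cad, the trace shows A ::= a b being tried first; it fails on the missing b, so the parser backtracks and succeeds with A ::= a.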
  • Slide 312
  • EXPRESSION GRAMMAR Recall our grammar for simple expressions: Consider the input string: id num * id
  • Slide 313
  • EXAMPLE Reference Grammar
  • Slide 314
  • EXAMPLE Another possible parse for id num * id If the parser makes the wrong choices, expansion does not terminate This is not a good property for a parser to have Parsers should terminate, eventually
  • Slide 315
  • LEFT RECURSION A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsers cannot handle left recursion in a grammar
  • Slide 316
  • ELIMINATING LEFT RECURSION Consider the grammar fragment: foo ::= foo α | β Where α and β do not start with foo. We can rewrite this as: foo ::= β bar bar ::= α bar | ε Where bar is a new nonterminal. This fragment contains no left recursion
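The rewrite can be sketched as a small function. It assumes the standard transformation for immediate left recursion, where A ::= A α | β becomes A ::= β A' and A' ::= α A' | ε; the function name, the list-based grammar encoding and the new nonterminal name `expr_tail` are assumptions made for illustration.

```python
EPSILON = "ε"

def eliminate_left_recursion(nonterminal, productions, new_name):
    """Rewrite A ::= A α | β as A ::= β A' and A' ::= α A' | ε."""
    # α parts: what follows the leading recursive nonterminal.
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    # β parts: productions that do not start with the nonterminal.
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:                       # nothing to do: no left recursion
        return {nonterminal: productions}
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[EPSILON]],
    }

rules = eliminate_left_recursion(
    "expr",
    [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "expr_tail",
)
for head, bodies in rules.items():
    print(head, "::=", " | ".join(" ".join(body) for body in bodies))
```

Applied to the expr rule of the expression grammar, this prints `expr ::= term expr_tail` and `expr_tail ::= + term expr_tail | - term expr_tail | ε`.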
  • Slide 317
  • EXAMPLE Our expression grammar contains two cases of left-recursion Applying the transformation gives With this grammar, a top-down parser will Terminate (for sure) Backtrack on some inputs
  • Slide 318
  • PREDICTIVE PARSERS We saw that top-down parsers may need to backtrack when they select the wrong production. This is where predictive parsers come in useful: they use lookahead to choose the right production and so avoid backtracking. LL(1): left-to-right scan, leftmost derivation, 1-token lookahead. LR(1): left-to-right scan, rightmost derivation (in reverse), 1-token lookahead
  • Slide 319
  • SECTION 11 LL(1) PARSER
  • Slide 320
  • TOPICS LL(1) Grammar Eliminating Left Recursion Left Factoring FIRST and FOLLOW sets Parsing tables LL(1) parsing Many examples
  • Slide 321
  • REVIEW: ROLE OF PARSER
  • Slide 322
  • P