Code Generation- An Introduction to Typed EBNF

7/26/2019 Code Generation- An Introduction to Typed EBNF

http://slidepdf.com/reader/full/code-generation-an-introduction-to-typed-ebnf 1/73

Utah State University

DigitalCommons@USU

A G#&#' P# B #& *' R'3 G#&#' S&+'3

4-28-2015

Code Generation: An Introduction to TypedEBNF

Jason Gerald Young

F6 *+3 #& #&&++# 63 #: *://&++#%3.3.'&/#&'3

+3 R' + 3 $* '' #& ' #%%'33 $ *' G#&#'

S&+'3 # D++#C3@!S!. I *#3 $'' #%%''& +%3+ + A

G#&#' P# B #& *' R'3 $ # #*+'& #&++3#

D++#C3@!S!. F ' +#+, '#3' %#%

$'%.*3@3.'&.

R'%'&'& C+#+ , J#3 G'#&, "C&' G''#+: A I&%+ '& EBNF" (2015). All Graduate Plan B and other Reports. P#' 497.

http://digitalcommons.usu.edu/?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.usu.edu/gradreports?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages

http://digitalcommons.usu.edu/gradstudies?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages


mailto:[email protected]

http://library.usu.edu/?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages

mailto:[email protected]


http://digitalcommons.usu.edu/gradstudies?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages


http://digitalcommons.usu.edu/?utm_source=digitalcommons.usu.edu%2Fgradreports%2F497&utm_medium=PDF&utm_campaign=PDFCoverPages



CODE GENERATION: AN INTRODUCTION TO TYPED EBNF

by

Jason G. Young

A report submitted in partial fulfillmentof the requirements for the degree

of

MASTER OF SCIENCE

in

Computer Science

Approved:

Dr. Vicki H. Allan Dr. Kenneth SundbergCo-Major Professor Co-Major Professor

Dr. Curtis DyresonCommittee Member

UTAH STATE UNIVERSITYLogan, Utah

2015



ii

Copyright c Jason G. Young 2015

All Rights Reserved



iii

ABSTRACT

Code Generation: An Introduction to Typed EBNF

by

Jason G. Young, Master of Science

Utah State University, 2015

Major Professors: Dr. Vicki H. Allan and Dr. Kenneth SundbergDepartment: Computer Science

Errors and inconsistencies between code components can be very costly in a software

project. Efforts to reduce these costs can include the use of tools that limit human interac-

tion with code by generating it from a description. This paper introduces two new works

to address these issues: (1) an input specification called Typed EBNF (TEBNF), and (2)

a prototype tool that demonstrates how TEBNF can be used to generate code. The tool

generates code for a console application as described by a TEBNF grammar. An application

built from the generated code will be able to receive input data, parse it, process it, and

output it as needed.

(72 pages)



iv

CONTENTS

Page

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

LIST OF LISTINGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

CHAPTER

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Code Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 TEBNF Input Specification and Code Generation Tool . . . . . . . . . . . . 2

2 Background and Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.1 Traditional Scanner and Parser Code Generation . . . . . . . . . . . . . . . 42.2 Model-Based Parser Code Generation . . . . . . . . . . . . . . . . . . . . . 8

3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.1 The TEBNF Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.2 Design Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.3 The TEBNF Code Generation Tool . . . . . . . . . . . . . . . . . . . . . . . 153.4 Lexical Analysis and Parsing Code Generation . . . . . . . . . . . . . . . . 173.5 State Machine and Actions Code Generation . . . . . . . . . . . . . . . . . 18

4 Implementing Tools with TEBNF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.1 NITF File Parsing: A Real World Example . . . . . . . . . . . . . . . . . . 214.2 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.4 Comparing C++ to TEBNF . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.1 Optimizing the TEBNF Code Generation Tool . . . . . . . . . . . . . . . . 40

5.2 New I/O Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.3 Runtime Graphical User Interface Generation . . . . . . . . . . . . . . . . . 41

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49Appendix A TEBNF Syntax and Usage . . . . . . . . . . . . . . . . . . . . . . 50Appendix B TEBNF Turing Completeness Proof . . . . . . . . . . . . . . . . . 61



v

LIST OF TABLES

Table Page

4.1 Sample input and output data for the calculator. . . . . . . . . . . . . . . . 27

4.2 Sizes of sample NITF 2.1 files sent to the test client. . . . . . . . . . . . . . 30

4.3 Calculator Test Case Implementation Times . . . . . . . . . . . . . . . . . . 34

4.4 NITF File Client Test Case Implementation Times . . . . . . . . . . . . . . 35

4.5 Analysis results for the NITF file client test cases. . . . . . . . . . . . . . . 36

4.6 Analysis results for the calculator test cases. . . . . . . . . . . . . . . . . . . 36

4.7 P-values calculated from the analysis data of each test case. . . . . . . . . . 36

A.1 TEBNF Elements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

A.2 TEBNF symbols, production rules, non-typed terminals, and typed terminals. 52

A.3 TEBNF production rule operators. . . . . . . . . . . . . . . . . . . . . . . . 54

A.4 TEBNF arithmetic and comparison operators. . . . . . . . . . . . . . . . . . 54

A.5 TEBNF inter-element operator. . . . . . . . . . . . . . . . . . . . . . . . . . 55

A.6 TEBNF input/output specifiers. . . . . . . . . . . . . . . . . . . . . . . . . 57

B.1 List of 5-tuples describing UTM(4,6). . . . . . . . . . . . . . . . . . . . . . . 62



vi

LIST OF LISTINGS

Listing Page

A.1 A TEBNF Actions element. . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

A.2 A TEBNF Console Input element. . . . . . . . . . . . . . . . . . . . . . . . 57

A.3 TEBNF grammar describing a calculator. . . . . . . . . . . . . . . . . . . . 58

A.4 TEBNF grammar describing a UDP NITF 2.1 file transfer client. . . . . . . 60

B.1 Rogozhin’s UTM(4,6) implemented in TEBNF. . . . . . . . . . . . . . . . . 63



vii

LIST OF FIGURES

Figure Page

2.1 Example Yacc grammar rule matched to input. . . . . . . . . . . . . . . . . 5

2.2 Lex and Yacc Compilation Sequence . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Traditional code generation process. . . . . . . . . . . . . . . . . . . . . . . 7

2.4 Model-based code generation process. . . . . . . . . . . . . . . . . . . . . . 9

3.1 Architecture of the TEBNF prototype code generation tool. . . . . . . . . . 16

3.2 Order of execution for code generated by the TEBNF code generation tool. 19

4.1 Concept of operations for NITF 2.1 format files. . . . . . . . . . . . . . . . 22

4.2 TEBNF Implementation of the NITF 2.1 USE00A TRE . . . . . . . . . . . 23

4.3 State machine representing a calculator. . . . . . . . . . . . . . . . . . . . . 26

4.4 State machine representing a NITF 2.1 UDP/IP file transfer client . . . . . 29

4.5 TEBNF code generation tool output for the calculator test case. . . . . . . 31

4.6 TEBNF code generation tool output for the UDP NITF 2.1 file client test case. 31

4.7 Running the calculator test case. . . . . . . . . . . . . . . . . . . . . . . . . 32

4.8 Running the file server for the NITF 2.1 file client test case. . . . . . . . . . 33

4.9 Running the NITF 2.1 file client test case. . . . . . . . . . . . . . . . . . . . 33

4.10 Function token count for the file client test cases. . . . . . . . . . . . . . . . 37

4.11 Function token count for the calculator test cases. . . . . . . . . . . . . . . 38

4.12 Function CCN for the file client test cases. . . . . . . . . . . . . . . . . . . . 39

4.13 Function CCN for the calculator test cases. . . . . . . . . . . . . . . . . . . 39



1

CHAPTER 1

Introduction

Various defects in software can result from multiple developers working on the same

code within a project [1]. Some errors introduced by the coding activities of developers may

consist of behavioral differences within and between components and their interfaces [2–4].

Other errors stem from inadequate comprehension of project code, supported by the fact

that developers spend 50% to 80% of their time understanding that code [5]. Furthermore,

these kinds of misunderstandings can lead to bug fixes that inject more faults into code [3].

Comprehensive requirements analysis, design, and other good software development

practices can prevent many post-delivery software problems [6], but cannot completely

eliminate the possibility of human error. Even well designed projects can get mired in code

implementation details that hinder or prevent developers from delivering on time and within

budget.

1.1 Code Generation

Code generation tools can improve the software development process by generating well-

organized code reducing implementation errors and inconsistencies [7]. Code generation has

been shown to improve the productivity of developers, guarantee correctness of syntax, and

decrease errors [8]. A compiler is a type of code generation tool that generates binary

executables from code written in higher-level programming languages like C and C++.

Some tools generate higher-level code instead of binary given a set of rules known as a

grammar. This high-level generated code is then fed to a compiler to create a binary

executable.

Domain specific code generation tools specialize in generating code for one part of an

application’s overall implementation. Popular targets for specialized code generation include



2

lexical analyzers and parsers. One of the earliest code generation tools is the compiler-

compiler (parser generator), a term first presented in the early 1960s [9].

1.2 TEBNF Input Specification and Code Generation Tool

The goal of this paper is to present a new input specification and code generation tool

that addresses some of the issues not handled by commonly used code generators. This tool

accepts a new input specification based on Extended Backus-Naur Form (EBNF) called

Typed-EBNF (TEBNF) (see appendix A). The tool validates key features of TEBNF:

• Can describe input patterns that include a mixture of strings, numbers, and/or raw

groupings of bytes.

• Integrates grammar rules with states and actions.

• Can specify different methods of receiving input and sending output.

• Can declare how raw input data should be unmarshaled into well-known types of

specific sizes (in bytes).

• Can declare how well-known types should be marshaled back into their original format.

• Provides typed and non-typed EBNF terminals.

• Supports arithmetic and non-arithmetic operations inside grammar rules.

• Grammars match specific pieces of a given input to well-known types of varied sizes

(in bits or bytes).

• Supports the use of variables.

• Is Turing complete (see appendix B).

The prototype code generation tool will demonstrate that it can generate a console

application that can:

• Parse data from different kinds of inputs



3

• Process data to produce specific output(s)

• Direct output(s) to a network destination (UDP/IP), file, or console-based user-

interface

• Support custom network protocol handling and interaction (UDP/IP)

• Provide a console-based user interface for the application



4

CHAPTER 2

Background and Related Works

The purpose of code generation tools is to help developers improve their productivity,

ensure correctness of code syntax, and lessen the number of errors in software [8]. The

input specifications of these tools introduce additional levels of indirection to solve these

problems [10]. A range of tools have been created to generate code that can be complex

and tedious to write by hand:

• Scanners

• Parsers

• Interpreters

• Compilers

• Graphical user interfaces

The TEBNF input specification shares some similarities to those accepted by other

tools. In order to understand how TEBNF and the TEBNF code generation tool improves

upon these these approaches, a review of pertinent code generation tools and their input

specifications is helpful.

2.1 Traditional Scanner and Parser Code Generation

Yacc (Yet Another Compiler Compiler) [11] is a popular tool first introduced in 1975

that generates LALR parsers given (1) a context-free grammar to describe input, (2) an

action (code snippets) to run for each token grouping that matches a grammar rule, and

(3) code to provide input tokens to the parser. This input specification is a hybrid between

a declarative domain-specific language (i.e. a grammar) and an imperative programming



5

language like C [12–14]. Nevertheless, Yacc lacks the ability to read an input stream and

convert it into tokens for parsing.

Lexical analyzer generators such as Lex and Flex must be used in conjunction withYacc to accomplish this separate task of lexical analyzer generation [11, 15]. The practice

of using separate tools to generate a lexical analyzer and parser means that input formats

for each tool must be learned and used correctly to achieve the desired result. A benefit

of the TEBNF input specification is that it describes how data is received and parsed

(appendix A), allowing the TEBNF code generation tool to generate both lexical analysis

and parsing code.

Figure 2.1: Example Yacc grammar rule matched to input.

Figure 2.1 shows an example Yacc grammar rule matched to input formatted as military

time. In this example, military hour, minute, and second (all defined somewhere else in the

grammar) describe specific parts of input data as military time. When tokens match this

rule, a code action is executed [11,16].

Similar to Yacc is Bison [17, 18], a Yacc-compatible parser generator that accepts any

properly written Yacc grammar. Like Yacc, Bison-generated parsers read a sequence of

tokens from a scanner generated by a lexical analyzer generator like Lex or flex.

To illustrate the steps of traditional parser generation using Lex and Yacc (figure

2.2) [11,15,16], a file is provided by the developer containing a set of patterns that define

how to separate strings found in source data. This file is read by Lex, which uses these

patterns to generate the C source code of a lexical analyzer. This newly generated lexical

analyzer uses the patterns to identify specific strings in the input and split them into tokens

to simplify processing.



6

Figure 2.2: Lex and Yacc Compilation Sequence [16].

A file containing grammar rules is provided by the developer to Yacc, which uses those

rules to generate C source code for a syntax analyzer (i.e. parser). The syntax analyzer

uses this grammar to transform the tokens output by the lexical analyzer into a syntax

tree. The structure of this syntax tree implies the precedence and associativity of operators

found within the tokens. The syntax tree is then traversed in depth-first order to generate

the desired source code (figure 2.2) [16].

A predicated-LL(k) parser called ANTLR [19] was introduced by Parr and Quong in

1995. The ANTLR parser generator was advertised to be easier to use than generators like

Yacc or Bison. An LL(k) parser is a top-down parser that parses from left to right, utilizing

a look-ahead of k tokens. All parsing decisions are made solely from the next k tokens,

which means that it does no backtracking.



7

The Hyacc (Hawaii Yacc) parser generator first released in 2008 supports complete

LR(0), LALR(1), LR(1), and partial LR(k) [20, 21]. Hyacc is compatible with Yacc and

Bison input grammars and works with Lex. The Hyacc parser generator is notable becauseit can resolve reduce/reduce conflicts through its implementation of the LR(1) parser gen-

eration algorithm [20]. Reduce/reduce conflicts occur when two or more rules in an input

grammar apply to the same input sequence [22]. These conflicts are typically the result of

a serious problem with an input grammar [22].

The traditional code generation process using a lexical analyzer generator in conjunc-

tion with a parser generator and actions code is shown in figure 2.3. This process illustrates

the pattern used by many parser generation tools.

Figure 2.3: Traditional code generation process.

All of these parser generators provide an effective means to reduce human interaction

with code. They have the added benefit of generating logical and syntactically correct code



8

as long as the grammar is correct. On the other hand, the input grammars used by these

parser generators cannot be used for lexical analysis of the parser input. Code actions

that generate output must be manually written and inserted into the generated code. Theinconvenience of editing generated code is avoidable in TEBNF because actions are defined

directly within TEBNF grammars (appendix A).

2.2 Model-Based Parser Code Generation

Model-based parser generators provide an alternative to traditional parser generators

using a model-based language specification. This kind of specification is explained by [23]

and [24] as starting with an abstract syntax model (ASM) embodying the main concepts of

a given language. One or more concrete syntax models (CSMs) are created from this ASM.

Each CSM defines specific details about the language being modelled. Elements within the

ASM can then be converted into their concrete representation using a mapping of the ASM

to its CSM(s). This mapping is created by annotating the ASM with pertinent constraint

metadata. With this mapping in place, any changes to the ASM by the user will cause the

language processor to automatically update to reflect those changes. Figure 2.4 illustrates

the model-based parser code generation process.Since grammar specifications are not needed by model-based parser generators, they

can offer several advantages over traditional parser generators [24]:

• An easier language design process. Language design is decoupled from language pro-

cessing because the language grammar is automatically generated.

• Non-tree structures can be modeled. This is different from traditional parser genera-

tors that force users to model a tree structure.

• Some semantic checks like reference resolution can be automated.

• Handles references between language elements, as opposed to the traditional way of

resolving references manually using a symbol table.



9

Figure 2.4: Model-based code generation process, based on a figure from [24].

Model-based parser generators can behave in one of two ways based on the ASM

used [24]: (1) as a traditional parser generator when an ASM representing a tree structure

is used, or (2) as a model-based parser generator when an ASM representing a non-tree

structure is used. The second possibility causes a model-based parser generator to begin

the reference resolution process, resulting in an abstract syntax graph.

ModelCC is a model-based parser generator that accepts an annotated conceptual

model as its input rather than a context-free grammar [23]. ModelCC uses this annotated

model to generate a parser written in Java that automatically instantiates the language

conceptual model [23,25].

The process of generating a parser using ModelCC as explained by [24] begins with

an ASM that is defined by classes representing language elements and the relationships

between them. Metadata annotations are added to these language elements and their

corresponding relationships to produce an ASM to CSM mapping. Reference resolution is

performed to match referenced ob jects to their equivalent ob ject instantiations. A parser

is then generated that automatically instantiates the conceptual model.



10

ModelCC decreases human interaction with code by generating grammar code and

lexical analysis code from an input model specification. Yet it is similar to other parser

generators because it does not implicitly support specific input and output methods such asUDP/IP, etc. The TEBNF code generation tool differs from ModelCC (and the other tools

reviewed in this chapter) because it is able to specify several input and output methods

(appendix A).



11

CHAPTER 3

Design

The TEBNF language makes it possible to convert a software design into a specification

that can be provided to the TEBNF code generation tool. The TEBNF code generation

tool generates C++ code from TEBNF grammars that can be built by a C++ compiler

into a functioning console application.

As mentioned previously, TEBNF abstracts some of the complexities of creating soft-

ware with the aim of reducing defects in code. This chapter covers important design consid-

erations and decisions that were critical to achieving the stated goals of TEBNF. Specifically,

this chapter discusses the TEBNF language, design decisions, the TEBNF code generation

tool, the architecture of the tool, and the code generation process.

3.1 The TEBNF Language

The TEBNF specification is composed of language constructs called elements. A

TEBNF grammar is the combination of these elements and their contents to define an

application. The primary function of each element type is to convey a specific type of func-

tionality. Enforcing boundaries between different types of functionality (elements) results

in greater scalability. There are five kinds of elements in the TEBNF specification:

• Input methods: a method of receiving data as input.

• Output methods: a method of outputting data.

• Grammar sections: parses input data.

• Actions: action code to execute using data from other elements.

• State transition tables: defines the behavior of the program.



12

Subelements are language constructs that exist inside elements. Different kinds of

subelements exist for each element type, and are enumerated in detail in appendix A. The

glue that brings these elements together to describe an application is two-fold: 1) staticvariables that can be defined within grammar elements and used by other elements to

get and store information, and 2) state transition table elements that use all of the other

elements in a grammar to express the order of task execution.

The structure provided by elements and subelements in a TEBNF grammar are meant

to help bridge the gap between design and implementation. From the perspective of a

developer, a typical application might be represented as classes in an object-oriented lan-

guage like C++. Frequently used input/output (I/O) functions in the program might beencapsulated within one or more classes that wrap frequently used I/O functionality. Data

received and sent using these I/O functions would be parsed and processed by different

classes structured according to a design modeled by a state machine. This kind of design

could easily be represented in TEBNF using the appropriate I/O method elements to han-

dle I/O, grammar elements to deal with lexical analysis and parsing of incoming and/or

outgoing data, and a state transition table element to describe the state machine outlined

in the design.

3.2 Design Decisions

Several important design decisions were made in order to improve function and reduce

complexity in TEBNF. This section discusses why these design decisions were made, how

these decisions influenced the design of TEBNF, and how these decisions affected the code

generated from it.

3.2.1 Singleton Paradigm

One of the primary goals of TEBNF is to reduce the difficulty of translating high-level

designs into grammars. With this goal in mind, elements have been designed to behave as

singletons in TEBNF, as well as the code generated from them.



13

Multiple reasons and advantages exist for treating TEBNF elements and the C++

classes generated from them as singletons:

• Lazy instantiation. Singleton C++ classes make it possible to avoid allocating memory

for element class objects until they are actually used in the generated code. This is

different from static initialization that allocates memory to a variable at the time of

declaration.

• Thread synchronization. Singletons can yield the best results in situations where

multiple threads try to access the same resource. Thread-safe access to the singleton

object is achieved by locking a synchronization mechanism (e.g. mutex) inside it’s

getter function as it returns the static singleton object. This applies to code generated

from a TEBNF grammar because it is possible for multiple state table threads to need

access to the same subelement or static variable class members at the same time.

• Instance control. A singleton prevents other objects from instantiating their own

copies of the singleton object, ensuring that all objects access the same instance. This

simplifies the design process in TEBNF because grammar writers know exactly what

they are accessing in their grammar.

• Flexibility. Generated singleton element classes control the instantiation process.

This control over the instantiation process gives them the flexibility to change the

process. Since TEBNF purposely abstracts these details from grammar writers, it is

advantageous for each element class to control the way it is instantiated based on how

it is used/interacts in the generated application code.

3.2.2 Input and Output Elements

Console applications typically get string or numerical input from users by displaying

questions and waiting for the user to type an answer. Attempting to determine the type

of input information in generated code could lead to incorrect interpretations of the input

type at run time. One technique that was considered to avoid this problem is to directly tie



14

grammar subelements or static variables to input subelements. This method proved to be

unwieldy because grammar elements change during the natural process of writing a TEBNF

grammar. Simply changing the name of a grammar subelement would require that nameto be changed in every input subelement tied to it. This problem could be compounded

as other input elements are created that contain subelements tied to the same grammar

subelements or static variables. To resolve this problem, console input subelements tie each

question string to a type (appendix A). The responsibility of knowing the expected input

type falls to the grammar writer rather than a risky prediction.

Usage of console output elements in TEBNF was found to be simpler than console

input. This is due to the fact that generated console output code performs a simple passingof output data to a C++ std::cout statement. Because std::cout easily handles outputting

of numerical and string data, there is no need for special handling in TEBNF.

Support for sending and receiving data over UDP is achieved in TEBNF using UDP

elements. The TEBNF language uses UDP I/O elements to abstract most of the details

involved with setting up and using UDP sockets. Generated UDP I/O element classes

prompt users for an IP address and port number before initiating UDP communication. It

became apparent that a way was needed to indicate that a UDP output element shouldsend data on the same socket instance used by an existing UDP input element to receive

data. The ”AS” keyword expresses this relationship between two I/O elements of the same

type. The ”AS” relationship defines an element that shares all of the same characteristics

as another I/O element of the same type. Because TEBNF elements and their respective

C++ element classes are singletons, all that is needed to represent this case in generated

C++ code is a type definition of the I/O element in question (typedef). This reduces the

number of C++ classes generated.

3.2.3 Actions Elements

Actions elements in TEBNF share two similarities with C++ functions. First, actions

elements can have parameters, allowing arguments to be passed to them when called inside

a state transition table element. Second, actions elements can contain multiple instructions.



15

Actions elements are essentially C++ code blocks that allow direct reference to subelements

and static variables found in grammar elements. This means they have loose parsing re-

quirements compared to other TEBNF elements so that the chosen C++ compiler can catchcomplex errors in actions element code. The syntax for accessing subelements defined inside

other grammar elements can be found in appendix A.

Unlike C++ functions, actions elements cannot return values. This limitation serves

to maintain the clear purpose of their usage in state tables and makes it clear that they are

not intended to be used as the input or condition of a state.

3.3 The TEBNF Code Generation Tool

This report presents a prototype code generation tool that generates the lexical analy-

sis, parsing, and actions code of a basic console application using a single TEBNF grammar

as input. Generated applications accept user input where necessary and can provide mean-

ingful status. The tool outputs a set of classes that:

• Accept input data through console, file, or UDP/IP.

• Provide a set of functions that unmarshal raw data into human-readable types such

as numbers and strings, and can marshal it back into its original form.

• Use these unmarshal functions to match input data to pattern(s) specified in the

grammar and convert them to human-readable values.

• Run one or more state machines with each on its own thread to receive data through

input methods described in the grammar. As input data arrives, the state machine

finds matches to grammar patterns and executes actions that produce the desired

output.

• Provide a console-based user interface that prompts for input as needed and provides

status.

The architecture of the TEBNF code generation tool consists of four stages, shown in

figure 3.1. The TEBNF code generation tool is a console application that accepts three



16

arguments in order: (1) the path of the file containing a TEBNF grammar, (2) the path of

directory where the tool will write generated code, and (3) the name to give to the generated

application.

Figure 3.1: Architecture of the TEBNF prototype code generation tool.

Upon receiving these arguments, the tool reads the grammar file at the location pro-

vided to the tool. The grammar is lexically analyzed by the TEBNF scanner to produce a

list of tokens, with some metadata attached to some tokens as needed.

The TEBNF parser syntactically analyzes these tokens using a recursive descent algo-

rithm to verify correctness of format. A pointer to each element discovered by the parser

is then added to a table for quick look up and inserted into a parse tree. Subelements

discovered during this parsing stage are added to their parent elements and subelements as



17

they are found in the parse table and parse tree.

After all of the elements and subelements have been added to the parse table and parse

tree, the TEBNF resolver traverses the subelements in the parse tree and their descendantsto their furthest extent (leaves). This ensures that all subelements resolve to a terminal

type or literal value.

Once all elements and their subelements have been resolved, the TEBNF code generator

iterates through each element in order of declaration in the input grammar, and generates a

C++ class or other appropriate C++ code. The name of each C++ source file corresponds

to the element it was generated from. A CMakeLists.txt file is generated with these C++

source files so that CMake can be used to generate Microsoft Visual Studio 2013 solution andproject files. For convenience, a clang-format file is generated so that any clang-formatting

(optional) follows the intended format.

3.4 Lexical Analysis and Parsing Code Generation

The TEBNF code generation tool generates a class for each I/O element (method)

defined in a TEBNF grammar (see appendix A). These I/O classes perform no lexical

analysis. TEBNF supports the receiving of input and/or sending of output through theconsole, files, or UDP/IP.

These I/O methods are a powerful feature of TEBNF and the TEBNF code generation

tool because the tool automatically integrates the code to do I/O using these methods. The

complexities of their usage are abstracted by the tool, which is one of the key advantages

of TEBNF and the TEBNF code generation tool. Contrast this to the most common code

generation tools, which do not provide this built-in I/O capability.

Lexical analysis and parsing is performed by the generated code in one step ratherthan separate steps. This is possible because of the way patterns are described in TEBNF

grammar elements (see appendix A). A typical grammar element pattern is composed

of groupings of bytes broken into sub-groupings of bytes. These sub-groupings can be

translated into specific types (e.g. numbers, and strings) and literal values. The size in bits

or bytes is defined in the grammar element based on its type.



18

Matching user-defined TEBNF grammar patterns to incoming data requires that one

or more literal values be defined somewhere in the grammar. The value of a literal value

makes it possible to find it in the input data. The size and type of that initial literal valueis defined implicitly or explicitly in the grammar (e.g. 4-byte integer, etc.). Given the

initial reference offset αf of the literal f and its size zf , the offset of the literal or type

immediately following it is defined as αf +1, where αf +1 = αf + zf . The offset of the literal

or type defined immediately before f can be defined as αf −1, where αf −1 = αf −zf −1 and

zf −1 ≥ αf . The data offsets of subsequent literals and/or types found before and after the

initial reference offset are calculated based on the one after and before it, respectively.

When multiple patterns must be matched to incoming data, a separate grammar ele-ment must be written to describe each pattern. This design makes it possible to refer to any

given pattern using its grammar element name, making it easy to distinguish from other

grammar patterns in the same TEBNF grammar.

The prototype code generation tool translates each grammar element into a class that

can (1) unmarshal raw input data into specific user-defined types, (2) verify matches to

byte patterns in the raw input data, and (3) marshal the values stored in the grammar class

back into their original form and byte ordering. Patterns in data arriving through TEBNFinput methods are located by the state machine using unmarshal functions of grammar

element classes. As data arrives through an input method, patterns are recognized in the

data by the grammar classes and simultaneously unmarshaled into the data types of those

classes. This means lexical analysis and parsing are performed in the same step, using a

single TEBNF input (figure 3.2). This approach is different from traditional parser code

generation methods. Traditional methods require the use of a lexical analyzer generator tool

and parser generator tool; each with their own input specification formats (see figure 2.3).

3.5 State Machine and Actions Code Generation

Program behavior is defined in TEBNF using state machines, as shown in figure 3.2.

State machines are represented in TEBNF using state transition table elements (see ap-

pendix A). A state machine in TEBNF describes the order of tasks executed by a single



19

Figure 3.2: Order of execution for code generated by the TEBNF code generation tool.

application thread. States are divided into a series of six steps. These six steps are given be-

low, in the order they are expressed in TEBNF, which is the same order they are evaluated

by the generated code:

1. A unique name that identifies the state and allows other states to reference it

2. A Boolean condition that can be either a grammar to match with input data or an

explicit Boolean condition.

3. The method of input for receiving the input data (console, file, or UDP/IP).



20

4. The next state to jump to if the Boolean condition provided in the second step eval-

uates to true.

5. An output to send through the output method provided in the sixth step, an explicit

code action to execute, or nothing.

6. The output method to send output specified in the fifth step. If an action or nothing

was provided for the fifth step, then this step is empty as well.

These state steps shape the way TEBNF elements interact with each other and input

data by determining (1) what input methods data is received on, (2) what parts of the

received data match defined grammar patterns, (3) what method is used to output that

data, and (4) what code actions are executed.



21

CHAPTER 4

Implementing Tools with TEBNF

This chapter focuses on the implementation of applications using TEBNF and the

TEBNF code generation tool. The chapter is divided into four sections. The first section

will showcase a real world example using a TEBNF grammar to parse data from a National

Imagery Transmission Format (NITF) version 2.1 file. The second section will walk through

the implementation of two test cases. The third section will cover the testing and verification

of these test cases. The fourth section of the chapter will cover the results of testing.

4.1 NITF File Parsing: A Real World Example

Software developers are oftentimes tasked to write software that parses and processes

data in different formats. A well-known example is word processing software. Word pro-

cessing software must be able to open and read documents structured in its own proprietary

format and others (e.g. pdf, txt, etc.). Creating software that reads specific data formats

requires access to documentation detailing the exact structure of each format.

An example data format read by different applications is the NITF 2.1 file format.

The NITF 2.1 file format is part of a suite of standards established by the United States

(US) Government for formatting digital imagery [26]. Documentation detailing the NITF

2.1 standard is freely available for download from the Geospatial Intelligence Standards

Working Groups website [27]. This file format is compatible with software used by members

of the intelligence community, which includes the US Department of Defense (DoD) [26].

The VANTAGETM software suite produced by the Space Dynamics Laboratory and the

SOCET Services suite produced by BAE Systems are all capable of reading and exploiting

NITF 2.1 formatted files [28,29].

As explained by [26], the stated purpose of the NITF 2.1 file format is to provide the



22

Figure 4.1: Concept of operations for NITF 2.1 format files.

intelligence community an interoperable means of transmission and/or storage of electronic

imagery data. The NITF 2.1 format is intended to be used in the dissemination of imagery

derived intelligence data. A general concept of operations involving NITF data is shown in

figure 4.1. First, imagery data in NITF 2.1 format is requested by a member of the intel-

ligence community. Second, the NITF data is received and combined with other collateral

information to create intelligence report file(s) and/or products containing the requested

information of interest. Third, these report files and/or products are given to the requestor

of that intelligence information.

Exploiting NITF 2.1 files requires a detailed understanding of the format. With this

understanding, it is possible to create software than can parse NITF 2.1 files to find infor-

mation of interest, generate report files, and/or products.

The NITF 2.1 file format is composed of a file header and one or more segments. Each

segment contains a subheader with data fields. Each data field has a specific size and format

(depending on the type of field) and is located at a specific byte offset within the file.

Conditional data and/or data characteristics can be added to NITF 2.1 files. This



23

flexibility to extend the format is done using conditional fields in the file header and sub-

headers indicating the existence of Tagged Record Extensions (TREs) and Data Extension

Segments. Tagged Record Extensions contain data fields, while extension segments cancontain data in new formats.

Figure 4.2: TEBNF Implementation of the NITF 2.1 USE00A TRE (TRE informationtaken from [30]).

As stated previously, various applications such as Vantage and SOCET glean data from

NITF 2.1 formatted files for exploitation (figure 4.1). This can also be done using a TEBNF

grammar as demonstrated in figure 4.2. The figure shows a side-by-side comparison of the

NITF 2.1 USE00A (Exploitation Usability Extension Format) TRE [30] to its equivalent

implementation in TEBNF. Each data field for this TRE is compared to its equivalent



24

TEBNF grammar expression. This comparison highlights several important advantages of

using TEBNF to implement parsers of specifications like NITF:

• Convenience. There is no need to keep track of the offset of each data field because it

is determined based on the size of each type.

• Readability. TEBNF grammar syntax looks very similar to actual specifications as

shown in figure 4.2, making it is easier to understand.

• Self-documenting. Since TEBNF grammar syntax ties the size of each data field to

the size of the subelement type, the grammar helps to document itself.

4.2 Test Cases

In order to verify the functionality of the TEBNF language, two case studies were

implemented. Both case studies showcase the strengths of TEBNF and the TEBNF code

generation tool. Further case studies are possible because TEBNF is Turing complete (see

appendix B).

4.2.1 Basic CalculatorThe first test case was to create a basic calculator console application that supports

addition, subtraction, multiplication, and division of integers. Producing this calculator

required the tool to generate code that:

1. Parses numeric data received over a console input.

2. Makes calculations based on the data received from that console input.

3. Sends the results of those calculations to console output.

4. Allows the user to exit when finished.

The calculator begins with a step executed once at the beginning of the program. This

initial step prompts users to enter a number, which is then saved as an initial result value

because all subsequent math operations are executed against it. From this point onward,



25

the calculator repeats a cycle that 1) asks for a number, 2) asks for a math operator, 3)

applies that math operator against the saved result and the last number entered, 4) saves

the result, and 5) displays that result. This cycle then repeats until the user enters a =operator, which then displays the result and exits the program.

Input for the calculator is achieved with a single TEBNF console input element con-

taining two prompt values. The first is used for prompting the user to enter a number,

accepting a signed 64-bit integer. The second one is used for prompting the user to enter a

single character (math operator). Outputting the result of math operations is accomplished

with a single TEBNF console output element.

Acceptable input values for the calculator are limited to integers or one of five mathoperator characters (+, -, *, /, =). TEBNF grammar elements are defined for each math

operator. Another grammar element was created to represent a signed 64-bit integer, pro-

viding a place to store integers as they are unmarshaled from input. The saved result value

is represented as a static variable within the number grammar element, serving as a place

to store the result of each math operation. An actions element was created for each of the

supported math operators. Each actions element adds, subtracts, multiplies, divides, or

sets the saved result using the last unmarshaled integer.The last TEBNF element included in the grammar describes the calculator as a state

machine utilizing the elements described above to define the execution path of the calculator.

The TEBNF grammar that implements this calculator is provided in listing A.3.

The calculator state machine is represented with five states, as shown in figure 4.3. The

first state of the machine is a special case entered only once at the beginning of execution.

This initial state is a necessary special case that sets the ongoing result value for subsequent

math operations. In this state the user is prompted to enter the initial number via the

console input element. The number grammar element unmarshals this input value as a

signed 64-bit integer and retains a copy for later use. The state table then executes an

actions element. This actions element sets the result static variable equal to the unmarshaled

value while the machine transitions to the next state.



26

Figure 4.3: State machine representing a calculator (includes pseudocode).

After setting the result value while transitioning to the first state of the main cycle,

the machine uses the console input element to prompt the user to enter a number again.

This number is also unmarshaled from input and stored as before. The state machine then

transitions to the next state.

This state uses the console input element to prompt the user to enter one of the

supported math operators. The state machine does this by moving through a series of

else-if states. Each state uses one of the math operator grammar elements to check the

same console input for a match to one of the supported math operations. A match occurs

when a grammar successfully unmarshals the input value. Upon matching an operator

the machine transitions to the print state and executes an actions element applying that

operator to the result and unmarshaled number, then assigns it to the result. As the print

state transitions to the first state of the main cycle, an actions element is executed that

writes the result to the console. The cycle then continues to repeat until the user enters a



27

’=’ to exit. Otherwise, if a ’=’ operator was matched, the result is written to the console

as the state machine transitions to the done state. If the input matched nothing, the state

machine transitions to the done state.Table 4.1: Sample input and output data for the calculator.

Operand Operator Result

5

10 + 15

95 - -80

110 + 30

10 / 3

111 * 333

33 - 300

20 / 155 * 75

0 = 75

Suppose this calculator is run using the sample data in table 4.1. The first line in the

table is the initial value 5. The second operand entered by the user is the number 10. The

’+’ operator is entered by the user, and the result displayed is 15. Entering the next value

of 95 followed by the operator ’-’ displays the number -80. This cycle continues until the

user enters the ’=’ operator.

4.2.2 NITF 2.1 UDP/IP File Transfer Client

The second test case is a file transfer client that receives NITF 2.1 files over UDP/IP

upon requesting them from a file server. The server tells the client when it is done sending

files so the client knows when to exit. This requires the tool to generate code for a file client

that:

1. Asks the server to send it a file by sending it a message over a UDP/IP output.

2. Receives the NITF 2.1 file from the server over a UDP/IP input.

3. Uses the file length data field in the NITF 2.1 file to determine that it has received

the entire file from the server as it was sent over UDP/IP.



28

4. Writes that file to disk using a file output.

5. Quits upon receiving a message from the server that says it is done sending files.

The NITF 2.1 file transfer client starts by prompting the user to enter the IP address

and port that will be used for sending and receiving data over UDP/IP. After entering this

information, the client immediately sends the message ”send” to the server to request that

it send a file. Once the server receives the ”send” message from the client, it reads a NITF

2.1 file from disk and begins sending it to the client over UDP/IP. As the client receives data

from the server, it continually checks to see if it has received the entire NITF 2.1 file from

the server. The client knows a file transfer is complete when it has received the amount of

data specified in the NITF files file length data field. The client then prompts the user to

enter a path including the file name specifying where the file will be written to disk. After

writing the file to disk, the client requests the next file from the server.

Allowed input values for the client are limited to a string for the IP address, an unsigned

integer for the port, and strings for file paths. All other input to the client comes from the

server through a TEBNF UDP/IP input element. A UDP/IP output element is used by the

client to send data to the server. One file output element is used by the client for writing

files to disk received from the server.

A grammar element was created to find NITF 2.1 files in data received over UDP/IP.

The beginning of each NITF 2.1 file can be found by searching for the byte sequence

”NITF02.10”, which is always found at the beginning of each file. The NITF 2.1 file header

has a data field containing the length of the file. This is leveraged by the TEBNF grammar

element to calculate the end offset of each file. A second grammar element was created for

finding the ”done” message in incoming data. A third grammar was created to define the

send message. The TEBNF grammar that implements this file transfer client is provided

in listing A.4.

The file transfer client can be represented as a state machine with three states, as

shown in figure 4.4. While transitioning from the first state to the second, the send message

is sent via the UDP/IP output element to the file server.



29

Figure 4.4: State machine representing a NITF 2.1 UDP/IP file transfer client (includespseudocode).

The second state looks at data received from the file server over the UDP/IP input

element to determine if it contains a NITF 2.1 file or the ”done” message. If the incoming

data contains a NITF file, the state machine writes it to the file output element while

transitioning to the back to the first state which tells the server to send the next file, which

prompts the user for a path to write the file to. The state machine then transitions back

to the first state which tells the server to send the next file. If the incoming data does not

contain a file, an ”else-if” case in this same state checks for the done message. If the done

message was received, the machine writes a message to the console output telling the user

that all files have been received, after which the machine transitions to the quit state and

exits.



30

In a hypothetical case, assume there are five NITF 2.1 files to be sent to a test case file

client. Each file has a specific file size, as shown in table 4.2.

Table 4.2: Sizes of sample NITF 2.1 files sent to the test client.

NITF 2.1 File Size (bytes)

828710

4021118

912294

3516054

998822

The NITF 2.1 file server would be started and begin listening on its UDP/IP socket for

the ”send” request from a client. The test case file client is then started and immediately

sends a ”send” request to the server, and a file transfer begins. The UDP server reports

the size in bytes of each file sent. The file client writes each file to local disk. The size in

bytes of each file received corresponds to the size the files sent by the server.

4.3 Results

C++ code was successfully generated by the TEBNF code generation tool for each of

the test cases. The TEBNF code generation tool provides useful output when it generates

code from a supplied input grammar. The tool displays information indicating when each

stage of the code generation process finishes. Status for the generation stage displays the

classes generated for each element. After the generation stage finishes and all of the code

has been generated, the tool reports success and the number of element files generated.

Console output from generating code for the calculator test case and the NITF 2.1 file

client test case is shown in figures 4.5 and 4.6.

A CMakeLists.txt file was correctly generated by the TEBNF code generation tool for

each test case, and CMake version 3.0.2 was run using those CMakeLists files to generate

Microsoft Visual Studio 2013 solution and project files. The project files generated by

CMake for both test cases were successfully opened and built in Microsoft Visual Studio

2013.



31

Figure 4.5: TEBNF code generation tool output for the calculator test case.

Figure 4.6: TEBNF code generation tool output for the UDP NITF 2.1 file client test case.



32

4.3.1 Calculator

The calculator test case executable was run using the test data from table 4.1. As

expected, the input and output of the calculator (figure 4.7) matched what is defined intable 4.1. This means the behavior of the calculator test case matches the behavior defined

in the TEBNF grammar it was generated from.

Figure 4.7: Running the calculator test case.

4.3.2 NITF 2.1 File Client

The NITF 2.1 file client test case executable was run using the five sample NITF 2.1

files whose sizes are found in table 4.2. A NITF 2.1 file server was created that listened

for client ”send” requests over a UDP/IP socket on localhost port 10042. This server was



33

started, then the test case client was started and configured to send requests and receive

file transfers on localhost port 10042. The expected outcome occurred, with the server

successfully sending five files as indicated by the first five send messages output by theserver in figure 4.8. This was followed by the last four byte ”done” message is sent to

notify the client it was done sending. The same five files were received by the client which

prompted for a file name to save each file as shown in figure 4.9. After saving the files, the

client exited because the server had sent the ”done” message.

Figure 4.8: Running the file server for the NITF 2.1 file client test case.

Figure 4.9: Running the NITF 2.1 file client test case.

The sizes of the files received were an exact match to the file sizes listed in table 4.2,

as the output of the server and client test case executables show in figures 4.8 and 4.9,

respectively. This verifies that the behavior of the NITF 2.1 client test case matches the



34

behavior defined in the TEBNF grammar it was generated from.

4.4 Comparing C++ to TEBNF

A non-generated C++ implementation of each test case was also written. The state

machines described in chapter 3 for both test cases served as the starting point of each im-

plementation. Using the same design as the generated test cases, a baseline was established

for more direct comparisons.

4.4.1 Test Case Experience

The non-generated C++ implementation of both test cases was verified by following

the procedures outlined in subsections 4.3.1 and 4.3.2. Each was verified to have the same

behavior and output as the test cases generated from TEBNF.

The TEBNF grammar for each test case was significantly shorter than its manual C++

counterpart. The non-generated C++ implementations had fewer lines of code than their

counterparts generated from TEBNF. These results were expected because the TEBNF

code generation tool does not yet perform optimization.

Writing code to handle I/O and input verification took longer in C++ than TEBNF,

which only requires a declaration to setup any of the supported I/O methods. Though this

inconvenience was observed while writing the calculator test case in C++, it was far more

noticeable while writing the client. The client uses three different I/O methods whereas

the calculator only uses one. It is evident that a more complicated application can be more

convenient to write in TEBNF than C++. The time it took to implement both test cases

in C++ and TEBNF are listed in tables 4.3 and 4.4.

Table 4.3: Calculator Test Case Implementation Times (measured in hours).

Item to Implement C++ TEBNF

Console I/O 1 0.1

State Machine 1 0.5

Total 2 0.6



35

Table 4.4: NITF File Client Test Case Implementation Times (measured in hours).

Item to Implement C++ TEBNF

Console I/O 1 0.1

File I/O 1 0.1UDP/IP I/O 4 0.5

State Machine 4 1

Total 10 1.7

Another disparity in coding effort was observed while writing the NITF file parser in

C++. Creating NITF parsing code in C++ took a lot more effort than in TEBNF. The

non-generated implementation required calculation of an offset whenever it was necessary

to jump to a specific location in the NITF file. Calculation of this offset requires knowledge

beforehand of the size and position of each field. On top of this, fields in the NITF standard

can vary in length and be of different formats, as explained in section 4.1. Compare this

effort to the TEBNF implementation found in listing A.4 which made it possible to write a

parser resembling the actual specification document itself.

4.4.2 Test Case Analysis

A widely accepted metric for measuring code complexity is McCabe’s Cyclomatic Com-

plexity [31]. This metric was derived from graph theory by McCabe to measure the number

of control flow paths in a code module [31]. McCabe’s Cyclomatic Complexity is e − n + 2,

where e is the number edges in the control flow graph and n is the number of nodes [31].

An example of its use proposed by [32] employs McCabe’s Cyclomatic Complexity metric

to effectively predict defects in code modules.

The Lizard code complexity analyzer [33] tool was used to analyze each test case. The

analysis results from the tool contained several valuable metrics, including the average cy-

clomatic complexity number (CCN) [33] of every function in the test cases. The analysis

results of each test case are provided in tables 4.5 and 4.6. Values related to the interquar-

tile range (IQR) were calculated from the analysis data of each test case, including the



36

lower (first) quartile (Q1) corresponding to the 25th percentile, the higher quartile (Q3)

corresponding to the 75th percentile, and the median. In order to focus on the most rele-

vant analysis data, the upper limit (U CCN ) and lower limit (LCCN ) of each data set wascalculated as Q3 + 1.5(IQR) and Q1 − 1.5(IQR), respectively.

Table 4.5: Analysis results for the NITF file client test cases.

Test Case Avg. CCN UCCN Avg. Token Count Func. Count

Generated 2.00 3 54.49 136

Non-generated 3.38 6 101.81 16

Table 4.6: Analysis results for the calculator test cases.

Test Case Avg. CCN UCCN Avg. Token Count Func. Count

Generated 1.89 3 49.50 135

Non-generated 3.42 7 81.42 12

The data in tables 4.5 and 4.6 show differences between the generated and non-

generated test cases. Compared to the non-generated C++ implementations, the test cases

generated from TEBNF have (1) more functions, (2) a lower average number of tokens per

function, and (3) a lower average CCN per function.

Table 4.7: P-values calculated from the analysis data of each test case.

Test Case PCCN PTokenCount

File Client 0.041236 0.007916

Calculator 0.017212 0.043956

Tables 4.5 and 4.6 illustrate the similarity in CCN values measured by the analysis tool

for generated and non-generated code. According to [34], CCN values ≤ 10 fit inside the low

risk category. Given that (1) the non-generated test cases were built specifically to mimic

the behavior of the generated test cases, (2) the average CCN values for all of the test cases

fall into the low risk category, and (3) are close to one another in value, it can be concluded

that the non-generated test cases are comparable to their generated counterparts.



37

Analysis data for the generated and non-generated test cases also show that they are

provably different despite their similarities. Table 4.7 contains the p-values calculated from

the data sets of each test case. All of the p-values are significant at p < 0.05, supportingthe conclusion that the data sets (the test cases) are different from one another.

Figure 4.10: Function token count for the file client test cases (omits outliers).

Tables 4.5 and 4.6 show that the generated code has a lower average token count per

function than the non-generated code (figures 4.10 and 4.11). The first conclusion to be

drawn from the data as plotted in these figures is that the generated code is splitting larger

computational problems into smaller tasks. This conclusion can be drawn based on the fact

that lower median values within smaller data spreads point to a greater number of functions

with lower token counts.

The lower average CCN for generated code (tables 4.5 and 4.6) is the strongest indicator

that larger computational problems are being broken down into smaller ones. Figures 4.12

and 4.13 show that the median CCN for the generated test cases are at or below the lowest

values measured for the non-generated test cases. Given that [32] was able to use CCN

values to detect the presence of defects in code, and that the generated code has shorter in-

terquartile ranges paired with medians at or below the lowest quartiles of the non-generated



38

Figure 4.11: Function token count for the calculator test cases (omits outliers).

code, it can be concluded that a greater number of functions in the code generated from

TEBNF have a lower number of defects than the non-generated code. This correlation of

fewer defects per function in generated code means it is likely that as applications increase in

size and scope, the number of defects will remain lower than application code not generated

from TEBNF.

More than one participant is needed in order to conduct tests that are statistically

significant. Nevertheless, the trends observed in the analysis are interesting and worthy of

future investigation.



39

Figure 4.12: Function CCN for the file client test cases (omits outliers).

Figure 4.13: Function CCN for the calculator test cases (omits outliers).



40

CHAPTER 5

Future Work

Code generation using TEBNF creates several possibilities for future work. As men-

tioned in subsection 4.4, the TEBNF code generation tool can likely be optimized to produce

better code. Another possibility is to add more I/O methods to TEBNF and the TEBNF

code generation tool. There are many different ways to receive input and send output be-

yond what is currently supported by TEBNF and the tool. The last improvement covered

in this chapter is to add support into the TEBNF code generation tool to automatically

generate graphical user interfaces at runtime.

5.1 Optimizing the TEBNF Code Generation Tool

Many different compiler optimizations could be applied to the code generated by the

TEBNF code generation tool. A likely side-effect of optimization is a reduction in the

number of lines of code produced by the tool.

Some of the optimizations described by [35] could be applied to code generated by

the tool. The C++ compiler that builds the generated code will likely apply many if not

all of the optimizations to the actions code. The side-effects of poorly defined grammars

is another factor to take into consideration. For non-actions code, other optimizations

described by [35] could be applied including instruction combining, expression simplification,

branch elimination, and dead code elimination.

Instruction combining is the process of combining two or more instructions into one.

Expression simplification is the process of replacing expressions with more efficient ones.

Both of these optmizations could be applied to marshaling/unmarshaling code generated

from grammar elements that are poorly defined by the user.

Branch elimination is an optimization that removes unnecessary branches to other



41

branches. A form of branch elimination could be applied to generated state machine code

when states transition to other states unnecessarily.

In some cases, the code generation tool might generate empty actions function(s).When this happens, dead code elimination could be employed to avoid generating them or

calling them.

5.2 New I/O Methods

A wide range of I/O methods could be added to TEBNF. This could include adding

support for interacting with relational databases. Support for commonly used database

query languages like MySQL and PostgreSQL would allow TEBNF to interact with a wide

range of systems that utilize databases.

The TCP/IP protocol is widely used and could be adapted into a set of I/O methods.

Due to the fact that TCP/IP is connection-oriented, I/O methods would be needed that

support both server and client I/O.

5.3 Runtime Graphical User Interface Generation

Support for automatic GUI generation could be integrated into the TEBNF code gener-

ation tool. Console user interfaces for generated applications would be replaced with a web-

based GUI that communicates with the generated application. The code generation process

would need to employ a runtime data mining technique called software mining [36,37], which

is a form of data mining that focuses on the inspection of static and runtime software in-

formation characteristics. Some examples of static characteristics include source code files

and database schemas, while runtime characteristics include polymorphic data-types, data

values, and the reading and modification of an instantiated objects current state.

Software mining [36, 37] would probably take place during all phases of the TEBNF

code generation process. Static characteristics could be gleaned from tokens as they are

read from the input grammar and used to identify runtime characteristics in the parse

tree. These characteristics would then be used to map element and subelement objects

to their appropriate GUI controls during the syntactic analysis phase. Once this mapping



42

is complete, the control(s) of the GUI are determined and the appropriate code can be

generated.



43

CHAPTER 6

Conclusion

Errors and inconsistencies between code components can be very costly in a software

project. It has been shown that the use of code generation tools can simplify and possibly

reduce these errors and inconsistencies. Many tools generate only one of the many compo-

nents found in a typical application. This report explored the benefits of using TEBNF as

the sole input specification for the TEBNF code generation tool.

The TEBNF input specification integrates lexical analysis and parsing input specifi-

cations into one format. TEBNF also simplifies the implementation of applications that

receive, parse, process, and output data through various I/O methods. Many complexities

involved with designing and coding applications can be abstracted using TEBNF.

The prototype TEBNF code generation tool demonstrated the effectiveness of TEBNF

as an input specification. The tool generates C++ classes that handles I/O over console, file,

or UDP/IP, thereby abstracting the complexities of their usage through TEBNF. Parsing

documented formats like the NITF 2.1 format is greatly simplified using TEBNF.



44

REFERENCES

[1] B. Livshits and T. Zimmermann, “Dynamine: Finding common error patterns by

mining software revision histories,” in Proceedings of the 10th European Software

Engineering Conference Held Jointly with 13th ACM SIGSOFT International

Symposium on Foundations of Software Engineering , ser. ESEC/FSE-13, pp.

296–305. New York, NY, USA: ACM, 2005 [Online]. Available: http:

//doi.acm.org/10.1145/1081706.1081754.

[2] N. G. Leveson, “Software engineering: Stretching the limits of complexity,”

Commun. ACM , vol. 40, no. 2, pp. 129–131, Feb. 1997 [Online]. Available:

http://doi.acm.org/10.1145/253671.253754 .

[3] C. Smidts, “A stochastic model of human errors in software development: impact of re-

pair times,” in Software Reliability Engineering, 1999. Proceedings. 10th International

Symposium on , pp. 94–103. IEEE, 1999.

[4] T. Nakajo and H. Kume, “A case history analysis of software error cause-effect rela-

tionships,” Software Engineering, IEEE Transactions on , vol. 17, no. 8, pp. 830–838,

1991.

[5] V. Sinha, Using diagrammatic explorations to understand code . Ph.D. dissertation,

Massachusetts Institute of Technology, 2008.

[6] B. Boehm and V. R. Basili, “Software defect reduction top 10 list,” Foundations of

empirical software engineering: the legacy of Victor R. Basili , p. 426, 2005.

[7] K. P. Boysen and G. T. Leavens, “Automatically generating consistent graphical user

interfaces using a parser generator,” 2005.



45

[8] I. Groher and S. Schulze, “Generating aspect code from uml models,” in The 4th AOSD

Modeling With UML Workshop, 2003.

[9] R. Brooker, I. MacCallum, D. Morris, and J. Rohl, “The compiler compiler,” Annual

review in automatic programming , vol. 3, pp. 229–275, 1963.

[10] D. Spinellis, “Another level of indirection,” Beautiful code: leading programmers explain

how they think , pp. 279–291, 2007.

[11] S. C. Johnson, Yacc: Yet another compiler-compiler . Bell Laboratories Murray Hill,

NJ, 1975, vol. 32.

[12] C. Demetrescu, I. Finocchi, and A. Ribichini, “Reactive imperative programming with

dataflow constraints,” ACM SIGPLAN Notices , vol. 46, no. 10, pp. 407–426, 2011.

[13] J. W. Lloyd, “Practical advantages of declarative programming,” in Joint Conference

on Declarative Programming, GULP-PRODE , vol. 94, p. 94, 1994.

[14] D. K. Gifford and J. M. Lucassen, “Integrating functional and imperative

programming,” in Proceedings of the 1986 ACM Conference on LISP and Functional

Programming , ser. LFP ’86, pp. 28–38. New York, NY, USA: ACM, 1986 [Online].

Available: http://doi.acm.org/10.1145/319838.319848.

[15] M. E. Lesk and E. Schmidt, “Lex: A lexical analyzer generator,” 1975.

[16] T. Niemann, “Lex & yacc tutorial” [Online]. Available: http://epaperpress.com/

lexandyacc.

[17] J. Aycock, “Why bison is becoming extinct,” Crossroads , vol. 7, no. 5, pp. 3–3, 2001.

[18] A. Demaille, J. E. Denny, and P. Eggert, “Bison - gnu parser generator,” 2012

[Online]. Available: http://www.gnu.org/software/bison.

[19] T. J. Parr and R. W. Quong, “Antlr: A predicated-ll (k) parser generator,” Software:

Practice and Experience , vol. 25, no. 7, pp. 789–810, 1995.



46

[20] X. Chen and D. Pager, “Full lr (1) parser generator hyacc and study on the performance

of lr (1) algorithms,” in Proceedings of The Fourth International C* Conference on

Computer Science and Software Engineering , pp. 83–92. ACM, 2011.

[21] X. Chen, “Tutorial: Lr(1) parser generator hyacc,” 2014 [Online]. Available:

http://www2.hawaii.edu/~chenx/hyacc_tut.

[22] Free Software Foundation, Inc., “5.6 reduce/reduce conflicts,” 2014 [On-

line]. Available: http://www.gnu.org/software/bison/manual/html_node/Reduce_

002fReduce.html.

[23] L. Quesada, F. Berzal, and J.-C. Cubero, “A language specification tool for model-based parsing,” in Intelligent Data Engineering and Automated Learning-IDEAL 2011.

Springer, 2011, pp. 50–57.

[24] L. Quesada, F. Berzal, and J.-C. Cubero, “A model-driven parser generator, from

abstract syntax trees to abstract syntax graphs,” arXiv preprint arXiv:1202.6593 , 2012.

[25] The ModelCC Development Team. (2015) Modelcc [Online]. Available: http:

//www.modelcc.org.

[26] The NITFS Technical Board (NTB), “Department of defense interface standard

national imagery transmission format version 2.1 for the national imagery transmission

format standard,” October 2006 [Online]. Available: http://www.gwg.nga.mil/ntb/

baseline/docs/2500c.

[27] Geopspatial Intelligence Statndards Working Group, “Geopspatial intelligence

statndards working group,” 2015 [Online]. Available: http://www.gwg.nga.mil/ntb/

baseline/docs/2500c.

[28] Space Dynamics Laboratory. (2015) C4isr deployed programs [Online]. Available:

www.sdl.usu.edu/capabilities/c4isr.

[29] BAE Systems. (2010, March) Socet services [Online]. Available: http://www.

geospatialexploitationproducts.com/content/products/socet-services .



47

[30] The NITFS Technical Board (NTB), “Commercial support data extensions (csde) ver-

sion 1.0,” August 2011.

[31] J. Cardoso, J. Mendling, G. Neumann, and H. A. Reijers, “A discourse on complexity of

process models,” in Business process management workshops , pp. 117–128. Springer,

2006.

[32] H. Zhang, X. Zhang, and M. Gu, “Predicting defective software components from code

complexity measures,” in Dependable Computing, 2007. PRDC 2007. 13th Pacific Rim

International Symposium on , pp. 93–96. IEEE, 2007.

[33] J. Xie. (2015) Lizard code complexity analyzer [Online]. Available: http:

//www.lizard.ws.

[34] R. Charney. (2005, January) Programming tools: Code complexity metrics [Online].

Available: http://www.linuxjournal.com/article/8035.

[35] Nullstone Corporation. (2012) Compiler optimizations [Online]. Available: http:

//www.compileroptimizations.com.

[36] K. Richard and S. Robert, “Application of software mining to automatic user interface

generation,” 2008.

[37] R. Kennard and J. Leaney, “Towards a general purpose architecture for ui generation,”

Journal of Systems and Software , vol. 83, no. 10, pp. 1896–1906, 2010.

[38] D. Lee and M. Yannakakis, “Principles and methods of testing finite state machines-a

survey,” Proceedings of the IEEE , vol. 84, no. 8, pp. 1090–1123, 1996.

[39] V. Kahlon, F. Ivani, and A. Gupta, “Reasoning about threads communicating via

locks,” in Computer Aided Verification , ser. Lecture Notes in Computer Science,

K. Etessami and S. Rajamani, Eds. Springer Berlin Heidelberg, 2005, vol. 3576, pp.

505–518 [Online]. Available: http://dx.doi.org/10.1007/11513988_49.



48

[40] S. Kepser, “A simple proof for the turing-completeness of xslt and xquery.” in Extreme

Markup Languages R. Citeseer, 2004.

[41] E. F. Moore, “A simplified universal turing machine,” in Proceedings of the 1952 ACM

National Meeting (Toronto), ser. ACM ’52, pp. 50–54. New York, NY, USA: ACM,

1952 [Online]. Available: http://doi.acm.org/10.1145/800259.808993.

[42] Y. Rogozhin, “Small universal turing machines,” Theoretical Computer Science , vol.

168, no. 2, pp. 215–240, 1996.

[43] C. E. Shannon, “A universal turing machine with two internal states,” Automata stud-

ies , vol. 34, pp. 157–165, 1957.

[44] T. Neary, Small universal Turing machines . Ph.D. dissertation, Citeseer, 2008.

[45] K. P. Kumar, B. Kiranagi, and C. Bagewadi, “Turing machine operation-a checks and

balances model,” 2012.



49

APPENDICES



50

Appendix A

TEBNF Syntax and Usage

A.1 TEBNF Grammar Syntax

Like standard EBNF, Typed EBNF (TEBNF) is used to express a context-free grammar

that consists of non-terminal production rules and terminal symbols that may or may not

have a type. The typing of terminal symbols makes it possible to describe exactly how

the input is recognized, yet preserves the simple syntax of EBNF. TEBNF provides a set

of input and output methods, grammar rules, and actions, tied together to a set of states

using a state transition table. It allows users to:

• Describe how the input data will be sent to the generated application (UDP, file, or

in-memory).

• Describe when and how responses will be sent to the sender as needed by the specified

protocol.

• Describe what the input data sent to the generated application will look like. This

combines the traditional specification of lexical and syntactic analysis.

• Describe how the data will be processed by the generated application after the lexical

and syntactic analysis stage.

• Describe the expected output of the application after it has been processed.

• Leverage existing knowledge of EBNF grammars rather than require the learning of

a completely new format.

A.2 TEBNF Structure



51

The structure of TEBNF is composed of a set of elements. There are five kinds of

elements in TEBNF: input methods, output methods, grammar sections, actions, and state

transition tables. Collectively, these elements and their contents are known as a TEBNFgrammar. Each TEBNF grammar requires at least one or more of each kind of element.

A.3 TEBNF Elements and Subelements

TEBNF elements (table A.1) contain one or more subelements. Subelements within

TEBNF consist of production rules, typed terminals, non-typed terminals, literal values,

operators, states, and static variables.

Table A.1: TEBNF Elements.

Element Description

INPUT @name specifier Input element of a specific input specifier.

OUTPUT @name specifier Output of a specific output specifier.

GRAMMAR @name Start of grammar section.

ACTIONS @name Contains one or more actions.

STATES @name Start of state transition table section.

END End of element section.

Subelements are declared where they are first used. TEBNF infers what a subelement

is by the way it is used. Each element requires a name be given to it. Element names

are case-sensitive, can only contain visible characters, and are always prefixed by the ’@’

character as shown in example A.1:

GRAMMAR @ packet (A.1)

A.4 TEBNF Scoping RulesElements and subelements are directly accessible at the scope they are created, sim-

ilar to the C++ language. Elements can be declared within the scope of other elements.

Subelements exist within the scope of their respective elements. Each type of element has

specific types of subelements that can be declared only within the scope of that type of

element.



52

A.5 TEBNF Types

Association of types (table A.2) with terminal symbols offers an extra level of precision

when matching specific patterns found in input data. These symbols are called typedterminals. TEBNF can infer the type of a terminal symbol because each type has two

important components.

Table A.2: TEBNF symbols, production rules, non-typed terminals, and typed terminals.

Type Description Example

symbol Production rule alphabet(non-terminal)

symbol Non-typed terminal ’a’, ’b’, ’c’, 0xAB,(1 or more literals) ”String literal”

#comment Single-line comment #This is a comment.

## Multi-line comment ## This is a really,comment really, really long## multi-line comment. ##

$var Static variable $myValue = 42 ;

type Typed terminal CHAR{0,} ;

BYTE Represents a My Kb = BYTE{1024} ;8-bit byte

INT X Represents an integer My Int = INT 64;of size X bits

INT STR X Represents an integer fileLength = UNSIGNED

as a string of size INT STR 96;X bits

CHAR Represents a 8-bit MyChar = CHAR ;character MyString = CHAR{128} ;

FLOAT X Floating-point number MyFloat = FLOAT 64 ;of size X bits

UNSIGNED Unsigned number UNSIGNED INT 16{2} ;terminal

First, each type has an inherent size in bits or bytes. Whenever data is matched to aspecific type the size is immediately known. Because the size of each type is known before-

hand, the offset of the next symbol is immediately known. Second, each type inherently

identifies how it should be used in a given context. TEBNF types are expressed by assigning

a type to a terminal production symbol, as shown in example A.2:



53

payload = IN T 64 (A.2)

TEBNF also has static variables, which are a kind of subelement that can assume the

type of whatever is assigned to them. They are static because they have global visibility −i.e.

they can be declared anywhere and accessed from any other element, regardless of where

they were declared. Static variables are always prefixed by the ’$’ character (example A.3):

$ payloads = payload (A.3)

A static variable can become typed when a typed terminal is assigned to it. Productionrules, terminals, and literals can also be assigned to static variables. This capability makes

static variables the most flexible subelement available in TEBNF.

A.6 TEBNF Operators

There are three categories of operators in TEBNF. First are production rule operators

(table A.3), which are used to build production rules. Second are arithmetic and comparison

operators (table A.4). Arithmetic and comparison operators are a key difference between

TEBNF and standard EBNF, which does not have them. Third is inter-element operators

that perform operations on and/or between elements (table A.5). The only operator that

falls in this category is the ”AS” operator.

A.7 TEBNF Grammar Elements

Grammar elements contain production rules. Production rules are subelements of their

containing grammar element. Terminal and non-terminal symbols have no prefix character,

are made up only of alpha-numeric characters, and are case-sensitive. Examples of grammar

elements are found in listings A.3 and A.4.

Production rule subelements can only be declared within TEBNF grammar elements.

Production rule subelements can be referred to directly using the containing elements name

along with the dot operator followed by the subelement (symbol) name.



54

Table A.3: TEBNF production rule operators.

Operator Description Example

= Definitioni.e. alphaNumericCharacter

is defined as = letter | number ;= Grammar total = fileLength ;

length

, Concatenation twoLetters =letter , letter ;

; Termination twoVals =val1, val2 ;

| Or letter | number

| State Transition Start | packet |table delimiter

. Element member @packet.payloadSize

access{min,max} Occurrence range letter{0,}

len Exact range, letter{26}where len isthe min andmax.

[ ] Array subscript myChar[5]

Table A.4: TEBNF arithmetic and comparison operators.


== Equality $X == $Y

= Assign $x = 45 ;

+ Add $x = $y + $z

- Subtract $x = $g - $h

+= Addition assignment

++ Post-increment

-= Subtraction assignment

– Post-decrement

* Multiply $x = $y * 92

/ Divide X = 1 + (y 2)

@ packet.payloadSize (A.4)



55

Table A.5: TEBNF inter-element operator.


AS Defines a new OUTPUT @UDP Out AS

element to be @UDP In ENDthe same as anexisting element

This format is similar to the way object-oriented languages like C++ and Java provide

access to object members, though it is important to note that TEBNF itself is not object-

oriented. The dot operator permits other grammar elements and state transition table

elements to have access to a given production rule subelement.

A.8 TEBNF State Transition Table Elements

The function of any given application can be described as a finite state machine com-

posed of a finite set of states [38]. The machine is in one state at a time, and movement

from one state to another is triggered by specific inputs [38]. Because applications can be

asynchronousi.e. performing work on multiple threads, it is possible to have a finite state

machine on each thread of execution. Concurrent behavior is becoming common in todays

software [39].

Representation of a finite state machine in TEBNF is done using a state transition

table that can access any TEBNF input, output, grammar symbol, or static variable. This

state transition table ties the various elements and subelements of a TEBNF grammar into

a coherent description of a single thread of execution. This means that TEBNF makes it

possible to represent multiple threads of execution using multiple state transition tables.

Each state transition table consists of six columns delimited by the pipe operator. Columns

flow from left to right in the following order:

1. State

2. Input or condition



56

3. Input Method

4. Next State

5. Output or action

6. Output Method

The first column is the current state. Each state identifies where to go in the finite

state machine when the right conditions are met. The name of each state must be unique

within the scope of the state transition table it belongs to. The second column specifies

a condition that must be met to move to the next state. The condition can be but is not

limited to a logical statement such as checking the value of a static variable or checking if the

input received via a specific input method (specified in the third column) matches a given

grammar production. The fourth column is the state to transition to upon satisfaction of

the input or condition. The fifth column identifies the output to send or action to execute

while transitioning to this next state. If the fifth column is an output instead of an action,

the sixth column is the output method that the output will be sent through.

A.9 TEBNF Actions Elements

Actions elements contain a list of actions to be executed in the order they are listed.

Actions elements can only be used within a state transition table element row. Since they

are not required in a TEBNF grammar, actions can be listed directly inside a state transition

table. Actions elements are similar to macros in C++, and can accept zero or more comma-

delimited parameters, as shown in the example taken from appendix listing B.1. Actions

elements can also contain calls to C or C++ functions, making them one of the most

powerful elements available in TEBNF.

Listing A.1: A TEBNF Actions element.

ACTIONS @r ig ht ( va l )

@tape . elem [ @tape . $i ] = va l ;

@tape . $i ++;



57

END

A.10 TEBNF Input and Output Method Elements

Table A.6 shows all of the possible inputs and outputs available in TEBNF. Each input

method describes a way for the generated application to receive input. No other settings

information is required by an input method because the generated application will provide

a way for the user of to give it the needed information through the user interface. Console

elements allow prompt s to be defined within their scope, which tie a prompt string to a

typed value to be input by a user. Listing A.2 shows a console input element with two

prompt values. The first prompt value will display the prompt ”Number: ” and treat the

input provided through the console as a signed 64-bit integer. The second prompt value

will display the prompt ”Op: ” and treat input as a signed 8-bit integer.

Listing A.2: A TEBNF Console Input element.

INPUT @ConsIn = CONSOLE

num = INT 64 = ”Number : ”;

op = IN T 8 = ”Op : ” ;

END

Table A.6: TEBNF input/output specifiers.

Input/Output Types Description

UDP IP Read/write input from/to from UDP.

FILE Read/write input from/to a file.

CONSOLE Read/write input from/to command line.

Table A.6 shows all of the possible outputs available in TEBNF. Output methodsdescribe a way for the generation application to send or display output.

Networking inputs and outputs in TEBNF are based on UDP/IP. There are many

other networking protocols, but many of them are transport and session layer protocols.

Thus, describing other protocols can be done by using a state transition table to describe

the protocol that will work over UDP/IP.



58

Custom input/output (I/O) is accomplished by using the state transition table to

specify which grammar is used when receiving data as input or for sending data as output.

In the case of a custom input, a GUI is generated that will ask for input to match thedescribed grammar. In the case of output, data will be sent to the desired output following

the format described in the provided grammar.

A.11 TEBNF Example 1: Calculator

A TEBNF example is shown in listing A.3 describing a calculator that supports addi-

tion, subtraction, multiplication, and division of integers.

Listing A.3: TEBNF grammar describing a calculator.

INPUT @ConsIn = CONSOLE

num = INT 64 = ”Number : ”;

op = IN T 8 = ”Op : ” ;

END

OUTPUT @ConsOut = CONSOLE END

GRAMMAR @Number

num = INT 64 ;

$ r e s u l t = I NT 6 4 ;

END

GRAMMAR @addOp op = ’ + ’ ; END

GRAMMAR @subOp op = ’ − ’ ; END

GRAMMAR @mulOp op = ’ ∗ ’ ; END

GRAMMAR @divOp op = ’ / ’ ; END

GRAMMAR @eqOp op = ’ = ’ ; END

ACTIONS @Assi gn @Number . $r e s u l t = @Number . num; END;

ACTIONS @AddAssign @Number . $ r e s u l t += @Number. num ; END



60

and ready to receive more file packets.

Listing A.4: TEBNF grammar describing a UDP NITF 2.1 file transfer client.

INPUT @UDP In = UDP IP END # UDP/IP in pu t so ck et .

OUTPUT @UDP Out AS @UDP In END

OUTPUT @F il e Ou t = FILE END

OUTPUT @C on so le Ou t = CONSOLE END

GRAMMAR @Nitf

# D e s c ri b e t he f i l e t o r e c e i v e .

syncWord = ’N’ , ’ I ’ , ’ T’ , ’ F’ , ’ 0 ’ , ’ 2 ’ , ’ . ’ , ’ 1 ’ , ’ 0 ’ ;

sk ip 1 = BYTE{ 3 3 3 } ; # FL o f f s e t i s 3 42 .

f i l e L e n g t h = UNSIGNED I NT ST R 96 ; # NITF f i l e l e n g t h f i e l d = 1 2 b y t e s .

sk ip 2 = BYTE{ , } ;

h e a d e r = s ync Wo rd , s k i p 1 , f i l e L e n g t h ;

f i l e = h e a de r , s k i p 2 ;

= f i l e Le n g th ; # O v e ra ll s i z e o f t h i s f i l e .

END

GRAMMAR @Send s t ar t Se n d = ” se nd ” ; END

GRAMMAR @Status recvMsg = ” All f i l e s re ce iv ed f rom ser ver , ex it in g . ” ; END

GRAMMAR @Done do ne = ” do ne ” ; END

STATES @F il e Tr an sf er

#−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

# S t a te | I n pu t o r | Input | Next | Output or | Output |

# | C o n d i t i o n | Method | S t a t e | Action | Method |

#−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

I n i t | | | S t a r t | @Send . sta rtS end | @UDP Out ;

S t a r t | @Nitf | @UDP In | I n i t | @Ni tf . f i l e | @File Out ;

| @Done | @UDP In | Quit | @Status . recvMsg | @Console Out ;

Quit | | | | | ;

END



61

Appendix B

TEBNF Turing Completeness Proof

B.1 Turing Completeness

Turing machines provide the most powerful computational model known to exist [40].

The Turing completeness of a programming language is important because anything com-

putable can be computed using that language [40].

A Turing machine that can perform any operation of any other ordinary Turing machine

is known as a universal Turing machine [41]. Therefore, a programming language that can

simulate a universal Turing machine is Turing complete.

B.2 Proof

Multiple examples of universal Turing machines have been presented [42–44]. A uni-

versal Turing machine that simulates a 2-tag system can be implemented with relativelyfew states and symbols. Tag systems simulate the game of tag, where the goal is to see if

it will ever terminate by reaching the end of the sequence of symbols.

Rogozhin proved the universality of several classes of tag systems including a 4-state

6-symbol universal Turing machine called UTM(4,6) [42]. The tag system simulated by

UTM(4,6) consists of 22 commands and is the lowest known number of commands for a

universal Turing machine [42]. The machine is comprised of:

• A set of states: q 1, q 2, q 3, q 4

• Input symbols: 0 (blank), b, x, y, c (mark)

• Tape symbols

• An initial state: q 1



62

• A transition function (executed in order):

1. Print (write) a symbol.

2. Move head left L, right R, none −.

3. Go to the next state.

UTM(4,6) [42] is described as a list of 5-tuples (table B.1). These 5-tuples are formatted

in order of evaluation using the Turing/Davis convention (q i, S j , S k, left L, right R, none

−, q m) [45]:

1. Current state q i.

2. Scanned symbol S j.

3. Print symbol S k.

4. Move tape head left L, right R, none −.

5. Next state q m.

Since Rogozhin proved the universality of UTM(4,6) [42], an implementation of this

machine in TEBNF is provided in listing B.1 to demonstrate the capabilities and Turing

Completeness of TEBNF.

Table B.1: List of 5-tuples describing UTM(4,6).

q 11xLq 1 q 210Rq 3 q 311Rq 3 q 410Rq 4

q 1byRq 1 q 2byLq 3 q 3bxRq 4 q 4bcLq 2

q 1ybLq 1 q 2yxRq 2 q 3ybRq 3 q 4yxRq 4

q 1xbRq 1 q 2xyLq 2 q 3x− q 4x−q 10xLq 1 q 201Lq 2 q 30cRq 1 q 40cLq 2

q 1c0Rq 4 q 2cbRq 2 q 3c1Rq 1 q 4cbRq 4

B.3 Stepping Through the Machine

The tag system simulated by UTM(4,6) [42] has three stages. The 5-tuples each stage

refers to are shown in table B.1. The corresponding TEBNF implementation is given in



63

listing B.1. The TEBNF implementation starts on the first line of the transition table at

the begin state. The begin state reads the contents of a file into the array @tape.elements

that functions as the tape.Stage 1. The first stage is complete when the head of the machine moves right and

meets the mark. The mark is deleted and the first stage ends at q 1c0Rq 4. The end of this

stage corresponds to the seventh line in the TEBNF state table.

Stage 2. The machine executes a series of jumps to arrive at q 40cLq 2. If the head

reaches pair xb, the machine jumps to q 2byLq 3 and halts at q 3x−. Otherwise, the second

stage ends upon reaching pair 1b.

Stage 3. The machine jumps to q 3ybRq 3, then q 311Rq 3. Upon moving to the right,the machine head reaches c (the mark), deletes it, and jumps to q 3c1Rq 1 to begin a new

cycle.

Upon reaching one of the halt states, the TEBNF implementation of the machine writes

the contents of the tape to a text file.

Listing B.1: Rogozhin’s UTM(4,6) implemented in TEBNF.

# TEBNF impl ement atio n of UTM(4 ,6 ) (a 4− s t a t e 6−symbol

# u n i v e r s a l T ur in g ma ch in e ) p r e s en t e d by Y. R og oz hi n i n

# ” S ma ll u n i v e r s a l T ur in g m ac hi ne s ” , 1 9 9 6.

INPUT @TpIn = FILE END; # Fo r r e a d i n g t ap e f ro m f i l e .

OUTPUT @TpOut AS @Tape In END; # For wr it in g to tap e f i l e .

GRAMMAR @tape

el em = BYTE{ , } ; # A rr ay w i th no min o r max number o f e l e m e n ts .

$ i = 0 ; # Index f o r moving l e f t o r r i g h t on tape .

END

ACTIONS @r ig ht ( va l )


@tape . $i ++;



64

END

ACTIONS @l ef t ( va l )


@tape . $i −− ;

END

# Read t h e t a p e and r un t h e u n i v e r s a l T ur i ng m ac hi ne . Movi ng r i g h t

# and l e f t on t he t ap e ( r e p r e s e n t e d by t he @tape . e le m a r ra y ) i s

# r e p r e s e n t e d by i n c re m en t i n g and d e cr e me nt i ng $ in de x , r e s p e c t i v e l y .

## Symbol s : 0 ( b l an k ) , 1 , b , x , y , c

# S t a t e s : q1 , q2 , q3 , q4

#

STATES @UTM 4 6

#−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

# State | I n pu t o r C o n d it i o n | Input | Next | Output or | Output |

# | | Method | S t a t e | Action | Method |

#−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−b e g i n | @tape | @TpIn | q1 | | ;

q1 | @tape . elem [ @tape . $i ]== ’ 1 ’ | | q1 | @ l e f t ( ’ 1 ’ ) | ;

| @tape . elem [ @tape . $ i]== ’b ’ | | q1 | @right ( ’0 ’ ) | ;

| @tape . elem [ @tape . $ i]== ’y ’ | | q1 | @ l e f t ( ’ b ’ ) | ;

| @tape . elem [ @tape . $ i]== ’x ’ | | q1 | @right ( ’0 ’ ) | ;

| @tape . elem [ @tape . $i ]== ’ 0 ’ | | q1 | @ l e f t ( ’ x ’ ) | ;

| @tape . elem [ @tape . $i ]== ’ c ’ | | q4 | @ l e f t ( ’ 0 ’ ) | ;

q2 | @tape . elem [ @tape . $i ]== ’ 1 ’ | | q2 | @right ( ’0 ’ ) | ;

| @tape . elem [ @tape . $ i]== ’b ’ | | q3 | @ l e f t ( ’ y ’ ) | ;

| @tape . elem [ @tape . $ i]== ’y ’ | | q2 | @ l e f t ( ’ x ’ ) | ;

| @tape . elem [ @tape . $ i]== ’x ’ | | q2 | @ l e f t ( ’ y ’ ) | ;

| @tape . elem [ @tape . $i ]== ’ 0 ’ | | q2 | @ l e f t ( ’ 1 ’ ) | ;

| @tape . elem [ @tape . $i ]== ’ c ’ | | q2 | @ri ght ( ’b ’ ) | ;



65


| @tape . elem [ @tape . $ i]== ’b ’ | | q4 | @ri ght ( ’x ’ ) | ;

| @tape . elem [ @tape . $ i]== ’y ’ | | q3 | @ri ght ( ’b ’ ) | ;

| @tape . elem [ @tape . $ i]== ’x ’ | | | @tape | @TpOut;

| @tape . elem [ @tape . $i ]== ’ 0 ’ | | q1 | @ri ght ( ’ c ’ ) | ;

| @tape . elem [ @tape . $i ]== ’ c ’ | | q1 | @right ( ’1 ’ ) | ;


| @tape . elem [ @tape . $ i]== ’b ’ | | q2 | @ l e f t ( ’ c ’ ) | ;

| @tape . elem [ @tape . $ i]== ’y ’ | | q4 | @ri ght ( ’x ’ ) | ;

| @tape . elem [ @tape . $ i]== ’x ’ | | | @tape | @TpOut;

| @tape . elem [ @tape . $i ]== ’ 0 ’ | | q2 | @ l e f t ( ’ c ’ ) | ;| @tape . elem [ @tape . $i ]== ’ c ’ | | q4 | @ri ght ( ’b ’ ) | ;

END

Code Generation- An Introduction to Typed EBNF

Documents

Transcript of Code Generation- An Introduction to Typed EBNF