What it’s ? “parsing” Parsing or syntactic analysis is the process of analysing a string of...

Student: Alexandru Iliescu

A unification-based syntactic parser

PATR

What is “parsing”?

Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars, meaning part (of speech).

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a pedagogical exercise, especially in inflected languages such as the Romance languages or Latin, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Parsing a computer language involves two levels of grammar: lexical and syntactic.

The first stage is token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions.

For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression.
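A minimal sketch of such a lexer in Python, assuming a hypothetical token grammar (`TOKEN_SPEC`) with one regular expression per token class. It splits the calculator input into the tokens listed above and labels each with its class.

```python
import re

# Hypothetical token grammar: one regular expression per token class.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[-+*/^]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split `text` into (class, lexeme) pairs, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For instance, `tokenize("12*(3+4)^2")` starts with `("NUMBER", "12")`, `("OP", "*")`, `("LPAREN", "(")`.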

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression.
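The syntactic stage can be illustrated with a small recursive-descent recognizer in Python. The expression grammar in the docstring (with `^` binding tightest and grouping to the right) is an assumption chosen for the calculator example, and `ExpressionChecker` and `is_allowable` are hypothetical names; the code only answers whether the tokens form an allowable expression.

```python
import re


class ExpressionChecker:
    """Recursive-descent recognizer for the (assumed) grammar
         expr   -> term (("+" | "-") term)*
         term   -> factor (("*" | "/") factor)*
         factor -> base ("^" factor)?
         base   -> NUMBER | "(" expr ")"
    """

    def __init__(self, text):
        # Crude lexer: a digit run is one token, any other non-space char is one token.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.advance()
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.advance()
            self.factor()

    def factor(self):
        self.base()
        if self.peek() == "^":
            self.advance()
            self.factor()  # recursion makes ^ right-associative

    def base(self):
        tok = self.advance()
        if tok.isdigit():
            return
        if tok == "(":
            self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return
        raise SyntaxError(f"unexpected token {tok!r}")


def is_allowable(text):
    """True if the whole input is one well-formed expression."""
    checker = ExpressionChecker(text)
    try:
        checker.expr()
    except SyntaxError:
        return False
    return checker.pos == len(checker.tokens)  # no leftover tokens
```

With this sketch, "12*(3+4)^2" is accepted, while "12*(3+" and "12 3" are rejected.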

D-PATR and PC-PATR

D-PATR

D-PATR is a development environment for unification-based grammars on Xerox 1100 series workstations.

The first version of D-PATR was written at the Scandinavian Summer Workshop for Computational Linguistics in Helsinki, Finland, in 1985.

D-PATR

The formalism used by D-PATR is suitable for encoding a wide variety of grammars.

D-PATR

D-PATR consists of four basic parts:

a unification package;
an interpreter for rules and lexical items;
input/output routines for directed graphs;
an Earley-style chart parser.
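D-PATR's parser is described only as an Earley-style chart parser. As a rough illustration of that technique, here is a minimal Earley recognizer in Python over plain category symbols. It omits feature structures entirely, assumes a grammar with no empty rules, and the toy `GRAMMAR`, `LEXICON` and the function name `earley_recognize` are hypothetical.

```python
# Toy grammar and lexicon (hypothetical, for illustration only).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
}
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
    "barked": {"V"},
}


def earley_recognize(grammar, lexicon, words, start="S"):
    """Return True if `words` is derivable from `start`.

    Chart items are (lhs, rhs, dot, origin). Assumes no empty (epsilon) rules.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("<top>", (start,), 0, 0))  # dummy item predicting the start symbol
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                # Predictor: expand the expected category with each of its rules.
                for prod in grammar.get(nxt, []):
                    new = (nxt, prod, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
                # Scanner: the next word belongs to the expected preterminal.
                if i < n and nxt in lexicon.get(words[i], set()):
                    chart[i + 1].add((nxt, (words[i],), 1, i))
            else:
                # Completer: advance every item in chart[origin] waiting for lhs.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return ("<top>", (start,), 1, 0) in chart[n]
```

The chart is filled strictly left to right, one word position at a time; in a unification-based parser such as D-PATR the items would additionally carry feature structures that are unified when an item is advanced.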

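D-PATR

Parsing and Unification

[Figure: x and y are unified into z; z is copied to z’; x and y are then restored.]

The method entails making only one copy, not two, when the operation succeeds: rather than copying both input structures before unifying them, D-PATR unifies them directly, copies the result once, and then restores the originals. In the event of failure, D-PATR simply restores the original structures without copying anything.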
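The unify, copy, restore cycle can be sketched in Python as follows. This is a minimal illustration, not D-PATR's own implementation: the `Node` class with a `forward` pointer and the change log (`trail`) are assumptions about how "restore" might be realised, and all names are hypothetical.

```python
class Node:
    """A feature-structure node: an atom, or a set of labelled arcs.
    A node with neither is unconstrained (a variable)."""

    def __init__(self, atom=None, arcs=None):
        self.atom = atom
        self.arcs = dict(arcs or {})
        self.forward = None  # set while this node is merged into another


def fs(value):
    """Build nodes from nested dicts (complex values) and strings (atoms)."""
    if isinstance(value, str):
        return Node(atom=value)
    return Node(arcs={label: fs(v) for label, v in value.items()})


def deref(node):
    while node.forward is not None:
        node = node.forward
    return node


def _bind(node, target, trail):
    node.forward = target
    trail.append(("forward", node))


def _unify(a, b, trail):
    """Destructively unify a and b, logging every change on `trail`."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if b.atom is None and not b.arcs:
        _bind(b, a, trail)
        return True
    if a.atom is None and not a.arcs:
        _bind(a, b, trail)
        return True
    if a.atom is not None or b.atom is not None:
        if a.atom == b.atom:
            _bind(b, a, trail)
            return True
        return False  # atom clash, or atom vs. complex value
    _bind(b, a, trail)
    for label, child in b.arcs.items():
        if label in a.arcs:
            if not _unify(a.arcs[label], child, trail):
                return False
        else:
            a.arcs[label] = child
            trail.append(("arc", a, label))
    return True


def _copy(node, memo):
    """Copy the dereferenced structure, preserving shared substructures."""
    node = deref(node)
    if id(node) in memo:
        return memo[id(node)]
    new = Node(atom=node.atom)
    memo[id(node)] = new
    for label, child in node.arcs.items():
        new.arcs[label] = _copy(child, memo)
    return new


def _restore(trail):
    """Undo every logged change, newest first."""
    for entry in reversed(trail):
        if entry[0] == "forward":
            entry[1].forward = None
        else:
            _, node, label = entry
            del node.arcs[label]


def unify(x, y):
    """Return a fresh copy z' of the unification of x and y, or None.
    x and y are left unchanged in both cases."""
    trail = []
    ok = _unify(x, y, trail)
    result = _copy(x, {}) if ok else None  # the only copy, made on success
    _restore(trail)  # restore x and y
    return result


def to_py(node):
    node = deref(node)
    if node.atom is not None:
        return node.atom
    return {label: to_py(child) for label, child in node.arcs.items()}
```

On failure the trail is undone and nothing is copied, which mirrors the behaviour described above.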

D-PATR

Rules

A rule in D-PATR is a list of atomic constituent labels that may be followed by specifications.

D-PATR

Rules

Example of a rule: S -> NP VP is written in D-PATR notation as

(S NP VP)

D-PATR

Rules

Before a rule is used by the parser, D-PATR compiles it to a feature set. A feature set can be displayed in different ways, for example as a matrix or as a directed graph.
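To make "compiles it to a feature set" concrete, here is a small Python sketch. The encoding is an assumption borrowed from the usual PATR convention (arc 0 for the mother, arcs 1..n for the daughters, each label stored under a `cat` feature); the text does not describe D-PATR's internal format, and `compile_rule`, `to_matrix` and `to_arcs` are hypothetical names. It then shows the same feature set in two displays: a bracketed matrix and a list of graph arcs.

```python
def compile_rule(rule_text):
    """'(S NP VP)' -> {0: {'cat': 'S'}, 1: {'cat': 'NP'}, 2: {'cat': 'VP'}}.

    Assumed encoding: arc 0 is the mother, arcs 1..n are the daughters,
    and each constituent label becomes the value of a 'cat' feature.
    """
    labels = rule_text.strip().strip("()").split()
    return {i: {"cat": label} for i, label in enumerate(labels)}


def to_matrix(fs):
    """Display a feature set as a bracketed attribute-value matrix."""
    if not isinstance(fs, dict):
        return str(fs)
    return "[" + ", ".join(f"{f}: {to_matrix(v)}" for f, v in fs.items()) + "]"


def to_arcs(fs, path=()):
    """Display the same feature set as directed-graph arcs: (path, value)."""
    for feature, value in fs.items():
        if isinstance(value, dict):
            yield from to_arcs(value, path + (feature,))
        else:
            yield path + (feature,), value
```

With this sketch, `to_matrix(compile_rule("(S NP VP)"))` yields `[0: [cat: S], 1: [cat: NP], 2: [cat: VP]]`.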

D-PATR

Lexical Rules

A lexical rule is a special kind of template with two attributes: in and out.

D-PATR

Lexical Rules

In applying a lexical rule to a graph, the graph is first unified with the value of in. If the unification succeeds, the value of out is passed on as the result.
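The in/out mechanism can be sketched in Python with nested dicts, where structure sharing between in and out is simply the same dict object appearing under both. This is a simplified illustration, not D-PATR's code: the passive-like rule `PASSIVE_LIKE`, its feature names and the functions `unify_into` and `apply_lexical_rule` are hypothetical, and shared "variables" can only be bound to complex values here.

```python
import copy


def unify_into(target, source):
    """Destructively merge `source` into `target` (nested dicts, string atoms).

    Simplification: an unconstrained value must be an (empty) dict, so shared
    'variables' can only be bound to complex values, not to atoms.
    """
    for feature, value in source.items():
        if feature not in target:
            target[feature] = copy.deepcopy(value)
        elif isinstance(target[feature], dict) and isinstance(value, dict):
            if not unify_into(target[feature], value):
                return False
        elif target[feature] != value:
            return False  # atom clash, or atom vs. complex value
    return True


def apply_lexical_rule(rule, graph):
    """Unify `graph` with rule['in']; on success return rule['out']."""
    fresh = copy.deepcopy(rule)  # deepcopy keeps the in/out sharing intact
    if not unify_into(fresh["in"], graph):
        return None
    return fresh["out"]


# Hypothetical passive-like rule: the same dict objects appear under both
# 'in' and 'out', so whatever 'in' learns about them shows up in 'out'.
_agent, _patient = {}, {}
PASSIVE_LIKE = {
    "in": {"cat": "V", "subj": _agent, "obj": _patient},
    "out": {"cat": "V", "subj": _patient, "obl": _agent},
}
```

Copying the rule before each application keeps the template itself reusable, since unification modifies the copy in place.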

D-PATR

D-PATR is not a commercial product. It is made available to users outside SRI who might wish to develop unification-based grammars.

PC-PATR

PC-PATR is an implementation of the PATR-II computational linguistic formalism for personal computers. It is available for MS-DOS, Microsoft Windows, Macintosh and Unix, and is still under development.

PC-PATR

PC-PATR has the following parts:

a chart parser;
a unification package;
an interpreter for grammar and lexical rules.

PC-PATR

PC-PATR uses a left-corner chart parser with these characteristics:

bottom-up parsing with top-down filtering based on the categories;
left-to-right order: after each word is added to the chart, all edges that can be derived up to that point are computed.
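Top-down filtering in a left-corner parser is commonly driven by a precomputed left-corner relation: a bottom-up constituent is kept only if its category can begin the category currently being sought. The Python sketch below computes that relation for a toy grammar. It is an assumption about how such a filter can work, not PC-PATR's own code, and `GRAMMAR`, `left_corner_closure` and `top_down_filter` are hypothetical names.

```python
# Toy grammar (hypothetical); NP -> NP PP shows left recursion is handled.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP")],
    "PP": [("P", "NP")],
}


def left_corner_closure(grammar):
    """lc[A] = every category that can begin a constituent of category A
    (reflexive-transitive closure of the left-corner relation)."""
    cats = set(grammar) | {rhs[0] for prods in grammar.values() for rhs in prods if rhs}
    lc = {c: {c} for c in cats}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                if rhs and not lc[rhs[0]] <= lc[lhs]:
                    lc[lhs] |= lc[rhs[0]]
                    changed = True
    return lc


def top_down_filter(lc, goal, category):
    """Accept a bottom-up constituent only if it can start the current goal."""
    return category in lc.get(goal, {goal})
```

For this grammar, a V found at the start of the input is filtered out when the goal is S, because no S can begin with a V.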

PC-PATR

Unification

Unification is the basic operation applied to feature structures in PC-PATR. It merges the information from two feature structures. Two feature structures can unify if their common features have compatible values, but they do not unify if any feature values conflict.
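The definition can be illustrated with a small, non-destructive unification function in Python over nested dicts (complex values) and strings (atomic values). This is a simplified sketch rather than PC-PATR's unification package: it does not model shared (reentrant) substructures, and the name `unify` is hypothetical. An empty dict stands for an unconstrained value.

```python
def unify(a, b):
    """Merge two feature structures; return the result, or None on a conflict."""
    if isinstance(a, dict) and not a:  # unconstrained: take the other value
        return b
    if isinstance(b, dict) and not b:
        return a
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None  # atoms must match exactly
    result = dict(a)
    for feature, b_value in b.items():
        if feature in result:
            merged = unify(result[feature], b_value)
            if merged is None:
                return None  # a shared feature has conflicting values
            result[feature] = merged
        else:
            result[feature] = b_value  # information only b has
    return result
```

For example, `{"agr": {"num": "sg"}}` and `{"agr": {"per": "3"}}` unify to `{"agr": {"num": "sg", "per": "3"}}`, while `{"agr": {"num": "sg"}}` and `{"agr": {"num": "pl"}}` do not unify.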

PC-PATR

Grammar rules

A PC-PATR grammar rule has these parts, in the order listed:
1. the keyword Rule;
2. an optional rule identifier enclosed in braces ({});
3. the nonterminal symbol to be expanded;
4. an arrow (->) or equal sign (=);
5. zero or more terminal or nonterminal symbols;
6. an optional colon (:);
7. zero or more feature constraints;
8. an optional period (.).

PC-PATR

Grammar rules

The optional rule identifier consists of one or more words enclosed in braces.

PC-PATR

Grammar rules

For example, the following rule says that any category in the grammar rules can be replaced by two copies of the same category separated by a CJ (conjunction). Its constraints also require the mother and both conjuncts to share the same cat and arg1 values.

Rule X -> X_1 CJ X_2
    <X cat> = <X_1 cat>
    <X cat> = <X_2 cat>
    <X arg1> = <X_1 arg1>
    <X arg1> = <X_2 arg1>
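The eight-part rule layout described for PC-PATR grammar rules can be read off with a small Python parser. This is a simplified sketch, not PC-PATR's own reader: it handles only plain symbol sequences on the right-hand side and constraints of the form `<path> = <path>` or `<path> = value`, and `TOKEN`, `parse_rule` and the returned dict layout are hypothetical.

```python
import re

# Token classes: {identifier}, <path>, arrow, punctuation, bare symbols/values.
TOKEN = re.compile(r"\{[^}]*\}|<[^>]*>|->|[=:.]|[^\s{}<>=:.]+")


def _term(tok):
    """'<X_1 cat>' -> ('X_1', 'cat'); anything else is an atomic value."""
    if tok.startswith("<"):
        return tuple(tok[1:-1].split())
    return tok


def parse_rule(text):
    """Split a PC-PATR-style rule into its parts (simplified sketch)."""
    toks = TOKEN.findall(text)
    if not toks or toks[0] != "Rule":  # 1. keyword Rule
        raise ValueError("rule must start with 'Rule'")
    i = 1
    rule_id = None
    if i < len(toks) and toks[i].startswith("{"):  # 2. optional {identifier}
        rule_id = toks[i][1:-1].strip()
        i += 1
    if i + 1 >= len(toks) or toks[i + 1] not in ("->", "="):
        raise ValueError("expected 'LHS ->' or 'LHS ='")
    lhs = toks[i]  # 3. nonterminal to be expanded
    i += 2  # 4. skip the arrow or equal sign
    rhs = []
    while i < len(toks) and toks[i] not in (":", ".") and not toks[i].startswith("<"):
        rhs.append(toks[i])  # 5. right-hand-side symbols
        i += 1
    if i < len(toks) and toks[i] == ":":  # 6. optional colon
        i += 1
    constraints = []
    while i < len(toks) and toks[i] != ".":  # 7. constraints, up to the optional period (8.)
        if i + 2 >= len(toks) or toks[i + 1] != "=":
            raise ValueError(f"unsupported constraint near {toks[i]!r}")
        constraints.append((_term(toks[i]), _term(toks[i + 2])))
        i += 3
    return {"id": rule_id, "lhs": lhs, "rhs": rhs, "constraints": constraints}
```

Applied to the coordination rule above, this sketch returns `lhs` "X", `rhs` ["X_1", "CJ", "X_2"] and four path equations, the first being `(("X", "cat"), ("X_1", "cat"))`.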

PC-PATR

Lexical rules

A PC-PATR lexical rule has these parts, in the order listed:
1. the keyword Define;
2. the name of the lexical rule;
3. the keyword as;
4. the rule definition;
5. an optional period (.).

PC-PATR

Several people have contributed to the development of PC-PATR over the past few years. Alan Buseman, Jim Skon, Bob Kasper, and Nathan Miles all contributed to an earlier program named SILPATR that contained the same basic parsing and unification functions.

Bibliography:

D-PATR: A Development Environment for Unification-Based Grammars, Lauri Karttunen;
PC-PATR Reference Manual, Stephen McConnel;
Internet.