Interlingua Methodology
Directly obtain the meaning of the source sentence, then generate the target sentence from the meaning representation.
Example: John gave the book to Mary.
Meaning representation: give-action: agent: john, object: the book, receiver: mary
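The idea of generating from a meaning representation can be sketched in a few lines of Python. This is only an illustrative sketch: the class name GiveAction and both generator functions are hypothetical, not part of any system described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GiveAction:
    """Language-neutral frame for the slide's give-action example."""
    agent: str
    obj: str
    receiver: str


def generate_english(m: GiveAction) -> str:
    # Prepositional pattern: AGENT gave OBJECT to RECEIVER.
    return f"{m.agent} gave {m.obj} to {m.receiver}."


def generate_english_double_object(m: GiveAction) -> str:
    # Alternative surface form generated from the same meaning.
    return f"{m.agent} gave {m.receiver} {m.obj}."


meaning = GiveAction(agent="John", obj="the book", receiver="Mary")
print(generate_english(meaning))
print(generate_english_double_object(meaning))
```

Both outputs come from the same frame, which is the point of the approach: analysis produces the frame once, and each target generator only needs to know the frame.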
Competing approaches
Direct
Transfer based
Direct approach
Source sentence: I like mangoes
Word replacement: maOM AcCa laga Aama (I like (root) mangoes)
Morphology: maOM AcCa lagata Aama (I like mangoes)
Syntactic re-arrangement: maOM Aama AcCa lagata hO (I mangoes like)
Semantic embellishment: mauJao Aama AcCa lagata hO (I (dative) mangoes like)
Transfer Based
Source sentence processed for parsing, chunking etc.
[Figure: parse tree S -> NP VP; NP -> I; VP -> V NP; V -> like; NP -> mangoes]
Transfer Based
Transfer structures obtained for the target sentence.
[Figure: the same tree with the V node (like) and the object NP (mangoes) re-ordered inside VP]
Transfer Based
Morphology and language specific modifications
[Figure: target tree with NP -> mauJao, V -> AcCa lagataa hO, NP -> Aama]
MT Architectures: Vauquois' triangle
[Figure: Vauquois' triangle]
Levels (top to bottom): Deep understanding level, Interlingual level, Logico-semantic level, Syntactico-functional level, Morpho-syntactic level, Syntagmatic level, Graphemic level
Representations (top to bottom): Ontological interlingua, Semantico-linguistic interlingua, SPA-structures (semantic & predicate-argument), F-structures (functional), C-structures (constituent), Tagged text, Text
Transfer paths: Direct translation, Semi-direct translation, Syntactic transfer (surface), Syntactic transfer (deep), Semantic transfer, Conceptual transfer, Multilevel transfer
Mixing levels: Multilevel description
Relation Between the Transfer and the Interlingua Models
[Figure: source language words -> (parsing) -> source language parse tree -> (interpretation) -> interlingua -> (generation) -> target language parse tree -> (generation) -> target language words; transfer links the source and target parse trees directly.]
State of Affairs
Systran reports 19 different language pairs, of which 8 are adequate for their intended use. Even fewer are capable of quality written or spoken text translation.
ENGLISH-SPANISH-ENGLISH
...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province
... en ese imperio, el arte de la cartografía logró tal perfección que el mapa de una sola provincia ocupó la totalidad de una ciudad, y el mapa del imperio, la totalidad de una provincia
... in that empire, the art of the cartography obtained such perfection that the map of a single province occupied the totality of a city, and the map of the empire, the totality of a province
Provided by Systran on 19/11/02
ENGLISH-KOREAN-ENGLISH
...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province
저 제국안에 , 단순한 지방의 지도가 도시의 완전을 점유했다 고 Cartography 의 예술은 같은 얀벽 , 및 제국 , 지방의 완전의 지도 를 달성했다
Inside that empire, the map of the region where it is simple occupied the perfection of the city the art of the Cartography is same, yan it attained the map of of perfection of the wall and empire and region
Provided by Systran on 19/11/02
UNL Based MT: the scenario
[Figure: UNL at the centre, linked to English, Hindi, French and Russian; enconversion maps a natural language into UNL, deconversion maps UNL back into a natural language.]
Universal Networking Language
Common language for computers to express information written in natural language (Uchida et al. 2000)
Application: electronic language to overcome the language barrier
Information Distribution System
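UNL Example
[Figure: UNL graph with the node arrange linked to John by agt, to meeting by obj, and to residence by plc]
agt(arrange, John)
obj(arrange, meeting)
plc(arrange, residence)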
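A UNL graph of this kind is a set of labelled binary relations, so it can be held as plain triples. The sketch below is one possible in-memory form, not the format used by the UNL tools; the function names are hypothetical, and it assumes each head carries a given relation label at most once.

```python
from collections import defaultdict

# (relation, head, dependent) triples read off the slide's figure.
relations = [
    ("agt", "arrange", "John"),
    ("obj", "arrange", "meeting"),
    ("plc", "arrange", "residence"),
]


def build_graph(triples):
    """Index the triples as head -> {relation: dependent}."""
    graph = defaultdict(dict)
    for rel, head, dep in triples:
        graph[head][rel] = dep
    return dict(graph)


def to_unl(triples):
    """Render each triple in the rel(head, dependent) notation."""
    return [f"{rel}({head}, {dep})" for rel, head, dep in triples]


graph = build_graph(relations)
for line in to_unl(relations):
    print(line)
```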
Components of the UNL System
Universal Word
Relation Labels
Attributes
Universal Word
[saayaa] "shadow(icl>darkness)"; the place was now in shadow
[laoSamaa~] "shadow(icl>iota)"; not a shadow of doubt about his guilt
[saMkot] "shadow(icl>hint)" ; the shadow of the things to come
[Cayaa] "shadow(icl>deterrant)"; a shadow over his happiness
Universal Word (foreign concepts)
[aput] "snow(icl>thing)";
[pukak] "snow(aoj<salt like)";
[mauja] "snow(aoj<soft, aoj<deep)";
[massak] "snow(aoj<soft)";
[mangokpok] "snow(aoj<watery)";
Relation
agt (agent)
Agt defines a thing which initiates an action.
agt (do, thing)
Syntax
agt[":"<Compound UW-ID>] "(" {<UW1>|":"<Compound UW-ID>} "," {<UW2>|":"<Compound UW-ID>} ")"
Detailed Definition
Agent is defined as the relation between:
UW1 - do, and
UW2 - a thing
where:
UW2 initiates UW1, or
UW2 is thought of as having a direct role in making UW1 happen.
Examples and readings
agt(break(icl>do), John(icl>person)) John breaks
agt(translate(icl>do), computer(icl>machine)) computer translates
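To make the syntax concrete, here is a minimal sketch of a reader for relations written in this notation. It handles only the simple case shown in the examples (two plain UWs) and ignores the optional compound UW-ID part of the syntax; the function names are hypothetical.

```python
def split_top_level(text, sep=","):
    """Split on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_relation(expr):
    """Return (label, uw1, uw2) for a string such as agt(UW1, UW2)."""
    expr = expr.strip()
    label, _, rest = expr.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"not a relation: {expr!r}")
    args = split_top_level(rest[:-1])
    if len(args) != 2:
        raise ValueError(f"expected two arguments in {expr!r}")
    return label, args[0], args[1]


print(parse_relation("agt(break(icl>do), John(icl>person))"))
```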
Attributes
Used to describe what is said from the speaker's point of view.
In particular captures number, tense, aspect and modality information.
Example Attributes
I see a flower
UNL: obj(see(icl>do), flower(icl>thing))
I saw flowers
UNL: obj(see(icl>do).@past, flower(icl>thing).@pl)
Did I see flowers?
UNL: obj(see(icl>do).@past.@interrogative, flower(icl>thing).@pl)
Please see the flowers.
UNL: obj(see(icl>do).@past.@request, flower(icl>thing).@pl.@definite)
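The way attributes hang off a universal word can be shown by splitting one argument of the expressions above into its parts. This is a rough sketch that assumes the simple shape headword(restriction).@attr.@attr seen in these examples; parse_uw is a hypothetical helper, not a UNL tool function.

```python
import re

# headword, optional (restriction), then zero or more .@attribute suffixes
UW_RE = re.compile(
    r"^(?P<head>[^(.]+)(?:\((?P<restriction>[^)]*)\))?(?P<attrs>(?:\.@\w+)*)$"
)


def parse_uw(text):
    """Split a universal word into headword, restriction and attributes."""
    match = UW_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised universal word: {text!r}")
    return {
        "head": match.group("head"),
        "restriction": match.group("restriction"),
        "attributes": re.findall(r"@\w+", match.group("attrs")),
    }


print(parse_uw("see(icl>do).@past.@interrogative"))
print(parse_uw("flower(icl>thing).@pl"))
```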
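The Analyser Machine
[Figure: the Enconverter performs analysis using rules and a dictionary. Windows labelled C and A sit over a node list (... ni-1, ni, ni+1, ni+2, ni+3 ...), and the output is a node-net with nodes A, B, C, D, E.]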
Strategy for Analysis
Morphological Analysis
Syntactico-Semantic Analysis
Analysis of a simple sentence
<< A Report of John’s genius reached King’s ears >>
The article and noun are combined and the attribute @indef is added to the noun.
<<[Report][of] John’s genius reached King’s ears>>
Right shift to put the preposition with the succeeding noun.
<</Report/[of][John’s] genius reached King’s ears>>
John’s being a possessive noun, shift right.
<</Report//of/[John’s][genius] reached King’s ears>>
These two nouns are resolved into relation pos and the first noun is deleted.
Simple sentence (continued)
<</Report/[of][genius] reached King’s ears>>
The preposition of is then combined with the noun and a dynamic attribute OFRES is added to the entry of genius.
<<[Report][of genius] reached King’s ears>>
Using the attribute OFRES these two nouns are resolved to relation mod and the second noun is deleted.
<<[Report][reached] King’s ears>>
Shift right again and solve King’s ears; relation pof is generated.
<</Report/[reached][ears]>>
Relation obj is generated here and then relation agt is generated between Report and ears.
<</reached/>>
UNL as Interlingua and Language Divergence
(Dave, Parikh, Bhattacharyya, JMT, 2003)
Language divergence stands for the discrepancy in representation due to the inherent characteristics of the languages.
Syntactic Divergence
Lexical Semantic Divergence
Issue of free word order
jaIma nao caaorI krnaovaalao laD,ko kao laazI sao maara.
jaIma nao laazI sao caaorI krnaovaalao laD,ko kao maara.
caaorI krnaovaalao laD,ko kao jaIma nao laazI sao maara.
caaorI krnaovaalao laD,ko kao laazI sao jaIma nao maara.
laazI sao jaIma nao caaorI krnaovaalao laD,ko kao maara.
Use is made of the fact that in Hindi postpositions stay adjacent to nouns (as opposed to the preposition stranding divergence).
Flexibility in parsing: hit and preserve the predicate till the end.
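The orderings listed above can be reproduced by keeping each noun phrase glued to its postposition and permuting the resulting chunks in front of the final verb. The sketch below assumes this chunking of the transliterated sentence (the chunk boundaries are my reading, not stated on the slide); it enumerates all six verb-final orders, of which the slide lists five, and makes no claim about how natural each order is.

```python
from itertools import permutations

# Assumed chunking: each noun phrase stays together with its postposition.
chunks = ["jaIma nao", "caaorI krnaovaalao laD,ko kao", "laazI sao"]
verb = "maara."


def verb_final_orders(chunks, verb):
    """All orderings of the chunks, with the predicate kept at the end."""
    return [" ".join(order + (verb,)) for order in permutations(chunks)]


orders = verb_final_orders(chunks, verb)
for sentence in orders:
    print(sentence)
```

Because the postposition never leaves its noun, a parser can treat each chunk as a unit and wait for the predicate at the end, which is the strategy the slide describes.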
Conjunct and Compound verbs
Typical Indian language phenomenon. Compound for verb-verb, conjunct for other POS+verb.
vah gaanao lagaI: She started singing.
calao jaaAao: Go away.
$k jaaAao: Stop there.
Jauk jaaAao: Bend down.
Possibility of combinatorial explosion in the lexicon. Possible solution: wordnet?
Use of Lexical Resources
Automatic Generation of the UW to language dictionary (Verma and Bhattacharyya, Global Wordnet Conference, Czech Republic, 2004)
Universal Word generation
Semantic attribute generation
Heavy use of wordnets and ontologies
Languages under Study
Language   Analysis Status        Generation Status
English    D- 60000, R- 5000      D- 60000, R- 400
Hindi      D- 75000, R- 5700      D- 75000, R- 6500
Marathi    D- 4000, R- 2200       D- 4000, R- 6000
Bengali    D- 500, R- 1800        D- 500, R- 2100
Conclusions
Predicate preservation strategy used for English, Hindi, Marathi, Bengali (Spanish being added).
Focus on morphology for Marathi.
Focus on kaarak (case) system for Bengali.
Extremely lexical knowledge hungry.
Conclusions
Work going on in the creation of Indian language wordnets (Hindi, Marathi in IIT Bombay; Dravidian in Anna University).
Interlingua has the attractive possibility of being used as a knowledge representation and applied to interesting applications like summarization, text clustering, and meaning-based multilingual search engines.
Generation of the Hindi Case System in an Interlingua based MT Framework
Debasri Chakrabarti, Sunil Kumar Dubey, Pushpak Bhattacharyya.
Computer Science and Engineering Department,
Indian Institute of Technology, Bombay, Mumbai, 400076, India.
debasri,dubey,[email protected]
Introduction
Role of the case marker in a language
plays an important role in the structure of a sentence
helps to impart the meaning and naturalness
Example:
मोटे तौर पर कृषि भूमि की जुताई, फसलों की रुपाई, कटाई, पालतू पशु प्रजनन, पालन, दुग्ध-व्यवसाय और वनीकरण सम्मिलित होता है।
In a broad sense, agriculture includes cultivation of the soil and growing and harvesting crops and breeding and raising livestock and dairying and forestry.
The Case System in Hindi
Hindi is characterized by a rich subsystem of case
Example: राम ने रवि को किताब दी।
Ram Erg Ravi Dat book Nom give + past
Ram gave a book to Ravi.
Hindi has the following cases: nominative, ergative, accusative, instrumental, dative, genitive, locative
Language Universal Case Feature
Case | Conditions
Nominative (NOM) | case of the subjects. If a language has two distinct cases for the subjects, one inflected and the other without inflection, then NOM refers to the uninflected one.
Ergative (ERG) | the inflected case associated with the subject
Accusative (ACC) | case attached with the object
Dative (DAT) | case of goals/recipients
Instrumental (INS) | case of instruments used to accomplish an action
Genitive (GEN) | case of possessors
Locative (LOC) | case of physical place
Case features of Hindi
Case | Markers | Conditions | Example
1. Nominative | Ө | a. Subject | राम आम खा रहा था।
1. Nominative | Ө | b. Inanimate primary object | राम आम खा रहा था।
2. Ergative | ने | a. Agentive subject with perfective aspect | राम ने श्याम को किताब दी थी।
2. Ergative | ने | b. Simple past | राम ने किताब पढ़ी।
3. Accusative | को | a. Animate primary object | राम ने सीता को देखा।
3. Accusative | को | b. Definite, inanimate primary object | राम ने उस किताब को पढ़ा।
4. Dative | को | Goal of the sentence | राम ने सीता को किताब दी।
5. Ablative | से | Source | पेड़ से पत्ते गिर रहे हैं।
6. Instrumental | से | a. Instrument | राम ने चाकू से फल काटा।
6. Instrumental | से | b. Intermediary agent [cause] | राम ने सीता से चिट्ठी लिखवाई
6. Instrumental | से | c. Denoted agent of passive | राम से खाना नहीं खाया गया।
7. Genitive | का, के, की | a. Possessor [involving ownership of something] | राम की किताब अच्छी है।
7. Genitive | का, के, की | b. Relationship to somebody | राम का भाई अच्छा है।
8. Locative | में | a. In, within | राम दिल्ली में रहता है।
8. Locative | पर | b. On, at | राम पेड़ पर चढ़ गया।
Nominative ~ Ergative alternation in the agent position
agent of an action may bear either nominative case or ergative case
ergative case appears in Hindi with the simple past form and the perfective aspect
Examples
राम ने रवि को पीटा।
Ram erg Ravi acc beat+past
Ram beat Ravi.
राम ने रवि को पीटा था।
Ram erg Ravi acc beat+past perfect
Ram had beaten Ravi.
राम ने रवि को पीटा है।
Ram erg Ravi acc beat+present perfect
Ram has beaten Ravi.
Observations
There is a correlation between the ergative case and the aspectual property of the main verb
This is morphologically overt on the verb
Simple Past Tense: पीटा
Perfective Aspect: पीटा था
Morphological Rule
Simple Past Tense: V + आ → ने
Perfective Aspect: V + आ + (Tense morphology) → ने
Nominative ~ Ergative Alternation
Some Complex Phenomena
nominative case on the agent with the mentioned aspectual features
Is nominative ~ ergative subject to transitivity?
language universally, transitivity determines nom ~ erg
three types of patterns, independent of transitivity, in Hindi
Three patterns are:
only nom agents
only erg agents
either nom or erg agents
Examples of Intransitive Verbs
Only nom agents
i) राम गिरा।
Ram +nom fall + past.
Ram fell down.
ii) *राम ने गिरा।
Ram erg fall + past.
Only erg agents
i) राम ने प्रतीक्षा की।
Ram erg wait + past.
Ram waited.
ii) *राम प्रतीक्षा किया।
Ram +nom wait + past.
Either nom or erg agents
i) राम खेला।
Ram +nom play + past.
Ram played.
ii) राम ने खेला।
Ram erg play + past.
Transitive Verbs
Only nom agents
i) राम शीशा लाया।
Ram +nom glass bring + past.
Ram brought the glass.
ii) *राम ने शीशा लाया।
Ram erg glass bring + past.
Only erg agents
i) राम ने शीशा तोड़ा।
Ram erg glass break + past.
Ram broke the glass.
ii) *राम शीशा तोड़ा।
Ram +nom glass break + past.
Either nom or erg agents
i) राम ने समझा कि घर मेरा है।
Ram erg think + past that house mine is.
Ram thought that the house is mine.
ii) राम समझा कि घर मेरा है।
Ram think + past that house mine is.
Inferences
Ergative case in Hindi is semantically driven
action performed deliberately: ergative case
action performed non-deliberately: nominative case
Examples of deliberate and non-deliberate action
राम गिरा।
Ram +nom fall + past.
Ram fell down.
राम ने मोहन को गिराया।
Ram erg Mohan acc cause to fall down
Ram made Mohan fall down.
Accusative ~ Nominative Alternation in the Object
Primary objects in Hindi are either accusative (को) or nominative, uninflected (Ө)
Examples
राम ने चावल खाया।
Ram erg rice + nom eat + past.
Ram ate rice.
राम ने रावण को मारा।
Ram erg Ravan acc kill + past.
Ram killed Ravan.
Accusative ~ Nominative Alternation
Generalization
animate objects are accusative
inanimate objects are nominative
Counter examples of this generalization: accusative case with inanimate objects
राम ने किताब को उठाया।
Ram erg book acc lift + past.
Ram lifted the book.
राम ने किताब उठाई।
Ram erg book lift + past.
Ram lifted a book.
Summarization of the theoretical approach
NOM~ERG Alternation
subject to the semantic property of the verb
this semantic property: conscious choice
NOM~ACC Alternation
subject to animacy
subject to definiteness
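The summary above amounts to two small decision functions. The sketch below is a simplification under stated assumptions: it treats "simple past or perfective" as one boolean, treats deliberateness as a property supplied with the verb, and uses hypothetical function names; it is not the generation system's actual logic.

```python
MARKERS = {"NOM": "", "ERG": "ने", "ACC": "को"}


def agent_case(past_or_perfective: bool, deliberate: bool) -> str:
    """NOM~ERG: ergative only for deliberate actions in past/perfective."""
    return "ERG" if past_or_perfective and deliberate else "NOM"


def object_case(animate: bool, definite: bool) -> str:
    """NOM~ACC: accusative for animate objects or definite objects."""
    return "ACC" if animate or definite else "NOM"


# Ram fell down: past, not deliberate
print(agent_case(past_or_perfective=True, deliberate=False))
# Ram broke the glass: past, deliberate
print(agent_case(past_or_perfective=True, deliberate=True))
# rice: inanimate, indefinite; Ravan: animate; "the book": definite
print(object_case(animate=False, definite=False))
print(object_case(animate=True, definite=False))
print(object_case(animate=False, definite=True))
```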
How to generate the Case Markers in the UNL System
Three components to the generation system
Lexicon
Rule Base
UNL Expression
Lexicon
attribute for a verb is taken from a verb hierarchy
attribute of conscious choice is [DLBRT-ACT]
[DLBRT-ACT] stands for deliberate action
Rule Base
Hindi is an SOV language
a frequently used rule type in Hindi is the left insertion rule
a child node is almost always inserted to the left of the parent node given in the UNL expression
Format for the Rule:
"<COND1>:<ACTION1>:<RELATION1>:<ROLE1>"{<COND2>:<ACTION2>:<RELATION2>:<ROLE2>}
Case Markers in the UNL System
[Figure sequence: step-by-step generation from the UNL graph with entry node eat@entry, agt Sachin and obj Rice. Between <<SHEAD>> and <<STAIL>>, the generation windows (G) replace the nodes with the Hindi words सचिन, चावल and खा.]
Interpretation of a rule
Example of a left insertion rule:
"<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;
{} indicates parent node
" " indicates child node
condition of the parent node: V,>agt,@past,DLBRT-ACT,^@progress
condition of the child node: <agt
priority is denoted by P followed by a priority number: P242
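The rule layout described above can be unpacked mechanically. The following sketch splits a rule string into its child part, parent part and priority, assuming exactly the four colon-separated fields of the stated format (condition, action, relation, role); the class and function names are hypothetical and not part of the UNL tools.

```python
import re
from typing import List, NamedTuple


class Node(NamedTuple):
    conditions: List[str]
    actions: List[str]
    relation: str
    role: str


class Rule(NamedTuple):
    child: Node      # the part in double quotes
    parent: Node     # the part in braces
    priority: int    # the number after P


RULE_RE = re.compile(
    r'^"(?P<child>[^"]*)"\{(?P<parent>[^}]*)\}P(?P<priority>\d+);?$'
)


def _parse_node(text: str) -> Node:
    fields = text.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields in {text!r}")
    cond, action, relation, role = fields
    as_list = lambda field: [item for item in field.split(",") if item]
    return Node(as_list(cond), as_list(action), relation, role)


def parse_rule(text: str) -> Rule:
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised rule: {text!r}")
    return Rule(
        child=_parse_node(match.group("child")),
        parent=_parse_node(match.group("parent")),
        priority=int(match.group("priority")),
    )


rule = parse_rule(
    '"<agt:+blk,+agt,+!ne,+sufc:agt:"'
    '{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;'
)
print(rule.parent.conditions, rule.child.actions, rule.priority)
```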
Generation of the Case marker on the Agent
Rules for the generation system to handle the case of the agent
R1: "<agt:+blk,+agt,+!ne,+sufc:agt:"{V,>agt,@past,DLBRT-ACT,^@progress:+!agt::}P242;
R2: "<agt:+blk,+agt,+sufc:agt:"{V,>agt,@past,^@progress:!agt::}P241;
Priority plays an important role in generation
R1: Ergative
R2: Nominative
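Why priority matters here can be seen in a small sketch: every verb that satisfies R1 also satisfies R2, so the more specific rule has to be tried first. The code below assumes that a higher P number wins and that ^ marks a feature that must be absent; both are my reading of the two rules rather than something the slide states, and the data layout is hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Set


class AgentRule(NamedTuple):
    name: str
    priority: int
    required: FrozenSet[str]   # parent features that must be present
    forbidden: FrozenSet[str]  # parent features written with ^
    marker: str                # case marker inserted after the agent


RULES = [
    AgentRule("R1", 242, frozenset({"V", ">agt", "@past", "DLBRT-ACT"}),
              frozenset({"@progress"}), "ne"),
    AgentRule("R2", 241, frozenset({"V", ">agt", "@past"}),
              frozenset({"@progress"}), ""),
]


def choose_rule(features: Set[str]) -> Optional[AgentRule]:
    """Pick the matching rule with the highest priority, if any."""
    matching = [
        rule for rule in RULES
        if rule.required <= features and not (rule.forbidden & features)
    ]
    return max(matching, key=lambda rule: rule.priority, default=None)


deliberate = {"V", ">agt", "@past", "DLBRT-ACT"}
non_deliberate = {"V", ">agt", "@past"}
print(choose_rule(deliberate).name, choose_rule(non_deliberate).name)
```

With the priorities swapped, R2 would fire for deliberate verbs as well and the ergative marker would never be generated.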
Generation of the Case marker on the Object
Rules for the generation system to handle the case of the object
R5: "<obj,INANI,MALE,^V,^SCOPE:+obj,+sufc,+blk:obj:"{>obj:+!obj,+male::}P160;
R6: "<obj,ANIMT,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P160;
R7: "<obj,@def,MALE,^V,^SCOPE:+obj,+sufc,+blk,+!ko:obj:"{>obj:+!obj,+male::}P163;
Conclusion
Result
provides accuracy in the generation of case markers for the UNL relations (see table)
lends naturalness to the generation of the Hindi sentences
This alternation is extended for the pronominal cases
Future Work
enhance the study for Dative and Genitive case markers and their corresponding UNL relations
Demo