Natural language processing


Transcript of Natural language processing

Page 1: Natural language processing

Natural Language Processing CC-9016

Dr. C. Saravanan, Ph.D., NIT Durgapur

[email protected]

Natural Language Processing, Dr. C. Saravanan, NIT Durgapur.

Page 2: Natural language processing

Introduction

• What is NLP?

• Natural language processing (NLP) can be defined as the automatic (or semi-automatic) processing of human language.

• Nowadays, alternative terms are often preferred, like 'Language Technology' or 'Language Engineering'.

• Language is often used in contrast with speech (e.g., Speech and Language Technology).

Page 3: Natural language processing

Introduction cont…

• NLP is essentially multidisciplinary: it is closely related to linguistics.

• It also has links to research in cognitive science, psychology, philosophy and maths (especially logic).

• Within Computer Science, it relates to formal language theory, compiler techniques, theorem proving, machine learning and human-computer interaction.

• Of course it is also related to AI, though nowadays it's not generally thought of as part of AI.

Page 4: Natural language processing

Introduction cont…

• The HAL 9000 computer in Stanley Kubrick’s film 2001: A Space Odyssey is one of the most recognizable characters in twentieth-century cinema.

• HAL is an artificial agent capable of speaking and understanding English, and even reading lips.

• HAL is able to perform information retrieval (finding out where needed textual resources reside), information extraction (extracting pertinent facts from those textual resources), and inference (drawing conclusions based on known facts).

Page 5: Natural language processing

Space Odyssey

Page 6: Natural language processing

Introduction cont…

Other valuable areas of language processing such as

spelling correction,

grammar checking,

information retrieval, and

machine translation.

Example: Is the door open? The door is open.

Page 7: Natural language processing

Online Applications

• Dictation - Online Speech Recognition https://dictation.io/

• SpeechPad - a new voice notebook https://speechpad.pw/

• Speechlogger - https://speechlogger.appspot.com/en/

• Talktyper https://talktyper.com/

(a microphone and speaker are required)

Page 8: Natural language processing

Knowledge in Language Processing

• Requires knowledge about phonetics and phonology, which can help model how words are pronounced in colloquial speech

• they're- they + are. They're from Canada.

• their- something belongs to "them." This is their car.

• there- in that place. The park is over there.

• two- the number 2. I'll have two hamburgers, please.

• to- this has many meanings. One means "in the direction of." I'm going to South America.

• too- also. I want to go, too.

Page 9: Natural language processing

Knowledge in Language Processing cont…

• Requires knowledge about morphology, which captures information about the shape and behavior of words in context.

• Singular and Plural – Man/Men, Boy/Boys

• The knowledge needed to order and group words together comes under the heading of syntax.

Page 10: Natural language processing

Knowledge in Language Processing cont…

• Requires knowledge of the meanings of the component words, the domain of lexical semantics

• and knowledge of how these components combine to form larger meanings, compositional semantics.

• No

• No, I won’t open the door.

• I’m sorry, I’m afraid I can’t. (indirect refusal)

Page 11: Natural language processing

Knowledge in Language Processing cont…

• The knowledge of language needed to engage in complex language behavior can be separated into six distinct categories.

• Phonetics and Phonology – The study of linguistic sounds.

• Morphology – The study of the meaningful components of words.

• Syntax – The study of the structural relationships between words.

• Semantics – The study of meaning.

• Pragmatics – The study of how language is used to accomplish goals.

• Discourse – The study of linguistic units larger than a single utterance.

Page 12: Natural language processing

AMBIGUITY

• Some input is ambiguous if there are multiple alternative linguistic structures that can be built for it.

• Look at the dog with one eye.

• Look at the dog using only one of your eyes.

• Look at the dog that only has one eye.

• The phrase with one eye is ambiguous in this sentence: it can attach to the verb look or to the dog.

Page 13: Natural language processing

AMBIGUITY

• In some cases, for example where the word make can mean ‘create’ or ‘cook’,

• the ambiguity can be resolved by word sense disambiguation

• or by speech act interpretation.

• Example: I make a cake.

Page 14: Natural language processing

Models and Algorithms

• The most important elements are:

• state machines,

• formal rule systems,

• logic,

• probability theory

• Well-known computational algorithms include:

• dynamic programming

• state space search

Page 15: Natural language processing

State Machines

• State machines are formal models that consist of states, transitions among states, and an input representation.

• Variations of this basic model include deterministic and non-deterministic finite-state automata, and finite-state transducers, which can write to an output device,

• as well as weighted automata, Markov models, and hidden Markov models, which have a probabilistic component.

Page 16: Natural language processing

Formal Rule Systems

• The most important formal rule systems are:

• regular grammars

• regular relations

• context-free grammars

• feature-augmented grammars

• probabilistic variants of them all.

• Algorithms that are often used include depth-first search, best-first search, and A* search.

Page 17: Natural language processing

Logic and Probability Theory

• First-order logic, also known as the predicate calculus, is used along with related formalisms such as feature structures, semantic networks, and conceptual dependency.

• Probability theory addresses many kinds of ambiguity problems:

• given N choices for some ambiguous input, choose the most probable one.

Page 18: Natural language processing

Language, Thought, and Understanding

• Alan Turing (1950) introduced Turing Test

• In Turing’s game, there are three participants: 2 people and a computer.

• One of the people is a contestant and plays the role of an interrogator.

• To win, the interrogator must determine which of the other two participants is the machine by asking a series of questions via a teletype.

• The task of the machine is to fool the interrogator into believing it is a person by responding as a person would to the interrogator’s questions.

• The task of the second human participant is to convince the interrogator that the other participant is the machine, and that they are human.

Page 19: Natural language processing

ELIZA

• ELIZA (1966) was an early natural language processing system that used pattern matching to carry on a limited form of conversation with a user.

• User1: You are like my father in some ways.

• ELIZA1: WHAT RESEMBLANCE DO YOU SEE

• User2: You are not very aggressive but I think you don’t want me to notice that.

• ELIZA2: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE

• User3: You don’t argue with me.

• ELIZA3: WHY DO YOU THINK I DON’T ARGUE WITH YOU

• User4: You are afraid of me.

• ELIZA4: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
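ELIZA’s behaviour can be approximated with a few pattern–response rules. The rules below are a minimal illustrative sketch in Python; they are hypothetical examples, not Weizenbaum’s original script (which also swapped pronouns such as me/you):

```python
import re

# Each rule pairs a pattern over the user's input with a response template;
# these three rules are hypothetical illustrations, not Weizenbaum's script.
RULES = [
    (re.compile(r"you are (.*)", re.I), "WHAT MAKES YOU THINK I AM {0}"),
    (re.compile(r"you (.*) me", re.I), "WHY DO YOU THINK I {0} YOU"),
    (re.compile(r"i am (.*)", re.I), "HOW LONG HAVE YOU BEEN {0}"),
]

def respond(utterance: str) -> str:
    """Return the first matching template, filled with the captured text."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?").upper())
    return "PLEASE GO ON"   # fallback when nothing matches

print(respond("You are not very aggressive."))
# WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
```

Note how shallow the mechanism is: the system has no representation of meaning at all, yet the echoed text gives an impression of understanding.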

Page 20: Natural language processing

HISTORY 1940’s and 1950’s

• Shannon (1948) contributed to automata theory by applying probabilistic models of discrete Markov processes to automata for language.

• Chomsky (1956) first considered finite-state machines as a way to characterize a grammar, and defined a finite-state language as a language generated by a finite-state grammar.

• These early models led to the field of formal language theory, which used algebra and set theory to define formal languages as sequences of symbols.

• This includes the context-free grammar, first defined by Chomsky (1956) for natural languages but independently discovered by Backus (1959) and Naur et al. (1960) in their descriptions of the ALGOL programming language.

• The sound spectrograph was developed (Koenig et al., 1946)

Page 21: Natural language processing

1957–1970

• Speech and Language Processing had split very cleanly into two paradigms: symbolic and stochastic.

• The first was the work of Chomsky and others on formal language theory and generative syntax throughout the late 1950’s and early to mid 1960’s, and the work of many linguists and computer scientists on parsing algorithms, initially top-down and bottom-up, and then via dynamic programming.

• One of the earliest complete parsing systems was Zelig Harris’s Transformations and Discourse Analysis Project (TDAP), which was implemented between June 1958 and July 1959 at the University of Pennsylvania (Harris, 1962).

Page 22: Natural language processing

1957–1970 cont…

• The second line of research was the new field of artificial intelligence.

• In 1956, John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester organized a two-month workshop on artificial intelligence.

• The major focus of the new field was the work on reasoning and logic typified by Newell and Simon’s work on the Logic Theorist and the General Problem Solver.

• These were simple systems which worked in single domains mainly by a combination of pattern matching and key-word search with simple heuristics for reasoning and question-answering.

Page 23: Natural language processing

1957–1970 cont…

• Bledsoe and Browning (1959) built a Bayesian system for text-recognition (optical character recognition) that used a large dictionary and computed the likelihood of each observed letter sequence given each word in the dictionary by multiplying the likelihoods for each letter.

• Mosteller and Wallace (1964) applied Bayesian methods to the problem of authorship attribution on The Federalist papers.

• The Brown corpus of American English, a 1 million word collection of samples from 500 written texts from different genres (newspaper, novels, non-fiction, academic, etc.), which was assembled in 1963-64 (Kucera and Francis, 1967; Francis, 1979; Francis and Kucera, 1982), and William S. Y. Wang’s 1967 DOC (Dictionary on Computer), an on-line Chinese dialect dictionary.

Page 24: Natural language processing

1970–1983

• Use of the Hidden Markov Model and the metaphors of the noisy channel and decoding, developed independently by Jelinek, Bahl, Mercer, and colleagues at IBM’s Thomas J. Watson Research Center, and Baker at Carnegie Mellon University.

• The logic-based paradigm was begun by the work of Colmerauer and his colleagues on Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975), the forerunners of Prolog and Definite Clause Grammars (Pereira and Warren, 1980).

• Independently, Kay’s (1979) work on functional grammar and, shortly later, Bresnan and Kaplan’s (1982) work on LFG established the importance of feature structure unification.

Page 25: Natural language processing

1970–1983 cont…

• Terry Winograd’s SHRDLU system which simulated a robot embedded in a world of toy blocks (Winograd, 1972a). The program was able to accept natural language text commands (Move the red block on top of the smaller green one) of a hitherto unseen complexity and sophistication.

• Roger Schank and his colleagues and students built a series of language understanding programs that focused on human conceptual knowledge such as scripts, plans and goals, and human memory organization (Schank and Abelson, 1977; Schank and Riesbeck, 1981; Cullingford, 1981; Wilensky, 1983; Lehnert, 1977).

Page 26: Natural language processing

1970–1983 cont…

• The discourse modeling paradigm focused on four key areas in discourse. Grosz and her colleagues proposed ideas of discourse structure and discourse focus (Grosz, 1977a; Sidner, 1983a), a number of researchers began to work on automatic reference resolution (Hobbs, 1978a), and the BDI (Belief-Desire-Intention) framework for logic-based work on speech acts was developed (Perrault and Allen, 1980; Cohen and Perrault, 1979).

Page 27: Natural language processing

1983-1993

• Finite-state models began to receive attention again after work on finite-state phonology and morphology by Kaplan and Kay (1981) and finite-state models of syntax by Church (1980).

• Another trend was the rise of probabilistic models throughout speech and language processing, influenced strongly by the work at the IBM Thomas J. Watson Research Center on probabilistic models of speech recognition.

Page 28: Natural language processing

1994-1999

• First, probabilistic and data-driven models had become quite standard throughout natural language processing.

• Algorithms for parsing, part of speech tagging, reference resolution, and discourse processing all began to incorporate probabilities, and employ evaluation methodologies borrowed from speech recognition and information retrieval.

• Second, the increases in the speed and memory of computers had allowed commercial exploitation of a number of subareas of speech and language processing, in particular speech recognition and spelling and grammar checking.

• Finally, the rise of the Web emphasized the need for language-based information retrieval and information extraction.

Page 29: Natural language processing

NLP applications

• spelling and grammar checking

• optical character recognition (OCR)

• screen readers for blind and partially sighted users

• augmentative and alternative communication

• machine-aided translation

• lexicographers' tools

• information retrieval

• document classification (filtering, routing)

• document clustering

• information extraction

• question answering

• summarization

• text segmentation

• exam marking

• report generation (possibly multilingual)

• machine translation

• natural language interfaces to databases

• email understanding

• dialogue systems

Page 30: Natural language processing

Some Linguistic Terminology

• Morphology: the structure of words. For instance, unusually can be thought of as composed of a prefix un-, a stem usual, and an affix -ly. composed is compose plus the inflectional affix -ed: a spelling rule means we end up with composed rather than composeed.

• Syntax: the way words are used to form phrases. e.g., it is part of English syntax that a determiner such as the will come before a noun, and also that determiners are obligatory with certain singular nouns.

Page 31: Natural language processing

Some Linguistic Terminology

• Semantics. Compositional semantics is the construction of meaning (generally expressed as logic) based on syntax.

• Pragmatics: meaning in context.

Page 32: Natural language processing

Difficulties

• How fast is the 3G ?

• How fast will my 3G arrive?

• Please tell me when I can expect the 3G I ordered.

• Has my order number 4291 been shipped yet?

Page 33: Natural language processing

Spelling and grammar checking

• All spelling checkers can flag words which aren't in a dictionary.

• * The neccesary steps are obvious.

• The necessary steps are obvious.

• If the user can expand the dictionary, or if the language has complex productive morphology (see §2.1), then a simple list of words isn't enough to do this and some morphological processing is needed.
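The basic flagging step can be sketched as simple dictionary lookup (the tiny word list below is a hypothetical stand-in for a real dictionary file):

```python
# A sketch of dictionary-based flagging; this tiny word list is a
# hypothetical stand-in for a real dictionary.
DICTIONARY = {"the", "necessary", "steps", "are", "obvious"}

def flag_unknown(text: str) -> list[str]:
    """Return the words in text that are not found in the dictionary."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return [w for w in tokens if w and w not in DICTIONARY]

print(flag_unknown("The neccesary steps are obvious."))  # ['neccesary']
```

A real checker would additionally strip or analyse affixes before lookup, which is where the morphological processing mentioned above comes in.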

Page 34: Natural language processing

• * The dog came into the room, it's tail wagging.

• The dog came into the room, its tail wagging.

• For instance, possessive its generally has to be followed by a noun or an adjective noun combination etc.

Page 35: Natural language processing

• But, it sometimes isn't locally clear:

e.g. fair is ambiguous between a noun and an adjective.

• 'It's fair', was all Kim said.

• Every village has an annual fair.

Page 36: Natural language processing

• The most elaborate spelling/grammar checkers can get some of these cases right.

• * The tree's bows were heavy with snow.

• The tree's boughs were heavy with snow.

• Getting this right requires an association between tree and bough.

• Attempts might have been made to hand-code this in terms of general knowledge of trees and their parts.

Page 37: Natural language processing

• Checking punctuation can be hard

• Students at Cambridge University, who come from less affluent backgrounds, are being offered up to £1,000 a year under a bursary scheme.

• This sentence implies that most or all students at Cambridge come from less affluent backgrounds. The following sentence has a different meaning.

• Students at Cambridge University who come from less affluent backgrounds are being offered up to £1,000 a year under a bursary scheme.

Page 38: Natural language processing

Information retrieval

• Information retrieval involves returning a set of documents in response to a user query: Internet search engines are a form of IR.

• However, one change from classical IR is that Internet search now uses techniques that rank documents according to how many links there are to them (e.g., Google's PageRank) as well as the presence of search terms.

Page 39: Natural language processing

Information extraction

• Information extraction involves trying to discover specific information from a set of documents.

• The information required can be described as a template.

• For instance, for company joint ventures, the template might have slots for the companies, the dates, the products, the amount of money involved. The slot fillers are generally strings.
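Template filling of this kind can be sketched with regular expressions. The text, patterns, and slot names below are hypothetical illustrations; real IE systems use much richer linguistic analysis:

```python
import re

# Hypothetical text and patterns for a joint-venture template; the slot
# fillers are extracted as plain strings, as described above.
TEXT = "Acme Corp and Widget Ltd announced a joint venture worth $5 million in 1999."

template = {
    "companies": re.findall(r"[A-Z][a-z]+ (?:Corp|Ltd)", TEXT),  # company names
    "amount": re.search(r"\$\d+ \w+", TEXT).group(),             # money involved
    "date": re.search(r"\b(?:19|20)\d{2}\b", TEXT).group(),      # year mentioned
}
print(template)
# {'companies': ['Acme Corp', 'Widget Ltd'], 'amount': '$5 million', 'date': '1999'}
```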

Page 40: Natural language processing

Question answering

• Attempts to find a specific answer to a specific question from a set of documents, or at least a short piece of text that contains the answer.

• What is the capital of France?

• Paris has been the French capital for many centuries.

• There are some question-answering systems on the Web, but most use very basic techniques.

Page 41: Natural language processing

Machine translation (MT)

• MT work started in the US in the early fifties, concentrating on Russian to English. A prototype system was demonstrated in 1954.

• Systran was providing useful translations by the late 60s.

• Until the 80s, the utility of general purpose MT systems was severely limited by the fact that text was not available in electronic form: Systran used teams of skilled typists to input Russian documents.

Page 42: Natural language processing

• Systran and similar systems are not a substitute for human translation:

• They are useful because they allow people to get an idea of what a document is about, and maybe decide whether it is interesting enough to get translated properly.

• Spoken language translation is viable for limited domains: research systems include Verbmobil, SLT and CSTAR.

Page 43: Natural language processing

Natural language interfaces

• Natural language interfaces were the 'classic' NLP problem in the 70s and 80s.

• LUNAR is the classic example of a natural language interface to a database (NLID):

• Its database concerned lunar rock samples brought back from the Apollo missions. LUNAR is described by Woods (1978).

• It was capable of translating elaborate natural language expressions into database queries.

Page 44: Natural language processing

SHRDLU

• SHRDLU (Winograd, 1973) was a system capable of participating in a dialogue about a micro world (the blocks world) and manipulating this world according to commands issued in English by the user.

• SHRDLU had a big impact on the perception of NLP at the time since it seemed to show that computers could actually 'understand' language:

• The impossibility of scaling up from the micro world was not realized.

Page 45: Natural language processing

• There have been many advances in NLP since these systems were built:

• systems have become much easier to build, and somewhat easier to use, but they still haven't become ubiquitous.

• Natural Language interfaces to databases were commercially available in the late 1970s, but largely died out by the 1990s:

• Porting to new databases and especially to new domains requires very specialist skills and is essentially too expensive.

Page 46: Natural language processing

NLP application architecture

• The input to an NLP system could be speech or text.

• It could also be gesture (multimodal input or perhaps a Sign Language).

• The output might be non-linguistic, but most systems need to give some sort of feedback to the user, even if they are simply performing some action.

Page 47: Natural language processing

Components

• Input Pre-Processing: speech / gesture recogniser or text pre-processor

• Morphological Analysis

• Part-of-Speech Tagging

• Parsing: this includes syntax and compositional semantics

• Disambiguation: this can be done as part of parsing

• Context Module: this maintains information about the context

• Text Planning: the part of language generation that decides what meaning to convey

• Tactical Generation: converts meaning representations to strings

• Morphological Generation

• Output Processing: text-to-speech, text formatter, etc.
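The analysis side of such a pipeline can be sketched as stages feeding one another. The stage names follow the component list above; the bodies are toy illustrative placeholders, not real implementations:

```python
def preprocess(text: str) -> list[str]:
    """Input pre-processing: strip final punctuation and tokenise."""
    return text.lower().rstrip(".?!").split()

def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Toy part-of-speech tagging: determiners vs. everything else."""
    return [(t, "DET" if t in {"the", "a", "an"} else "WORD") for t in tokens]

def pipeline(text: str) -> list[tuple[str, str]]:
    """Run the stages in sequence, each feeding the next."""
    return tag(preprocess(text))

print(pipeline("The door is open."))
# [('the', 'DET'), ('door', 'WORD'), ('is', 'WORD'), ('open', 'WORD')]
```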

Page 48: Natural language processing

• The following are required in order to construct the standard components and knowledge sources:

• Lexicon Acquisition

• Grammar Acquisition

• Acquisition of Statistical Information

Page 49: Natural language processing

Page 50: Natural language processing

Note

• NLP applications need complex knowledge sources.

• Applications cannot be 100% perfect. Humans aren't 100% perfect anyway.

• Applications that aid humans are much easier to construct than applications which replace humans.

• Improvements in NLP techniques are generally incremental rather than revolutionary.

Page 51: Natural language processing

Morphology

• Morphology concerns the structure of words.

• Words are assumed to be made up of morphemes, which are the minimal information carrying unit.

• Morphemes which can only occur in conjunction with other morphemes are affixes: words are made up of a stem and zero or more affixes.

• For instance, dog is a stem which may occur with the plural suffix +s i.e., dogs.

Page 52: Natural language processing

• English has only suffixes (affixes which come after a stem) and prefixes (which come before the stem).

• Few languages have infixes (affixes which occur inside the stem) and circumfixes (affixes which go around a stem).

• Some English irregular verbs show a relic of inflection by infixation (e.g. sing, sang, sung).

Page 53: Natural language processing

Inflectional Morphology

• Inflectional morphology concerns properties such as

• tense, aspect, number, person, gender, and case,

• although not all languages code all of these:

• English, for instance, has very little morphological marking of case and gender.

Page 54: Natural language processing

Derivational Morphology

• Derivational affixes, such as un-, re-, anti- etc, have a broader range of semantic possibilities and don't fit into neat paradigms.

• Antidisestablishmentarianism

• Antidisestablishmentarianismization

• Antiantimissile

Page 55: Natural language processing

Zero Derivation

• tango, waltz, etc. are words which are basically nouns but can also be used as verbs.

• An English verb will have a present tense form, a 3rd person singular present tense form, a past participle and a passive participle.

• e.g., text as a verb - texts, texted.

Page 56: Natural language processing

• Stems and affixes can be individually ambiguous.

• There is also potential for ambiguity in how a word form is split into morphemes.

• For instance, unionised could be union -ise -ed

• or (in chemistry) un- ion -ise -ed.

• Note that un- ion is not a possible form (because un- can't attach to a noun).

Page 57: Natural language processing

• The un- that attaches to verbs nearly always denotes the reversal of a process (e.g., untie),

• whereas the un- that attaches to adjectives means 'not', which is the meaning in the case of un- ion -ise -ed.

Page 58: Natural language processing

Spelling / orthographic rules

• English morphology is essentially concatenative.

• Some words have irregular morphology and their inflectional forms simply have to be listed.

• For instance, the suffix -s is pronounced differently when it is added to a stem which ends in s, x or z and the spelling reflects this with the addition of an e (boxes etc.).

Page 59: Natural language processing

• The 'e-insertion' rule can be described as follows:

ε → e / {s, x, z} ^ _ s

• _ indicates the position in question,

• ε is used for the empty string, and

• ^ for the affix boundary.
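Assuming the rule exactly as stated above (it covers only stems ending in s, x, or z, not e.g. the sh and ch cases that a fuller treatment also handles), the rule can be sketched in Python:

```python
import re

def pluralise(stem: str) -> str:
    """Attach the suffix -s, applying e-insertion when the stem ends in s, x or z."""
    if re.search(r"[sxz]$", stem):
        return stem + "es"    # epsilon -> e in the environment {s, x, z} ^ _ s
    return stem + "s"         # elsewhere the suffix attaches directly

print(pluralise("box"), pluralise("dog"))  # boxes dogs
```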

Page 60: Natural language processing

• Other spelling rules in English include consonant doubling

• (e.g., rat, ratted; though note, not *auditted)

• and y/ie conversion (party, parties).

Page 61: Natural language processing

Lexical requirements for morphological processing

• There are three sorts of lexical information that are needed for full, high-precision morphological processing:

• affixes, plus the associated information conveyed by the affix

• irregular forms, with associated information similar to that for affixes

• stems with syntactic categories

Page 62: Natural language processing

• One approach to an affix lexicon is for it to consist of pairings of an affix with some encoding of the syntactic/semantic effect of the affix.

• For instance, consider the following fragment of a suffix lexicon

• ed PAST_VERB

• ed PSP_VERB

• s PLURAL_NOUN

Page 63: Natural language processing

• A lexicon of irregular forms is also needed.

• began PAST_VERB begin

• begun PSP_VERB begin

• English irregular forms are the same for all senses of a word. For instance, ran is the past of run whether we are talking about athletes, politicians or noses.
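Lookup against these two lexicons can be sketched as follows. The entries mirror the lexicon fragments above; coverage is illustrative only, and no spelling rules are applied when stripping suffixes:

```python
# Irregular forms are checked first; otherwise suffixes are stripped
# against a small affix lexicon.
IRREGULAR = {
    "began": ("begin", "PAST_VERB"),
    "begun": ("begin", "PSP_VERB"),
    "ran":   ("run", "PAST_VERB"),
}
SUFFIXES = [("ed", "PAST_VERB"), ("ed", "PSP_VERB"), ("s", "PLURAL_NOUN")]

def analyse(word: str) -> list[tuple[str, str]]:
    """Return candidate (stem, category) analyses for a word form."""
    if word in IRREGULAR:
        return [IRREGULAR[word]]
    analyses = []
    for suffix, category in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            analyses.append((word[: -len(suffix)], category))
    return analyses

print(analyse("began"))   # [('begin', 'PAST_VERB')]
print(analyse("walked"))  # [('walk', 'PAST_VERB'), ('walk', 'PSP_VERB')]
```

The ambiguous output for walked (past vs. past participle) is exactly the kind of ambiguity the second, syntax-coupled stage described below would resolve.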

Page 64: Natural language processing

• The morphological analyser is split into two stages.

• The first of these only concerns morpheme forms

• A second stage which is closely coupled to the syntactic analysis

Page 65: Natural language processing

Finite state automata for recognition

• The approach to spelling rules described here involves the use of finite state transducers (FSTs).

• Suppose we want to recognize dates (just day and month pairs) written in the format day/month. The day and the month may be expressed as one or two digits (e.g. 11/2, 1/12 etc).

• This format corresponds to the following simple FSA, where each character corresponds to one transition:
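Since the diagram itself is not reproduced in this transcript, the automaton can be sketched as a transition table (the state numbering is an assumption for illustration):

```python
# The day/month automaton as a transition table: states are integers, and
# each transition consumes one character ('digit' stands for any of 0-9).
TRANSITIONS = {
    (0, "digit"): 1,   # first digit of the day
    (1, "digit"): 2,   # optional second digit of the day
    (1, "/"): 3,       # separator after a one-digit day
    (2, "/"): 3,       # separator after a two-digit day
    (3, "digit"): 4,   # first digit of the month
    (4, "digit"): 5,   # optional second digit of the month
}
ACCEPTING = {4, 5}

def accepts(s: str) -> bool:
    """Run the automaton over s; accept only if it ends in a final state."""
    state = 0
    for ch in s:
        key = (state, "digit" if ch.isdigit() else ch)
        if key not in TRANSITIONS:
            return False          # no available transition: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING

print(accepts("11/2"), accepts("1/12"), accepts("1/"))  # True True False
```

Note that this FSA only checks the format; it would also accept 99/99, since nothing constrains the digit values.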

Page 66: Natural language processing

Page 67: Natural language processing

Prediction and part-of-speech tagging

• Corpora

• A corpus (corpora is the plural) is simply a body of text that has been collected for some purpose.

• A balanced corpus contains texts which represent different genres (newspapers, fiction, textbooks, parliamentary reports, cooking recipes, scientific papers etc.):

• Early examples were the Brown corpus (US English) and the Lancaster-Oslo-Bergen (LOB) corpus (British English) which are each about 1 million words: the more recent British National Corpus (BNC) contains approx. 100 million words and includes 20 million words of spoken English.

Natural Language Processing, Dr. C. Saravanan, NIT Durgapur.

Page 68: Natural language processing

• When implementing an email answering application, it is essential to collect samples of representative emails.

• For interface applications in particular, collecting a corpus requires a simulation of the actual application:

• generally this is done by a Wizard of Oz experiment, where a human pretends to be a computer.

• Corpora are needed in NLP for two reasons:

• to evaluate algorithms on real language, and

• to provide the data source for many machine-learning approaches.

Page 69: Natural language processing

Prediction

• The essential idea of prediction is:

• given a sequence of words,

• determine what is most likely to come next.

• Prediction is used for language modelling in automatic speech recognition.

• Speech recognisers cannot accurately determine a word from the sound signal.

• They cannot reliably tell where each word starts and finishes.

• The most probable word is chosen on the basis of the language model


Page 70: Natural language processing

• The language models which are currently most effective work on the basis of N-grams (a type of Markov chain), where the sequence of the prior n-1 words is used to predict the next.

• Trigram models use the preceding 2 words,

• bigram models use the preceding word and

• unigram models use no context at all, but simply work on the basis of individual word probabilities.
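The different context sizes can be illustrated with a short helper (the function name `ngrams` is our own):

```python
def ngrams(words, n):
    """Return the list of n-grams (as tuples) in a word sequence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "they can fish in december".split()
print(ngrams(words, 1))  # unigrams: single words, no context
print(ngrams(words, 2))  # bigrams:  each word with its predecessor
print(ngrams(words, 3))  # trigrams: each word with its two predecessors
```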


Page 71: Natural language processing

• Word prediction is also useful in communication aids:

• Systems for people who can't speak because of some form of disability.

• People who use text-to-speech systems to talk because of a non-linguistic disability usually have some form of general motor impairment which also restricts their ability to type at normal rates (stroke, ALS, cerebral palsy etc).


Page 72: Natural language processing

• Often they use alternative input devices, such as adapted keyboards, puffer switches, mouth sticks or eye trackers.

• Generally such users can only construct text at a few words a minute, which is too slow for anything like normal communication to be possible (normal speech is around 150 words per minute).

• As a partial aid, a word prediction system is sometimes helpful:

• This gives a list of candidate words that changes as the initial letters are entered by the user.

• The user chooses the desired word from a menu when it appears.

• The main difficulty with using statistical prediction models in such applications is in finding enough data: to be useful, the model really has to be trained on an individual speaker's output.


Page 73: Natural language processing

• Prediction is important in estimation of entropy, including estimations of the entropy of English.

• The notion of entropy is important in language modelling because it gives a metric for the difficulty of the prediction problem.

• For instance, speech recognition is much easier in situations where the speaker is only saying two easily distinguishable words than when the vocabulary is unlimited: measurements of entropy can quantify this.

• Other applications for prediction include optical character recognition (OCR), spelling correction and text segmentation.


Page 74: Natural language processing

Bigrams

• A bigram model assigns a probability to a word based on the previous word, P(w_n | w_{n-1}), where w_n is the nth word in some string.

• The probability of a string of words, P(W_1^n), is thus approximated by the product of these conditional probabilities:

• P(W_1^n) ≈ ∏_{k=1}^{n} P(w_k | w_{k-1})


Page 75: Natural language processing

Example

• good morning

• good afternoon

• good afternoon

• it is very good

• it is good

• We'll use the symbol <s> to indicate the start of an utterance, so the corpus really looks like:

• <s> good morning <s> good afternoon <s> good afternoon <s> it is very good <s> it is good <s>


Page 76: Natural language processing

• The bigram probabilities are estimated as

• P(w_n | w_{n-1}) = C(w_{n-1} w_n) / Σ_w C(w_{n-1} w)

• i.e., the count of a particular bigram, normalized by dividing by the total number of bigrams starting with the same word.
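As a sketch, the bigram estimates for the toy corpus above can be computed directly from counts, using Python's standard `collections.Counter` (all other names are ours):

```python
from collections import Counter

# The five-utterance corpus from the example, with <s> marking boundaries.
corpus = ("<s> good morning <s> good afternoon <s> good afternoon "
          "<s> it is very good <s> it is good <s>").split()

bigrams = Counter(zip(corpus, corpus[1:]))
starts = Counter(corpus[:-1])  # every token except the last starts a bigram

def p(word, prev):
    """P(word | prev) = C(prev word) / sum over w of C(prev w)."""
    return bigrams[(prev, word)] / starts[prev]

print(p("good", "<s>"))        # 3 of the 5 utterances start with 'good': 0.6
print(p("afternoon", "good"))  # 'good afternoon' occurs 2 of 5 times: 0.4
```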


Page 77: Natural language processing

Part of Speech (POS) tagging

• One important application is to part-of-speech tagging (POS tagging), where the words in a corpus are associated with a tag indicating some syntactic information that applies to that particular use of the word.

• For instance, consider the example sentence below:

• They can fish.

• This has two readings: one (the most likely) about the ability to fish, and the other about putting fish in cans. fish is ambiguous between a singular noun, a plural noun and a verb, while can is ambiguous between a singular noun, a verb (the `put in cans' use) and a modal verb. However, they is unambiguously a pronoun. (I am ignoring some less likely possibilities, such as proper names.)

• These distinctions can be indicated by POS tags:

• they PNP

• can VM0 VVB VVI NN1

• fish NN1 NN2 VVB VVI


Page 78: Natural language processing

• The meaning of the tags above is:

• NN1 singular noun

• NN2 plural noun

• PNP personal pronoun

• VM0 modal auxiliary verb

• VVB base form of verb (except infinitive)

• VVI infinitive form of verb (i.e. occurs with `to')


Page 79: Natural language processing

• A POS tagger resolves the lexical ambiguities to give the most likely set of tags for the sentence. In this case, the right tagging is likely to be:

• They PNP can VM0 fish VVB . PUN

• Note the tag for the full stop: punctuation is treated as unambiguous. POS tagging can be regarded as a form of very basic word sense disambiguation.

• The other syntactically possible reading is:

• They PNP can VVB fish NN2 . PUN


Page 80: Natural language processing

Stochastic POS tagging

• One form of POS tagging applies the N-gram technique that we saw above, but in this case it applies to the POS tags rather than the individual words.

• The most common approaches depend on a small amount of manually tagged training data from which POS N-grams can be extracted.
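A minimal sketch of this idea, with hand-set toy probabilities instead of counts extracted from tagged data, and a greedy decoder rather than the Viterbi search a real stochastic tagger would use (all names and numbers below are illustrative):

```python
# Toy lexical probabilities P(word | tag) and tag-bigram probabilities
# P(tag | previous tag), hand-set for illustration only.
lex = {
    "they": {"PNP": 1.0},
    "can":  {"VM0": 0.6, "VVB": 0.3, "NN1": 0.1},
    "fish": {"NN2": 0.4, "NN1": 0.2, "VVB": 0.3, "VVI": 0.1},
}
trans = {
    "<s>": {"PNP": 0.5, "NN1": 0.3, "NN2": 0.2},
    "PNP": {"VM0": 0.5, "VVB": 0.4, "NN1": 0.1},
    "VM0": {"VVI": 0.8, "VVB": 0.2},
    "VVB": {"NN2": 0.6, "NN1": 0.4},
    "VVI": {"NN2": 0.5, "NN1": 0.5},
}

def greedy_tag(words):
    """Pick, per word, the tag maximizing P(tag|prev tag) * P(word|tag)."""
    prev, tags = "<s>", []
    for w in words:
        best = max(lex[w],
                   key=lambda t: trans.get(prev, {}).get(t, 0.0) * lex[w][t])
        tags.append(best)
        prev = best
    return tags

print(greedy_tag(["they", "can", "fish"]))  # ['PNP', 'VM0', 'VVI']
```

With these toy numbers the modal reading wins; a real tagger would estimate both tables from a manually tagged corpus.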


Page 81: Natural language processing

Generative grammar

• concerned with the structure of phrases

• the big dog slept

• can be bracketed

• ((the (big dog)) slept)


Page 82: Natural language processing

Equivalent

• Two grammars are said to be weakly-equivalent if they generate the same strings.

• Two grammars are strongly equivalent if they assign the same bracketings to all strings they generate.


Page 83: Natural language processing

• In most, but not all, approaches, the internal structures are given labels.

• For instance, the big dog is a noun phrase (abbreviated NP),

• slept, slept in the park and licked Sandy are verb phrases (VPs).

• The labels such as NP and VP correspond to non-terminal symbols in a grammar.


Page 84: Natural language processing

Context free grammars

• A CFG has four components,

1. a set of non-terminal symbols (e.g., S, VP), conventionally written in uppercase;

2. a set of terminal symbols (i.e., the words), conventionally written in lowercase;

3. a set of rules (productions), where the left hand side (the mother) is a single non-terminal and the right hand side is a sequence of one or more non-terminal or terminal symbols (the daughters);

4. a start symbol, conventionally S, which is a member of the set of non-terminal symbols.
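The four components can be sketched as a small data structure. The grammar below is an assumed toy grammar for the they/can/fish examples, and the brute-force recognizer is our own illustration, not a practical parsing algorithm:

```python
# A toy CFG: non-terminals are the dict keys, terminals are lowercase
# words, each rule maps a mother to a list of daughter sequences.
grammar = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "VP"], ["V", "NP"], ["V"]],
    "NP": [["they"], ["fish"], ["rivers"]],
    "V":  [["can"], ["fish"]],
}
START = "S"
NONTERMINALS = set(grammar)

def derives(cat, words):
    """Yield once for each way cat derives exactly this word sequence."""
    if cat not in NONTERMINALS:       # terminal symbol
        if words == [cat]:
            yield True
        return
    for rhs in grammar[cat]:
        yield from match(rhs, words)

def match(rhs, words):
    """Yield once per way the daughter sequence rhs covers words."""
    if not rhs:
        if not words:
            yield True
        return
    first, rest = rhs[0], rhs[1:]
    for i in range(len(words) + 1):   # try every split point
        for _ in derives(first, words[:i]):
            yield from match(rest, words[i:])

def count_parses(sentence):
    return sum(1 for _ in derives(START, sentence.split()))

print(count_parses("they can fish"))  # 2 readings, as on the next slide
```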


Page 85: Natural language processing

• they fish

• (S (NP they) (VP (V fish)))

• they can fish

• (S (NP they) (VP (V can) (VP (V fish))))

• ;;; the modal verb `are able to' reading

• (S (NP they) (VP (V can) (NP fish)))

• ;;; the less plausible, put fish in cans, reading

• they fish in rivers

• (S (NP they) (VP (VP (V fish)) (PP (P in) (NP rivers))))


Page 86: Natural language processing

Parse Trees

• Parse trees are equivalent to bracketed structures, but are easier to read for complex cases.

• A parse tree and bracketed structure for one reading of they can fish in December is shown below.


Page 87: Natural language processing

(Figure: parse tree and bracketed structure for one reading of they can fish in December.)

Page 88: Natural language processing

Chart Parsing

• A chart is a list of edges.

• In the simplest version of chart parsing, each edge records a rule application and has the following structure:

• [id, left vertex, right vertex, mother category, daughters]

• A vertex is an integer representing a point in the input string, as illustrated below:

. they . can . fish .
0      1     2      3


Page 89: Natural language processing

• mother category is the category on the left hand side of the rule that has been applied to create the edge.

• daughters is a list of the edges that acted as the daughters for this particular rule application.
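A hand-built chart for they can fish can illustrate the edge structure (the edge ids, categories and daughter lists below are our own illustration, not the output of a real parser):

```python
# Edges follow the record layout [id, left vertex, right vertex,
# mother category, daughters], built by hand for "they can fish".
chart = [
    (1, 0, 1, "NP", []),       # 'they' spans vertices 0-1
    (2, 1, 2, "V",  []),       # 'can'  spans 1-2
    (3, 2, 3, "V",  []),       # 'fish' spans 2-3
    (4, 2, 3, "VP", [3]),      # VP -> V, built from edge 3
    (5, 1, 3, "VP", [2, 4]),   # VP -> V VP, built from edges 2 and 4
    (6, 0, 3, "S",  [1, 5]),   # S -> NP VP spanning the whole input
]

def spans(chart, left, right):
    """Return mother categories of edges between the given vertices."""
    return [m for (_, l, r, m, _) in chart if l == left and r == right]

print(spans(chart, 0, 3))  # ['S']: a parse covering the whole string
print(spans(chart, 2, 3))  # ['V', 'VP']: two edges over 'fish'
```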


Page 90: Natural language processing

Lexical Semantics

1. Meaning postulates

2. Classical lexical relations: hyponymy, meronymy, synonymy, antonymy

3. Taxonomies and WordNet

4. Classes of polysemy: homonymy, regular polysemy, vagueness

5. Word sense disambiguation


Page 91: Natural language processing

Meaning postulates

• Inference rules can be used to relate open class predicates.

• bachelor(x) -> man(x) ^ unmarried(x)

• But how would one define table, tomato or thought?
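A toy sketch of applying such a postulate as a forward-chaining inference rule (the fact representation and all names are our own assumptions):

```python
# Facts are (predicate, argument) pairs; a postulate maps a predicate
# to the predicates it entails, e.g. bachelor(x) -> man(x) ^ unmarried(x).
postulates = {"bachelor": ["man", "unmarried"]}

def close(facts):
    """Apply every postulate to every matching fact until fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pred, arg in list(derived):
            for consequent in postulates.get(pred, []):
                if (consequent, arg) not in derived:
                    derived.add((consequent, arg))
                    changed = True
    return derived

print(close({("bachelor", "kim")}))  # adds ('man','kim'), ('unmarried','kim')
```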


Page 92: Natural language processing

Hyponymy

• Hyponymy is the classical relation

• e.g. dog is a hyponym of animal

• That is, the relevant sense of dog is the hyponym of animal

• animal is the hypernym of dog


Page 93: Natural language processing

• Classes of words can be categorized by hyponymy

• Some nouns - biological taxonomies, human artefacts, professions etc

• Abstract nouns, such as truth,

• Some verbs can be treated as hyponyms of one another, e.g. murder is a hyponym of kill.

• Hyponymy is essentially useless for adjectives.

• Is coin a hyponym of money?

• coin might be metal and also money - multiple parents might be possible


Page 94: Natural language processing

Meronymy

• The standard examples of meronymy apply to physical relationships

• arm is part of a body (arm is a meronym of body);

• steering wheel is a meronym of car

• a soldier is part of an army


Page 95: Natural language processing

Synonymy and Antonymy

• True synonyms are relatively uncommon

• thin, slim, slender, skinny

• Antonymy is mostly discussed with respect to adjectives

• e.g., big/little, like/dislike


Page 96: Natural language processing

WordNet

• WordNet is the main resource for lexical semantics for English that is used in NLP, primarily because of its very large coverage and the fact that it's freely available.

• The primary organisation of WordNet is into synsets: synonym sets (near-synonyms). To illustrate this, the following is part of what WordNet returns as an `overview' of red:


Page 97: Natural language processing

• wn red -over

• Overview of adj red

• The adj red has 6 senses (first 5 from tagged texts)

• 1. (43) red, reddish, ruddy, blood-red, carmine, cerise, cherry, cherry-red, crimson, ruby, ruby-red, scarlet -- (having any of numerous bright or strong colors reminiscent of the color of blood or cherries or tomatoes or rubies)

• 2. (8) red, reddish -- ((used of hair or fur) of a reddish brown color; "red deer"; "reddish hair")

• Nouns in WordNet are organized by hyponymy


Page 98: Natural language processing

• Sense 6
big cat, cat
=> leopard, Panthera pardus
   => leopardess
   => panther
=> snow leopard, ounce, Panthera uncia
=> jaguar, panther, Panthera onca, Felis onca
=> lion, king of beasts, Panthera leo
   => lioness
   => lionet
=> tiger, Panthera tigris
   => Bengal tiger
   => tigress
=> liger
=> tiglon, tigon
=> cheetah, chetah, Acinonyx jubatus
=> saber-toothed tiger, sabertooth
   => Smiledon californicus
   => false saber-toothed tiger
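The fragment above can be sketched as a plain mapping from each synset to its direct hyponyms (names abbreviated; this is our own illustration, not the WordNet API):

```python
# Direct-hyponym table for part of the big-cat taxonomy.
hyponyms = {
    "big cat": ["leopard", "snow leopard", "jaguar", "lion", "tiger",
                "liger", "tiglon", "cheetah", "saber-toothed tiger"],
    "leopard": ["leopardess", "panther"],
    "lion": ["lioness", "lionet"],
    "tiger": ["Bengal tiger", "tigress"],
    "saber-toothed tiger": ["Smiledon californicus",
                            "false saber-toothed tiger"],
}

def all_hyponyms(word):
    """Transitive closure of the hyponymy relation below word."""
    result = []
    for child in hyponyms.get(word, []):
        result.append(child)
        result.extend(all_hyponyms(child))
    return result

print("tigress" in all_hyponyms("big cat"))  # True: tiger -> tigress
```

Walking the table transitively is how a taxonomy supports inference: anything true of big cat is inherited by tigress.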


Page 99: Natural language processing

Polysemy

• The standard example of polysemy is bank (river bank) vs bank (financial institution).

• teacher is intuitively general between male and female teachers

• Dictionaries are not much help


Page 100: Natural language processing

Word sense disambiguation

• Word sense disambiguation (WSD) is needed for most NL applications that involve semantics.

• What's changed since the 1980s is that various statistical or machine-learning techniques have been used to avoid hand-crafting rules.

• supervised learning

• unsupervised learning

• Machine readable dictionaries (MRDs)
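One classic MRD-based technique is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the context. The toy glosses below are our own, not from a real dictionary:

```python
# Two toy glosses for 'bank'; a real system would take these from an MRD.
senses = {
    "bank1": "sloping land beside a body of water such as a river",
    "bank2": "a financial institution that accepts deposits and lends money",
}

def lesk(word_senses, context):
    """Return the sense whose gloss overlaps most with the context."""
    ctx = set(context.lower().split())
    def overlap(sense):
        return len(ctx & set(word_senses[sense].split()))
    return max(word_senses, key=overlap)

print(lesk(senses, "she sat on the bank of the river"))  # bank1
```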


Page 101: Natural language processing

Discourse

• Utterances are always understood in a particular context. Context-dependent situations include:

1. Referring expressions: pronouns, definite expressions etc.

2. Universe of discourse: every dog barked, doesn't mean every dog in the world but only every dog in some explicit or implicit contextual set.

3. Responses to questions etc., which only make sense in a context:

Who came to the party? Not Sandy.

4. Implicit relationships between events: Max fell. John pushed him.


Page 102: Natural language processing

Rhetorical relations and coherence

• Consider the following discourse: Max fell. John pushed him.

• This discourse can be interpreted in at least two ways:

1. Max fell because John pushed him.

2. Max fell and then John pushed him.

• In 1 the link is a form of explanation, but 2 is an example of narration.

• because and and then are said to be cue phrases.


Page 103: Natural language processing

Coherence

• Discourses have to have connectivity to be coherent:

• Kim got into her car. Sandy likes apples.

• Both of these sentences make perfect sense in isolation, but taken together they are incoherent.

• Adding context can restore coherence:

• Kim got into her car. Sandy likes apples, so Kim thought she'd go to the farm shop and see if she could get some.

• The second sentence can be interpreted as an explanation of the first.


Page 104: Natural language processing

• For example, consider a system that reports share prices. This might generate:

• In trading yesterday: Dell was up 4.2%, Safeway was down 3.2%, Compaq was up 3.1%.

• This is much less acceptable than a connected discourse:

• Computer manufacturers gained in trading yesterday: Dell was up 4.2% and Compaq was up 3.1%. But retail stocks suffered: Safeway was down 3.2%.

• Here but indicates a Contrast.


Page 105: Natural language processing

Factors influencing discourse interpretation

• Cue phrases. These are sometimes unambiguous, but not usually (e.g., and).

• Punctuation (also prosody) and text structure.

• Max fell (John pushed him) and Kim laughed.

• Max fell, John pushed him and Kim laughed.

• Real world content:

• Max fell. John pushed him as he lay on the ground.

• Tense and aspect.

• Max fell. John had pushed him.

• Max was falling. John pushed him.


Page 106: Natural language processing

Referring expressions

• referent a real world entity that some piece of text (or speech) refers to.

• referring expressions bits of language used to perform reference by a speaker

• antecedent the text evoking a referent.

• anaphora the phenomenon of referring to an antecedent


Page 107: Natural language processing

Pronoun agreement

• Pronouns generally have to agree in number and gender with their antecedents.

• A little girl is at the door - see what she wants, please?

• My dog has hurt his foot - he is in a lot of pain.


Page 108: Natural language processing

Reflexives

• John_i cut himself_i shaving.

(himself = John; the subscript notation indicates coreference)

• Reflexive pronouns must be coreferential with a preceding argument of the same verb.


Page 109: Natural language processing

Pleonastic pronouns

• Pleonastic pronouns are semantically empty, and don't refer:

• It is snowing.

• It is not easy to think of good examples.

• It is obvious that Kim snores.


Page 110: Natural language processing

Salience

• There are a number of effects which cause particular pronoun referents to be preferred.

• Recency: more recent referents are preferred.

• Kim has a fast car. Sandy has an even faster one. Lee likes to drive it.

• it preferentially refers to Sandy's car, rather than Kim's.

• Grammatical role: subjects > objects > everything else.

• Fred went to the Grafton Centre with Bill. He bought a CD.

• he is more likely to be interpreted as Fred than as Bill.


Page 111: Natural language processing

• Repeated mention: entities that have been mentioned more frequently are preferred.

• Fred was getting bored. He decided to go shopping. Bill went to the Grafton Centre with Fred. He bought a CD.

• He = Fred (maybe), despite the general preference for subjects.

• Parallelism: entities which share the same role as the pronoun in the same sort of sentence are preferred.

• Bill went with Fred to the Grafton Centre. Kim went with him to Lion Yard.

• Him = Fred, because the parallel interpretation is preferred.


Page 112: Natural language processing

• Coherence effects The pronoun resolution may depend on the rhetorical/discourse relation that is inferred.

• Bill likes Fred. He has a great sense of humour.

• He = Fred preferentially, possibly because the second sentence is interpreted as an explanation of the first, and having a sense of humour is seen as a reason to like someone.


Page 113: Natural language processing

Applications

• Machine Translation

• Spoken Dialogue Systems

• Machine translation (MT) approaches:

• Direct transfer: map between morphologically analysed structures.

• Syntactic transfer: map between syntactically analysed structures.

• Semantic transfer: map between semantic structures.

• Interlingua: construct a language-neutral representation from parsing and use this for generation


Page 114: Natural language processing

MT using semantic transfer

Semantic transfer is an approach to MT which involves:

1. Parsing a source language string to produce a meaning representation.

2. Transforming that representation to one appropriate for the target language.

3. Generating Target Language from the transformed representation.
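The three steps can be caricatured in a few lines. Every function and lexicon entry below is a hypothetical stand-in: real systems produce and transform rich meaning representations, not word lists:

```python
# A deliberately toy parse -> transfer -> generate pipeline (EN -> FR).
transfer_lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}

def parse(sentence):
    """Stand-in 'parser': a flat list of predicates, one per word."""
    return [("pred", w) for w in sentence.split()]

def transfer(sem):
    """Map source-language predicates to target-language predicates."""
    return [("pred", transfer_lexicon.get(w, w)) for (_, w) in sem]

def generate(sem):
    """Stand-in 'generator': realize the predicates as a string."""
    return " ".join(w for (_, w) in sem)

print(generate(transfer(parse("the cat sleeps"))))  # le chat dort
```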


Page 115: Natural language processing

Dialogue Systems

Turn-taking: generally there are points where a speaker invites someone else to take a turn (possibly choosing a specific person), explicitly (e.g., by asking a question) or otherwise.

Pauses: pauses between turns are generally very short (a few hundred milliseconds, but highly culture specific).

• Longer pauses are assumed to be meaningful: example from Levinson (1983: 300)

A: Is there something bothering you or not? (1.0 sec pause)

A: Yes or no? (1.5 sec pause)

A: Eh?

B: No.


Page 116: Natural language processing

• Overlap: utterances can overlap (the acceptability of this is dialect/culture specific), and unfortunately humans tend to interrupt automated systems; this is known as barge-in.

• Backchannel: utterances like Uh-huh, OK can occur during another speaker's utterance as a sign that the hearer is paying attention.

• Attention: The speaker needs reassurance that the hearer is understanding/paying attention. Often eye contact is enough, but this is problematic with telephone conversations, dark sunglasses, etc. Dialogue systems should give explicit feedback.

• Cooperativity: Because participants assume the others are cooperative, we get effects such as indirect answers to questions.

• When do you want to leave?

• My meeting starts at 3pm.


Page 117: Natural language processing

1. Single initiative systems (also known as system initiative systems): system controls what happens when.

• System: Which station do you want to leave from?

• User: King's Cross

2. Mixed initiative dialogue. Both participants can control the dialogue to some extent.

• System: Which station do you want to leave from?

• User: I don't know, tell me which station I need for Cambridge.

3. Dialogue tracking


Page 118: Natural language processing

Email response

• The LinGO ERG constraint-based grammar has been used for parsing in commercially-deployed email response systems.

Well, I am going on vacation for the next two weeks, so the first day that I would be able to meet would be the eighteenth.

If you give me your name and your address we will send you the ticket.

The morning is good, but nine o'clock might be a little too late, as I have a seminar at ten o'clock.


Page 119: Natural language processing

References

• James Allen, Natural Language Understanding, Second Edition, Benjamin-Cummings Publishing Co., Inc. USA, 1995.

• Eugene Charniak, Statistical Language Learning, MIT Press, USA, 1996.

• Daniel Jurafsky and James H. Martin, Speech and Language Processing, Second Edition, Prentice Hall, 2008.

• Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.


Page 120: Natural language processing

Questions?
