
IN4080 – 2019 Fall: Natural Language Processing

Jan Tore Lønning

1

Lecture 8, 8 Oct

Information extraction, pipelines

2

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

3

Sentences have inner structure

So far: a sentence as a flat sequence of words

Properties of words: morphology, tags, embeddings

Probabilities of sequences

But: sentences have inner structure

The structure determines whether the sentence is grammatical or not

The structure determines how to understand the sentence

4

Why syntax?

Some sequences of words are

well-formed meaningful

sentences.

Others are not:

Are meaningful of some sentences

sequences well-formed words

It makes a difference:

A dog bit the man.

The man bit a dog.

BOW-models don't capture this

difference

5

Two ways to describe sentence structure

6

Phrase structure (focus of INF2820)

Dependency structure (focus of IN2110)

Constituents and phrases

Constituent: A group of words that functions as a unit in the sentence

See Wikipedia: Constituent for criteria of constituency

Phrase: A sequence of words which "belong together"

= constituent (for us)

In some theories a phrase is a constituent of more than one word

7

NP

Mary

The small, cute dog

The dog from Baskerville

You

V

ate

saw

enjoyed

NP

the apple

the small, cute dog

the apple that Kim had stolen from the store

it

(The V together with the object NP forms a VP)

Phrases can be classified into categories:

Noun Phrases, Verb Phrases, Prepositional Phrases, etc.

Phrases of the same category have similar distribution,

e.g. NPs can replace names

(but there are restrictions on case, number, person, gender agreement, etc.)

Phrases of the same category have similar structure, simplified:

NP (roughly): (DET) ADJ* N PP* (+ some alternatives, e.g. pronoun)

PP: PREP NP

8

Phrase structure

A sentence is hierarchically

ordered into phrases

Various syntactic theories, models and NLP tools differ with respect to the actual trees:

Models based on X-bar theory prefer "deep trees": binary branching

The Penn treebank prefers shallow trees

9

A Penn treebank tree

10

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

11

Treebanks

A collection of analyzed sentences/trees

Penn treebank is best known

12

13

Treebanks

Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one).

These are generally created

By first parsing the collection with an automatic parser

And then having human annotators correct each parse as necessary.

This requires detailed annotation guidelines that provide a POS tagset, a grammar and instructions for how to deal with particular grammatical constructions.

Treebank Grammars

Treebanks implicitly define a grammar for the language covered in the treebank.

Such grammars tend to be very flat because they tend to avoid recursion

to ease the annotators' burden

For example, the Penn Treebank has 4500 different rules for VPs. Among them...

Speech and Language Processing - Jurafsky and Martin

14

Different types of treebanks

Hand-made:

Human annotators assign trees

The trees define a grammar: many rules

Penn uses flat trees

Parse bank:

Start with a grammar and a parser

Parse the sentences

A human annotator selects the best analysis among the candidates

May be used for training a parse ranker

October 7, 2019

15

Treebanks

Free dependency treebanks are available for many languages

The place to start these days: http://universaldependencies.org/

CoNLL formats:

One word per line, a number of columns for various information

CoNLL-X, CoNLL-U – different POS tagsets

16

From Andrei's INF5830 slides
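To make the format concrete, here is a minimal sketch of a CoNLL-U reader; the sample sentence is invented, and real files also contain comment lines and multiword-token ranges that this sketch glosses over:

```python
# Hedged sketch of reading CoNLL-U text: one token per line, ten
# tab-separated columns, blank line between sentences.
CONLLU_SAMPLE = """\
1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_
"""

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def read_conllu(text):
    """Yield sentences as lists of {field: value} dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # sentence break or comment
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:
        yield sentence

for sent in read_conllu(CONLLU_SAMPLE):
    print([(tok["form"], tok["upos"], tok["deprel"]) for tok in sent])
# → [('The', 'DET', 'det'), ('dog', 'NOUN', 'nsubj'), ('barks', 'VERB', 'root')]
```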

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

17

IE basics

Bottom-up approach:

Start with unrestricted texts, and do the best you can

The approach was developed in particular through the Message Understanding Conferences (MUC) in the 1990s

Select a particular domain and task

18

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. (Wikipedia)

Steps

19

(Some approaches do these steps in a different order – or simultaneously.)

[Figure: the IE pipeline steps, from NLTK]

Some example systems

20

Stanford core nlp: http://corenlp.run/

SpaCy (Python): https://spacy.io/docs/api/

OpenNLP (Java): https://opennlp.apache.org/docs/

GATE (Java): https://gate.ac.uk/

UDPipe: http://ufal.mff.cuni.cz/udpipe

Online demo: http://lindat.mff.cuni.cz/services/udpipe/

Today

Dependency parsing:

Wrap-up

Evaluation

Treebanks and pipelines

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

21

Next steps

Chunk together words to phrases

22

NP-chunks

Exactly what is an NP-chunk?

It is an NP

But not all NPs are chunks

Flat structure: no NP-chunk is part of another NP chunk

Maximally large

Opposing restrictions

23

[ The/DT market/NN ] for/IN

[ system-management/NN software/NN ] for/IN

[ Digital/NNP ]

[ 's/POS hardware/NN ] is/VBZ fragmented/JJ enough/RB that/IN

[ a/DT giant/NN ] such/JJ as/IN

[ Computer/NNP Associates/NNPS ] should/MD do/VB well/RB there/RB ./.

Regular Expression Chunker

Input: POS-tagged sentences

Use a regular expression over POS to identify NP-chunks

NLTK example:

It inserts parentheses

24

grammar = r"""NP: {<DT|PP\$>?<JJ>*<NN>}

{<NNP>+} """
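The slide's pattern can be run with nltk.RegexpParser; as a self-contained illustration of the same idea, here is a pure-Python sketch that matches the tag pattern over the sentence's tag sequence (helper names are mine, not NLTK's):

```python
import re

# The first NP pattern from the slide: optional determiner/possessive,
# any number of adjectives, then a noun.
NP_PATTERN = re.compile(r"(?:<(?:DT|PP\$)>)?(?:<JJ>)*<NN>")

def np_chunks(tagged_sentence):
    """Return NP chunks (sublists of (word, tag)) found by the tag regex."""
    # Encode the tag sequence as a string of <TAG> units.
    tags = "".join("<%s>" % tag for _, tag in tagged_sentence)
    # Record where each token's tag starts in that string.
    offsets = []
    pos = 0
    for _, tag in tagged_sentence:
        offsets.append(pos)
        pos += len(tag) + 2
    chunks = []
    for m in NP_PATTERN.finditer(tags):
        start = offsets.index(m.start())          # matches align on tag units
        end = sum(1 for o in offsets if o < m.end())
        chunks.append(tagged_sentence[start:end])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("bear", "NN"),
            ("saw", "VBD"), ("the", "DT"), ("fine", "JJ"), ("squirrel", "NN")]
print(np_chunks(sentence))
```

Instead of inserting parentheses into the text, this version returns the chunks as token lists; nltk.RegexpParser returns the corresponding bracketed tree.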

IOB-tags

B-NP: First word in NP

I-NP: Part of NP, not first word

O: Not part of NP (phrase)

Properties

One tag per token

Unambiguous

Does not insert anything in the text itself

25
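A minimal sketch of the mapping (the sentence and chunk spans are invented for illustration):

```python
def iob_tags(tokens, np_spans):
    """Map NP chunk spans (start, end) onto one IOB tag per token:
    B-NP for the first word of a chunk, I-NP inside, O outside."""
    tags = ["O"] * len(tokens)
    for start, end in np_spans:
        tags[start] = "B-NP"
        for i in range(start + 1, end):
            tags[i] = "I-NP"
    return tags

tokens = ["We", "saw", "the", "yellow", "dog"]
# One tag per token; nothing is inserted into the text itself.
print(list(zip(tokens, iob_tags(tokens, [(0, 1), (2, 5)]))))
# → [('We', 'B-NP'), ('saw', 'O'), ('the', 'B-NP'), ('yellow', 'I-NP'), ('dog', 'I-NP')]
```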

Assigning IOB-tags

The process can be considered a form of tagging:

POS-tagging: word to POS-tag

IOB-tagging: POS-tag to IOB-tag

But one may also use additional features, e.g. words

Can use various types of classifiers

NLTK uses a MaxEnt Classifier (=LogReg, but the implementation is slow)

We can modify along the lines of mandatory assignment 2, using scikit-learn

26
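A sketch of what such a feature function could look like (the feature names are my own; the dicts could be fed to e.g. scikit-learn's DictVectorizer and LogisticRegression, along the lines of mandatory assignment 2):

```python
def iob_features(sentence, i, history):
    """Features for the token at position i.
    sentence: [(word, pos), ...]; history: IOB tags assigned so far,
    so left-to-right decoding can condition on previous tags."""
    word, pos = sentence[i]
    return {
        "pos": pos,
        "word.lower": word.lower(),
        "prev_pos": sentence[i - 1][1] if i > 0 else "<START>",
        "next_pos": sentence[i + 1][1] if i < len(sentence) - 1 else "<END>",
        "prev_iob": history[-1] if history else "<START>",
    }

sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN")]
print(iob_features(sent, 1, ["B-NP"]))
```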

27

J&M, 3. ed.

Evaluating (IOB-)chunkers

cp = nltk.RegexpParser("")
test_sents = nltk.corpus.conll2000.chunked_sents('test.txt', chunk_types=['NP'])
print(cp.evaluate(test_sents))

IOB Accuracy: 43.4%
Precision: 0.0%
Recall: 0.0%
F-Measure: 0.0%

What do we evaluate?

IOB-tags? or

Whole chunks?

Yields different results

For IOB-tags:

Baseline: majority class O, yields > 33%

Whole chunks:

Which chunks did we find?

Harder

Lower numbers

28
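A sketch of whole-chunk evaluation with exact span matching (the spans are invented; this is the stricter evaluation that yields the lower numbers):

```python
def chunk_prf(gold_spans, pred_spans):
    """Exact-match chunk evaluation: a predicted (start, end) span is
    correct only if the same span occurs among the gold chunks."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One of two predicted chunks matches a gold chunk exactly:
print(chunk_prf(gold_spans=[(0, 2), (3, 5)], pred_spans=[(0, 2), (3, 4)]))
# → (0.5, 0.5, 0.5)
```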

Evaluating (IOB-)chunkers

cp = nltk.RegexpParser("")
test_sents = nltk.corpus.conll2000.chunked_sents('test.txt', chunk_types=['NP'])
print(cp.evaluate(test_sents))

IOB Accuracy: 43.4%
Precision: 0.0%
Recall: 0.0%
F-Measure: 0.0%

cp = nltk.RegexpParser(r"NP: {<[CDJNP].*>+}")
print(cp.evaluate(test_sents))

IOB Accuracy: 87.7%
Precision: 70.6%
Recall: 67.8%
F-Measure: 69.2%

29

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

30

Named entities

31

Named entity: anything you can refer to by a proper name

i.e. not all NP chunks: high fuel prices

Maybe a longer NP than just the chunk: Bank of America

Find the phrases

Classify them

Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PER Tim Wagner] said. [ORG United], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco].

Types of NE

The set of types varies between different systems

Which classes are useful depends on the application

32

Ambiguities

33

Gazetteer

Useful: lists of names, e.g. a gazetteer: a list of geographical names

But this does not remove all ambiguities, cf. the example

34

Representation (IOB)

35

Feature-based NER

Similar to tagging and chunking

You will need features from several layers

Features may include

Words, POS-tags, chunk-tags, graphical properties

and more (see J&M, 3. ed.)

36

Feature-based NER algorithms

37

Greedy decoding

"Word by word": decide for the first word, then for the second word, etc.

Can use various learners, e.g. logistic regression (MaxEnt)

We can use our set-up for mandatory assignment 2 with small adjustments

For shortcomings and better alternatives, cf. J&M, 3. ed., ch. 8:

Maximum Entropy Markov Models (MEMM)

Conditional random fields (the preferred approach until recently)

Neural NER

In recent years, neural architectures have shown the best results

J&M, 3. ed., ch. 17, sec. 17.1.3; not curriculum in IN4080

IN5550

38
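The greedy decoding loop itself is short; in this sketch, toy feature and classifier functions stand in for a trained model:

```python
def greedy_decode(sentence, features, classify):
    """Tag word by word, left to right; the features may look at the
    tags already assigned (the history), which is what makes this
    greedy rather than an exact sequence model such as an MEMM or CRF."""
    history = []
    for i in range(len(sentence)):
        history.append(classify(features(sentence, i, history)))
    return history

# Toy stand-ins: capitalized words start or continue a PER entity.
def toy_features(sentence, i, history):
    return {"istitle": sentence[i][0].istitle(),
            "prev": history[-1] if history else "O"}

def toy_classify(feats):
    if feats["istitle"]:
        return "I-PER" if feats["prev"] in ("B-PER", "I-PER") else "B-PER"
    return "O"

sent = [("Tim", "NNP"), ("Wagner", "NNP"), ("said", "VBD")]
print(greedy_decode(sent, toy_features, toy_classify))
# → ['B-PER', 'I-PER', 'O']
```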

Evaluation

Have we found the correct NEs?

Evaluate precision and recall as for chunking

For the correctly identified NEs, have we labelled them correctly?

39

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

40

Goal

Extract the relations that exist between the (named) entities in the text

A fixed set of relations (normally)

Determined by the application:

Jeopardy

Preventing terrorist attacks

Detecting illness from medical records

41

• Born_in

• Date_of_birth

• Parent_of

• Author_of

• Winner_of

• Part_of

• Located_in

• Acquire

• Threaten

• Has_symptom

• Has_illness

Examples

42

Methods for relation extraction

43

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised

1. Hand-written patterns

Example: acquisitions

[ORG] … ( buy(s) | bought | acquire(s|d) ) … [ORG]

Hand-write patterns like this

Properties:

High precision

Will only cover a small set of patterns

Low recall

Time consuming

(Also in NLTK, sec. 7.6)

44
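A sketch of such a pattern in Python, assuming ORG entities have already been marked up as [ORG ...] by an earlier NER step (the markup convention is illustrative, not a standard format):

```python
import re

# Hand-written ACQUIRE pattern: two marked-up organizations with an
# acquisition trigger verb between them, and no other entity in between.
ACQUIRE = re.compile(
    r"\[ORG (?P<buyer>[^\]]+)\]"                              # first ORG
    r"[^\[]*\b(?:buys?|bought|acquires?|acquired)\b[^\[]*"    # trigger verb
    r"\[ORG (?P<target>[^\]]+)\]")                            # second ORG

text = "[ORG IBM] has acquired [ORG AlchemyAPI] to bolster its Watson unit."
m = ACQUIRE.search(text)
print(("ACQUIRE", m.group("buyer"), m.group("target")))
# → ('ACQUIRE', 'IBM', 'AlchemyAPI')
```

High precision, but only sentences phrased with exactly these trigger words are matched, which is why recall stays low.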

Example

45

Methods for relation extraction

46

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised

2. Supervised classifiers

47

A corpus

A fixed set of entities and relations

The sentences in the corpus are hand-annotated:

Entities

Relations between them

Split the corpus into parts for training and testing

Train a classifier:

Choose learner: Naive Bayes, Logistic regression (Max Ent), SVM, …

Select features

2. Supervised classifiers, contd.

48

Training:

Use pairs of entities within the same sentence with no relation between them as negative data

Classification:

1. Find the NEs

2. For each pair of NEs, determine whether there is a relation between them

3. If there is, label the relation
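Steps 2–3 amount to classifying every entity pair in the sentence; a sketch in which classify stands in for a hypothetical trained model returning a relation label or None:

```python
from itertools import combinations

def extract_relations(entities, classify):
    """entities: NEs found in one sentence, e.g. [(name, type), ...].
    Every pair is a candidate; pairs the classifier rejects play the
    role of the negative class, mirroring how training data is built."""
    relations = []
    for e1, e2 in combinations(entities, 2):
        label = classify(e1, e2)
        if label is not None:
            relations.append((e1[0], label, e2[0]))
    return relations

# Toy stand-in classifier: any ORG followed by a PER -> spokesman_of.
def toy_classify(e1, e2):
    if e1[1] == "ORG" and e2[1] == "PER":
        return "spokesman_of"
    return None

ents = [("American Airlines", "ORG"), ("AMR", "ORG"), ("Tim Wagner", "PER")]
print(extract_relations(ents, toy_classify))
```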

Examples of features

49

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said

Properties

50

The bottleneck is the availability of training data

To hand label data is time consuming

Mostly applied to restricted domains

Does not generalize well to other domains

Methods for relation extraction

51

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised

3. Semisupervised, bootstrapping

If we know a pattern for a relation, we can determine whether a pair stands in the relation

Conversely: if we know that a pair stands in the relation, we can find patterns that describe the relation

52

Pairs:

IBM – AlchemyAPI

Google – YouTube

Facebook - WhatsApp

Patterns:

[ORG]…bought…[ORG]

Relation

ACQUIRE

Example

53

(IBM, AlchemyAPI): ACQUIRE

Search for sentences containing IBM and AlchemyAPI

Results (web search, Google, among the first 10 results):

IBM's Watson makes intelligent acquisition of Denver-based AlchemyAPI (Denver Post)

IBM is buying machine-learning systems maker AlchemyAPI Inc. to bolster its Watson technology as competition heats up in the data analytics and artificial intelligence fields. (Bloomberg)

IBM has acquired computing services provider AlchemyAPI to broaden its portfolio of Watson-branded cognitive computing services. (ComputerWorld)

Example, contd.

54

Extract patterns

IBM's Watson makes intelligent acquisition of Denver-based AlchemyAPI (Denver Post)

IBM is buying machine-learning systems maker AlchemyAPI Inc. to bolster its Watson technology as competition heats up in the data analytics and artificial intelligence fields. (Bloomberg)

IBM has acquired computing services provider AlchemyAPI to broaden its portfolio of Watson-branded cognitive computing services. (ComputerWorld)

Procedure

From the extracted sentences, we extract patterns

Use these patterns to extract more pairs of entities that match these patterns

These pairs may again be used for extracting more patterns, etc.

…makes intelligent acquisition …

… is buying …

… has acquired …

55
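The loop above can be sketched as follows; here a "pattern" is simply the literal string between the two entities, whereas a real system would generalize and score the patterns:

```python
def bootstrap(sentences, seed_pairs, rounds=2):
    """Toy bootstrapping loop: seed pairs -> patterns -> new pairs -> ...
    Patterns are literal in-between strings; entity matching is a raw
    substring test, so this is an illustration, not a usable extractor."""
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        # Step 1: harvest patterns from sentences containing a known pair.
        for s in sentences:
            for a, b in pairs:
                if a in s and b in s and s.index(a) < s.index(b):
                    patterns.add(s[s.index(a) + len(a):s.index(b)].strip())
        # Step 2: use the patterns to extract new candidate pairs.
        for s in sentences:
            for p in patterns:
                if p and p in s:
                    left, _, right = s.partition(p)
                    if left.strip() and right.strip():
                        pairs.add((left.strip().split()[-1],
                                   right.strip().split()[0]))
    return pairs, patterns

sentences = ["IBM has acquired AlchemyAPI",
             "Google has acquired YouTube",
             "Facebook is buying WhatsApp"]
pairs, patterns = bootstrap(sentences, {("IBM", "AlchemyAPI")})
print(sorted(pairs))
```

Note that (Facebook, WhatsApp) is not found: no seed pair ever produces the "is buying" pattern, which is exactly why each round should harvest and evaluate new patterns.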

Bootstrapping

56

A little more

57

We could

either extract pattern templates and search for these

or extract features for classification and build a classifier

If we use patterns, we should generalize:

makes intelligent acquisition → (make(s)|made) JJ* acquisition

During the process we should evaluate before we extend:

Does the new pattern recognize other pairs we know stand in the relation?

Does the new pattern return pairs that are not in the relation? (Precision)

Methods for relation extraction

58

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised

4. Distant supervision for RE

Combine:

A large external knowledge base, e.g. Wikipedia, WordNet

Large amounts of unlabeled text

Extract tuples that stand in a known relation from the knowledge base:

Many tuples

Follow the bootstrapping technique on the text

59
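The labeling step can be sketched as follows (the knowledge base and sentences are invented; real systems match entity mentions far more carefully than raw substring tests):

```python
def distant_label(sentences, knowledge_base):
    """Distant supervision sketch: any sentence mentioning both members
    of a pair the knowledge base relates becomes a (noisy) positive
    training example for that relation."""
    examples = []
    for s in sentences:
        for (a, b), rel in knowledge_base.items():
            if a in s and b in s:
                examples.append((s, a, b, rel))
    return examples

kb = {("Oslo", "Norway"): "Located_in"}
sents = ["Oslo is the capital of Norway.", "Bergen is rainy."]
print(distant_label(sents, kb))
# → [('Oslo is the capital of Norway.', 'Oslo', 'Norway', 'Located_in')]
```

Because the knowledge base supplies many tuples, this produces large (if noisy) training sets, which is what allows the fine-grained features mentioned below.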

4. Distant supervision for RE

Properties:

Large data sets allow for

fine-grained features

combinations of features

evaluation

Requirement:

A large knowledge base

60

Methods for relation extraction

61

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised

5. Unsupervised relation extraction

Open IE

Example:

1. Tag and chunk

2. Find all word sequences satisfying certain syntactic constraints, in particular containing a verb; these are taken to be the relations

3. For each such sequence, find the immediate non-vacuous NP to the left and to the right

4. Assign a confidence score

United has a hub in Chicago, which is the headquarters of United Continental Holdings.

r1: <United, has a hub in, Chicago>

r2: <Chicago, is the headquarters of, United Continental Holdings>

62
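Steps 2–3 can be sketched over pre-chunked input; the segment labels (NP/REL) are assumed to come from the tagging, chunking, and constraint matching in steps 1–2:

```python
def open_ie(chunked):
    """chunked: list of ("NP", text) or ("REL", text) segments in order.
    For each relation phrase, take the nearest NP to its left and right
    as arguments, yielding <arg1, rel, arg2> triples."""
    triples = []
    for i, (label, text) in enumerate(chunked):
        if label != "REL":
            continue
        left = next((t for l, t in reversed(chunked[:i]) if l == "NP"), None)
        right = next((t for l, t in chunked[i + 1:] if l == "NP"), None)
        if left and right:
            triples.append((left, text, right))
    return triples

# The slide's example sentence, pre-segmented:
segments = [("NP", "United"), ("REL", "has a hub in"), ("NP", "Chicago"),
            ("REL", "is the headquarters of"),
            ("NP", "United Continental Holdings")]
print(open_ie(segments))
# → [('United', 'has a hub in', 'Chicago'),
#    ('Chicago', 'is the headquarters of', 'United Continental Holdings')]
```

These are the slide's r1 and r2; the confidence scoring of step 4 is skipped here, as it is below.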

Evaluating relation extraction

Supervised methods can be evaluated on each of the examples in a test set.

For the semi-supervised methods:

we don't have a test set

we can evaluate the precision of the returned examples manually

Beware the difference between:

determining for a sentence whether an entity pair in the sentence is in a particular relation (recall and precision)

determining from a text: we may use several occurrences of the pair in the text to draw a conclusion (precision)

63

We skip the confidence scoring

More fine-grained IE

Tokenization + tagging

Identifying the "actors":

Chunking

Named-entity recognition

Co-reference resolution

Relation detection

Event detection

Co-reference resolution of events

Temporal extraction

Template filling

64

(The slide contrasts what we have covered so far with possible refinements, e.g. co-reference resolution, event detection, temporal extraction, and template filling.)

Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines

65

Some example systems

66

Stanford core nlp: http://corenlp.run/

SpaCy (Python): https://spacy.io/docs/api/

OpenNLP (Java): https://opennlp.apache.org/docs/

GATE (Java): https://gate.ac.uk/

UDPipe: http://ufal.mff.cuni.cz/udpipe

Online demo: http://lindat.mff.cuni.cz/services/udpipe/