
Introduction to Categorical Compositional Distributional Semantics

Lecture 4: Using Density Matrices for Ambiguity and Entailment

Dimitri Kartsaklis¹  Martha Lewis²

¹ Department of Theoretical and Applied Linguistics, University of Cambridge

² Department of Computer Science, University of Oxford

ESSLLI 2017

Toulouse, France

D. Kartsaklis, M. Lewis Introduction to CCDS - Lecture 4: Density Matrices 1/47


What we’ve seen so far

We’ve described compositional and distributional models, and looked at elementary category theory.

We’ve combined these together to give the core of the CCDS model.

We’ve described some of the tasks that CCDS can be used for.

We’ve seen how adding extra structure in the form of Frobenius algebras allows us to model functional words such as prepositions, relative pronouns and so on.


This talk in a nutshell

Inspired by categorical quantum mechanics, we show how the CCDS model can be extended.

We show how lexical ambiguity can be modelled, and how using ambiguous words in a sentence can disambiguate them.

We discuss lexical entailment, and use an order on density matrices to give a type of entailment.

We show how the notion of lexical entailment we use interacts well with compositionality.


Outline

1 Composition and lexical ambiguity

2 Open system quantum semantics

3 Textual entailment

4 Graded hyponymy

5 From theory to practice


Ambiguity in word spaces

Compositional distributional models of meaning are mainly based on ambiguous semantic spaces:

[Figure: word vectors projected onto two dimensions. The vector for organ lies between a medical cluster (donor, transplant, transplantation, liver, kidney, lung; labelled organ (medicine)) and a musical cluster (accompaniment, bass, orchestra, hymn, recital, violin, concert; labelled organ (music)).]

∗real vectors projected onto a 2-dimensional space using MDS


Lexical ambiguity and composition

Is this the best we can do?

$\vec{\textit{organ}} = \vec{\textit{organ}}_{music} + \vec{\textit{organ}}_{med}$

Then why not have vectors like this:

$\vec{\textit{guitar}} + \vec{\textit{kidney}}$

...or even this?

$\vec{\textit{book}} + \vec{\textit{banana}}$

Kartsaklis and Sadrzadeh (EMNLP 2013, ACL 2014):

Using a step of prior disambiguation on the word vectors/tensors before composition improves the quality of the composed vectors.


Homonymy and polysemy (1/2)

We distinguish between two types of lexical ambiguity:

In cases of homonymy (organ, bank, vessel etc.), due to some historical accident the same word is used to describe two (or more) completely unrelated concepts.

Polysemy relates to subtle deviations between the different senses of the same word.

Example:

The distinction between the financial sense and the river sense of bank is a case of homonymy;

Within the financial sense, a distinction between the abstract concept of bank as an institution and the concrete building is a case of polysemy.


Homonymy and polysemy (2/2)

Example #1: “I went to the bank to open a savings account”

The word bank is used with its financial sense

The sayer refers to both of the polysemous meanings of bank_fin (institution and building) at the same time

Example #2: “I went to the bank”

The word bank is probably used with the financial sense in mind (because most of the time this is the case)

However, a small possibility that the sayer has actually visited a river bank still exists

Main point:

Polysemy: Relatively coherent and self-contained concepts
Homonymy: Lack of specification


Setting our goals

The problem:

How can we formalize the explicit treatment of lexical ambiguity in the categorical model of Coecke et al.?

We seek a model that will allow us:

1 to express homonymous words as probabilistic mixings of their individual meanings;

2 to retain the ambiguity until the presence of sufficient context that will eventually resolve it during composition time;

3 to achieve all the above in the multi-linear setting imposed by the vector space semantics of our original model.


Outline

1 Composition and lexical ambiguity

2 Open system quantum semantics

3 Textual entailment

4 Graded hyponymy

5 From theory to practice


A little quantum theory

Quantum mechanics and distributional models of meaning are both based on vector space semantics

The state of a quantum system is represented by a vector in a Hilbert space $H$. Fixing a basis for $H$:

$|\psi\rangle = c_1|k_1\rangle + c_2|k_2\rangle + \dots + c_n|k_n\rangle$

we take $|\psi\rangle$ to be a quantum superposition of the basis states $\{|k_i\rangle\}_i$.

i.e. the quantum system co-exists in all basis states in parallel, with strengths denoted by the corresponding weights

Such a state is called a pure state.


Word vectors as quantum states

We take words to be quantum systems, and word vectors specific states of these systems:

$|w\rangle = c_1|k_1\rangle + c_2|k_2\rangle + \dots + c_n|k_n\rangle$

Each element of the ONB $\{|k_i\rangle\}_i$ is essentially an atomic symbol:

$|\textit{cat}\rangle = 12|\textit{milk}'\rangle + 8|\textit{cute}'\rangle + \dots + 0|\textit{bank}'\rangle$

In other words, a word vector is a probability distribution over atomic symbols

$|w\rangle$ is a pure state: when word $w$ is seen alone, it is like co-occurring with all the basis words with strengths denoted by the various coefficients.


Encoding homonymy with mixed states

Ideally, every disjoint meaning of a homonymous word must be represented by a distinct pure state:

$|\textit{bank}_{fin}\rangle = a_1|k_1\rangle + a_2|k_2\rangle + \dots + a_n|k_n\rangle$
$|\textit{bank}_{riv}\rangle = b_1|k_1\rangle + b_2|k_2\rangle + \dots + b_n|k_n\rangle$

$\{a_i\}_i \neq \{b_i\}_i$, since the financial sense and the river sense are expected to be seen in drastically different contexts

So we have two distinct states describing the same system

We cannot be certain under which state our system may be found – we only know that the former state is more probable than the latter

In other words, the system is better described by a probabilistic mixture of pure states, i.e. a mixed state.


Density operators

Mathematically, a mixed state is represented by a density operator:

$\rho(w) = \sum_i p_i |s_i\rangle\langle s_i|$

For example:

$\rho(\textit{bank}) = 0.80\,|\textit{bank}_{fin}\rangle\langle \textit{bank}_{fin}| + 0.20\,|\textit{bank}_{riv}\rangle\langle \textit{bank}_{riv}|$

A density operator is a probability distribution over vectors.

Properties of a density operator $\rho$:

Positive semi-definite: $\langle v|\rho|v\rangle \geq 0\ \ \forall v \in H$

Of trace one: $\mathrm{Tr}(\rho) = 1$

Self-adjoint: $\rho = \rho^{\dagger}$
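As a concrete sketch, the mixed state for bank and the three defining properties can be checked with numpy (the 0.8/0.2 weights follow the slide; the 3-dimensional sense vectors are invented for illustration):

```python
import numpy as np

# Hypothetical unit vectors for the two senses of "bank" (toy 3-d space)
bank_fin = np.array([0.9, 0.1, 0.0])
bank_fin /= np.linalg.norm(bank_fin)
bank_riv = np.array([0.0, 0.2, 0.8])
bank_riv /= np.linalg.norm(bank_riv)

# rho(bank) = 0.80 |fin><fin| + 0.20 |riv><riv|
rho = 0.8 * np.outer(bank_fin, bank_fin) + 0.2 * np.outer(bank_riv, bank_riv)

assert np.isclose(np.trace(rho), 1.0)             # of trace one
assert np.allclose(rho, rho.conj().T)             # self-adjoint
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)  # positive semi-definite
```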


Quantum measurements

Density operators interact with observables to produce quantum measurements.

Assuming a system in state $|\psi\rangle$ and an observable $A$ with eigen-decomposition $A = \sum_i e_i |e_i\rangle\langle e_i|$:

$\langle A\rangle_\psi = \langle\psi|A|\psi\rangle = \sum_i |\langle e_i|\psi\rangle|^2 e_i$

Born rule: $|\langle e_i|\psi\rangle|^2$ is the probability of observing $e_i$ as the result of the measurement.

For a density operator $\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$ and the same observable $A$, we have:

$\langle A\rangle_\rho = \sum_j p_j \langle\psi_j|A|\psi_j\rangle = \sum_j \sum_i p_j\, e_i\, |\langle e_i|\psi_j\rangle|^2 = \mathrm{Tr}(\rho A)$
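The identity $\langle A\rangle_\rho = \mathrm{Tr}(\rho A)$ can be verified numerically; a sketch with a randomly generated observable and mixed state (all names and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random self-adjoint observable A on a toy 4-d space
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2

# A mixed state: an ensemble of two random pure states with weights 0.7 / 0.3
psis = [v / np.linalg.norm(v) for v in rng.normal(size=(2, 4))]
ps = [0.7, 0.3]
rho = sum(p * np.outer(psi, psi) for p, psi in zip(ps, psis))

# The weighted average of pure-state expectations equals Tr(rho A)
lhs = sum(p * psi @ A @ psi for p, psi in zip(ps, psis))
rhs = np.trace(rho @ A)
assert np.isclose(lhs, rhs)
```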


Complete positivity: The CPM construction

In order to apply the new formulation to the categorical model of Coecke et al. we need:

to replace word vectors with density operators

to replace linear maps with completely positive linear maps, i.e. maps that send positive operators to positive operators while respecting the monoidal structure.

Selinger (2007):

Any dagger compact closed category is associated with a category in which the objects are the objects of the original category, but the maps are completely positive maps.


Categorical model of meaning: Reprise

The passage from a grammar to distributional meaning is defined according to the following composition:

$\mathbf{Preg} \xrightarrow{\ \mathcal{F}\ } \mathbf{FHilb} \xrightarrow{\ L\ } \mathrm{CPM}(\mathbf{FHilb})$

The meaning of a sentence $w_1 w_2 \dots w_n$ with grammatical derivation $\alpha$ becomes:

$L(\mathcal{F}(\alpha))\,(\rho(w_1) \otimes_{\mathrm{CPM}} \rho(w_2) \otimes_{\mathrm{CPM}} \dots \otimes_{\mathrm{CPM}} \rho(w_n))$

Composition takes this form:

Subject-intransitive verb: $\rho_{IN} = \mathrm{Tr}_N(\rho(v) \circ (\rho(s) \otimes 1_S))$

Adjective-noun: $\rho_{AN} = \mathrm{Tr}_N(\rho(adj) \circ (1_N \otimes \rho(n)))$

Subj-trans. verb-Obj: $\rho_{TS} = \mathrm{Tr}_{N,N}(\rho(v) \circ (\rho(s) \otimes 1_S \otimes \rho(o)))$
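A minimal numpy sketch of the first of these compositions (subject plus intransitive verb), with hypothetical toy dimensions and random pure states; the partial trace over $N$ is taken with an einsum contraction:

```python
import numpy as np

def compose_intransitive(rho_s, rho_v, n, s):
    """rho_IN = Tr_N(rho_v . (rho_s (x) 1_S)), with rho_v acting on N (x) S."""
    M = (rho_v @ np.kron(rho_s, np.eye(s))).reshape(n, s, n, s)
    return np.einsum('iaib->ab', M)   # partial trace over the N factor

rng = np.random.default_rng(1)
n, s = 3, 2   # toy dimensions for the noun space N and sentence space S

subj = rng.normal(size=n); subj /= np.linalg.norm(subj)
verb = rng.normal(size=n * s); verb /= np.linalg.norm(verb)
rho_s, rho_v = np.outer(subj, subj), np.outer(verb, verb)

rho_in = compose_intransitive(rho_s, rho_v, n, s)
assert rho_in.shape == (s, s)                        # an operator on S
assert np.allclose(rho_in, rho_in.T)                 # self-adjoint
assert np.all(np.linalg.eigvalsh(rho_in) >= -1e-12)  # positive
```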


Using Frobenius algebras

Compact closed categories are simple structures: there is a tensor product and $\varepsilon$ and $\eta$ maps

Kartsaklis et al. (COLING 2012), Sadrzadeh et al. (MoL 2013): Take advantage of the fact that every vector space with a fixed basis has a Frobenius structure over it

...i.e. there exist maps for copying and deleting the basis:

$\Delta :: |i\rangle \mapsto |i\rangle \otimes |i\rangle \qquad \mu :: |i\rangle \otimes |i\rangle \mapsto |i\rangle$

Advantages:

Provides a way to build tensors for relational words, and simplifies the calculations

Introduces a second form of composition (in case of vector spaces, element-wise vector multiplication)

Useful in modelling various linguistic phenomena, e.g. intonation (Kartsaklis and Sadrzadeh, MoL 2015)


Frobenius algebras and density operators

Complexity is reduced: tensors of order $n$ become tensors of order $n - 2$

Composition becomes the Hadamard product of density operators:

$\rho_{CS} = \rho(s) \odot \mathrm{Tr}(\rho(v) \circ (1_N \otimes \rho(o)))$

$\rho_{CO} = \mathrm{Tr}(\rho(v) \circ (\rho(s) \otimes 1_N)) \odot \rho(o)$

Piedeleu et al. (2015): The new formulation also allows for non-commutative versions of Frobenius algebras:

$\rho_{CS} = \rho(s) \circ \mathrm{Tr}(\rho(v) \circ (1_N \otimes \rho(o)))$

$\rho_{CO} = \mathrm{Tr}(\rho(v) \circ (\rho(s) \otimes 1_N)) \circ \rho(o)$


Outline

1 Composition and lexical ambiguity

2 Open system quantum semantics

3 Textual entailment

4 Graded hyponymy

5 From theory to practice


Distributional Inclusion Hypothesis and distributional semantics

In distributional semantics terms, DIH means that the contexts of $u$ are included in the contexts of $v$.

Example: Imagine a toy corpus {‘a boy runs’, ‘a person runs’, ‘a person sleeps’}; boy entails person since:

$\{run\} \subseteq \{run, sleep\}$

Intuition: A person can do everything that can be done by a boy, and more.

For two words $u$ and $v$ we say:

$u \vdash v$ whenever $F(\vec{u}) \subseteq F(\vec{v})$
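The toy corpus above can be checked directly with Python sets (a minimal sketch; the `features` helper is hypothetical):

```python
# Toy corpus from the slide; features of a noun = the verbs it occurs with
corpus = ['a boy runs', 'a person runs', 'a person sleeps']

def features(noun):
    return {s.split()[2] for s in corpus if s.split()[1] == noun}

assert features('boy') == {'runs'}
assert features('person') == {'runs', 'sleeps'}
# boy |- person, since F(boy) is a subset of F(person)
assert features('boy') <= features('person')
```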


Extending feature inclusion to CDMs

In the presence of a compositional operator, feature inclusion adheres to set-theoretic properties.

For element-wise composition, we have:

$F(\vec{v}_1 + \dots + \vec{v}_n) = F(\vec{v}_1) \cup \dots \cup F(\vec{v}_n)$

$F(\vec{v}_1 \odot \dots \odot \vec{v}_n) = F(\vec{v}_1) \cap \dots \cap F(\vec{v}_n)$

It is also the case that:

$F(\max(\vec{v}_1, \dots, \vec{v}_n)) = F(\vec{v}_1) \cup \dots \cup F(\vec{v}_n)$

$F(\min(\vec{v}_1, \dots, \vec{v}_n)) = F(\vec{v}_1) \cap \dots \cap F(\vec{v}_n)$
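For non-negative weight vectors (so no cancellation occurs), these four identities can be verified directly; a minimal numpy sketch where `F` is a hypothetical helper returning the set of non-zero coordinates:

```python
import numpy as np

def F(v):
    """Feature set of a vector: the indices carrying non-zero weight."""
    return set(np.nonzero(v)[0])

v1 = np.array([1.0, 2.0, 0.0, 0.0])
v2 = np.array([0.0, 3.0, 4.0, 0.0])

assert F(v1 + v2) == F(v1) | F(v2)             # addition: union
assert F(v1 * v2) == F(v1) & F(v2)             # element-wise product: intersection
assert F(np.maximum(v1, v2)) == F(v1) | F(v2)  # max: union
assert F(np.minimum(v1, v2)) == F(v1) & F(v2)  # min: intersection
```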


Generic feature inclusion for tensor-based composition

In the generic case we have:

$\begin{pmatrix} w_{11} & \cdots & w_{1n} \\ w_{21} & \cdots & w_{2n} \\ \vdots & & \vdots \\ w_{m1} & \cdots & w_{mn} \end{pmatrix} \times \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$

By looking at the matrix as a list of column vectors $(\vec{w}_1, \vec{w}_2, \dots, \vec{w}_n)$, the above becomes:

$v_1\vec{w}_1 + v_2\vec{w}_2 + \dots + v_n\vec{w}_n$

Feature set of generic tensor-based composition:

$F(\overline{w} \times \vec{v}) = \bigcup_{v_i \neq 0} F(\vec{w}_i) = \bigcup_i F(\vec{w}_i)\big|_{F(v_i)}$
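A small numpy sketch of this view (invented toy matrix and vector, with non-negative entries so no cancellation occurs): the matrix-vector product is the weighted sum of columns, and its feature set is the union of the feature sets of the columns with non-zero weight:

```python
import numpy as np

def F(v):
    return set(np.nonzero(v)[0])

W = np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 3.0],
              [4.0, 5.0, 0.0]])
v = np.array([2.0, 0.0, 1.0])   # v_2 = 0: the second column contributes nothing

# W @ v is the weighted sum of columns v_1*w_1 + ... + v_n*w_n
cols = [W[:, i] for i in range(W.shape[1])]
assert np.allclose(W @ v, sum(vi * wi for vi, wi in zip(v, cols)))

# F(W x v) = union of F(w_i) over the i with v_i != 0
assert F(W @ v) == set().union(*(F(cols[i]) for i in range(3) if v[i] != 0))
```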


Feature inclusion for relational tensor-based model

Grefenstette and Sadrzadeh (2011) extensional approach:

Relational words:

$\vec{adj} = \sum_i \vec{Noun}_i \qquad \vec{verb}_{itr} = \sum_i \vec{Sbj}_i \qquad \overline{verb}_{tr} = \sum_i \vec{Sbj}_i \otimes \vec{Obj}_i$

Compound representations:

$\vec{an} = \vec{adj} \odot \vec{noun} \qquad \vec{sv} = \vec{verb}_{itr} \odot \vec{subj} \qquad \overline{svo} = \overline{verb}_{tr} \odot (\vec{subj} \otimes \vec{obj})$

Feature inclusion behaviour:

$F(\vec{sv}) = \bigcup_i F(\vec{Sbj}_i) \cap F(\vec{s}) \qquad F(\vec{vo}) = \bigcup_i F(\vec{Obj}_i) \cap F(\vec{o})$

$F(\overline{svo}) = \bigcup_i F(\vec{Sbj}_i) \times F(\vec{Obj}_i) \cap F(\vec{s}) \times F(\vec{o})$


Feature inclusion for projective tensor-based models

Kartsaklis and Sadrzadeh (2016) projective approach:

Relational words:

$\overline{v}_{itv} := \sum_i \vec{Sbj}_i \otimes \vec{Sbj}_i \qquad \overline{v}_{vp} := \sum_i \vec{Obj}_i \otimes \vec{Obj}_i$

$\overline{v}_{trv} := \sum_i \vec{Sbj}_i \otimes \left(\frac{\vec{Sbj}_i + \vec{Obj}_i}{2}\right) \otimes \vec{Obj}_i$

Sentence vector after composition:

$\vec{sv} = \vec{s}^{\,T} \times \overline{v}_{itv} = \sum_i \langle\vec{Sbj}_i|\vec{s}\rangle\, \vec{Sbj}_i$

Feature inclusion behaviour:

$F(\vec{sv}) = \bigcup_i F(\vec{Sbj}_i)\big|_{F(\langle\vec{Sbj}_i|\vec{s}\rangle)}$


Preservation of entailment at the sentence level

In element-wise composition and tensor-based composition, entailment extends from the word level to the sentence level:

For two sentences $s_1 = u_1 \dots u_n$ and $s_2 = v_1 \dots v_n$ for which $u_i \vdash v_i$, it is always the case that $s_1 \vdash s_2$.

E.g. consider two intransitive sentences for which $F(\vec{subj}_1) \subseteq F(\vec{subj}_2)$ and $F(\vec{verb}_1) \subseteq F(\vec{verb}_2)$; we have:

$F(\vec{subj}_1) \cap F(\vec{verb}_1) \subseteq F(\vec{subj}_2)$ and $F(\vec{subj}_1) \cap F(\vec{verb}_1) \subseteq F(\vec{verb}_2)$

and consequently:

$F(\vec{subj}_1) \cap F(\vec{verb}_1) \subseteq F(\vec{subj}_2) \cap F(\vec{verb}_2)$

(Proof for the tensor case: Balkır, Kartsaklis, Sadrzadeh (2016))


Outline

1 Composition and lexical ambiguity

2 Open system quantum semantics

3 Textual entailment

4 Graded hyponymy

5 From theory to practice


Entailment and hyponymy

We use density matrices, or more generally, positive operators to represent words.

Positive operators $A$, $B$ have the Löwner ordering: $A \sqsubseteq B \iff B - A$ is positive.

We use this ordering to represent entailment, and introduce a graded version, useful for linguistic phenomena.

We will show that graded entailment lifts compositionally to the sentence level.


Words as positive operators

A positive operator $P$ is self-adjoint and $\forall v \in V.\ \langle v|P|v\rangle \geq 0$

Density matrices are convex combinations of projectors: $\rho = \sum_i p_i |v_i\rangle\langle v_i|$, s.t. $\sum_i p_i = 1$

We can view concepts as collections of items, with $p_i$ indicating relative frequency.

For example:

$[\![pet]\!] = p_d |dog\rangle\langle dog| + p_c |cat\rangle\langle cat| + p_t |tarantula\rangle\langle tarantula| + \dots$

where $\forall i.\ p_i \geq 0$ and $\sum_i p_i = 1$

There are various choices for normalisation of the density matrices...


A transitive sentence in CPM(FHilb)

Suppose we have finite-dimensional real Hilbert spaces $N$ and $S$ and vectors:

$|Annie\rangle, |Betty\rangle, |Clara\rangle, |beer\rangle, |wine\rangle \in N$

$|likes\rangle, |appreciates\rangle \in N \otimes S \otimes N$

Then, we can set:

$[\![\text{the sisters}]\!] = \tfrac{1}{3}(|Annie\rangle\langle Annie| + |Betty\rangle\langle Betty| + |Clara\rangle\langle Clara|)$

$[\![\text{drinks}]\!] = \tfrac{1}{2}(|beer\rangle\langle beer| + |wine\rangle\langle wine|)$

$[\![\text{enjoy}]\!] = \tfrac{1}{2}(|like\rangle\langle like| + |appreciate\rangle\langle appreciate|)$


A transitive sentence in CPM(FHilb)

Then $s$ = The sisters enjoy drinks is given by:

$[\![s]\!] = (\varepsilon \otimes 1 \otimes \varepsilon)([\![\text{the sisters}]\!] \otimes [\![\text{enjoy}]\!] \otimes [\![\text{drinks}]\!])$

[Diagram: the pregroup derivation of “The sisters enjoy drinks” and the corresponding string diagram, reducing the types $N \cdot (N \otimes S \otimes N) \cdot N$ to $S$.]


Entailment for Positive Operators

We order positive operators using the Löwner order and interpret this as entailment:

$A \sqsubseteq B \iff B - A$ is positive

As the Löwner order restricts to the usual ordering on projection operators, we can embed quantum logic [Birkhoff and Von Neumann, 1936] within the poset of projection operators, providing a direct link to existing theory.


So how do we do graded hyponymy?

Recall that positive operators $A$, $B$ have the Löwner ordering $A \sqsubseteq B \iff B - A$ is positive.

We say that $A$ is a hyponym of $B$ if $A \sqsubseteq B$

We say that $A$ is a $k$-hyponym of $B$ for a given value of $k$ in the range $(0, 1]$, and write $A \sqsubseteq_k B$, if:

$B - kA$ is positive

We are interested in the maximum such $k$.

Theorem

For positive self-adjoint matrices $A$, $B$ such that $\mathrm{supp}(A) \subseteq \mathrm{supp}(B)$, the maximum $k$ such that $B - kA \geq 0$ is given by $1/\lambda$, where $\lambda$ is the maximum eigenvalue of $B^{+}A$.
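The theorem can be checked numerically; a sketch with hypothetical random full-rank states (so supp(A) ⊆ supp(B) holds trivially), where $B^{+}$ is computed as the Moore–Penrose pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_density(d):
    M = rng.normal(size=(d, d))
    rho = M @ M.T                  # generically full-rank positive operator
    return rho / np.trace(rho)

def max_k(A, B):
    """Largest k with B - kA positive: 1 / lambda_max(B+ A)."""
    lam = np.linalg.eigvals(np.linalg.pinv(B) @ A).real.max()
    return 1.0 / lam

A, B = random_density(3), random_density(3)
k = max_k(A, B)

assert np.all(np.linalg.eigvalsh(B - k * A) >= -1e-9)  # B - kA still positive
assert np.linalg.eigvalsh(B - 1.01 * k * A).min() < 0  # any larger k fails
```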


Properties of k-hyponymy

Reflexivity: $k$-hyponymy is reflexive for $k = 1$.

Symmetry: $k$-hyponymy is anti-symmetric for $k = 1$, but neither symmetric nor anti-symmetric for $k \in (0, 1)$.

Transitivity: $k$-hyponymy satisfies a version of transitivity. Suppose $A \sqsubseteq_k B$ and $B \sqsubseteq_l C$. Then $A \sqsubseteq_{kl} C$.

Continuity: For $A \sqsubseteq_k B$, when there is a small perturbation to $A$, there is a correspondingly small decrease in the value of $k$. The perturbation must lie in the support of $B$, but can introduce off-diagonal elements.


k-hyponymy interacts well with compositionality

We would like our notion of entailment to work at the sentence level.

Since sentences are represented as positive operators, we can compare them directly.

If sentences have similar structure, we can also give a lower bound on the entailment strength between sentences based on the entailment strengths between the words in the sentences.

Example

Suppose $[\![dog]\!] \sqsubseteq_k [\![pet]\!]$ and $[\![park]\!] \sqsubseteq_l [\![field]\!]$. Then

$[\![\text{My dog runs in the park}]\!] \sqsubseteq_{?} [\![\text{My pet runs in the field}]\!]$


k-hyponymy interacts well with compositionality

Theorem

Let $\Phi = A_1 \dots A_n$ and $\Psi = B_1 \dots B_n$ be two positive phrases of the same length and grammatical structure $\varphi$. Let their corresponding density matrices be denoted by $[\![A_1]\!], \dots, [\![A_n]\!]$ and $[\![B_1]\!], \dots, [\![B_n]\!]$ respectively. Suppose that $[\![A_i]\!] \sqsubseteq_{k_i} [\![B_i]\!]$ for $i \in \{1, \dots, n\}$ and some $k_i \in (0, 1]$. Then:

$\varphi(\Phi) \sqsubseteq_{k_1 \cdots k_n} \varphi(\Psi)$

so $k_1 \cdots k_n$ provides a lower bound on the extent to which $\varphi(\Phi)$ entails $\varphi(\Psi)$


Example

Suppose we have pure states $[\![nibble]\!]$, $[\![scoff]\!]$, $[\![cake]\!]$, $[\![chocolate]\!]$. The more general eat and sweets are given by:

$[\![eat]\!] = \tfrac{1}{2}([\![nibble]\!] + [\![scoff]\!]), \qquad [\![sweets]\!] = \tfrac{1}{2}([\![cake]\!] + [\![chocolate]\!])$

Then

$[\![scoff]\!] \sqsubseteq_{1/2} [\![eat]\!], \qquad [\![cake]\!] \sqsubseteq_{1/2} [\![sweets]\!]$

We consider the sentences:

$[\![s_1]\!] = \varphi([\![Mary]\!] \otimes [\![scoffs]\!] \otimes [\![cake]\!])$

$[\![s_2]\!] = \varphi([\![Mary]\!] \otimes [\![eats]\!] \otimes [\![sweets]\!])$

We will show that $[\![s_1]\!] \sqsubseteq_{kl} [\![s_2]\!]$ where $kl = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$


Example (cont’d)

Expanding $[\![s_2]\!]$ we obtain:

$[\![s_2]\!] = \varphi([\![Mary]\!] \otimes \tfrac{1}{2}([\![nibbles]\!] + [\![scoffs]\!]) \otimes \tfrac{1}{2}([\![cake]\!] + [\![choc]\!]))$

$= \tfrac{1}{4}[\![s_1]\!] + \tfrac{1}{4}(\varphi([\![Mary]\!] \otimes [\![scoffs]\!] \otimes [\![choc]\!]) + \varphi([\![Mary]\!] \otimes [\![nibbles]\!] \otimes [\![cake]\!]) + \varphi([\![Mary]\!] \otimes [\![nibbles]\!] \otimes [\![choc]\!]))$

Therefore:

$[\![s_2]\!] - \tfrac{1}{4}[\![s_1]\!] = \tfrac{1}{4}(\varphi([\![Mary]\!] \otimes [\![scoffs]\!] \otimes [\![choc]\!]) + \varphi([\![Mary]\!] \otimes [\![nibbles]\!] \otimes [\![cake]\!]) + \varphi([\![Mary]\!] \otimes [\![nibbles]\!] \otimes [\![choc]\!]))$

We can see that $[\![s_2]\!] - \tfrac{1}{4}[\![s_1]\!]$ is positive by the fact that positivity is preserved under addition and tensor product. Therefore:

$[\![s_1]\!] \sqsubseteq_{kl} [\![s_2]\!]$

as required.
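The example can also be verified numerically. In this sketch $\varphi$ is taken to be the tensor product itself (an illustrative assumption; any completely positive map preserves the argument), and the pure states are randomly generated:

```python
import numpy as np

rng = np.random.default_rng(3)

def pure(d):
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v)

# Hypothetical pure states in a toy 4-d space
mary, nibble, scoff, cake, choc = (pure(4) for _ in range(5))
eat = 0.5 * (nibble + scoff)
sweets = 0.5 * (cake + choc)

def phi(a, b, c):
    # phi taken to be the tensor product itself (an illustrative assumption)
    return np.kron(np.kron(a, b), c)

s1 = phi(mary, scoff, cake)
s2 = phi(mary, eat, sweets)

# s2 - (1/4) s1 is positive, so s1 entails s2 to degree at least 1/2 * 1/2
diff = s2 - 0.25 * s1
assert np.all(np.linalg.eigvalsh(diff) >= -1e-10)
```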


Mixing entailment and ambiguity

CPM construction applies to any dagger compact closed category, even to $\mathrm{CPM}(\mathbf{FHilb})$ itself

In other words, we can have density operators of density operators,

...i.e. a probability distribution over a set of probability distributions

We can use this fact to encode two (or more) distinct kinds of information

Example: Balkır (2014) uses a form of density operators to encode textual entailment. We can encode both ambiguity and entailment information as follows:

$\rho(\textit{bank}) = 0.5\,|B_{amb}\rangle\langle B_{amb}| + 0.5\,|B_{ent}\rangle\langle B_{ent}|$


Outline

1 Composition and lexical ambiguity

2 Open system quantum semantics

3 Textual entailment

4 Graded hyponymy

5 From theory to practice


Creating density operators

Density operators can be created with standard WSI methods. For example:

Schütze (1998):

1 Create vectors for all contexts of a target word, e.g. by averaging the vectors of other words in the same sentence

2 Cluster those context vectors

3 Use the centroids of the produced clusters as sense vectors.

This will produce a statistical ensemble $\{(p_i, |s_i\rangle)\}_i$ that can be used for creating density operators:

$\rho(w) = \sum_i p_i |s_i\rangle\langle s_i|$
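A minimal end-to-end sketch of this pipeline, with synthetic context vectors and a hand-rolled 2-means iteration standing in for the clustering step (all data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# Step 1: hypothetical context vectors for one target word (two senses, 3-d)
contexts = np.concatenate([
    rng.normal(loc=[5.0, 0.0, 0.0], scale=0.1, size=(30, 3)),  # sense 1
    rng.normal(loc=[0.0, 5.0, 0.0], scale=0.1, size=(10, 3)),  # sense 2
])

# Step 2: cluster the context vectors (hand-rolled 2-means for the sketch)
centroids = contexts[[0, -1]].copy()
for _ in range(10):
    dists = np.linalg.norm(contexts[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([contexts[labels == j].mean(axis=0) for j in (0, 1)])

# Step 3: centroids as sense vectors; cluster sizes give the ensemble weights
rho = np.zeros((3, 3))
for j in (0, 1):
    p = np.mean(labels == j)                           # p_j
    s_j = centroids[j] / np.linalg.norm(centroids[j])  # |s_j>
    rho += p * np.outer(s_j, s_j)

assert np.isclose(np.trace(rho), 1.0)
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)
```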


Measuring ambiguity

How does ambiguity evolve from homonymous words (e.g. 'nail') to unambiguous compounds ('rusty nail', 'nail that grows')?

We can measure it with Von Neumann entropy.

Von Neumann entropy is zero for a pure state (i.e. a completely unambiguous word), and ln dim(H) for a maximally mixed state.

For a density operator ρ with eigen-decomposition ρ = ∑_i e_i |e_i⟩⟨e_i|, Von Neumann entropy is defined as:

S(ρ) = −Tr(ρ ln ρ) = −∑_i e_i ln e_i
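The definition can be checked numerically directly on a spectrum (plain Python; the eigenvalue lists below are illustrative, not measured from real word data):

```python
import math

def vn_entropy(spectrum):
    """S(rho) = -sum_i e_i ln e_i over the eigenvalues of rho,
    with the usual convention 0 ln 0 = 0."""
    return -sum(e * math.log(e) for e in spectrum if e > 0)

# Pure state (a completely unambiguous word): entropy 0.
print(vn_entropy([1.0]))

# Maximally mixed state on a 4-dimensional space: entropy ln dim(H).
print(vn_entropy([0.25] * 4) - math.log(4))  # -> 0.0 (up to rounding)

# A mildly ambiguous word sits strictly between the two extremes.
print(vn_entropy([0.9, 0.1]))
```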


Measuring entropy: A small-scale experiment

Relative Clauses

noun: verb1/verb2      noun    noun that verb1    noun that verb2
organ: enchant/ache    0.18    0.11               0.08
vessel: swell/sail     0.25    0.16               0.01
queen: fly/rule        0.28    0.14               0.16
nail: gleam/grow       0.19    0.06               0.14
bank: overflow/loan    0.21    0.19               0.18

Adjectives

noun: adj1/adj2        noun    adj1 noun    adj2 noun
organ: music/body      0.18    0.10         0.13
vessel: blood/naval    0.25    0.05         0.07
queen: fair/chess      0.28    0.05         0.16
nail: rusty/finger     0.19    0.04         0.11
bank: water/financial  0.21    0.20         0.16

An important aspect of the proposed model:

Disambiguation = Purification


[Diagram: Compact closure, Categorical Quantum Mechanics, Original DisCo model, Open system extension, Vector space semantics]


Summary

Density operators offer richer semantic representations for distributional models of meaning

From probability distributions over symbols we advance to probability distributions over vectors

The nested levels of the CPM construction are an intriguing feature that deserves separate treatment

Density operators support a form of logic whose distributional and compositional properties remain to be examined

Large-scale experimental evaluation currently in progress


References I

Abramsky, S. and Coecke, B. (2004). A categorical semantics of quantum protocols. In 19th Annual IEEE Symposium on Logic in Computer Science, pages 415–425.

Balkır, E. (2014). Using density matrices in a compositional distributional model of meaning. Master's thesis, University of Oxford.

Birkhoff, G. and Von Neumann, J. (1936). The logic of quantum mechanics. Annals of Mathematics, pages 823–843.

Coecke, B., Sadrzadeh, M., and Clark, S. (2010). Mathematical foundations for a compositional distributional model of meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384.

Kartsaklis, D., Kalchbrenner, N., and Sadrzadeh, M. (2014). Resolving lexical ambiguity in tensor regression models of meaning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 212–217, Baltimore, Maryland. Association for Computational Linguistics.

Kartsaklis, D. and Sadrzadeh, M. (2013). Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1590–1601, Seattle, Washington, USA. Association for Computational Linguistics.

Kartsaklis, D. and Sadrzadeh, M. (2015). A Frobenius model of information structure in categorical compositional distributional semantics. In Proceedings of the 14th Meeting on Mathematics of Language.


References II

Kartsaklis, D., Sadrzadeh, M., and Pulman, S. (2012). A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 549–558, Mumbai, India. The COLING 2012 Organizing Committee.

Kartsaklis, D., Sadrzadeh, M., Pulman, S., and Coecke, B. (2015). Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Chubb, J., Eskandarian, A., and Harizanov, V., editors, Logic and Algebraic Structures in Quantum Computing and Information, Association for Symbolic Logic Lecture Notes in Logic. Cambridge University Press.

Piedeleu, R., Kartsaklis, D., Coecke, B., and Sadrzadeh, M. (2015). Open system categorical quantum semantics in natural language processing. arXiv preprint arXiv:1502.00831.

Sadrzadeh, M., Clark, S., and Coecke, B. (2013). The Frobenius anatomy of word meanings I: Subject and object relative pronouns. Journal of Logic and Computation, Advance Access.

Sadrzadeh, M., Clark, S., and Coecke, B. (2014). The Frobenius anatomy of word meanings II: Possessive relative pronouns. Journal of Logic and Computation.

Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1):97–123.

Selinger, P. (2007). Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science, 170:139–163.
