Introduction to Categorical Compositional Distributional Semantics
Lecture 4: Using Density Matrices for Ambiguity and Entailment

Dimitri Kartsaklis¹   Martha Lewis²

¹Department of Theoretical and Applied Linguistics, University of Cambridge
²Department of Computer Science, University of Oxford

ESSLLI 2017
Toulouse, France
D. Kartsaklis, M. Lewis Introduction to CCDS - Lecture 4: Density Matrices 1/47
What we’ve seen so far
We’ve described compositional and distributional models, and looked at elementary category theory.

We’ve combined these together to give the core of the CCDS model.

We’ve described some of the tasks that CCDS can be used for.

We’ve seen how adding extra structure in the form of Frobenius algebras allows us to model functional words such as prepositions, relative pronouns and so on.
This talk in a nutshell
Inspired by categorical quantum mechanics, we show how the CCDS model can be extended.

We show how lexical ambiguity can be modelled, and how using ambiguous words in a sentence can disambiguate them.

We discuss lexical entailment, and use an order on density matrices to give a type of entailment.

We show how the notion of lexical entailment we use interacts well with compositionality.
Outline
1 Composition and lexical ambiguity
2 Open system quantum semantics
3 Textual entailment
4 Graded hyponymy
5 From theory to practice
Ambiguity in word spaces
Compositional distributional models of meaning are mainly based on ambiguous semantic spaces:
[2-D scatter plot* of word vectors: the senses of organ form two distinct clusters, a medical cluster (donor, transplant, transplantation, liver, kidney, lung: organ (medicine)) and a music cluster (accompaniment, bass, orchestra, hymn, recital, violin, concert: organ (music)), with the single vector for organ lying between them.

*real vectors projected onto a 2-dimensional space using MDS]
Lexical ambiguity and composition
Is this the best we can do?
→organ = →organ_music + →organ_med

Then why not have vectors like this:

→guitar + →kidney

...or even this?

→book + →banana

Kartsaklis and Sadrzadeh (EMNLP 2013, ACL 2014):

Using a step of prior disambiguation on the word vectors/tensors before composition improves the quality of the composed vectors.
Homonymy and polysemy (1/2)
We distinguish between two types of lexical ambiguity:
In cases of homonymy (organ, bank, vessel etc.), due to some historical accident the same word is used to describe two (or more) completely unrelated concepts.

Polysemy relates to subtle deviations between the different senses of the same word.

Example:

The distinction between the financial sense and the river sense of bank is a case of homonymy;

Within the financial sense, a distinction between the abstract concept of bank as an institution and the concrete building is a case of polysemy.
Homonymy and polysemy (2/2)
Example #1: “I went to the bank to open a savings account”
The word bank is used with its financial sense
The sayer refers to both of the polysemous meanings of bank_fin (institution and building) at the same time

Example #2: “I went to the bank”

The word bank is probably used with the financial sense in mind (because most of the time this is the case)

However, a small possibility that the sayer has actually visited a river bank still exists

Main point:

Polysemy: relatively coherent and self-contained concepts
Homonymy: lack of specification
Setting our goals
The problem:
How can we formalize the explicit treatment of lexical ambiguity in the categorical model of Coecke et al.?
We seek a model that will allow us:
1 to express homonymous words as probabilistic mixings of their individual meanings;

2 to retain the ambiguity until the presence of sufficient context that will eventually resolve it at composition time;

3 to achieve all the above in the multi-linear setting imposed by the vector space semantics of our original model.
Open system quantum semantics
A little quantum theory
Quantum mechanics and distributional models of meaning are both based on vector space semantics

The state of a quantum system is represented by a vector in a Hilbert space H. Fixing a basis for H:

|ψ⟩ = c_1|k_1⟩ + c_2|k_2⟩ + ... + c_n|k_n⟩

we take |ψ⟩ to be a quantum superposition of the basis states {|k_i⟩}_i.

i.e. the quantum system co-exists in all basis states in parallel, with strengths denoted by the corresponding weights

Such a state is called a pure state.
Word vectors as quantum states
We take words to be quantum systems, and word vectors to be specific states of these systems:

|w⟩ = c_1|k_1⟩ + c_2|k_2⟩ + ... + c_n|k_n⟩

Each element of the ONB {|k_i⟩}_i is essentially an atomic symbol:

|cat⟩ = 12|milk′⟩ + 8|cute′⟩ + ... + 0|bank′⟩

In other words, a word vector is a probability distribution over atomic symbols

|w⟩ is a pure state: when word w is seen alone, it is like co-occurring with all the basis words, with strengths denoted by the various coefficients.
Encoding homonymy with mixed states
Ideally, every disjoint meaning of a homonymous word must be represented by a distinct pure state:

|bank_fin⟩ = a_1|k_1⟩ + a_2|k_2⟩ + ... + a_n|k_n⟩
|bank_riv⟩ = b_1|k_1⟩ + b_2|k_2⟩ + ... + b_n|k_n⟩

{a_i}_i ≠ {b_i}_i, since the financial sense and the river sense are expected to be seen in drastically different contexts

So we have two distinct states describing the same system

We cannot be certain under which state our system may be found; we only know that the former state is more probable than the latter

In other words, the system is better described by a probabilistic mixture of pure states, i.e. a mixed state.
Density operators
Mathematically, a mixed state is represented by a density operator:

ρ(w) = Σ_i p_i |s_i⟩⟨s_i|

For example:

ρ(bank) = 0.80|bank_fin⟩⟨bank_fin| + 0.20|bank_riv⟩⟨bank_riv|
A density operator is a probability distribution over vectors.
Properties of a density operator ρ:

Positive semi-definite: ⟨v|ρ|v⟩ ≥ 0 ∀v ∈ H
Of trace one: Tr(ρ) = 1
Self-adjoint: ρ = ρ†
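The three defining properties can be checked numerically. A minimal sketch in NumPy, where the two sense vectors for bank are illustrative assumptions rather than real distributional data:

```python
import numpy as np

# Toy 3-d pure states for the two senses of "bank" (illustrative assumptions)
fin = np.array([0.9, 0.4, 0.1]); fin /= np.linalg.norm(fin)
riv = np.array([0.1, 0.3, 0.95]); riv /= np.linalg.norm(riv)

# Mixed state rho(bank) = 0.8 |fin><fin| + 0.2 |riv><riv|
rho = 0.8 * np.outer(fin, fin) + 0.2 * np.outer(riv, riv)

print(np.isclose(np.trace(rho), 1.0))               # trace one
print(np.allclose(rho, rho.T))                      # self-adjoint (real case)
print(np.all(np.linalg.eigvalsh(rho) >= -1e-12))    # positive semi-definite
```

The trace-one property follows because each |sᵢ⟩⟨sᵢ| built from a unit vector has trace one and the mixing weights sum to one.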
Quantum measurements
Density operators interact with observables to produce quantum measurements.

Assuming a system in state |ψ⟩ and an observable A with eigen-decomposition A = Σ_i e_i |e_i⟩⟨e_i|:

⟨A⟩_ψ = ⟨ψ|A|ψ⟩ = Σ_i |⟨e_i|ψ⟩|² e_i

Born rule: |⟨e_i|ψ⟩|² is the probability of observing e_i as the result of the measurement.

For a density operator ρ = Σ_j p_j |ψ_j⟩⟨ψ_j| and the same observable A, we have:

⟨A⟩_ρ = Σ_j p_j ⟨ψ_j|A|ψ_j⟩ = Σ_j Σ_i p_j e_i |⟨e_i|ψ_j⟩|² = Tr(ρA)
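The identity ⟨A⟩_ρ = Tr(ρA) can be verified numerically: by linearity of the trace, the trace rule must agree with the mixture of pure-state expectations. A small sketch (the observable and states are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = (M + M.T) / 2                        # a toy self-adjoint observable (assumption)

psi1 = np.array([1.0, 0.0, 0.0])
psi2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
p1, p2 = 0.7, 0.3
rho = p1 * np.outer(psi1, psi1) + p2 * np.outer(psi2, psi2)

# <A>_rho computed two ways: the trace rule, and the weighted pure expectations
exp_trace = np.trace(rho @ A)
exp_mix = p1 * (psi1 @ A @ psi1) + p2 * (psi2 @ A @ psi2)
print(np.isclose(exp_trace, exp_mix))    # the two computations agree
```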
Complete positivity: The CPM construction
In order to apply the new formulation to the categorical model of Coecke et al. we need:

to replace word vectors with density operators

to replace linear maps with completely positive linear maps, i.e. maps that send positive operators to positive operators while respecting the monoidal structure.

Selinger (2007):

Any dagger compact closed category is associated with a category in which the objects are the objects of the original category, but the maps are completely positive maps.
Categorical model of meaning: Reprise
The passage from a grammar to distributional meaning is defined according to the following composition:

Preg --F--> FHilb --L--> CPM(FHilb)

The meaning of a sentence w_1 w_2 ... w_n with grammatical derivation α becomes:

L(F(α))(ρ(w_1) ⊗_CPM ρ(w_2) ⊗_CPM ... ⊗_CPM ρ(w_n))

Composition takes this form:

Subject-intransitive verb: ρ_IN = Tr_N(ρ(v) ∘ (ρ(s) ⊗ 1_S))
Adjective-noun: ρ_AN = Tr_N(ρ(adj) ∘ (1_N ⊗ ρ(n)))
Subj-trans. verb-Obj: ρ_TS = Tr_N,N(ρ(v) ∘ (ρ(s) ⊗ 1_S ⊗ ρ(o)))
Using Frobenius algebras
Compact closed categories are simple structures: there is a tensor product and ε and η maps

Kartsaklis et al. (COLING 2012), Sadrzadeh et al. (MoL 2013): Take advantage of the fact that every vector space with a fixed basis has a Frobenius structure over it

...i.e. there exist maps for copying and deleting the basis:

Δ :: |i⟩ ↦ |i⟩ ⊗ |i⟩    μ :: |i⟩ ⊗ |i⟩ ↦ |i⟩

Advantages:

Provides a way to build tensors for relational words, and simplifies the calculations

Introduces a second form of composition (in the case of vector spaces, element-wise vector multiplication)

Useful in modelling various linguistic phenomena, e.g. intonation (Kartsaklis and Sadrzadeh, MoL 2015)
Frobenius algebras and density operators
Complexity is reduced: tensors of order n become tensors of order n − 2

Composition becomes the Hadamard product of density operators:

ρ_CS = ρ(s) ⊙ Tr(ρ(v) ∘ (1_N ⊗ ρ(o)))
ρ_CO = Tr(ρ(v) ∘ (ρ(s) ⊗ 1_N)) ⊙ ρ(o)

Piedeleu et al. (2015): The new formulation also allows for non-commutative versions of Frobenius algebras:

ρ_CS = ρ(s) ∘ Tr(ρ(v) ∘ (1_N ⊗ ρ(o)))
ρ_CO = Tr(ρ(v) ∘ (ρ(s) ⊗ 1_N)) ∘ ρ(o)
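The copy-subject composition ρ_CS can be sketched numerically: partial-trace the verb operator against the object, then take the Hadamard product with the subject. The dimension and the random operators below are assumptions for illustration only:

```python
import numpy as np

d = 2  # toy dimension for the noun space N (an assumption)

def random_density(dim, seed):
    """A random density matrix: positive semi-definite with trace one."""
    M = np.random.default_rng(seed).normal(size=(dim, dim))
    rho = M @ M.T
    return rho / np.trace(rho)

rho_s = random_density(d, 1)        # subject
rho_o = random_density(d, 2)        # object
rho_v = random_density(d * d, 3)    # transitive verb: an operator on N ⊗ N

# Tr over the object leg: Tr_2( rho_v (1_N ⊗ rho_o) )
X = (rho_v @ np.kron(np.eye(d), rho_o)).reshape(d, d, d, d)
verb_obj = X.trace(axis1=1, axis2=3)

rho_CS = rho_s * verb_obj           # Hadamard product with the subject
```

Note that Tr_2(ρ_v(1 ⊗ ρ_o)) = Tr_2((1 ⊗ √ρ_o) ρ_v (1 ⊗ √ρ_o)), since the partial trace is cyclic with respect to operators acting only on the traced leg, so verb_obj is again a positive operator.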
Textual entailment
Distributional Inclusion Hypothesis and distributional semantics

In distributional semantics terms, the DIH says that whenever u entails v, the contexts of u are included in the contexts of v.
Example: Imagine a toy corpus {‘a boy runs’, ‘a person runs’, ‘a person sleeps’}; boy entails person since:

{run} ⊆ {run, sleep}

Intuition: A person can do everything that can be done by a boy, and more.

For two words u and v we say:

u ⊢ v whenever F(→u) ⊆ F(→v)
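The toy example can be spelled out directly (a minimal sketch; the corpus is the one on the slide, and the feature set of a noun is taken to be the verbs it occurs with):

```python
# The toy corpus from the slide
corpus = ['a boy runs', 'a person runs', 'a person sleeps']

def F(word):
    """Context (feature) set: the verbs co-occurring with the word."""
    return {s.split()[-1] for s in corpus if word in s.split()}

def entails(u, v):
    # u ⊢ v whenever F(u) ⊆ F(v)
    return F(u) <= F(v)

print(entails('boy', 'person'))   # True: {runs} ⊆ {runs, sleeps}
print(entails('person', 'boy'))   # False: the inclusion fails in reverse
```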
Extending feature inclusion to CDMs
In the presence of a compositional operator, feature inclusion adheres to set-theoretic properties.

For element-wise composition, we have:

F(→v_1 + ... + →v_n) = F(→v_1) ∪ ... ∪ F(→v_n)
F(→v_1 ⊙ ... ⊙ →v_n) = F(→v_1) ∩ ... ∩ F(→v_n)

It is also the case that:

F(max(→v_1, ..., →v_n)) = F(→v_1) ∪ ... ∪ F(→v_n)
F(min(→v_1, ..., →v_n)) = F(→v_1) ∩ ... ∩ F(→v_n)
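These four identities can be checked directly on toy vectors. A sketch, assuming non-negative count vectors (so that addition cannot cancel a feature):

```python
import numpy as np

def F(v, eps=1e-12):
    """Feature set of a vector: the indices of its non-zero coordinates."""
    return {i for i, x in enumerate(v) if abs(x) > eps}

# Toy non-negative count vectors (illustrative assumptions)
v1 = np.array([2.0, 0.0, 1.0, 0.0])
v2 = np.array([1.0, 3.0, 1.0, 0.0])

print(F(v1 + v2) == F(v1) | F(v2))              # addition: union
print(F(v1 * v2) == F(v1) & F(v2))              # Hadamard: intersection
print(F(np.maximum(v1, v2)) == F(v1) | F(v2))   # max: union
print(F(np.minimum(v1, v2)) == F(v1) & F(v2))   # min: intersection
```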
Generic feature inclusion for tensor-based composition
In the generic case we have:

⎛ w_11 ... w_1n ⎞   ⎛ v_1 ⎞
⎜ w_21 ... w_2n ⎟ × ⎜ ... ⎟
⎜  ...      ... ⎟   ⎝ v_n ⎠
⎝ w_m1 ... w_mn ⎠

Viewing the matrix as a list of column vectors (→w_1, →w_2, ..., →w_n), the above becomes:

v_1 →w_1 + v_2 →w_2 + ... + v_n →w_n

Feature set of generic tensor-based composition:

F(w × →v) = ⋃_{v_i ≠ 0} F(→w_i) = ⋃_i F(→w_i)|_{F(v_i)}
Feature inclusion for relational tensor-based model
Grefenstette and Sadrzadeh (2011), extensional approach:

Relational words:

→adj = Σ_i →Noun_i    →verb_itr = Σ_i →Sbj_i    verb_tr = Σ_i →Sbj_i ⊗ →Obj_i

Compound representations:

→an = →adj ⊙ →noun    →sv = →verb_itr ⊙ →subj    svo = verb_tr ⊙ (→subj ⊗ →obj)

Feature inclusion behaviour:

F(→sv) = ⋃_i F(→Sbj_i) ∩ F(→s)    F(→vo) = ⋃_i F(→Obj_i) ∩ F(→o)

F(svo) = ⋃_i F(→Sbj_i) × F(→Obj_i) ∩ F(→s) × F(→o)
Feature inclusion for projective tensor-based models
Kartsaklis and Sadrzadeh (2016), projective approach:

Relational words:

v_itv := Σ_i →Sbj_i ⊗ →Sbj_i    v_vp := Σ_i →Obj_i ⊗ →Obj_i

v_trv := Σ_i →Sbj_i ⊗ ((→Sbj_i + →Obj_i)/2) ⊗ →Obj_i

Sentence vector after composition:

→sv = →s^T × v_itv = Σ_i ⟨→Sbj_i|→s⟩ →Sbj_i

Feature inclusion behaviour:

F(→sv) = ⋃_i F(→Sbj_i)|_{F(⟨→Sbj_i|→s⟩)}
Preservation of entailment at the sentence level
In element-wise composition and tensor-based composition, entailment extends from word level to sentence level:

For two sentences s_1 = u_1 ... u_n and s_2 = v_1 ... v_n for which u_i ⊢ v_i, it is always the case that s_1 ⊢ s_2.

E.g. consider two intransitive sentences for which F(→subj_1) ⊆ F(→subj_2) and F(→verb_1) ⊆ F(→verb_2); we have:

F(→subj_1) ∩ F(→verb_1) ⊆ F(→subj_2) and F(→subj_1) ∩ F(→verb_1) ⊆ F(→verb_2)

and consequently:

F(→subj_1) ∩ F(→verb_1) ⊆ F(→subj_2) ∩ F(→verb_2)
(Proof for the tensor case: Balkır, Kartsaklis, Sadrzadeh (2016))
Graded hyponymy
Entailment and hyponymy
We use density matrices, or more generally, positive operators to represent words.

Positive operators A, B have the Löwner ordering: A ⊑ B ⇐⇒ B − A is positive.

We use this ordering to represent entailment, and introduce a graded version, useful for linguistic phenomena.

We will show that graded entailment lifts compositionally to sentence level.
Words as positive operators
A positive operator P is self-adjoint and ∀v ∈ V. ⟨v|P|v⟩ ≥ 0

Density matrices are convex combinations of projectors: ρ = Σ_i p_i |v_i⟩⟨v_i|, s.t. Σ_i p_i = 1

We can view concepts as collections of items, with p_i indicating relative frequency.

For example:

⟦pet⟧ = p_d|dog⟩⟨dog| + p_c|cat⟩⟨cat| + p_t|tarantula⟩⟨tarantula| + ...

where ∀i. p_i ≥ 0 and Σ_i p_i = 1

There are various choices for normalisation of the density matrices...
A transitive sentence in CPM(FHilb)
Suppose we have finite-dimensional real Hilbert spaces N and S, and vectors:

|Annie⟩, |Betty⟩, |Clara⟩, |beer⟩, |wine⟩ ∈ N
|likes⟩, |appreciates⟩ ∈ N ⊗ S ⊗ N

Then, we can set:

⟦the sisters⟧ = (1/3)(|Annie⟩⟨Annie| + |Betty⟩⟨Betty| + |Clara⟩⟨Clara|)
⟦drinks⟧ = (1/2)(|beer⟩⟨beer| + |wine⟩⟨wine|)
⟦enjoy⟧ = (1/2)(|like⟩⟨like| + |appreciate⟩⟨appreciate|)
A transitive sentence in CPM(FHilb)
Then s = The sisters enjoy drinks is given by:

⟦s⟧ = (ε ⊗ 1 ⊗ ε)(⟦the sisters⟧ ⊗ ⟦enjoy⟧ ⊗ ⟦drinks⟧)

[String diagram for “The sisters enjoy drinks”, shown before and after the ε-reductions]
Entailment for Positive Operators
We order positive operators using the Löwner order and interpret this as entailment:

A ⊑ B ⇐⇒ B − A is positive

As the Löwner order restricts to the usual ordering on projection operators, we can embed quantum logic [Birkhoff and Von Neumann, 1936] within the poset of projection operators, providing a direct link to existing theory.
So how do we do graded hyponymy?
Recall that positive operators A, B have the Löwner ordering A ⊑ B ⇐⇒ B − A is positive.

We say that A is a hyponym of B if A ⊑ B

We say that A is a k-hyponym of B for a given value of k in the range (0, 1], and write A ⊑_k B, if:

B − kA is positive

We are interested in the maximum such k.

Theorem

For positive self-adjoint matrices A, B such that supp(A) ⊆ supp(B), the maximum k such that B − kA ≥ 0 is given by 1/λ, where λ is the maximum eigenvalue of B⁺A (B⁺ the Moore-Penrose pseudoinverse of B).
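The theorem can be verified numerically. A sketch, where A is a pure state contained in the mixture B (the vectors and weights are illustrative assumptions):

```python
import numpy as np

def max_k(A, B):
    """Largest k with B - kA positive, assuming supp(A) ⊆ supp(B):
    k = 1 / lambda_max(B⁺A), with B⁺ the Moore-Penrose pseudoinverse."""
    lam = np.max(np.linalg.eigvals(np.linalg.pinv(B) @ A).real)
    return 1.0 / lam

# Toy example: A a pure state, B a mixture whose support contains A's
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = np.outer(a, a)
B = 0.25 * np.outer(a, a) + 0.75 * np.outer(b, b)

k = max_k(A, B)
print(np.isclose(k, 0.25))                               # k matches the mixing weight of a
print(np.all(np.linalg.eigvalsh(B - k * A) >= -1e-10))   # B - kA sits on the PSD boundary
```

Here B⁺A = diag(4, 0), so λ = 4 and k = 1/4; B − (1/4)A = diag(0, 0.75), which is positive with a zero eigenvalue, confirming that k is maximal.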
Properties of k-hyponymy
Reflexivity: k-hyponymy is reflexive for k = 1.

Symmetry: k-hyponymy is anti-symmetric for k = 1, but neither symmetric nor anti-symmetric for k ∈ (0, 1).

Transitivity: k-hyponymy satisfies a version of transitivity. Suppose A ⊑_k B and B ⊑_l C. Then A ⊑_kl C.

Continuity: For A ⊑_k B, when there is a small perturbation to A, there is a correspondingly small decrease in the value of k. The perturbation must lie in the support of B, but can introduce off-diagonal elements.
k-hyponymy interacts well with compositionality
We would like our notion of entailment to work at the sentence level.

Since sentences are represented as positive operators, we can compare them directly.

If sentences have similar structure, we can also give a lower bound on the entailment strength between sentences, based on the entailment strengths between the words in the sentences.

Example

Suppose ⟦dog⟧ ⊑_k ⟦pet⟧ and ⟦park⟧ ⊑_l ⟦field⟧. Then

⟦My dog runs in the park⟧ ⊑_? ⟦My pet runs in the field⟧
k-hyponymy interacts well with compositionality
Theorem

Let Φ = A_1 ... A_n and Ψ = B_1 ... B_n be two positive phrases of the same length and grammatical structure ϕ. Let their corresponding density matrices be denoted by ⟦A_1⟧, ..., ⟦A_n⟧ and ⟦B_1⟧, ..., ⟦B_n⟧ respectively. Suppose that ⟦A_i⟧ ⊑_{k_i} ⟦B_i⟧ for i ∈ {1, ..., n} and some k_i ∈ (0, 1]. Then:

ϕ(Φ) ⊑_{k_1 ··· k_n} ϕ(Ψ),

so k_1 ··· k_n provides a lower bound on the extent to which ϕ(Φ) entails ϕ(Ψ)
Example
Suppose we have pure states ⟦nibble⟧, ⟦scoff⟧, ⟦cake⟧, ⟦chocolate⟧. The more general eat and sweets are given by:

⟦eat⟧ = (1/2)(⟦nibble⟧ + ⟦scoff⟧),    ⟦sweets⟧ = (1/2)(⟦cake⟧ + ⟦chocolate⟧)

Then

⟦scoff⟧ ⊑_{1/2} ⟦eat⟧,    ⟦cake⟧ ⊑_{1/2} ⟦sweets⟧

We consider the sentences:

⟦s_1⟧ = ϕ(⟦Mary⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦cake⟧)
⟦s_2⟧ = ϕ(⟦Mary⟧ ⊗ ⟦eats⟧ ⊗ ⟦sweets⟧)

We will show that ⟦s_1⟧ ⊑_{kl} ⟦s_2⟧ where kl = 1/2 × 1/2 = 1/4
Example (cont’d)
Expanding ⟦s_2⟧ we obtain:

⟦s_2⟧ = ϕ(⟦Mary⟧ ⊗ (1/2)(⟦nibbles⟧ + ⟦scoffs⟧) ⊗ (1/2)(⟦cake⟧ + ⟦choc⟧))
      = (1/4)⟦s_1⟧ + (1/4)(ϕ(⟦Mary⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦choc⟧)
                          + ϕ(⟦Mary⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦cake⟧)
                          + ϕ(⟦Mary⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦choc⟧))

Therefore:

⟦s_2⟧ − (1/4)⟦s_1⟧ = (1/4)(ϕ(⟦Mary⟧ ⊗ ⟦scoffs⟧ ⊗ ⟦choc⟧)
                          + ϕ(⟦Mary⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦cake⟧)
                          + ϕ(⟦Mary⟧ ⊗ ⟦nibbles⟧ ⊗ ⟦choc⟧))

We can see that ⟦s_2⟧ − (1/4)⟦s_1⟧ is positive, since positivity is preserved under addition and the tensor product, and ϕ is completely positive. Therefore:

⟦s_1⟧ ⊑_{kl} ⟦s_2⟧

as required.
Mixing entailment and ambiguity
The CPM construction applies to any dagger compact closed category, even to CPM(FHilb) itself

In other words, we can have density operators of density operators,

...i.e. a probability distribution over a set of probability distributions

We can use this fact to encode two (or more) distinct kinds of information

Example: Balkır (2014) uses a form of density operators to encode textual entailment. We can encode both ambiguity and entailment information as follows:

ρ(bank) = 0.5|B_amb⟩⟨B_amb| + 0.5|B_ent⟩⟨B_ent|
From theory to practice
Creating density operators
Density operators can be created with standard word sense induction (WSI) methods. For example:

Schütze (1998):

1 Create vectors for all contexts of a target word, e.g. by averaging the vectors of other words in the same sentence

2 Cluster those context vectors

3 Use the centroids of the produced clusters as sense vectors.

This will produce a statistical ensemble {(p_i, |s_i⟩)}_i that can be used for creating density operators:

ρ(w) = Σ_i p_i |s_i⟩⟨s_i|
Measuring ambiguity
How does ambiguity evolve from homonymous words (e.g. ‘nail’) to unambiguous compounds (‘rusty nail’, ‘nail that grows’)?

We can measure it with Von Neumann entropy.

Von Neumann entropy is zero for a pure state (i.e. a completely unambiguous word), and ln dim(H) for a maximally mixed state.

For a density operator ρ with eigen-decomposition ρ = Σ_i e_i |e_i⟩⟨e_i|, the Von Neumann entropy is defined as:

S(ρ) = −Tr(ρ ln ρ) = −Σ_i e_i ln e_i
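The two extreme cases (pure vs. maximally mixed) can be checked with a few lines of NumPy; the dimension is an arbitrary illustrative choice:

```python
import numpy as np

def von_neumann_entropy(rho, eps=1e-12):
    """S(rho) = -sum_i e_i ln e_i over the eigenvalues of rho (0 ln 0 = 0)."""
    e = np.linalg.eigvalsh(rho)
    e = e[e > eps]                 # drop zero eigenvalues by convention
    return float(-np.sum(e * np.log(e)))

d = 4
pure = np.zeros((d, d)); pure[0, 0] = 1.0   # a pure (fully disambiguated) state
mixed = np.eye(d) / d                        # maximally mixed: maximal ambiguity

print(np.isclose(von_neumann_entropy(pure), 0.0))         # zero for a pure state
print(np.isclose(von_neumann_entropy(mixed), np.log(d)))  # ln dim(H) when maximally mixed
```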
Measuring entropy: A small-scale experiment
Relative Clauses

noun: verb1/verb2        noun   noun that verb1   noun that verb2
organ: enchant/ache      0.18   0.11              0.08
vessel: swell/sail       0.25   0.16              0.01
queen: fly/rule          0.28   0.14              0.16
nail: gleam/grow         0.19   0.06              0.14
bank: overflow/loan      0.21   0.19              0.18

Adjectives

noun: adj1/adj2          noun   adj1 noun   adj2 noun
organ: music/body        0.18   0.10        0.13
vessel: blood/naval      0.25   0.05        0.07
queen: fair/chess        0.28   0.05        0.16
nail: rusty/finger       0.19   0.04        0.11
bank: water/financial    0.21   0.20        0.16
An important aspect of the proposed model:
Disambiguation = Purification
[Diagram relating compact closure, Categorical Quantum Mechanics, the original DisCo model, its open system extension, and vector space semantics]
Summary
Density operators offer richer semantic representations for distributional models of meaning

From probability distributions over symbols we advance to probability distributions over vectors

The nested levels of the CPM construction are an intriguing feature that deserves separate treatment

Density operators support a form of logic whose distributional and compositional properties remain to be examined

Large-scale experimental evaluation is currently in progress
References I
Abramsky, S. and Coecke, B. (2004). A categorical semantics of quantum protocols. In 19th Annual IEEE Symposium on Logic in Computer Science, pages 415–425.

Balkır, E. (2014). Using density matrices in a compositional distributional model of meaning. Master’s thesis, University of Oxford.

Birkhoff, G. and Von Neumann, J. (1936). The logic of quantum mechanics. Annals of Mathematics, pages 823–843.

Coecke, B., Sadrzadeh, M., and Clark, S. (2010). Mathematical foundations for a compositional distributional model of meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384.

Kartsaklis, D., Kalchbrenner, N., and Sadrzadeh, M. (2014). Resolving lexical ambiguity in tensor regression models of meaning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 212–217, Baltimore, Maryland. Association for Computational Linguistics.

Kartsaklis, D. and Sadrzadeh, M. (2013). Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1590–1601, Seattle, Washington, USA. Association for Computational Linguistics.

Kartsaklis, D. and Sadrzadeh, M. (2015). A Frobenius model of information structure in categorical compositional distributional semantics. In Proceedings of the 14th Meeting on Mathematics of Language.
References II
Kartsaklis, D., Sadrzadeh, M., and Pulman, S. (2012). A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 549–558, Mumbai, India. The COLING 2012 Organizing Committee.

Kartsaklis, D., Sadrzadeh, M., Pulman, S., and Coecke, B. (2015). Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Chubb, J., Eskandarian, A., and Harizanov, V., editors, Logic and Algebraic Structures in Quantum Computing and Information, Association for Symbolic Logic Lecture Notes in Logic. Cambridge University Press.

Piedeleu, R., Kartsaklis, D., Coecke, B., and Sadrzadeh, M. (2015). Open system categorical quantum semantics in natural language processing. arXiv preprint arXiv:1502.00831.

Sadrzadeh, M., Clark, S., and Coecke, B. (2013). The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, Advance Access.

Sadrzadeh, M., Clark, S., and Coecke, B. (2014). The Frobenius anatomy of word meanings II: Possessive relative pronouns. Journal of Logic and Computation.

Selinger, P. (2007). Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science, 170:139–163.