Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as...
![Page 1: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/1.jpg)
Universal Scaling of Semantic Information
Revealed from IB word clusters
or
Human language as optimal biological adaptation
Naftali Tishby
School of Computer Science & Engineering & Interdisciplinary Center for Neural Computation
The Hebrew University, Jerusalem, Israel
http://www.cs.huji.ac.il/~tishby
Workshop on Machine Learning in Natural Language Processing
CRI, Haifa University
December 2006
![Page 2: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/2.jpg)
Outline:
• Language – a window into our cognitive processing
  What can we learn from word statistics?
  How can we quantify it?
  Is there a “correct level” of description?
• Information Bottleneck (IB) and the representation of relevance
  Finding approximate sufficient statistics
• Words, documents and meaning…
  Trading complexity and accuracy
• Scaling of semantic information
  Possible models: small world properties
![Page 3: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/3.jpg)
What are words?
• acquired persistent neural activity associated with perception and cognitive functions
• appear in every language in a regular power-law sub-linear rate
[Figure: number of different words vs. number of observed words; log-log panel (log number of different words vs. log number of words) showing the data and the linear fit y = 0.64x + 2.07.]
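The sub-linear growth of vocabulary described on this slide can be estimated with a straight-line fit in log-log coordinates. The following is a minimal sketch, not the procedure used in the talk: the token stream is synthetic (drawn from an assumed 1/rank distribution), and the helper names `vocabulary_growth` and `loglog_fit` are hypothetical.

```python
import math
import random

def vocabulary_growth(tokens):
    """Return (tokens seen so far, distinct words so far) after each token."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((i, len(seen)))
    return curve

def loglog_fit(points):
    """Least-squares line y = slope * x + intercept through (log n, log v)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic stand-in for a corpus (an assumption, not the talk's data):
# word ranks 1..20000 drawn with probability proportional to 1/rank.
rng = random.Random(0)
vocab = list(range(1, 20001))
weights = [1.0 / r for r in vocab]
tokens = rng.choices(vocab, weights=weights, k=50000)

curve = vocabulary_growth(tokens)
slope, intercept = loglog_fit(curve[1000:])  # skip the noisy start of the curve
print(f"fitted log-log slope: {slope:.2f}")
```

A slope below 1 in this fit corresponds to the sub-linear power-law growth the slide refers to; on real text the tokens would come from a tokenizer rather than a random generator.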
![Page 4: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/4.jpg)
English
[Figure, four panels (first 100 docs are not displayed):
• # different words vs. # docs
• log(# different words) vs. log(# docs), data and fit y = 0.55x + 5.92
• # different words vs. # words
• log(# different words) vs. log(# words), data and fit y = 0.56x + 2.81]
![Page 5: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/5.jpg)
Hebrew
[Figure, four panels (first 100 docs are not displayed):
• # different words vs. # docs
• log(# different words) vs. log(# docs), data and fit y = 0.65x + 4.57
• # different words vs. # words
• log(# different words) vs. log(# words), data and fit y = 0.64x + 2.07]
![Page 6: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/6.jpg)
Korean
[Figure, four panels (first 100 docs are not displayed):
• # different words vs. # docs
• log(# different words) vs. log(# docs), data and fit y = 0.77x + 5.13
• # different words vs. # words
• log(# different words) vs. log(# words), data and fit y = 0.70x + 2.21]
![Page 7: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/7.jpg)
Rank – Frequency of words
Words exhibit “scale-free” statistics: Zipf’s law
[Figure: Hebrew Zipf curve, log Frequency vs. log Rank.]
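A rank-frequency curve like the one on this slide can be computed from any token list. The sketch below is illustrative only: the sentence is a made-up example, and `rank_frequency` is a hypothetical helper, not code from the talk.

```python
from collections import Counter
import math

def rank_frequency(tokens):
    """Return (log rank, log relative frequency) pairs, most frequent word first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return [(math.log(rank), math.log(f)) for rank, f in enumerate(freqs, start=1)]

# Toy input; a real curve needs a corpus of many thousands of tokens.
text = "the cat sat on the mat and the dog sat on the log"
points = rank_frequency(text.split())
for log_rank, log_freq in points:
    print(f"{log_rank:.3f}\t{log_freq:.3f}")
```

Under Zipf’s law these points fall close to a straight line when plotted; a toy sentence is far too short to show that, so the example only demonstrates the computation.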
![Page 8: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/8.jpg)
How are words/languages generated?
Basic observations:
• serve for communication and representation
• adapt to variable world statistics
• collective (social) entity
• acquired continuously (individually and collectively)
Competition between communication efficiency and adaptability / learnability
![Page 9: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/9.jpg)
Complexity – Accuracy Tradeoff
[Figure: Accuracy vs. Complexity over the possible models/representations, with regions labeled “Limited data” and “Bounded computation”.]
![Page 10: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/10.jpg)
Can we quantify it…?
When there is a (relevant) prediction or distortion measure:
• Accuracy: good predictions (low distortion/error)
• Complexity: long minimal description (optimal codes)
A general tradeoff between distortion and compression: Information Theory
![Page 11: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/11.jpg)
What can we learn from word co-occurrence...?
         Audio Health www Drug noise Dos Doctor  ...
Topic1      12      0   0    0     8   0      0  ...
Topic2       0      9   2   11     1   0      6  ...
Topic3       0     10   1    6     0   0     20  ...
Topic4       9      1   0    0     7   0      1  ...
Topic5       0      3   9    0     1  10      0  ...
Topic6       1     11   0    6     0   1      7  ...
Topic7       0      0   8    0     2  12      2  ...
Topic8      15      0   1    1    10   0      0  ...
Topic9       0     12   1   16     0   1     12  ...
Topic10      1      0   9    0     1  11      2  ...
...        ...    ... ...  ...   ... ...    ...  ...
![Page 12: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/12.jpg)
Representation and Mutual Information
We need to index the max number of non-overlapping green blobs inside the blue blob: (mutual information!)
[Figure: blue blob labeled $2^{nH(X)}$, green blobs labeled $2^{nH(X|\hat X)}$, with the mapping $p(\hat x|x)$ between $X$ and $\hat X$.]
$$\frac{2^{nH(X)}}{2^{nH(X|\hat X)}} = 2^{nI(X;\hat X)}$$
![Page 13: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/13.jpg)
IB: an Information Theoretic Principle for extracting Relevant structure
The minimal representation of X that keeps as much information about another variable, Y, as possible.
Generalizes the classical notion of “sufficient statistics”.
[Diagram: $X \to \hat X \to Y$, with $I(X;\hat X)$ on the first link, $I(\hat X;Y)$ on the second, and $I(X;Y)$ between $X$ and $Y$.]
$$\min_{p(\hat x|x)} \; I(X;\hat X) - \beta\, I(\hat X;Y)$$
![Page 14: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/14.jpg)
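The Self Consistent Equations
Marginal:
$$p(\hat x) = \sum_x p(\hat x|x)\, p(x)$$
Markov condition:
$$p(y|\hat x) = \sum_x p(y|x)\, p(x|\hat x)$$
Bayes’ rule:
$$p(x|\hat x) = \frac{p(\hat x|x)\, p(x)}{p(\hat x)}$$
Stationarity of the functional:
$$\frac{\delta L[p(\hat x|x)]}{\delta p(\hat x|x)} = 0 \;\Rightarrow\; p(\hat x|x) = \frac{p(\hat x)}{Z(x,\beta)} \exp\left(-\beta\, D_{KL}[x,\hat x]\right)$$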
![Page 15: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/15.jpg)
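The emergent effective distortion measure:
$$D_{KL}[x,\hat x] = D_{KL}\left[p(y|x)\,\|\,p(y|\hat x)\right] = \sum_y p(y|x) \log \frac{p(y|x)}{p(y|\hat x)}$$
• Regular if $p(y|x)$ is absolutely continuous w.r.t. $p(y|\hat x)$
• Small if $\hat x$ predicts $y$ as well as $x$:
[Diagram: $x \to \hat x$ via $p(\hat x|x)$, $\hat x \to y$ via $p(y|\hat x)$, and $x \to y$ via $p(y|x)$.]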
![Page 16: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/16.jpg)
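The Information Bottleneck Algorithm
$$\min_{p(y|\hat x)} \min_{p(\hat x)} \min_{p(\hat x|x)} \left\{ I(X;\hat X) + \beta \left\langle D_{KL}[x,\hat x] \right\rangle \right\} = \min_{p(y|\hat x)} \min_{p(\hat x)} \min_{p(\hat x|x)} \; -\left\langle \log Z(x,\beta) \right\rangle \quad \text{(“free energy”)}$$
Iterations:
$$p_{t+1}(\hat x|x) = \frac{p_t(\hat x)}{Z_t(x,\beta)} \exp\left(-\beta\, D_{KL}\left[p(y|x)\,\|\,p_t(y|\hat x)\right]\right)$$
$$p_{t+1}(\hat x) = \sum_x p(x)\, p_{t+1}(\hat x|x)$$
$$p_{t+1}(y|\hat x) = \sum_x p(y|x)\, p_{t+1}(x|\hat x)$$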
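The alternating updates on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the talk: cluster index `t` plays the role of $\hat x$, the count table is a hypothetical toy, the starting assignment is random, and `ib_iterate` is a made-up name. The loop cycles through the same three updates, beginning with the marginal.

```python
import math
import random

def kl(p, q, floor=1e-12):
    """D_KL[p || q] in nats; q is floored to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, floor)) for pi, qi in zip(p, q) if pi > 0)

def ib_iterate(counts, n_clusters, beta, n_iter=200, seed=0):
    """Alternate the three IB updates; returns p(t|x) as a list of rows."""
    rng = random.Random(seed)
    total = sum(map(sum, counts))
    nx, ny = len(counts), len(counts[0])
    p_x = [sum(row) / total for row in counts]
    p_y_x = [[n / sum(row) for n in row] for row in counts]
    # Random soft assignment p(t|x) as a starting point (an assumption).
    p_t_x = []
    for _ in range(nx):
        w = [rng.random() + 0.1 for _ in range(n_clusters)]
        p_t_x.append([v / sum(w) for v in w])
    for _ in range(n_iter):
        # Marginal: p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * p_t_x[x][t] for x in range(nx)) for t in range(n_clusters)]
        # Markov condition with Bayes' rule folded in:
        # p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        p_y_t = [[sum(p_y_x[x][y] * p_t_x[x][t] * p_x[x] for x in range(nx))
                  / max(p_t[t], 1e-300) for y in range(ny)]
                 for t in range(n_clusters)]
        # Assignment: p(t|x) = p(t) exp(-beta * KL) / Z(x, beta), done in log space.
        for x in range(nx):
            logw = [math.log(max(p_t[t], 1e-300)) - beta * kl(p_y_x[x], p_y_t[t])
                    for t in range(n_clusters)]
            m = max(logw)
            w = [math.exp(v - m) for v in logw]
            z = sum(w)
            p_t_x[x] = [v / z for v in w]
    return p_t_x

# Hypothetical toy counts: rows are x (e.g. documents), columns are y (e.g. words).
toy_counts = [
    [12, 8, 0, 0],
    [9, 7, 1, 0],
    [0, 1, 9, 11],
    [1, 0, 10, 6],
]
assignment = ib_iterate(toy_counts, n_clusters=2, beta=10.0)
for x, row in enumerate(assignment):
    print(x, [round(v, 3) for v in row])
```

The result depends on the starting point and on `beta`; in practice one would run several restarts or anneal `beta` upward, as in the deterministic annealing mentioned later in the talk.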
![Page 17: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/17.jpg)
The emergent effective distortion measure:
$$D_{KL}[x,\hat x] = D_{KL}\left[p(y|x)\,\|\,p(y|\hat x)\right]$$
[Diagram: update cycle through $p(\hat x)$, $p(\hat x|x)$ and $p(y|\hat x)$.]
Generalized BA-algorithm
![Page 18: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/18.jpg)
Information Curves
Can be calculated analytically for Markov chains, Gaussian processes, etc., and numerically in general.
[Figure: $I_Y$ vs. $I_X$, with curves $I^{C_1}_Y(I^{C_1}_X)$, $I^{C_2}_Y(I^{C_2}_X)$, $I^{C_3}_Y(I^{C_3}_X)$.]
The limit is always the convex envelope of increasing complexity
![Page 19: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/19.jpg)
Naftali Tishby ACAI-99 20
Words and topics again...
         Audio Health www Drug noise Dos Doctor  ...
Topic1      12      0   0    0     8   0      0  ...
Topic2       0      9   2   11     1   0      6  ...
Topic3       0     10   1    6     0   0     20  ...
Topic4       9      1   0    0     7   0      1  ...
Topic5       0      3   9    0     1  10      0  ...
Topic6       1     11   0    6     0   1      7  ...
Topic7       0      0   8    0     2  12      2  ...
Topic8      15      0   1    1    10   0      0  ...
Topic9       0     12   1   16     0   1     12  ...
Topic10      1      0   9    0     1  11      2  ...
...        ...    ... ...  ...   ... ...    ...  ...
![Page 20: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/20.jpg)
Simple Example
        Audio Noise Health Drug Doctor www Dos  ...
Doc1       12     8      0    0      0   0   0  ...
Doc4        9     7      1    0      1   0   0  ...
Doc8       15    10      0    1      0   1   0  ...
Doc2        0     1      9   11      6   2   0  ...
Doc3        0     0     10    6     20   1   0  ...
Doc6        1     0     11    6      7   0   1  ...
Doc9        0     0     12   16     12   1   1  ...
Doc5        0     1      3    0      0   9  10  ...
Doc7        0     2      0    0      2   8  12  ...
Doc10       1     1      0    0      2   9  11  ...
...       ...   ...    ...  ...    ... ... ...  ...
![Page 21: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/21.jpg)
          Audio Noise Health Drug Doctor www Dos  ...
Cluster1     36    25      1    1      1   1   0  ...
Cluster2      1     1     42   39     45   4   2  ...
Cluster3      1     4      3    0      4  26  33  ...
...         ...   ...    ...  ...    ... ... ...  ...
A new compact representation
The document clusters preserve the relevant information between the documents and words
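The claim that the clusters preserve the relevant information can be checked directly on the counts shown in these slides. The sketch below uses only the visible table entries (the elided "..." rows and columns are ignored, which is an assumption), merges the documents into the three clusters, and compares mutual information before and after; `mutual_information` is a hypothetical helper name.

```python
import math

def mutual_information(counts):
    """I(rows; columns) in bits for a table of co-occurrence counts."""
    total = sum(sum(r) for r in counts)
    row = [sum(r) / total for r in counts]
    col = [sum(c) / total for c in zip(*counts)]
    mi = 0.0
    for i, r in enumerate(counts):
        for j, n in enumerate(r):
            if n > 0:
                p = n / total
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi

# Visible entries only; columns: Audio, Noise, Health, Drug, Doctor, www, Dos.
docs = {
    "Doc1": [12, 8, 0, 0, 0, 0, 0],
    "Doc4": [9, 7, 1, 0, 1, 0, 0],
    "Doc8": [15, 10, 0, 1, 0, 1, 0],
    "Doc2": [0, 1, 9, 11, 6, 2, 0],
    "Doc3": [0, 0, 10, 6, 20, 1, 0],
    "Doc6": [1, 0, 11, 6, 7, 0, 1],
    "Doc9": [0, 0, 12, 16, 12, 1, 1],
    "Doc5": [0, 1, 3, 0, 0, 9, 10],
    "Doc7": [0, 2, 0, 0, 2, 8, 12],
    "Doc10": [1, 1, 0, 0, 2, 9, 11],
}
clusters = {
    "Cluster1": ["Doc1", "Doc4", "Doc8"],
    "Cluster2": ["Doc2", "Doc3", "Doc6", "Doc9"],
    "Cluster3": ["Doc5", "Doc7", "Doc10"],
}

# Merging a cluster means summing its documents' rows column by column.
merged = {
    name: [sum(col) for col in zip(*(docs[d] for d in members))]
    for name, members in clusters.items()
}

mi_docs = mutual_information(list(docs.values()))
mi_clusters = mutual_information(list(merged.values()))
print(merged)
print(f"fraction of information preserved: {mi_clusters / mi_docs:.3f}")
```

Merging rows can never increase mutual information, so the printed fraction lies between 0 and 1; the closer it is to 1, the less the ten-to-three compression costs.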
![Page 22: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/22.jpg)
Analyzing Co-Occurrence Tables
[Figure: Topics-Words counts matrix; axes: Topics, Words.]
![Page 23: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/23.jpg)
The exact same counts matrix after permutation
[Figure: permuted counts matrix; axes: Topics, Words.]
![Page 24: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/24.jpg)
[Figure: counts matrix grouped into blocks; axes: Topic clusters, Word clusters.]
The word clusters provide a compact representation that preserves the information about the topics
![Page 25: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/25.jpg)
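Quantified by Mutual Information
$$I(X_1;X_2) = \sum_{X_1,X_2} P(X_1)\,P(X_2|X_1) \log \frac{P(X_2|X_1)}{P(X_2)} = H(X_2) - H(X_2|X_1)$$
The distinctions inside each cluster are less relevant for predicting the class
[Figure: Words axis, with the within-cluster differences marked “Irrelevant distinctions”.]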
![Page 26: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/26.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
Newsgroup clusters:
• alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*
• comp.*, misc.forsale, sci.crypt, sci.electronics
Word clusters:
• car, turkish, game, team, jesus, gun, hockey, …
• x, file, image, encryption, window, dos, mac, …
![Page 27: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/27.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
Newsgroup clusters:
• comp.graphics, comp.os.ms-windows.misc, comp.windows.x
• comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, misc.forsale, sci.crypt, sci.electronics
Word clusters:
• windows, image, window, jpeg, graphics, …
• encryption, db, ide, escrow, monitor, …
![Page 28: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/28.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
![Page 29: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/29.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
Newsgroup clusters:
• alt.atheism, rec.sport.baseball, rec.sport.hockey, soc.religion.christian, talk.politics.mideast, talk.religion.misc
• rec.autos, rec.motorcycles, sci.med, sci.space, talk.politics.guns, talk.politics.misc
Word clusters:
• armenian, turkish, jesus, hockey, israeli, armenians, …
• car, q, gun, bike, fbi, health, …
![Page 30: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/30.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
![Page 31: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/31.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
![Page 32: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/32.jpg)
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW); axes: Newsgroup, Word.]
Newsgroup cluster:
• alt.atheism, soc.religion.christian, talk.religion.misc
Word cluster:
• atheists, christianity, jesus, bible, sin, faith, …
![Page 33: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/33.jpg)
We observe Semantic Scaling
[Figure: three panels with both axes running from 0 to 1, and a log-log panel showing the data and the linear fit y = 1.92*x - 0.866.]
$$I_Y = \frac{I(\hat X;Y)}{I(X;Y)}, \qquad I_X = \frac{I(X;\hat X)}{H(X)}$$
$$\frac{\partial I_Y}{\partial I_X} \propto \frac{1 - I_Y}{1 - I_X}$$
$$1 - I_Y = c\,(1 - I_X)^{1.92}$$
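The exponent in the scaling relation on this slide is the slope of a straight line in the coordinates log(1 - I_X), log(1 - I_Y). Below is a minimal sketch of that fit. The data points are synthetic, generated from the relation itself using the slide's fitted values 1.92 and -0.866; natural logarithms are assumed, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def scaling_exponent(curve):
    """curve: (I_X, I_Y) pairs, both normalized and strictly below 1."""
    xs = [math.log(1.0 - ix) for ix, _ in curve]
    ys = [math.log(1.0 - iy) for _, iy in curve]
    return fit_line(xs, ys)

# Synthetic information curve obeying 1 - I_Y = c * (1 - I_X) ** gamma.
# gamma and log(c) are taken from the fit quoted on the slide; the points
# themselves are made up for illustration.
gamma, c = 1.92, math.exp(-0.866)
curve = []
for k in range(20):
    ix = 0.88 + 0.004 * k
    iy = 1.0 - c * (1.0 - ix) ** gamma
    curve.append((ix, iy))

slope, intercept = scaling_exponent(curve)
print(f"exponent: {slope:.3f}, log c: {intercept:.3f}")
```

With real IB output, `curve` would hold the normalized pairs (I(X̂;X)/H(X), I(X̂;Y)/I(X;Y)) obtained at different numbers of clusters or values of β.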
![Page 34: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/34.jpg)
[Figure: I(T;Y)/I(X;Y) vs. I(T;X)/H(X), both axes from 0 to 1; series: 20NG Noam data, 20NG russian data.]
![Page 35: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/35.jpg)
Simplified Chinese    2.09
Traditional Chinese   1.73
Dutch                 2.3
French                2.22
Hebrew                1.63
Italian               2.35
Japanese              1.42
Portuguese            2.9
Spanish               1.89
![Page 36: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/36.jpg)
[Figure: log(1-I(T;Y)/I(X;Y)) vs. log(1-I(T;X)/H(X)); series: Chinese Simplified, Chinese Traditional, Dutch, French, Hebrew, Italian, Japanese, Korean, Portuguese, Spanish, English 20NG Jose, English UTF, English Reuters, English 20NG Noam.]
![Page 37: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/37.jpg)
Random selection of 200 words
[Figure: log(1-I(T;Y)/I(X;Y)) vs. log(1-I(T;X)/H(X)).]
![Page 38: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/38.jpg)
Can we understand it?
$$I(X;Y\mid\hat X) = c\,H(X\mid\hat X)^{\gamma}$$
$$I_Y = \frac{I(\hat X;Y)}{I(X;Y)}, \qquad I_X = \frac{I(X;\hat X)}{H(X)}$$
$$\frac{\Delta I_Y}{\Delta H_X} = \frac{I(X;Y\mid\hat X)}{H(X\mid\hat X)}$$
[Figure: information curve with both axes from 0 to 1, annotated with $I(X;Y\mid\hat X)$ and $H(X\mid\hat X)$.]
Any subset of the language has the same exponent!
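One way to connect the normalized quantities $I_X$, $I_Y$ to the conditional quantities on this slide is sketched below. It is offered as a reading aid rather than the speaker's own derivation. It assumes that $\hat X$ is computed from $X$ alone (the Markov chain $\hat X \leftrightarrow X \leftrightarrow Y$), and the symbols $\gamma$ and $c'$ are notation introduced here for the exponent and the rescaled constant.

```latex
\begin{align*}
1 - I_Y &= \frac{I(X;Y) - I(\hat X;Y)}{I(X;Y)} = \frac{I(X;Y\mid\hat X)}{I(X;Y)}, \\
1 - I_X &= \frac{H(X) - I(X;\hat X)}{H(X)} = \frac{H(X\mid\hat X)}{H(X)}, \\
\text{so } 1 - I_Y = c\,(1 - I_X)^{\gamma}
  \text{ is equivalent to }\quad
I(X;Y\mid\hat X) &= c'\,H(X\mid\hat X)^{\gamma},
  \qquad c' = c\,\frac{I(X;Y)}{H(X)^{\gamma}}.
\end{align*}
```

The first line uses the chain rule $I(X,\hat X;Y) = I(X;Y) + I(\hat X;Y\mid X) = I(\hat X;Y) + I(X;Y\mid\hat X)$ together with $I(\hat X;Y\mid X) = 0$ under the Markov assumption; the second uses $H(X) = I(X;\hat X) + H(X\mid\hat X)$.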
![Page 39: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/39.jpg)
But what does it tell us about Language?
$$I(X;Y\mid\hat X) = c\,H(X\mid\hat X)^{\gamma}$$
$$\frac{\Delta \log I_Y}{\Delta \log H_X} = \frac{H_X}{I_Y}\cdot\frac{I(X;Y\mid\hat X)}{H(X\mid\hat X)}, \qquad \frac{\Delta I_Y}{\Delta H_X} = \frac{I(X;Y\mid\hat X)}{H(X\mid\hat X)}$$
“Efficiency of the words”: log-ratio of added Word Entropy that is transferred to Meaningful Information
Language appears to have constant word efficiency! ~ 2
![Page 40: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/40.jpg)
Possible Explanations?
• Power laws are too common to mean anything…
  Zipf’s law and similar… “never trust linear log-log plots…”
• It’s a property of my analysis, not of language
  How do I know that it’s not all in the way we cluster the words?
• Words are generated at a constant level of ambiguity:
  words are generated at a constant rate, depending only on the concept (occurred) ambiguity in usage, irrespective of vocabulary size or domain
• Small world (scale free) properties of word acquisition…
![Page 41: Universal Scaling of Semantic Information Revealed from IB word clusters or Human language as optimal biological adaptation Naftali Tishby School of Computer.](https://reader030.fdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930034/html5/thumbnails/41.jpg)
Many Thanks to…
Bill Bialek, Fernando Pereira, Noam Slonim
Dmitry Davidov, Amir Navot, Josemine Magdalen
Banter Co. (z”l)