How much information does a language have?

• Shannon, C. E., "Prediction and Entropy of Printed English," Bell System Technical Journal, 1951
Motivation/Skills
Redundancy

The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write

En_ _ _sh ha_f o_ w_ _t w_ w_ _te i_ dete_ _ _ _e_ b_ t_e str_ct_r_ _ f _ _ _ lang_ _ _ _ a_d H_ _f i_ c_os_n fre_ _ _

Redundancy = 1 − H / H_max
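As a minimal sketch (not from the paper), the redundancy formula can be evaluated directly. Here $H_{max} = \log_2 27 \approx 4.75$ bits (26 letters plus space, matching the deck's F0), and the deck's trigram estimate F3 ≈ 3.1 bits/letter stands in for H as an illustrative value:

```python
import math

def redundancy(h, h_max):
    """Fraction of the text fixed by structure rather than by free choice."""
    return 1 - h / h_max

# H_max for 27 equiprobable symbols (26 letters + space); matches F0 = 4.75.
h_max = math.log2(27)

# Using the deck's trigram estimate F3 ~ 3.1 bits/letter as H:
print(round(redundancy(3.1, h_max), 2))  # -> 0.35, i.e. ~35% redundancy
```

Longer-range statistics push H lower, and the redundancy estimate correspondingly higher, toward the ~50% (and eventually ~75%) figures quoted in the deck.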
Entropy

How much information is produced, on average, for each letter?

$H = -\sum_{i=1}^{27} p_i \log_2 p_i$

(The 27 symbols are the 26 letters plus the space.)
[Two French examples of constrained letter statistics, kept untranslated because the letters themselves are the point: the first uses only the vowel e (in the manner of Georges Perec’s Les Revenentes); the second, from Perec’s La Disparition, avoids the letter e entirely.]

‘L’Evêqe en effet est très streect: le clergé, de temps en temps, se permet de révéler ses préférences envers des ‘événements’ frenchement débreedés, mets l’évêqe hème qe ses fêtes respectent des règles sévères et les trensgresser, c’est fréqemment reesqer de se fère relegger’.

Saisi par l’inspiration, il composa illico un lai, qui, suivant la tradition du Canticum Canticorum Salomonis, magnifiait l’illuminant corps d’Anastasia : Ton corps, un grand galion où j’irai au long-cours, un sloop, un brigantin tanguant sous mon roulis, Ton front, un fort dont j’irai à l’assaut, un bastion, un glacis qui fondra sous l’aquilon du transport qui m’agit,
Letter frequencies of English:

| Letter | Prob. | Letter | Prob. | Letter | Prob. |
|--------|-------|--------|-------|--------|-------|
| E | 0.131 | D | 0.038 | W | 0.015 |
| T | 0.105 | L | 0.034 | B | 0.014 |
| A | 0.082 | F | 0.029 | V | 0.009 |
| O | 0.080 | C | 0.028 | K | 0.004 |
| N | 0.071 | M | 0.025 | X | 0.002 |
| R | 0.068 | U | 0.025 | J | 0.001 |
| I | 0.063 | G | 0.020 | Q | 0.001 |
| S | 0.061 | Y | 0.020 | Z | 0.0008 |
| H | 0.053 | P | 0.020 | | |

First-order letter entropies: S(English) = 4.13, S(Spanish) = 4.01 bits.
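The first-order entropy can be computed directly from the table above; a small sketch using the frequencies as printed (space excluded):

```python
import math

# Letter probabilities as printed in the table above (space not included).
p = {'E': 0.131, 'T': 0.105, 'A': 0.082, 'O': 0.080, 'N': 0.071, 'R': 0.068,
     'I': 0.063, 'S': 0.061, 'H': 0.053, 'D': 0.038, 'L': 0.034, 'F': 0.029,
     'C': 0.028, 'M': 0.025, 'U': 0.025, 'G': 0.020, 'Y': 0.020, 'P': 0.020,
     'W': 0.015, 'B': 0.014, 'V': 0.009, 'K': 0.004, 'X': 0.002, 'J': 0.001,
     'Q': 0.001, 'Z': 0.0008}

# H = -sum_i p_i log2 p_i
h1 = -sum(pi * math.log2(pi) for pi in p.values())
print(round(h1, 2))  # ~4.13 bits per letter
```

The deck's F1 = 4.03 is slightly lower because it also counts the space, whose high probability pulls the average down.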
How much information is obtained by adding one letter?

$F_N = -\sum_{i,j} p(b_i, j) \log_2 p(b_i, j) + \sum_i p(b_i) \log_2 p(b_i)$

where $b_i$ ranges over blocks of $N-1$ letters and $j$ is the letter that follows the block. The entropy of the language is the limit

$H = \lim_{N \to \infty} F_N$
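A sketch of estimating $F_N$ empirically as the difference between N-gram and (N−1)-gram entropies; the toy corpus is an arbitrary illustration, not data from the paper:

```python
import math
from collections import Counter

def entropy(counts):
    """Entropy (bits) of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def F(text, n):
    """F_N = H(N-grams) - H((N-1)-grams): information added by one more letter."""
    grams = lambda k: Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return entropy(grams(n)) - (entropy(grams(n - 1)) if n > 1 else 0.0)

text = "there is no reverse on a motorcycle " * 50  # tiny illustrative corpus
print(F(text, 1), F(text, 2), F(text, 3))
```

On any reasonably structured text the sequence decreases: each additional letter of context makes the next letter more predictable.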
| $F_N$ | Bits per letter |
|-------|-----------------|
| F0 | 4.75 |
| F1 | 4.03 |
| F2 | 3.32 |
| F3 | 3.1 |
Third-order approximation (each letter drawn from trigram statistics):

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
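Such samples can be reproduced with a small Markov sketch: draw each letter from the empirical distribution of successors of the previous two letters. The corpus below is an arbitrary stand-in, not Shannon's:

```python
import random
from collections import defaultdict

def third_order_sample(corpus, length=60, seed=0):
    """Sample letters from empirical P(next | previous two letters),
    i.e. a third-order approximation in Shannon's sense."""
    rng = random.Random(seed)
    follow = defaultdict(list)
    for i in range(len(corpus) - 2):
        follow[corpus[i:i + 2]].append(corpus[i + 2])
    state = corpus[:2]
    out = state
    for _ in range(length - 2):
        nxt = rng.choice(follow[state])   # draw from the empirical conditional
        out += nxt
        state = out[-2:]
    return out

corpus = "the quick brown fox jumps over the lazy dog and the cat " * 3
print(third_order_sample(corpus))
```

The output is gibberish, but gibberish with English-like trigram statistics, which is exactly the point of the slide above.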
| # | Word | Probability |
|---|------|-------------|
| 1 | the | .071 |
| 2 | of | .034 |
| 3 | and | .03 |

| Vocabulary size (no. of lemmas) | % of content in OEC | Example lemmas |
|---|---|---|
| 10 | 25% | the, of, and, to, that, have |
| 100 | 50% | from, because, go, me, our, well, way |
| 1000 | 75% | girl, win, decide, huge, difficult, series |
| 7000 | 90% | tackle, peak, crude, purely, dude, modest |
| 50,000 | 95% | saboteur, autocracy, calyx, conformist |
| >1,000,000 | 99% | laggardly, endobenthic, pomological |
Zipf’s Law

$P_n = \frac{0.1}{n}$ — the $n$-th most frequent word has probability about $0.1/n$

$\sum_{n=1}^{8727} P_n \approx 1$

$F_{word} = 11.82$ bits per word

$F_{word} / \text{Length} = 11.82 / 4.5 = 2.62$ bits per letter

(4.5 is the average word length in letters.)
Is English trying to warn us? (neighbouring ranks in the word-frequency list)

• 992–995: America, ensure, oil, opportunity
• 2629–2634: bush, admit, specifically, agents, smell, denied
• 16047–16048: arafat, unhealthy
How to continue?
Aoccdrnig to rseearch at an Elingsh uinervtisy, it deosn't
mttaer in waht oredr the ltteers in a wrod are, the
olny iprmoatnt tihng is that the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can
sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by
it slef but the wrod as a wlohe.
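The effect is easy to reproduce; a sketch that shuffles only the interior letters of each word (punctuation handling omitted for brevity):

```python
import random

def scramble(word, rng):
    """Shuffle interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

rng = random.Random(1)
sentence = "according to research it does not matter in what order"
print(" ".join(scramble(w, rng) for w in sentence.split()))
```

Because first and last letters (and word length) survive, the scrambled text retains much of the redundancy readers rely on.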
Revealing the statistics of the language

• q….. — 2034 words start with q
• ….q — 8 words finish with q (e.g., Iraq)
Revealing the statistics of the language

THERE IS NO REVERSE ON A MOTORCYCLE
1115112112111511711121321227111141111131

FRIEND OF MINE FOUND THIS OUT
861311111111111621111112111111

RATHER DRAMATICALLY THE OTHER DAY
41111111151111111111161111111111111

(Each digit is the number of guesses the predictor needed to identify the letter above it.)
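The reduced text of guess numbers carries the same information as the original, because anyone holding the identical predictor can invert it. A sketch with a fixed, zeroth-order predictor (Shannon's experiment used a human predictor, so these guess numbers will not match the deck's):

```python
# Fixed guess order: letters by rough English frequency, space last.
# (An illustrative predictor, standing in for Shannon's identical human "twin".)
ORDER = "ETAONRISHDLFCMUGYPWBVKXJQZ "

def to_guesses(text):
    """Replace each letter by the guess number this predictor would need."""
    return [ORDER.index(ch) + 1 for ch in text]

def from_guesses(nums):
    """The same predictor recovers the text exactly: the reduction is lossless."""
    return "".join(ORDER[n - 1] for n in nums)

msg = "THERE IS NO REVERSE ON A MOTORCYCLE"
nums = to_guesses(msg)
print(nums[:12])
```

A better predictor (one that uses context) concentrates the guess numbers on 1, which is what makes the reduced text easier to compress.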
(Figure: number of times guessed vs. position of the guessed letter.)
What is the probability of finding the number 1 in the third position?

THE 1 1 1
REV 1 1 5
ERS 1 1 2
MOT 1 1 2
THA 1 1 2
THE 1 1 1  ANT 3 1 3  ERS 1 1 2  MOT 1 1 2  HER 2 2 2
THA 1 1 2  HEN 1 1 3  ERS 1 1 2  TH_ 1 1 3  AN_ 3 1 2
HE_ 2 2 1  REV 1 1 5  ERS 1 1 2  MOT 1 1 2  AND 3 1 1

Probability of finding the number $i$ in place $N$:

$q_i^N = \sum_{i_1, \ldots, i_{N-1}} p(i_1, \ldots, i_{N-1}, i)$

For $N = 3$:

$q_1^3 = \sum_{i_1, i_2} p(i_1, i_2, 1) \qquad q_2^3 = \sum_{i_1, i_2} p(i_1, i_2, 2)$
Bounds

THERE IS NO REVERSE ON A MOTORCYCLE

• F0 (all the letters have the same probability)
• F1 (each letter has its own probability)
• F2 (correlation of two letters)

1115112112111511711121321227111141111131

• F0 (all the numbers have the same probability)
• F1 (each number has its own probability)
• F2 (correlation of two numbers)

$F_N \le -\sum_{i=1}^{27} q_i^N \log_2 q_i^N$
Bounds

$F_N \le -\sum_{i=1}^{27} q_i^N \log_2 q_i^N$

(The entropy of the guess numbers is an upper bound on $F_N$.)
Entropy

$\sum_{i=1}^{27} i \, (q_i^N - q_{i+1}^N) \log_2 i \;\le\; F_N$

(A lower bound on $F_N$ from the same guess-number frequencies.)
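Both bounds can be evaluated from the empirical guess-number frequencies; a sketch using the digit string for the motorcycle sentence shown earlier (read off the slide, so possibly imperfect):

```python
import math
from collections import Counter

def shannon_bounds(guesses, alphabet=27):
    """Shannon's lower/upper bounds on F_N from guess-number frequencies q_i."""
    n = len(guesses)
    q = {i: c / n for i, c in Counter(guesses).items()}
    upper = -sum(p * math.log2(p) for p in q.values())
    lower = sum(i * (q.get(i, 0.0) - q.get(i + 1, 0.0)) * math.log2(i)
                for i in range(1, alphabet + 1))
    return lower, upper

digits = "1115112112111511711121321227111141111131"
lo, hi = shannon_bounds([int(d) for d in digits])
print(round(lo, 2), round(hi, 2))
```

The sample here is far too short for a reliable estimate; Shannon's experiment averaged over a hundred such passages per context length N.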
(Worked example for $N = 4$: the table of guess counts and the derived frequencies $q_1^4, q_2^4, \ldots$ did not survive extraction.)

$F_4 \le -\sum_{i=1}^{27} q_i^4 \log_2 q_i^4$

$F_4 \ge \sum_{i=1}^{27} i \, (q_i^4 - q_{i+1}^4) \log_2 i$
Bounds

Redundancy ≈ 75%

| $F_N$ | Bits per letter |
|-------|-----------------|
| F0 | 4.75 |
| F1 | 4.03 |
| F2 | 3.32 |
| F3 | 3.1 |