Lecture 6: Comparing Things Word Similarity

Page 1: Lecture  6: Comparing Things Word Similarity

Methods in Computational Linguistics II

Queens College

Lecture 6: Comparing Things Word Similarity

Page 2: Lecture  6: Comparing Things Word Similarity

Today

• List Comprehensions

• Determining Word Similarity

• Co-occurrences

• WordNet

Page 3: Lecture  6: Comparing Things Word Similarity

List Comprehensions

• Compact way to process every item in a list.

• [x for x in array]
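As a minimal sketch, the comprehension above does the same work as an explicit loop that appends each item to a new list. The contents of `array` are an illustrative assumption; the slides do not say what the list holds.

```python
# Hypothetical sample data; the slides do not specify the contents of `array`.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Explicit loop: visit every item and append it to a new list.
copied_with_loop = []
for x in array:
    copied_with_loop.append(x)

# Equivalent list comprehension: builds the same new list in one expression.
copied = [x for x in array]

print(copied == copied_with_loop)  # True
```

The result is a new list object, so changing `copied` afterwards leaves `array` untouched.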

Page 4: Lecture  6: Comparing Things Word Similarity

Methods

• Functions and methods can be applied to the iterating variable, x.

• Their return values are stored in the resulting list.

• [len(x) for x in array]
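A short sketch of this idea, using an assumed list of words. It applies the built-in function `len` from the slide, plus the string method `upper` as a second illustration.

```python
# Hypothetical sample data, not taken from the slides.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Apply a function to each item; the return values form the new list.
lengths = [len(x) for x in array]

# Apply a string method to each item in the same way.
uppercased = [x.upper() for x in array]

print(lengths)     # [3, 3, 3, 2, 4, 3]
print(uppercased)  # ['THE', 'CAT', 'SAT', 'ON', 'THAT', 'MAT']
```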

Page 5: Lecture  6: Comparing Things Word Similarity

Conditionals

• Elements of the original list can be left out of the resulting list by adding an if condition.

• [x for x in array if len(x) == 3]
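The filter can be sketched as follows, again with an assumed word list. Only items for which the condition is true reach the result.

```python
# Hypothetical sample data, not taken from the slides.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Keep only the three-letter words; "on" and "that" are filtered out.
three_letter = [x for x in array if len(x) == 3]

print(three_letter)  # ['the', 'cat', 'sat', 'mat']
```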

Page 6: Lecture  6: Comparing Things Word Similarity

Building up

• Method calls and conditions can be combined to build up more complicated lists.

• [x.upper() for x in array if len(x) > 3 and x.startswith('t')]
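A runnable version of the combined comprehension, with an assumed word list. The condition is checked first, and the method call is applied only to the items that pass.

```python
# Hypothetical sample data, not taken from the slides.
array = ["the", "cat", "sat", "on", "that", "mat", "tree"]

# Keep words longer than three letters that start with 't', then uppercase them.
long_t_words = [x.upper() for x in array if len(x) > 3 and x.startswith('t')]

print(long_t_words)  # ['THAT', 'TREE']
```

Note that "the" starts with 't' but is dropped because it is not longer than three letters.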

Page 7: Lecture  6: Comparing Things Word Similarity

Lists Containing Lists

• Lists can contain lists

• [[a, 1], [b, 2], [d, 4]]

• ...or tuples

• [(a, 1), (b, 2), (d, 4)]

• [ [d, d*d] for d in array if d < 4]
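The examples on this slide can be sketched as below. The slide writes a, b, and d without quotes; this sketch assumes they stand for strings, and the contents of `array` are likewise an assumption.

```python
# Assumption: a, b, d on the slide are string labels.
pairs_as_lists = [["a", 1], ["b", 2], ["d", 4]]
pairs_as_tuples = [("a", 1), ("b", 2), ("d", 4)]

# Hypothetical numeric data for the comprehension.
array = [1, 2, 3, 4, 5]

# Each element of the result is itself a two-item list: [number, square].
squares = [[d, d * d] for d in array if d < 4]

print(squares)  # [[1, 1], [2, 4], [3, 9]]
```

Inner lists can be modified after creation, while tuples cannot, which is one reason to prefer tuples for fixed pairs.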

Page 8: Lecture  6: Comparing Things Word Similarity

Lists within lists are often called 2-d arrays

• This is another way we store tables.

• Similar to nested dictionaries.

• a = [[0,1], [1,0]]

• a[1][1]

• a[0][0]
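A minimal sketch of indexing into the 2-d array from the slide. The first index selects a row (an inner list) and the second selects an item within that row. The nested dictionary `table` is a hypothetical illustration of the comparison, not something from the slides.

```python
# The 2-d array from the slide: two rows, two columns.
a = [[0, 1], [1, 0]]

print(a[1][1])  # 0  (row 1, column 1)
print(a[0][0])  # 0  (row 0, column 0)
print(a[0][1])  # 1  (row 0, column 1)

# Hypothetical nested-dictionary version of the same table,
# looked up by key instead of by position.
table = {"row0": {"col0": 0, "col1": 1}, "row1": {"col0": 1, "col1": 0}}
print(table["row0"]["col1"])  # 1
```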

Page 9: Lecture  6: Comparing Things Word Similarity

Using multiple lists

• Multiple lists can be processed in a single list comprehension; with two for clauses, every item of the first list is combined with every item of the second.

• [x*y for x in array1 for y in array2]
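A sketch of what the two `for` clauses do, with assumed numeric lists. The comprehension behaves like a nested loop, so the result has one entry per combination of items. For contrast, the second comprehension uses the built-in `zip`, which is not on the slide, to pair items position by position.

```python
# Hypothetical sample data, not taken from the slides.
array1 = [1, 2, 3]
array2 = [10, 20]

# Two for clauses act like nested loops: every x is combined with every y.
products = [x * y for x in array1 for y in array2]
print(products)  # [10, 20, 20, 40, 30, 60]

# Not from the slides: zip pairs items by position instead.
array3 = [10, 20, 30]
pairwise = [x * y for x, y in zip(array1, array3)]
print(pairwise)  # [10, 40, 90]
```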

Page 10: Lecture  6: Comparing Things Word Similarity

Co-occurrences

• How would you identify common co-occurrences?

• Define a co-occurrence:

– “school bus” vs. “school river”
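One possible way to approach the question, sketched under the assumption that a co-occurrence means two words appearing next to each other. The slides leave the definition open, so this is only one choice among several (a wider window or same-sentence counts are alternatives). The sample text is invented for illustration.

```python
from collections import Counter

# Invented sample text; a real experiment would use a much larger corpus.
text = "the school bus stopped near the river and the school bus left"
tokens = text.split()

# Pair each word with the word that follows it (adjacent pairs).
pairs = list(zip(tokens, tokens[1:]))

# Count how often each adjacent pair occurs.
counts = Counter(pairs)

print(counts[("school", "bus")])    # 2
print(counts[("school", "river")])  # 0
print(counts.most_common(2))
```

A pair like "school bus" shows up repeatedly, while "school river" never does, which is the contrast the slide points at.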

Page 11: Lecture  6: Comparing Things Word Similarity

How are words related?

Page 12: Lecture  6: Comparing Things Word Similarity

Some relations

Page 13: Lecture  6: Comparing Things Word Similarity

Anything else?

• What relationships would you like to know about between words?

Page 14: Lecture  6: Comparing Things Word Similarity

WordNet

Page 15: Lecture  6: Comparing Things Word Similarity

Synsets

Page 16: Lecture  6: Comparing Things Word Similarity

Other relationships in WordNet

Page 17: Lecture  6: Comparing Things Word Similarity

WordNet Similarity

Page 18: Lecture  6: Comparing Things Word Similarity

WordNet Similarity

Page 19: Lecture  6: Comparing Things Word Similarity

Word sense disambiguation

Page 20: Lecture  6: Comparing Things Word Similarity

Stemming and Lemmatizing

Page 21: Lecture  6: Comparing Things Word Similarity

Stemming and Lemmatization in NLTK

Page 22: Lecture  6: Comparing Things Word Similarity

WordNet Demo

Page 23: Lecture  6: Comparing Things Word Similarity

Next Time

• Word Similarity

– WordNet

• Data structures

– 2-d arrays

– Trees

– Graphs