Methods in Computational Linguistics II Queens College Lecture 1: Introduction.
-
Upload
barbra-pierce -
Category
Documents
-
view
223 -
download
2
Transcript of Methods in Computational Linguistics II Queens College Lecture 1: Introduction.
Methods in Computational Linguistics II
Queens College
Lecture 1: Introduction
2
Methods in Computational Linguistics II
• 2nd semester of a two semester course providing instruction in – The basics of computer science and
programming (via python)– An introduction to techniques in computational
linguistics
3
My background
• Research– Speech Synthesis, and Recognition– Prosody (Intonation)– Speech Segmentation– Non-native speech– Political speech, and other paralinguistics
• Computer Science professor at Queens and CUNY GC.
• Worked at IBM and Google
4
Your Background
• Name.• What are your research interests in linguistics?• How do you expect computational linguistics to
fit into your work?– Are there techniques or applications that you are
particularly looking to learn
• Programming background?– 1 semester? more?
• Are you simultaneously taking Language Technologies
5
Outline
• NLTK– Overview– Major Capabilities
• Searching and Sorting.– Linear (Sequential) search– Binary Search– Insertion sort– MergeSort
• Course Policies• Syllabus Review
6
NLTK
• Natural Language Toolkit.
• A set of utilities in python that facilitate the processing of text.
7
NLTK Functionality
• Accessing corpora• String processing• Collocation discovery• Part of speech tagging• Classification and Clustering• Evaluation Metrics• Chunking• Parsing
8
NLTK Functionality
• Semantic interpretation– first order logic, lambda calculus, model
checking
• Probability and estimation• WordNet Browsing• Chatbots
9
NLTK as a resource
• This range of functionality is quite broad, and not necessarily cohesive.
• However, there are resources and tools (functions and objects) that underpin most major computational linguistics tasks.
10
Major Computational Linguistics Tasks
• Syntax– Tagging– Parsing
• Semantics– Information Extraction– Semantic Role Labeling
• Phonology• Sentence Processing• Segmentation
• Summarization• Speech Recognition• Speech Synthesis• Information Retrieval• Sentiment Analysis• Authorship studies• Co-reference
resolution
11
NLTK Resources
• NLTK also contained lexical material– Project Gutenberg– WordNet– Penn Treebank (subset)– Named Entity Recognition data– Inaugural addresses– Sentiment data– Names corpus– Switchboard (subset)– TIMIT– Webtext
12
Quick Assignment
• Methods I used NLTK.• Homework 0
– Make sure that NLTK is installed and working correctly
– Install matplotlib to use nltk’s graphing functions.
• “Due” asap.
13
One Question Pop Quiz
• Solve for p
14
Math
• Computational Linguistics requires a not-quite-trivial amount of math.
• Statistics and probabilistic modeling form the pillars underlying these computational techniques.
• This involves counting and algebra.• Machine learning governs the classification and
clustering techniques that CL makes heavy use of.– Requires calculus, statistics, linear algebra.
15
Math in this course
• Overview of probability.– Next class
• Algebra for evaluation, some common features
• Statistics for Naïve Bayes classification• Entropy in Decision Trees
16
Outline
• NLTK– Overview– Major Capabilities
• Searching and Sorting.– Linear (Sequential) search– Binary Search– Insertion sort– MergeSort
• Course Policies• Syllabus Review
17
Data Structures, Algorithms, etc.
• In computer science, there is a tight relationship between data structures and algorithms
• In general, the more complex the data structure– the more general or flexible the data and
relationships that can be represented– the faster algorithms can run
18
Searching and Sorting
• Searching and sorting is a frequent example of the relationship between algorithm runtimes, and data structuring.
• Search: identify the location of a value, x, in a list, A.
• Sort: manipulate a list A, such that the values in A are increasing. A[i] <= A[i+1]
19
Sequential Search
def search(A, x):for i in xrange(len(A)):
if A[i] == x:return i
return -1
20
How long does sequential search take to run?
• Best case?
• Worst case?
• Average case?
21
Binary Search
• If the list A is in increasing order, large chunks of the list can be be ignored.
def search(A, x):top = len(A)bottom = 0while bottom < top:mid = (top + bottom) / 2if A[mid] < x:bottom = mid + 1elif A[mid] > x:top = midelse:return midreturn -1
22
How long does binary search take to run?
• Best Case?
• Worst Case?
• Average Case?
23
Improvement of Binary Search
• Binary search is a significant improvement– log n < n
• However, Binary search requires that A is sorted.
• How long does it take to sort an Array and how does this impact the total runtime?
24
Insertion Sort
• Sort the list [5, 2, 4, 6, 1, 3]
def insertionSort(A):for j in xrange(1, len(A)):
key = A[j]i = j - 1while i > -1 and A[i] >
key:A[i + 1] = A[i]i = i - 1
A[i + 1] = key
25
How long does Insertion sort take to run?
• Best Case?
• Worst Case?
• Average Case?
26
Can we sort faster?
• Yes.
• This requires recursion. • We’ll come back to this, but here is a first
example.
27
Merge Sort
def mergeSort(A):if len(A) == 1:
return Amid = len(A) / 2Abottom = mergeSort(A[1:mid])Atop = mergeSort(A[mid +
1:len(A)])return merge(Abottom, Atop)
28
Merge
def merge(A, B):C = []i = 0j = 0A.append(float('inf'))B.append(float('inf'))for k in xrange(len(A) + len(B)):
if A[i] < B[j]:C.append(A[i])i = i + 1
else:C.append(B[j])j = j + 1
return C
29
How long does Merge Sort take to run?
• Hint: This is a (much) harder question.• Best Case?
• Worst Case?
• Average Case?
30
Comparison of run times
Sorting Searching0 n
n*log(n) log(n)
How much searching do you need to do to make it worth sorting?
31
Class Structure and Policies
• Course website:– http://eniac.cs.qc.cuny.edu/andrew/methods2/syllabus.html
• Email list– Banner does not have an email function– Put your email address on the sign up sheet.