Source: web.stanford.edu/~cysuen/projects/cs224u_presentation.pdf
LEARNING VALID ADVERB-ADJECTIVE PAIRS CAROLINE SUEN
CS224U WINTER 2013
THE CHALLENGE
We can say:
• “The glass is half full.”
• “Wow, Bob is really tall.”
But can we say these?
• “Wow, Bob is half tall.”
• “The glass is really full.”
Goal: develop a model that can learn whether an adverb and an adjective can be used together and make grammatical sense.
PRIOR WORK
Syrett and Lidz (2010)
• Use linguistics to develop patterns
Sentiment analysis
• Benamara et al. (2007), Liu et al. (2009)
Adjective-noun pairs
• Hatzivassiloglou et al. (1993)
EXTRACTING DATA
Pair counts (rows: adjectives, columns: adverbs)

          half  completely  extremely  nearly
full         5           3          3       1
tall         0           0          4       0
smart        0           1          4       0
daylong      0           0          0       1

• New York Times dataset, ~18000 articles
• Stanford POS tagger to find valid adverb-adjective pairs
• 1019 adverbs, 4876 adjectives, 19337 pairs
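A minimal sketch of how pair counts like those in the table could be collected from tagged text. It assumes pairs are an adverb (Penn Treebank tag RB) immediately followed by an adjective (JJ); the slides do not say which rule was used, so this adjacency rule, the hand-tagged sentences, and the name count_pairs are all illustrative assumptions. A real run would take its tags from the Stanford POS tagger instead.

```python
from collections import Counter

# Hypothetical pre-tagged input: lists of (token, Penn Treebank tag) pairs.
# The tags are written by hand here so the sketch runs on its own.
tagged_sentences = [
    [("The", "DT"), ("glass", "NN"), ("is", "VBZ"), ("half", "RB"), ("full", "JJ")],
    [("Bob", "NNP"), ("is", "VBZ"), ("extremely", "RB"), ("tall", "JJ")],
    [("She", "PRP"), ("is", "VBZ"), ("extremely", "RB"), ("tall", "JJ")],
]


def count_pairs(sentences):
    """Count adverb (RB) tokens immediately followed by adjective (JJ) tokens."""
    counts = Counter()
    for sent in sentences:
        # Walk over consecutive token pairs in the sentence.
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "RB" and t2 == "JJ":
                counts[(w1.lower(), w2.lower())] += 1
    return counts


pair_counts = count_pairs(tagged_sentences)
for (adverb, adjective), n in sorted(pair_counts.items()):
    print(adverb, adjective, n)
```

Each key of pair_counts is one cell of the table: the adverb is the column, the adjective is the row, and the value is the count.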
BUILDING A GRAPH
[Figure: bipartite graph linking adverbs (half, completely, extremely, nearly) to adjectives (full, tall, smart, daylong)]
Relatively sparse bipartite graph
PARTITIONING
[Figure: the bipartite graph of adverbs (half, completely, extremely, nearly) and adjectives (full, tall, smart, daylong), partitioned]
BUILDING A GRAPH: TECHNICAL DETAILS
• Used Stanford Network Analysis Platform
• Experimented with:
  • Finding dense bipartite subgraphs using the frequent itemset algorithm
  • Building adverb graphs and adjective graphs and running community detection algorithms on these graphs
    • Based on common neighbors
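The "common neighbors" construction can be sketched as follows, using the four-by-four example counts from the data slide. Two adjectives are linked when they share at least one adverb, and likewise for adverbs. The function name common_neighbor_graph and the min_shared cutoff are assumptions for illustration; the project itself used SNAP, and the slides do not give the exact linking rule.

```python
from itertools import combinations

# Co-occurrence counts from the example table (adjective -> {adverb: count}).
counts = {
    "full": {"half": 5, "completely": 3, "extremely": 3, "nearly": 1},
    "tall": {"extremely": 4},
    "smart": {"completely": 1, "extremely": 4},
    "daylong": {"nearly": 1},
}


def common_neighbor_graph(neighbors, min_shared=1):
    """Link two same-side nodes when they share at least min_shared neighbors.

    Returns a dict mapping (node_a, node_b) to the number of shared neighbors.
    """
    edges = {}
    for a, b in combinations(sorted(neighbors), 2):
        shared = set(neighbors[a]) & set(neighbors[b])
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges


# Adjective graph: adjectives linked through the adverbs they share.
adjective_graph = common_neighbor_graph(counts)

# Invert the table to get adverb -> adjectives, then project the same way.
adverb_neighbors = {}
for adjective, adverbs in counts.items():
    for adverb in adverbs:
        adverb_neighbors.setdefault(adverb, set()).add(adjective)
adverb_graph = common_neighbor_graph(adverb_neighbors)
```

On this tiny example every pair of adverbs shares "full", so the adverb graph is complete; on the real, much sparser data a higher min_shared would likely be needed to get useful structure.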
[Figure: two panels, "Adjective graph" and "Adverb graph", each drawn over the nodes half, completely, extremely, nearly, full, tall, smart, daylong]
CLIQUE PERCOLATION
[Figure: clique percolation illustration, from Wikipedia]
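For readers unfamiliar with the method, here is a small, brute-force sketch of clique percolation: find all k-cliques, treat two k-cliques as adjacent when they share k-1 nodes, and take each connected group of cliques as one community. The toy graph, the helper names, and the choice k=3 are assumptions for illustration and are not taken from the project, which ran on SNAP.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_clique_communities(adj, k):
    """Clique percolation: merge k-cliques that share k-1 nodes.

    Clique enumeration is brute force, so this only suits tiny graphs.
    """
    nodes = sorted(adj)
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]

    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each community is the union of the nodes of its merged cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)


# Toy graph: triangles a-b-c and b-c-d share an edge; triangle d-e-f
# touches them only at node d.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
             ("d", "e"), ("d", "f"), ("e", "f")]
communities = k_clique_communities(build_adjacency(toy_edges), k=3)
```

Node d ends up in both communities, which is the overlapping-membership situation that edge classification has to handle.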
CLASSIFY: DOES AN EDGE BELONG?
Use the communities that adverb u and adjective v are in. If, by combining these communities, the edge density is sufficiently high, we claim that u and v can be paired up.
Harder case:
• An adverb is in communities C1 and C2. How likely is it to be connected to an adjective in communities D1, D2, and D3?
  • Thankfully, this is rare!
  • Larger and more densely connected communities are given higher weight
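The simple case, one adverb community and one adjective community, can be sketched like this. Density is taken to be the fraction of possible adverb-adjective pairs across the two communities that appear as training edges, and the 0.5 threshold is an arbitrary placeholder; the slides give neither the exact density definition nor the threshold, and the weighting used for the multi-community case is not covered here.

```python
def edge_density(adverbs, adjectives, edges):
    """Fraction of possible adverb-adjective pairs that are observed edges."""
    possible = len(adverbs) * len(adjectives)
    if possible == 0:
        return 0.0
    present = sum((u, v) in edges for u in adverbs for v in adjectives)
    return present / possible


def can_pair(adverb_community, adjective_community, edges, threshold=0.5):
    """Predict that an adverb and adjective from these communities can pair.

    threshold is a hypothetical cutoff, not a value from the project.
    """
    return edge_density(adverb_community, adjective_community, edges) >= threshold


# Toy training edges as (adverb, adjective); ("completely", "tall") is unseen.
train_edges = {
    ("completely", "full"), ("extremely", "full"),
    ("extremely", "tall"), ("extremely", "smart"),
    ("completely", "smart"),
}
adverb_community = {"completely", "extremely"}
adjective_community = {"full", "tall", "smart"}

density = edge_density(adverb_community, adjective_community, train_edges)
prediction = can_pair(adverb_community, adjective_community, train_edges)
```

Here five of the six possible pairs are present, so the unseen pair ("completely", "tall") would be predicted as valid under the placeholder threshold.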
EVALUATION: RECALL
• Select “test data” (1100 edges); the remaining edges are “training data”
• Find communities based on training data
• Observe fraction of test data edges recovered
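The hold-out procedure above can be sketched as follows. The random split, the seed, and the function names split_edges and recall are assumptions for illustration; the slides do not say how the 1100 test edges were chosen. The predict argument stands in for the community-based classifier.

```python
import random


def split_edges(edges, n_test, seed=0):
    """Randomly hold out n_test edges; return (train_edges, test_edges)."""
    shuffled = sorted(edges)  # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def recall(test_edges, predict):
    """Fraction of held-out edges that the predictor recovers."""
    if not test_edges:
        return 0.0
    hits = sum(1 for u, v in test_edges if predict(u, v))
    return hits / len(test_edges)


# Toy (adverb, adjective) edges from the example table.
all_edges = {
    ("half", "full"), ("completely", "full"), ("extremely", "full"),
    ("nearly", "full"), ("extremely", "tall"), ("completely", "smart"),
    ("extremely", "smart"), ("nearly", "daylong"),
}
train, test = split_edges(all_edges, n_test=2)

# A stand-in predictor that accepts every pair recovers every test edge.
baseline_recall = recall(test, lambda u, v: True)
```

A predictor that accepts everything scores perfect recall, which is why a recall-only evaluation cannot detect over-prediction.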
EVALUATION: RECALL
Not enough connections: 260 (23.6%)
Not discovered by community detection algorithm: 129 (11.7%)
Correctly discovered by community detection algorithm: 711 (64.6%)
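The counts on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported above.

```python
# Counts reported on the slide.
not_enough_connections = 260
missed_by_detection = 129
correctly_discovered = 711

total_test_edges = not_enough_connections + missed_by_detection + correctly_discovered

# Recall over all held-out edges.
overall_recall = correctly_discovered / total_test_edges

# Recall restricted to test edges that had enough connections to classify.
connected_edges = correctly_discovered + missed_by_detection
connected_recall = correctly_discovered / connected_edges

print(total_test_edges)
print(f"{overall_recall:.1%}")
print(f"{connected_recall:.1%}")
```

The three counts add up to the 1100 held-out edges, and restricting to edges with enough connections gives the 84.6% recall quoted under the challenges.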
CHALLENGES + NEXT STEPS
• Not enough pairings
  • (recall for test data with enough connections: 84.6%)
• Clique percolation is slow
  • priority was building evaluation framework first
  • next steps: experimenting with clustering
• Adjective edge connections are much more important than adverb connections
• Current framework does not test precision
  • MTurk for crowd-sourced, hand-labeled data
• Potential next step:
  • Check Syrett and Lidz's linguistic results
THE END
THANKS FOR LISTENING! :)