Csc410 presentation
matthew-ross
Goals For the Project
● Write software that pulls data from Twitter based on location
● Gain insights into the language used in different cities
● Become well-versed in web technologies and software development practices
Tasks
Step 1: Collect and format data from the Twitter API
Step 2: Structure the collected data
Step 3: Perform data analysis on the structured data
Step 2: Crunching Numbers
Step by Step
● For each set of tweets
● Read each tweet into an ArrayList
o Find all hashtags
o Find all user mentions
o Find all uppercase words
● Remove the stop words
o For every tweet string, take each word and insert it into an ArrayList
o Store each of these ArrayLists in an outer ArrayList (ArrayList<ArrayList<String>>)
o Iterate through the ArrayList of stop words
o If any individual word in the data structure matches a stop word, replace it with STOP
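The steps above can be sketched in Java, the language the project used for its regular expressions. This is a minimal illustration, not the project's code: the class and method names (`TweetProcessor`, `findAll`, `tokenize`, `replaceStopWords`) are hypothetical, and the patterns are assumptions (a hashtag is `#` followed by word characters, a mention is `@` followed by word characters, and an "uppercase word" is a run of two or more capital letters). Splitting words on whitespace and comparing them to stop words case-insensitively are also assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetProcessor {
    // Assumed patterns (not from the original project).
    public static final Pattern HASHTAG = Pattern.compile("#\\w+");
    public static final Pattern MENTION = Pattern.compile("@\\w+");
    public static final Pattern UPPERCASE = Pattern.compile("\\b[A-Z]{2,}\\b");

    // Collect every match of the pattern in one tweet, in order of appearance.
    public static ArrayList<String> findAll(Pattern pattern, String tweet) {
        ArrayList<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(tweet);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Split each tweet on whitespace: one inner ArrayList of words per tweet.
    public static ArrayList<ArrayList<String>> tokenize(List<String> tweets) {
        ArrayList<ArrayList<String>> words = new ArrayList<>();
        for (String tweet : tweets) {
            words.add(new ArrayList<>(Arrays.asList(tweet.trim().split("\\s+"))));
        }
        return words;
    }

    // Replace every word that is a stop word (case-insensitive) with "STOP".
    public static void replaceStopWords(ArrayList<ArrayList<String>> words, Set<String> stopWords) {
        for (ArrayList<String> tweetWords : words) {
            for (int i = 0; i < tweetWords.size(); i++) {
                if (stopWords.contains(tweetWords.get(i).toLowerCase())) {
                    tweetWords.set(i, "STOP");
                }
            }
        }
    }

    public static void main(String[] args) {
        // A few entries from the stop word list, stored lowercase.
        Set<String> stopWords = new HashSet<>(Arrays.asList("thanks", "to", "the", "that", "this"));
        String tweet = "Thanks to the @raptors for a GREAT game #WeTheNorth";

        System.out.println(findAll(HASHTAG, tweet));   // [#WeTheNorth]
        System.out.println(findAll(MENTION, tweet));   // [@raptors]
        System.out.println(findAll(UPPERCASE, tweet)); // [GREAT]

        ArrayList<ArrayList<String>> words = tokenize(List.of(tweet));
        replaceStopWords(words, stopWords);
        System.out.println(words); // [[STOP, STOP, STOP, @raptors, for, a, GREAT, game, #WeTheNorth]]
    }
}
```

One deliberate deviation from the slide: instead of iterating through an ArrayList of stop words for every word, the sketch keeps the stop words in a `HashSet`, so each lookup is constant time on average rather than a scan of the whole list.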
Classes
● Main: find the important information
o Collected data from files
o Gathered information through regular expressions
● City: store the information
o Only mutator (set) and accessor (get) methods
o More data formatting
● Comparison: process the information
o Takes in two City objects
o Get data proportions?
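A minimal sketch of how the City and Comparison classes might fit together. Every field and method name here is hypothetical; the slides only say that City has accessors and mutators and that Comparison takes in two City objects. As an assumption, a "proportion" is the share of a given item among all items of the same kind collected for a city (for example, what fraction of a city's hashtags are `#TO`). The city data in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical City class: stores extracted data behind get/set methods only.
class City {
    private String name;
    private List<String> hashtags = new ArrayList<>();
    private List<String> mentions = new ArrayList<>();
    private List<String> uppercaseWords = new ArrayList<>();

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<String> getHashtags() { return hashtags; }
    public void setHashtags(List<String> hashtags) { this.hashtags = hashtags; }
    public List<String> getMentions() { return mentions; }
    public void setMentions(List<String> mentions) { this.mentions = mentions; }
    public List<String> getUppercaseWords() { return uppercaseWords; }
    public void setUppercaseWords(List<String> words) { this.uppercaseWords = words; }
}

// Hypothetical Comparison class: takes in two City objects and computes proportions.
public class Comparison {
    private final City first;
    private final City second;

    public Comparison(City first, City second) {
        this.first = first;
        this.second = second;
    }

    // Share of `item` among all entries in `items`, in [0, 1]; 0 for an empty list.
    public static double proportion(List<String> items, String item) {
        if (items.isEmpty()) {
            return 0.0;
        }
        return (double) Collections.frequency(items, item) / items.size();
    }

    // Proportions {P1, P2} of one hashtag in the first and second city.
    public double[] hashtagProportions(String tag) {
        return new double[] {
            proportion(first.getHashtags(), tag),
            proportion(second.getHashtags(), tag)
        };
    }

    public static void main(String[] args) {
        City toronto = new City();
        toronto.setName("Toronto");
        toronto.setHashtags(List.of("#TO", "#TO", "#Leafs", "#snow"));

        City vancouver = new City();
        vancouver.setName("Vancouver");
        vancouver.setHashtags(List.of("#Canucks", "#rain", "#TO", "#rain"));

        double[] p = new Comparison(toronto, vancouver).hashtagProportions("#TO");
        System.out.println(p[0] + " " + p[1]); // 0.5 0.25
    }
}
```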
Stop Words?
In computing, stop words are words that are filtered out before or after the processing of natural language data (text).
take
taken
tell
tends
th
than
thank
thanks
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
thereupon
these
they
they'd
they'll
they're
they've
think
third
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
to
together
too
took
toward
towards
tried
tries
truly
try
trying
twice
two
Step 3: THE ALGORITHM
● The idea was to compare the distribution of tags, uppercase words, and mentions
● So we could say "location X is more similar to location Y than to location Z"
● For a given tag, word, or mention, let P1 and P2 be its proportions at two locations L1 and L2, where both P1 and P2 range from 0 to 1. Then we compute the absolute difference d = |P1 - P2|
● The distance between L1 and L2 in this context can be computed, for example, by selecting the top N pieces of information (again, tag/word/mention) from those that appear in the tweets at L1 (call this set S1, and S2 for L2)
● Then we compute d over the top pieces from L1's point of view (S1) and from L2's point of view (S2), and take the average of the two distance values
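The distance described above can be sketched as follows. The slides leave the aggregation open, so this sketch makes assumptions that are not from the original: proportions are counts divided by the total number of pieces at a location, a piece that never appears at a location has proportion 0 there, the d values over a top-N set are averaged (so each one-sided distance stays between 0 and 1), and ties in the top-N ranking are broken alphabetically. `LocationDistance` and its methods are hypothetical names, and the tags in `main` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocationDistance {

    // Turn a list of pieces (tags, words, or mentions) into proportions in [0, 1].
    public static Map<String, Double> proportions(List<String> pieces) {
        Map<String, Double> result = new HashMap<>();
        for (String piece : pieces) {
            result.merge(piece, 1.0, Double::sum); // count occurrences
        }
        for (Map.Entry<String, Double> e : result.entrySet()) {
            e.setValue(e.getValue() / pieces.size()); // count -> proportion
        }
        return result;
    }

    // The n pieces with the highest proportion (ties broken alphabetically).
    public static List<String> top(Map<String, Double> props, int n) {
        List<String> keys = new ArrayList<>(props.keySet());
        keys.sort((a, b) -> {
            int byValue = Double.compare(props.get(b), props.get(a));
            return byValue != 0 ? byValue : a.compareTo(b);
        });
        return keys.subList(0, Math.min(n, keys.size()));
    }

    // One-sided distance: mean of d = |P1 - P2| over the pieces in s.
    public static double oneSided(List<String> s, Map<String, Double> p1, Map<String, Double> p2) {
        if (s.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (String piece : s) {
            total += Math.abs(p1.getOrDefault(piece, 0.0) - p2.getOrDefault(piece, 0.0));
        }
        return total / s.size();
    }

    // Average of the view from L1 (using S1) and the view from L2 (using S2).
    public static double distance(List<String> l1Pieces, List<String> l2Pieces, int n) {
        Map<String, Double> p1 = proportions(l1Pieces);
        Map<String, Double> p2 = proportions(l2Pieces);
        double fromL1 = oneSided(top(p1, n), p1, p2);
        double fromL2 = oneSided(top(p2, n), p1, p2);
        return (fromL1 + fromL2) / 2.0;
    }

    public static void main(String[] args) {
        List<String> l1 = List.of("#a", "#a", "#b", "#c");
        List<String> l2 = List.of("#b", "#b", "#b", "#d");
        System.out.println(distance(l1, l2, 2)); // 0.4375
    }
}
```

A smaller distance means more similar locations, so "location X is more similar to Y than to Z" becomes distance(X, Y) < distance(X, Z) for the same N. Because the final value averages both points of view and d uses an absolute difference, the result does not depend on which location is passed first.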
Tools and Technologies Learned
● Java Regular Expressions
● Ruby Programming
● Ruby Gems
● jQuery
● AJAX
● RESTful Web Architecture
● Practical Object-Oriented Practices