Goal recap Implementation Experimental Results Conclusion Questions & Answers.


Goal recap, Implementation, Experimental Results, Conclusion, Questions & Answers

Our goal is to implement a framework to predict network traffic by mining mainstream news articles.

Method: Latent Dirichlet Allocation (LDA) identifies and classifies popular topics in articles. The ISP can then query and pre-cache highly popular videos to reduce overall traffic and delay.

Implemented a Python program to parse the news articles and collect the title and content.
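The parsing step could look roughly like the sketch below. It assumes each article is an HTML page whose headline sits in a `<title>` tag and whose body text sits in `<p>` tags; the names `ArticleParser` and `parse_article` are hypothetical and not taken from the original program.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured ("title" or "p")
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is captured as well.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None


def parse_article(html):
    """Return the title and content of one article (assumed HTML layout)."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n".join(parser.paragraphs)}
```

Real news pages vary widely in markup, so a production parser would likely need per-site rules; this only illustrates the title/content split.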

The original LDA implementation processed random Wikipedia articles; we modified it to take news articles as input and process them.
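To make the topic-modelling step concrete, here is a minimal sketch of LDA on tokenized articles. Note that it uses batch collapsed Gibbs sampling, a simpler stand-in, and not the online variational implementation the slides refer to. The function name `gibbs_lda` and all parameter defaults are assumptions chosen for illustration.

```python
import random


def gibbs_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01,
              top_n=5, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Most frequent words per topic, highest count first.
    topics = []
    for k in range(num_topics):
        ranked = sorted(range(V), key=lambda v: -topic_word[k][v])
        topics.append([vocab[v] for v in ranked[:top_n]])
    return topics, doc_topic
```

The returned `doc_topic` counts give each article's dominant topic (the index with the largest count), which is one plausible input for ranking topics by popularity.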

Wrote a script to extract and store YouTube statistical data such as view counts, number of subscribers, YouTube IDs, date of upload, user profile data, etc.

Wrote a program to sort topics by popularity. We pick the most popular topics and compare them with news websites. Popular news websites (such as CNN and BBC) generate popularity charts over time from click-view data.
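A minimal sketch of the ranking and comparison step follows. It assumes that a topic's popularity is the number of articles assigned to it and that the news website's chart is available as a plain list of topic labels; both assumptions, and the names `rank_topics` and `overlap_with_chart`, are illustrative only.

```python
from collections import Counter


def rank_topics(article_topics, top_n=3):
    """Return the top_n (topic, article count) pairs, most popular first."""
    return Counter(article_topics).most_common(top_n)


def overlap_with_chart(ranked, chart_topics):
    """Fraction of our top topics that also appear in the website's chart."""
    ours = [topic for topic, _ in ranked]
    if not ours:
        return 0.0
    return sum(topic in chart_topics for topic in ours) / len(ours)
```

For example, `rank_topics(labels, top_n=2)` followed by `overlap_with_chart(ranked, ["election", "sports"])` gives a rough agreement score between the mined topics and a site's popularity chart.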

Implemented the ZOOM operation: wrote a program to distribute the articles by source/category. Queried words using frequent pattern mining and LDA results to check the relevancy and accuracy of popular topics.
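The two steps above can be sketched as follows. The article layout (a dict with `source` and `words` keys) is an assumption, and the frequent pattern mining shown here is a brute-force count of word sets up to a small size, not necessarily the algorithm used in the project.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_by_source(articles):
    """Distribute articles into buckets keyed by their source."""
    groups = defaultdict(list)
    for article in articles:
        groups[article["source"]].append(article)
    return dict(groups)


def frequent_word_sets(articles, min_support=2, max_size=2):
    """Count word sets of up to max_size words and keep the frequent ones.

    Support is the number of articles containing every word in the set.
    """
    counts = Counter()
    for article in articles:
        words = sorted(set(article["words"]))  # each word counts once per article
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}
```

Frequent word sets such as `("florida", "storm")` could then serve as query terms for a video portal, alongside the top words from LDA.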

Figure: number of feeds (x-axis) vs. video relevance to the topic (y-axis)

Figure: number of feeds (x-axis) vs. accuracy of selecting the video with the most traffic (y-axis)

Online LDA alone accurately chooses the most popular topic around 57% of the time using 1k articles. With 100k articles it is around 91% accurate. The blue line is the accuracy using both Online LDA and frequent pattern mining: with 1k articles the accuracy is around 92%, and with 100k articles it is close to 100%.

When using only Online LDA, there is only around a 60% chance that the selected video will be relevant to the actual topic with 10k articles. With 100k articles the probability rises to about 87%. When using frequent pattern mining together with Online LDA, there is around a 94% chance that the selected video is relevant with 10k articles. With 100k articles the probability is close to 100%.

From these results we conclude that, by combining Online LDA with frequent pattern mining, we can predict popular topics from mainstream media and identify relevant videos from video portals with high accuracy.

Thank you! Q&A