Mining Scipy Lectures
![Page 1: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/1.jpg)
Mining Lectures
Marcel Caraciolo - @marcelcaraciolo
1
![Page 2: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/2.jpg)
Who am I? Marcel Pinheiro Caraciolo
Brazilian, lover of crabs
M.Sc. candidate in Data Mining and Recommender Systems
Current moderator of the local Python User Group in Pernambuco
Interested in machine learning, recommender systems and mobile computing
Blogging about machine learning with Python since 2008 http://aimotion.blogspot.com
Young apprentice in Python programming since 2008.
Director of R&D at the Brazilian startup Orygens
2
![Page 3: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/3.jpg)
How did I start this analysis?
24 hours ago...
3
![Page 4: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/4.jpg)
Question
How were the topics distributed across the Scipy Conference General Sessions?
4
![Page 5: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/5.jpg)
Scraping the Scipy Conference
A small web crawler for extracting the accepted lectures
urllib2, re, BeautifulSoup...
5
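The crawler itself is not shown on the slide, so what follows is only a minimal sketch of the idea. It assumes a made-up page layout (an `<h3 class="talk-title">` per lecture) and swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra installs. `LectureParser`, `extract_titles` and `SAMPLE_HTML` are hypothetical names, not the author's code. On a live page the HTML string would come from `urllib.request.urlopen`, the Python 3 successor to `urllib2`.

```python
from html.parser import HTMLParser


class LectureParser(HTMLParser):
    """Collect the text inside every <h3 class="talk-title"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h3" and ("class", "talk-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped lecture titles found in an HTML string."""
    parser = LectureParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Made-up sample standing in for the conference schedule page.
SAMPLE_HTML = """
<div class="talk"><h3 class="talk-title">Example talk one</h3><p>20 minutes</p></div>
<div class="talk"><h3 class="talk-title">Example talk two</h3><p>20 minutes</p></div>
"""

titles = extract_titles(SAMPLE_HTML)
print(titles)
```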
![Page 6: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/6.jpg)
Summary
41 lectures
820 minutes in total
6
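As a quick check on these two figures, the arithmetic can be sketched as below. Only the numbers 41 and 820 come from the slide; the variable names are illustrative.

```python
lectures = 41
total_minutes = 820

# Average length of one lecture and the total running time in hours.
minutes_per_lecture = total_minutes / lectures
total_hours = total_minutes / 60

print(f"{minutes_per_lecture:.0f} minutes per lecture on average")
print(f"{total_hours:.1f} hours of talks in total")
```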
![Page 7: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/7.jpg)
That means...
≈ 4100 tweets posted.
7
![Page 8: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/8.jpg)
Or watching...
the Star Wars trilogy
2x
8
![Page 9: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/9.jpg)
Or finishing the Super Mario game...
82x!
9
![Page 10: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/10.jpg)
Or opening Eclipse
2x!
Now in our language...
Opening Eclipse 2 times!
10
![Page 11: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/11.jpg)
Most popular Authors
Dharhas Pothina - 3
Wes McKinney - 2
All the others - 1
11
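A ranking like this can be produced with a simple tally. The sketch below is an assumption about the approach, not the author's script: the list is a tiny illustrative stand-in that only reproduces the two counts shown on the slide, and the "Placeholder" names are invented.

```python
from collections import Counter

# Illustrative stand-in: one entry per lecture, naming its author.
lecture_authors = [
    "Dharhas Pothina", "Wes McKinney", "Dharhas Pothina",
    "Placeholder Author A", "Wes McKinney", "Dharhas Pothina",
    "Placeholder Author B",
]

author_counts = Counter(lecture_authors)
ranking = author_counts.most_common()  # sorted by count, highest first

for author, count in ranking:
    print(f"{author} - {count}")
```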
![Page 12: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/12.jpg)
Playing with the text...
The most frequent words at the conference
nltk, re
12
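The slide names nltk and re as the tools. As a rough sketch of the same idea using only the standard library, the snippet below tokenizes with `re` and counts with `collections.Counter`. The small stopword set and the sample titles are made up for illustration (nltk ships a fuller English stopword list), and `word_frequencies` is a hypothetical helper name.

```python
from collections import Counter
import re

# Assumption: a tiny hand-written stopword set instead of nltk's corpus.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "to", "with", "is", "on"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase alphabetic tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Made-up titles standing in for the scraped lecture titles and abstracts.
sample_titles = [
    "Data analysis with Python",
    "Parallel computing in Python",
    "Visualization of data",
]

frequencies = word_frequencies(sample_titles)
print(frequencies.most_common(3))
```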
![Page 13: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/13.jpg)
But let’s take a deeper look.
I used the K-Means clustering algorithm.
Visualization tool: Ubigraph
13
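The slides do not show how K-Means was applied, so the following is a minimal pure-Python sketch under stated assumptions: each lecture text becomes a bag-of-words count vector, distance is squared Euclidean, and the initial centroids are simply the first k documents. The sample documents and all function names are hypothetical, and the Ubigraph visualization step is not covered.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=20):
    """Plain K-Means (assign, then update); returns one cluster label per point."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up abstracts standing in for the scraped lecture texts.
docs = [
    "parallel gpu performance",
    "matplotlib plotting visualization",
    "gpu parallel computing performance",
    "visualization with matplotlib plotting",
]
labels = kmeans(vectorize(docs), k=2)
print(labels)
```

With real abstracts, a random or k-means++ initialisation and a weighting such as TF-IDF would likely give more meaningful groups than raw counts.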
![Page 14: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/14.jpg)
Distribution of the Lectures
Basic frameworks: matplotlib, ipython
Parallelism: performance, gpu, statistical
Building frameworks: performance, models, web services
Visualization
Numpy
toolkits using Numpy
data analysis, statistical
14
![Page 15: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/15.jpg)
To sum up...
Mining English text is so much easier!!!
Submit your work too!
Spread scientific Python throughout the community
I expect to be back at Scipy next year!
15
![Page 16: Mining Scipy Lectures](https://reader033.fdocuments.net/reader033/viewer/2022052619/5552c1a9b4c90581158b47fc/html5/thumbnails/16.jpg)
Mining Lectures
Marcel Caraciolo - @marcelcaraciolo
https://github.com/marcelcaraciolo/clustering_scipy
16