Fast data mining flow prototyping using IPython Notebook
![Page 1: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/1.jpg)
Fast data mining flow prototyping using IPython Notebook
2013/01/31
Jimmy Lai
r97922028 [at] ntu.edu.tw
![Page 2: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/2.jpg)
Outline
1. Workflow for data mining
2. What IPython Notebook provides
3. Exemplified by text classification
4. Demo code and Notebook usage
IPython Notebook 2
![Page 3: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/3.jpg)
Workflow for data mining
• Traditional programming workflow:
– Edit -> Compile -> Run
• Data Mining workflow:
– Execute -> Explore
– Consists of many data processing stages; each stage may be tried several times with different methods.
– Stages: data parsing, feature extraction, feature selection, model training, model predicting, post processing, etc.
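The staged workflow above can be sketched as a chain of small functions, each of which can be swapped out and re-run on its own. This is a minimal illustration in plain Python, not the code from the talk: the stage names (`parse`, `extract_features`, `train`, `predict`), the overlap-based classifier, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy labelled data; purely illustrative, not taken from the talk.
RAW = [
    ("sci.space", "The rocket launch reached orbit"),
    ("sci.space", "Orbit of the new satellite"),
    ("rec.autos", "The car engine needs oil"),
    ("rec.autos", "New engine for the old car"),
]

def parse(raw):
    """Data parsing: split (label, text) pairs into labels and lowercase tokens."""
    labels = [label for label, _ in raw]
    docs = [text.lower().split() for _, text in raw]
    return labels, docs

def extract_features(tokens):
    """Feature extraction: bag-of-words counts for one document."""
    return Counter(tokens)

def train(labels, feature_list):
    """Model training: sum the feature counts per label."""
    model = defaultdict(Counter)
    for label, feats in zip(labels, feature_list):
        model[label].update(feats)
    return model

def predict(model, feats):
    """Model predicting: pick the label whose counts overlap most with feats."""
    def overlap(label):
        return sum(min(count, model[label][term]) for term, count in feats.items())
    return max(sorted(model), key=overlap)

# Each stage is a separate call, so any one of them can be replaced and re-run.
labels, docs = parse(RAW)
model = train(labels, [extract_features(d) for d in docs])
prediction = predict(model, extract_features("the satellite reached orbit".split()))
print(prediction)
```

In a notebook, each stage would typically live in its own cell, so that trying a different feature extractor only requires re-running the cells from that stage onward.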
![Page 4: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/4.jpg)
What IPython Notebook provides
• Interactive Web IDE
– Display rich data such as matplotlib plots and LaTeX math symbols
– Code cell for sketching
– Execute pieces of code in arbitrary order
– Browser interface for programming remotely
– Easy to present code and execution results as HTML or PDF.
• IPython Notebook makes sketching data analysis easy.
![Page 5: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/5.jpg)
Demo code and Notebook usage
• Demo Code: ipython_demo directory in https://bitbucket.org/noahsark/slideshare
• IPython Notebook:
– Install
$ pip install ipython
– Execution (under ipython_demo dir)
$ ipython notebook --pylab=inline
– Open the notebook in a browser, e.g. http://127.0.0.1:8888
![Page 6: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/6.jpg)
IPython Notebook Interface
![Page 7: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/7.jpg)
Exemplified by text classification
• Text classification on the 20 newsgroups dataset.
• Dataset:
– Built into sklearn.datasets
– Each article belongs to one of the 20 groups
• Goal: classify each article into one of the newsgroup names.
• Experiment: feature generation using different n-gram parameters.
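To make the experiment variable concrete, the sketch below shows how the n-gram setting changes the features generated from one sentence. It is a plain-Python illustration under stated assumptions, not the demo code: the helper `ngram_features` and the sample sentence are hypothetical, and the slides do not say which vectorizer was used. The `ngram_range` argument is modelled on the parameter of the same name in scikit-learn's `CountVectorizer`.

```python
from collections import Counter

def ngram_features(text, ngram_range=(1, 1)):
    """Count word n-grams for every n in the inclusive range (min_n, max_n)."""
    tokens = text.lower().split()
    min_n, max_n = ngram_range
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the document.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample sentence, used only to compare settings.
sentence = "peace talks in the middle east"
for setting in [(1, 1), (1, 2), (2, 2)]:
    feats = ngram_features(sentence, setting)
    print(setting, len(feats), "features")
```

Wider ranges produce more features per document, which is the trade-off such an experiment would explore: richer context at the cost of a larger and sparser feature space.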
![Page 8: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/8.jpg)
Example article
talk.politics.mideast
![Page 9: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/9.jpg)
![Page 10: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/10.jpg)
Sample result of feature extraction
![Page 11: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/11.jpg)
Table of experiment setups
![Page 12: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/12.jpg)
![Page 13: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/13.jpg)
Experiment Result
![Page 14: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/14.jpg)
![Page 15: Fast data mining flow prototyping using IPython Notebook](https://reader033.fdocuments.net/reader033/viewer/2022052410/54c6847d4a79598d528b46fd/html5/thumbnails/15.jpg)
Observation from plots