The Artful Business of Data Mining: Computational Statistics with Open Source Tools
![Page 1: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/1.jpg)
The Artful Business of Data Mining
Computational Statistics with Open Source Tools
Wednesday 20 March 13
![Page 2: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/2.jpg)
David Coallier
@davidcoallier
![Page 3: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/3.jpg)
Data Scientist
at Engine Yard (.com)
![Page 4: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/4.jpg)
Find Data
![Page 5: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/5.jpg)
Clean Data
![Page 6: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/6.jpg)
Analyse Data?
![Page 7: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/7.jpg)
Analyse Data
![Page 8: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/8.jpg)
Question Data
![Page 9: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/9.jpg)
Report Findings
![Page 10: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/10.jpg)
Data Scientist
![Page 11: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/11.jpg)
Data Janitor
![Page 12: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/12.jpg)
Actual Tasks
![Page 13: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/13.jpg)
“If your model is elegant, it’s probably wrong”
![Page 14: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/14.jpg)
“The Times They Are a-Changin'”
— Bob Dylan
![Page 15: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/15.jpg)
Python & R
![Page 17: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/17.jpg)
scipy.stats
![Page 18: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/18.jpg)
scipy.stats
Descriptive Statistics
![Page 19: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/19.jpg)
from scipy.stats import describe
s = [1, 2, 1, 3, 4, 5]
print(describe(s))
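As a hedged illustration of what `describe` reports, the sketch below unpacks the result for the same sample. Attribute access (`nobs`, `minmax`, `mean`, `variance`, `skewness`, `kurtosis`) assumes a reasonably recent SciPy; very old releases return a plain tuple in the same order.

```python
from scipy.stats import describe

s = [1, 2, 1, 3, 4, 5]
result = describe(s)

# describe() reports the sample variance (ddof=1) by default.
print("n        =", result.nobs)
print("min, max =", result.minmax)
print("mean     =", result.mean)
print("variance =", result.variance)
print("skewness =", result.skewness)
print("kurtosis =", result.kurtosis)
```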
![Page 20: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/20.jpg)
scipy.stats
Probability Distributions
![Page 21: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/21.jpg)
Example
Poisson Distribution
![Page 22: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/22.jpg)
$$f(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
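To make the formula concrete, here is a minimal sketch that evaluates the standard Poisson probability mass function, λ^k e^(−λ) / k!, using only the Python standard library. The function name `poisson_pmf` is an illustrative choice, not part of any library.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Probabilities for k = 0..4 with lambda = 2
for k in range(5):
    print(k, round(poisson_pmf(k, 2.0), 4))

# Summing over a long enough range of k approaches 1
total = sum(poisson_pmf(k, 2.0) for k in range(50))
print(total)
```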
![Page 23: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/23.jpg)
from scipy.stats import poisson
p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
![Page 24: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/24.jpg)
print(p.mean())
print(p.sum())
...
![Page 26: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/26.jpg)
NumPy
Linear Algebra
![Page 27: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/27.jpg)
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
![Page 28: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/28.jpg)
import numpy as np
x = np.array([[1, 0], [0, 1]])
# eig returns the eigenvalues first, then the eigenvectors (as columns)
val, vec = np.linalg.eig(x)
np.linalg.eigvals(x)
![Page 29: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/29.jpg)
>>> np.linalg.eig(x)
(array([ 1.,  1.]),
 array([[ 1.,  0.],
        [ 0.,  1.]]))
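The identity matrix is a degenerate case, since every vector is an eigenvector of it. As a hedged sketch, the example below uses a different symmetric matrix (chosen here for illustration, not taken from the slides) and checks the defining relation A v = λ v. `np.linalg.eig` returns the eigenvalues first and the eigenvectors second, with the eigenvectors stored as columns.

```python
import numpy as np

# Illustrative matrix, not from the slides; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)

# Each column vecs[:, i] pairs with vals[i]; verify A v = lambda v.
for i in range(len(vals)):
    v = vecs[:, i]
    print(vals[i], np.allclose(A @ v, vals[i] * v))
```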
![Page 30: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/30.jpg)
Matplotlib
Python Plotting
![Page 31: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/31.jpg)
statsmodels
Advanced Statistical Modeling
![Page 32: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/32.jpg)
NLTK
Natural Language Toolkit
![Page 33: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/33.jpg)
scikit-learn
Machine Learning
![Page 34: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/34.jpg)
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predict([[2., 2.]])  # array([1])
![Page 35: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/35.jpg)
PyBrain
... Machine Learning
![Page 36: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/36.jpg)
PyMC
Bayesian Inference
![Page 37: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/37.jpg)
Pattern
Web Mining for Python
![Page 38: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/38.jpg)
NetworkX
Study Networks
![Page 39: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/39.jpg)
MILK
MOAR machine LEARNING!
![Page 40: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/40.jpg)
Pandas
easy-to-use data structures
![Page 41: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/41.jpg)
from pandas import DataFrame
x = DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}])
# Keep only rows where age > 20, then count them and average them
print(x[x['age'] > 20].count())
print(x[x['age'] > 20].mean())
![Page 42: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/42.jpg)
R
![Page 43: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/43.jpg)
RStudio
The IDE
![Page 44: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/44.jpg)
lubridate and zoo
Dealing with Dates...
![Page 45: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/45.jpg)
yy/mm/dd
mm/dd/yy
YYYY-mm-dd HH:MM:ss TZ
yy-mm-dd
1363784094.513425
yy/mm
different timezone
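The slides point to lubridate and zoo in R for this problem. As a rough Python sketch of the same idea, the helper below treats purely numeric input as a Unix timestamp and otherwise tries a fixed list of formats in turn. The function name `parse_messy_date`, the format list, and the rule that `yy/mm/dd` wins over `mm/dd/yy` when both could match are all assumptions made for illustration; time zone abbreviations are left out, and timestamps come back as UTC-aware values while the other formats come back naive.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. The ordering is an assumption:
# an ambiguous value such as 01/02/03 is read as yy/mm/dd.
FORMATS = ["%y/%m/%d", "%m/%d/%y", "%Y-%m-%d %H:%M:%S", "%y-%m-%d"]

def parse_messy_date(text):
    """Return a datetime for the first interpretation that fits, else None."""
    text = text.strip()
    try:
        # Purely numeric input is treated as seconds since the Unix epoch (UTC).
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        pass
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None

samples = ["13/03/20", "03/20/13", "2013-03-20 12:54:54",
           "13-03-20", "1363784094.513425"]
for raw in samples:
    print(raw, "->", parse_messy_date(raw))
```

Every sample above resolves to 20 March 2013, which is the point of the slide: one day, many spellings.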
![Page 46: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/46.jpg)
reshape2
Reshape your Data
![Page 47: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/47.jpg)
ggplot2
Visualise your Data
![Page 48: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/48.jpg)
RCurl, RJSONIO
Find more Data
![Page 49: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/49.jpg)
Hmisc
Miscellaneous useful functions
![Page 50: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/50.jpg)
forecast
Can you guess?
![Page 51: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/51.jpg)
garch
and rugarch
![Page 52: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/52.jpg)
quantmod
Statistical Financial Trading
![Page 53: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/53.jpg)
xts
Extensible Time Series
![Page 54: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/54.jpg)
igraph
Study Networks
![Page 55: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/55.jpg)
maptools
Read & View Maps
![Page 56: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/56.jpg)
library(maps)  # provides map()
map('state',
    region = c(row.names(USArrests)),
    col = cm.colors(16, 1)[floor(USArrests$Rape / max(USArrests$Rape) * 28)],
    fill = TRUE)
![Page 57: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/57.jpg)
Storage
![Page 58: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/58.jpg)
Oppose
“big” Data
![Page 59: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/59.jpg)
“Learn how to sample”
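One hedged reading of this advice: a modest random sample often answers the question without touching every record. The sketch below uses reservoir sampling (Algorithm R), a standard technique that is not named in the slides, to draw a fixed-size uniform sample from a stream of unknown length. The function name and the synthetic data are illustrative.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items drawn uniformly at random from an iterable of any length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = reservoir_sample(range(100000), 1000, rng)

# The sample mean estimates the true mean of the stream (49999.5).
print(len(sample), sum(sample) / len(sample))
```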
![Page 60: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/60.jpg)
Experiments
![Page 61: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/61.jpg)
What Do You Want to Answer?
![Page 62: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/62.jpg)
Understand Your Audience
![Page 63: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/63.jpg)
Scientific Reporting
![Page 64: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/64.jpg)
Busy-ness
Time is money
![Page 65: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/65.jpg)
Public Visualisation
![Page 66: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/66.jpg)
Best Visualisation,
Bad Data
![Page 67: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/67.jpg)
Best Forecasting models...
Bad Visualisation
![Page 68: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/68.jpg)
![Page 69: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/69.jpg)
![Page 70: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/70.jpg)
Seanchaí
![Page 71: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/71.jpg)
![Page 72: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/72.jpg)
Feel it
![Page 73: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/73.jpg)
![Page 74: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/74.jpg)
![Page 75: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/75.jpg)
![Page 76: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/76.jpg)
“Don’t be scared of bar charts.”
![Page 77: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/77.jpg)
Mathematical Statistics
Engineering Business
Economics
Curiosity
![Page 78: The Artful Business of Data Mining: Computational Statistics with Open Source Tools](https://reader035.fdocuments.net/reader035/viewer/2022070523/58ec9aec1a28ab5c788b456d/html5/thumbnails/78.jpg)
davidcoallier.github.com
@davidcoallier on Twitter