Analyzing Hadoop
with Hadoop
Monday, 4 June 2012
© [email protected], confidential - Do not distribute
Data Grows Faster Than Moore's Law!
Unstructured: 61.7% growth
Structured: 21.8% growth
http://www.emc.com/about/news/press/2011/20110628-01.htm
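Comparing these annual rates with Moore's law takes a short conversion. The following assumes Moore's law means a doubling every two years, which is a common reading but an assumption here. An 18-month doubling instead gives 2^{2/3} - 1, or about 58.7% per year.

```latex
r_{\text{Moore}} = 2^{1/2} - 1 \approx 0.414 \quad (\approx 41.4\%\ \text{per year})
\\
r_{\text{unstructured}} = 0.617 > r_{\text{Moore}}, \qquad r_{\text{structured}} = 0.218 < r_{\text{Moore}}
\\
\text{over two years: } 1.617^{2} \approx 2.61, \quad 1.218^{2} \approx 1.48, \quad 2^{1} = 2
```

Under either doubling period, the unstructured growth rate exceeds Moore's law and the structured rate does not.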
Data Warehouse
Static
ETL
Slow
Business Intelligence
Barrier
Hadoop
Dynamic
Raw Load
Fast
Analytics
Agile
30+ Years Workflow
SQL
Hadoop + Hive
NO-SQL Hadoop 10+MLOC
http://dearcomputer.nl/gir/?q=nerd+&s=4&b=Rip+Google!
http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/
Evolution backward
http://chelseavose.wordpress.com/2012/01/26/is-evolution-real/
1970s: SEQUEL (Structured English Query Language)
ANSI SQL, ORM, JDO, NO-SQL, Hive
Unstructured + Structured
git log --numstat --pretty=format:%H,%ai,%cn,%ce%+B
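This command emits, for each commit, a header line (%H hash, %ai author date, %cn and %ce committer name and email), the raw message body (%+B), and --numstat lines of the form added<TAB>deleted<TAB>path. The slides do not show how this output was loaded, so what follows is only a minimal Python sketch under stated assumptions: it rolls the output up into the per-year commit counts, per-year line changes, and average message length charted under Results. The names Commit, parse_git_log, read_git_log, commits_per_year, loc_changes_per_year and avg_message_chars are hypothetical. The sketch also assumes committer emails contain no comma and that no message line happens to look like a numstat entry.

```python
import re
import subprocess
from collections import Counter
from dataclasses import dataclass, field

# Assumed header layout from --pretty=format:%H,%ai,%cn,%ce :
# 40-hex SHA-1, author date "YYYY-MM-DD HH:MM:SS +ZZZZ", committer name
# (greedy, so commas inside names survive), committer email (no comma).
HEADER = re.compile(
    r"^([0-9a-f]{40}),(\d{4})-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4},(.*),([^,]*)$"
)
# --numstat line: added<TAB>deleted<TAB>path; binary files report "-".
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


@dataclass
class Commit:
    sha: str
    year: int  # year of the author date (%ai)
    committer: str
    email: str
    message: list = field(default_factory=list)  # non-blank body lines
    added: int = 0
    deleted: int = 0


def read_git_log(repo_path):
    """Run the slide's git command inside repo_path and return its stdout."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat",
         "--pretty=format:%H,%ai,%cn,%ce%+B"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout


def parse_git_log(text):
    """Classify each line as header, numstat entry, or message line."""
    commits = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            sha, year, name, email = header.groups()
            commits.append(Commit(sha, int(year), name, email))
            continue
        if not commits:
            continue  # ignore anything before the first header
        current = commits[-1]
        stat = NUMSTAT.match(line)
        if stat:
            added, deleted, _path = stat.groups()
            # Binary files have no line counts; count them as zero.
            current.added += int(added) if added != "-" else 0
            current.deleted += int(deleted) if deleted != "-" else 0
        elif line.strip():
            current.message.append(line)
    return commits


def commits_per_year(commits):
    return Counter(c.year for c in commits)


def loc_changes_per_year(commits):
    """Lines added plus lines deleted, summed per author-date year."""
    totals = Counter()
    for c in commits:
        totals[c.year] += c.added + c.deleted
    return totals


def avg_message_chars(commits):
    """Mean length of each commit's non-blank message lines joined by newlines."""
    if not commits:
        return 0.0
    return sum(len("\n".join(c.message)) for c in commits) / len(commits)
```

For example, commits_per_year(parse_git_log(read_git_log("hadoop-common"))) would return a Counter keyed by year, where "hadoop-common" stands in for the path of a local clone. Lines are classified by pattern rather than by blank-line position, because where blank lines fall between the message and the numstat block depends on the message body.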
Data Quality?
Results...
Commits per Year
200
LOC Changes per Year
7,000,000
Most Lines Added
1,500,000
2006 Emails vs Commits
72
commits, emails
2011 Emails vs Commits
559
commits, emails
Emails per Month
800
Most Discussed, Least Changed
Most Active Emailers
900
We’re hiring!
Emails with Most Replies
Avg Characters per Commit Message
120
Longest Comment
35,000
Email Activity per Timezone
Follow us: @datameer
Top Related