Smalltalk and Big Data - Avi Bryant
Smalltalk and Big Data
and the web
and stuff
Avi Bryant, Twitter
2004
2004-2011
View
Controller
Model
Web Client
Web Server
Storage
Web Client
GET/POST -> HTML (repeated for every interaction)
Web Client
GET/POST -> HTML+JS (once)
XHR -> JSON (repeated for every interaction)
Ten days to implement the lexer, parser, bytecode emitter, interpreter, built-in classes, and decompiler.
Ten days without much sleep to build JS from scratch, "make it look like Java" (I made it look like C), and smuggle in its saving graces: first class functions (closures came later but were part of the plan), Self-ish prototypes (one per instance, not many as in Self).
I'll do better in the next life.
— Brendan Eich
Lars Bak
150M+ active users
Web Client
Web Server
Storage
Web Server
• Continuation-based flow control
• HTML generation
• Stateful UI components
• Callbacks with unique IDs
CSRF
<img src="http://mail.google.com/mail/?logout" />
http://mail.google.com/mail/?logout&token=ab4367de
http://mail.google.com/seaside/mail?_k=ab4367de
Burn the disk packs
• No components, continuations, or canvas
• JSON builder w/ callbacks
Web Client
Web Server
Storage
Storage
Stone, with four Shared Page Caches, each serving two Gems
=~
MySQL, with four Memcache instances, each serving two Ruby processes
Gem+SPC+Stone = Transparent Management
Ruby+Memcache+MySQL = Explicit Management
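What "explicit management" tends to mean in practice can be sketched as a cache-aside pattern. This is an illustration under assumptions: plain dictionaries stand in for Memcache and MySQL, and the `ExplicitStore` class is hypothetical. The point is that the application code itself decides when to consult the cache, when to fall back to the database, and when to invalidate.

```python
class ExplicitStore:
    """Cache-aside sketch: dicts stand in for Memcache and MySQL (assumption)."""

    def __init__(self):
        self.database = {}  # stand-in for MySQL
        self.cache = {}     # stand-in for Memcache
        self.db_reads = 0   # counts how often the cache was bypassed

    def read(self, key):
        # 1. Try the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, read the database and populate the cache by hand.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # 3. Write to the database and invalidate the stale cache entry.
        self.database[key] = value
        self.cache.pop(key, None)


store = ExplicitStore()
store.write("user:1", "avi")
first = store.read("user:1")   # miss: goes to the database
second = store.read("user:1")  # hit: served from the cache
store.write("user:1", "avi b")
third = store.read("user:1")   # miss again after invalidation
```

In the transparent model on the slide, none of these three steps would appear in application code.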
MySQL, with four Memcache instances, each serving two Ruby processes
MySQL MySQL MySQL
Sharding?
OOCL: 3B objects, 500GB data
= 3 weeks of tweets
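One common way to shard is to route each key to a database by hashing it. The sketch below assumes three shards with made-up names (`mysql-0` to `mysql-2`) and a hypothetical `shard_for` helper; it is not taken from the talk.

```python
import hashlib

# Hypothetical shard names, one per MySQL box on the slide.
SHARDS = ["mysql-0", "mysql-1", "mysql-2"]


def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


placement = {f"user:{i}": shard_for(f"user:{i}") for i in range(100)}
```

The same key always lands on the same shard, but changing the number of shards remaps most keys, which is part of why the slide ends in a question mark.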
Stone Slave
Four Shared Page Caches, each serving two Gems
Stone Stone Stone
Web Client
Web Server
Online Storage Offline Storage
Offline Storage
15TB each day: Mon, Tue, Wed, Thu, Fri, Sat, Sun (three weeks shown)
Hadoop
tweets.tsv
tweets.tsv/part0
tweets.tsv/part1
tweets.tsv/part2
Each part is stored on two nodes: part0+part1, part1+part2, part2+part0
MAP REDUCE
grep smalltalk tweets.tsv > st.tsv
grep smalltalk tweets.tsv/part0 > st.tsv/part0
grep smalltalk tweets.tsv/part1 > st.tsv/part1
grep smalltalk tweets.tsv/part2 > st.tsv/part2
MAP
tweets.tsv -> st.tsv
tweets.tsv/part0 -> st.tsv/part0
tweets.tsv/part1 -> st.tsv/part1
tweets.tsv/part2 -> st.tsv/part2
wc -l st.tsv > count.tsv
REDUCE
wc -l st.tsv/part0
wc -l st.tsv/part1
wc -l st.tsv/part2
all three counts -> sum > count.tsv/part0
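The grep and wc -l slides can be mimicked locally. In this sketch the map step filters each part independently and the reduce step sums the per-part line counts. The sample lines and the helper names `map_grep` and `reduce_count` are invented for illustration.

```python
# Invented sample data: three parts of a hypothetical tweets.tsv.
parts = {
    "part0": ["i love smalltalk", "lunch time"],
    "part1": ["smalltalk solutions talk", "hello world", "squeak is smalltalk"],
    "part2": ["nothing to see here"],
}


def map_grep(lines, needle):
    """MAP: like `grep needle part`, run on one part with no shared state."""
    return [line for line in lines if needle in line]


def reduce_count(filtered_parts):
    """REDUCE: like `wc -l` on each part followed by a single sum."""
    return sum(len(lines) for lines in filtered_parts)


# Each part is mapped on its own, so the work can run on separate machines.
st = {name: map_grep(lines, "smalltalk") for name, lines in parts.items()}
count = reduce_count(st.values())
print(count)
```

Because the map step never looks outside its own part, only the small per-part counts have to travel to the reducer.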
count-words st.tsv/* | sort | sum > count.tsv
count-words: squeak 3, smalltalk 5, visualworks 10 | squeak 6, smalltalk 4, visualworks 7 | squeak 1, visualworks 3
sort: squeak 1, squeak 3, squeak 6 | smalltalk 4, smalltalk 5 | visualworks 3, visualworks 7, visualworks 10
sum: squeak 10, smalltalk 9, visualworks 20
count-words st.tsv | sort | sum
REDUCE
count-words st.tsv/part0
count-words st.tsv/part1
count-words st.tsv/part2
(word, count)
sum > count.tsv/part0
sum > count.tsv/part1
sum > count.tsv/part2
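With several reducers, every (word, count) pair for a given word has to reach the same reducer. A rough local sketch of that shuffle is below, using the counts from the slide. Which counts came from which part is an assumption, and `partition`, `shuffle`, and `reduce_sum` are hypothetical helper names. A stable hash (`zlib.crc32`) is used because Python's built-in `hash` for strings varies between runs.

```python
import zlib
from collections import defaultdict
from itertools import groupby

# Per-part count-words output, using the numbers from the slide.
map_outputs = [
    [("squeak", 3), ("smalltalk", 5), ("visualworks", 10)],
    [("squeak", 6), ("smalltalk", 4), ("visualworks", 7)],
    [("squeak", 1), ("visualworks", 3)],
]
NUM_REDUCERS = 3


def partition(word):
    """Route a word to a reducer; the same word always gets the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % NUM_REDUCERS


def shuffle(outputs):
    """Group every (word, count) pair by the reducer that owns the word."""
    buckets = defaultdict(list)
    for pairs in outputs:
        for word, count in pairs:
            buckets[partition(word)].append((word, count))
    return buckets


def reduce_sum(pairs):
    """REDUCE: sort by word, then sum the counts of each run of equal words."""
    ordered = sorted(pairs)
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(ordered, key=lambda pair: pair[0])
    }


# One output dict per reducer, like count.tsv/part0..part2.
count_parts = {r: reduce_sum(pairs) for r, pairs in shuffle(map_outputs).items()}

# Merging is safe because no word appears in more than one reducer's output.
totals = {}
for part in count_parts.values():
    totals.update(part)
```

The merged totals match the last line of the slide: squeak 10, smalltalk 9, visualworks 20.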
MAP REDUCE
Several MAP and REDUCE stages chained together, with some stages feeding more than one REDUCE
Join
Group & Count
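public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  // Fields the slide omits; map() below needs them.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Emit (token, 1) for every whitespace-separated token in the line.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}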
:(
:/
users := 'users.csv' loadFromHadoop.
users_1825 := users select: [:ea | ea age between: 18 and: 25].
joined := users_1825 joinedWith: pages by: ...
?
!HadoopCollection categoriesFor: 'map/reduce'!
map: mapBlock thenReduce: reduceBlock
    ...
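The collection-style API hinted at on these slides can be approximated locally. The sketch below is a guess at the shape of the idea, not the author's design: `PartitionedCollection`, `select`, and `map_then_reduce` are hypothetical names, the data is invented, and everything runs in one process rather than on Hadoop.

```python
from functools import reduce


class PartitionedCollection:
    """Hypothetical local stand-in for the HadoopCollection idea on the slide."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def select(self, predicate):
        # Map-only job: filter each partition independently.
        return PartitionedCollection(
            [[item for item in p if predicate(item)] for p in self.partitions]
        )

    def map_then_reduce(self, map_fn, reduce_fn, initial):
        # Map and fold each partition on its own, then fold the partial results.
        # reduce_fn must be associative, with `initial` as its identity value.
        partials = [reduce(reduce_fn, map(map_fn, p), initial) for p in self.partitions]
        return reduce(reduce_fn, partials, initial)


# Invented sample data, split across two partitions.
users = PartitionedCollection([
    [{"name": "a", "age": 17}, {"name": "b", "age": 19}],
    [{"name": "c", "age": 25}, {"name": "d", "age": 40}],
])

# Mirrors `users select: [:ea | ea age between: 18 and: 25]` (inclusive bounds).
users_1825 = users.select(lambda u: 18 <= u["age"] <= 25)
count = users_1825.map_then_reduce(lambda u: 1, lambda x, y: x + y, 0)
print(count)
```

The caller passes small functions and never sees partitions, which is the role blocks play in the Smalltalk version.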
Doesn’t Need
Raw performance
Extensive libraries
Concurrency/Async IO
Wide industry acceptance
Fast startup time
Should Have
Lightweight functions/blocks
Dynamic OO
Process migration
Good debugging
(JVM integration)
?