Flume Office Hours: Community planning
Jonathan Hsieh, Cloudera HQ, 2/28/2011
Flume Office Hours, 2/28/2011 3
Outline
• State of the world
• What’s new?
• Stories (Chime in!)
• What needs work?
• Prioritizing what is next
• Q+A
STATE OF THE WORLD
Growing user and developer community
• GitHub stats:
– Currently 295 watchers, 51 forks
• New committers:
– 9/10: Eric Sammer (Cloudera)
– 1/11: Bruce Mitchener (Independent)
• User characteristics
– Most potential users seem to use ad hoc scripts
– Most users are early adopters / startup devops
[Chart: GitHub watchers (axis 0 to 350) and forks (axis 0 to 60), May-10 through Feb-11]
A short feature history
• 6/10: v0.9.0
– Initial open source release
• 8/10: v0.9.1
– Fixes for hangs
– Initial compression features
• 10/10: v0.9.1+29 (CDH3b3, packages)
– Added kerberized HDFS support
– Flume cookbook
– Elastic Search / Cassandra plugins
– Initial Voldemort plugins
• 11/10: v0.9.2
– Support for other compression codecs
– Avro RPC
– Improvements to tail and exec
– Robustness improvements
– Initial HBase / MongoDB plugin
• 2/11: v0.9.3 (CDH3b4, packages)
– Flume node Windows support
– Initial JSON metrics support
– Multi-master functional
– Robustness improvements
– JRuby / AMQP plugins
– S3/EC2 blog stories
• 4/11: v0.9.3+xxx (CDH3 Stable, packages)
– Excessive duplication fixes
– Compression fixes
• ?/11: v0.9.4
WHAT’S NEW?
New features
• Flume node JSON metrics
– http://node:35862/node/reports
• Terser syntax
– Before: { deco1 => { deco2 => sink } }
– After: deco1 deco2 sink
• Multiple collector sink support
    collector(30000) { [
      escapedCustomDfs("hdfs://nn1/path","prefix","format"),
      escapedCustomDfs("hdfs://nn2/path","prefix","format"),
    ] }
• Limited multi-master support
• Windows support
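The terser syntax on this slide can be illustrated with a small string rewrite. The sketch below is not Flume's parser; `flatten_decorators` is a hypothetical helper that only shows how the nested form from the slide maps onto the flat form, assuming that decorator and sink names contain no braces and no `=>`.

```python
def flatten_decorators(spec: str) -> str:
    """Rewrite '{ deco1 => { deco2 => sink } }' as 'deco1 deco2 sink'.

    Toy illustration only (hypothetical helper, not part of Flume).
    Assumes names and arguments contain no '{', '}' or '=>'.
    """
    if spec.count("{") != spec.count("}"):
        raise ValueError("unbalanced braces")
    # Braces only group the chain, so dropping them keeps the left-to-right order.
    stripped = spec.replace("{", " ").replace("}", " ")
    # Each '=>' separates a decorator from whatever it wraps.
    parts = [part.strip() for part in stripped.split("=>")]
    if any(not part for part in parts):
        raise ValueError("empty decorator or sink name")
    return " ".join(parts)


if __name__ == "__main__":
    print(flatten_decorators("{ deco1 => { deco2 => sink } }"))
```

A spec that is already flat, such as a bare `sink`, passes through unchanged, so the helper can be applied to either form.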
STORIES
Flume: The Standard Use Case
[Diagram: servers, each with an Agent (agent tier); Collectors (collector tier); Master; HDFS]
Flume: Multi Datacenter
[Diagram: API servers (api) and processor servers (proc), each with an Agent; Collector tier with six Collectors; HDFS]
Flume: Multi Datacenter
[Diagram: API servers (api) and processor servers (proc), each with an Agent; Collector tier with six Collectors; Relay; HDFS]
Flume: Near Realtime Aggregator
[Diagram: Ad servers, each with an Agent; Tracker; Collector; HDFS; DB; Hive job. Edge labels: reports, verify, quick reports]
Flume: An enterprise story
[Diagram: API servers (api) with Windows and Linux Agents; Collector tier with three Collectors; Kerberos HDFS (nodes labeled D); Active Directory / LDAP]
Flume: An emerging community story
[Diagram: servers with Agents; Collector; Fanout to HDFS, HBase, and an Incremental Search Idx. Query labels: Hive query, Pig query, Key lookup, Range query, Search query, Faceted query. Store labels: index, hbase, hdfs]
WHAT NEEDS WORK? WHAT COMES NEXT?
Known issues
• Excessive event duplication (due to tail or e2e agent)
• Configuration translation problem in some cases
• Multi-master limited: doesn’t work with translations
What’s next? (proposals)
• Fix excessive duplication issues
• Apache Incubator (?)
• Log4j/Log4net/logback/etc…
• Fix multi-master limitations
• Security upgrades for node to node comms (TLS/SSL)
• Improved metrics / GUI / usability
• Integration with open source alerting/monitoring tools
• Integration with proprietary systems
• Version proofing RPCs / State storage
• Packaging friendly plug-in install
• Multi Datacenter story
• Performance increases
• Inline near-realtime analytics
• Puppet/Chef style config for nodes
• Lightweight Agent
• Masterless Agent
• Better S3 / AWS support
Q+A