Developing Unit Testable Software with Hadoop at Expedia
Uploaded by huguk
13
Benefits of unit testing
• Build confidence
• Enable change
• Describe behaviour
• Accelerate development
15
Hadoop testing challenges
• Framework modularization issues
• Heavyweight execution engine
• Availability of testing utilities
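One common way to sidestep the first two challenges is to keep the business logic in a plain class with no Hadoop imports, so it can be unit tested without starting the execution engine. A minimal Java sketch under that assumption follows; the class name `WordTokenizer` and its `tokenize` method are hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical example: logic that a Mapper (or any other framework
 * operation) would delegate to. It depends only on the JDK.
 */
public class WordTokenizer {

    /** Splits a line on whitespace, lower-cases each token, and drops empty tokens. */
    public static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        if (line == null) {
            return words;
        }
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [hadoop, unit, testing]
        System.out.println(tokenize("Hadoop  unit Testing"));
    }
}
```

A framework-specific wrapper would then only call `tokenize` and emit the results, leaving very little code that needs the heavyweight engine to verify.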
26
Local execution
Framework   Local engine
Hive        Local mode
Cascading   LocalFlowConnector
Pig         Local mode
Crunch      MemPipeline
28
Helper libraries
Framework   Library
Hive        HiveRunner
Cascading   cascading-test, Plunger
Pig         PigUnit
Crunch      MemPipeline
31
Hive + HiveRunner: pros
• Write/test Hive apps in the same environment
• Seamless UDF development
32
Hive + HiveRunner: cons
• Slow execution
• CSV data – hard to maintain
• Assertions on CSV strings are brittle
• Hadoop compatibility issues
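The brittleness of CSV string assertions can be reduced by parsing each result row into typed fields before comparing. The Java sketch below is not HiveRunner's API; the `Row` class, the tab delimiter, and the two-column layout are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class TypedRowAssertions {

    /** Hypothetical typed view of one result row: a name and a count. */
    static final class Row {
        final String name;
        final long count;

        Row(String name, long count) {
            this.name = name;
            this.count = count;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Row)) {
                return false;
            }
            Row that = (Row) other;
            return name.equals(that.name) && count == that.count;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, count);
        }

        @Override
        public String toString() {
            return "Row(" + name + ", " + count + ")";
        }
    }

    /** Parses tab-delimited lines into rows, trimming stray whitespace in each field. */
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            rows.add(new Row(fields[0].trim(), Long.parseLong(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Raw output with inconsistent padding that would break a string comparison.
        List<String> raw = Arrays.asList("london\t 42", "paris\t7 ");
        List<Row> expected = Arrays.asList(new Row("london", 42L), new Row("paris", 7L));
        if (!parse(raw).equals(expected)) {
            throw new AssertionError("unexpected rows: " + parse(raw));
        }
        System.out.println("rows match");
    }
}
```

Comparing typed rows means that whitespace or formatting differences in the raw output no longer fail the test, while a wrong value still does.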
37
Conclusions
• Testing possible with most frameworks
• Efficacy largely influenced by framework
• Tooling is immature