Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce: new-age tools to aid testing and QA
by Aditya Garg
Confidential | Copyright © QAAgility Technologies
Aditya Garg @Adigindia
Co-Founder and Director, QAAgility.com
Co-founder and Steering Committee Member, Agile Testing Alliance (runs meetup groups across multiple cities)
Co-creator and licensed trainer of Agile Testing Alliance's certifications CP-BAT, CP-MAT, CP-AAT, CP-SAT
Co-author of a book on Selenium
Love cooking Indian dishes (from Rajasthan)
Tasting (Testing) world food
Travelling and meeting testers (get inspired, and maybe inspire a few)
@adigindia https://www.linkedin.com/in/adigarg
What are we going to discuss?
1. How to test Big Data applications?
2. How can QA and testing teams use Big Data tools for their testing needs?
What is Big Data? Is it just too much hype, or reality?
Let us start with what exactly Big Data is.
Which search engine do you use?
How much data does Google store?
https://www.cirrusinsight.com/blog/how-much-data-does-google-store
http://searchstorage.techtarget.com/definition/Kilo-mega-giga-tera-peta-and-all
On search engines: is anyone using DuckDuckGo?
Data Explosion
Key Points in Big Data
1. Volume (data explosion)
2. Velocity
3. Variety
4. Veracity
Ref: IBM.com
Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Ref: goo.gl/iWZhjJ
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#379879e621a9
Big Data Application
1. Finance
2. Insurance
3. Health Care
4. Agriculture
5. Defense
6. Manufacturing
7. Aerospace
8. Oil and Gas
9. Advertising and Marketing
10. Election Campaigns
11. The list goes on: applicability across industries
Big Data Application
http://www.forbes.com/sites/bernardmarr/2016/02/03/how-the-super-bowl-uses-big-data-to-change-the-game/?
Big Data Application
http://andrewshamlet.com/2015/12/03/who-will-win-the-2016-us-presidential-nominations/
Big Data Application
http://www.forbes.com/sites/bernardmarr/2016/02/02/this-is-why-dictators-love-big-data/2/#4d413e005844
Let's go back to the definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Tools solving the Big Data challenge
Hadoop: the key components are HDFS and MapReduce (MR)
*Source Udacity
Hadoop Ecosystem
1. Sqoop takes data from a regular RDBMS and puts it into HDFS.
2. Flume ingests data into HDFS as it is generated by external systems.
3. HBase is a real-time database on top of HDFS.
4. Hue is a graphical front end to the cluster.
5. Oozie is a workflow management tool.
6. Mahout is a machine learning library.
*Source Udacity
HDFS
• HDFS stands for Hadoop Distributed File System, which is the storage system used by Hadoop. The following high-level architecture explains how HDFS works.
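HDFS stores a file as fixed-size blocks and keeps several replicas of each block on different DataNodes. The Python sketch below estimates that footprint. It assumes the Hadoop 2.x defaults, dfs.blocksize = 128 MB and dfs.replication = 3. The round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy, and the helper names are illustrative only.

```python
import math

# Assumed Hadoop 2.x defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
BLOCK_SIZE_MB = 128
REPLICATION = 3

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, stored block replicas, raw MB stored) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only holds the remaining data, so raw storage is
    # file size times replication, not blocks times block size.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement of each block's replicas on distinct DataNodes.
    Simplification: real HDFS uses rack-aware placement."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

if __name__ == "__main__":
    # A 150 GB file, the daily volume mentioned later in this deck.
    print(hdfs_footprint(150 * 1024))  # (1200, 3600, 460800)
    print(place_replicas(4, ["dn1", "dn2", "dn3", "dn4"]))
```

Under these assumptions, a 150 GB file becomes 1,200 blocks and about 450 GB of raw cluster storage.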
MapReduce
Ref: Emanuele Della Valle @manudellavalle
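The MapReduce flow has three stages: map, then shuffle/sort, then reduce. The single-process Python sketch below shows that flow. It is an illustration of the programming model, not Hadoop's API: `run_mapreduce` is a hypothetical helper, and each reducer call returns one value per key, whereas real Hadoop reducers may emit several pairs.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process sketch of the map -> shuffle/sort -> reduce flow."""
    # Map + shuffle: every (key, value) the mapper emits is grouped under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort + reduce: keys are visited in sorted order; each reducer call sees all values for one key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}

if __name__ == "__main__":
    # Example: highest reading per sensor from "sensor,reading" lines (made-up data).
    lines = ["s1,20", "s2,31", "s1,27", "s2,18"]

    def parse(line):
        sensor, reading = line.split(",")
        yield sensor, int(reading)

    print(run_mapreduce(lines, parse, lambda key, values: max(values)))
    # {'s1': 27, 's2': 31}
```

On a cluster, the map calls run in parallel on the nodes that hold the input blocks. The framework then moves all values for one key to a single reducer.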
Understanding MapReduce
Demo: Word Count
Given an input file, count how many times each unique word occurs.
WordCount: MapReduce
Reference: http://wearecloud.cz/media/files/prezentace-biz/Big%20Data%20v%20Cloudu.ppt
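The word-count demo can be sketched in Python in the style of Hadoop Streaming. The mapper emits tab-separated "word, 1" records, the framework sorts them by key, and the reducer sums each run of equal keys. This is a local, single-process sketch. The tokenization rule (lowercase; letters, digits and apostrophes) is an assumption and is not taken from the referenced demo.

```python
import re
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" record per word, the tab-separated style Hadoop Streaming uses.
    # Tokenization rule (lowercase; letters, digits, apostrophes) is an assumption.
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield f"{word}\t1"

def reducer(sorted_records):
    # Input must be sorted by key so equal words are adjacent;
    # Hadoop's shuffle/sort provides this ordering between map and reduce.
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)

def word_count(lines):
    # Local simulation of the pipeline: input | mapper | sort | reducer
    return dict(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

Keeping the mapper and reducer as separate functions over lines mirrors how Streaming runs them as separate processes connected by stdin/stdout.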
How can QA and testing teams use Big Data tools for their testing needs?
Problem Statement and Solution using Hadoop and MapReduce
MTBT – Multicast Tick by Tick Adapter
Input was the exchange feed; output was given to the HFT engine.
Exchange TAP: co-location servers listen to it at high speed.
Legacy adaptor (3rd party): connects to the TAP and converts the feed into a format which can be used by HFT platforms (algorithmic trading platforms).
New adaptor: being built in-house to increase the speed by 10 times.
Diagram: Exchange TAP -> MTBT Adaptor -> HFT Engine (input, output, output over time)
MTBT – Testing Objective
1. Test the fast and dynamic nature of the multicast TBT feed: it moves in microseconds, averaging around 20,000 data points/sec, and on expiry or volatility days it goes up to 40,000 data points/sec.
2. Check whether there is any packet drop.
3. Test that the generated order book is fresh and accurate up to level 20 (configurable).
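One common way to check for packet drop is to look for gaps in a per-packet sequence number. The Python sketch below does that. It assumes each multicast packet carries an increasing sequence number; the deck does not say how drops were actually detected, and `find_gaps` and `dropped_count` are hypothetical helpers.

```python
def find_gaps(seq_numbers):
    """Return (first_missing, last_missing) ranges of sequence numbers never seen.

    Assumes every packet carries an increasing sequence number (hypothetical field).
    Duplicates and late arrivals (seq below the next expected value) are skipped,
    so a packet that arrives late after its gap was recorded still counts as missing.
    """
    gaps = []
    expected = None
    for seq in seq_numbers:
        if expected is not None and seq < expected:
            continue  # duplicate or late packet
        if expected is not None and seq > expected:
            gaps.append((expected, seq - 1))
        expected = seq + 1
    return gaps

def dropped_count(gaps):
    # Total number of missing sequence numbers across all gaps.
    return sum(last - first + 1 for first, last in gaps)

if __name__ == "__main__":
    received = [101, 102, 103, 106, 107, 107, 110]  # made-up sequence numbers
    gaps = find_gaps(received)
    print(gaps, dropped_count(gaps))  # [(104, 105), (108, 109)] 4
```

A stricter check would reconcile late arrivals before reporting a drop.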
MTBT – Testing Strategy: Sampling
Diagram: samples taken from the MTBT Adaptor's input and output over time
Do a reverse comparison.
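One way to take samples over time is to keep only records whose timestamps fall inside short, regularly spaced windows. The same windows are applied to both the input and the output, so the two samples line up. The Python sketch below is an illustration only, not the deck's "reverse comparison" procedure. It assumes each record is a (timestamp_seconds, record_id) pair and that an output record keeps the exchange timestamp of its input. The window sizes and all function names are hypothetical.

```python
def in_window(ts, interval, window):
    # True if timestamp ts (seconds) falls in the first `window` seconds of each `interval`.
    return (ts % interval) < window

def take_sample(records, interval=60.0, window=2.0):
    # records: iterable of (timestamp_seconds, record_id) pairs (hypothetical layout).
    return [r for r in records if in_window(r[0], interval, window)]

def sampled_ids_missing(input_records, output_records, interval=60.0, window=2.0):
    # Assumes output records carry the exchange timestamp of their input, so both
    # samples cover the same windows; lists sampled input ids absent from the output.
    sampled_in = {rid for _, rid in take_sample(input_records, interval, window)}
    sampled_out = {rid for _, rid in take_sample(output_records, interval, window)}
    return sorted(sampled_in - sampled_out)

if __name__ == "__main__":
    inp = [(0.5, "a"), (1.5, "b"), (30.0, "c"), (60.4, "d")]  # made-up records
    out = [(0.5, "a"), (30.0, "c"), (60.4, "d")]
    print(sampled_ids_missing(inp, out))  # ['b']
```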
MTBT – Challenges
1. Manual testing was next to impossible.
2. Even samples of a few seconds ran into files of many megabytes (MB).
3. It was impossible to manually compare the legacy records with the records processed by the new code.
4. Daily processed data ran into files of 150 gigabytes (GB) plus.
MTBT – It was a Big Data testing problem
1. Large 150 GB files (legacy and new applications): VOLUME
2. Testing had to compare the outputs and measure functional effectiveness in a real-time data environment: VELOCITY
3. Packet drops may happen: VERACITY
4. Variety was not really present: the two output files were not generated in a similar format, but the content/information was there in both.
MTBT – SOLUTION
1. Reduce the LEGACY MTBT output file into a standard format
2. Reduce the NEW IN-HOUSE MTBT output file into a standard format
3. Compare the two files
4. Generate the report
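The four solution steps map naturally onto a reduce-side comparison. First, normalize each side to one standard record keyed by a shared id. Second, group both sides by that key. Third, compare them record by record. Finally, collect the results into a report. The Python sketch below runs in a single process. Both input layouts ("seq|symbol|price|qty" for legacy and "symbol,seq,qty,price" for new) are hypothetical, because the deck does not give the real formats. The use of the sequence number as the join key is also an assumption.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical layouts, for illustration only (the deck does not give the real formats):
#   legacy adaptor output: "seq|symbol|price|qty"
#   new in-house output:   "symbol,seq,qty,price"

def normalize_legacy(line):
    seq, symbol, price, qty = line.strip().split("|")
    return int(seq), (symbol, Decimal(price), int(qty))

def normalize_new(line):
    symbol, seq, qty, price = line.strip().split(",")
    return int(seq), (symbol, Decimal(price), int(qty))

def map_phase(legacy_lines, new_lines):
    # Steps 1 and 2: reduce both outputs to one standard record, tagged with its source.
    for line in legacy_lines:
        key, record = normalize_legacy(line)
        yield key, ("legacy", record)
    for line in new_lines:
        key, record = normalize_new(line)
        yield key, ("new", record)

def reduce_phase(pairs):
    # Shuffle: group both sides by key (if a source repeats a key, the last record wins).
    grouped = defaultdict(dict)
    for key, (source, record) in pairs:
        grouped[key][source] = record
    # Step 3: compare record by record; step 4: collect the results into a report.
    report = {"match": 0, "mismatch": [], "only_legacy": [], "only_new": []}
    for key in sorted(grouped):
        sides = grouped[key]
        if "legacy" not in sides:
            report["only_new"].append(key)
        elif "new" not in sides:
            report["only_legacy"].append(key)
        elif sides["legacy"] == sides["new"]:
            report["match"] += 1
        else:
            report["mismatch"].append(key)
    return report

if __name__ == "__main__":
    legacy = ["1|ABC|101.50|75", "2|ABC|101.55|50", "3|XYZ|200.0|25"]  # made-up records
    new = ["ABC,1,75,101.5", "ABC,2,40,101.55", "ABC,4,10,99.0"]
    print(reduce_phase(map_phase(legacy, new)))
    # {'match': 1, 'mismatch': [2], 'only_legacy': [3], 'only_new': [4]}
```

Prices are parsed as `Decimal` so that "101.50" and "101.5" compare as equal without floating-point surprises.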
DEMO
Outcome
1. Record-by-record comparison done on an ordinary 8 GB Linux server in less than 2 hours
2. Automated report generation
3. Automated results shared with stakeholders
4. Reused for regression testing and for non-functional testing (NFT)
5. Huge benefits to the client (both time and money)
Other scenarios: Big Data tool implementation
QA teams can use these tools in multiple scenarios:
1. Beta testing
2. Measuring the effectiveness of repeated executions by applying analytics (R)
3. Capturing customer feedback and channeling it into smarter test execution
4. Extracting relevant information from repeated regression cycles in QC
5. Adding intelligence to the data generated by the testing team
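As an example of mining repeated regression cycles, the MapReduce-style Python sketch below summarizes each test's pass rate across runs. It flags tests that both passed and failed as flaky. The deck mentions R for this kind of analytics, so this Python version is an alternative illustration. The "run_id,test_id,status" export layout is hypothetical, not QC's actual export format, and the labels are arbitrary.

```python
from collections import defaultdict

def map_results(lines):
    # Hypothetical export layout: "run_id,test_id,status", status is PASS or FAIL.
    for line in lines:
        _run_id, test_id, status = line.strip().split(",")
        yield test_id, status.strip().upper() == "PASS"

def reduce_results(pairs):
    # Aggregate per test across regression cycles: [passes, total runs].
    counts = defaultdict(lambda: [0, 0])
    for test_id, passed in pairs:
        counts[test_id][0] += int(passed)
        counts[test_id][1] += 1
    summary = {}
    for test_id in sorted(counts):
        passes, total = counts[test_id]
        rate = passes / total
        if rate == 1.0:
            label = "stable-pass"
        elif rate == 0.0:
            label = "stable-fail"
        else:
            label = "flaky"  # both passed and failed across cycles
        summary[test_id] = (round(rate, 2), label)
    return summary

if __name__ == "__main__":
    results = ["r1,T1,PASS", "r2,T1,PASS", "r1,T2,FAIL",
               "r2,T2,PASS", "r1,T3,FAIL", "r2,T3,FAIL"]  # made-up results
    print(reduce_results(map_results(results)))
```

Flaky and consistently failing tests are natural candidates to review first in the next regression or sanity cycle.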
Another way to use Big Data (BETA TESTING)
Challenge: Tweet on @qaagility @adigindia
Another way to use Big Data
- Effective regression testing
- Effective sanity testing
Thank you and Jai Hind
Questions?
@adigIndia @AgileTA #GTR2016
If interested, please attend a one-day workshop on Big Data (Saturday 27 Feb, 9 AM to 6 PM):
• Hadoop and MapReduce
• VM setup
• JDK, Eclipse and Hadoop installation
• MapReduce examples
Contact
Please contact us at [email protected]
MUMBAI: 711, Rupa Solitaire, MBP, Mahape, Navi Mumbai-400701
DENMARK: 1 Lindebo 7 Lej - 42, 2630 Tasstrup, [email protected]
USA: 200 E Campus View Blvd., Suite 200, Columbus, OH