Big data for dummies using data stage live tool demo
Date posted: 19-Oct-2014
Category: Data & Analytics
![Page 1: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/1.jpg)
Big Data for Dummies using DataStage
By Peter Bjelvert
InfoSphere Architect
Middlecon AB
![Page 2: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/2.jpg)
ETL – Relational DB
Extract Transform in DataStage
Load
Your powerful DataStage server handles all complex transformations; the database is only used for reading and writing.
![Page 3: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/3.jpg)
ELT – Relational DB
Extract
Load with Transform
If you have powerful database servers, you can push much of the work down to the database; DataStage then mostly controls the flow.
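A minimal sketch of the difference between the two patterns, using Python's built-in `sqlite3` as a stand-in for the relational database. The table `bank_branch` mirrors the column names shown later in the deck; the target table `branch_dim`, the sample rows, and all function names are hypothetical and only illustrate where the deduplication work runs, not how DataStage itself does it.

```python
import sqlite3

def make_db():
    # In-memory database standing in for the relational source/target (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bank_branch (branch_city TEXT, branch_state TEXT, branch_zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO bank_branch VALUES (?, ?, ?)",
        [
            ("Stockholm", "AB", "11120"),
            ("Stockholm", "AB", "11120"),  # duplicate row on purpose
            ("Malmo", "M", "21119"),
        ],
    )
    conn.execute(
        "CREATE TABLE branch_dim (branch_city TEXT, branch_state TEXT, branch_zip TEXT)"
    )
    return conn

def etl(conn):
    # ETL: extract all rows, transform (deduplicate) outside the database, then load.
    rows = conn.execute(
        "SELECT branch_city, branch_state, branch_zip FROM bank_branch"
    ).fetchall()
    distinct_rows = sorted(set(rows))
    conn.executemany("INSERT INTO branch_dim VALUES (?, ?, ?)", distinct_rows)

def elt(conn):
    # ELT: one SQL statement; the database engine does the deduplication.
    conn.execute(
        "INSERT INTO branch_dim "
        "SELECT DISTINCT branch_city, branch_state, branch_zip FROM bank_branch"
    )

def load_result(conn):
    # Read back the target table in a stable order for comparison.
    return sorted(conn.execute("SELECT * FROM branch_dim").fetchall())

etl_conn = make_db()
etl(etl_conn)
elt_conn = make_db()
elt(elt_conn)
etl_result = load_result(etl_conn)
elt_result = load_result(elt_conn)
```

Both paths end with the same target rows; what changes is which server spends the CPU and how much data crosses the network.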
![Page 4: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/4.jpg)
Balanced Optimization
Bal. Opt. creates a second copy of the job that pushes everything into the target, as one big SQL statement.
Bal. Opt. creates a new copy of the job that pushes the load into the source and the target.
Use DataStage Balanced Optimization to select where to push the load:
- To Source
- To Target
- To Both
The DataStage job is rewritten into SQL code.
![Page 5: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/5.jpg)
Balanced Optimization feature of DataStage
ETL: DataStage is doing the main work.
ELT – PushDown: Bal. Opt. creates a new copy of the job with SQL code; the DB server is doing the main job.
SELECT * FROM (SELECT DISTINCT BRANCH_CITY, BRANCH_STATE, BRANCH_ZIP FROM JK_BANK2.BANK_BRANCH) AS A, (SELECT DISTINCT BRANCH_CITY,
![Page 6: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/6.jpg)
Hadoop Distributed File System - HDFS
Application Layer
Workload mgmt Layer
Data Layer
One file, 3 copies
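The "one file, 3 copies" idea can be sketched as follows. This is a simplified illustration, not HDFS's actual placement policy (which is rack-aware): the round-robin placement, the node names, and the tiny block size are all assumptions made for the example.

```python
def split_into_blocks(data, block_size):
    # Cut the file content into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    # Hypothetical round-robin placement: each block gets `replication`
    # copies, each on a different node.
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement

nodes = ["node1", "node2", "node3", "node4"]          # made-up cluster
blocks = split_into_blocks(b"x" * 10, block_size=4)   # 3 blocks of 4, 4, 2 bytes
placement = place_replicas(len(blocks), nodes)

# Simulate losing one node: every block should still have a surviving copy.
survivors = {
    block_id: [n for n in replica_nodes if n != "node1"]
    for block_id, replica_nodes in placement.items()
}
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block.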
![Page 7: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/7.jpg)
MapReduce example
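The borrowed image for this slide is a word-count example. A single-process sketch of the same map, shuffle, and reduce steps is shown here; the input lines and function names are made up for illustration, and a real Hadoop job would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key (the word).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # sample input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```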
![Page 8: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/8.jpg)
Hadoop application stack
Application Layer: Jaql, AQL, …
Workload mgmt Layer: MapReduce
Data Layer: HDFS
![Page 9: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/9.jpg)
IBM’s Hadoop implementation
![Page 10: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/10.jpg)
ETL – HDFS
Extract
Transform in DataStage
Load
Your powerful DataStage server can read and write to the distributed file system
![Page 11: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/11.jpg)
DataStage HDFS example
Read and write to a Hadoop system using the new BDFS stage
![Page 12: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/12.jpg)
ELT – Hadoop system
Extract
Load with Transform
Use DataStage Balanced Optimization to select where to push the load:
- To Source
- To Target
- To Both
The DataStage job is rewritten into Jaql code.
![Page 13: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/13.jpg)
DataStage Jaql example
Bal. Opt. creates a second copy of the job that pushes everything into the target, as one big Jaql statement.
![Page 14: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/14.jpg)
Balanced Optimization feature of DataStage
ETL: DataStage is doing the main work.
ELT – PushDown: Bal. Opt. creates a new copy of the job with SQL code; the DB server is doing the main job.
SELECT * FROM (SELECT DISTINCT BRANCH_CITY, BRANCH_STATE, BRANCH_ZIP FROM JK_BANK2.BANK_BRANCH) AS A, (SELECT DISTINCT BRANCH_CITY,
HDFS: Bal. Opt. creates a new copy of the job with Jaql code; the Hadoop application server executes the Jaql code on all nodes.
setOptions({conf:{"mapred.job.name":"Data Stage BalOp job BIGDATA:dstage1 ff_read_write_to_hadoop_jaql_balopt_join CustomerTarget 16_#DSJobInvocationId#"}}); setOptions({conf:{"mapred.reduce.tasks":1}});
![Page 15: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/15.jpg)
Extract, Transform and filter in DataStage
Load good data into HDFS
DataStage can read from many different sources. Convert common data (like time/date) to facilitate subsequent queries. Send unwanted data to garbage.
A good scenario for DataStage customers
Analytic functions: AQL …
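The "convert dates, discard unwanted data" step on this slide might look roughly like the sketch below. The accepted date formats, the record layout, and the function names are assumptions for illustration; in the scenario described, this work would be done by a DataStage job before loading into HDFS.

```python
from datetime import datetime

# Hypothetical set of input formats seen across the different sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def normalize_date(text):
    # Try each known format; return an ISO 8601 date string, or None if none match.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def split_good_and_rejected(records):
    # Good records get a normalized date; the rest go to a reject list.
    good, rejected = [], []
    for record in records:
        iso = normalize_date(record.get("date", ""))
        if iso is None:
            rejected.append(record)
        else:
            good.append({**record, "date": iso})
    return good, rejected

records = [  # made-up sample rows
    {"id": 1, "date": "2014-10-19"},
    {"id": 2, "date": "19/10/2014"},
    {"id": 3, "date": "20141019"},
    {"id": 4, "date": "not a date"},
]
good, rejected = split_good_and_rejected(records)
```

Normalizing to one date format before the load means later queries on the Hadoop side do not have to handle each source's format separately.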
![Page 16: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/16.jpg)
LIVE DEMO
![Page 17: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/17.jpg)
Borrowed images from Google
- Slide 6: https://yoyoclouds.wordpress.com/tag/hadoop/
- Slide 7: http://kickstarthadoop.blogspot.se/2011/04/word-count-hadoop-map-reduce-example.html
- Slide 8: http://www.rosebt.com/1/post/2012/07/hadoop-internal-software-architecture.html
- Slide 9: http://www.ndm.net/datawarehouse/IBM/ibm-infosphere-biginsights
![Page 18: Big data for dummies using data stage live tool demo](https://reader034.fdocuments.net/reader034/viewer/2022050807/5444593cb1af9f740a8b48d1/html5/thumbnails/18.jpg)
Handling Big Data without angst