DataStage Enterprise Edition
Proposed Course Agenda
Day 1
– Review of EE Concepts
– Sequential Access
– Best Practices
– DBMS as Source
Day 2
– EE Architecture
– Transforming Data
– DBMS as Target
– Sorting Data
Day 3
– Combining Data
– Configuration Files
– Extending EE
– Meta Data in EE
Day 4
– Job Sequencing
– Testing and Debugging
The Course Material
Course Manual
Online Help
Exercise Files and Exercise Guide
Using the Course Material
Suggestions for learning
– Take notes
– Review previous material
– Practice
– Learn from errors
Intro Part 1
Introduction to DataStage EE
What is DataStage?
Design jobs for Extraction, Transformation, and Loading (ETL)
Ideal tool for data integration projects, such as data warehouses, data marts, and system migrations
Import, export, create, and manage metadata for use within jobs
Schedule, run, and monitor jobs all within DataStage
Administer your DataStage development and execution environments
DataStage Server and Clients
DataStage Administrator
Client Logon
DataStage Manager
DataStage Designer
DataStage Director
Developing in DataStage
Define global and project properties in Administrator
Import meta data into Manager
Build job in Designer
Compile in Designer
Validate, run, and monitor in Director
DataStage Projects
Quiz: True or False?
DataStage Designer is used to build and compile your ETL jobs
Manager is used to execute your jobs after you build them
Director is used to execute your jobs after you build them
Administrator is used to set global and project properties
Intro Part 2
Configuring Projects
Module Objectives
After this module you will be able to:
– Explain how to create and delete projects
– Set project properties in Administrator
– Set EE global properties in Administrator
Project Properties
Projects can be created and deleted in Administrator
Project properties and defaults are set in Administrator
Setting Project Properties
To set project properties, log on to Administrator, select your project, and then click “Properties”
Licensing Tab
Projects General Tab
Environment Variables
Permissions Tab
Tracing Tab
Tunables Tab
Parallel Tab
Intro Part 3
Managing Meta Data
Module Objectives
After this module you will be able to:
– Describe the DataStage Manager components and functionality
– Import and export DataStage objects
– Import metadata for a sequential file
What Is Metadata?
[Diagram labels: Source, Transform, Target, Data, Meta Data, Meta Data Repository]
DataStage Manager
Manager Contents
Metadata describing sources and targets: Table definitions
DataStage objects: jobs, routines, table definitions, etc.
Import and Export
Any object in Manager can be exported to a file
Can export whole projects
Use for backup
Sometimes used for version control
Can be used to move DataStage objects from one project to another
Use to share DataStage jobs and projects with other developers
Export Procedure
In Manager, click “Export>DataStage Components”
Select DataStage objects for export
Specify type of export: DSX or XML
Specify file path on client machine
Quiz: True or False?
You can export DataStage objects such as jobs, but you can’t export metadata, such as field definitions of a sequential file.
Quiz: True or False?
The directory to which you export is on the DataStage client machine, not on the DataStage server machine.
Exporting DataStage Objects
Exporting DataStage Objects
Import Procedure
In Manager, click “Import>DataStage Components”
Select DataStage objects for import
Importing DataStage Objects
Import Options
Exercise
Import DataStage Component (table definition)
Metadata Import
Import format and column definitions from sequential files
Import relational table column definitions
Imported as “Table Definitions”
Table definitions can be loaded into job stages
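As a rough illustration of what a metadata import produces, the Python sketch below derives column definitions from a small delimited sample. It is not DataStage code: the `ColumnDef` class, the type names, and the assumption that the first line of the file holds column names are all hypothetical choices made for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ColumnDef:
    # Hypothetical stand-in for one column of a table definition.
    name: str
    sql_type: str
    length: int


def infer_type(values):
    """Very rough guess: Integer if every sample value parses as int, else VarChar."""
    try:
        for value in values:
            int(value)
        return "Integer"
    except ValueError:
        return "VarChar"


def import_table_definition(sample_text, delimiter=","):
    """Derive column definitions from a delimited sample.

    Assumes the first line holds the column names.
    """
    rows = list(csv.reader(io.StringIO(sample_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        values = [row[index] for row in data]
        length = max((len(value) for value in values), default=0)
        columns.append(ColumnDef(name, infer_type(values), length))
    return columns


sample = "id,name\n1,Ann\n22,Rajesh\n"
table_def = import_table_definition(sample)
```

The resulting list plays the role of a table definition: it is built once from the file and can then be reused wherever the same columns are needed, and the guessed types and lengths would normally be reviewed and edited by hand.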
Sequential File Import Procedure
In Manager, click Import>Table Definitions>Sequential File Definitions
Select directory containing sequential file and then the file
Select Manager category
Examine format and column definitions and edit if necessary
Manager Table Definition
Importing Sequential Metadata
Intro Part 4
Designing and Documenting Jobs
Module Objectives
After this module you will be able to:
– Describe what a DataStage job is
– List the steps involved in creating a job
– Describe links and stages
– Identify the different types of stages
– Design a simple extraction and load job
– Compile your job
– Create parameters to make your job flexible
– Document your job
What Is a Job?
Executable DataStage program
Created in DataStage Designer, but can use components from Manager
Built using a graphical user interface
Compiles into Orchestrate shell language (OSH)
Job Development Overview
In Manager, import metadata defining sources and targets
In Designer, add stages defining data extractions and loads
Add Transformers and other stages to define data transformations
Add links defining the flow of data from sources to targets
Compile the job
In Director, validate, run, and monitor your job
Designer Work Area
Designer Toolbar
Provides quick access to the main functions of Designer
Job properties
Compile
Show/hide metadata markers
Tools Palette
Adding Stages and Links
Stages can be dragged from the tools palette or from the stage type branch of the repository view
Links can be drawn from the tools palette or by right clicking and dragging from one stage to another
Sequential File Stage
Used to extract data from, or load data to, a sequential file
Specify full path to the file
Specify a file format: fixed width or delimited
Specify column definitions
Specify write action
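The two file formats differ only in how a record is split into fields. The Python sketch below shows the idea outside DataStage; the column names, widths, and function names are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical column definitions: (name, width).
# The width is only used when the file is fixed width.
COLUMNS = [("id", 4), ("name", 10), ("city", 8)]


def parse_delimited(text, delimiter=","):
    """Split each line on the delimiter and pair the fields with column names."""
    names = [name for name, _ in COLUMNS]
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(names, row)) for row in reader if row]


def parse_fixed_width(text):
    """Slice each line at fixed offsets taken from the column widths."""
    records = []
    for line in text.splitlines():
        if not line:
            continue
        record, start = {}, 0
        for name, width in COLUMNS:
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records


delimited = "1,Ann,Boston\n2,Raj,Austin\n"
fixed = "0001Ann       Boston  \n0002Raj       Austin  \n"

delimited_rows = parse_delimited(delimited)
fixed_rows = parse_fixed_width(fixed)
```

In both cases the column definitions drive the parsing, which is why the stage needs them in addition to the file path and format.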
Job Creation Example Sequence
Brief walkthrough of procedure
Presumes meta data already loaded in repository
Designer - Create New Job
Drag Stages and Links Using Palette
Assign Meta Data
Editing a Sequential Source Stage
Editing a Sequential Target
Transformer Stage
Used to define constraints, derivations, and column mappings
A column mapping maps an input column to an output column
In this module we will just define column mappings (no derivations)
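A column mapping with no derivation simply copies an input column to an output column, possibly under a different name. A minimal Python sketch of that idea follows; the column names and the `apply_column_mapping` helper are hypothetical, and this is not how the Transformer stage is implemented.

```python
# Hypothetical mapping: output column name -> input column name.
COLUMN_MAP = {"customer_id": "id", "customer_name": "name"}


def apply_column_mapping(row, column_map=COLUMN_MAP):
    """Build an output row by copying each mapped input column unchanged."""
    return {out_col: row[in_col] for out_col, in_col in column_map.items()}


input_rows = [{"id": 1, "name": "Ann", "city": "Boston"}]
output_rows = [apply_column_mapping(row) for row in input_rows]
```

Input columns that are not mapped (here `city`) are dropped from the output. A derivation would replace the plain copy with an expression over the input columns.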
Transformer Stage Elements
Create Column Mappings
Creating Stage Variables
Result
Adding Job Parameters
Makes the job more flexible
Parameters can be:
– Used in constraints and derivations
– Used in directory and file names
Parameter values are determined at run time
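To make the run-time substitution concrete, here is a small Python sketch that resolves placeholders in a file path. The `#Name#` placeholder style, the parameter names, and the `resolve_parameters` helper are used here as illustrative assumptions, not as a description of DataStage internals.

```python
import re


def resolve_parameters(template, values):
    """Replace each #Name# placeholder with the value supplied at run time."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no run-time value supplied for parameter {name!r}")
        return str(values[name])

    return re.sub(r"#(\w+)#", lookup, template)


# Hypothetical path template and run-time values.
path_template = "#SourceDir#/customers_#RunDate#.txt"
resolved = resolve_parameters(
    path_template, {"SourceDir": "/data/in", "RunDate": "20040115"}
)
```

The same job design can then read a different directory or file on each run without being edited or recompiled, which is the flexibility the parameters are meant to provide.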
Adding Job Documentation
Job Properties
– Short and long descriptions
– Shows in Manager
Annotation stage
– Is a stage on the tool palette
– Shows on the job GUI (work area)
Job Properties Documentation
Annotation Stage on the Palette
Annotation Stage Properties
Final Job Work Area with Documentation
Compiling a Job
Error or Success Message
Intro Part 5
Running Jobs
Module Objectives
After this module you will be able to:
– Validate your job
– Use DataStage Director to run your job
– Set run options
– Monitor your job’s progress
– View job log messages
Prerequisite to Job Execution
Result from Designer compile
DataStage Director
Can schedule, validate, and run jobs
Can be invoked from DataStage Manager or Designer
– Tools > Run Director
Running Your Job
Run Options – Parameters and Limits
Director Log View
Message Details are Available
Other Director Functions
Schedule job to run on a particular date/time
Clear job log
Set Director options
– Row limits
– Abort after x warnings
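The effect of a row limit and an abort-after-x-warnings limit can be sketched in plain Python. This is only a model of the behaviour, not Director's implementation; the names are hypothetical, and the choice to abort once the warning count exceeds the limit (rather than when it reaches it) is an assumption.

```python
class JobAborted(Exception):
    """Raised when the warning limit is exceeded."""


def run_with_limits(rows, process, row_limit=None, warning_limit=None):
    """Process rows, stopping at row_limit and aborting after too many warnings.

    `process` returns True when a row is handled cleanly and False when it
    would log a warning. A limit of None means "no limit".
    """
    processed, warnings = 0, 0
    for row in rows:
        if row_limit is not None and processed >= row_limit:
            break
        if not process(row):
            warnings += 1
            # Assumption: abort once the count exceeds the limit.
            if warning_limit is not None and warnings > warning_limit:
                raise JobAborted(f"aborted after {warnings} warnings")
        processed += 1
    return processed, warnings


def is_valid(row):
    """Hypothetical check: negative values count as warnings."""
    return row >= 0


result = run_with_limits([1, -2, 3, 4, 5], is_valid, row_limit=3)
```

A row limit is useful for quick test runs on a subset of the data, while a warning limit stops a job early when the input is clearly bad.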
Module 1
DSEE – DataStage EE
Review
Ascential’s Enterprise Data Integration Platform
ANY SOURCE: CRM, ERP, SCM, RDBMS, Legacy, Real-time, Client-server, Web services, Data Warehouse, Other apps.
ANY TARGET: CRM, ERP, SCM, BI/Analytics, RDBMS, Real-time, Client-server, Web services, Data Warehouse, Other apps.
Command & Control
DISCOVER (Data Profiling): Gather relevant information for target enterprise applications
PREPARE (Data Quality): Cleanse, correct and match input data
TRANSFORM (Extract, Transform, Load): Standardize and enrich data and load to targets
Meta Data Management
Parallel Execution
Course Objectives
You will learn to:
– Build DataStage EE jobs using complex logic
– Utilize parallel processing techniques to increase job performance
– Build custom stages based on application needs
Course emphasis is:
– Advanced usage of DataStage EE
– Application job development
– Best practices techniques
![Page 86: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/86.jpg)
Course Agenda
Day 1
– Review of EE Concepts
– Sequential Access
– Standards
– DBMS Access
Day 2
– EE Architecture
– Transforming Data
– Sorting Data
Day 3
– Combining Data
– Configuration Files
Day 4
– Extending EE
– Meta Data Usage
– Job Control
– Testing
![Page 87: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/87.jpg)
Module Objectives
Provide a background for completing work in the DSEE course
Tasks
– Review concepts covered in DSEE Essentials course
Skip this module if you recently completed the DataStage EE essentials modules
![Page 88: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/88.jpg)
Review Topics
DataStage architecture
DataStage client review
– Administrator
– Manager
– Designer
– Director
Parallel processing paradigm
DataStage Enterprise Edition (DSEE)
![Page 89: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/89.jpg)
Client-Server Architecture
[Diagram: the Designer, Director, Manager, and Administrator clients run on Microsoft® Windows NT/2000/XP. The server and its repository run on Microsoft® Windows NT or UNIX. The server moves data from ANY SOURCE to ANY TARGET (CRM, ERP, SCM, BI/analytics, RDBMS, real-time, client-server, Web services, data warehouse, other apps.) through Discover, Prepare, Transform, and Extend (extract, cleanse, transform, integrate), supported by Parallel Execution, Meta Data Management, and Command & Control.]
![Page 90: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/90.jpg)
Process Flow
Administrator – add/delete projects, set defaults
Manager – import meta data, backup projects
Designer – assemble jobs, compile, and execute
Director – execute jobs, examine job run logs
![Page 91: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/91.jpg)
Administrator – Licensing and Timeout
![Page 92: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/92.jpg)
Administrator – Project Creation/Removal
Functions specific to a project.
![Page 93: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/93.jpg)
Administrator – Project Properties
RCP for parallel jobs should be enabled
Variables for parallel processing
![Page 94: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/94.jpg)
Administrator – Environment Variables
Variables are category specific
![Page 95: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/95.jpg)
OSH is what is run by the EE Framework
![Page 96: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/96.jpg)
DataStage Manager
![Page 97: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/97.jpg)
Export Objects to MetaStage
Push meta data to MetaStage
![Page 98: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/98.jpg)
Designer Workspace
Can execute the job from Designer
![Page 99: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/99.jpg)
DataStage Generated OSH
The EE Framework runs OSH
![Page 100: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/100.jpg)
Director – Executing Jobs
Messages from previous run in different color
![Page 101: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/101.jpg)
Stages
Can now customize the Designer’s palette
Select desired stages and drag to favorites
![Page 102: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/102.jpg)
Popular Developer Stages
Row generator
Peek
![Page 103: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/103.jpg)
Row Generator
Can build test data
Repeatable property
Edit row in column tab
![Page 104: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/104.jpg)
Peek
Displays field values
– Will be displayed in job log or sent to a file
– Skip records option
– Can control number of records to be displayed
Can be used as stub stage for iterative development (more later)
![Page 105: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/105.jpg)
Why EE is so Effective
Parallel processing paradigm
– More hardware, faster processing
– Level of parallelization is determined by a configuration file read at runtime
Emphasis on memory
– Data is read into memory and lookups are performed as in a hash table
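The "lookups performed like hash table" point can be sketched in a few lines. This is a minimal illustration, not how the EE Framework is implemented: a Python dict stands in for the in-memory lookup table, and the names `build_lookup`, `enrich`, and the sample rows are hypothetical.

```python
# Illustrative only: a dict stands in for an in-memory lookup table.
reference_rows = [
    {"partno": 101, "description": "bolt"},
    {"partno": 102, "description": "nut"},
]


def build_lookup(rows, key):
    """Load the reference rows into memory once, keyed for constant-time access."""
    return {row[key]: row for row in rows}


def enrich(stream, lookup, key):
    """Attach the matching description to each streaming row (None when no match)."""
    for row in stream:
        match = lookup.get(row[key])
        yield {**row, "description": match["description"] if match else None}


lookup = build_lookup(reference_rows, "partno")
orders = [{"order": 1, "partno": 102}, {"order": 2, "partno": 999}]
enriched = list(enrich(orders, lookup, "partno"))
```

The reference data is read once; every later row costs one dictionary probe instead of a scan or a disk read.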
![Page 106: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/106.jpg)
Parallel Processing Systems
DataStage EE enables parallel processing: executing your application on multiple CPUs simultaneously
– If you add more resources (CPUs, RAM, and disks) you increase system performance
• Example system containing 6 CPUs (or processing nodes) and disks
[Figure: six numbered processing nodes, 1 to 6]
![Page 107: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/107.jpg)
Scalable Systems: Examples
Three main types of scalable systems
– Symmetric Multiprocessors (SMP): shared memory and disk
– Clusters: UNIX systems connected via networks
– MPP: Massively Parallel Processing
![Page 108: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/108.jpg)
SMP: Shared Everything
• Multiple CPUs with a single operating system
• Programs communicate using shared memory
• All CPUs share system resources (OS, memory with single linear address space, disks, I/O)
When used with Enterprise Edition:
• Data transport uses shared memory
• Simplified startup
Enterprise Edition treats NUMA (Non-Uniform Memory Access) as plain SMP
[Figure: four CPUs sharing one system]
![Page 109: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/109.jpg)
Traditional Batch Processing
[Diagram: operational and archived data flow from Source through Transform, Clean, and Load to the data warehouse Target, landing on disk between steps.]
Traditional approach to batch processing:
• Write to disk and read from disk before each processing operation
• Sub-optimal utilization of resources
  • a 10 GB stream leads to 70 GB of I/O
  • processing resources can sit idle during I/O
• Very complex to manage (lots and lots of small jobs)
• Becomes impractical with big data volumes
  • disk I/O consumes the processing
  • terabytes of disk required for temporary staging
![Page 110: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/110.jpg)
Pipeline Multiprocessing
Data Pipelining
• Transform, clean and load processes are executing simultaneously on the same processor
• Rows are moving forward through the flow
[Diagram: operational and archived data flow from Source through Transform, Clean, and Load to the data warehouse Target.]
• Start a downstream process while an upstream process is still running.
• This eliminates intermediate storing to disk, which is critical for big data.
• This also keeps the processors busy.
• Still has limits on scalability
Think of a conveyor belt moving the rows from process to process!
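The conveyor-belt idea can be sketched with Python generators. This is an illustration under stated assumptions, not the EE runtime: generators run in a single thread, so the sketch shows rows moving stage to stage without intermediate staging rather than truly simultaneous processes. The stage functions and the `trace` list are hypothetical.

```python
trace = []  # records which stage touched which row, in order


def source(rows):
    for row in rows:
        trace.append(("source", row["id"]))
        yield row


def transform(rows):
    for row in rows:
        trace.append(("transform", row["id"]))
        yield {**row, "name": row["name"].strip().title()}


def load(rows):
    target = []
    for row in rows:
        trace.append(("load", row["id"]))
        target.append(row)
    return target


# Row 1 reaches load before row 2 has even been read from the source.
warehouse = load(transform(source([
    {"id": 1, "name": " abbott "},
    {"id": 2, "name": "zorn"},
])))
```

No stage waits for the previous stage to finish the whole input, and nothing is written to disk between stages.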
![Page 111: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/111.jpg)
Partition Parallelism
Data Partitioning
[Diagram: source data is split by last name into four partitions (A-F, G-M, N-T, U-Z); Node 1 through Node 4 each run the same Transform on one partition.]
• Break up big data into partitions
• Run one partition on each processor
• 4 times faster on 4 processors; with data big enough, 100 times faster on 100 processors
• This is exactly how the parallel databases work!
• Data partitioning applies the same transform to all partitions: Aaron Abbott and Zygmund Zorn undergo the same transform
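A minimal sketch of the A-F / G-M / N-T / U-Z split follows. It is illustrative only: the function names are hypothetical, and the thread pool merely shows the shape of "one partition per node" (CPython threads do not give real CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor

# Last-name ranges taken from the slide; one range per node.
RANGES = [("A", "F"), ("G", "M"), ("N", "T"), ("U", "Z")]


def partition_by_last_name(rows):
    """Place each row in the partition whose range covers its last-name initial."""
    partitions = [[] for _ in RANGES]
    for row in rows:
        initial = row["last_name"][0].upper()
        for index, (low, high) in enumerate(RANGES):
            if low <= initial <= high:
                partitions[index].append(row)
                break  # rows with an initial outside A-Z are skipped in this sketch
    return partitions


def transform(partition):
    """The same transform is applied to every partition."""
    return [{**row, "last_name": row["last_name"].upper()} for row in partition]


rows = [
    {"last_name": "Abbott"},
    {"last_name": "Zorn"},
    {"last_name": "Miller"},
    {"last_name": "Nguyen"},
]
partitions = partition_by_last_name(rows)
with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    results = list(pool.map(transform, partitions))
```

Each partition sees only its own rows, yet Abbott and Zorn pass through identical logic.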
![Page 112: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/112.jpg)
Combining Parallelism Types
Putting It All Together: Parallel Dataflow
[Diagram: source data flows from Source through Transform, Clean, and Load to the data warehouse Target. Pipelining runs along the flow; partitioning runs across it.]
![Page 113: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/113.jpg)
Repartitioning
Putting It All Together: Parallel Dataflow with Repartitioning on-the-fly
Without Landing To Disk!
[Diagram: source data flows from Source through Transform, Clean, and Load to the data warehouse Target, with pipelining along the flow. Data is partitioned by customer last name (A-F, G-M, N-T, U-Z), then repartitioned by customer zip code, then repartitioned by credit card number.]
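Repartitioning on a new key without landing to disk can be sketched as follows. This is an assumption-laden illustration: it uses hash partitioning (via `zlib.crc32`) for both keys instead of the last-name ranges in the diagram, and the function names are hypothetical.

```python
import zlib


def hash_partition(rows, key, node_count):
    """Send each row to a partition chosen by a stable hash of its key value."""
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[bucket].append(row)
    return partitions


def repartition(partitions, key, node_count):
    """Redistribute already-partitioned rows on a new key, entirely in memory."""
    flowing_rows = (row for partition in partitions for row in partition)
    return hash_partition(flowing_rows, key, node_count)


rows = [
    {"last_name": "Abbott", "zip": "10001"},
    {"last_name": "Zorn", "zip": "10001"},
    {"last_name": "Miller", "zip": "94105"},
]
by_name = hash_partition(rows, "last_name", 4)
by_zip = repartition(by_name, "zip", 4)
```

After repartitioning, all rows that share a zip code sit in the same partition, which is what a later key-based operation on zip code needs.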
![Page 114: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/114.jpg)
EE Program Elements
• Dataset: uniform set of rows in the Framework's internal representation
 - Three flavors:
   1. file sets (*.fs): stored on multiple Unix files as flat files
   2. persistent (*.ds): stored on multiple Unix files in Framework format; read and written using the DataSet Stage
   3. virtual (*.v): links, in Framework format, NOT stored on disk
 - The Framework processes only datasets, hence the possible need for Import
 - Different datasets typically have different schemas
 - Convention: "dataset" = Framework data set.
• Partition: subset of rows in a dataset earmarked for processing by the same node (virtual CPU, declared in a configuration file).
 - All the partitions of a dataset follow the same schema: that of the dataset
![Page 115: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/115.jpg)
DataStage EE Architecture
DataStage: Provides data integration platform
Orchestrate Framework: Provides application scalability
DataStage Enterprise Edition: Best-of-breed scalable data integration platform. No limitations on data volumes or throughput
[Diagram: an Orchestrate program (a sequential data flow: Import, Clean 1 and Clean 2, Merge, Analyze) reading flat files and relational data is handed, together with a configuration file, to the Orchestrate Application Framework and Runtime System, which runs the same flow in parallel. The Framework provides:]
– Centralized error handling and event logging
– Parallel access to data in files
– Parallel access to data in RDBMS
– Inter-node communications
– Parallel pipelining
– Parallelization of operations
– Performance visualization
![Page 116: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/116.jpg)
Introduction to DataStage EE
DSEE:
– Automatically scales to fit the machine
– Handles data flow among multiple CPUs and disks
With DSEE you can:
– Create applications for SMPs, clusters and MPPs… Enterprise Edition is architecture-neutral
– Access relational databases in parallel
– Execute external applications in parallel
– Store data across multiple disks and nodes
![Page 117: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/117.jpg)
Developer assembles data flow using the Designer
…and gets: parallel access, propagation, transformation, and load.
The design is good for 1 node, 4 nodes, or N nodes. To change the number of nodes, just swap the configuration file.
No need to modify or recompile the design
Job Design VS. Execution
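The separation between design and execution can be sketched like this. It is a toy model only: a plain Python list of node names stands in for the configuration file (real EE configuration files have their own syntax), and `run_job` and `double` are hypothetical.

```python
def run_job(rows, node_names, transform):
    """Run one unchanged job design; node_names alone decides the degree of parallelism."""
    partitions = {name: [] for name in node_names}
    for index, row in enumerate(rows):
        partitions[node_names[index % len(node_names)]].append(row)
    return {name: [transform(row) for row in part] for name, part in partitions.items()}


def double(value):
    return value * 2


rows = list(range(8))
one_node = ["node1"]                                # stand-in for a 1-node configuration
four_nodes = ["node1", "node2", "node3", "node4"]   # stand-in for a 4-node configuration

result_one = run_job(rows, one_node, double)
result_four = run_job(rows, four_nodes, double)
```

The job logic (`double`) is identical in both runs; only the list of nodes passed in at run time differs.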
![Page 118: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/118.jpg)
Partitioners and Collectors
Partitioners distribute rows into partitions
– implement data-partition parallelism
Collectors = inverse partitioners
Live on input links of stages running
– in parallel (partitioners)
– sequentially (collectors)
Use a choice of methods
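A small sketch of a partitioner and three collectors follows. The slide does not list the methods, so the choice of round robin, ordered, and sort merge here is an assumption made for illustration, and every function name is hypothetical.

```python
import heapq
from itertools import chain, zip_longest


def round_robin_partition(rows, node_count):
    """Partitioner: deal rows out to the partitions in turn."""
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


def ordered_collect(partitions):
    """Collector: read all of partition 0, then partition 1, and so on."""
    return list(chain.from_iterable(partitions))


def round_robin_collect(partitions):
    """Collector: take one row from each partition in turn until all are empty."""
    missing = object()
    collected = []
    for group in zip_longest(*partitions, fillvalue=missing):
        collected.extend(item for item in group if item is not missing)
    return collected


def sort_merge_collect(partitions, key):
    """Collector: merge partitions that are each already sorted on key."""
    return list(heapq.merge(*partitions, key=key))


parts = round_robin_partition(list(range(7)), 3)
```

A collector turns many partitions back into one stream, which is why it sits on the input link of a stage that runs sequentially.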
![Page 119: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/119.jpg)
Example Partitioning Icons
partitioner
![Page 120: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/120.jpg)
Exercise
Complete exercises 1-1, 1-2, and 1-3
![Page 121: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/121.jpg)
Module 2
DSEE Sequential Access
![Page 122: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/122.jpg)
Module Objectives
You will learn to:
– Import sequential files into the EE Framework
– Utilize parallel processing techniques to increase sequential file access
– Understand usage of the Sequential, DataSet, FileSet, and LookupFileSet stages
– Manage partitioned data stored by the Framework
![Page 123: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/123.jpg)
Types of Sequential Data Stages
Sequential
– Fixed or variable length
File Set
Lookup File Set
Data Set
![Page 124: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/124.jpg)
The EE Framework processes only datasets
For files other than datasets, such as flat files, Enterprise Edition must perform import and export operations – this is performed by import and export OSH operators generated by Sequential or FileSet stages
During import or export DataStage performs format translations – into, or out of, the EE internal format
Data is described to the Framework in a schema
Sequential Stage Introduction
![Page 125: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/125.jpg)
How the Sequential Stage Works
Generates Import/Export operators, depending on whether stage is source or target
Performs direct C++ file I/O streams
![Page 126: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/126.jpg)
Using the Sequential File Stage
Importing/Exporting Data
Both import and export of general files (text, binary) are performed by the Sequential File Stage.
– Data import: general file into EE internal format
– Data export: EE internal format out to general file
![Page 127: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/127.jpg)
Working With Flat Files
Sequential File Stage
– Normally will execute in sequential mode
– Can be parallel if reading multiple files (file pattern option)
– Can use multiple readers within a node
– DSEE needs to know
  How file is divided into rows
  How row is divided into columns
![Page 128: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/128.jpg)
Processes Needed to Import Data
Recordization
– Divides input stream into records
– Set on the format tab
Columnization
– Divides the record into columns
– Default set on the format tab but can be overridden on the columns tab
– Can be “incomplete” if using a schema or not even specified in the stage if using RCP
![Page 129: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/129.jpg)
File Format Example
[Diagram: a record is a series of fields separated by the field delimiter (a comma) and ended by the record delimiter (nl).]
Final Delimiter = end: Field 1 , Field 1 , Field 1 , Last field nl
Final Delimiter = comma: Field 1 , Field 1 , Field 1 , Last field , nl
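Recordization, columnization, and the two final-delimiter settings can be sketched in plain Python. This is a simplified illustration, not the import operator: it ignores quoting and fixed-width formats, and the function names and the `final_delimiter` parameter values are hypothetical stand-ins for the Format tab settings.

```python
def recordize(stream, record_delimiter="\n"):
    """Recordization: split the raw input stream into records."""
    records = stream.split(record_delimiter)
    if records and records[-1] == "":
        records.pop()  # the stream ended with a record delimiter
    return records


def columnize(record, field_delimiter=",", final_delimiter="end"):
    """Columnization: split one record into columns.

    final_delimiter="end":   the last field runs up to the record delimiter.
    final_delimiter="comma": the last field is followed by one more field delimiter.
    """
    if final_delimiter == "comma":
        if not record.endswith(field_delimiter):
            raise ValueError("record is missing its final delimiter")
        record = record[: -len(field_delimiter)]
    return record.split(field_delimiter)


records = recordize("a,b,c\nd,e,f\n")
columns_end = columnize("a,b,c")
columns_comma = columnize("a,b,c,", final_delimiter="comma")
```

Both settings yield the same three columns; they differ only in what the file is expected to contain after the last field.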
![Page 130: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/130.jpg)
Sequential File Stage
To set the properties, use stage editor
– Page (general, input/output)
– Tabs (format, columns)
Sequential stage link rules
– One input link
– One output link (except for reject link definition)
– One reject link
  Will reject any records not matching meta data in the column definitions
![Page 131: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/131.jpg)
Job Design Using Sequential Stages
Stage categories
![Page 132: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/132.jpg)
General Tab – Sequential Source
Multiple output links
Show records
![Page 133: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/133.jpg)
Properties – Multiple Files
Click to add more files having the same meta data.
![Page 134: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/134.jpg)
Properties - Multiple Readers
Multiple readers option allows you to set number of readers
![Page 135: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/135.jpg)
Format Tab
File into records; record into columns
![Page 136: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/136.jpg)
Read Methods
![Page 137: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/137.jpg)
Reject Link
Reject mode = output
Source
– All records not matching the meta data (the column definitions)
Target
– All records that are rejected for any reason
Meta data: one column, datatype = raw
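The source-side reject behaviour can be sketched as below. This is an illustration under assumptions, not the stage's implementation: the column names are borrowed from the schema example elsewhere in this deck, `read_with_reject` is hypothetical, and rejected records are kept as untouched bytes to mirror the single raw column.

```python
def read_with_reject(raw_records, column_parsers, delimiter=","):
    """Split input into parsed rows (output link) and raw rejects (reject link)."""
    output, rejects = [], []
    for raw in raw_records:
        fields = raw.decode("utf-8", errors="replace").split(delimiter)
        if len(fields) != len(column_parsers):
            rejects.append(raw)  # wrong number of columns
            continue
        try:
            row = {name: parse(value) for (name, parse), value in zip(column_parsers, fields)}
        except ValueError:
            rejects.append(raw)  # a value does not fit its column type
            continue
        output.append(row)
    return output, rejects


columns = [("partno", int), ("description", str)]
good, bad = read_with_reject([b"101,bolt", b"abc,nut", b"102"], columns)
```

Because a rejected record failed to match the column definitions, the only safe way to pass it on is as one unparsed raw value.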
![Page 138: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/138.jpg)
File Set Stage
Can read or write file sets
Files suffixed by .fs
File set consists of:
1. Descriptor file – contains location of raw data files + meta data
2. Individual raw data files
Can be processed in parallel
![Page 139: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/139.jpg)
File Set Stage Example
Descriptor file
![Page 140: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/140.jpg)
File Set Usage
Why use a file set?
– 2G limit on some file systems
– Need to distribute data among nodes to prevent overruns
– If used in parallel, runs faster than a sequential file
![Page 141: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/141.jpg)
Lookup File Set Stage
Can create file sets
Usually used in conjunction with Lookup stages
![Page 142: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/142.jpg)
Lookup File Set > Properties
Key column specified
Key column dropped in descriptor file
![Page 143: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/143.jpg)
Data Set
Operating system (Framework) file
Suffixed by .ds
Referred to by a control file
Managed by Data Set Management utility from GUI (Manager, Designer, Director)
Represents persistent data
Key to good performance in set of linked jobs
![Page 144: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/144.jpg)
Persistent Datasets
Accessed from/to disk with DataSet Stage.
Two parts:
– Descriptor file: contains metadata, data location, but NOT the data itself
– Data file(s): contain the data; multiple Unix files (one per node), accessible in parallel
[Example: the descriptor input.ds holds the schema record ( partno: int32; description: string; ) and points to data files at node1:/local/disk1/… and node2:/local/disk2/…]
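The two-part layout (a descriptor that points at per-partition data files) can be mimicked in a few lines. This is a toy model only: real persistent datasets use the Framework's own internal format, whereas this sketch assumes JSON for both the descriptor and the data files, and `write_dataset` / `read_dataset` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_dataset(directory, name, schema, partitions):
    """Write one data file per partition plus a descriptor that holds no row data."""
    directory = Path(directory)
    data_files = []
    for index, rows in enumerate(partitions):
        data_path = directory / f"{name}.part{index}"
        data_path.write_text("\n".join(json.dumps(row) for row in rows), encoding="utf-8")
        data_files.append(str(data_path))
    descriptor = directory / f"{name}.ds"
    descriptor.write_text(json.dumps({"schema": schema, "data_files": data_files}), encoding="utf-8")
    return descriptor


def read_dataset(descriptor):
    """Follow the descriptor to the data files and return (schema, partitions)."""
    meta = json.loads(Path(descriptor).read_text(encoding="utf-8"))
    partitions = []
    for data_file in meta["data_files"]:
        text = Path(data_file).read_text(encoding="utf-8")
        partitions.append([json.loads(line) for line in text.splitlines() if line])
    return meta["schema"], partitions


with tempfile.TemporaryDirectory() as workdir:
    schema_in = {"partno": "int32", "description": "string"}
    descriptor = write_dataset(
        workdir, "input", schema_in, [[{"partno": 1, "description": "hex bolt"}], []]
    )
    descriptor_name = descriptor.name
    descriptor_text = descriptor.read_text(encoding="utf-8")
    schema_out, partitions_out = read_dataset(descriptor)
```

Deleting only the descriptor would orphan the data files, which is why a dataset should be removed with a tool that knows about all of its components.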
![Page 145: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/145.jpg)
Quiz!
• True or False? Everything that has been data-partitioned must be collected in the same job
![Page 146: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/146.jpg)
Data Set Stage
Is the data partitioned?
![Page 147: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/147.jpg)
Engine Data Translation
Occurs on import
– From sequential files or file sets
– From RDBMS
Occurs on export
– From datasets to file sets or sequential files
– From datasets to RDBMS
Engine most efficient when processing internally formatted records (i.e. data contained in datasets)
![Page 148: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/148.jpg)
Managing DataSets
GUI (Manager, Designer, Director): tools > data set management
Alternative methods
– Orchadmin
  Unix command line utility
  List records
  Remove data sets (will remove all components)
– Dsrecords
  Lists number of records in a dataset
![Page 149: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/149.jpg)
Data Set Management
Display data
Schema
![Page 150: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/150.jpg)
Data Set Management From Unix
Alternative method of managing file sets and data sets
– Dsrecords
  Gives record count
  – Unix command-line utility
  – $ dsrecords ds_name
    e.g. $ dsrecords myDS.ds
    156999 records
– Orchadmin
  Manages EE persistent data sets
  – Unix command-line utility
    e.g. $ orchadmin rm myDataSet.ds
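If the record count is needed inside a script, the dsrecords output can be parsed. The sketch below assumes the output looks like the slide's example ("156999 records"); that shape is inferred from the example only, and both helper functions are hypothetical.

```python
import re


def dsrecords_command(dataset_path):
    """Build the argument list for the dsrecords utility shown on the slide."""
    return ["dsrecords", dataset_path]


def parse_record_count(output):
    """Extract the count from output shaped like '156999 records'."""
    match = re.search(r"(\d+)\s+records", output)
    if match is None:
        raise ValueError(f"unrecognized dsrecords output: {output!r}")
    return int(match.group(1))


count = parse_record_count("156999 records")
command = dsrecords_command("myDS.ds")
```

On a machine where the utility is installed, the command list could be passed to `subprocess.run(command, capture_output=True, text=True)` and its standard output fed to `parse_record_count`.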
![Page 151: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/151.jpg)
Exercise
Complete exercises 2-1, 2-2, 2-3, and 2-4.
![Page 152: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/152.jpg)
Module 3
Standards and Techniques
![Page 153: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/153.jpg)
Objectives
Establish standard techniques for DSEE development
Will cover:
– Job documentation
– Naming conventions for jobs, links, and stages
– Iterative job design
– Useful stages for job development
– Using configuration files for development
– Using environment variables
– Job parameters
![Page 154: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/154.jpg)
Job Presentation
Document using the Annotation stage
![Page 155: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/155.jpg)
Job Properties Documentation
Description shows in DS Manager and MetaStage
Organize jobs into categories
![Page 156: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/156.jpg)
Naming conventions
Stages named after the
– Data they access
– Function they perform
– DO NOT leave defaulted stage names like Sequential_File_0
Links named for the data they carry
– DO NOT leave defaulted link names like DSLink3
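A naming review can be partly automated. The sketch below flags names that still look like defaults. The two patterns are assumptions inferred from the slide's examples (Sequential_File_0 and DSLink3), and the function name is hypothetical. A deliberately chosen name that ends in an underscore and a number would also be flagged, so treat the result as a list to review rather than a verdict.

```python
import re

# Assumed shapes of default names, inferred from the two slide examples:
#   stage: words joined by underscores, ending in _<number>  (Sequential_File_0)
#   link:  DSLink followed by a number                       (DSLink3)
DEFAULT_STAGE = re.compile(r"^[A-Za-z]+(?:_[A-Za-z]+)*_\d+$")
DEFAULT_LINK = re.compile(r"^DSLink\d+$")


def find_default_names(names):
    """Return the names that still match a default-looking pattern."""
    return [n for n in names if DEFAULT_STAGE.match(n) or DEFAULT_LINK.match(n)]


# Illustrative names only; a real check would read them from a job export.
job_names = ["Sequential_File_0", "DSLink3", "Customers_Src", "lnk_orders"]
print(find_default_names(job_names))
```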
![Page 157: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/157.jpg)
Stage and Link Names
Stages and links renamed to the data they handle
![Page 158: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/158.jpg)
Create Reusable Job Components
Use Enterprise Edition shared containers when feasible
Container
![Page 159: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/159.jpg)
Use Iterative Job Design
Use copy or peek stage as stub
Test job in phases – small first, then increasing in complexity
Use Peek stage to examine records
![Page 160: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/160.jpg)
Copy or Peek Stage Stub
Copy stage
![Page 161: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/161.jpg)
Transformer Stage Techniques
Suggestions
– Always include a reject link.
– Always test for a null value before using a column in a function.
– Try to use RCP and only map columns that have a derivation other than a copy. More on RCP later.
– Be aware of column and stage variable data types. Users often do not pay attention to the stage variable type.
– Avoid type conversions. Try to maintain the data type as imported.
![Page 162: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/162.jpg)
The Copy Stage
With 1 link in, 1 link out, the Copy stage is the ultimate "no-op" (placeholder). The following can be inserted on its links:
– Input link (Partitioning): partitioners, Sort, Remove Duplicates
– Output link (Mapping page): Rename, Drop column
It can sometimes replace the Transformer:
– Rename
– Drop
– Implicit type conversions
– Link constraint: break up schema
![Page 163: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/163.jpg)
Developing Jobs
1. Keep it simple
• Jobs with many stages are hard to debug and maintain.
2. Start small and build to the final solution
• Use View Data, Copy, and Peek.
• Start from the source and work out.
• Develop with a 1-node configuration file.
3. Solve the business problem before the performance problem.
• Don't worry too much about partitioning until the sequential flow works as expected.
4. If you have to write to disk, use a persistent data set.
![Page 164: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/164.jpg)
Final Result
![Page 165: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/165.jpg)
Good Things to Have in each Job
Use job parameters
Some helpful environment variables to add to job parameters
– $APT_DUMP_SCORE
    Reports the job score (operators, processes, and data sets) to the message log
– $APT_CONFIG_FILE
    Establishes runtime parameters for the EE engine, e.g. degree of parallelism
![Page 166: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/166.jpg)
Setting Job Parameters
Click to add environment variables
![Page 167: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/167.jpg)
DUMP SCORE Output
Double-click
Mapping: node --> partition
Setting APT_DUMP_SCORE yields:
Partitioner and Collector
![Page 168: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/168.jpg)
Use Multiple Configuration Files
Make a set for 1X, 2X, …
Use different ones for test versus production
Include as a parameter in each job
![Page 169: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/169.jpg)
Exercise
Complete exercise 3-1
![Page 170: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/170.jpg)
Module 4
DBMS Access
![Page 171: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/171.jpg)
Objectives
Understand how DSEE reads and writes records to an RDBMS
Understand how to handle nulls on DBMS lookup
Utilize this knowledge to:– Read and write database tables– Use database tables to lookup data– Use null handling options to clean data
![Page 172: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/172.jpg)
Parallel Database Connectivity
[Figure: traditional client-server versus Enterprise Edition connectivity to a parallel RDBMS. Diagram labels: Client, Sort, Load, Parallel RDBMS.]
Traditional client-server
– Only RDBMS is running in parallel
– Each application has only one connection
– Suitable only for small data volumes
Enterprise Edition
– Parallel server runs APPLICATIONS
– Application has parallel connections to RDBMS
– Suitable for large data volumes
– Higher levels of integration possible
![Page 173: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/173.jpg)
RDBMS Access: Supported Databases
Enterprise Edition provides high performance / scalable interfaces for:
DB2
Informix
Oracle
Teradata
![Page 174: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/174.jpg)
RDBMS Access
Automatically convert RDBMS table layouts to/from Enterprise Edition Table Definitions
RDBMS nulls converted to/from nullable field values
Support for standard SQL syntax for specifying:
– field list for SELECT statement
– filter for WHERE clause
Can write an explicit SQL query to access RDBMS
EE supplies additional information in the SQL query
![Page 175: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/175.jpg)
RDBMS Stages
DB2/UDB Enterprise
Informix Enterprise
Oracle Enterprise
Teradata Enterprise
![Page 176: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/176.jpg)
RDBMS Usage
As a source
– Extract data from table (stream link)
    Extract as table, generated SQL, or user-defined SQL
    User-defined can perform joins, access views
– Lookup (reference link)
    Normal lookup is memory-based (all table data read into memory)
    Can perform one lookup at a time in DBMS (sparse option)
    Continue/drop/fail options
As a target
– Inserts
– Upserts (inserts and updates)
– Loader
![Page 177: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/177.jpg)
RDBMS Source – Stream Link
Stream link
![Page 178: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/178.jpg)
DBMS Source - User-defined SQL
Columns in SQL statement must match the meta data in columns tab
![Page 179: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/179.jpg)
Exercise
User-defined SQL
– Exercise 4-1
![Page 180: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/180.jpg)
DBMS Source – Reference Link
Reject link
![Page 181: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/181.jpg)
Lookup Reject Link
“Output” option automatically creates the reject link
![Page 182: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/182.jpg)
Null Handling
Must handle null condition if lookup record is not found and “continue” option is chosen
Can be done in a transformer stage
![Page 183: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/183.jpg)
Lookup Stage Mapping
Link name
![Page 184: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/184.jpg)
Lookup Stage Properties
Reference link
Must have same column name in input and reference links.
You will get the results of the lookup in the output column.
![Page 185: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/185.jpg)
DBMS as a Target
![Page 186: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/186.jpg)
DBMS As Target
Write methods
– Delete
– Load
– Upsert
– Write (DB2)
Write mode for load method
– Truncate
– Create
– Replace
– Append
![Page 187: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/187.jpg)
Target Properties
Upsert mode determines options
Generated code can be copied
![Page 188: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/188.jpg)
Checking for Nulls
Use Transformer stage to test for fields with null values (Use IsNull functions)
In Transformer, can reject or load default value
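The two choices named above, reject the row or load a default value, can be sketched outside DataStage as follows. This is an illustration in plain Python rather than a Transformer derivation; the function name and sample columns are hypothetical, and Python's None stands in for a database null.

```python
def handle_nulls(rows, column, default=None):
    """Split rows into (loaded, rejected) based on a null test on one column.

    If a default is supplied, null values are replaced and the row is loaded;
    otherwise rows with a null in the column go to the reject list.
    """
    loaded, rejected = [], []
    for row in rows:
        if row.get(column) is None:  # plays the role of an IsNull test
            if default is None:
                rejected.append(row)
            else:
                loaded.append({**row, column: default})
        else:
            loaded.append(row)
    return loaded, rejected


rows = [{"custid": 1, "region": "EU"}, {"custid": 2, "region": None}]
print(handle_nulls(rows, "region"))             # second row is rejected
print(handle_nulls(rows, "region", "UNKNOWN"))  # second row gets a default
```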
![Page 189: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/189.jpg)
Exercise
Complete exercise 4-2
![Page 190: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/190.jpg)
Module 5
Platform Architecture
![Page 191: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/191.jpg)
Objectives
Understand how Enterprise Edition Framework processes data
You will be able to:
– Read and understand OSH
– Perform troubleshooting
![Page 192: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/192.jpg)
Concepts
The Enterprise Edition Platform
– Script language: OSH (generated by DataStage Parallel Canvas, and run by DataStage Director)
– Communication: conductor, section leaders, players
– Configuration files (only one active at a time, describes H/W)
– Meta data: schemas/tables
– Schema propagation: RCP
– EE extensibility: Buildop, Wrapper
– Datasets (data in Framework's internal representation)
![Page 193: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/193.jpg)
DS-EE Stage Elements
EE stages involve a series of processing steps
[Figure: an EE Stage drawn as Input Interface, Partitioner, Business Logic, and Output Interface.]
Input data set schema: prov_num:int16; member_num:int8; custid:int32;
Output data set schema: prov_num:int16; member_num:int8; custid:int32;
EE Stage
• Piece of application logic running against individual records
• Parallel or sequential
![Page 194: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/194.jpg)
DSEE Stage Execution
• EE delivers parallelism in two ways
– Pipeline
– Partition
• Block buffering between components
– Eliminates need for program load balancing
– Maintains orderly data flow
Dual parallelism eliminates bottlenecks!
[Figure: producer and consumer stages connected by pipeline and partition parallelism.]
![Page 195: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/195.jpg)
Stages Control Partition Parallelism
Execution mode (sequential/parallel) is controlled by the Stage
– Default = parallel for most Ascential-supplied Stages
– Developer can override default mode
– Parallel Stage inserts the default partitioner (Auto) on its input links
– Sequential Stage inserts the default collector (Auto) on its input links
– Developer can override the default
    execution mode (parallel/sequential) on the Stage > Advanced tab
    choice of partitioner/collector on the Input > Partitioning tab
![Page 196: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/196.jpg)
How Parallel Is It?
Degree of parallelism is determined by the configuration file
– Total number of logical nodes in the default pool, or a subset if using "constraints".
    Constraints are assigned to specific pools as defined in the configuration file and can be referenced in the stage
![Page 197: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/197.jpg)
OSH
DataStage EE GUI generates OSH scripts
– Ability to view OSH turned on in Administrator
– OSH can be viewed in Designer using job properties
The Framework executes OSH
What is OSH?
– Orchestrate shell
– Has a UNIX command-line interface
![Page 198: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/198.jpg)
OSH Script
An osh script is a quoted string which specifies:
– The operators and connections of a single Orchestrate step
– In its simplest form, it is:
    osh "op < in.ds > out.ds"
Where:
– op is an Orchestrate operator
– in.ds is the input data set
– out.ds is the output data set
![Page 199: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/199.jpg)
OSH Operators
An OSH operator is an instance of a C++ class inheriting from APT_Operator
Developers can create new operators
Examples of existing operators:
– Import
– Export
– RemoveDups
![Page 200: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/200.jpg)
Enable Visible OSH in Administrator
Will be enabled for all projects
![Page 201: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/201.jpg)
View OSH in Designer
Schema
Operator
![Page 202: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/202.jpg)
OSH Practice
Exercise 5-1 – Instructor demo (optional)
![Page 203: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/203.jpg)
Elements of a Framework Program
• Operators
• Datasets: set of rows processed by Framework
– Orchestrate data sets:
    persistent (terminal) *.ds, and
    virtual (internal) *.v.
– Also: flat "file sets" *.fs
• Schema: data description (metadata) for datasets and links.
![Page 204: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/204.jpg)
Datasets
• Consist of partitioned data and schema
• Can be persistent (*.ds) or virtual (*.v, link)
• Overcome 2 GB file limit
– Multiple files per partition
– Each file up to 2 GBytes (or larger)
[Figure: what you program (GUI), what gets generated (OSH), and what gets processed (Operator A running on Node 1 to Node 4, each writing data files of x.ds).]
$ osh "operator_A > x.ds"
![Page 205: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/205.jpg)
Computing Architectures: Definition
Uniprocessor (dedicated disk)
• PC
• Workstation
• Single processor server
SMP System (Symmetric Multiprocessor; shared memory, shared disk)
• IBM, Sun, HP, Compaq
• 2 to 64 processors
• Majority of installations
Clusters and MPP Systems (shared nothing)
• 2 to hundreds of processors
• MPP: IBM and NCR Teradata
• Each node is a uniprocessor or SMP
[Figure: CPU, memory, and disk layout for each architecture: one CPU with its own memory and disk; several CPUs sharing memory and disk; several nodes each with their own CPU, memory, and disk.]
![Page 206: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/206.jpg)
Job Execution: Orchestrate
[Figure: a Conductor node (C) and two Processing nodes, each with a Section Leader (SL) and Player processes (P).]
• Conductor: initial DS/EE process
– Step composer
– Creates Section Leader processes (one/node)
– Consolidates messages, outputs them
– Manages orderly shutdown
• Section Leader
– Forks Player processes (one/Stage)
– Manages up/down communication
• Players
– The actual processes associated with Stages
– Combined players: one process only
– Send stderr to SL
– Establish connections to other players for data flow
– Clean up upon completion
• Communication:
– SMP: shared memory
– MPP: TCP
![Page 207: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/207.jpg)
Working with Configuration Files
You can easily switch between config files:
– '1-node' file: for sequential execution, lighter reports; handy for testing
– 'MedN-nodes' file: aims at a mix of pipeline and data-partitioned parallelism
– 'BigN-nodes' file: aims at full data-partitioned parallelism
Only one file is active while a step is running
– The Framework queries (first) the environment variable $APT_CONFIG_FILE
The number of nodes declared in the config file need not match the number of CPUs
– The same configuration file can be used on development and target machines
![Page 208: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/208.jpg)
Scheduling: Nodes, Processes, and CPUs
DS/EE does not:
– know how many CPUs are available
– schedule
Who does what?
– DS/EE creates (Nodes * Ops) Unix processes
– The O/S schedules these processes on the CPUs
Definitions:
– Nodes = number of logical nodes declared in the config file
– Ops = number of operators (approx. number of blue boxes in V.O.)
– Processes = number of Unix processes
– CPUs = number of available CPUs
Who knows what?
              Nodes   Ops   Processes     CPUs
User          Y       N
Orchestrate   Y       Y     Nodes * Ops   N
O/S                         Nodes * Ops   Y
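The Nodes * Ops rule is simple arithmetic, and a short worked example shows how quickly the process count grows. In the sketch below, the first function is the slide's formula. The second adds one conductor and one section leader per node, following the Job Execution slide; treat that total as an assumption and an upper bound, since combined players reduce the real count. Both function names are hypothetical.

```python
def player_process_count(logical_nodes: int, operators: int) -> int:
    """Slide formula: one player process per operator per logical node."""
    if logical_nodes < 1 or operators < 0:
        raise ValueError("need at least one node and a non-negative operator count")
    return logical_nodes * operators


def total_process_estimate(logical_nodes: int, operators: int) -> int:
    """Assumed upper bound: 1 conductor + 1 section leader per node + players."""
    return 1 + logical_nodes + player_process_count(logical_nodes, operators)


# A job with 5 operators run with a 4-node configuration file.
print(player_process_count(4, 5))    # 20
print(total_process_estimate(4, 5))  # 25
```

The same job under a 1-node file needs only 5 player processes, which is one reason the 1-node file is handy for development.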
![Page 209: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/209.jpg)
Configuring DSEE – Node Pools
{
  node "n1" {
    fastname "s1"
    pool "" "n1" "s1" "app2" "sort"
    resource disk "/orch/n1/d1" {}
    resource disk "/orch/n1/d2" {}
    resource scratchdisk "/temp" {"sort"}
  }
  node "n2" {
    fastname "s2"
    pool "" "n2" "s2" "app1"
    resource disk "/orch/n2/d1" {}
    resource disk "/orch/n2/d2" {}
    resource scratchdisk "/temp" {}
  }
  node "n3" {
    fastname "s3"
    pool "" "n3" "s3" "app1"
    resource disk "/orch/n3/d1" {}
    resource scratchdisk "/temp" {}
  }
  node "n4" {
    fastname "s4"
    pool "" "n4" "s4" "app1"
    resource disk "/orch/n4/d1" {}
    resource scratchdisk "/temp" {}
  }
}
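To see which nodes a node-pool constraint would select, it helps to invert the configuration: list the nodes that belong to each pool. The sketch below does this with a regular expression over the slide's configuration text. It relies on the layout and keywords exactly as they appear on the slide (node, fastname, pool) and is not a general parser for configuration files; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Configuration text from the slide (closing brace added for balance).
CONFIG = '''
{
  node "n1" { fastname "s1" pool "" "n1" "s1" "app2" "sort"
    resource disk "/orch/n1/d1" {} resource disk "/orch/n1/d2" {}
    resource scratchdisk "/temp" {"sort"} }
  node "n2" { fastname "s2" pool "" "n2" "s2" "app1"
    resource disk "/orch/n2/d1" {} resource disk "/orch/n2/d2" {}
    resource scratchdisk "/temp" {} }
  node "n3" { fastname "s3" pool "" "n3" "s3" "app1"
    resource disk "/orch/n3/d1" {} resource scratchdisk "/temp" {} }
  node "n4" { fastname "s4" pool "" "n4" "s4" "app1"
    resource disk "/orch/n4/d1" {} resource scratchdisk "/temp" {} }
}
'''

# node "<name>" { fastname "<host>" pool "<p1>" "<p2>" ...
NODE_RE = re.compile(
    r'node\s+"([^"]+)"\s*\{\s*fastname\s+"([^"]+)"\s*pool((?:\s+"[^"]*")+)'
)


def nodes_by_pool(config_text):
    """Map each node pool name to the list of nodes declared in it."""
    pools = defaultdict(list)
    for node, _fastname, pool_text in NODE_RE.findall(config_text):
        for pool in re.findall(r'"([^"]*)"', pool_text):
            pools[pool].append(node)
    return dict(pools)


pools = nodes_by_pool(CONFIG)
print(pools["app1"])  # nodes a stage constrained to pool "app1" could use
print(pools[""])      # the default pool
```

In this configuration a stage constrained to "app1" would run on n2, n3, and n4, while one constrained to "sort" would run on n1 only.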
![Page 210: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/210.jpg)
Configuring DSEE – Disk Pools
{
  node "n1" {
    fastname "s1"
    pool "" "n1" "s1" "app2" "sort"
    resource disk "/orch/n1/d1" {}
    resource disk "/orch/n1/d2" {"bigdata"}
    resource scratchdisk "/temp" {"sort"}
  }
  node "n2" {
    fastname "s2"
    pool "" "n2" "s2" "app1"
    resource disk "/orch/n2/d1" {}
    resource disk "/orch/n2/d2" {"bigdata"}
    resource scratchdisk "/temp" {}
  }
  node "n3" {
    fastname "s3"
    pool "" "n3" "s3" "app1"
    resource disk "/orch/n3/d1" {}
    resource scratchdisk "/temp" {}
  }
  node "n4" {
    fastname "s4"
    pool "" "n4" "s4" "app1"
    resource disk "/orch/n4/d1" {}
    resource scratchdisk "/temp" {}
  }
}
![Page 211: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/211.jpg)
Re-Partitioning
Parallel-to-parallel flow may incur reshuffling: records may jump between nodes
[Figure: a partitioner redistributing records between node 1 and node 2.]
![Page 212: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/212.jpg)
Partitioning Methods
Auto
Hash
Entire
Range
Range Map
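Two of the methods listed above are easy to imitate on a small list of rows. The sketch below shows the idea behind Hash (rows with the same key value always land in the same partition) and Entire (every partition receives the full data set). It is an illustration only: the framework's actual hash function is not given in this material, so zlib.crc32 is used as a stand-in, and the function names and sample rows are hypothetical.

```python
import zlib


def hash_partition(rows, key, partitions):
    """Assign each row to a partition from a hash of its key value."""
    parts = [[] for _ in range(partitions)]
    for row in rows:
        # crc32 is a stand-in hash chosen because it is deterministic.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % partitions
        parts[index].append(row)
    return parts


def entire_partition(rows, partitions):
    """Give every partition a complete copy of the rows."""
    return [list(rows) for _ in range(partitions)]


rows = [{"custid": c, "amt": a} for c, a in [(1, 10), (2, 5), (1, 7), (3, 2), (2, 9)]]
for number, part in enumerate(hash_partition(rows, "custid", 4)):
    print(number, part)
```

Because both rows for custid 1 end up together, a key-based operation such as removing duplicates can work on each partition independently. Changing the key or the number of partitions between two parallel stages is what forces records to move between nodes.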
![Page 213: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/213.jpg)
Collectors
• Collectors combine partitions of a dataset into a single input stream to a sequential Stage
– Collectors do NOT synchronize data
[Figure: data partitions flowing through a collector into a sequential Stage.]
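A practical consequence of a collector combining partitions is that the order of the single output stream depends on how the partitions are read. The sketch below contrasts two collecting strategies on already-sorted partitions: reading one record from each partition in turn, and merging by key. The strategy names are descriptive labels chosen for this sketch, and the code is plain Python rather than framework behavior.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # marks exhausted partitions during round-robin reads


def round_robin_collect(partitions):
    """Take one record from each partition in turn until all are empty."""
    collected = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        collected.extend(item for item in group if item is not _MISSING)
    return collected


def sort_merge_collect(partitions, key=lambda record: record):
    """Merge partitions that are each already sorted on the key."""
    return list(heapq.merge(*partitions, key=key))


partitions = [[1, 5, 9], [2, 3], [4]]
print(round_robin_collect(partitions))  # [1, 2, 4, 5, 3, 9]
print(sort_merge_collect(partitions))   # [1, 2, 3, 4, 5, 9]
```

Each partition is sorted, yet the round-robin result is not. If a sequential stage needs globally ordered input, the collecting method has to preserve that order explicitly.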
![Page 214: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/214.jpg)
Partitioning and Repartitioning Are Visible On Job Design
![Page 215: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/215.jpg)
Partitioning and Collecting Icons
Partitioner Collector
![Page 216: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/216.jpg)
Setting a Node Constraint in the GUI
![Page 217: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/217.jpg)
Reading Messages in Director
Set APT_DUMP_SCORE to true
Can be specified as job parameter
Messages sent to Director log
If set, parallel job will produce a report showing the operators, processes, and datasets in the running job
![Page 218: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/218.jpg)
Messages With APT_DUMP_SCORE = True
![Page 219: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/219.jpg)
Exercise
Complete exercise 5-2
![Page 220: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/220.jpg)
Module 6
Transforming Data
![Page 221: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/221.jpg)
Module Objectives
Understand ways DataStage allows you to transform data
Use this understanding to:
– Create column derivations using user-defined code or system functions
– Filter records based on business criteria
– Control data flow based on data conditions
![Page 222: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/222.jpg)
Transformed Data
Transformed data is:
– An outgoing column whose derivation may or may not include incoming fields or parts of incoming fields
– May be composed of system variables
Frequently uses functions performed on something (e.g., incoming columns)
– Functions are divided into categories, e.g.:
Date and time
Mathematical
Logical
Null handling
More
![Page 223: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/223.jpg)
Stages Review
Stages that can transform data
– Transformer
Parallel
Basic (from Parallel palette)
– Aggregator (discussed in later module)
Sample stages that do not transform data
– Sequential
– FileSet
– DataSet
– DBMS
![Page 224: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/224.jpg)
Transformer Stage Functions
Control data flow
Create derivations
![Page 225: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/225.jpg)
Flow Control
Separate records flow down links based on data condition – specified in Transformer stage constraints
Transformer stage can filter records
Other stages can filter records but do not exhibit advanced flow control
– Sequential can send bad records down reject link
– Lookup can reject records based on lookup failure
– Filter can select records based on data value
![Page 226: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/226.jpg)
Rejecting Data
Reject option on sequential stage
– Data does not agree with meta data
– Output consists of one column with binary data type
Reject links (from Lookup stage) result from the drop option of the property “If Not Found”
– Lookup “failed”
– All columns on reject link (no column mapping option)
Reject constraints are controlled from the constraint editor of the transformer
– Can control column mapping
– Use the “Other/Log” checkbox
![Page 227: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/227.jpg)
Rejecting Data Example
“If Not Found” property
Constraint – Other/log option
Property Reject Mode = Output
![Page 228: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/228.jpg)
Transformer Stage Properties
![Page 229: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/229.jpg)
Transformer Stage Variables
First of transformer stage entities to execute
Execute in order from top to bottom
– Can write a program by using one stage variable to point to the results of a previous stage variable
Multi-purpose
– Counters
– Hold values for previous rows to make comparisons
– Hold derivations to be used in multiple field derivations
– Can be used to control execution of constraints
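The counter and previous-row uses listed above can be sketched outside DataStage. The following Python sketch is only an illustration, not Transformer code: the function name `run_transformer`, the column `cust`, and the output fields are all hypothetical. It assumes the input is already sorted by the key, and it mimics stage variables being evaluated top to bottom for every row before any column derivation uses them.

```python
def run_transformer(rows):
    """Simulate stage variables: key-change detection plus a per-group counter."""
    prev_key = None    # stage variable holding the previous row's key
    group_count = 0    # stage variable used as a counter
    out = []
    for row in rows:
        # Stage variables are evaluated first, top to bottom; a later one
        # may read the value computed by an earlier one.
        is_new_group = row["cust"] != prev_key
        group_count = 1 if is_new_group else group_count + 1
        prev_key = row["cust"]
        # Column derivations then use the stage variable results.
        out.append({"cust": row["cust"], "seq": group_count, "first": is_new_group})
    return out


# Hypothetical input, already sorted by the key column.
rows = [{"cust": "A"}, {"cust": "A"}, {"cust": "B"}]
result = run_transformer(rows)
```

Because the comparison relies on the previous row, unsorted input would reset the counter every time the key alternates, which is why a sort ahead of the Transformer is sometimes required.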
![Page 230: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/230.jpg)
Stage Variables
Show/Hide button
![Page 231: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/231.jpg)
Transforming Data
Derivations
– Using expressions
– Using functions
Date/time
Transformer Stage Issues
– Sometimes require sorting before the transformer stage, e.g., when using a stage variable as an accumulator and needing to break on change of column value
Checking for nulls
![Page 232: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/232.jpg)
Checking for Nulls
Nulls can be introduced into the data flow by failed lookups, depending on how you choose to handle that condition
Can be handled in constraints, derivations, stage variables, or a combination of these
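As a rough illustration of handling a null in a constraint and in a derivation, the Python sketch below uses `None` to stand in for a null left behind by a failed lookup. The helper `null_to_value` and the column names are hypothetical and are not DataStage functions.

```python
def null_to_value(value, default):
    """Return `default` when `value` is None (standing in for a null)."""
    return default if value is None else value


# Hypothetical row whose "region" column is null after a failed lookup.
row = {"cust_id": 42, "region": None}

# Handling in a constraint: only pass rows where the column is not null.
constraint_passes = row["region"] is not None

# Handling in a derivation: substitute a default for the null.
derived_region = null_to_value(row["region"], "UNKNOWN")
```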
![Page 233: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/233.jpg)
Transformer - Handling Rejects
Constraint Rejects
– All expressions are false and reject row is checked
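The reject behavior described above can be sketched as follows. This is a simplified Python model, not Transformer code: the function `route`, the link names, and the sample columns are hypothetical. Each row is tested against every output link's constraint, and a row goes to the reject link only when all constraints are false.

```python
def route(rows, constraints):
    """Send each row down every link whose constraint is true; otherwise reject it."""
    links = {name: [] for name in constraints}
    links["reject"] = []
    for row in rows:
        matched = False
        for name, predicate in constraints.items():
            if predicate(row):
                links[name].append(row)
                matched = True
        if not matched:
            # All constraint expressions were false for this row.
            links["reject"].append(row)
    return links


# Hypothetical constraints on two output links.
constraints = {
    "domestic": lambda r: r["country"] == "US",
    "large": lambda r: r["amount"] >= 1000,
}
rows = [
    {"country": "US", "amount": 50},
    {"country": "FR", "amount": 2000},
    {"country": "US", "amount": 5000},
    {"country": "FR", "amount": 10},
]
links = route(rows, constraints)
```

Note that the third row satisfies both constraints and therefore appears on both links in this model.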
![Page 234: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/234.jpg)
Transformer: Execution Order
• Derivations in stage variables are executed first
• Constraints are executed before derivations
• Column derivations in earlier links are executed before later links
• Derivations in higher columns are executed before lower columns
![Page 235: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/235.jpg)
Parallel Palette - Two Transformers
All > Processing >
Transformer
Is the non-Universe transformer
Has a specific set of functions
No DS routines available
Parallel > Processing
Basic Transformer
Makes server style transforms available on the parallel palette
Can use DS routines
• Program in Basic for both transformers
![Page 236: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/236.jpg)
Transformer Functions From Derivation Editor
Date & Time
Logical
Null Handling
Number
String
Type Conversion
![Page 237: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/237.jpg)
Exercise
Complete exercises 6-1, 6-2, and 6-3
![Page 238: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/238.jpg)
Module 7
Sorting Data
![Page 239: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/239.jpg)
Objectives
Understand DataStage EE sorting options
Use this understanding to create sorted list of data to enable functionality within a transformer stage
![Page 240: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/240.jpg)
Sorting Data
Important because
– Some stages require sorted input
– Some stages may run faster, e.g., Aggregator
Can be performed
– As an option within stages (use Input > Partitioning tab and set partitioning to anything other than Auto)
– As a separate stage (more complex sorts)
![Page 241: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/241.jpg)
Sorting Alternatives
• Alternative representation of same flow:
![Page 242: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/242.jpg)
Sort Option on Stage Link
![Page 243: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/243.jpg)
Sort Stage
![Page 244: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/244.jpg)
Sort Utility
DataStage – the default
UNIX
![Page 245: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/245.jpg)
Sort Stage - Outputs
Specifies how the output is derived
![Page 246: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/246.jpg)
Sort Specification Options
Input Link Property
– Limited functionality
– Max memory/partition is 20 MB, then spills to scratch
Sort Stage
– Tunable to use more memory before spilling to scratch
Note: Spread I/O by adding more scratch file systems to each node of the APT_CONFIG_FILE
![Page 247: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/247.jpg)
Removing Duplicates
Can be done by Sort stage
– Use unique option
OR
Remove Duplicates stage
– Has more sophisticated ways to remove duplicates
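A minimal Python sketch of the idea follows, assuming that one of the "more sophisticated" choices is whether to keep the first or the last row of each group of duplicates. The function `remove_duplicates` and its `retain` parameter are hypothetical names used only for illustration.

```python
def remove_duplicates(rows, key, retain="first"):
    """Sort rows by `key`, then keep one row per key value."""
    ordered = sorted(rows, key=lambda r: r[key])  # Python's sort is stable
    out = []
    for row in ordered:
        if out and out[-1][key] == row[key]:
            # Duplicate of the row already kept for this key.
            if retain == "last":
                out[-1] = row
        else:
            out.append(row)
    return out


rows = [{"id": 2, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
keep_first = remove_duplicates(rows, "id", retain="first")
keep_last = remove_duplicates(rows, "id", retain="last")
```

Duplicates only end up adjacent because the data is sorted first, which is why duplicate removal is discussed together with sorting.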
![Page 248: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/248.jpg)
Exercise
Complete exercise 7-1
![Page 249: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/249.jpg)
Module 8
Combining Data
![Page 250: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/250.jpg)
Objectives
Understand how DataStage can combine data using the Join, Lookup, Merge, and Aggregator stages
Use this understanding to create jobs that will
– Combine data from separate input streams
– Aggregate data to form summary totals
![Page 251: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/251.jpg)
Combining Data
There are two ways to combine data:
– Horizontally: Several input links; one output link (+ optional rejects) made of columns from different input links. E.g.,
Joins
Lookup
Merge
– Vertically: One input link, one output link with column combining values from all input rows. E.g.,
Aggregator
![Page 252: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/252.jpg)
Join, Lookup & Merge Stages
These "three Stages" combine two or more input links according to values of user-designated "key" column(s).
They differ mainly in:
– Memory usage
– Treatment of rows with unmatched key values
– Input requirements (sorted, de-duplicated)
![Page 253: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/253.jpg)
Not all Links are Created Equal
• Enterprise Edition distinguishes between:
- The Primary Input (Framework port 0)
- Secondary - in some cases "Reference" (other ports)
• Naming convention:

| | Joins | Lookup | Merge |
|---|---|---|---|
| Primary Input: port 0 | Left | Source | Master |
| Secondary Input(s): ports 1,… | Right | LU Table(s) | Update(s) |

Tip: Check "Input Ordering" tab to make sure intended Primary is listed first
![Page 254: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/254.jpg)
Join Stage Editor
One of four variants:
– Inner
– Left Outer
– Right Outer
– Full Outer
Several key columns allowed
Link Order immaterial for Inner and Full Outer Joins (but VERY important for Left/Right Outer and Lookup and Merge)
![Page 255: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/255.jpg)
1. The Join Stage
Four types:
• Inner
• Left Outer
• Right Outer
• Full Outer
2 sorted input links, 1 output link
– "left outer" on primary input, "right outer" on secondary input
– Pre-sort makes joins "lightweight": few rows need to be in RAM
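To show why pre-sorted inputs keep a join lightweight, here is a Python sketch of a join over two key-sorted inputs. It is an illustration under stated assumptions, not the Join stage's implementation: the names `sorted_join` and `next_group` are hypothetical, key values are assumed never to be `None`, and unmatched rows simply lack the other side's columns instead of carrying nulls. Only the rows for the current key value from each side are held at once.

```python
from itertools import groupby


def next_group(groups):
    """Return (key, rows) for the next key group, or (None, []) when exhausted."""
    for key_value, rows in groups:
        return key_value, list(rows)
    return None, []


def sorted_join(left, right, key, how="inner"):
    """Join two lists of dicts that are already sorted by `key`."""
    lgroups = groupby(left, key=lambda r: r[key])
    rgroups = groupby(right, key=lambda r: r[key])
    out = []
    lk, lrows = next_group(lgroups)
    rk, rrows = next_group(rgroups)
    while lk is not None or rk is not None:
        if rk is None or (lk is not None and lk < rk):
            # Left key has no match on the right.
            if how in ("left", "full"):
                out.extend(dict(l) for l in lrows)
            lk, lrows = next_group(lgroups)
        elif lk is None or rk < lk:
            # Right key has no match on the left.
            if how in ("right", "full"):
                out.extend(dict(r) for r in rrows)
            rk, rrows = next_group(rgroups)
        else:
            # Keys match: duplicates on both sides give a cross product.
            out.extend({**l, **r} for l in lrows for r in rrows)
            lk, lrows = next_group(lgroups)
            rk, rrows = next_group(rgroups)
    return out


left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]
inner = sorted_join(left, right, "k", how="inner")
left_outer = sorted_join(left, right, "k", how="left")
full_outer = sorted_join(left, right, "k", how="full")
```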
![Page 256: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/256.jpg)
2. The Lookup Stage
Combines:
– one source link with
– one or more duplicate-free table links
– no pre-sort necessary
– allows multiple keys and LUTs
– flexible exception handling for source input rows with no match
Diagram: Lookup stage with input ports 0 (source input) and 1, 2 (one or more tables, LUTs), and output ports 0 (Output) and 1 (Reject)
![Page 257: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/257.jpg)
The Lookup Stage
Lookup Tables should be small enough to fit into physical memory (otherwise, performance hit due to paging)
On an MPP you should partition the lookup tables using entire partitioning method, or partition them the same way you partition the source link
On an SMP, no physical duplication of a Lookup Table occurs
![Page 258: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/258.jpg)
The Lookup Stage
Lookup File Set
– Like a persistent data set, except that it contains metadata about the key
– Useful for staging lookup tables
RDBMS LOOKUP
– NORMAL: loads to an in-memory hash table first
– SPARSE: issues a select for each row; might become a performance bottleneck
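The normal (in-memory) lookup can be sketched in Python as a dictionary built once from a duplicate-free table and probed for each source row. This is a simplified model rather than the stage's implementation; the function `lookup` and its `if_not_found` parameter are hypothetical names that mirror the fail, continue, drop, and reject choices for unmatched source rows.

```python
def lookup(source, table, key, if_not_found="fail"):
    """Probe an in-memory table for each source row; handle misses per option."""
    lut = {row[key]: row for row in table}  # built once, held in memory
    out, rejects = [], []
    for row in source:
        match = lut.get(row[key])
        if match is not None:
            out.append({**row, **match})
        elif if_not_found == "continue":
            out.append(dict(row))      # pass the row on without lookup columns
        elif if_not_found == "drop":
            continue                   # silently discard the row
        elif if_not_found == "reject":
            rejects.append(row)        # send the row down the reject link
        else:
            raise LookupError(f"no match for key {row[key]!r}")
    return out, rejects


source = [{"id": 1}, {"id": 9}]
table = [{"id": 1, "name": "a"}]
out_reject, rejects = lookup(source, table, "id", if_not_found="reject")
out_continue, _ = lookup(source, table, "id", if_not_found="continue")
out_drop, rejects_drop = lookup(source, table, "id", if_not_found="drop")
```

A sparse lookup would instead issue one database query per source row, which trades the memory held by the table for a round trip on every row.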
![Page 259: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/259.jpg)
3. The Merge Stage
Combines
– one sorted, duplicate-free master (primary) link with
– one or more sorted update (secondary) links.
– Pre-sort makes merge "lightweight": few rows need to be in RAM (as with joins, but opposite to lookup).
Follows the Master-Update model:
– A master row and one or more update rows are merged if they have the same value in user-specified key column(s).
– If a non-key column occurs in several inputs, the lowest input port number prevails (e.g., master over update; update values are ignored).
– Unmatched ("Bad") master rows can be either kept or dropped.
– Unmatched ("Bad") update rows in an input link can be captured in a "reject" link.
– Matched update rows are consumed.
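The Master-Update rules above can be sketched in Python for the simplest case of one duplicate-free update link. This is an illustration only: the function `merge` and its parameters are hypothetical, and it uses a dictionary for brevity instead of the sorted streaming the stage relies on.

```python
def merge(master, update, key, keep_unmatched_master=True):
    """Merge one update link into a master link; return (output, rejects)."""
    pending = {u[key]: u for u in update}  # assumes no duplicate update keys
    out = []
    for m in master:
        u = pending.pop(m[key], None)      # a matched update row is consumed
        if u is not None:
            out.append({**u, **m})         # master values win on shared columns
        elif keep_unmatched_master:
            out.append(dict(m))
    rejects = list(pending.values())       # updates that matched no master row
    return out, rejects


master = [{"id": 1, "name": "m1"}, {"id": 2, "name": "m2"}]
update = [{"id": 1, "name": "u1", "qty": 5}, {"id": 3, "name": "u3", "qty": 7}]
merged, rejects = merge(master, update, "id")
merged_drop, _ = merge(master, update, "id", keep_unmatched_master=False)
```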
![Page 260: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/260.jpg)
The Merge Stage
Allows composite keys
Multiple update links
Matched update rows are consumed
Unmatched updates can be captured
Lightweight
Space/time tradeoff: presorts vs. in-RAM table
Diagram: Merge stage with input ports 0 (Master) and 1, 2 (one or more updates), and output ports 0 (Output) and 1, 2 (Rejects)
![Page 261: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/261.jpg)
Synopsis: Joins, Lookup, & Merge

| | Joins | Lookup | Merge |
|---|---|---|---|
| Model | RDBMS-style relational | Source - in RAM LU Table | Master - Update(s) |
| Memory usage | light | heavy | light |
| # and names of Inputs | exactly 2: 1 left, 1 right | 1 Source, N LU Tables | 1 Master, N Update(s) |
| Mandatory Input Sort | both inputs | no | all inputs |
| Duplicates in primary input | OK (x-product) | OK | Warning! |
| Duplicates in secondary input(s) | OK (x-product) | Warning! | OK only when N = 1 |
| Options on unmatched primary | NONE | [fail] / continue / drop / reject | [keep] / drop |
| Options on unmatched secondary | NONE | NONE | capture in reject set(s) |
| On match, secondary entries are | reusable | reusable | consumed |
| # Outputs | 1 | 1 out, (1 reject) | 1 out, (N rejects) |
| Captured in reject set(s) | Nothing (N/A) | unmatched primary entries | unmatched secondary entries |

In this table:
• , <comma> = separator between primary and secondary input links (out and reject links)
![Page 262: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/262.jpg)
The Aggregator Stage
Purpose: Perform data aggregations
Specify:
Zero or more key columns that define the aggregation units (or groups)
Columns to be aggregated
Aggregation functions: count (nulls/non-nulls), sum, max/min/range
The grouping method (hash table or pre-sort) is a performance issue
![Page 263: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/263.jpg)
Grouping Methods
Hash: results for each aggregation group are stored in a hash table, and the table is written out after all input has been processed
– doesn’t require sorted data
– good when number of unique groups is small. The running tally for each group’s aggregate calculations needs to fit easily into memory. Requires about 1 KB of RAM per group.
– Example: average family income by state, requires .05 MB of RAM
Sort: results for only a single aggregation group are kept in memory; when a new group is seen (key value changes), the current group is written out.
– requires input sorted by grouping keys
– can handle unlimited numbers of groups
– Example: average daily balance by credit card
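The two grouping methods can be contrasted in a short Python sketch. The function names and sample columns are hypothetical, and a sum is used as the aggregate for simplicity. The hash version keeps one running total per group until the input ends (at roughly 1 KB per group, 50 states would need about 50 KB, or .05 MB), while the sort version holds only the current group and emits it when the key changes.

```python
from itertools import groupby


def aggregate_hash(rows, key, value):
    """Hash method: one running total per group, written out at end of input."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return list(totals.items())


def aggregate_sort(rows, key, value):
    """Sort method: input must be sorted by `key`; emit a group on key change."""
    for key_value, group in groupby(rows, key=lambda r: r[key]):
        yield key_value, sum(r[value] for r in group)


# Hypothetical input, already sorted by the grouping key.
rows = [
    {"state": "CA", "income": 10},
    {"state": "CA", "income": 20},
    {"state": "NY", "income": 5},
]
hash_result = sorted(aggregate_hash(rows, "state", "income"))
sort_result = list(aggregate_sort(rows, "state", "income"))
```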
![Page 264: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/264.jpg)
Aggregator Functions
Sum
Min, max
Mean
Missing value count
Non-missing value count
Percent coefficient of variation
![Page 265: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/265.jpg)
Aggregator Properties
![Page 266: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/266.jpg)
Aggregation Types
Aggregation types
![Page 267: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/267.jpg)
Containers
Two varieties
– Local
– Shared
Local
– Simplifies a large, complex diagram
Shared
– Creates reusable object that many jobs can include
![Page 268: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/268.jpg)
Creating a Container
Create a job
Select (loop) portions to containerize
Edit > Construct container > local or shared
![Page 269: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/269.jpg)
Using a Container
Select as though it were a stage
![Page 270: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/270.jpg)
Exercise
Complete exercise 8-1
![Page 271: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/271.jpg)
Module 9
Configuration Files
![Page 272: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/272.jpg)
Objectives
Understand how DataStage EE uses configuration files to determine parallel behavior
Use this understanding to
– Build an EE configuration file for a computer system
– Change node configurations to support adding resources to processes that need them
– Create a job that will change resource allocations at the stage level
![Page 273: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/273.jpg)
Configuration File Concepts
Determine the processing nodes and disk space connected to each node
When system changes, need only change the configuration file – no need to recompile jobs
When DataStage job runs, platform reads configuration file
– Platform automatically scales the application to fit the system
![Page 274: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/274.jpg)
Processing Nodes Are
Locations on which the framework runs applications
Logical rather than physical construct
Do not necessarily correspond to the number of CPUs in your system
– Typically one node for two CPUs
Can define one processing node for multiple physical nodes or multiple processing nodes for one physical node
![Page 275: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/275.jpg)
Optimizing Parallelism
Degree of parallelism determined by number of nodes defined
Parallelism should be optimized, not maximized
– Increasing parallelism distributes work load but also increases Framework overhead
Hardware influences degree of parallelism possible
System hardware partially determines configuration
![Page 276: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/276.jpg)
More Factors to Consider
Communication amongst operators
– Should be optimized by your configuration
– Operators exchanging large amounts of data should be assigned to nodes communicating by shared memory or high-speed link
SMP – leave some processors for operating system
Desirable to equalize partitioning of data
Use an experimental approach
– Start with small data sets
– Try different parallelism while scaling up data set sizes
![Page 277: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/277.jpg)
Factors Affecting Optimal Degree of Parallelism
CPU intensive applications
– Benefit from the greatest possible parallelism
Applications that are disk intensive
– Number of logical nodes equals the number of disk spindles being accessed
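The rules of thumb in this module can be combined into a small starting-point calculation. The Python sketch below is only a hypothetical helper (`starting_node_count` is not part of any DataStage tool): it uses one node per two CPUs as the typical default and the number of disk spindles for disk-intensive work. The result is a first guess to refine experimentally, not a recommendation.

```python
def starting_node_count(cpus, disk_spindles=None, disk_intensive=False):
    """Suggest an initial number of logical nodes to try in a configuration file."""
    if disk_intensive and disk_spindles:
        # Disk-intensive: match logical nodes to the spindles being accessed.
        return disk_spindles
    # Typical default: one logical node for every two CPUs, at least one node.
    return max(1, cpus // 2)


default_guess = starting_node_count(8)
disk_guess = starting_node_count(8, disk_spindles=6, disk_intensive=True)
```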
![Page 278: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/278.jpg)
Configuration File
Text file containing string data that is passed to the Framework
– Sits on server side
– Can be displayed and edited
Name and location found in environment variable APT_CONFIG_FILE
Components
– Node
– Fast name
– Pools
– Resource
![Page 279: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/279.jpg)
Node Options
Node name – name of a processing node used by EE
– Typically the network name
– Use command uname -n to obtain network name
Fastname
– Name of node as referred to by fastest network in the system
– Operators use physical node name to open connections
– NOTE: for SMP, all CPUs share single connection to network
Pools
– Names of pools to which this node is assigned
– Used to logically group nodes
– Can also be used to group resources
Resource
– Disk
– Scratchdisk
![Page 280: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/280.jpg)
Sample Configuration File
{
  node "Node1"
  {
    fastname "BlackHole"
    pools "" "node1"
    resource disk "/usr/dsadm/Ascential/DataStage/Datasets" {pools ""}
    resource scratchdisk "/usr/dsadm/Ascential/DataStage/Scratch" {pools ""}
  }
}
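When a file has several nodes, generating the text can help keep the entries consistent. The Python sketch below renders node descriptions in the same layout as the sample above. It is a hypothetical helper, not a DataStage utility: `build_config`, the dictionary field names, the host names, and the paths are all assumptions made for illustration, and every resource is placed in the default pool only.

```python
def build_config(nodes):
    """Render a list of node descriptions as configuration file text."""
    lines = ["{"]
    for n in nodes:
        pools = " ".join(f'"{p}"' for p in n["pools"])
        lines.append(f'  node "{n["name"]}"')
        lines.append("  {")
        lines.append(f'    fastname "{n["fastname"]}"')
        lines.append(f"    pools {pools}")
        for path in n["disks"]:
            lines.append(f'    resource disk "{path}" {{pools ""}}')
        for path in n["scratch"]:
            lines.append(f'    resource scratchdisk "{path}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-node layout on a single host.
nodes = [
    {"name": "node1", "fastname": "hostA", "pools": ["", "node1"],
     "disks": ["/data/d1"], "scratch": ["/scratch/s1"]},
    {"name": "node2", "fastname": "hostA", "pools": ["", "node2"],
     "disks": ["/data/d2"], "scratch": ["/scratch/s2"]},
]
config_text = build_config(nodes)
```

The resulting text would be saved to a file whose path is then placed in APT_CONFIG_FILE.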
![Page 281: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/281.jpg)
Disk Pools
Disk pools allocate storage
By default, EE uses the default pool, specified by ""
pool "bigdata"
![Page 282: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/282.jpg)
Sorting Requirements
Resource pools can also be specified for sorting:
The Sort stage looks first for scratch disk resources in a “sort” pool, and then in the default disk pool
![Page 283: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/283.jpg)
Another Configuration File Example
{
  node "n1" {
    fastname "s1"
    pools "" "n1" "s1" "sort"
    resource disk "/data/n1/d1" {}
    resource disk "/data/n1/d2" {}
    resource scratchdisk "/scratch" {pools "sort"}
  }
  node "n2" {
    fastname "s2"
    pools "" "n2" "s2" "app1"
    resource disk "/data/n2/d1" {}
    resource scratchdisk "/scratch" {}
  }
  node "n3" {
    fastname "s3"
    pools "" "n3" "s3" "app1"
    resource disk "/data/n3/d1" {}
    resource scratchdisk "/scratch" {}
  }
  node "n4" {
    fastname "s4"
    pools "" "n4" "s4" "app1"
    resource disk "/data/n4/d1" {}
    resource scratchdisk "/scratch" {}
  }
  ...
}
![Page 284: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/284.jpg)
Resource Types
Disk
Scratchdisk
DB2
Oracle
Saswork
Sortwork
Can exist in a pool
– Groups resources together
![Page 285: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/285.jpg)
Using Different Configurations
Lookup stage where DBMS is using a sparse lookup type
![Page 286: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/286.jpg)
Building a Configuration File
Scoping the hardware:
– Is the hardware configuration SMP, Cluster, or MPP?
– Define each node structure (an SMP would be a single node):
  Number of CPUs
  CPU speed
  Available memory
  Available page/swap space
  Connectivity (network/back-panel speed)
– Is the machine dedicated to EE? If not, what other applications are running on it?
– Get a breakdown of the resource usage (vmstat, mpstat, iostat)
– Are there other configuration restrictions? E.g. DB only runs on certain nodes and ETL cannot run on them?
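Once the hardware has been scoped, the answers can be turned into a configuration file. The following is a minimal sketch in Python that renders node entries in the layout of the sample configuration file shown earlier; `render_node`, `render_config`, and the two-node layout are hypothetical illustrations, not an EE tool.

```python
def render_node(name, fastname, pools, disks, scratchdisks):
    """Render one node entry in the layout of the sample configuration file."""
    quoted_pools = " ".join(f'"{p}"' for p in pools)
    lines = [
        f'  node "{name}"',
        "  {",
        f'    fastname "{fastname}"',
        f"    pools {quoted_pools}",
    ]
    lines += [f'    resource disk "{d}" {{pools ""}}' for d in disks]
    lines += [f'    resource scratchdisk "{s}" {{pools ""}}' for s in scratchdisks]
    lines.append("  }")
    return "\n".join(lines)


def render_config(nodes):
    """Wrap all node entries in the outer braces of a configuration file."""
    return "{\n" + "\n".join(render_node(**n) for n in nodes) + "\n}\n"


# Hypothetical two-node layout produced by the scoping exercise.
nodes = [
    {"name": "n1", "fastname": "s1", "pools": ["", "n1"],
     "disks": ["/data/n1/d1"], "scratchdisks": ["/scratch"]},
    {"name": "n2", "fastname": "s2", "pools": ["", "n2"],
     "disks": ["/data/n2/d1"], "scratchdisks": ["/scratch"]},
]
config = render_config(nodes)
print(config)
```

Generating the file from a list keeps node entries consistent when the number of nodes changes.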
![Page 287: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/287.jpg)
Exercise
Complete exercises 9-1 and 9-2
![Page 288: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/288.jpg)
Module 10
Extending DataStage EE
![Page 289: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/289.jpg)
Objectives
Understand the methods by which you can add functionality to EE
Use this understanding to:
– Build a DataStage EE stage that handles special processing needs not supplied with the vanilla stages
– Build a DataStage EE job that uses the new stage
![Page 290: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/290.jpg)
EE Extensibility Overview
Sometimes it will be to your advantage to leverage EE’s extensibility. This extensibility includes:
Wrappers
Buildops
Custom Stages
![Page 291: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/291.jpg)
When To Leverage EE Extensibility
Types of situations:
– Complex business logic, not easily accomplished using standard EE stages
– Reuse of existing C, C++, Java, COBOL, etc…
![Page 292: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/292.jpg)
Wrappers vs. Buildop vs. Custom
Wrappers are good if you cannot or do not want to modify the application and performance is not critical.
Buildops: good if you need custom coding but do not need dynamic (runtime-based) input and output interfaces.
Custom (C++ coding using framework API): good if you need custom coding and need dynamic input and output interfaces.
![Page 293: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/293.jpg)
Building “Wrapped” Stages
You can “wrapper” a legacy executable:
– Binary
– Unix command
– Shell script
… and turn it into an Enterprise Edition stage capable, among other things, of parallel execution…
As long as the legacy executable is:
amenable to data-partition parallelism
– no dependencies between rows
pipe-safe
– can read rows sequentially
– no random access to data
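To make the two conditions concrete, here is a minimal sketch of a filter that meets them. It is a Python stand-in for a legacy executable, not EE code; `wrap_rows` and `run` are hypothetical names. Each output row depends only on its own input row, and the input is read strictly in sequence.

```python
import io


def wrap_rows(source, sink):
    """Read rows one at a time from `source`; write one result per row to `sink`."""
    for line in source:                                 # sequential read, no seek()
        sink.write(line.rstrip("\n").upper() + "\n")    # uses this row only


def run(rows):
    """Feed a list of rows through the filter and collect the output rows."""
    out = io.StringIO()
    wrap_rows(io.StringIO("".join(r + "\n" for r in rows)), out)
    return out.getvalue().splitlines()


rows = ["a,1", "b,2", "c,3", "d,4"]
whole = run(rows)                          # one sequential run
partitioned = run(rows[:2]) + run(rows[2:])  # two independent partitions
print(whole == partitioned)                # True
```

Because no row depends on another, splitting the input into partitions and running the filter on each gives the same rows as one sequential run.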
![Page 294: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/294.jpg)
Wrappers (Cont’d)
Wrappers are treated as a black box
EE has no knowledge of contents
EE has no means of managing anything that occurs inside the wrapper
EE only knows how to export data to and import data from the wrapper
User must know at design time the intended behavior of the wrapper and its schema interface
If the wrappered application needs to see all records prior to processing, it cannot run in parallel.
![Page 295: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/295.jpg)
LS Example
Can this command be wrappered?
![Page 296: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/296.jpg)
Creating a Wrapper
Used in this job ---
To create the “ls” stage
![Page 297: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/297.jpg)
Creating Wrapped Stages
From Manager: Right-Click on Stage Type > New Parallel Stage > Wrapped
We will "wrapper" an existing Unix executable – the ls command
Wrapper Starting Point
![Page 298: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/298.jpg)
Wrapper - General Page
Unix command to be wrapped
Name of stage
![Page 299: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/299.jpg)
The "Creator" Page
Conscientiously maintaining the Creator page for all your wrapped stages will eventually earn you the thanks of others.
![Page 300: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/300.jpg)
Wrapper – Properties Page
If your stage will have properties appear, complete the Properties page
This will be the name of the property as it appears in your stage
![Page 301: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/301.jpg)
Wrapper - Wrapped Page
Interfaces – input and output columns – these should first be entered into the table definitions meta data (DS Manager); let’s do that now.
![Page 302: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/302.jpg)
Interface schemas
• Layout interfaces describe what columns the stage:
– Needs for its inputs (if any)
– Creates for its outputs (if any)
– Should be created as tables with columns in Manager
![Page 303: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/303.jpg)
Column Definition for Wrapper Interface
![Page 304: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/304.jpg)
How Does the Wrapping Work?
– Define the schema for export and import
– Schemas become interface schemas of the operator and allow for by-name column access
[Diagram: input schema → export → stdin or named pipe → UNIX executable → stdout or named pipe → import → output schema]
QUIZ: Why does export precede import?
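The plumbing around a wrapped executable can be sketched as follows. This is an illustration in Python under stated assumptions, not how EE implements it: `export_rows`, `import_rows`, and `run_wrapped` are hypothetical helpers, the comma-delimited format is an assumption, and a small Python child process stands in for the UNIX executable.

```python
import subprocess
import sys


def export_rows(rows, columns):
    """Export: turn records into delimited text for the executable's stdin."""
    return "".join(",".join(str(row[c]) for c in columns) + "\n" for row in rows)


def import_rows(text, columns):
    """Import: turn the executable's stdout back into records."""
    return [dict(zip(columns, line.split(","))) for line in text.splitlines()]


# Stand-in for the wrapped UNIX executable: upper-cases stdin to stdout.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_wrapped(rows, in_columns, out_columns):
    """Export the rows, pipe them through the child process, import the result."""
    exported = export_rows(rows, in_columns)
    result = subprocess.run(CHILD, input=exported, capture_output=True,
                            text=True, check=True)
    return import_rows(result.stdout, out_columns)


rows = [{"name": "ann", "city": "york"}, {"name": "bob", "city": "bath"}]
wrapped = run_wrapped(rows, ["name", "city"], ["NAME", "CITY"])
print(wrapped)
```

The input schema drives the export step and the output schema drives the import step, which is why both must be known at design time.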
![Page 305: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/305.jpg)
Update the Wrapper Interfaces
This wrapper will have no input interface – i.e. no input link. The location will come as a job parameter that will be passed to the appropriate stage property. Therefore, only the Output tab entry is needed.
![Page 306: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/306.jpg)
Resulting Job
Wrapped stage
![Page 307: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/307.jpg)
Job Run
Show file from Designer palette
![Page 308: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/308.jpg)
Wrapper Story: Cobol Application
Hardware Environment:
– IBM SP2, 2 nodes with 4 CPUs per node.
Software:
– DB2/EEE, COBOL, EE
Original COBOL Application:
– Extracted source table, performed lookup against table in DB2, and loaded results to target table.
– 4 hours 20 minutes sequential execution
Enterprise Edition Solution:
– Used EE to perform parallel DB2 extracts and loads
– Used EE to execute COBOL application in parallel
– EE Framework handled data transfer between DB2/EEE and COBOL application
– 30 minutes 8-way parallel execution
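The improvement in this story can be worked out from the two run times. A short sketch of the arithmetic, using only the figures quoted above:

```python
sequential_minutes = 4 * 60 + 20   # 4 hours 20 minutes = 260 minutes
parallel_minutes = 30
ways = 8                           # 2 nodes x 4 CPUs per node

speedup = sequential_minutes / parallel_minutes
efficiency = speedup / ways

print(f"speedup: {speedup:.2f}x")        # speedup: 8.67x
print(f"efficiency: {efficiency:.2f}")   # efficiency: 1.08
```

A speedup slightly above the degree of parallelism may reflect that the EE solution also changed how the DB2 extracts and loads were performed, not only how many copies of the COBOL application ran.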
![Page 309: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/309.jpg)
Buildops
Buildop provides a simple means of extending beyond the functionality provided by EE, but does not use an existing executable (like the wrapper)
Reasons to use Buildop include:
Speed / Performance
Complex business logic that cannot be easily represented using existing stages
– Lookups across a range of values
– Surrogate key generation
– Rolling aggregates
Build once and reusable everywhere within project, no shared container necessary
Can combine functionality from different stages into one
![Page 310: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/310.jpg)
BuildOps
– The DataStage programmer encapsulates the business logic
– The Enterprise Edition interface called “buildop” automatically performs the tedious, error-prone tasks: invoke needed header files, build the necessary “plumbing” for a correct and efficient parallel execution.
– Exploits extensibility of EE Framework
![Page 311: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/311.jpg)
BuildOp Process Overview
From Manager (or Designer):
Repository pane: Right-Click on Stage Type > New Parallel Stage > {Custom | Build | Wrapped}
• "Build" stages from within Enterprise Edition
• "Wrapping" existing "Unix" executables
![Page 312: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/312.jpg)
General Page
Identical to Wrappers, except: under the Build tab, your program!
![Page 313: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/313.jpg)
Logic Tab for Business Logic
Enter business C/C++ logic and arithmetic in four pages under the Logic tab
Main code section goes in the Per-Record page; it will be applied to all rows
NOTE: Code will need to be ANSI C/C++ compliant. If code does not compile outside of EE, it won’t compile within EE either!
![Page 314: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/314.jpg)
Code Sections under Logic Tab
Temporary variables declared [and initialized] here
Logic here is executed once BEFORE processing the FIRST row
Logic here is executed once AFTER processing the LAST row
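The order in which the code sections run can be pictured with a small simulation. It is written in Python for illustration only (real buildop logic is C/C++); `run_buildop` and the `amount` column are hypothetical, and the last section is labeled Post-Loop here as an assumption about the page name.

```python
def run_buildop(rows):
    """Simulate the order in which the buildop code sections execute."""
    # Definitions: temporary variables declared (and initialized) here.
    row_count = 0
    running_total = 0
    output = []

    # Pre-Loop: executed once BEFORE processing the FIRST row.
    output.append("start")

    # Per-Record: applied to every row.
    for row in rows:
        row_count += 1
        running_total += row["amount"]
        output.append(running_total)

    # Post-Loop: executed once AFTER processing the LAST row.
    output.append(f"rows={row_count}")
    return output


result = run_buildop([{"amount": 5}, {"amount": 7}, {"amount": 1}])
print(result)   # ['start', 5, 12, 13, 'rows=3']
```

The running total survives from row to row because it is a temporary variable, which is what makes rolling aggregates possible in a buildop.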
![Page 315: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/315.jpg)
I/O and Transfer
Under Interface tab: Input, Output & Transfer pages
Optional renaming of output port from default "out0"
Write row
Input page: 'Auto Read' – Read next row
In-Repository Table Definition
'False' setting, not to interfere with Transfer page
First line: output 0
![Page 316: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/316.jpg)
I/O and Transfer
• Transfer all columns from input to output.
• If page left blank or Auto Transfer = "False" (and RCP = "False"), only columns in output Table Definition are written
First line: Transfer of index 0
![Page 317: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/317.jpg)
BuildOp Simple Example
Example - sumNoTransfer
– Add input columns "a" and "b"; ignore other columns that might be present in input
– Produce a new "sum" column
– Do not transfer input columns
[Diagram: input a:int32; b:int32 → sumNoTransfer → output sum:int32]
![Page 318: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/318.jpg)
No Transfer
NO TRANSFER
- RCP set to "False" in stage definition and
- Transfer page left blank, or Auto Transfer = "False"
• Effects:
- input columns "a" and "b" are not transferred
- only new column "sum" is transferred
Compare with transfer ON…
From Peek:
![Page 319: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/319.jpg)
Transfer
TRANSFER
- RCP set to "True" in stage definition or
- Auto Transfer set to "True"
• Effects:
- new column "sum" is transferred, as well as
- input columns "a" and "b" and
- input column "ignored" (present in input, but not mentioned in stage)
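The difference between the two settings can be sketched in a few lines. This is a Python illustration of the behavior described on these slides, not buildop code; `sum_stage` and its `transfer` flag are hypothetical names.

```python
def sum_stage(row, transfer):
    """Add columns a and b; optionally carry all input columns to the output."""
    out = dict(row) if transfer else {}   # transfer copies every input column
    out["sum"] = row["a"] + row["b"]      # new column produced by the stage
    return out


row = {"a": 1, "b": 2, "ignored": "x"}
no_transfer = sum_stage(row, transfer=False)
with_transfer = sum_stage(row, transfer=True)
print(no_transfer)     # {'sum': 3}
print(with_transfer)   # {'a': 1, 'b': 2, 'ignored': 'x', 'sum': 3}
```

With transfer on, the column "ignored" reaches the output even though the stage never mentions it.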
![Page 320: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/320.jpg)
Columns vs. Temporary C++ Variables
Columns
– DS-EE type
– Defined in Table Definitions
– Value refreshed from row to row
Temp C++ variables
– C/C++ type
– Need declaration (in Definitions or Pre-Loop page)
– Value persistent throughout "loop" over rows, unless modified in code
![Page 321: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/321.jpg)
Exercise
Complete exercises 10-1 and 10-2
![Page 322: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/322.jpg)
Exercise
Complete exercises 10-3 and 10-4
![Page 323: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/323.jpg)
Custom Stage
Reasons for a custom stage:
– Add EE operator not already in DataStage EE
– Build your own Operator and add to DataStage EE
Use EE API
Use Custom Stage to add new operator to EE canvas
![Page 324: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/324.jpg)
Custom Stage
DataStage Manager > select Stage Types branch > right click
![Page 325: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/325.jpg)
Custom Stage
Name of Orchestrate operator to be used
Number of input and output links allowed
![Page 326: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/326.jpg)
Custom Stage – Properties Tab
![Page 327: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/327.jpg)
The Result
![Page 328: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/328.jpg)
Module 11
Meta Data in DataStage EE
![Page 329: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/329.jpg)
Objectives
Understand how EE uses meta data, particularly schemas and runtime column propagation
Use this understanding to:– Build schema definition files to be invoked in
DataStage jobs– Use RCP to manage meta data usage in EE jobs
![Page 330: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/330.jpg)
Establishing Meta Data
Data definitions
– Recordization and columnization
– Fields have properties that can be set at individual field level
Data types in GUI are translated to types used by EE
– Described as properties on the format/columns tab (outputs or inputs pages) OR
– Using a schema file (can be full or partial)
Schemas
– Can be imported into Manager
– Can be pointed to by some job stages (e.g. Sequential)
![Page 331: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/331.jpg)
Data Formatting – Record Level
Format tab
Meta data described on a record basis
Record level properties
![Page 332: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/332.jpg)
Data Formatting – Column Level
Defaults for all columns
![Page 333: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/333.jpg)
Column Overrides
Edit row from within the columns tab
Set individual column properties
![Page 334: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/334.jpg)
Extended Column Properties
Field and string settings
![Page 335: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/335.jpg)
Extended Properties – String Type
Note the ability to convert ASCII to EBCDIC
![Page 336: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/336.jpg)
Editing Columns
Properties depend on the data type
![Page 337: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/337.jpg)
Schema
Alternative way to specify column definitions for data used in EE jobs
Written in a plain text file
Can be written as a partial record definition
Can be imported into the DataStage repository
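As a rough illustration of what such a plain text file might contain, the sketch below renders column definitions in the `name:type;` style used in the buildop example (`a:int32; b:int32`). The `record ( … )` wrapper is an assumption about the schema file layout and should be checked against the product documentation; `render_schema` is a hypothetical helper written in Python.

```python
def render_schema(columns):
    """Render (name, type) pairs as schema text, one field per line."""
    fields = "".join(f"  {name}:{type_};\n" for name, type_ in columns)
    return "record (\n" + fields + ")\n"


schema_text = render_schema([("a", "int32"), ("b", "int32"), ("sum", "int32")])
print(schema_text)
```

A partial record definition would simply list fewer columns than the data actually contains.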
![Page 338: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/338.jpg)
Creating a Schema
Using a text editor
– Follow correct syntax for definitions
– OR
Import from an existing data set or file set
– On DataStage Manager import > Table Definitions > Orchestrate Schema Definitions
– Select checkbox for a file with .fs or .ds
![Page 339: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/339.jpg)
Importing a Schema
Schema location can be on the server or local work station
![Page 340: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/340.jpg)
Data Types
Date
Decimal
Floating point
Integer
String
Time
Timestamp
Vector
Subrecord
Raw
Tagged
![Page 341: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/341.jpg)
Runtime Column Propagation
DataStage EE is flexible about meta data. It can cope with the situation where meta data isn’t fully defined. You can define part of your schema and specify that, if your job encounters extra columns that are not defined in the meta data when it actually runs, it will adopt these extra columns and propagate them through the rest of the job. This is known as runtime column propagation (RCP).
RCP is always on at runtime.
Design and compile time column mapping enforcement.
– RCP is off by default.
– Enable first at project level. (Administrator project properties)
– Enable at job level. (job properties General tab)
– Enable at Stage. (Link Output Column tab)
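The effect of propagation on a single row can be sketched as follows. This is a simplified Python model of the behavior described above, not EE internals; `stage_output` and the column names are hypothetical.

```python
def stage_output(row, defined_columns, rcp_enabled):
    """Return the columns a stage passes on for one row."""
    out = {c: row[c] for c in defined_columns}   # columns named in the meta data
    if rcp_enabled:
        for name, value in row.items():          # extra columns found at run time
            out.setdefault(name, value)
    return out


row = {"id": 7, "name": "ann", "extra": "kept?"}
without_rcp = stage_output(row, ["id", "name"], rcp_enabled=False)
with_rcp = stage_output(row, ["id", "name"], rcp_enabled=True)
print(without_rcp)   # {'id': 7, 'name': 'ann'}
print(with_rcp)      # {'id': 7, 'name': 'ann', 'extra': 'kept?'}
```

With RCP enabled, the undefined column "extra" is adopted and carried through; with it disabled, only the defined columns leave the stage.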
![Page 342: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/342.jpg)
Enabling RCP at Project Level
![Page 343: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/343.jpg)
Enabling RCP at Job Level
![Page 344: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/344.jpg)
Enabling RCP at Stage Level
Go to output link’s columns tab
For a Transformer, you can find the output link’s columns tab by first going to stage properties
![Page 345: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/345.jpg)
Using RCP with Sequential Stages
To utilize runtime column propagation in the sequential stage you must use the “use schema” option
Stages with this restriction:
– Sequential
– File Set
– External Source
– External Target
![Page 346: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/346.jpg)
Runtime Column Propagation
When RCP is Disabled
– DataStage Designer will enforce Stage Input Column to Output Column mappings.
– At job compile time modify operators are inserted on output links in the generated osh.
![Page 347: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/347.jpg)
Runtime Column Propagation
When RCP is Enabled
– DataStage Designer will not enforce mapping rules.
– No Modify operator inserted at compile time.
– Danger of runtime error if incoming column names do not match the column names on the outgoing link (names are case sensitive).
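Since nothing checks the mapping at design time when RCP is enabled, a quick check of column names before a run can catch the case-sensitivity problem. The helpers below are a hypothetical Python sketch, not a DataStage feature.

```python
def unmatched_columns(incoming, outgoing):
    """Outgoing column names with no exact (case-sensitive) incoming match."""
    incoming_set = set(incoming)
    return [c for c in outgoing if c not in incoming_set]


def case_only_mismatches(incoming, outgoing):
    """Pairs (incoming, outgoing) that differ only in letter case."""
    by_lower = {c.lower(): c for c in incoming}
    return [(by_lower[c.lower()], c)
            for c in unmatched_columns(incoming, outgoing)
            if c.lower() in by_lower]


incoming = ["CustID", "Name", "City"]
outgoing = ["custid", "Name", "Region"]
missing = unmatched_columns(incoming, outgoing)
case_issues = case_only_mismatches(incoming, outgoing)
print(missing)       # ['custid', 'Region']
print(case_issues)   # [('CustID', 'custid')]
```

Here "custid" would fail to match "CustID" at run time even though the two names look the same at a glance.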
![Page 348: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/348.jpg)
Exercise
Complete exercises 11-1 and 11-2
![Page 349: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/349.jpg)
Module 12
Job Control Using the Job Sequencer
![Page 350: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/350.jpg)
Objectives
Understand how the DataStage job sequencer works
Use this understanding to build a control job to run a sequence of DataStage jobs
![Page 351: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/351.jpg)
Job Control Options
Manually write job control
– Code generated in BASIC
– Use the job control tab on the job properties page
– Generates BASIC code which you can modify
Job Sequencer
– Build a controlling job much the same way you build other jobs
– Comprised of stages and links
– No BASIC coding
![Page 352: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/352.jpg)
Job Sequencer
Build like a regular job
Type “Job Sequence”
Has stages and links
Job Activity stage represents a DataStage job
Links represent passing control
Stages
![Page 353: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/353.jpg)
Example
Job Activity stage – contains conditional triggers
![Page 354: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/354.jpg)
Job Activity Properties
Job parameters to be passed
Job to be executed – select from dropdown
![Page 355: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/355.jpg)
Job Activity Trigger
Trigger appears as a link in the diagram
Custom options let you define the code
![Page 356: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/356.jpg)
Options
Use custom option for conditionals
– Execute if the job ran successfully or finished with warnings only
Can add a “wait for file” activity to hold execution until a file appears
Add “execute command” stage to drop real tables and rename new tables to current tables
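The custom conditional above ("run if successful or warnings only") can be sketched in Python. This is only an illustration of the decision logic, not DataStage code: the `JobStatus` names and the two step names returned by `next_step` are hypothetical and do not correspond to DataStage constants or stage names.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status names for illustration; not DataStage constants.
    OK = "ok"
    WARNINGS = "warnings"
    FAILED = "failed"


def should_fire(status: JobStatus) -> bool:
    """Custom trigger: fire when the upstream job ran OK or with warnings only."""
    return status in (JobStatus.OK, JobStatus.WARNINGS)


def next_step(status: JobStatus) -> str:
    """Pick the link to follow; the step names are made up for this sketch."""
    if should_fire(status):
        return "run_downstream_job"
    return "send_failure_notification"
```

In a real sequence the same choice is expressed as trigger expressions on the links leaving the Job Activity stage rather than as a function.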
![Page 357: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/357.jpg)
Job Activity With Multiple Links
Different links having different triggers
![Page 358: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/358.jpg)
Sequencer Stage
Can be set to all or any
Build job sequencer to control job for the collections application
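The difference between the "all" and "any" settings can be sketched as follows. This is a minimal model under the assumption that each input link is represented by a boolean meaning "this link has fired"; the function name `sequencer_fires` is hypothetical.

```python
from typing import Sequence


def sequencer_fires(mode: str, inputs_fired: Sequence[bool]) -> bool:
    """Model a sequencer: 'all' waits for every input link, 'any' for at least one."""
    if not inputs_fired:
        # No input links at all: nothing can trigger the outputs.
        return False
    if mode == "all":
        return all(inputs_fired)
    if mode == "any":
        return any(inputs_fired)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with three upstream jobs of which two have finished, `sequencer_fires("any", [True, True, False])` is true while `sequencer_fires("all", [True, True, False])` is false.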
![Page 359: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/359.jpg)
Notification
Notification Stage
![Page 360: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/360.jpg)
Notification Activity
![Page 361: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/361.jpg)
Sample DataStage log from Mail Notification
![Page 362: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/362.jpg)
E-Mail Message
Notification Activity Message
![Page 363: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/363.jpg)
Exercise
Complete exercise 12-1
![Page 364: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/364.jpg)
Module 13
Testing and Debugging
![Page 365: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/365.jpg)
Objectives
Understand the spectrum of tools available for testing and debugging
Use this understanding to troubleshoot a DataStage job
![Page 366: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/366.jpg)
Environment Variables
![Page 367: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/367.jpg)
Parallel Environment Variables
![Page 368: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/368.jpg)
Environment Variables: Stage Specific
![Page 369: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/369.jpg)
Environment Variables
![Page 370: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/370.jpg)
Environment Variables: Compiler
![Page 371: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/371.jpg)
Typical Job Log Messages:
Environment variables
Configuration File information
Framework Info/Warning/Error messages
Output from the Peek Stage
Additional info with "Reporting" environments
Tracing/Debug output
– Must compile job in trace mode
– Adds overhead
The Director
![Page 372: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/372.jpg)
• Job Properties, from Menu Bar of Designer
• Director will prompt you before each run
Job Level Environmental Variables
![Page 373: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/373.jpg)
Troubleshooting
If you get an error during compile, check the following:
Compilation problems
– If a Transformer is used, check the C++ compiler and LD_LIBRARY_PATH
– If there are Buildop errors, try buildop from the command line
– Some stages may not support RCP, which can cause a column mismatch
– Use the Show Error and More buttons
– Examine the generated OSH
– Check environment variable settings
Very little integrity checking is done during compile, so you should run Validate from Director.
Highlights source of error
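The first compile check above (C++ compiler and LD_LIBRARY_PATH) can be scripted outside DataStage. The sketch below is a generic environment check, not a DataStage utility: the function name `check_compile_environment` is hypothetical, and the default compiler name `g++` is an assumption, so substitute whatever compiler your installation is configured to use.

```python
import os
import shutil


def check_compile_environment(compiler="g++", env=None):
    """Return a list of problems found; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []

    # Is the compiler executable reachable through PATH?
    if shutil.which(compiler, path=env.get("PATH")) is None:
        problems.append(f"compiler {compiler!r} not found on PATH")

    # Is LD_LIBRARY_PATH set, and does every entry point at a real directory?
    lib_path = env.get("LD_LIBRARY_PATH", "")
    if not lib_path:
        problems.append("LD_LIBRARY_PATH is not set or is empty")
    else:
        for directory in lib_path.split(os.pathsep):
            if directory and not os.path.isdir(directory):
                problems.append(f"LD_LIBRARY_PATH entry does not exist: {directory}")

    return problems
```

Run it in the same shell environment the DataStage engine uses; a check made under a different login profile may not reflect what the compile step actually sees.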
![Page 374: DataStage Enterprise Edition. Proposed Course Agenda Day 1 –Review of EE Concepts –Sequential Access –Best Practices –DBMS as Source Day 2 –EE Architecture.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649e0b5503460f94af3ee7/html5/thumbnails/374.jpg)
Generating Test Data
Row Generator stage can be used
– Column definitions
– Data type dependent
Row Generator plus Lookup stages provide a good way to create robust test data from pattern files
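The idea of combining generated rows with a lookup against pattern data can be sketched in Python. This is an analogy, not the Row Generator stage itself: the type names (`int`, `string`, `date`), the function names, and the sample surnames are all assumptions made for the illustration.

```python
from datetime import date, timedelta


def generate_rows(columns, count):
    """Produce `count` rows; each column's values depend on its declared type."""
    makers = {
        "int": lambda i: i,                                      # 0, 1, 2, ...
        "string": lambda i: f"val{i}",                           # val0, val1, ...
        "date": lambda i: date(2000, 1, 1) + timedelta(days=i),  # consecutive days
    }
    return [{name: makers[typ](i) for name, typ in columns} for i in range(count)]


def enrich(rows, key, pattern, new_column):
    """Lookup step: use the generated key to pick a realistic value from a pattern list."""
    return [{**row, new_column: pattern[row[key] % len(pattern)]} for row in rows]


# Usage: generated keys cycle through values that would normally come from a pattern file.
rows = generate_rows([("id", "int"), ("opened", "date")], 3)
customers = enrich(rows, "id", ["Smith", "Jones"], "surname")
```

The generated columns give volume and type coverage, while the lookup supplies values that look like production data.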