
    All Datastage Stages

    Datastage parallel stages groups

    DataStage and QualityStage stages are grouped into the following logical sections:

    General objects

    Data Quality Stages

    Database connectors

    Development and Debug stages

    File stages

    Processing stages

Real Time stages

    Restructure Stages

Sequence activities

Please refer to the list below for a description of the stages used in DataStage and QualityStage. We classified all stages in order of importance and frequency of use in real-life deployments (and also on certification exams). Also, the most widely used stages are marked bold or have a subpage available with a detailed description and examples.

    DataStage and QualityStage parallel stages and activities


    General elements

Link - indicates a flow of the data. There are three main types of links in Datastage: stream, reference and lookup.

Container (can be private or shared) - the main outcome of having containers is to visually simplify a complex datastage job design and keep the design easy to understand.

Annotation is used for adding floating datastage job notes and descriptions on a job canvas. Annotations provide a great way to document the ETL process and help understand what a given job does.

Description Annotation shows the contents of a job description field. One description annotation is allowed in a datastage job.

Debug and development stages

Row generator produces a set of test data which fits the specified metadata (can be random or cycled through a specified list of values). Useful for testing and development.

Column generator adds one or more columns to the incoming flow and generates test data for these columns.

Peek stage prints record column values to the job log, which can be viewed in Director. It can have a single input link and multiple output links.


Sample stage samples an input data set. Operates in two modes: percent mode and period mode.

Head selects the first N rows from each partition of an input data set and copies them to an output data set.

Tail is similar to the Head stage. It selects the last N rows from each partition.

Write Range Map writes a data set in a form usable by the range partitioning method.

Processing stages

Aggregator joins data vertically by grouping the incoming data stream and calculating summaries (sum, count, min, max, variance, etc.) for each group. The data can be grouped using two methods: hash table or pre-sort.


Copy - copies input data (a single stream) to one or more output data flows.

FTP stage uses the FTP protocol to transfer data to a remote machine.

Filter filters out records that do not meet specified requirements.

Funnel combines multiple streams into one.

Join combines two or more inputs according to the values of a key column(s). Similar concept to a relational DBMS SQL join (ability to perform inner, left, right and full outer joins). It can have 1 left and multiple right inputs (all need to be sorted) and produces a single output stream (no reject link).

Lookup combines two or more inputs according to the values of a key column(s). The Lookup stage can have 1 source and multiple lookup tables. Records don't need to be sorted; it produces a single output stream and a reject link.

Merge combines one master input with multiple update inputs according to the values of a key column(s). All inputs need to be sorted and unmatched secondary entries can be captured in multiple reject links.

Modify stage alters the record schema of its input dataset. Useful for renaming columns, non-default data type conversions and null handling.

Remove duplicates stage needs a single sorted data set as input. It removes all duplicate records according to a specification and writes to a single output.

Slowly Changing Dimension automates the process of updating dimension tables where the data changes over time. It supports SCD type 1 and SCD type 2.

Sort sorts input columns.

Transformer stage handles extracted data, performs data validation, conversions and lookups.

Change Capture - captures the before and after state of two input data sets and outputs a single data set whose records represent the changes made.

Change Apply - applies the change operations to a before data set to compute an after data set. It gets its data from a Change Capture stage.

Difference stage performs a record-by-record comparison of two input data sets and outputs a single data set whose records represent the difference between them. Similar to the Change Capture stage.


Checksum - generates a checksum from the specified columns in a row and adds it to the stream. Used to determine if there are differences between records.

Compare performs a column-by-column comparison of records in two presorted input data sets. It can have two input links and one output link.

Encode encodes data with an encoding command, such as gzip.

Decode decodes a data set previously encoded with the Encode stage.

External Filter permits specifying an operating system command that acts as a filter on the processed data.

Generic stage allows users to call an OSH operator from within a DataStage stage with options as required.

Pivot Enterprise is used for horizontal pivoting. It maps multiple columns in an input row to a single column in multiple output rows. Pivoting data results in a dataset with fewer columns but more rows.

Surrogate Key Generator generates a surrogate key for a column and manages the key source.

Switch stage assigns each input row to an output link based on the value of a selector field. Provides a similar concept to the switch statement in most programming languages.

Compress - packs a data set using a GZIP utility (or the compress command on LINUX).


File stages

Sequential file is used to read data from or write data to one or more flat (sequential) files.

Data Set stage allows users to read data from or write data to a dataset. Datasets are operating system files, each of which has a control file (.ds extension by default) and one or more data files (unreadable by other applications).

File Set stage allows users to read data from or write data to a fileset. Filesets are operating system files, each of which has a control file (.fs extension) and data files. Unlike datasets, filesets preserve formatting and are readable by other applications.

Complex Flat File allows reading from complex file structures on a mainframe machine, such as MVS data sets, header and trailer structured files, files that contain multiple record types, QSAM and VSAM files.

External Source - permits reading data that is output from multiple source programs.

External Target - permits writing data to one or more programs.

Lookup File Set is similar to the File Set stage. It is a partitioned hashed file which can be used for lookups.

Database stages

Oracle Enterprise allows reading data from and writing data to an Oracle database (database versions from 9.x to 10g are supported).


ODBC Enterprise permits reading data from and writing data to a database defined as an ODBC source. In most cases it is used for processing data from or to Microsoft Access databases and Microsoft Excel spreadsheets.

DB2/UDB Enterprise permits reading data from and writing data to a DB2 database.

Teradata permits reading data from and writing data to a Teradata data warehouse. Three Teradata stages are available: Teradata connector, Teradata Enterprise and Teradata Multiload.

SQL Server Enterprise permits reading data from and writing data to Microsoft SQL Server 2005 and 2008 databases.

Sybase permits reading data from and writing data to Sybase databases.

Stored procedure stage supports Oracle, DB2, Sybase, Teradata and Microsoft SQL Server. The Stored Procedure stage can be used as a source (returns a rowset), as a target (passes a row to a stored procedure to write) or a transform (to invoke procedure processing within the database).

MS OLEDB helps retrieve information from any type of information repository, such as a relational source, an ISAM file, a personal database, or a spreadsheet.

Dynamic Relational Stage (Dynamic DBMS, DRS stage) is used for reading from or writing to a number of different supported relational DB engines using native interfaces, such as Oracle, Microsoft SQL Server, DB2, Informix and Sybase.

Informix (CLI or Load)

DB2 UDB (API or Load)

Classic Federation

RedBrick Load

Netezza Enterprise

iWay Enterprise


Real Time stages

XML Input stage makes it possible to transform hierarchical XML data into flat relational data sets.

XML Output writes tabular data (relational tables, sequential files or any datastage data streams) to XML structures.

XML Transformer converts XML documents using an XSL stylesheet.

WebSphere MQ stages provide a collection of connectivity options to access IBM WebSphere MQ enterprise messaging systems. There are two MQ stage types available in DataStage and QualityStage: WebSphere MQ connector and WebSphere MQ plug-in stage.

Web services client

Web services transformer

Java client stage can be used as a source stage, as a target and as a lookup. The java package consists of three public classes: com.ascentialsoftware.jds.Column, com.ascentialsoftware.jds.Row, com.ascentialsoftware.jds.Stage.

Java transformer stage supports three links: input, output and reject.

WISD Input - Information Services Input stage

WISD Output - Information Services Output stage


    Restructure stages

Column export stage exports data from a number of columns of different data types into a single column of data type ustring, string, or binary. It can have one input link, one output link and a rejects link.

Column import is complementary to the Column Export stage. Typically used to divide data arriving in a single column into multiple columns.

Combine records stage combines rows which have identical keys into vectors of subrecords.

Make subrecord combines specified input vectors into a vector of subrecords whose columns have the same names and data types as the original vectors.

Make vector joins specified input columns into a vector of columns.

Promote subrecord - promotes input subrecord columns to top-level columns.

Split subrecord - separates an input subrecord field into a set of top-level vector columns.

Split vector promotes the elements of a fixed-length vector to a set of top-level columns.


Data Quality QualityStage stages

Investigate stage analyzes the data content of specified columns of each record from the source file. Provides character and word investigation methods.

Match frequency stage takes input from a file, database or processing stages and generates a frequency distribution report.

MNS - multinational address standardization.

QualityStage Legacy

Reference Match

Standardize

Survive

Unduplicate Match

WAVES - worldwide address verification and enhancement system.


Sequence activity stage types

Job Activity specifies a Datastage server or parallel job to execute.

Notification Activity - used for sending emails to user-defined recipients from within Datastage.

Sequencer is used for synchronization of the control flow of multiple activities in a job sequence.

Terminator Activity permits shutting down the whole sequence once a certain situation occurs.

Wait For File Activity - waits for a specific file to appear or disappear and launches the processing.

EndLoop Activity

Exception Handler

Execute Command

Nested Condition

Routine Activity

StartLoop Activity

UserVariables Activity


Configuration file:

The Datastage configuration file is a master control file (a text file which sits on the server side) for jobs which describes the parallel system resources and architecture. The configuration file provides the hardware configuration for supporting such architectures as SMP (single machine with multiple CPUs, shared memory and disk), Grid, Cluster or MPP (multiple CPUs, multiple nodes and dedicated memory per node). DataStage understands the architecture of the system through this file.

This is one of the biggest strengths of Datastage. For cases in which you have changed your processing configuration, or changed servers or platform, you will never have to worry about it affecting your jobs, since all the jobs depend on this configuration file for execution. Datastage jobs determine which node to run the process on, where to store the temporary data and where to store the dataset data, based on the entries provided in the configuration file. There is a default configuration file available whenever the server is installed.

The configuration files have the extension ".apt". The main outcome of having the configuration file is to separate the software and hardware configuration from the job design. It allows changing hardware and software resources without changing a job design. Datastage jobs can point to different configuration files by using job parameters, which means that a job can utilize different hardware architectures without being recompiled.

The configuration file contains the different processing nodes and also specifies the disk space provided for each processing node; these are logical processing nodes that are specified in the configuration file. So if you have more than one CPU this does not mean the nodes in your configuration file correspond to these CPUs. It is possible to have more than one logical node on a single physical node. However, you should be wise in configuring the number of logical nodes on a single physical node. Increasing nodes increases the degree of parallelism, but it does not necessarily mean better performance because it results in a larger number of processes. If your underlying system does not have the capability to handle these loads, you will end up with a very inefficient configuration on your hands.

1. APT_CONFIG_FILE is the environment variable using which DataStage determines the configuration file to be used (one can have many configuration files for a project). In fact, this is what is generally used in production. However, if this environment variable is not defined, how does DataStage determine which file to use?

If the APT_CONFIG_FILE environment variable is not defined, then DataStage looks for the default configuration file (config.apt) in the following path:
1. Current working directory.
2. INSTALLDIR


    7" Define 2ode in configuration file( 2ode is a logical processing unit" +ach node in a configuration file is distinguished by a virtualname and defines a number and speed of /P.s) memory availability) page and swap space)networ* connectivity details) etc"

    3. hat are the different options a logical node can have in the configuration file?5" #astname 2he fastname is the physical node name that stages use to open connections for high

    volume data transfers" he attribute of this option is often the networ* name" ypically) you canget this name by using .ni& command Huname $nI"

    7" pools 22ame of the pools to which the node is assigned to" 4ased on the characteristics of theprocessing nodes you can group nodes into set of pools"

    5" ( pool can be associated with many nodes and a node can be part of many pools"7" ( node belongs to the default pool unless you e&plicitly specify apools list for it) and omit the

    default pool name %JK' from the list"L" ( parallel job or specific stage in the parallel job can be constrained to run on a pool %set of

    processing nodes'"5" 0n case job as well as stage within the job are constrained to run on specific processing nodesthen stage will run on the node which is common to stage as well as job"

    L" resourceM resource resource!type "location# $%pools "dis&!pool!name#' resourceresource!type "value#" resourcetype can becanonicalhostname %#hich ta*es !uoted ethernetname of a node in cluster that is unconnected to /onductor node by the hight speednetwor*"' or dis* %o read



Now let's try our hand at interpreting a configuration file. Let's try the sample below.

node "node1"
fastname "SVR1"
pools ""
resource disk "C:...


between those two logical nodes. Node3, on the other hand, has its own disk and scratch disk space.

Pools - Pools allow us to associate different processing nodes based on their functions and characteristics. So if you see an entry like "node0" or other reserved node pools like "sort", "db2", etc., then it means that this node is part of the specified pool. A node will by default be associated to the default pool, which is indicated by "". Now if you look at node3 you can see that this node is associated to the sort pool. This will ensure that the sort stage will run only on nodes that are part of the sort pool.

Resource disk - This specifies the location on your server where the processing node will write all the data set files. As you might know, when Datastage creates a dataset, the file you see will not contain the actual data. The dataset file will actually point to the place where the actual data is stored. Where the dataset data is stored is specified in this line.

Resource scratchdisk - The location of temporary files created during Datastage processes, like lookups and sorts, is specified here. If the node is part of the sort pool then the scratch disk can also be made part of the sort scratch disk pool. This will ensure that the temporary files created during sort are stored only in this location. If such a pool is not specified, then Datastage determines if there are any scratch disk resources that belong to the default scratch disk pool on the nodes that sort is specified to run on. If this is the case, then this space will be used.

    Below is the sample diagram for 1 node and 4 node resource allocation:

    SAMPLE CONFIGURATION FILES

    Configuration file for a simple SMP

A basic configuration file for a single machine, two node server (2-CPU) is shown below. The file defines 2 nodes (node1 and node2) on a single dev server (an IP address might be provided as well instead of a hostname) with 3 disk resources (d1, d2 for the data and Scratch as scratch space).

The configuration file is shown below:

node "node1" fastname "dev" pool "" resource disk "...


node "node3" fastname "dev3" pool "" "n3" "s3"
resource disk "...


Sequential File stage:

The stage executes in parallel mode by default if reading multiple files, but executes sequentially if it is only reading one file.

In order to read a sequential file, datastage needs to know about the format of the file.

If you are reading a delimited file you need to specify the delimiter in the Format tab.

    Reading Fixed width File:

Double click on the sequential file stage and go to the properties tab.

    Source:

File: Give the file name including the path.

Read Method: Whether to specify filenames explicitly or use a file pattern.

    Important Options:

First Line is Column Names: If set true, the first line of a file contains column names on writing and is ignored on reading.

Keep File Partitions: Set True to partition the read data set according to the organization of the input file(s).

Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.

For fixed-width files, however, you can configure the stage to behave differently:

- You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.

- You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.

These two options are mutually exclusive.

Scenario 1:

Reading the file sequentially.

Scenario 2:

Read From Multiple Nodes = Yes

Once we set Read From Multiple Nodes = Yes, the stage by default executes in Parallel mode.

If you run the job with the above configuration it will abort with the following fatal error.

sff_SourceFile: The multinode option requires fixed length records. (That means you can use this option to read fixed width files only.)

In order to fix the above issue, go to the Format tab and add additional parameters as shown below.

Now the job finished successfully; see the datastage monitor below for the performance improvement compared with reading from a single node.

Scenario 3: Read a delimited file by adding the Number Of Readers Per Node option instead of the multinode option to improve the read performance; once we add this option the sequential file stage will execute in its default parallel mode.

If we are reading from and writing to fixed width files it is always good practice to add the APT_STRING_PADCHAR Datastage environment variable and assign a space character as its value; then it will pad with spaces, otherwise datastage will pad with the null value (the Datastage default padding character).

Always keep Reject Mode = Fail to make sure the datastage job will fail if we get an unexpected format from the source systems.

Sequential File Best Performance Settings/Tips

Important scenarios using the sequential file stage:

Sequential file with Duplicate Records

Splitting input files into three different files using lookup

Sequential file with Duplicate Records:

A sequential file has 8 records with one column; below are the values in the column, separated by spaces:

1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, 2 more sequential files should be created, one with duplicate records and the other without duplicates.

File 1 records separated by space: 1 1 2 2

File 2 records separated by space: 3 4 5 6

How will you do it?

    Sol1:

1. Introduce a sort stage right next to the sequential file.

2. Select the property 'Key Change Column' in the sort stage; it will assign 1 for unique or 0 for duplicate records (or vice versa, as you wish).

3. Put a filter or transformer next to it and now you have the unique records in one link and the duplicates in the other link.

Sol2 (should check though):

First of all take the source file and connect it to a copy stage. Then, one link is connected to the aggregator stage and another link is connected to the lookup stage or join stage. In the Aggregator stage, using the count function, calculate how many times the values are repeating in the key column.

After calculating that, it is connected to the filter stage where we filter on cnt>1 (cnt is the new column for repeating rows).

Then the output from the filter is connected to the lookup stage as reference. In the lookup stage set the lookup failure condition to reject.

Then place two output links for the lookup: one collects the non-repeated values and the other collects the repeated values on the reject link.
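The same routing can be sketched outside DataStage. Below is a minimal Python sketch of the Sol2 idea (count per key, then split); the function and variable names are illustrative, not DataStage objects:

from collections import Counter

# Count occurrences per key (Aggregator), then route rows whose key
# repeats to one output and the rest to the other (Filter/Lookup).
def split_duplicates(values):
    counts = Counter(values)
    duplicates = [v for v in values if counts[v] > 1]
    uniques = [v for v in values if counts[v] == 1]
    return duplicates, uniques

dup, uniq = split_duplicates([1, 1, 2, 2, 3, 4, 5, 6])
print(dup)   # [1, 1, 2, 2]  -> file 1
print(uniq)  # [3, 4, 5, 6]  -> file 2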

Splitting input files into three different files using lookup:

Input file A contains: 1 2 3 4 5 6 7 8 9 10

Input file B contains: 6 7 8 9 10 11 12 13 14 15

Output file X contains: 1 2 3 4 5

Output file Y contains: 6 7 8 9 10

Output file Z contains: 11 12 13 14 15

Possible solution:

Change Capture stage. First, I am going to use the source as A and the reference as B; both of them are connected to the Change Capture stage. From the Change Capture stage it is connected to a filter stage and then to targets X, Y and Z. In the filter stage: keychange column = 2 goes to X {1,2,3,4,5}; keychange column = 0 goes to Y {6,7,8,9,10}; keychange column = 1 goes to Z {11,12,13,14,15}.

Solution 2: Create one px job. src file seq1 (1,2,3,4,5,6,7,8,9,10), 1st lkp seq2 (6,7,8,9,10,11,12,13,14,15) ...
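For reference, here is a minimal Python sketch of the same three-way split (only-in-A, in-both, only-in-B) that the Change Capture plus filter design produces; the names are illustrative:

def split_three_ways(a, b):
    set_a, set_b = set(a), set(b)
    only_a = sorted(set_a - set_b)   # change code 2 -> output X
    both = sorted(set_a & set_b)     # change code 0 -> output Y
    only_b = sorted(set_b - set_a)   # change code 1 -> output Z
    return only_a, both, only_b

x, y, z = split_three_ways(range(1, 11), range(6, 16))
print(x)  # [1, 2, 3, 4, 5]
print(y)  # [6, 7, 8, 9, 10]
print(z)  # [11, 12, 13, 14, 15]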


Dataset:

Inside an InfoSphere DataStage parallel job, data is moved around in data sets. These carry meta data with them, both column definitions and information about the configuration that was in effect when the data set was created. If, for example, you have a stage which limits execution to a subset of available nodes, and the data set was created by a stage using all nodes, InfoSphere DataStage can detect that the data will need repartitioning.

If required, data sets can be landed as persistent data sets, represented by a Data Set stage. This is the most efficient way of moving data between linked jobs. Persistent data sets are stored in a series of files linked by a control file (note that you should not attempt to manipulate these files using UNIX tools such as rm or mv. Always use the tools provided with InfoSphere DataStage).

There are two groups of Datasets - persistent and virtual. The first type, persistent Datasets, are marked with .ds extensions, while for the second type, virtual datasets, the .v extension is reserved. (It is important to mention that no *.v files may be visible in the Unix file system, since they exist only virtually, inhabiting RAM memory. The extension *.v itself is characteristic strictly of OSH - the Orchestrate scripting language.)

Further differences are much more significant. Primarily, persistent Datasets are stored in Unix files using the internal Datastage EE format, while virtual Datasets are never stored on disk - they exist within links, also in EE format, but in RAM memory. Finally, persistent Datasets are readable and rewriteable with the DataSet stage, while virtual Datasets may only be passed through in memory.

A data set comprises a descriptor file and a number of other files that are added as the data set grows. These files are stored on multiple disks in your system. A data set is organized in terms of partitions and segments.

Each partition of a data set is stored on a single processing node. Each data segment contains all the records written by a single job. So a segment can contain files from many partitions, and a partition has files from many segments.

Firstly, as a single Dataset contains multiple records, it is obvious that all of them must undergo the same processes and modifications. In a word, all of them must go through the same successive stages.

Secondly, it should be expected that different Datasets usually have different schemas, therefore they cannot be treated commonly.

Alias names of Datasets are:


1) Orchestrate File
2) Operating System file

And a Dataset consists of multiple files. They are:

a) Descriptor File
b) Data File
c) Control file
d) Header Files

In the Descriptor File we can see the schema details and the address of the data.

In the Data File we can see the data in native format.

And the Control and Header files reside in the Operating System.

Starting a Dataset Manager:

Choose Tools > Data Set Management; a Browse Files dialog box appears:

1. Navigate to the directory containing the data set you want to manage. By convention, data set files have the suffix .ds.

2. Select the data set you want to manage and click OK. The Data Set Viewer appears. From here you can copy or delete the chosen data set. You can also view its schema (column definitions) or the data it contains.

Transformer Stage:

Various functionalities of the Transformer Stage:

Generating surrogate key using Transformer
Transformer stage using stripwhitespaces
TRANSFORMER STAGE TO FILTER THE DATA
TRANSFORMER STAGE USING PADSTRING FUNCTION
CONCATENATE DATA USING TRANSFORMER STAGE
FIELD FUNCTION IN TRANSFORMER STAGE
TRANSFORMER STAGE WITH SIMPLE EXAMPLE
TRANSFORMER STAGE FOR DEPARTMENT WISE DATA
HOW TO CONVERT ROWS INTO COLUMNS IN DATASTAGE
SORT STAGE AND TRANSFORMER STAGE WITH SAMPLE DATA EXAMPLE
FIELD FUNCTION IN TRANSFORMER STAGE WITH EXAMPLE
RIGHT AND LEFT FUNCTIONS IN TRANSFORMER STAGE WITH EXAMPLE
SOME OTHER IMPORTANT FUNCTIONS:
How to perform aggregation using a Transformer
Date and time string functions
Null handling functions
Vector function - Transformer
Type conversion functions - Transformer
How to convert a single row into multiple rows
Data Stage Transformer Usage Guidelines

=========================================================================

Sort Stage:

SORT STAGE PROPERTIES:

SORT STAGE WITH TWO KEY VALUES


    +&3 T& 0"(#T( G"&/ ), )$ S&"T ST#G( )$ ,#T#ST#G(

    Group is are create in two ifferent ways.

    3e can create group i;s

  • 7/25/2019 Imp Datastage New

    28/158

    #n ,rag an ,rop in &utput

    Group );s will


Hash:

If we have data as:

1@pinky@!2@lin@-!5@Him@!6@emy@1!7@pom@!8@Hem@-!9@in@1!!@en@-!

Take the Job Design as:

And take a sequential file to load into the target.

That is, we can take it like this:

Seq.File ---- Aggregator ---- Seq.File

Read the data in the Seq.File.

And in the Aggregator Stage, in Properties, select Group = DeptNo

And select e_sal in Column for calculations

i.e.


And we need to get the records that occur multiple times into one target, and single records (not repeated with respect to dno) need to come to the other target.

My question:

I placed 2 seq files, one with count=1 and the other with count>1. The first seq file output was this:

dno count
10 3
20 2

The second seq file output was like this:

dno count
40 1
30 1

Instead I wanted output like this:

dno name
10 siva
10 ram
10 sam
20 tom
20 tiny

The 2nd output file should be:

dno name
30 emy
40 remo

Join Stage:

MULTIPLE JOIN STAGES TO JOIN THREE TABLES:

If we have three tables ...


    !!-@merlin@tester@-!!!1@Honathan@eeloper@!!!2@morgan@tester@-!!!5@mary@tester@-!

    softAcomA-eptAno@Aname@locAi!@eeloper@-!!-!@tester@1!!

    softAcomA1locAi@aA@aA-!@mel


You can learn more on the Join Stage with an example here.

JOIN STAGE WITHOUT COMMON KEY COLUMN:

If we like to join the tables ...


If we have source data as ...

Read and load the ...

Left Outer Join:

All the records from the left table ...

Full Outer Join:

All records and all matching records:


Lookup Stage:

    The >ookup stage is most appropriate when the reference data for all lookup stages in a )ob

    is small enough to fit into a"ailable physical memory. 7ach lookup reference requires a contiguous

    block of shared memory. If the ata Sets are larger than a"ailable memory resources$ the ?+I1 or

    07/!7 stage should be used.

    >ookup stages do not require data on the input link or reference links to be sorted. Be aware$

    though$ that large in,memory lookup tables will degrade performance because of their paging

    requirements. 7ach record of the output data set contains columns from a source record plus columns

    from all the corresponding lookup records where corresponding source and lookup records ha"e the

    same "alue for the lookup key columns. (he loo0up 0e, columns do not ha*e to ha*e the same

    names in the primar, and the re&erence lin0s.

    The optional re)ect link carries source records that do not ha"e a corresponding entry in the

    input lookup tables.

    ou can also perform a range loo0up+which compares the "alue of a source column to a range of

    "alues between two lookup table columns. If the source column "alue falls within the required range$ a

    row is passed to the output link. 5lternati"ely$ you can compare the "alue of a lookup column to a

    range of "alues between two source columns. /ange lookups must be based on column "alues$ not

    constant "alues. 0ultiple ranges are supported.
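As an illustration of what a range lookup does, here is a minimal Python sketch; the field names customer_id, trans_dt, start_dt and end_dt are taken from the example later in this section, and the function itself is not a DataStage API:

def range_lookup(source_rows, reference_rows):
    # Pass a source row to the output when its transaction date falls
    # between start_dt and end_dt of a reference row with the same key;
    # unmatched source rows can go down an optional reject link.
    output, reject = [], []
    for src in source_rows:
        matches = [ref for ref in reference_rows
                   if ref["customer_id"] == src["customer_id"]
                   and ref["start_dt"] <= src["trans_dt"] <= ref["end_dt"]]
        if matches:
            output.extend({**src, **ref} for ref in matches)
        else:
            reject.append(src)
    return output, reject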


There are some special partitioning considerations for Lookup stages. You need to ensure that the data being looked up in the lookup table is in the same partition as the input data referencing it. One way of doing this is to partition the lookup tables using the Entire method.

Lookup stage Configuration: Equal lookup


You can specify what action needs to be performed if the lookup fails.
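The four failure actions shown in the scenarios below (Continue, Fail, Drop, Reject) can be sketched in a few lines of Python; this only illustrates the behaviour and is not DataStage code:

def lookup(rows, reference, key, if_not_found="continue"):
    # reference is a dict keyed on the lookup key column
    output, reject = [], []
    for row in rows:
        ref = reference.get(row[key])
        if ref is not None:
            output.append({**row, **ref})
        elif if_not_found == "continue":
            output.append(row)       # keep the source row, lookup columns stay null
        elif if_not_found == "drop":
            pass                     # silently discard the row
        elif if_not_found == "reject":
            reject.append(row)       # requires a reject link
        elif if_not_found == "fail":
            raise RuntimeError("Failed a key lookup for record %s" % row)
    return output, reject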

Scenario 1: Continue

Choose Entire partition on the reference link.

Scenario 2: Fail

Job aborted with the following error:

stg_Lkp,0: Failed a key lookup for record, Key Values: CUSTOMER_ID: ...

Scenario 3: Drop

Scenario 4: Reject

If we select reject as the lookup failure condition then we need to add a reject link, otherwise we get a compilation error.


Range Lookup:

Business scenario: we have input data with customer id, customer name and transaction date. We have a customer dimension table with customer address information. A customer can have multiple records with different start and active dates, and we want to select the record where the incoming transaction date falls between the start and end date of the customer from the dim table.

Ex Input Data:


You need to specify "return multiple rows from the reference link", otherwise you will get the following warning in the job log. Even though we have two distinct rows based on the customer_id, start_dt and end_dt columns, datastage considers them duplicate rows based on the customer_id key only.

stg_Lkp,0: Ignoring duplicate entry; no further warnings will be issued for this table

Compile and Run the Job:

Scenario: Specify the range on the reference link:


This concludes lookup stage configuration for different scenarios.

RANGE LOOKUP WITH EXAMPLE IN DATASTAGE:

Range Lookup is used to check the range of the records from another table ...


    eAsal FDhsal

    0lick &k

    Than ,rag an ,rop the "eCuire columns into the output an click &k

    Gie %ile name to the Target %ile.

    Then 0ompile an "un the Bo< . That;s it you will get the reCuire &utput.

    $hy Entre /artton s used n #OOK)( stage

    (ntire partition has all ata across the noes So while matching?in lookup= the recors all ata shoul


Merge Stage:

The Merge stage is a processing stage. It can have any number of input links, a single output link, and the same number of reject links as there are update input links (according to the DS documentation).

The Merge stage combines a master dataset with one or more update datasets based on the key columns. The output record contains all the columns from the master record plus any additional columns from each update record that are required.

A master record and an update record will be merged only if both have the same key column values.

The data sets input to the Merge stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node. It also minimizes memory requirements because fewer rows need to be in memory at any one time.

As part of preprocessing your data for the Merge stage, you should also remove duplicate records from the master data set. If you have more than one update data set, you must remove duplicate records from the update data sets as well.

Unlike Join and Lookup stages, the Merge stage allows you to specify several reject links. You can route update link rows that fail to match a master row down a reject link that is specific for that link. You must have the same number of reject links as you have update links. The Link Ordering tab on the Stage page lets you specify which update links send rejected rows to which reject links. You can also specify whether to drop unmatched master rows, or output them on the output data link.
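The behaviour described above can be summarised with a minimal Python sketch (one reject list per update input; the function and field names are illustrative only):

def merge(master, updates, key):
    # master and updates are lists of dict rows, already de-duplicated and sorted on key
    merged = {m[key]: dict(m) for m in master}
    rejects = [[] for _ in updates]              # one reject output per update input
    for i, update in enumerate(updates):
        for row in update:
            if row[key] in merged:
                merged[row[key]].update(row)     # add the update columns to the master record
            else:
                rejects[i].append(row)           # unmatched update rows go to that link's reject
    return list(merged.values()), rejects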

Example:

Master dataset:

Options:

Compile and run the job:

Scenario: Remove a record from the update ds1 and check the output:

Check for the datastage warning in the job log, as we have selected Warn On Unmatched Masters = True. Look at the output and it is clear that the merge stage automatically dropped the duplicate record from the master dataset.

Scenario: Added a new update dataset which contains the following data.


Scenario: add a duplicate row for customer_id=1 in the update ds1 dataset.

Now we have a duplicate record both in the master dataset and in update ds1. Run the job and check the results and warnings in the job log.

No change in the results; the merge stage automatically dropped the duplicate row.

Scenario: modify the duplicate row for customer_id=1 in the update ds1 dataset with a different zipcode.

Run the job and check the output results.


I ran the same job multiple times and found the merge stage is taking the first record coming as input from update ds1 and dropping the next records with the same customer id.

This post covered most of the merge scenarios.

=========================================================================

Filter Stage:

The Filter stage is a processing stage used to filter data based on a filter condition.

The filter stage is configured by creating an expression in the where clause.

Scenario 1: Check for empty values in the customer name field. We are reading from a sequential file and hence we should check for an empty value instead of null.
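A minimal Python sketch of this first scenario (keep rows with a non-empty customer_name, route the rest to a reject output; the names are illustrative):

def filter_customers(rows):
    output = [r for r in rows if r["customer_name"] != ""]
    reject = [r for r in rows if r["customer_name"] == ""]
    return output, reject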


Scenario 2: Comparing incoming fields. Check that the transaction date falls between strt_dt and end_dt and filter those records.

Input Data:


Actual Output:

Actual Reject Data:

Scenario 3: Evaluating input column data.

Ex: Where ...


Reject:

This covers most filter stage scenarios.

FILTER STAGE WITH REAL TIME EXAMPLE:

Filter Stage is used to write conditions on Columns.

We can write Conditions on any number of columns ...


We can take a Sequential file to read the data and a filter stage for writing the Conditions.

And a Dataset file to load the data into the Target.

Design as follows:

Seq.File ---- Filter ---- Dataset File

Open the Sequential File and read the data.

In the filter stage Properties, write the Condition in the Where clause, for example

e_sal=2200

Go to Output, Drag and Drop.

Click Ok.

Go to the Target Dataset file, give some name to the file, and that's it.

Compile and Run.

You will get the required output in the Target file.

If you are trying to write conditions on multiple columns,

write the condition in the where clause

and give the output like (Link order num ...


Copy Stage:

COPY STAGE:

Copy Stage is one of the processing stages that has one input and 'n' number of outputs ...


Funnel Stage:

It operates in 3 modes:

Continuous Funnel combines records as they arrive (i.e. no particular order);

Sort Funnel combines the input records in the order defined by one or more key fields;

Sequence copies all records from the first input data set to the output data set, then all the records from the second input data set, and so on.

Note: Metadata for all inputs must be identical.

Sort funnel requires that the data be sorted and partitioned by the same key columns as are used by the funnel operation.

Hash Partition guarantees that all records with the same key column values are located in the same partition and are processed on the same node.
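A minimal Python sketch of the three modes, assuming each input is a plain list of records (sort_funnel additionally assumes each input is already sorted on the key, as the stage requires):

from itertools import chain, zip_longest
import heapq

_SKIP = object()

def continuous_funnel(inputs):
    # interleave records as they "arrive"; no particular order is guaranteed
    return [r for group in zip_longest(*inputs, fillvalue=_SKIP)
            for r in group if r is not _SKIP]

def sequence_funnel(inputs):
    # all records of input 1, then input 2, and so on
    return list(chain.from_iterable(inputs))

def sort_funnel(inputs, key):
    # merge the pre-sorted inputs so the output stays ordered on the key
    return list(heapq.merge(*inputs, key=key))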


1) Continuous funnel:

Go to the properties page of the funnel stage and set Funnel Type to Continuous Funnel.


2) Sequence:


Note: In order to use the sequence funnel you need to specify the order in which the input links are processed and also make sure the stage runs in sequential mode.


    Note: I& ,ou are running ,our sort &unnel stage in parallel+ ,ou should -e aware o& the

    *arious

    considerations a-out sorting data and partitions

    Thats all about funnel stage usage in datastage.

    !)NNE# STAGE $"T% REA# T"'E E&A'(#E

    Some times we get ata in multiple files which


For Funnel take the Job design as:

Read and load the data into two sequential files.

Go to the Funnel stage Properties and

select Funnel Type = Continuous Funnel

(Or any other according to your requirement)

Go to output, Drag and drop the Columns

(Remember ...


Column Generator Stage:

In order to generate a column (for ex: unique_id):

First read and load the data in the seq.file.

Go to the Column Generator stage Properties and select the column method as explicit.

In "column to generate" give the column name (for ex: unique_id).

In Output drag and drop.

Go to the column, write the column name; you can change the data type for unique_id in SQL type and can give the length with a suitable ...


... n records loaded. By using a surrogate key you can continue the sequence from n+1.

Surrogate Key Generator:

The Surrogate Key Generator stage is a processing stage that generates surrogate key columns and maintains the key source.

A surrogate key is a unique primary key that is not derived from the data that it represents, therefore changes to the data will not change the primary key. In a star schema database, surrogate keys are used to join a fact table to a dimension table.

Surrogate key generator stage uses:

Create or delete the key source before other jobs run
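The flat-file key source behaves like a small state file that remembers the last key handed out. A minimal Python sketch of that idea (the path and file format here are illustrative, not the stage's real state-file format):

import os

def next_keys(state_file, how_many):
    # read the last highest value, hand out the next block of keys,
    # then write the new highest value back to the state file
    last = 0
    if os.path.exists(state_file):
        with open(state_file) as fh:
            last = int(fh.read().strip() or 0)
    keys = list(range(last + 1, last + 1 + how_many))
    with open(state_file, "w") as fh:
        fh.write(str(keys[-1]))
    return keys

print(next_keys("/tmp/skey_demo.stat", 10))   # first run: [1, 2, ..., 10]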


Key Source Action = Create

Source Type: FlatFile or Database sequence (in this case we are using FlatFile)

When you run the job it will create an empty file.

If you want to check the content, change View State File = YES and check the job log for details.

skey_genstage,0: State file /tmp/skey_cutomerdim.stat is empty.

If you try to create the same file again, the job will abort with the following error.

skey_genstage,0: Unable to create state file /tmp/skey_cutomerdim.stat: File exists.

Deleting the key source:

Updating the state file:

To update the state file, add a surrogate key stage to the job with a single input link from another stage.

We use this process to update the state file if it is corrupted or deleted.

1) Open the surrogate key stage editor and go to the properties tab.


If the state file exists we can update it, otherwise we can create and update it.

We are using the SkeyValue parameter to update the state file using a transformer stage.


Go to output and define the mapping like below.

In Rowgen we are using 10 rows, and hence when we run the job we see 10 skey values in the output.

I have updated the state file with 100 and below is the output.


If you want to generate the key value from the beginning you can use the following properties in the surrogate key stage.

a. If the key source is a flat file, specify how keys are generated:

o To generate keys in sequence from the highest value that was last used, set the Generate Key From Last Highest Value property to Yes. Any gaps in the key range are ignored.

o To specify a value to initialize the key source, add the File Initial Value property to the Options group, and specify the start value for key generation.

o To control the block size for key ranges, add the File Block Size property to the Options group, and set this property to ...


Slowly Changing Dimension (SCD):

They are:

Type 1 SCD
Type 2 SCD
Type 3 SCD

Type 1 SCD: In the Type 1 SCD methodology, it overwrites the older data (records) with the new data (records) and therefore it does not maintain historical information.

This is used for correcting the spellings of names, and for small updates of customers.

Type 2 SCD: In the Type 2 SCD methodology, it tracks the complete historical information.

HOW TO USE TYPE-2 SCD IN DATASTAGE

SCDs are nothing but ...


    surrogate_key customer_id customer_name Location

    ------------------------------------------------

    1 1 Marspton Illions

Here the customer name is misspelt. It should ...


Now again, if the customer moves from Seattle to NewYork, then the update table ...


1 1 Marston Illions 1
2 1 Marston Seattle 2

Now again, if the customer moves to another location, a new record will ...


2 1 Marston Seattle 21-Feb-2011 NULL

The NULL in the End_Date indicates the current version of the data and the remaining records indicate the past data.
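A minimal Python sketch of this Type 2 behaviour (close the current row by setting its end date, then insert a new current row with a fresh surrogate key; column names follow the example above):

from datetime import date

def apply_scd2(dimension, customer_id, new_location, change_date=None):
    change_date = change_date or date.today()
    next_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    for row in dimension:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            if row["location"] == new_location:
                return dimension                 # nothing changed, keep current row
            row["end_date"] = change_date        # close the current version
    dimension.append({"surrogate_key": next_key,
                      "customer_id": customer_id,
                      "location": new_location,
                      "start_date": change_date,
                      "end_date": None})         # NULL end date marks the current version
    return dimension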

SCD-2 Implementation in Datastage:

Slowly changing dimension Type 2 is a model where the whole history is stored in the database. An additional dimension record is created, the segmenting between the old record values and the new (current) value is easy to extract, and the history is clear. The fields "effective date" and "current indicator" are very often used in that dimension, and the fact table usually stores the dimension key and the version number.

SCD 2 implementation in Datastage: the job described and depicted below shows how to implement SCD Type 2 in Datastage. It is one of many possible designs which can implement this dimension. For this example, we will use a table with customers data (its name is ...).

A lookup transformer does a lookup into a hashed file and maps new and old values to separate columns (SCD lookup transformer).

A Check_discrepancies_exist transformer compares the old and new values of records and passes through only records that differ (SCD check discrepancies transformer).

Another transformer handles the ...


Pivot Enterprise stage is a processing stage which pivots data vertically and horizontally depending upon the requirements. There are two types:

1. Horizontal
2. Vertical

A Horizontal Pivot operation sets input columns to multiple rows, which is exactly the opposite of the Vertical Pivot operation, which sets input rows to multiple columns.

Let's try to understand them one by one with the following example.

1. Horizontal Pivot Operation.

Consider the following table.

Product_Type  Color_1  Color_2  Color_3
Pen           Yellow   Blue     Green
Dress         Pink     Yellow   Purple

Step 1: Design our job structure like below.

Configure the above table with the input sequential stage 'se_product_clr_det'.

Step 2: Let's configure the 'Pivot Enterprise stage'. Double click on it. The following window will pop up.


Select 'Horizontal' for Pivot Type from the drop-down menu under the Properties tab for the horizontal Pivot operation.

Step 3: Click on the 'Pivot Properties' tab.


Step 4: Now we have to mention the columns to be pivoted under Derivation against the column Color.

Double click on it. The following window will pop up.

Select the columns to be pivoted from the 'Available columns' pane as shown. Click OK.

Step 5:


Configure the output stage. Give the file path. See the below image for reference.

Step 6: Compile and Run the job. Let's see what happens to the output.


This is how we can set multiple input columns to a single column (as here for colors).
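Outside DataStage, a quick awk sketch shows the same horizontal pivot idea; it assumes a comma-delimited input file se_product_clr_det.csv with a header row and the columns Product_Type, Color_1, Color_2, Color_3 (the file name and delimiter are just assumptions for this illustration, not part of the stage itself):

# turn the three color columns of each row into one (product, color) row each
awk -F',' 'NR > 1 { for (i = 2; i <= NF; i++) print $1 "," $i }' se_product_clr_det.csv
# Pen,Yellow
# Pen,Blue
# Pen,Green
# Dress,Pink
# Dress,Yellow
# Dress,Purple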

Vertical Pivot Operation:

Here, we are going to use the Pivot Enterprise stage to vertically pivot data. We are going to set multiple input rows to a single row. The main advantage of this stage is that we can use aggregation functions like avg, sum, min, max, first, last etc. for the pivoted column. Let's see how it works.

Consider the output data of the Horizontal operation as input data for the Pivot Enterprise stage. Here, we will be adding one extra column for the aggregation function, as shown in the below table.

Product  Color   Prize
Pen      Yellow  38
Pen      Blue    43
Pen      Green   25
Dress    Pink    1000
Dress    Yellow  695
Dress    purple  738

Let's study the vertical pivot operation step by step.

Step 1: Design your job structure like below. Configure the above table data with the input sequential file se_product_det.

Step 2: Open the Pivot Enterprise stage and select Pivot type as vertical under the properties tab.


Step 3: Let's see how to use 'Aggregation functions' in the next step.

Step 4: On clicking 'Aggregation functions required for this column' for a particular column, the following window will pop up, in which we can select whichever functions are required for that particular column.

Here we are using 'min', 'max' and 'average' functions with proper precision and scale for the Prize column, as shown.


Step 5: Now we just have to do the mapping under the output tab as shown below.

Step 6: Compile and Run the job. Let's see what the output will be.

Output:
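As a rough command-line equivalent of this vertical pivot with aggregation (only a sketch, assuming a comma-delimited, headerless file se_product_det.csv with the columns Product, Color, Prize), the following awk groups the rows by product, concatenates the colors and computes min, max and average of the Prize column:

awk -F',' '{
    colors[$1] = (colors[$1] == "" ? $2 : colors[$1] "," $2)   # pivoted color list per product
    sum[$1] += $3; cnt[$1]++                                   # running totals for the average
    if (!($1 in min) || $3 < min[$1]) min[$1] = $3
    if (!($1 in max) || $3 > max[$1]) max[$1] = $3
} END {
    for (p in colors)
        printf "%s,%s,%s,%s,%.2f\n", p, colors[p], min[p], max[p], sum[p] / cnt[p]
}' se_product_det.csv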


One more approach:

Many people have the following misconceptions about the Pivot stage:

1) It converts rows into columns
2) By using a pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows
3) You can add more points here...

Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS professionals refer to this as NORMALIZATION. Another fact about the Pivot stage is that it's irreplaceable, i.e. no other stage has this functionality of converting columns into rows. So, that makes it unique, doesn't it...

Let's cover how exactly it does it....

For example, let's take a file with the following fields: Item, Quantity1, Quantity2, Quantity3....

Item


Now connect a Pivot stage from the Tool palette to the above output link and create an output link for the Pivot stage itself (for enabling the Output tab for the pivot stage).

Unlike other stages, a pivot stage doesn't use the generic GUI stage page. It has a stage page of its own. And by default the Output columns page would not have any fields. Hence, you need to manually type in the fields. In this case just type in the 2 field names: Item and Quantity. However, manual typing of the columns becomes a tedious process when the number of fields is more. In this case you can use the Metadata Save - Load feature. Go to the input columns tab of the pivot stage, save the table definitions and load them in the output columns tab. This is the way I use it!

Now, you have the following fields in the Output Columns tab... Item and Quantity.... Here comes the tricky part, i.e. you need to specify the DERIVATION.... In case the field names of the Output columns tab are the same as the Input tab, you need not specify any derivation, i.e. in this case for the Item field you need not specify any derivation. But if the Output columns tab has new field names, you need to specify a Derivation or you would get a RUN-TIME error for free....

For our example, you need to type the Derivation for the Quantity field as:

Column name   Derivation
Item          Item (or you can leave this blank)
Quantity      Quantity1, Quantity2, Quantity3

Just attach another file stage and view your output! So, objective met!

Sequence/Activities:

In this article I will explain how to use DataStage looping activities in a sequencer.

I have a requirement where I need to pass a file id as a parameter, reading it from a file. In future, file ids will increase, so I don't have to add a job or change the sequencer if I take advantage of DataStage looping.

Contents in the File:

    =Q88


    Q@88

    @QA88

I need to read the above file and pass the second field as a parameter to the job. I have created one parallel job with pFileID as a parameter.

Step 1: Count the number of lines in the file so that we can set the upper limit in the DataStage StartLoop activity.

Sample routine to count lines in a file:

Argument: FileName (including path)

Deffun DSRMessage(A1, A2, A3) Calling "*DataStage*DSR_MESSAGE"
Equate RoutineName To "CountLines"
Command = "wc -l ":FileName:" | awk '{print $1}'"

Call DSLogInfo("Executing Command To Get the Record Count ":Command, RoutineName)

* call support routine that executes a Shell command.
Call DSExecute("UNIX", Command, Output, SystemReturnCode)

If SystemReturnCode = 0 Then
   Call DSLogInfo("Here is the Record Count In ":FileName:" = ":Output, Output)
   Ans = Output
   GoTo NormalExit
End Else
   Call DSLogInfo("Error when executing command ":Command, RoutineName)
   Call DSLogFatal(Output, RoutineName)
   Ans = 1
End
NormalExit:


Now we use the StartLoop_Activity.$Counter variable to get the file id by using a combination of grep and awk commands.

For each iteration it will get the file id (a command-line sketch of this follows below).
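For example, the Execute Command activity inside the loop could fetch the id for the current iteration with something like the sketch below; the file name file_ids.txt and the 'sequence-number,file-id' layout are assumptions based on the sample contents above, and N stands for the StartLoop counter value:

# pick the Nth line of the list and return its second comma-separated field
N=2
grep -n "" file_ids.txt | grep "^${N}:" | awk -F',' '{print $2}'
# or, more simply:
sed -n "${N} p" file_ids.txt | cut -d',' -f2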


Finally the sequence job looks like below.

I hope everyone likes this post.

=================================================================

TRANSFORMER STAGE TO FILTER


    -@tom@eeloper@-!1@Him@clerck@!2@on@tester@1!5@Ieera@eeloper@-!6@arun@clerck@!7@luti@prouction@2!8@raHa@priuction@2!

And our requirement is to get the target data as



Difference between Join, Lookup and Merge:


Datastage Scenarios and solutions:

Field mapping using Transformer stage:

Requirement: the field will be right justified, zero filled; take the last 18 characters.

Solution: Right("0000000000" : Trim(Lnk_Xfm_Trans.link), 18)
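The same right-justify / zero-fill idea can be checked quickly at the command line; this is only a sketch (the value 12345 and the width of 18 are made up):

# numeric values: pad on the left with zeros to a width of 18
printf "%018d\n" 12345
# general strings: emulate Right("000000000000000000" : Trim(val), 18)
v="12345"
echo "000000000000000000${v}" | awk '{ print substr($0, length($0) - 17) }'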

Scenario 1:

We have two datasets with 4 cols each, with different names. We should create a dataset with 4 cols: 3 from one dataset and one col with the record count of one dataset.

We can use an aggregator with a dummy column and get the count from one dataset, and do a lookup from the other dataset and map it to the 3rd dataset.


Something similar to the below design:

Scenario 2:

Following is the existing job design. But the requirement got changed to: header and trailer datasets should populate even if detail records are not present in the source file. The below job doesn't do that.

Hence changed the above job to the following to meet the requirement:


Used a row generator with a copy stage. Given a default value (zero) for the col (count) coming in from the row generator. If there are no detail records it will pick the record count from the row generator.

We have a source which is a sequential file with header and footer. How to remove the header and footer while reading this file using the sequential file stage of Datastage?
Sol: Type the command in putty: sed '1d;$d' file_name > new_file_name, run this in the job's before-job subroutine, then use the new file in the sequential file stage.
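A minimal sketch of that before-job command (file names are placeholders):

# drop the first line (header) and the last line (footer)
sed '1d;$d' source_file.txt > detail_only.txt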

IF I HAVE SOURCE LIKE
COL1
A
A
B
AND TARGET LIKE
COL1 COL2
A    1
A    2
B    1
HOW TO ACHIEVE THIS OUTPUT USING A STAGE VARIABLE IN THE TRANSFORMER STAGE?

If keyChange = 1 Then 1 Else stagevariable + 1
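The same key-change / stage-variable logic can be sanity-checked with a small awk sketch that numbers every occurrence of a repeating key (assuming COL1 is the first field of a comma-delimited file; names are illustrative):

# print COL1 together with its running occurrence number
awk -F',' '{ seen[$1]++; print $1 "," seen[$1] }' input.csv
# A -> A,1    A -> A,2    B -> B,1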

Suppose that 4 jobs are controlled by a sequencer, like (job 1, job 2, job 3, job 4). If job 1 has 10,000 rows and after running the job only 5,000 rows have been loaded into the target table, the remaining are not loaded and your job is going to be aborted. How can you sort out the problem?

Suppose the job sequencer synchronises or controls the 4 jobs but job 1 has a problem. In this condition you should go to the Director and check what type of problem it is showing: either a data type problem, a warning message, job fail or job aborted. If the job fails it means a data type problem or a missing column action. So you should go to the Run window -> Click -> Tracing -> Performance, or in your target table -> General -> Action -> select this option; here there are two options: (i) On Fail -- Commit, Continue (ii) On Skip -- Commit, Continue. First check how much data is already loaded, then select the On Skip option and continue; for the remaining portion of data not loaded, select On Fail, Continue...... Again run the job; definitely you will get a successful message.

Question: I want to process 3 files sequentially one by one. How can I do that? While processing the files it should fetch the files automatically.


Ans: If the metadata for all the files is the same, then create a job having the file name as a parameter, then use the same job in a routine and call the job with a different file name... or you can create a sequencer to use the job.

Parameterize the file name.
Build the job using that parameter.
Build a job sequencer which will call this job and will accept the parameter for the file name.
Write a UNIX shell script which will call the job sequencer three times by passing a different file each time.

RE: What happens if RCP is disabled?

In such a case OSH has to perform import and export every time the job runs, and the processing time of the job is also increased...

Runtime column propagation (RCP): If RCP is enabled for any job, and specifically for those stages whose output connects to the shared container input, then metadata will be propagated at run time, so there is no need to map it at design time.

If RCP is disabled for the job, in such a case OSH has to perform import and export every time the job runs and the processing time of the job is also increased. Then you have to manually enter all the column descriptions in each stage. RCP - Runtime column propagation.

Question:
Source:          Target:
Eno  Ename       Eno  Ename
1    a,b         1    a
2    c,d         2    b
3    e,f         3    c

source has 2 fields like

COMPANY   LOCATION
IBM       HYD
TCS       BAN
IBM       CHE
HCL       HYD
TCS       CHE
IBM       BAN
HCL       BAN
HCL       CHE

LIKE THIS.......

THEN THE OUTPUT LOOKS LIKE THIS....


Company  loc           count
TCS      HYD,BAN,CHE   3
IBM      HYD,BAN,CHE   3
HCL      HYD,BAN,CHE   3

2) Input is like this:
no,char
1,a
2,b
3,a
4,b
5,a
6,a
7,b
8,a

But the output is in this form, with row numbering of each duplicate occurrence:

output:
no,char,Count
"1","a","1"
"6","a","2"
"5","a","3"
"8","a","4"
"3","a","5"
"2","b","1"
"7","b","2"
"4","b","3"

3) Input is like this:
file1
10
20
10
10
20


30

Output is like:
file2    file3 (duplicates)
10       10
20       10
30       20

4) Input is like:
file1
10
20
10
10
20
30
Output is like (multiple occurrences in one file and single occurrences in another file):
file2    file3
10       30
10
10
20
20

5) Input is like this:
file1
10
20
10
10
20
30
Output is like:
file2    file3
10       30
20

6) Input is like this:
file1
1
2
3
4


5
6
7
8
9
10

Output is like:
file2 (odd)    file3 (even)
1              2
3              4
5              6
7              8
9              10

7) How to calculate Sum(sal), Avg(sal), Min(sal), Max(sal) without using the Aggregator stage?

8) How to find out First sal, Last sal in each dept without using the Aggregator stage?

9) How many ways are there to perform the remove-duplicates function without using the Remove Duplicates stage?

Scenario:

source has 2 fields like

COMPANY   LOCATION
IBM       HYD
TCS       BAN
IBM       CHE
HCL       HYD
TCS       CHE
IBM       BAN
HCL       BAN
HCL       CHE

LIKE THIS.......

THEN THE OUTPUT LOOKS LIKE THIS....

Company  loc           count
TCS      HYD,BAN,CHE   3


IBM      HYD,BAN,CHE   3
HCL      HYD,BAN,CHE   3

Solution:

SeqFile......>Sort......>Trans......>RemoveDuplicates......>Dataset

Sort:
Key = Company, Sort order = Asc, Create key change = True
Trans:
Create a stage variable Company1:
Company1 = If (in.keychange = 1) then in.Location Else Company1 : "," : in.Location
Drag and drop in the derivation:

Company .................... Company
Company1 .................... Location

RemoveDup:
Key = Company
Duplicates To Retain = Last
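Logically this is a group-wise concatenation; just to illustrate the expected result outside DataStage, a small awk sketch over 'COMPANY LOCATION' pairs (one per line, whitespace separated; file name assumed) would be:

awk '{
    loc[$1] = (loc[$1] == "" ? $2 : loc[$1] "," $2)   # collect locations per company
    cnt[$1]++                                         # count rows per company
} END {
    for (c in loc) print c, loc[c], cnt[c]
}' company_location.txt
# e.g. IBM HYD,CHE,BAN 3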

11) The input is:
Shirt|red|blue|green
Pant|pink|red|blue

Output should be:

Shirt:red
Shirt:blue
Shirt:green
pant:pink
pant:red
pant:blue

Solution:

it is the reverse of the pivot stage

use seq------sort------tr----rd-----tr----tgt

in the sort stage use create key change column = true

in the transformer create a stage variable: if the key change column = 1 then the column value, else stagevariable :: column value
in the rd stage use duplicates retain last

in the second transformer use the Field function to separate the columns

similar Scenario:


source
col1  col2
1     samsung
1     nokia
1     ericsson
2     iphone
2     motorola
3     lava
3     blackberry
3     reliance

Expected Output

col1  col2     col3        col4
1     samsung  nokia       ericsson
2     iphone   motorola
3     lava     blackberry  reliance

You can get it by using Sort stage --- Transformer stage --- RemoveDuplicates --- Transformer --- tgt

Ok

First Read and Load the data into your source file (For Example a Sequential File)

And in the Sort stage select key change column = True (To Generate Group ids)

Go to the Transformer stage

Create one stage variable.

You can do this by right click in stage variable, go to properties and name it as you wish (For example: temp)

and in the expression write as below:

if keychange column = 1 then column name else temp : "," : column name

This column name is the one you want in the required column with delimited commas.

On the remove duplicates stage the key is col1; set the option duplicates retain to --> Last.

In the transformer drop col2 and define 3 columns like col2, col3, col4:
in col2 derivation give Field(InputColumn, ",", 1) and
in col3 derivation give Field(InputColumn, ",", 2) and
in col4 derivation give Field(InputColumn, ",", 3)

Scenario: 1) Consider the following employees data as source

employee_id, salary
-------------------
10, 1000
20, 2000


30, 3000
40, 5000

Create a job to find the sum of salaries of all employees, and this sum should repeat for all the rows.

The output should look like:

employee_id, salary, salary_sum
-------------------------------
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000

Scenario:

I have two source tables/files numbered 1 and 2. In the target, there are three output tables/files, numbered 3, 4 and 5.

The scenario is that:

to the output 4 --> the records which are common to both 1 and 2 should go.

to the output 3 --> the records which are only in 1 but not in 2 should go.

to the output 5 --> the records which are only in 2 but not in 1 should go.

sltn:
src1----->copy1------>----------------------------------->output_1 (only left table)
                       Join (inner type)----------------> output_2
src2----->copy2------>----------------------------------->output_3 (only right table)
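Outside DataStage the same three-way split can be illustrated with sort and comm on the key files (file names are placeholders; comm expects sorted input):

sort src1.txt > s1.sorted
sort src2.txt > s2.sorted
comm -23 s1.sorted s2.sorted > output_only_in_1.txt   # records only in source 1
comm -13 s1.sorted s2.sorted > output_only_in_2.txt   # records only in source 2
comm -12 s1.sorted s2.sorted > output_common.txt      # records common to both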

Consider the following employees data as source

employee_id, salary
-------------------
10, 1000
20, 2000
30, 3000
40, 5000

Scenario:

Create a job to find the sum of salaries of all employees, and this sum should repeat for all the rows.

The output should look like:

employee_id, salary, salary_sum
-------------------------------
10, 1000, 11000
20, 2000, 11000


30, 3000, 11000
40, 5000, 11000

sltn:

Take Source --->Transformer (Add a new column on both the output links and assign a value of 1) ---> 1) Aggregator (do group by using that new column) 2) lookup/join (join on that new column) -------->tgt.
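A quick awk sketch of the same idea, reading the file twice: the first pass totals the salaries, the second pass appends the total to every row (assuming a headerless comma-delimited employees.csv; names are placeholders):

awk -F',' 'NR == FNR { total += $2; next } { print $0 "," total }' employees.csv employees.csv
# 10,1000,11000
# 20,2000,11000
# 30,3000,11000
# 40,5000,11000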

Scenario:

sno,sname,mark1,mark2,mark3
1,rajesh,70,65,78
2,mamatha,38,45,73
3,anjali,67,38,72
4,pavani,85,65,45
5,indu,65,67,72

output is

sno,sname,mark1,mark2,mark3,delimetercount
1,rajesh,70,65,78,4
2,mamatha,38,45,73,4
3,anjali,67,38,72,4
4,pavani,85,65,45,4
5,indu,65,67,72,4

seq--->trans--->seq

create one stage variable as delimiter..

and put the derivation on the stage variable as: SLink4.sno : "," : SLink4.sname : "," : SLink4.mark1 : "," : SLink4.mark2 : "," : SLink4.mark3

and do the mapping and create one more column count as integer type,

and put the derivation on the count column as Count(delimiter, ",")
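The equivalent check from the command line: the delimiter count is just the number of fields minus one (comma-delimited file assumed):

awk -F',' '{ print $0 "," NF - 1 }' students.csv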

scenario:

sname   total_vowels_count
Allen   2
Scott   1
Ward    1

total_vowels_count = Count(sname,"a") + Count(sname,"e") + Count(sname,"i") + Count(sname,"o") + Count(sname,"u")
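A small awk sketch of the same vowel count (the first field holds the name; gsub works on a copy and returns how many characters it replaced):

awk '{ name = $1; n = gsub(/[aeiouAEIOU]/, "", name); print $1, n }' names.txt
# Allen 2
# Scott 1
# Ward 1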

    Scenario:


1) On a daily basis we are getting some huge files of data; all the files' metadata is the same; we have to load them into the target table. How can we load?

DaysSinceFromDate(CurrentDate(), SLink.date_1) <= 365 OR
DaysSinceFromDate(CurrentDate(), SLink.date_1) <= 366

where the date_1 column is the column having the date which needs to be less than or equal to 12 months old, and 365 is the no. of days for 12 months; for a leap year it is 366 (these numbers you need to check).

What is the difference between Force Compile and Compile?

Diff b/w Compile and Validate?

The Compile option only checks for all mandatory requirements like link requirements, stage options and all. But it will not check if the database connections are valid. Validate is equivalent to running a job except for extraction/loading of data. That is, the validate option will test database connectivity by making connections to the databases.

How to Find Out Duplicate Values Using Transformer?

You can capture the duplicate records


My source is like:

Srno, Name
10,a
10,b
20,c
30,d
30,e
40,f

My target should be like:

Target 1: (Only unique, means records which occur only once)
20,c
40,f

Target 2: (Records which have more than 1 occurrence)
10,a
10,b
30,d
30,e

How to do this in DataStage?

==============

use aggregator and transformer stages

source--aggregator--transformer--target

perform count in the aggregator, and take two o/p links in the transformer; filter data with count=1 for one link

and put count>1 for the second link.
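The same split can be sketched with awk outside DataStage, reading the comma-delimited file twice (file names are placeholders):

# first pass counts each key, second pass routes rows to the two targets
awk -F',' 'NR == FNR { cnt[$1]++; next }
           { print > (cnt[$1] == 1 ? "target1_unique.txt" : "target2_dups.txt") }' src.csv src.csv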

Scenario:

in my i/p


=================

source---trans----target

in trans use conditions on the constraints:

mod(empno,3) = 1

mod(empno,3) = 2

mod(empno,3) = 0
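The same mod-based routing, sketched in awk just for illustration (empno assumed to be the first comma-separated field; file names are placeholders):

awk -F',' '{
    r = $1 % 3
    if (r == 1)      print > "target1.txt"
    else if (r == 2) print > "target2.txt"
    else             print > "target0.txt"
}' emp.csv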

Scenario:

I'm having i/p



How to Find Out Duplicate Values Using Transformer?

Another way to find the duplicate values can be using a Sorter stage before the Transformer.

In the sorter: make Cluster Key Change = TRUE on the key; then in the Transformer filter the output on the basis of the value of the cluster key change column, which can be put in a stage variable.

Scenarios_Unix:

1) Convert single column to single row:
Input: filename: try

    R)?R$DR$D?ABA443?CA74DR?4DRD43R;43SB3?4DAR3R

    R$$E$77$?A44RA7

    Output:R)?R$D R$D?AB A443?CA7 4DR?4D RD43 R;43SB3?4D AR3R R$$ E$77$?A44RA7

Command: cat try | awk '{printf "%s ",$1}'

2) Print the list of employees in the Technology department:

Now the department name is available as a fourth field, so we need to check if $4 matches the string "Technology"; if yes, print the line.

Command: $ awk '$4 ~/Technology/' employee.txt

200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
500  Randy   DBA        Technology  $6,000


The operator ~ is for comparing with the regular expressions. If it matches, the default action, i.e. print the whole line, will be performed.

3) Convert single column to multiple columns:

For eg: the input file contains a single column with 96 rows; then the output should be the single-column data converted to multiples of 12 columns, i.e. 12 columns x 8 rows, with field separator fs = ';'.

Script:

#!/bin/sh
# convert a single-column input_file into rows of 12 fields separated by ';'
rows=`cat input_file | wc -l`
cols=12
fs=';'
awk -v r=$rows -v c=$cols -v t="$fs" '
{
    printf "%s", $0
    if (NR % c == 0) printf "\n"; else printf "%s", t
}
END { if (NR % c != 0) printf "\n" }
' input_file > output_file

4) Last field print:

input: a=/Data/files/20102011.csv
output: 20102011.csv

Command: echo $a | awk -F'/' '{print $NF}'

5) Count no. of fields in file1:
file1: a, b, c, d, 1, 2, man, fruit
Command: cat file1 | awk 'BEGIN {FS=","} {print NF}'

and you will get the output as: 8

6) Find ip address in unix server:
Command: grep -i your_hostname /etc/hosts

7) Replace the word corresponding to the search pattern:


    >cat file

    the black cat was chased by the brown dog.

    the black cat was not chased by the brown dog.

    >sed -e '/not/s/black/white/g' file

    the black cat was chased by the brown dog.

    the white cat was not chased by the brown dog.

8) Below I have shown the demo for the ASCII and character conversions.

Ascii value of a character: it can be done in 2 ways:

1. printf "%d" "'A"
2. echo "A" | tr -d "\n" | od -An -t d1

Character value from Ascii: awk -v char=65 'BEGIN { printf "%c\n", char; exit }'

9) Input file:

crmplp1 cmis941 online
cmis942 offline
crmplp2 cmis942 online
cmis943 offline
crmplp3 cmis943 online
cmis941 offline

Output:
crmplp1 cmis941 online cmis942 offline
crmplp2 cmis942 online cmis943 offline
crmplp3 cmis943 online cmis941 offline

Command: awk 'ORS=NR%2?FS:RS' file

10) A variable can be used in AWK:

awk -F"," -v var="," '{print $1 var $2}' filename

11) Search pattern and use a special character in the sed command:


sed -e '


1. How to display the 10th line of a file?
head -10 filename | tail -1

2. How to remove the header from a file?
sed -i '1 d' filename

3. How to remove the footer from a file?
sed -i '$ d' filename

4. Write a command to find the length of a line in a file?
The below command can be used to get a line from a file.
sed -n '<n> p' filename
We will see how to find the length of the 10th line in a file:
sed -n '10 p' filename | wc -c

5. How to get the nth word of a line in Unix?
cut -f<n> -d' '

6. How to reverse a string in unix?
echo "java" | rev

7. How to get the last word from a line in a Unix file?
echo "unix is good" | rev | cut -f1 -d' ' | rev

8. How to replace the n-th line in a file with a new line in Unix?
sed -i'' '10 d' filename                       # d stands for delete
sed -i'' '10 i new inserted line' filename     # i stands for insert

9. How to check if the last command was successful in Unix?
echo $?

10. Write a command to list all the links from a directory?
ls -lrt | grep "^l"

11. How will you find which operating system your system is running on in UNIX?
uname -a

12. Create a read-only file in your home directory?
touch file; chmod 400 file


13. How do you see command line history in UNIX?
The 'history' command can be used to get the list of commands that we have executed.

14. How to display the first 20 lines of a file?
By default, the head command displays the first 10 lines from a file. If we change the option of head, then we can display as many lines as we want.
head -20 filename
An alternative solution is using the sed command:
sed '21,$ d' filename
The d option here deletes the lines from 21 to the end of the file.

15. Write a command to print the last line of a file?
The tail command can be used to display the last lines from a file.
tail -1 filename
Alternative solutions are:
sed -n '$ p' filename
awk 'END{print $0}' filename

16. How do you rename the files in a directory with new as suffix?
ls -lrt | grep '^-' | awk '{print "mv "$9" "$9".new"}' | sh

17. Write a command to convert a string from lower case to upper case?
echo "apple" | tr [a-z] [A-Z]

18. Write a command to convert a string to Initcap.
echo apple | awk '{print toupper(substr($1,1,1)) tolower(substr($1,2))}'

19. Write a command to redirect the output of the date command to multiple files?
The tee command writes the output to multiple files and also displays the output on the terminal.
date | tee -a file1 file2 file3

20. How do you list the hidden files in the current directory?
ls -a | grep '^\.'

21. List out some of the Hot Keys available in bash shell?

Ctrl+l - Clears the Screen.

Ctrl+r - Does a search in previously given commands in shell.

Ctrl+u - Clears the typing before the hotkey.

Ctrl+a - Places the cursor at the beginning of the command at the shell.

Ctrl+e - Places the cursor at the end of the command at the shell.


Ctrl+d - Kills the shell.

Ctrl+z - Places the currently running process into the background.

22. How do you make an existing file empty?
cat /dev/null > filename


27. How do you write the contents of 3 files into a single file?
cat file1 file2 file3 > file4

28. How to display the fields in a text file in reverse order?
awk 'BEGIN {ORS=""} { for(i=NF; i>0; i--) print $i," "; print "\n"}' filename

29. Write a command to find the sum of bytes (size of file) of all files in a directory.
ls -l | grep '^-' | awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'

30. Write a command to print the lines which end with the word "end"?
grep 'end$' filename
The '$' symbol specifies the grep command to search for the pattern at the end of the line.

31. Write a comm