Datastage Imp Doc


Information Integration Solutions Center of Excellence

Parallel Framework Standard Practices

Investigate, Design, Develop: Data Flow Job Development

Prepared by IBM Information Integration Solutions Center of Excellence

July 17, 2006

CONFIDENTIAL, PROPRIETARY, AND TRADE SECRET NATURE OF ATTACHED DOCUMENTS

This document is Confidential, Proprietary and Trade Secret Information (Confidential Information) of IBM, Inc. and is provided solely for the purpose of evaluating IBM products with the understanding that such Confidential Information will be disclosed only to those who have a need to know. The attached documents constitute Confidential Information as they include information relating to the business and/or products of IBM (including, without limitation, trade secrets, technical, business, and financial information) and are trade secret under the laws of the State of Massachusetts and the United States.

Copyright 2006 IBM Information Integration Solutions. All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of IBM.

While every precaution has been taken in the preparation of this document to reflect current information, IBM assumes no responsibility for errors or omissions or for damages resulting from the use of information contained herein.


Document Goals

Intended Use: This document presents a set of standard practices, methodologies, and examples for IBM WebSphere DataStage Enterprise Edition (DS/EE) on UNIX, Windows, and USS. Except where noted, this document is intended to supplement, not replace, the installation documentation.

Target Audience: The primary audience for this document is DataStage developers who have been trained in Enterprise Edition. Information in certain sections may also be relevant for Technical Architects, System Administrators, and Developers.

Product Version: This document is intended for the following product releases:
- WebSphere DataStage Enterprise Edition 7.5.1 (UNIX, USS)
- WebSphere DataStage Enterprise Edition 7.5x2 (Windows)

Document Revision History

Date                 Rev.
April 16, 2004       1.0
June 30, 2005        2.0
December 9, 2005     3.0
January 31, 2006     3.1
February 17, 2006    4.0
March 10, 2006       4.1
March 31, 2006       4.2
May 08, 2006         4.3
July 17, 2006        5.0

Description: Initial Services release. First version based on separation of EE BP into four separate documents, merged new material on Remote DB2, configuring DS for multiple users. Significant updates, additional material. Updates based on review feedback. Added patch install checklist item (7.10) and Windows 7.5x2 patch list. Significant updates, new material on ETL overview, data types, naming standards, USS, design standards, database stage usage, database data type mappings, updated styles and use of cross-references. Corrected missing Figure 9. Added new material on establishing job boundaries, balancing job resource requirements / startup time with required data volume and processing windows, and minimizing number of runtime processes. Moved Baselining Performance discussion to Performance Tuning BP. Expanded performance tuning section. Removed Architecture Overview (now a separate document). Expanded file stage recommendations. Updated directory naming standards for consistency with DS/EE Automation Standards and Toolkit. Segmented content into Red Book and Standards. Clarified terminology (Best Practices). Incorporated additional field feedback.

Document Conventions

This document uses the following conventions:

Convention: Usage

Bold: In syntax, bold indicates commands, function names, keywords, and options that must be input exactly as shown. In text, bold indicates keys to press, function names, and menu selections.

Italic: In syntax, italic indicates information that you supply. In text, italic also indicates UNIX commands and options, file names, and pathnames.

Plain: In text, plain indicates Windows NT commands and options, file names, and pathnames.

Bold Italic: Indicates important information.

Lucida Console: Lucida Console text indicates examples of source code and system output.

Lucida Bold: In examples, Lucida Console bold indicates characters that the user types or keys the user presses.

Lucida Blue: In examples, Lucida Blue will be used to illustrate the operating system command line prompt.

Right arrow: A right arrow between menu commands indicates you should choose each command in sequence. For example, Choose File → Exit means you should choose File from the menu bar, and then choose Exit from the File pull-down menu.

This line continues: The continuation character is used in source code examples to indicate a line that is too long to fit on the page, but must be entered as a single line on screen.

The following are also used:
- Syntax definitions and examples are indented for ease in reading.
- All punctuation marks included in the syntax (for example, commas, parentheses, or quotation marks) are required unless otherwise indicated.
- Syntax lines that do not fit on one line in this manual are continued on subsequent lines. The continuation lines are indented. When entering syntax, type the entire syntax entry, including the continuation lines, on the same input line.
- Text enclosed in parentheses and underlined (like this) following the first use of proper terms will be used instead of the proper term.
- Interaction with our example system will usually include the system prompt (in blue) and the command, most often on 2 or more lines. If appropriate, the system prompt will include the user name and directory for context. For example:

    %etl_node%:dsadm /usr/dsadm/Ascential/DataStage >
    /bin/tar cvf /dev/rmt0 /usr/dsadm/Ascential/DataStage/Projects

Parallel Framework Red Book: Data Flow Job Design

July 17, 2006


Table of Contents

1 DATA INTEGRATION OVERVIEW
  1.1 JOB SEQUENCES
  1.2 JOB TYPES
2 STANDARDS
  2.1 DIRECTORY STRUCTURES
  2.2 NAMING CONVENTIONS
  2.3 DOCUMENTATION AND ANNOTATION
  2.4 WORKING WITH SOURCE CODE CONTROL SYSTEMS
  2.5 UNDERSTANDING A JOB'S ENVIRONMENT
3 DEVELOPMENT GUIDELINES
  3.1 MODULAR DEVELOPMENT
  3.2 ESTABLISHING JOB BOUNDARIES
  3.3 JOB DESIGN TEMPLATES
  3.4 DEFAULT JOB DESIGN
  3.5 JOB PARAMETERS
  3.6 PARALLEL SHARED CONTAINERS
  3.7 ERROR AND REJECT RECORD HANDLING
  3.8 COMPONENT USAGE
4 DATASTAGE DATA TYPES