B308 Tpump


  • 8/22/2019 B308 Tpump

    1/23

    Module 8: TPump

    After completing this module, you will be able to:

    State the capabilities and limitations of TPump.

    Describe TPump commands and parameters.

    Prepare a TPump script.

    TPump

    Allows near real-time updates from transactional systems into the warehouse.

    Performs INSERT, UPDATE, and DELETE operations, or a combination, from the same source. Up to 63 DML statements can be included for one IMPORT task.

    Alternative to MultiLoad for low-volume batch maintenance of large databases; replacement for BulkLoad.

    Allows target tables to:

    Have secondary indexes and Referential Integrity constraints.

    Be MULTISET or SET.

    Be populated or empty.

    Have triggers, invoked as necessary.

    Allows conditional processing.

    Supports automatic restarts; uses Support Environment.

    No session limit; use as many sessions as necessary.

    No limit to the number of concurrent instances.

    Uses row-hash locks, allowing concurrent updates on the same table.

    Can always be stopped and locks dropped with no ill effect.

    Designed for highest possible throughput.

    User can specify how many updates occur minute by minute; this can be changed as the job runs.

    TPump Limitations

    Use of SELECT is not allowed.

    Concatenation of data files is not supported.

    Exponential operators are not allowed.

    Aggregate operators are not allowed.

    Arithmetic functions are not supported.

    There is a limit of four IMPORT commands within a single TPump "load" task.

    When using TPump with dates before 1900 or after 1999, the year portion of the date must be represented by four numerals (yyyy).

    By default, a year given as two numerals (yy) is interpreted as falling in the 20th century. The correct date format must be specified at the time of table creation.
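    The two-numeral default can be illustrated with a small Python sketch. The function name `expand_two_digit_year` is hypothetical, and the code only models the rule stated on this slide (yy read as 19yy); it is not TPump code.

```python
def expand_two_digit_year(yy: int) -> int:
    """Model the default described above: a two-numeral year is read as 19yy."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-numeral year (0-99)")
    return 1900 + yy


# '03' is read as 1903, so a date in 2003 needs the four-numeral (yyyy) form.
print(expand_two_digit_year(3))   # 1903
print(expand_two_digit_year(99))  # 1999
```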

    .BEGIN LOAD Statement

    Many of the .BEGIN parameters are comparable to those for MultiLoad.

    .BEGIN LOAD
        SESSIONS max [min]               (required)
        ERRORTABLE tablename             (defaults to jobname_ET)
        ERRLIMIT errcount [errpercent]
        CHECKPOINT frequency             (default is 15 minutes)
        TENACITY hours                   (default is 4)
        SLEEP minutes                    (default is 6)

    However, TPump has numerous parameters on the .BEGIN LOAD statement that are unique to TPump.

        SERIALIZE ON | OFF               (default is ON if UPSERT)
        PACK number                      (default is 20, max is 600)
        PACKMAXIMUM                      (use maximum pack factor)
        RATE number                      (default is unlimited)
        LATENCY number                   (range is 10 to 600 seconds)
        NOMONITOR                        (default is monitoring on)
        ROBUST ON | OFF                  (default is ON)
        MACRODB dbname                   (default is the log table's database) ;

    TPump Specific Parameters

    Specific TPump .BEGIN LOAD parameters are:

    SERIALIZE ON | OFF
        ON guarantees that operations on a given key combination (row) occur serially. Used only when a primary index is specified. The KEY option must be specified if SERIALIZE is ON.

    PACK statements
        Number of statements to pack into a multiple-statement request.

    PACKMAXIMUM
        Requests the maximum pack factor instead of a fixed number of statements per multiple-statement request.

    RATE statement rate
        Initial maximum rate at which statements are sent per minute. If the statement rate is zero or unspecified, the rate is unlimited.

    LATENCY seconds
        Number of seconds before a partial buffer is sent to the database.

    NOMONITOR
        Prevents TPump from checking for statement rate changes from, or updating status information for, the TPump Monitor.

    ROBUST ON | OFF
        OFF signals TPump to use simple restart logic; TPump will begin where the last checkpoint occurred.

    MACRODB dbname
        Indicates a database to contain any macros used by TPump.
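    How PACK and RATE interact is simple arithmetic, sketched below in Python. The function name `requests_per_minute` is hypothetical, and the sketch assumes every request is fully packed; a LATENCY timeout can send partial buffers, so the real request count may be higher.

```python
def requests_per_minute(rate: int, pack: int) -> float:
    """Statements per minute divided by statements per request.

    rate: RATE value (statements per minute); zero means unlimited.
    pack: PACK value (statements per multiple-statement request).
    """
    if pack <= 0:
        raise ValueError("pack must be positive")
    if rate <= 0:
        return float("inf")  # zero or unspecified RATE: no throttling
    return rate / pack


# PACK 40 with RATE 4800 allows at most 120 full requests per minute.
print(requests_per_minute(4800, 40))  # 120.0
```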

    .BEGIN LOAD PACK

    PACK specifies the number of statements to pack into a multi-statement request.

    Improves network/channel efficiency by reducing the number of sends and receives between the application and Teradata.

    Increasing the PACK factor improves throughput only up to a certain level.

    Restrictions to consider:

    64K message size limit

    TPump limit of 600 statements

    Teradata USING clause limit of 2560 columns (from 507)

    Teradata Plastic Steps limit
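    The first three restrictions can be combined into a rough upper bound on the pack factor. The Python sketch below is an estimate under stated assumptions, not TPump's own calculation: it assumes each packed statement contributes all of its fields to the USING clause and its full record length to the message, and it ignores SQL text overhead and the Plastic Steps limit. The function name and the sample record size are hypothetical.

```python
TPUMP_MAX_PACK = 600            # TPump limit on statements per request
USING_COLUMN_LIMIT = 2560       # Teradata USING clause column limit
MESSAGE_SIZE_LIMIT = 64 * 1024  # 64K message size limit, in bytes


def estimate_max_pack(fields_per_record: int, record_bytes: int) -> int:
    """Return the largest pack factor that fits all three limits.

    A result of 0 means that, under this model, not even one statement fits.
    """
    if fields_per_record <= 0 or record_bytes <= 0:
        raise ValueError("fields_per_record and record_bytes must be positive")
    by_columns = USING_COLUMN_LIMIT // fields_per_record
    by_size = MESSAGE_SIZE_LIMIT // record_bytes
    return min(TPUMP_MAX_PACK, by_columns, by_size)


# A hypothetical 9-field, 76-byte record is limited by the USING clause.
print(estimate_max_pack(9, 76))  # 284
```

    With narrow records the 600-statement limit is reached first; with wide records the column or message-size limit applies sooner.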

    .BEGIN LOAD SERIALIZE OFF

    With SERIALIZE OFF, transactions are processed in the order they are encountered and placed in the first available buffer. Buffers are sent to PE sessions, and different PEs process the data independently of other PEs.

    SERIALIZE OFF does not guarantee the order in which transactions are processed.

    [Figure: Rows from the transaction file fill TPump buffers in arrival order; the buffers are sent through Session 1 and Session 2 to Teradata (AMP 0, AMP 1, AMP 2, AMP 3, ... AMP N).]

    Transaction file (PI, Time):
        01 8:00   03 8:01   02 8:02   01 8:03   04 8:04   05 8:05   03 8:06
        01 8:07   08 8:08   06 8:09   07 8:10   01 8:11   03 8:12   02 8:13

    Session 1:
        01 8:00   03 8:01   02 8:02   01 8:03   08 8:08   06 8:09   07 8:10   01 8:11

    Session 2:
        04 8:04   05 8:05   03 8:06   01 8:07   03 8:12   02 8:13

    Figure callout: "This set of transactions may be processed first."

    .BEGIN LOAD SERIALIZE ON

    SERIALIZE ON can eliminate lock delays or potential deadlocks caused by primary index collisions, improving performance.

    SERIALIZE ON guarantees both that input record order is preserved and that all records with the same PI value are handled in the same session. It is recommended to specify the PI column(s) as KEY.

    KEY fields determine the PE session to which TPump sends the transaction.

    [Figure: Rows from the transaction file are assigned to TPump buffers by KEY (PI) value, so every row with a given PI travels through the same session to Teradata (AMP 0, AMP 1, AMP 2, AMP 3, ... AMP N).]

    Transaction file (PI, Time):
        01 8:00   03 8:01   02 8:02   01 8:03   04 8:04   05 8:05   03 8:06
        01 8:07   08 8:08   06 8:09   07 8:10   01 8:11   03 8:12   02 8:13

    Session 1:
        01 8:00   02 8:02   01 8:03   01 8:07   08 8:08   06 8:09   01 8:11   02 8:13

    Session 2:
        03 8:01   04 8:04   05 8:05   03 8:06   07 8:10   03 8:12
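    The routing idea in the figure above can be modeled in a few lines of Python. This is a conceptual sketch only: `route_by_key` is a hypothetical name, and `zlib.crc32` is a stand-in hash chosen for illustration, not the function TPump uses, so the session each PI lands in may differ from the figure. What the sketch does preserve is the property the slide describes: every row with the same KEY value goes to the same session, in arrival order.

```python
import zlib
from collections import defaultdict

# (PI, time) pairs from the transaction file in the figure above.
TRANSACTIONS = [
    ("01", "8:00"), ("03", "8:01"), ("02", "8:02"), ("01", "8:03"),
    ("04", "8:04"), ("05", "8:05"), ("03", "8:06"), ("01", "8:07"),
    ("08", "8:08"), ("06", "8:09"), ("07", "8:10"), ("01", "8:11"),
    ("03", "8:12"), ("02", "8:13"),
]


def route_by_key(transactions, session_count):
    """Assign each record to a session chosen from a hash of its KEY value.

    zlib.crc32 is an assumed stand-in hash, not TPump's hash function.
    """
    sessions = defaultdict(list)
    for key, time in transactions:
        session = zlib.crc32(key.encode("ascii")) % session_count
        sessions[session].append((key, time))  # arrival order is kept
    return dict(sessions)


if __name__ == "__main__":
    for session, rows in sorted(route_by_key(TRANSACTIONS, 2).items()):
        print(session, rows)
```

    Because the session is a pure function of the key, two updates to the same row can never race each other in different sessions, which is why SERIALIZE ON avoids the lock collisions described above.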

    .BEGIN LOAD ROBUST ON

    ROBUST ON is the default for all TPump jobs.

    This option avoids re-applying rows that have already been processed in the event of a restart.

    Causes a row to be written to the log table each time a buffer has successfully completed its updates.

    The larger the TPump PACK factor, the less overhead involved in this activity.

    These rows are deleted from the log when a checkpoint is taken.

    ROBUST ON is recommended for these specific conditions:

    INSERTs into MULTISET tables, as such tables will allow re-insertion of the same rows multiple times.

    When UPDATEs are based on calculations or percentage increases.

    If PACK factors are large, and applying and rejecting duplicates after a restart would be time-consuming.

    If data is time-stamped at the time it is inserted into the database.

    ROBUST ON is always a good idea for TPump jobs that read from queues. It keeps duplicates from being re-inserted into the table in the event of a restart.
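    The difference between the two restart modes can be seen in a toy Python simulation. This is a simplified model under stated assumptions, not TPump's restart code: the table is a plain list (so, like a MULTISET table, it accepts duplicates), the restart log is a set of completed buffer numbers, and the function name `run_with_restart` is hypothetical.

```python
def run_with_restart(buffers, checkpoint_at, crash_after, robust):
    """Apply buffers to a MULTISET-like table, stop once, then restart.

    checkpoint_at: buffers applied when the last checkpoint was taken
    crash_after:   buffers applied when the job stopped
    robust:        True models ROBUST ON, False models ROBUST OFF
    """
    if not 0 <= checkpoint_at <= crash_after <= len(buffers):
        raise ValueError("need 0 <= checkpoint_at <= crash_after <= len(buffers)")
    table = []
    completed_log = set()  # models the per-buffer rows in the log table

    # First run: apply buffers until the job stops.
    for i in range(crash_after):
        table.extend(buffers[i])
        if robust and i >= checkpoint_at:
            completed_log.add(i)  # only rows written since the checkpoint remain

    # Restart: both modes resume at the last checkpoint.
    for i in range(checkpoint_at, len(buffers)):
        if robust and i in completed_log:
            continue  # already applied before the stop, so skip it
        table.extend(buffers[i])
    return table


BUFFERS = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(run_with_restart(BUFFERS, checkpoint_at=1, crash_after=3, robust=True))
# [1, 2, 3, 4, 5, 6, 7, 8]
print(run_with_restart(BUFFERS, checkpoint_at=1, crash_after=3, robust=False))
# [1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8]
```

    In the second run, rows 3 through 6 are applied twice because the simple restart logic replays everything after the checkpoint; this is the duplicate-insert case the slide warns about for MULTISET tables.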

    Sample TPump Script (1 of 2)

    /* Restart log table and logon. */
    .LOGTABLE restart_log_tpp;
    .LOGON tdpid/username,password;

    /* 4 sessions, 40 statements per request, at most 4800 statements per minute. */
    .BEGIN LOAD SESSIONS 4 SERIALIZE OFF
                PACK 40 RATE 4800
                ERRORTABLE Errors_tpp ERRLIMIT 50 ;

    /* One layout covers three record types. table_code selects the type, */
    /* and each type's fields start again at position 2.                  */
    .LAYOUT layout12;
    .FIELD table_code         1 CHAR(1);
    .FIELD A_Account_Number   2 INTEGER;
    .FIELD A_Number           * INTEGER;
    .FIELD A_Street           * CHAR(25);
    .FIELD A_City             * CHAR(20);
    .FIELD A_State            * CHAR(2);
    .FIELD A_Zip_Code         * INTEGER;
    .FIELD A_Balance_Forward  * DECIMAL(10,2);
    .FIELD A_Balance_Current  * DECIMAL(10,2);
    .FIELD C_Customer_Number  2 INTEGER;
    .FIELD C_Last_Name        * CHAR(30);
    .FIELD C_First_Name       * CHAR(20);
    .FIELD C_Social_Security  * INTEGER;
    .FIELD T_Trans_Number     2 INTEGER;
    .FIELD T_Trans_Date       * CHAR(10);
    .FIELD T_Account_Number   * INTEGER;
    .FIELD T_Trans_ID         * CHAR(4);
    .FIELD T_Trans_Amount     * DECIMAL(10,2);

    Sample TPump Script (2 of 2)

    /* One DML label per target table. */
    .DML LABEL lns_Account;
    INSERT INTO Accounts (account_number, number, street, city, state, zip_code,
                          balance_forward, balance_current)
    VALUES (:A_Account_Number, :A_Number, :A_Street, :A_City, :A_State, :A_Zip_Code,
            :A_Balance_Forward, :A_Balance_Current);

    .DML LABEL lns_Trans;
    INSERT INTO Trans (trans_number, trans_date, account_number, trans_id, trans_amount)
    VALUES (:T_Trans_Number, :T_Trans_Date, :T_Account_Number, :T_Trans_ID, :T_Trans_Amount);

    .DML LABEL lns_Customer;
    INSERT INTO Customer (customer_number, last_name, first_name, social_security)
    VALUES (:C_Customer_Number, :C_Last_Name, :C_First_Name, :C_Social_Security);

    /* Each input record is applied to the label that matches its table_code. */
    .IMPORT INFILE datafile1 LAYOUT layout12
        APPLY lns_Account  WHERE table_code = 'A'
        APPLY lns_Trans    WHERE table_code = 'T'
        APPLY lns_Customer WHERE table_code = 'C';

    .IMPORT INFILE datafile2 LAYOUT layout12
        APPLY lns_Account  WHERE table_code = 'A'
        APPLY lns_Trans    WHERE table_code = 'T'
        APPLY lns_Customer WHERE table_code = 'C';

    .END LOAD;
    .LOGOFF;

    TPump Compared with MultiLoad

    MultiLoad performance improves as the volume of changes increases.

    TPump does better on relatively low volumes of changes.

    TPump improves performance via a multiple statement request.

    TPump uses macros to modify tables rather than the actual DML commands.

    Example macro name: M2000216_105642_01_0001

    MultiLoad uses the DML statements.

    TPump uses row hash locking to allow for concurrent read and write access to target tables. It can be stopped with target tables fully accessible.

    In Phase 4, MultiLoad locks tables for write access until it completes.

    Additional TPump Statements

    DATABASE     Changes the default database qualification for all DML statements.

    EXEC(UTE)    Specifies a user-created macro for execution. The named macro is resident in the Teradata database.

    Syntax:

        EXECUTE [database.]macro_name { UPDATE/UPD | INSERT/INS | DELETE/DEL | UPSERT/UPS } ;

        DATABASE database ;

    Commands and statements in common with MultiLoad:

        ACCEPT     IMPORT                   RUN
        DELETE     INSERT                   SET
        DISPLAY    LAYOUT                   SYSTEM
        DML        LOGON                    TABLE
        FIELD      LOGOFF                   UPDATE
        FILLER     LOGIF / ELSE / ENDIF     ROUTE

    Invoking TPump

    Network-Attached Systems:        tpump [PARAMETERS] < scriptname > outfilename

    Channel-Attached MVS Systems:    // EXEC TDSTPUMP PARM= [PARAMETERS]

    Channel-Attached VM Systems:     EXEC TPUMP [PARAMETERS]

    Channel Parameter      Network Parameter    Description

    BRIEF                  -b                   Reduces print output at runtime to the least information required to determine success or failure.

    CHARSET=charsetname    -c charsetname       Specifies a character set or its code. Examples are EBCDIC, ASCII, or Kanji sets.

    ERRLOG=filename        -e filename          Alternate file specification for error messages; produces a duplicate record.

    "tpump command"        -r 'tpump cmd'       Signifies the start of a TPump job; usually a RUN FILE command that specifies the script file.

    MACROS                 -m                   Keeps macros that were created during the job run.

    VERBOSE                -v                   Additional statistical data in addition to the regular statistics.

                           < scriptname         Name of the file that contains TPump commands and SQL statements.

                           > outfilename        Name of the output file for TPump messages.

    TPump Statistics

    .                                        IMPORT 1     Total thus far
    .                                        =========    ===========
    Candidate records considered:.....            200            200
    Apply conditions satisfied:.......            200            200
    Candidate records not applied:.......           0              0
    Candidate records rejected:..........           0              0

    ** Statistics for Apply Label : UPS_ACCOUNT
    Type   Database   Table or Macro Name   Activity
    U      TLJC25     Accounts              100
    I      TLJC25     Accounts              100

    **** 17:33:50 UTY0821 Error table TLJC25.errtable_tpp is EMPTY, dropping table.
    0018 .LOGOFF;
    ========================================================================
    =                         Logoff/Disconnect                            =
    ========================================================================
    **** 17:34:08 UTY6216 The restart log table has been dropped.
    **** 17:34:08 UTY6212 A successful disconnect was made from the RDBMS.
    **** 17:34:08 UTY2410 Total processor time used = '2.43 Seconds'
    .       Start : 17:33:13 - TUE MAY 06, 2003
    .       End   : 17:34:08 - TUE MAY 06, 2003
    .       Highest return code encountered = '0'.

    Note: These statistics are not for the example TPump job shown earlier in this module.

    TPump Monitor

    Tool to control and track TPump imports.

    The table SysAdmin.TPumpStatusTbl is updated once a minute.

    Alter the statement rate on an import by updating this table using macros.

    Use macros and views to access this table.

    DBA Tools

    View:     SysAdmin.TPumpStatus - allows DBAs to view all of the TPump jobs.

    Macro:    SysAdmin.TPumpUpdateSelect - allows DBAs to manage individual TPump jobs.

    User Tools

    View:     SysAdmin.TPumpStatusX - allows users to view their own TPump jobs.

    Macro:    TPumpMacro.UserUpdateSelect - allows users to manage their own jobs.

    INMODs

    INMOD to TPump return codes, after a READ call:

        0        Valid data record in buffer; EOF not reached. The length field reflects the correct length of the output record. If an input record was supplied to the INMOD from TPump and is to be skipped, the length field should be set to zero. If no input record was supplied, setting the length to zero indicates EOF.

        Non-0    INMOD indicates an end-of-file condition.

    INMOD to TPump return codes, after a non-READ call:

        0    Indicates a successful operation.

        1    Indicates a processing error; TPump will terminate.

    TPump to INMOD additional return codes:

        0    Calling for the first time. INMOD should open files to read data. TPump expects a record.

        1    TPump expects a record.

        2    TPump has been restarted. INMOD should position itself to the last checkpoint. There may be repositioning information in the data buffer.

        3    Request for INMOD to take a checkpoint. The INMOD should return any information (up to 100 bytes) needed to reposition itself. TPump saves this in the Restart Log Table.

        4    Request for INMOD to reposition itself at the last checkpoint. Repositioning information may be in the data buffer.

        5    Request for INMOD to wrap up at termination.

        6    Request for INMOD to initialize and receive record.

        7    Request for INMOD to receive next record.

    [Figure: TPump, INMOD, Data, Teradata]

    Note: TPump can also use the MultiLoad INMOD return codes.
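    The call-and-return protocol in the tables above can be modeled with a small Python class. This is a conceptual sketch only: real INMODs are compiled routines called by TPump, and the class name `FileInmodModel`, its `call` method, and the in-memory record list are all hypothetical. As a simplification, the checkpoint position is kept in the object rather than returned to TPump in the data buffer, and codes 6 and 7 are not modeled.

```python
class FileInmodModel:
    """Toy model of an INMOD that serves records from an in-memory list."""

    def __init__(self, records):
        self.records = list(records)
        self.position = 0    # index of the next record to hand to TPump
        self.checkpoint = 0  # position saved at the last checkpoint

    def call(self, code):
        """Return (return_code, record) for a TPump-to-INMOD call code."""
        if code == 0:        # first call: open the input, then supply a record
            self.position = 0
            return self._read()
        if code == 1:        # TPump expects the next record
            return self._read()
        if code == 3:        # take a checkpoint
            self.checkpoint = self.position
            return 0, None
        if code in (2, 4):   # restart or reposition to the last checkpoint
            self.position = self.checkpoint
            return 0, None
        if code == 5:        # wrap up at termination
            return 0, None
        return 1, None       # codes 6 and 7 are not modeled: report an error

    def _read(self):
        if self.position >= len(self.records):
            return 1, None   # non-zero after a READ call signals end of file
        record = self.records[self.position]
        self.position += 1
        return 0, record


inmod = FileInmodModel(["rec-a", "rec-b", "rec-c"])
print(inmod.call(0))  # (0, 'rec-a')
print(inmod.call(3))  # (0, None)   checkpoint taken after one record
print(inmod.call(1))  # (0, 'rec-b')
print(inmod.call(4))  # (0, None)   reposition to the checkpoint
print(inmod.call(1))  # (0, 'rec-b')
```

    After the reposition call, the record read since the checkpoint is supplied again, which is the behavior a restart relies on.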

    Application Utility Checklist

    Feature                    BTEQ   FastLoad       FastExport   MultiLoad     TPump
    DDL Functions              ALL    LIMITED        No           ALL           ALL
    DML Functions              ALL    INSERT         SELECT       INS/UPD/DEL   INS/UPD/DEL
    Multiple DML               Yes    No             Yes          Yes           Yes
    Multiple Tables            Yes    No             Yes          Yes           Yes
    Multiple Sessions          Yes    Yes            Yes          Yes           Yes
    Protocol Used              SQL    FASTLOAD       EXPORT       MULTILOAD     SQL
    Conditional Expressions    Yes    No             Yes          Yes           Yes
    Arithmetic Calculations    Yes    No             Yes          Yes           No
    Data Conversion            Yes    1 per column   Yes          Yes           Yes
    Error Files                No     Yes            No           Yes           Yes
    Error Limits               No     Yes            No           Yes           Yes
    User-written Routines      No     Yes            Yes          Yes           Yes

    Summary

    Allows near real-time updates from transactional systems into the warehouse.

    Performs INSERTs, UPDATEs, and DELETEs to more than 60 tables at a time.

    Alternative to MultiLoad for low-volume batch maintenance of large databases; replacement for BulkLoad.

    Uses row-hash locks, allowing concurrent updates on the same table.

    Can always be stopped and locks dropped with no ill effect.

    User can specify how many updates occur minute by minute; this can be changed as the job runs.

    No arithmetic functions or file concatenations.

    Review Questions

    Match the item in the first column to its corresponding statement in the second column.

    _____ 1. TPump purpose            A. Query against TPump status table
    _____ 2. MultiLoad purpose        B. Concurrent updates on same table
    _____ 3. Row hash locking         C. Low-volume changes
    _____ 4. PACK                     D. Use to specify how many statements to put in a multi-statement request
    _____ 5. MACRO                    E. Large volume changes
    _____ 6. Statement rate change    F. Used instead of DML

    Module 8: Review Question Answers

    Match the item in the first column to its corresponding statement in the second column.

    __C__ 1. TPump purpose            A. Query against TPump status table
    __E__ 2. MultiLoad purpose        B. Concurrent updates on same table
    __B__ 3. Row hash locking         C. Low-volume changes
    __D__ 4. PACK                     D. Use to specify how many statements to put in a multi-statement request
    __F__ 5. MACRO                    E. Large volume changes
    __A__ 6. Statement rate change    F. Used instead of DML

    Lab Exercises

    Lab Exercise 8-1

    Purpose

    In this lab, you will perform an operation similar to lab 7-2, using TPump instead of MultiLoad. For this exercise, use a PACK of 20 and a RATE of 2400.

    What you need

    Data file (data8_1) created from macro AU.Lab8_1.

    Tasks

    1. Delete all rows from the Accounts table and use the following INSERT/SELECT to create 100 rows of test data:

    INSERT INTO Accounts SELECT * FROM AU.Accounts WHERE Account_Number LT 20024101 ;

    2. Export data to the file data8_1 using the macro AU.lab8_1.

    3. Prepare a TPump script which performs an UPSERT operation (INSERT MISSING UPDATE) on your Accounts table as a single operation. Use the data from data8_1 as input to the UPSERT script. If the row exists, UPDATE the Balance_Current with the appropriate incoming value. If not, INSERT a row into the Accounts table. In your script, be sure to set a statement rate.

    4. Run the script.

    5. Validate your results. TPump should have performed 100 UPDATEs and 100 INSERTs with a final return code of zero.

    Lab Solutions for Lab 8-1

    cat lab813.tpp

    .LOGTABLE Restartlog813_tpp ;
    .LOGON u4455/tljc30,tljc30 ;

    .BEGIN LOAD SESSIONS 4 PACK 40 RATE 4800;

    .LAYOUT Record_Layout_813;
    .FIELD in_accountno   1 INTEGER KEY;
    .FIELD in_number      * INTEGER;
    .FIELD in_street      * CHAR(25);
    .FIELD in_city        * CHAR(20);
    .FIELD in_state       * CHAR(2);
    .FIELD in_zip_code    * INTEGER;
    .FIELD in_balancefor  * DECIMAL (10,2);
    .FIELD in_balancecur  * DECIMAL (10,2);

    /* UPSERT: try the UPDATE first; INSERT the row if it is missing. */
    .DML LABEL Fix_Account DO INSERT FOR MISSING UPDATE ROWS ;
    UPDATE Accounts SET Balance_Current = :in_balancecur
        WHERE Account_Number = :in_accountno ;
    INSERT INTO Accounts VALUES (:in_accountno, :in_number, :in_street, :in_city,
        :in_state, :in_zip_code, :in_balancefor, :in_balancecur);

    .IMPORT INFILE data8_1 LAYOUT Record_Layout_813 APPLY Fix_Account;

    .END LOAD;
    .LOGOFF;

    tpump < lab813.tpp > lab813.out