  • Teradata Utilities: FastLoad

    Reprinted for KV Satish Kumar, [email protected]

    Reprinted with permission as a subscription benefit of Books24x7,http://www.books24x7.com/

  • Table of Contents

    Chapter 3: FastLoad
    Why it is Called "FAST" Load
    How FastLoad Works
    FastLoad Has Some Limits
    Three Key Requirements for FastLoad to Run
    Maximum of 15 Loads
    FastLoad Has Two Phases
    Phase 1: Acquisition
    Phase 2: Application
    FastLoad Commands
    Fastload Sample
    Executing a FastLoad Script
    Another Sample FastLoad Script
    Checkpoints
    Converting Data Types with FastLoad
    A FastLoad Conversion Example
    When You Cannot RESTART FastLoad
    When You Can RESTART FastLoad
    Step Two: Run the FastLoad script
    What Happens When FastLoad Finishes
    You Receive an Outcome Status
    You Receive a Status Report
    You can Troubleshoot
    Restarting FastLoad: A More In-Depth Look
    How the CHECKPOINT Option Works
    Restarting with CHECKPOINT
    Restarting without CHECKPOINT
    Using INMODs with FastLoad

  • Chapter 3: FastLoad

    "Where there is no patrol car, there is no speed limit."
    - Al Capone

    Why it is Called "FAST" Load

    FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a host into empty tables in Teradata. Part of this speed is achieved because it does not use the Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the reasons that it is fast, know that FastLoad was developed to load millions of rows into a table.

    The way FastLoad works can be illustrated by home construction, of all things! Let's look at three scenarios from the construction industry to provide an amazing picture of how the data gets loaded.

    Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot. The fewer barriers there are to deal with, the quicker the new construction can progress. Building custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start with an empty table, like an empty lot, and then populate it with rows of data from another source. Because the target table is empty, this method is typically the fastest way to load data. FastLoad will never attempt to insert rows into a table that already holds data.

    Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of land on which to build a home, but the lot already has a house on it. In this case, the person may determine that it is quicker and more advantageous just to demolish the old house and start fresh from the ground up, allowing for brand new construction. FastLoad also likes this approach to loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure, and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new rows, this process will run much quicker than using MultiLoad to populate the existing table. Another option is to DELETE all the data rows from a populated target table and reload it. This requires less updating of the Data Dictionary than dropping and recreating a table. In either case, the result is a perfectly empty target table that FastLoad requires!

    Scenario Three: Sometimes, a customer has a good house already but wants to remodel a portion of it or to add an additional room. This kind of work takes more time than the work described in Scenario One. Such work requires some tearing out of existing construction in order to build the new section. Besides, the builder never knows what he will encounter beneath the surface of the existing home. So you can easily see that remodeling or additions can take more time than new construction. In the same way, existing tables with data may need to be updated by adding new rows of data. To load populated tables quickly with large amounts of data while maintaining the data currently held in those tables, you would choose MultiLoad instead of FastLoad. MultiLoad is designed for this task but, like renovating or adding onto an existing house, it may take more time.

    How FastLoad Works

    What makes FastLoad perform so well when it is loading millions or even billions of rows? It is because FastLoad assembles data into 64K blocks (64,000 bytes) to load it and can use multiple sessions simultaneously, taking further advantage of Teradata's parallel processing.

    This is different from BTEQ and TPump, which load data at the row level. It has been said, "If you have it, flaunt it!" FastLoad does not like to brag, but it takes full advantage of Teradata's parallel architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module Processor, the software processor in Teradata responsible for reading and writing data to the disks) in order to maximize parallel processing. This advantage is passed along to the FastLoad user in terms of awesome performance. Teradata is the only data warehouse that loads data, processes data and backs up data in parallel.
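    The difference between block-level and row-level loading can be pictured with a small Python sketch. It is only a toy model, not FastLoad's actual code: the function name pack_into_blocks and the 100-byte rows are assumptions made for illustration, and the 64,000-byte limit is simply the figure quoted above.

```python
BLOCK_LIMIT = 64_000  # bytes; the chapter's figure for a "64K" block

def pack_into_blocks(rows, limit=BLOCK_LIMIT):
    """Group encoded rows into blocks whose total size stays within limit."""
    blocks, current, size = [], [], 0
    for row in rows:
        # Start a new block when adding this row would exceed the limit.
        if current and size + len(row) > limit:
            blocks.append(current)
            current, size = [], 0
        current.append(row)
        size += len(row)
    if current:
        blocks.append(current)
    return blocks

rows = [b"x" * 100 for _ in range(1500)]   # 1,500 hypothetical rows of 100 bytes
blocks = pack_into_blocks(rows)
print(len(blocks), [len(b) for b in blocks])   # prints 3 [640, 640, 220]
```

    In this model 1,500 rows travel as three transfers instead of 1,500, which is the intuition behind loading by the block rather than by the row.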

    Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

  • FastLoad Has Some Limits

    There are more reasons why FastLoad is so fast. Many of them are restrictions: work that FastLoad refuses to do cannot slow it down. For instance, can you imagine a sprinter wearing cowboy boots in a race? Of course not! Because of its speed, FastLoad, too, must travel light! This means that it will have limitations that may or may not apply to other load utilities. Remembering this short list will save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a smooth operator!

    Rule #1: No Secondary Indexes are allowed on the Target Table. High performance will only allow FastLoad to utilize Primary Indexes when loading. The reason for this is that Primary (UPI and NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build only data rows. A secondary index is stored in a subtable block and many times on a different AMP from the data row. This would slow FastLoad down and they would have to call it: get ready now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just drop them. You may easily recreate them after completing the load.

    Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential Integrity (RI). This would require too much system checking to enforce referential constraints against a different table. FastLoad only does one table. In short, RI constraints will need to be dropped from the target table prior to the use of FastLoad.

    Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Additionally, these require more than one AMP and more than one table. FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.

    Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that allow duplicate rows, that is, rows in which the values in every column are identical. When FastLoad finds duplicate rows, they are discarded. While FastLoad can load data into a multi-set table, it will not load duplicate rows into one.

    Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP must be repaired before the load process can be restarted. Other than this, FastLoad can recover from system glitches and perform restarts. We will discuss Restarts later in this chapter.

    Rule #6: No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data type conversion is a highly resource-intensive job on the system, which requires a "search and replace" effort. And that takes more time. Enough said!

    Three Key Requirements for FastLoad to Run

    FastLoad can be run from either MVS/Channel (mainframe) or Network (LAN) host. In either case, FastLoad requires three key components. They are a log table, an empty target table and two error tables. The user must name these at the beginning of each script.

    Log Table: FastLoad needs a place to record information on its progress during a load. It uses the table called Fastlog in the SYSADMIN database. This table contains one row for every FastLoad running on the system. In order for your FastLoad to use this table, you need INSERT, UPDATE and DELETE privileges on that table.

    Empty Target Table: We have already mentioned the absolute need for the target table to be empty. FastLoad does not care how this is accomplished. After an initial load of an empty target table, you are now looking at a populated table that will likely need to be maintained.

    If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed and for less interaction with the Data Dictionary, just to delete all the rows from that table and then reload it with fresh data. The DELETE statement should be used for this. But sometimes, as in some of our FastLoad sample scripts below (see Figure 4-1), you want to drop that table and recreate it versus using the DELETE option. To do this, FastLoad has the ability to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in the script is that it is no longer restartable and you are required to rerun the FastLoad from the beginning. Otherwise, we recommend that you have a script for an initial run and a different script for a restart.

    Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition

    AXSMOD: Short for Access Module, this command specifies an input protocol like OLEDB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.

    BEGIN LOADING: This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load. CHECKPOINT and INDICATORS are subordinate commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be discussed below in detail, is not the default for FastLoad. It must be specified in the script. INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies columns with nulls and uses a bitmap at the beginning of each row to show which fields contain a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to identify the null column. The INDICATORS option does not work with VARTEXT.

    CREATE TABLE: This defines the target table and follows normal syntax. If used, this should only be in the initial script. If the table is being loaded, it cannot be created a second time.

    DEFINE: This names the input file and describes the columns in that file and the data types for those columns.

    DELETE: Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because the table is locked.

    DROP TABLE: Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time, this is not a good thing to do within a FastLoad script since it cancels the ability to restart.

    END LOADING: Success! This command indicates the point at which all the data has been transmitted. It tells FastLoad to proceed to Phase II. As mentioned earlier, it can be used as a way to partition data loads to the same table by omitting it from the script. This is true because the table remains empty until after Phase II. Without END LOADING, FastLoad pauses instead of finishing; a later run that includes END LOADING then moves the load on to Phase 2.

    ERRLIMIT: Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). This handy command can be a lifesaver when you are not sure how corrupt the data in the input file is. The more corrupt it is, the greater the clean up effort required after the load finishes. ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows beyond which FastLoad will proceed to abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, none of the rows in the error table are in the data table. That becomes your responsibility.

    HELP: Designed for online use, the Help command provides a list of all possible FastLoad commands along with brief, but pertinent tips for using them.

    HELP TABLE: Builds the table columns list for use in the FastLoad DEFINE statement when the data matches the Create Table statement exactly. In real life this does not happen very often.

    INSERT: This is FastLoad's favorite command! It inserts rows into the target table.

    LOGON/LOGOFF or QUIT: No, this is not the WAX ON / WAX OFF from the movie, The Karate Kid! LOGON simply begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.

    NOTIFY: Just like it sounds, the NOTIFY command is used to inform the job that follows that some event has occurred. It calls a user exit or predetermined activity when such events occur. NOTIFY is often used for detailed reporting on the FastLoad job's success.

    RECORD: Specifies the beginning record number (or with THRU, the ending record number) of the Input data source, to be read by FastLoad. Syntactically, this command is placed before the INSERT keyword. Why would it be used? Well, it enables FastLoad to bypass input records that are not needed, such as tape headers, manual restart, etc. When doing a partition data load, RECORD is used to override the checkpoint.

    SET RECORD: Used only in the LAN environment, this command states in what format the data from the Input file is coming: FastLoad, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, FastLoad.

    SESSIONS: This command specifies the number of FastLoad sessions to establish with Teradata. It is written in the script just before the logon. The default is 1 session per available AMP. The purpose of multiple sessions is to enhance throughput when loading large volumes of data. Too few sessions will stifle throughput. Too many will preclude availability of system resources to other users. You will need to find the proper balance for your configuration.

    SLEEP: Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to logon would cease.

    TENACITY: Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number of sessions it requested to perform its task, or all of the loader slots are currently used. TENACITY specifies the amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for FastLoad is "no tenacity", meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.

    Figure 4-1

    Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be populated should errors occur during the load process. These are required by the FastLoad utility, which will automatically create them for you; all you must do is to name them. The first error table is for any translation errors or constraint violations. For example, a row with a column containing a wrong data type would be reported to the first error table. The second error table is for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence for every UPI. The other occurrences will be stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."
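    The three possible fates of a rejected row described above can be sketched in Python. This is a simplified model under stated assumptions, not how FastLoad is implemented: the function route_rows, the column names and the sample values are all hypothetical, and the real utility detects the two kinds of error in different phases rather than in one pass.

```python
def route_rows(rows, upi_column, converters):
    """Toy model: sort incoming rows into target table, error tables, or discard."""
    loaded, err1, err2 = {}, [], []
    discarded_duplicates = 0
    for row in rows:
        try:
            converted = {col: converters[col](val) for col, val in row.items()}
        except ValueError:
            err1.append(row)              # conversion problem: first error table
            continue
        key = converted[upi_column]
        if key not in loaded:
            loaded[key] = converted       # first occurrence of this UPI value loads
        elif loaded[key] == converted:
            discarded_duplicates += 1     # whole row is a duplicate: counted, not stored
        else:
            err2.append(converted)        # same UPI, different row: second error table
    return loaded, err1, err2, discarded_duplicates

rows = [
    {"emp_no": "1", "salary": "500"},
    {"emp_no": "2", "salary": "abc"},     # cannot convert to an integer
    {"emp_no": "1", "salary": "900"},     # duplicate UPI value, different row
    {"emp_no": "1", "salary": "500"},     # complete duplicate row
]
loaded, err1, err2, discarded = route_rows(
    rows, upi_column="emp_no", converters={"emp_no": int, "salary": int})
print(len(loaded), len(err1), len(err2), discarded)   # prints 1 1 1 1
```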

  • Maximum of 15 Loads

    The Teradata RDBMS will run at most fifteen FastLoads, MultiLoads, or FastExports at the same time. This maximum is determined by a value stored in the DBS Control record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5 concurrent jobs.

    Since these utilities all use large blocks of rows, the system reaches a saturation point where Teradata will protect the amount of system resources available by queuing up the extra load. For example, if the maximum number of jobs is currently running on the system and you attempt to run one more, that job will not be started. You should view this limit as a safety control. Here is a tip for remembering how the load limit applies: If the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time.
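    The memory aid and the limit can be written down as a few lines of Python. This is only an illustration of the rule as stated here: the function names are hypothetical, and the actual admission logic inside Teradata is not described in this chapter.

```python
MAX_CONCURRENT_LOADS = 15   # upper bound stated in the text
DEFAULT_LIMIT = 5           # value at installation, per the text

def counts_toward_limit(utility_name):
    """Apply the memory aid: names containing 'Fast' or 'Load' count."""
    return "Fast" in utility_name or "Load" in utility_name

def can_start(running_jobs, new_job, limit=DEFAULT_LIMIT):
    """Return True if new_job may start given the jobs already running."""
    if not 0 <= limit <= MAX_CONCURRENT_LOADS:
        raise ValueError("limit must be between 0 and 15")
    if not counts_toward_limit(new_job):
        return True
    in_use = sum(1 for job in running_jobs if counts_toward_limit(job))
    return in_use < limit

running = ["FastLoad", "MultiLoad", "FastExport", "TPump", "FastLoad", "MultiLoad"]
print(can_start(running, "FastLoad"))   # prints False: five counted jobs hold the slots
print(can_start(running, "TPump"))      # prints True: TPump is not counted
```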

    FastLoad Has Two Phases

    Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are divided up among the AMPs. Then each AMP tackles its own portion of the task with regard to its portion of the data. This same "divide and conquer" mentality also expedites the load process. FastLoad divides its job into two phases, both designed for speed. They have no fancy names but are typically known simply as Phase 1 and Phase 2. Sometimes they are referred to as Acquisition Phase and Application Phase.

    Phase 1: Acquisition

    The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following:

    When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. This capability is shown below.

    Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows arrive on different AMPs than those on which they would reside had they been hashed.

    So how do the rows get to the correct AMPs where they will permanently reside? Following the receipt of every data block, each AMP hashes its rows based on the Primary Index, and redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP but remain unsorted until Phase 1 is complete.

    Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping industry today. How do the key players in this industry handle a parcel? When the shipping company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way that the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP. This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards them to their true destination. This is like the shipping parcel being sent from a hub city to its destination city!
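    The hub-and-destination flow of Phase 1 can be modeled in a few lines of Python. Everything here is an assumption made for illustration: a four-AMP system, round-robin delivery of blocks standing in for "any AMP", and a CRC32 checksum standing in for Teradata's real hashing algorithm, which this chapter does not describe.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size

def owning_amp(primary_index_value, num_amps=NUM_AMPS):
    """Stand-in for the row hash: map a Primary Index value to its AMP."""
    return zlib.crc32(str(primary_index_value).encode()) % num_amps

def phase1(blocks, num_amps=NUM_AMPS):
    """Deliver blocks to AMPs without regard to ownership, then redistribute rows."""
    received = [[] for _ in range(num_amps)]
    for i, block in enumerate(blocks):
        received[i % num_amps].extend(block)          # the "hub": whichever AMP is next
    worktables = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for row in amp_rows:                          # each AMP hashes what it received
            worktables[owning_amp(row["emp_no"], num_amps)].append(row)
    return worktables

rows = [{"emp_no": n} for n in range(1, 21)]              # 20 hypothetical rows
blocks = [rows[i:i + 5] for i in range(0, len(rows), 5)]  # packed 5 rows per block
worktables = phase1(blocks)
for amp, table in enumerate(worktables):
    print(amp, [row["emp_no"] for row in table])
```

    After the call, every row sits in the worktable of the AMP that owns it, but the rows within each worktable are still in arrival order, which matches the "unsorted until Phase 1 is complete" description.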

  • Phase 2: Application

    Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must then be sorted by street and zip code, placed onto local trucks and be driven to their final, local destinations.

    Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address (i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside. Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!
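    A minimal Python sketch of what one AMP does in Phase 2 follows. It rests on two assumptions that go beyond the text: that the sort key is a per-row hash value (called row_hash here), and that a data block can be modeled as a fixed number of rows rather than a size in bytes. The function name phase2 is hypothetical.

```python
def phase2(worktable, rows_per_block=3):
    """Sort one AMP's worktable by row hash, then cut it into data blocks."""
    ordered = sorted(worktable, key=lambda row: row["row_hash"])
    return [ordered[i:i + rows_per_block]
            for i in range(0, len(ordered), rows_per_block)]

worktable = [{"row_hash": h} for h in (42, 7, 19, 88, 3, 61, 25)]
data_blocks = phase2(worktable)
print([[r["row_hash"] for r in block] for block in data_blocks])
# prints [[3, 7, 19], [25, 42, 61], [88]]
```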

    FastLoad Commands

    Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in control of the load process. Consider this your personal ready-reference guide! You will notice that there are only a few SQL commands that may be used with this utility (Create Table, Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.

    Fastload Sample

    "Mistakes are a part of being human. Appreciate your mistakes for what they are: precious life lessons that can only be learned the hard way. Unless it's a fatal mistake, which, at least, others can learn from."
    - Al Franken

    Fastload is a utility we can use to populate empty tables. Make no mistake about how useful Fastload can be or how fatal errors can occur. The next 2 slides illustrate the essentials needed when constructing your fastload script. The first will highlight the important areas about the FastLoad script, and the second slide is a blank copy of the script that you can use to create your own FastLoad script. Use the flat file we created in the BTEQ chapter to help run the script.

  • Simply copy the following text into notepad, then save it with a name and location that you can easily remember (we saved ours as c:\temp\Fastload_First_Script.txt).

  • This script is going to create a table called Employee_Table02. After the table is created, it's going to take the information from our flat file and insert it into the new table. Afterwards, the Employee_Table and Employee_Table02 should look identical.

    Executing a FastLoad Script

    "A good plan, violently executed now, is better than a perfect plan next week."
    - George S. Patton

    We can execute the Fastload utility like we do with BTEQ; however we use the command "fastload" instead of "BTEQ". If we get a return code of 0 then the Fastload worked perfectly. What did General Patton say when his Fastload gave him a return code of 12? I shall return 0!
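    Checking the return code can be automated. The Python sketch below is one possible wrapper, not a documented interface: it assumes the utility reads its script from standard input, the function names run_load and describe are hypothetical, and a child Python process that exits with code 12 stands in for the real "fastload" command so the sketch runs without Teradata installed.

```python
import subprocess
import sys

def run_load(command, script_text):
    """Run a load utility with the script on standard input; return its exit code."""
    result = subprocess.run(command, input=script_text, text=True,
                            capture_output=True)
    return result.returncode

def describe(return_code):
    """Treat 0 as success and anything else as failure, as the text does."""
    if return_code == 0:
        return "load succeeded"
    return f"load failed (return code {return_code})"

# Stand-in for the real utility: reads the script, then exits with code 12.
stand_in = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(12)"]
print(describe(run_load(stand_in, "placeholder script text")))
# prints load failed (return code 12)
```

    With the utility installed, the stand-in list would be replaced by the real command, for example ["fastload"], and script_text by the contents of the saved script file.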

  • Executing our Fastload script

    Let's see if it worked:

    The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts are very simple. Think of FastLoad as:

    Logging onto Teradata

    Defining the Teradata table that you want to load (target table)

    Defining the INPUT data file

    Telling the system to start loading

    Another Sample FastLoad Script

    Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The reason is that when any of the tables that FastLoad is using are dropped, the script cannot be restarted. It can only be rerun from the beginning. Since FastLoad has restart logic built into it, a restart is normally the better solution if the initial load attempt should fail. However, for purposes of this example, it shows the table structure and the description of the data being read.

    Let's look at another FastLoad script that you might see in the real world. In the script below, every comment line is placed inside the normal Teradata comment syntax, [/* ... */]. FastLoad and SQL commands are written in upper case in order to make them stand out. In reality, Teradata utilities, like Teradata itself, are by default not case sensitive. You will also note that when column names are listed vertically we recommend placing the comma separator in front of the following column. Coding this way makes reading or debugging the script easier for everyone. The purpose of this script is to load the Employee_Profile table in the SQL01 database. The input file used for the load is named EMPS.TXT. Below the sample script each step will be described in detail.

    /* FASTLOAD SCRIPT TO LOAD THE */
    /* Employee_Profile TABLE */
    /* Created by Coffing Data Warehousing */

    /* Setup the FastLoad Parameters */
    /* Since this script does not drop the target or error tables, */
    /* it is restartable. This is a good thing for production jobs. */

    SESSIONS 100;   /* or, the number of sessions supportable */
                    /* Specify the number of sessions to logon. */

    TENACITY 4;     /* the default is no tenacity, means no retry */
    SLEEP 10;       /* the default is 6, means retry in 6 minutes */
                    /* Tenacity is set to 4 hr; wait 10 min between retries. */

    LOGON CW/SQL01,SQL01;

    SHOW VERSIONS;  /* Shows the Utility's release number */

    /* Set the Record type to a comma delimited for FastLoad */
    RECORD 2;                 /* Starts with the second record. */
    SET RECORD VARTEXT ",";   /* Specifies that the record layout is vartext with a comma delimiter. */

    /* Define the Text File Layout and Input File */
    /* Notice that all fields are defined as VARCHAR. When using VARTEXT, */
    /* the fields do not contain the length field like in these formats: */
    /* text, FastLoad, or unformatted. */
    DEFINE Employee_No (VARCHAR(10))
         , Last_name (VARCHAR(20))
         , First_name (VARCHAR(12))
         , Salary (VARCHAR(5))
         , Dept_No (VARCHAR(6))
    FILE= EMPS.TXT;

    /* Optional to show the layout of the input */
    SHOW;

    /* Begin the Load and Insert Process into the */
    /* Employee_Profile Table */
    /* BEGIN LOADING specifies the table to load and lock and names the error tables. */
    /* CHECKPOINT sets the number of rows at which to pause and record progress */
    /* in the restart log before loading further. */
    BEGIN LOADING SQL01.Employee_Profile
          ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
          CHECKPOINT 100000;

    /* Defines the insert statement to use for loading the rows */
    INSERT INTO SQL01.Employee_Profile
    VALUES ( :Employee_No
           , :Last_name
           , :First_name
           , :Salary
           , :Dept_No );

    END LOADING;    /* Continues loading process with Phase 2. */

    LOGOFF;         /* Logs off of Teradata. */
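    The TENACITY 4 and SLEEP 10 settings in the script translate into a retry schedule that is easy to work out. The Python sketch below does the arithmetic; the function name is hypothetical, and whether an attempt is made exactly at the four-hour mark is an assumption of this model rather than something the text states.

```python
def logon_retry_minutes(tenacity_hours, sleep_minutes):
    """Minutes after the first failed logon at which retries would occur."""
    window = tenacity_hours * 60
    return list(range(sleep_minutes, window + 1, sleep_minutes))

retries = logon_retry_minutes(tenacity_hours=4, sleep_minutes=10)
print(len(retries), retries[0], retries[-1])   # prints 24 10 240
```

    Under these assumptions the job would retry up to 24 times, at 10-minute intervals, before giving up.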

  • Step One: Before logging onto Teradata, it is important to specify how many sessions you need.The syntax is [SESSIONS {n}].Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commandsin FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlyingcommands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell uswhich version of FastLoad is being used for the load. Why would we recommend this? We dobecause as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts mayhave to be revisited.

    Step Three: If the input file is not in FastLoad format, you must set the RECORD layout type for the file being passed to FastLoad before you describe the INPUT FILE structure in the DEFINE statement. We have used VARTEXT in our example with a comma delimiter. The available options are FastLoad, TEXT, UNFORMATTED, or VARTEXT. You need to know this about your input file ahead of time.

    Step Four: Next comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file for the load.

    Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first and Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps ensure this uniqueness between multiple FastLoad jobs occurring in the same database.

    In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three "R's" in grade school ("reading, 'riting, and 'rithmetic"). There are two very different, yet equally important, R's to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad.

    Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.

    In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, you would normally be unable to load rows as they arrive from different time zones. To accomplish this processing, however, simply omit the END LOADING on the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster!

    Of course, to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That's a pretty clever way to do a partitioned type of data load.

    Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, you are responsible for checking it, taking the appropriate action, and dropping the table manually.
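To make the VARTEXT layout from Step Three concrete, here is a minimal Python sketch of how one comma-delimited record maps onto the five fields named in the DEFINE statement. It illustrates the file format only and is not how FastLoad itself parses input; the sample record is invented for illustration and does not come from EMPS.TXT.

```python
# Field names and maximum lengths copied from the DEFINE statement in the sample script.
FIELDS = [
    ("Employee_No", 10),
    ("Last_name", 20),
    ("First_name", 12),
    ("Salary", 5),
    ("Dept_No", 6),
]


def parse_vartext(line, delimiter=","):
    """Split one delimited record into a dict of character fields.

    Every value stays a string, mirroring the all-VARCHAR DEFINE.
    Raises ValueError when the field count or a field length does not fit.
    """
    values = line.rstrip("\r\n").split(delimiter)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for (name, max_len), value in zip(FIELDS, values):
        if len(value) > max_len:
            raise ValueError(f"{name} exceeds VARCHAR({max_len})")
        record[name] = value
    return record


# Hypothetical sample record (assumption, not taken from the book's data file).
sample = parse_vartext("1001,Smith,Anna,52000,400\n")
print(sample)
```

Because the fields carry no length prefix, the delimiter alone decides where each value ends, which is why every field is declared as VARCHAR.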


  • Checkpoints

    "Once the game is over, the king and the pawn go back in the same box."- Italian Proverb

    FastLoad has the ability to save checkpoints during the loading process. Checkpoints are what enable utilities to pick up from where they left off if the loading process was interrupted in any way. A suitable checkpoint value can be easily calculated:

    Determining a Checkpoint

    Add up the approximate byte count of 1 row. The row below adds up to:

    Employee_No: Integer = 4 bytes

    Dept_No: Smallint = 2 bytes

    Last_Name: Char(20) = 20 bytes

    First_Name: VarChar(12) = 14 bytes

    Salary: Decimal(8,2) = 5 bytes

    Total: = 45 bytes

    Now take the total number of bytes per row (45 bytes in our case) and divide 64,000 by that number. (64,000 / 45 = 1422.2) The number you come up with is the number of rows that will be bundled together in each data block set.


  • Setting the checkpoint to 1000 would be pointless because the computer would take a checkpoint every data block! A 1,000,000 checkpoint would work well here, sending approximately 703 data blocks between checkpoints.
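The arithmetic above can be replayed in a few lines of Python. This is only a sketch of the calculation in the text: the byte counts and the 64,000-byte block figure are the ones quoted above, and the function names are made up for this example.

```python
# Approximate size of each column in bytes, as listed in the text.
ROW_BYTES = {
    "Employee_No": 4,   # INTEGER
    "Dept_No": 2,       # SMALLINT
    "Last_Name": 20,    # CHAR(20)
    "First_Name": 14,   # VARCHAR(12), counted as 14 bytes in the text
    "Salary": 5,        # DECIMAL(8,2), counted as 5 bytes in the text
}
BLOCK_BYTES = 64_000    # block size used in the text's calculation


def rows_per_block(row_bytes, block_bytes=BLOCK_BYTES):
    """Whole rows that fit in one data block."""
    return block_bytes // row_bytes


def blocks_per_checkpoint(checkpoint_rows, row_bytes, block_bytes=BLOCK_BYTES):
    """Data blocks sent between two checkpoints (may be fractional)."""
    return checkpoint_rows / rows_per_block(row_bytes, block_bytes)


row_total = sum(ROW_BYTES.values())
print("bytes per row:", row_total)
print("rows per block:", rows_per_block(row_total))
for checkpoint in (1_000, 1_000_000):
    blocks = blocks_per_checkpoint(checkpoint, row_total)
    print(f"CHECKPOINT {checkpoint}: about {blocks:.1f} blocks between checkpoints")
```

A result below 1.0 means a checkpoint would be taken more often than once per block, which is the situation the text warns against.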

    Converting Data Types with FastLoad

    "You don't drown by falling in the water; you drown by staying in the water."- Edwin Louis Cole

    Converting data is easy. Just define the input data types in the input file. Then, FastLoad will compare that to the column definitions in the Data Dictionary and convert the data for you! But the cardinal rule is that only one data type conversion is allowed per column. In the example below, notice how the columns in the input file are converted from one data type to another simply by redefining the data type in the CREATE TABLE statement.

    FastLoad allows seven kinds of data conversions. Here is a chart that displays them:

    IN FASTLOAD YOU MAY CONVERT

    CHARACTER DATA TO NUMERIC DATA
    FIXED LENGTH DATA TO VARIABLE LENGTH DATA
    CHARACTER DATA TO DATE
    INTEGERS TO DECIMALS
    DECIMALS TO INTEGERS
    DATE TO CHARACTER DATA
    NUMERIC DATA TO CHARACTER DATA

    Figure 4-5

    When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!

    A FastLoad Conversion Example

    This next script example is designed to show how FastLoad converts data automatically when the INPUT data type differs from the Target Teradata Table data type. The actual script is in the left column and our comments are on the right.


  • LOGON CDW/jones, cowboys; LOGON TO TERADATA

    CREATE TABLE SQL01.Department(

    Dept_No INTEGER ,Dept_Name CHAR(20) ,Dept_Start_Date DATE ,Dept_Finish_Date DATE )

    UNIQUE PRIMARY INDEX ( Dept_No );

    NOTICE THAT DEPT_NO IS AN INTEGER HERE IN THE TARGET TABLE, BUT A CHAR(4) IN THE FLAT FILE DEFINITION BELOW - CHAR(4) will convert to integer. These date columns are DATE data type and will be converted from CHAR(10).

    DEFINE Department_No (CHAR(4)) ,Department_Name (CHAR(20)) ,SDate (CHAR(10)) ,FDate (CHAR(10))

    CHAR(4) converts to INTEGER. Character dates in different style in the file:

    CHAR(10) comes in as YYYY-MM-DD; CHAR(10) comes in as MM/DD/YYYY

    FILE= Dept_Flat.txt; DEFINES THE FLAT FILE AND NAMES THE INPUT FILE

    BEGIN LOADING SQL01.Department ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2 CHECKPOINT 15000;

    Names the target table and error tables,don't let the word "errorfiles" fool you,they are tables.

    Will checkpoint every 15000 rows

    The INSERT does automatic conversion:

    INSERT INTO SQL01.Department VALUES ( :Department_No ,:Department_Name ,:SDate ,:FDate(DATE, FORMAT 'mm/dd/yyyy') );

    Converts character to integer

    Converts character from ANSI date to DATE. Converts character as other date to DATE by describing the input format in the file. Without the format, this row goes into the error table.

    END LOADING; LOGOFF;

    Figure 4-6
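The conversions in this example can be mimicked outside Teradata to show why the second date needs an explicit format. The Python sketch below is an analogy only, not FastLoad's conversion logic: it treats YYYY-MM-DD as the default date style and shows that an MM/DD/YYYY value fails unless its format is described, much as the row would land in the error table. The sample values are hypothetical.

```python
from datetime import date, datetime

ANSI_FORMAT = "%Y-%m-%d"   # YYYY-MM-DD, the style of SDate in the example
US_FORMAT = "%m/%d/%Y"     # MM/DD/YYYY, the style of FDate in the example


def to_date(text, fmt=ANSI_FORMAT):
    """Convert a character date to a date; raises ValueError on a format mismatch."""
    return datetime.strptime(text, fmt).date()


def convert_row(dept_no, dept_name, sdate, fdate):
    """Apply the same kinds of conversion the INSERT in the script relies on."""
    return (
        int(dept_no),                # CHAR(4) to INTEGER
        dept_name,                   # CHAR(20) stays character data
        to_date(sdate),              # ANSI-style character date to DATE
        to_date(fdate, US_FORMAT),   # other style needs its format spelled out
    )


# Hypothetical input record (assumption, not from Dept_Flat.txt).
row = convert_row("0010", "Sales", "2005-01-15", "12/31/2005")
print(row)

# Without the explicit format, the second date cannot be interpreted.
try:
    to_date("12/31/2005")
except ValueError as exc:
    print("rejected:", exc)
```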

    When You Cannot RESTART FastLoad

    There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script. If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable:

    The Error Tables are DROPPED

    The Target Table is DROPPED

    The Target Table is CREATED


  • Can you tell from the following sample FastLoad script why it is not restartable?

    LOGON CDW/tommy, cowboys; LOGON TO TERADATA

    DROP TABLE SQL01.Department; DROP TABLE SQL01.Dept_Err1; DROP TABLE SQL01.Dept_Err2;

    DROPS TARGET TABLE AND ERROR TABLES

    CREATE TABLE SQL01.Department(Dept_No INTEGER

    ,Dept_Name CHAR(20))

    UNIQUE PRIMARY INDEX (Dept_No);

    CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATABASE IN TERADATA.

    DEFINE Department_No (INTEGER) ,Department_Name (CHAR(20))

    FILE= Dept_Flat.txt;

    DEFINES THE FLAT FILE STRUCTURE AND NAME OF THE INPUT FILE.

    BEGIN LOADING SQL01.Department ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2 INDICATORS CHECKPOINT 5000;

    SPECIFIES TABLE TO LOAD AND ERROR TABLES. INDICATORS are defined on the BEGIN LOADING statement.

    INSERT INTO SQL01.Department VALUES(:Department_No,:Department_Name);

    INSERT COMMAND

    /* since data file and table are the same, self-defining INSERT would also work: INSERT INTO SQL01.Department.*; */

    Optional use of self-defining INSERT, the DEFINE would contain only the FILE=

    END LOADING; LOGOFF;

    START PHASE 2

    LOGOFF

    Figure 4-7

    Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job halfway through. Maybe the mainframe went down. Well, it is not really a big deal because FastLoad is so lightning-fast that you could probably just RERUN the job for small data loads.

    However, when you are loading a billion rows, this is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you might want to make sure that the job is totally restartable. Let's see how this is done.

    When You Can RESTART FastLoad

    If all of the following conditions are true, then FastLoad is ALWAYS restartable:

    The Error Tables are NOT DROPPED in the script

    The Target Table is NOT DROPPED in the script

    The Target Table is NOT CREATED in the script


  • You have defined a checkpoint

    So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a table whose data changes so much that you typically drop it monthly and build it again. Let's go back to the script we just reviewed above and see how we can break it into the two parts necessary to make it fully RESTARTABLE. It is broken up below.

    STEP ONE: Run the following SQL statements in Queryman or BTEQ before you start FastLoad:

    DROP TABLE SQL01.Department; DROP TABLE SQL01.Dept_Err1; DROP TABLE SQL01.Dept_Err2;

    DROPS TARGET TABLE AND ERROR TABLES

    CREATE TABLE SQL01.Department(Dept_No INTEGER

    ,Dept_Name CHAR(20))

    UNIQUE PRIMARY INDEX (Dept_No);

    CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATABASE IN TERADATA

    Figure 4-8

    First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything. Next, if needed, you create the empty table structure needed to receive a FastLoad.

    Step Two: Run the FastLoad script

    This is the portion of the earlier script that carries out these vital steps:

    Defines the structure of the flat file

    Tells FastLoad where to load the data and store the errors

    Specifies the checkpoint so a RESTART will not go back to row one

    Loads the data

    If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with the next record after the last checkpoint. Now, with that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded.

    You may optionally use the RECORD command to manually restart on the next record after the one indicated in the message.

    Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END LOADING. It will then restart right into Phase 2.
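The restartability conditions from this section can be checked mechanically before a job is submitted. The Python sketch below is a rough, hypothetical lint and not a Teradata tool: it flags any DROP or CREATE of a table and a missing or zero CHECKPOINT, using plain pattern matching rather than a real FastLoad parser. The two script strings are abbreviated from the examples in the text.

```python
import re


def restart_blockers(script_text):
    """Return the reasons a FastLoad script is not restartable as written."""
    # Remove /* ... */ comments so commented-out statements are ignored.
    code = re.sub(r"/\*.*?\*/", " ", script_text, flags=re.DOTALL)
    reasons = []
    if re.search(r"\bDROP\s+TABLE\b", code, re.IGNORECASE):
        reasons.append("script drops a table")
    if re.search(r"\bCREATE\b[^;]*?\bTABLE\b", code, re.IGNORECASE):
        reasons.append("script creates a table")
    match = re.search(r"\bCHECKPOINT\s+(\d+)", code, re.IGNORECASE)
    if match is None or int(match.group(1)) == 0:
        reasons.append("no checkpoint defined")
    return reasons


# Abbreviated from the non-restartable sample script.
NOT_RESTARTABLE = """
DROP TABLE SQL01.Department;
CREATE TABLE SQL01.Department (Dept_No INTEGER, Dept_Name CHAR(20))
UNIQUE PRIMARY INDEX (Dept_No);
BEGIN LOADING SQL01.Department
  ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2
  CHECKPOINT 5000;
END LOADING;
"""

# The same load with the DROP and CREATE moved to a separate BTEQ job.
RESTARTABLE = """
BEGIN LOADING SQL01.Department
  ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2
  CHECKPOINT 5000;
END LOADING;
"""

print(restart_blockers(NOT_RESTARTABLE))
print(restart_blockers(RESTARTABLE))
```

The check is deliberately coarse: it does not distinguish the target table from the error tables, since the text treats a DROP of either as disqualifying.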


  • What Happens When FastLoad Finishes

    You Receive an Outcome Status

    The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that something wasn't perfect and needs to be fixed.

    The locks will not be removed and the error tables will not be dropped without a successful completion. This is because FastLoad assumes that it will need them for its restart. At the same time, the lock on the target table will not be released either. When running FastLoad, you realistically have two choices once it is started. The first is to get it to run to a successful completion; the second is to rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart.

    You Receive a Status Report

    What happens when FastLoad finishes running? Well, you can expect to see a summary report on the success of the load. Following is an example of such a report.

    Line 1: TOTAL RECORDS READ = 1000000
    Line 2: TOTAL ERRORFILE1 = 50
    Line 3: TOTAL ERRORFILE2 = 0
    Line 4: TOTAL INSERTS APPLIED = 999950
    Line 5: TOTAL DUPLICATE ROWS = 0

    Figure 4-9

    The first line displays the total number of records read from the input file. Were all of them loaded? Not really. The second line tells us that there were fifty rows with constraint violations, so they were not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that there were zero entries into the second error table, indicating that there were no duplicate Unique Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the duplicates would only have been counted. They are not stored in the error tables anywhere. When FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the number of records read in line 1.
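The rule that lines 2 through 5 add up to line 1 is easy to verify automatically. The following Python sketch parses the sample report shown above and computes any unaccounted rows. It assumes the report lines look exactly like the sample (a "TOTAL <label> = <number>" pattern); real FastLoad output may be laid out differently, so treat the pattern as an assumption.

```python
import re

# The sample status report from the text, without the "Line n:" prefixes.
REPORT = """\
TOTAL RECORDS READ = 1000000
TOTAL ERRORFILE1 = 50
TOTAL ERRORFILE2 = 0
TOTAL INSERTS APPLIED = 999950
TOTAL DUPLICATE ROWS = 0
"""


def parse_report(text):
    """Map each report label (for example 'RECORDS READ') to its integer count."""
    return {
        label: int(value)
        for label, value in re.findall(r"TOTAL ([A-Z0-9 ]+?) = (\d+)", text)
    }


def unaccounted_rows(counts):
    """Records read minus everything the report accounts for; 0 means it balances."""
    accounted = (
        counts["ERRORFILE1"]
        + counts["ERRORFILE2"]
        + counts["INSERTS APPLIED"]
        + counts["DUPLICATE ROWS"]
    )
    return counts["RECORDS READ"] - accounted


counts = parse_report(REPORT)
print(counts)
print("unaccounted rows:", unaccounted_rows(counts))
```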

    Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows that are counted. This is due to the fact that an error seldom occurs on a checkpoint (quiet or quiescent point) when nothing is happening within FastLoad. Therefore, some number of rows will be sent to the AMPs again because the restart starts on the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic.

    You can Troubleshoot

    In the example above, we know that the load was not entirely successful. But that is not enough. Now we need to troubleshoot in order to identify the errors and correct them. FastLoad generates two error tables that will enable us to find the culprits. The first error table, which we named Errortable1, contains just three columns: The column ErrorCode contains the Teradata FastLoad code number corresponding to a translation or constraint error. The second column, named ErrorFieldName, specifies which column in the table contained the error. The third column, DataParcel, contains the row with the problem. Errortable2 contains the same columns as the target table.

    As a user, you can select from either error table. To check errors in Errortable1 you would use thissyntax:


  • SELECT DISTINCT ErrorCode, Errorfieldname FROM Errortable1;

    Corrected rows may be inserted to the target table using another utility that does not require anempty table.

    To check errors in Errortable2 you would use the following syntax:

    SELECT * FROM Errortable2;

    The definition of the second error table is exactly the same as the target table with all the samecolumns and data types.

    Restarting FastLoad: A More In-Depth Look

    "Never engage in a battle of wits against an unarmed person."- Anonymous

    How the CHECKPOINT Option Works

    The CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter "CHECKPOINT [n]" is included in the BEGIN LOADING clause the system will stop loading momentarily at increments of [n] rows.

    At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.FastLog table. This log contains a row for all currently running FastLoad jobs with the last successfully reached checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last successfully reported checkpoint prior to the error. It will then restart from the record immediately following that checkpoint and start building the next block of data to load. If such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row. If this is not desirable, the RECORD statement can be used to force a restart at the next record after the failure.

    Restarting with CHECKPOINT

    Sometimes you may need to restart FastLoad. If the FastLoad script requests a CHECKPOINT (other than 0), then it is restartable from the last successful checkpoint. Therefore, if the job fails, simply resubmit the job. Here are the two options:

    Suppose Phase 1 halts prematurely; the Data Acquisition phase is incomplete. Resubmit the FastLoad script. FastLoad will begin from RECORD 1 or the first record past the last checkpoint. If you wish to manually specify where FastLoad should restart, locate the last successful checkpoint record by referring to the SYSADMIN.FastLog table. To manually specify where a restart will start, use the RECORD command. Normally, it is not necessary to use the RECORD command; let FastLoad automatically determine where to restart from.

    If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the BEGIN and END LOADING statements. This will restart in Phase 2 with the sort and building of the target table.
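The Phase 1 restart rule can be expressed as simple arithmetic. The Python sketch below is a simplified model rather than FastLoad's internal logic: it assumes checkpoints fall on exact multiples of the CHECKPOINT value and that the failure point is known as a row count. The numbers used are hypothetical.

```python
def last_checkpoint(rows_sent, checkpoint_interval):
    """Highest checkpoint reached before the failure; 0 when checkpointing is off."""
    if checkpoint_interval <= 0:
        return 0
    return (rows_sent // checkpoint_interval) * checkpoint_interval


def restart_record(rows_sent, checkpoint_interval):
    """First input record processed after a Phase 1 restart."""
    return last_checkpoint(rows_sent, checkpoint_interval) + 1


def rows_resent(rows_sent, checkpoint_interval):
    """Rows already sent that go to the AMPs a second time after the restart."""
    return rows_sent - last_checkpoint(rows_sent, checkpoint_interval)


# Hypothetical failure: 257,000 rows had been sent when the job stopped.
for interval in (100_000, 0):
    print(
        f"CHECKPOINT {interval}: restart at record "
        f"{restart_record(257_000, interval)}, "
        f"{rows_resent(257_000, interval)} rows sent again"
    )
```

The rows sent again are the ones that show up in the duplicate row count after a restart, and with CHECKPOINT 0 that is every row sent so far.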

    Restarting without CHECKPOINT

    When a failure occurs and the FastLoad script did not utilize the CHECKPOINT (i.e., CHECKPOINT 0), one procedure is to DROP the target table and error tables and rerun the job. Here are some other options available to you:


  • 1. Resubmit the job and hope there is enough PERM space for all the rows already sent to the unsorted target table plus all the rows that are going to be sent again to the same target table. Other than using space, these rows will be rejected as duplicates. As you can imagine, this is not the most efficient way since it processes many of the same rows twice.

    2. If CHECKPOINT wasn't specified, then CHECKPOINT defaults to 0 (no checkpoint). You can perform a manual restart using the RECORD statement. If the output print file shows that record 100000 was read, use something like the following command: [RECORD 100001;]. This statement will skip records 1 through 100000 and resume on record 100001.

    Using INMODs with FastLoad

    When you find that FastLoad does not read the file type you have, or you wish to control the access for any reason, then it might be desirable to use an INMOD. An INMOD (Input Module) is fully compatible with FastLoad in either mainframe or LAN environments, provided that the appropriate programming languages are used. However, INMODs replace the normal mainframe DDNAME or LAN defined FILE name with the following statement: DEFINE INMOD=. For a more in-depth discussion of INMODs, see the chapter of this book titled "INMOD Processing".

