Teradata Day 1

8/12/2019 Teradata Day 1


1 / 25 May 2009 / EDS INTERNAL

Teradata Training

Hema Venkatesh Ramasamy

HP Global Soft Private Ltd.




Teradata Product Overview



After completing this module, you will be able to:

Describe the purpose of the Teradata product

Give a brief history of the product

List major architectural features of the product



What is Teradata?

Teradata is a Relational Database Management System (RDBMS).

Designed to run the world's largest commercial databases.

Preferred solution for enterprise data warehousing.

Executes on UNIX MP-RAS and Windows 2000 operating systems.

Compliant with ANSI industry standards.

Runs on a single node or on multiple nodes.

Acts as a database server to client applications throughout the enterprise.

Uses parallelism to manage terabytes of data.

Capable of supporting many concurrent users from various client platforms (over a TCP/IP or IBM channel connection).

[Figure: Windows XP, Windows 2000, UNIX, and mainframe clients connected to the Teradata database]



Teradata: A Brief History

1979 Teradata Corp. founded in Los Angeles, California. Development begins on a massively parallel computer.

1982 YNET technology is patented.

1984 Teradata markets the first database computer, the DBC/1012. First system purchased by Wells Fargo Bank of California. Total revenue for the year: $3 million.

1987 First public offering of stock.

1989 Teradata and NCR partner on the next generation of the DBC.

1991 NCR Corporation is acquired by AT&T. Teradata revenues at $280 million.

1992 Teradata is merged into NCR.

1996 AT&T spins off NCR Corp. with the Teradata product.

1997 Teradata database becomes the industry leader in data warehousing.

2000 100+ terabyte system in production.

2002 Teradata V2R5 released 12/2002; major release including features such as PPI, roles and profiles, multi-value compression, and more.

2003 Teradata V2R5.1 released 12/2003; includes UDFs, BLOBs, CLOBs, and more.



How Large is a Trillion?

1 Kilobyte = 10^3 = 1,000 bytes
1 Megabyte = 10^6 = 1,000,000 bytes
1 Gigabyte = 10^9 = 1,000,000,000 bytes
1 Terabyte = 10^12 = 1,000,000,000,000 bytes
1 Petabyte = 10^15 = 1,000,000,000,000,000 bytes

1 million seconds = 11.57 days
1 billion seconds = 31.6 years
1 trillion seconds = 31,688 years

1 million inches = 15.7 miles
1 trillion inches = 15,700,000 miles (30 round trips to the moon)

1 million square inches = .16 acres = .0002 square miles
1 trillion square inches = 249 square miles (larger than Singapore)

$1 million = < $.01 for every person in the U.S.
$1 billion = $3.64 for every person in the U.S.
$1 trillion = $3,636 for every person in the U.S.
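The comparisons above are plain unit conversions, so they can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original course material. It assumes a 365.25-day year and a U.S. population of 275 million (the figure implied by $3,636 per person for $1 trillion); the function names are illustrative. Several of the slide's figures appear to be rounded down rather than rounded to nearest.

```python
# Unit constants used by the conversions below.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY   # assumption: 365.25-day year
INCHES_PER_MILE = 12 * 5280
US_POPULATION = 275_000_000                   # assumption: implied by the slide


def seconds_in_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_in_years(seconds):
    """Convert a number of seconds to years."""
    return seconds / SECONDS_PER_YEAR


def inches_in_miles(inches):
    """Convert a number of inches to miles."""
    return inches / INCHES_PER_MILE


def dollars_per_person(total_dollars, population=US_POPULATION):
    """Split a dollar amount evenly across the assumed population."""
    return total_dollars / population


if __name__ == "__main__":
    print(f"1 million seconds  = {seconds_in_days(10**6):.2f} days")
    print(f"1 billion seconds  = {seconds_in_years(10**9):.2f} years")
    print(f"1 trillion seconds = {seconds_in_years(10**12):,.0f} years")
    print(f"1 million inches   = {inches_in_miles(10**6):.2f} miles")
    print(f"$1 billion         = ${dollars_per_person(10**9):.2f} per person")
```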



Designed for Today's Business

Teradata's charter meets the business needs of today and tomorrow with:

Relational database: standard for database design

Enormous capacity: billions of rows, terabytes of data

High-performance parallel processing

Single database server for multiple clients: "Single Version of the Truth"

Network and mainframe connectivity

Industry-standard access language: Structured Query Language (SQL)

Manageable growth via modularity

Fault tolerance at all levels of hardware and software

Data integrity and reliability



Evolution of Data Processing

OLTP
  Example: Update a checking account to reflect a deposit.
  Number of rows accessed: Small
  Response time: Seconds

DSS
  Example: How many child-size blue jeans were sold across all of our Eastern stores in the month of March?
  Number of rows accessed: Large
  Response time: Seconds or minutes

OLCP
  Example: Instant credit. How much credit can be extended to this person?
  Number of rows accessed: Small to moderate; possibly across multiple databases
  Response time: Minutes

OLAP
  Example: Show the top ten selling items across all stores for 2003.
  Number of rows accessed: Large number of detail rows or moderate number of summary rows
  Response time: Seconds or minutes

[Figure labels: the request types are bracketed as TRADITIONAL and TODAY]

The need to process DSS, OLCP, and OLAP type requests across an enterprise and its data leads to the concept of a Data Warehouse.



Data Warehouse Usage Evolution

STAGE 1: REPORTING
  WHAT happened?
  Workload: primarily batch

STAGE 2: ANALYZING
  WHY did it happen?
  Workload: increase in ad hoc queries

STAGE 3: PREDICTING
  WHAT will happen?
  Workload: analytical modeling grows

STAGE 4: OPERATIONALIZING
  WHAT is happening?
  Workload: continuous update and time-sensitive (short) queries become important

STAGE 5: ACTIVE WAREHOUSING
  MAKING it happen!
  Workload: event-based triggering takes hold

[Figure: workload mix across the stages: batch, ad hoc, analytics, continuous update and short queries, event-based triggering]



What is Active Data Warehousing?

Data Warehousing is the timely, integrated, logically consistent store of detailed data available for analytic business decision making.

  Primarily batch feeds and updates
  Ad hoc queries to support strategic decisions that return in minutes and maybe hours

Active Data Warehousing is the timely, integrated, logically consistent store of detailed data available for strategic, tactical driven business decisions.

  Timely updates: close to real time
  Short, tactical queries that return in seconds
  Event-driven activity plus strategic queries

Business requirements for an ADW (Active Data Warehouse):

  Performance: response within seconds
  Scalability: support for large data volumes, mixed workloads, and concurrent users
  Availability: 7 x 24 x 365
  Data freshness: accurate, up-to-the-minute data



Teradata's Competitive Advantages

Unlimited, Proven Scalability: amount of data and number of users; allows for an enterprise-wide model of the data.

Unlimited Parallelism: parallel access, sorts, and aggregations.

Mature Optimizer: handles complex queries, up to 64 joins per query, ad hoc processing.

Models the Business: 3NF, robust view processing, and star schema capabilities.

Provides a single version of the truth.

Low TCO (Total Cost of Ownership): ease of setup, maintenance, and administration; no re-orgs, lowest disk-to-data ratio, and robust expansion utility (reconfig).

High Availability: no single point of failure.

Parallel Load and Unload utilities: robust, parallel, and scalable load and unload utilities such as FastLoad, MultiLoad, TPump, and FastExport.



Teradata Manageability

Things a Teradata DBA never has to do!

Reorganize data or index space

Pre-allocate table/index space, format partitions

Pre-prepare data for loading (convert, sort, split, etc.)

Ensure that queries run in parallel

Unload/reload data spaces due to expansion

Design, implement, and support partition schemes

Write or run programs to split the input source files into partitions for loading

A DBA knows that if the data doubles, the system can expand easily to accommodate it.

The command and workload for creating a table that will have 100,000 rows is the same as for creating a table that will have 1,000,000,000 rows!




Teradata Basics




List and describe the major components of the Teradata architecture.

Describe how the components interact to manage incoming and outgoing data.

List 5 types of Teradata database objects



Teradata Storage Architecture

Notes:

The Parsing Engine dispatches a request to insert a row.

The Message Passing Layer ensures that a row gets to the appropriate AMP (Access Module Processor).

The AMP stores the row on its associated (logical) disk.

An AMP manages a logical or virtual disk which is mapped to multiple physical disks in a disk array.

[Figure: records arrive from the client in random sequence (2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41). The Parsing Engine(s) send them through the Message Passing Layer to AMP 1 through AMP 4, and each AMP stores its share of the rows.]



Teradata Retrieval Architecture

Notes:

The Parsing Engine dispatches a request to retrieve one or more rows.

The Message Passing Layer ensures that the appropriate AMP(s) are activated.

The AMP(s) locate and retrieve the desired row(s) in parallel.

The Message Passing Layer returns the retrieved rows to the PE.

The PE returns the row(s) to the requesting client application.

[Figure: the Parsing Engine(s) request rows from AMP 1 through AMP 4, and the rows retrieved from the table (2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41) are returned to the client.]



Multiple Tables on Multiple AMPs

[Figure: the EMPLOYEE, DEPARTMENT, and JOB tables are sent through the Parsing Engine; AMP #1 through AMP #4 each hold EMPLOYEE rows, DEPARTMENT rows, and JOB rows]

Notes:

Some rows from each table may be found on each AMP.

Each AMP may have rows from all tables.

Ideally, each AMP will hold roughly the same amount of data.






Linear Growth and Expandability

[Figure: Parsing Engines, AMPs, and disks that can each be added as separate components]

Notes:

Teradata is a linearly expandable RDBMS.

Components may be added as requirements grow.

Linear scalability allows for increased workload without decreased throughput.

Performance impact of adding components is shown below.

USERS   AMPs    DATA    Performance
Same    Same    Same    Same
Double  Double  Same    Same
Same    Double  Double  Same
Same    Double  Same    Double
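One way to read the table above is as an idealized proportional model: relative performance rises with the number of AMPs and falls with the number of users and the volume of data. The sketch below is an illustration only; the formula `amps / (users * data)` is an assumption chosen because it reproduces the four rows, not a formula stated in the course material.

```python
def relative_performance(users=1.0, amps=1.0, data=1.0):
    """Idealized linear-scaling model (assumption): performance is
    proportional to AMPs and inversely proportional to users and data.
    All arguments are multiples of a baseline configuration."""
    return amps / (users * data)


# The four rows of the table, expressed as multiples of the baseline.
SCENARIOS = [
    ("baseline",                 dict(users=1, amps=1, data=1)),
    ("double users and AMPs",    dict(users=2, amps=2, data=1)),
    ("double AMPs and data",     dict(users=1, amps=2, data=2)),
    ("double AMPs only",         dict(users=1, amps=2, data=1)),
]

if __name__ == "__main__":
    for label, config in SCENARIOS:
        print(f"{label:24s} -> {relative_performance(**config):.1f}x")
```

Under this reading, doubling the AMPs pays for either twice the users or twice the data at unchanged performance, or doubles performance when the workload stays the same.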



Teradata Objects

There are eight types of objects which may be found in a Teradata database/user.

Tables: rows and columns of data

Views: predefined subsets of existing tables

Macros: predefined, stored SQL statements

Triggers: SQL statements associated with a table

Stored Procedures: program stored within Teradata

Join and Hash Indexes: separate index structures stored as objects within a database

Permanent Journals: table used to store before and/or after images for recovery

These objects are created, maintained, and deleted using Structured Query Language (SQL).

Object definitions are stored in the Data Dictionary / Directory (DD/D).

[Figure: a DATABASE or USER containing tables, views, macros, triggers, stored procedures, join/hash indexes, and a permanent journal, with the DD/D holding the definitions of all database objects. A note beside the Permanent Journal reads: "These aren't directly accessed by users."]



The Data Dictionary / Directory (DD/D)

The DD/D ...

is an integrated set of system tables

contains definitions of and information about all objects in the system

is entirely maintained by the RDBMS

is data about the data or metadata

is distributed across all AMPs like all tables

may be queried by administrators or support staff

is accessed via Teradata supplied views

Examples of DD/D views:

DBC.Tables - information about all tables

DBC.Users - information about all users

DBC.AllRights - information about access rights

DBC.AllSpace - information about space utilization



Structured Query Language (SQL)

SQL is a query language for Relational Database Systems.

  A fourth-generation language
  A set-oriented language
  A non-procedural language (e.g., doesn't have IF, GO TO, DO, FOR NEXT, or PERFORM statements)

SQL consists of:

Data Definition Language (DDL)
  Defines database structures (tables, users, views, macros, triggers, etc.)
  CREATE, DROP, ALTER

Data Manipulation Language (DML)
  Manipulates rows and data values
  SELECT, INSERT, UPDATE, DELETE

Data Control Language (DCL)
  Grants and revokes access rights
  GRANT, REVOKE

Teradata SQL also includes Teradata Extensions to SQL
  HELP, SHOW, EXPLAIN, CREATE MACRO



CREATE TABLE: Example of DDL

CREATE TABLE Employee, FALLBACK
  (employee_number     INTEGER NOT NULL,
   manager_emp_number  INTEGER,
   dept_number         SMALLINT,
   job_code            INTEGER COMPRESS,
   last_name           CHAR(20) NOT NULL,
   first_name          VARCHAR(20),
   hire_date           DATE FORMAT 'YYYY-MM-DD',
   birth_date          DATE FORMAT 'YYYY-MM-DD',
   salary_amount       DECIMAL(10,2))
UNIQUE PRIMARY INDEX (employee_number),
INDEX (dept_number);

Other DDL Examples

CREATE INDEX (job_code) ON Employee ;

DROP INDEX (job_code) ON Employee ;

DROP TABLE Employee ;




Views

Views are predefined subsets of existing tables consisting of specified columns and/or rows from the table(s).

A single table view:

is a window into an underlying table

allows users to read and update a subset of the underlying table

has no data of its own

EMPLOYEE (Table)

EMPLOYEE  MANAGER EMP  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER       NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
(PK)      (FK)         (FK)    (FK)
1006      1019         301     312101  Stein     John     861015  631015  3945000
1008      1019         301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801         403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003         401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005         403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801         401     411100  Trader    James    860731  570619  4785000

Emp_403 (View)

EMP NO  DEPT NO  LAST NAME  FIRST NAME  HIRE DATE
1007    403      Villegas   Arnando     870102
1005    403      Ryan       Loretta     861015




Multi-Table Views

A multi-table view allows users to access data from multiple tables as if it were in a single table. Multi-table views are also called join views. Join views are used for reading only, not updating.

EMPLOYEE (Table)

EMPLOYEE  MANAGER EMP  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER       NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
(PK)      (FK)         (FK)    (FK)
1006      1019         301     312101  Stein     John     861015  631015  3945000
1008      1019         301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801         403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003         401     412101  Johnson   Darlene  861015  560423  4630000
1003      0801         401     411100  Trader    James    860731  570619  4785000

DEPARTMENT (Table)

DEPT    DEPARTMENT                BUDGET    MANAGER EMP
NUMBER  NAME                      AMOUNT    NUMBER
(PK)                                        (FK)
501     marketing sales           80050000  1017
301     research and development  46560000  1019
302     product planning          22600000  1016
403     education                 93200000  1005
402     software support          30800000  1011
401     customer support          98230000  1003
201     technical operations      29380000  1025

EmpDept (View): the two tables "joined together"

LAST NAME  DEPARTMENT NAME
Stein      research & development
Kanieski   research & development
Ryan       education
Johnson    customer support
Villegas   education
Trader     customer support




SELECT: Example of DML

The SELECT statement is used to retrieve data from tables.

Who was hired on October 15, 1986?

EMPLOYEE (partial listing)

EMPLOYEE  MANAGER EMP  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER       NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
(PK)      (FK)         (FK)    (FK)
1006      1019         301     312101  Stein     John     861015  631015  3945000
1008      1019         301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801         403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003         401     412101  Johnson   Darlene  861015  560423  4630000
1003      0801         401     411100  Trader    James    860731  570619  4785000

SELECT Last_Name, First_Name
FROM Employee
WHERE Hire_Date = '1986-10-15';

Result

LAST NAME  FIRST NAME
Stein      John
Ryan       Loretta
Johnson    Darlene




The JOIN Operation

A join operation is used when the SQL query requires information from multiple tables.

Who works in Research and Development?

EMPLOYEE

EMPLOYEE  MANAGER EMP  DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    NUMBER       NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
(PK)      (FK)         (FK)    (FK)
1006      1019         301     312101  Stein     John     861015  631015  3945000
1008      1019         301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801         403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003         401     412101  Johnson   Darlene  861015  560423  4630000
1003      0801         401     411100  Trader    James    860731  570619  4785000

DEPARTMENT

DEPT    DEPARTMENT                BUDGET    MANAGER EMP
NUMBER  NAME                      AMOUNT    NUMBER
(PK)                                        (FK)
501     marketing sales           80050000  1017
301     research and development  46560000  1019
302     product planning          22600000  1016
403     education                 93200000  1005
402     software support          30800000  1011
401     customer support          98230000  1003
201     technical operations      29380000  1025

Result

LAST NAME  FIRST NAME
Stein      John
Kanieski   Carol
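The slide gives the question and the result but not the query itself. The sketch below shows one way such a join could be written, using Python's built-in sqlite3 module as a stand-in for Teradata so that it can be run anywhere. The table and column names follow the CREATE TABLE example earlier in the course; the reduced column set, the `department` column names, and the use of SQLite are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    dept_number     INTEGER,
    last_name       TEXT,
    first_name      TEXT
);
CREATE TABLE department (
    dept_number     INTEGER PRIMARY KEY,
    department_name TEXT
);
INSERT INTO employee VALUES
    (1006, 301, 'Stein',    'John'),
    (1008, 301, 'Kanieski', 'Carol'),
    (1005, 403, 'Ryan',     'Loretta'),
    (1004, 401, 'Johnson',  'Darlene'),
    (1003, 401, 'Trader',   'James');
INSERT INTO department VALUES
    (501, 'marketing sales'),
    (301, 'research and development'),
    (302, 'product planning'),
    (403, 'education'),
    (402, 'software support'),
    (401, 'customer support'),
    (201, 'technical operations');
""")

# Join the two tables on the department number they share.
rows = conn.execute("""
    SELECT e.last_name, e.first_name
    FROM employee AS e
    INNER JOIN department AS d
        ON e.dept_number = d.dept_number
    WHERE d.department_name = 'research and development'
    ORDER BY e.employee_number
""").fetchall()

for last_name, first_name in rows:
    print(last_name, first_name)
```

The join condition matches each employee row to the department row with the same department number; the WHERE clause then keeps only the department in question.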




Macros: Teradata SQL Extension

A MACRO is a predefined set of SQL statements which is logically stored in a database.

Macros may be created for frequently occurring queries or sets of operations.

Macros have many features and benefits:

Simplify end-user access

Control which operations may be performed by users

May accept user-provided parameter values

Are stored on the RDBMS, thus available to all clients

Reduce query size, thus reducing LAN/channel traffic

Are optimized at execution time

May contain multiple SQL statements

To create a macro:

CREATE MACRO Customer_List AS (SELECT customer_name FROM Customer;);

To execute a macro:

EXEC Customer_List;

To replace a macro:

REPLACE MACRO Customer_List AS

(SELECT customer_name, customer_number FROM Customer;);




HELP Commands: Teradata SQL Extension

Databases and Users:

HELP DATABASE Customer_Service ;

HELP USER Dave_Jones ;

Tables, Views, and Macros:

HELP TABLE Employee ;

HELP VIEW Emp;

HELP MACRO Payroll_3;

HELP COLUMN Employee.*;

HELP COLUMN Employee.last_name;

HELP COLUMN Emp.*;

HELP COLUMN Emp.last;

HELP INDEX Employee;

HELP STATISTICS Employee;

HELP CONSTRAINT Employee.over_21;




EXPLAIN Facility: Teradata SQL Extension

The EXPLAIN modifier in front of any SQL statement generates an English translation of the Parser's plan.

The request is fully parsed and optimized, but not actually executed.

EXPLAIN returns:

Text showing how a statement will be processed (a plan)

An estimate of how many rows will be involved

A relative cost of the request (in units of time)

This information is useful for:

predicting row counts

predicting performance

testing queries before production

analyzing various approaches to a problem

EXPLAIN SELECT last_name, department_number FROM Employee ;

Explanation (partial):

3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.Employee by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.




Teradata Features Review

Designed for decision-support and tactical queries

Ideal for data warehouse applications

Parallelism makes possible access to very large tables

Performance increase is linear as components are added

Uses standard SQL

Runs as a database server to client applications

Runs on multiple hardware platforms

Open architecture: uses industry-standard components

[Figure: Windows XP, Windows 2000, UNIX, and mainframe clients connected to the Teradata database]




Teradata RDBMS Architecture





Describe the purpose of the PE and the AMP.

Describe the overall RDBMS parallel architecture.

Describe the relationship of the RDBMS to its client-side applications.




Teradata Functional Overview

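[Figure: Teradata RDBMS functional overview. In a channel-attached system, the client application uses CLI and the TDP to reach a Parsing Engine over the channel. In a network-attached system, the client application uses CLI or ODBC with MTDP and MOSI to reach a Parsing Engine over the LAN. Inside the Teradata RDBMS, the Parsing Engines work with the AMPs.]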




Channel-Attached Client Software Overview

Client Application
  Your own application(s)
  Teradata utilities (BTEQ, etc.)

CLI (Call-Level Interface) Service Routines
  Request and response control
  Parcel creation and blocking/unblocking
  Buffer allocation and initialization

TDP (Teradata Director Program)
  Session balancing across multiple PEs
  Ensures proper message routing to/from the RDBMS
  Failure notification (application failure, Teradata restart)

Channel (ESCON or Bus/Tag)

[Figure: channel-attached system. Client applications call CLI, which passes requests to the TDP; the TDP communicates over the channel with the Host Channel Adapter (PBSA or PBCA) and the Parsing Engines.]




Network-Attached Client Software Overview

CLI (Call Level Interface)
  Library of routines for blocking/unblocking requests and responses to/from the RDBMS

ODBC (Open Database Connectivity) Driver
  Uses the open standards-based ODBC interface to provide client applications access to Teradata

MTDP (Micro Teradata Director Program)
  Library of session management routines

MOSI (Micro Operating System Interface)
  Library of routines providing an OS-independent interface

[Figure: LAN-attached servers. Client applications such as FastLoad and BTEQ use CLI, MTDP, and MOSI; a client application such as SQL Assistant uses ODBC, MTDP, and MOSI. Requests travel over the LAN (TCP/IP) to the Ethernet adapter and the gateway software (tgtw), and on to the Parsing Engines.]




The Parsing Engine

The Parsing Engine is responsible for:

Managing individual sessions (up to 120)

Parsing and optimizing your SQL requests

Dispatching the optimized plan to the AMPs

Input conversion (EBCDIC / ASCII), if necessary

Sending the answer set response back to the requesting client

[Figure: an SQL request enters the Parsing Engine and passes through the Parser, Optimizer, and Dispatcher to the AMPs; the answer set response returns to the client]





The Message Passing Layer (PDE and BYNET)

[Figure: the Message Passing Layer sits between the Parsing Engine and the AMPs, carrying the SQL request down and the answer set response back]

The Message Passing Layer is responsible for:

Carrying messages between the AMPs and PEs

Point-to-Point, Multi-Cast, and Broadcast communications

Merging answer sets back to the PE

Making Teradata parallelism possible

The Message Passing Layer is a combination of:

Parallel Database Extensions (PDE) Software

BYNET Software

BYNET Hardware for MPP systems




The Access Module Processor (AMP)

[Figure: an SQL request passes from the Parsing Engine to four AMPs, and the answer set response returns to the client]

AMPs store and retrieve rows to and from disk

The AMPs are responsible for:

Finding the rows requested

Lock management

Sorting rows

Aggregating columns

Join processing

Output conversion and formatting

Creating answer set for client

Disk space management

Accounting

Special utility protocols

Recovery processing




Storing and Accessing Data Rows




Creating a Primary Index

A Primary Index is defined at table creation.

It may consist of a single column or a combination of columns.

  Limit of 16 columns with V2R4.1 and prior releases
  Limit of 64 columns with V2R5

UPI

CREATE TABLE sample_1
  (col_a INTEGER,
   col_b INTEGER,
   col_c INTEGER)
UNIQUE PRIMARY INDEX (col_b);

If the index choice of column(s) is unique, we call this a UPI (Unique Primary Index).

A UPI choice will result in even distribution of the rows of the table across all AMPs.

NUPI

CREATE TABLE sample_2
  (col_x INTEGER,
   col_y INTEGER,
   col_z INTEGER)
PRIMARY INDEX (col_x);

If the index choice of column(s) isn't unique, we call this a NUPI (Non-Unique Primary Index).

A NUPI choice will result in even distribution of the rows of the table proportional to the degree of uniqueness of the index.

Note: Changing the choice of Primary Index requires dropping and recreating the table.




Primary Index Values

The value of the Primary Index for a specific row determines the AMP assignment for that row.

This is done using a hashing algorithm.

[Figure: the PE passes the PI value through the hashing algorithm to select one AMP, both for row assignment and for row access]

Accessing the row by its Primary Index value is:

  always a one-AMP operation
  the most efficient way to access a row

Other table access techniques:

  Secondary index access
  Full table scans




Accessing Via a Non-Unique Primary Index

A NUPI access is a one-AMP operation which may access multiple rows.

CREATE TABLE sample_2
  (col_x INTEGER,
   col_y INTEGER,
   col_z INTEGER)
PRIMARY INDEX (col_x);

SELECT col_x, col_y, col_z
FROM sample_2
WHERE col_x = 25;

[Figure: the PE hashes NUPI = 25 and the request goes to the single AMP that holds the matching rows]

First AMP
col_x  col_y  col_z
10     30     A
10     30     B
35     40     B

Second AMP (holds the rows with col_x = 25)
col_x  col_y  col_z
20     50     A
25     55     A
25     60     B

Third AMP
col_x  col_y  col_z
5      70     B
30     80     B
30     80     A

Both UPI and NUPI accesses are one-AMP operations.




Primary Keys and Primary Indexes

Indexes are conceptually different from keys.

A PK is a relational modeling convention which allows each row to be uniquely identified.

A PI is a Teradata convention which determines how the row will be stored and accessed.

A significant percentage of tables may use the same columns for both the PK and the PI.

A well-designed database will use a PI that is different from the PK for some tables.

Primary Key                          Primary Index
-----------------------------------  ------------------------------------------------
Logical concept of data modeling     Physical mechanism for access and storage
Teradata doesn't need to recognize   Each table must have exactly one primary index
No limit on number of columns        16-column limit (V2R4.1); 64-column limit (V2R5)
Documented in data model             Defined in CREATE TABLE statement
  (optional in CREATE TABLE)
Must be unique                       May be unique or non-unique
Identifies each row                  May be unique or non-unique
Values should not change             Values may be changed (Delete + Insert)
May not be NULL; requires a value    May be NULL
Does not imply an access path        Defines most efficient access path
Chosen for logical correctness       Chosen for physical performance




Duplicate Rows

A duplicate row is a row of a table whose column values are all identical to another row in the same table.

col_a  col_b  col_c
20     50     A
25     50     A     (duplicate rows)
25     50     A     (duplicate rows)

Because a PK uniquely identifies each row, ideally a relational table should not have duplicate rows!

The ANSI standard, however, permits duplicate rows for specialized situations, thus Teradata permits them as well.

You may select whether your table will or will not allow them.

CREATE SET TABLE table_A
  :
  Checks for (*) and disallows duplicate rows.
  The Teradata default.

CREATE MULTISET TABLE table_B
  :
  Doesn't check for and allows duplicate rows.
  The ANSI default.

* Note: If a UPI is selected on a SET table, the duplicate row check is replaced by a check for duplicate index values.




Row Distribution Using a NUPI: Case 2

Order table (Order Number is the PK; Customer Number is the NUPI)

Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C

Notes:

Customer_Number may be the preferred access column for the ORDER table, thus a good index candidate.

Values for Customer_Number are somewhat non-unique.

Choice of Customer_Number is therefore a NUPI.

Rows with the same PI value distribute to the same AMP.

Row distribution is less uniform, or skewed.

Rows as distributed across the four AMPs (one AMP receives no rows):

AMP holding customer 2
o_#   c_#  o_dt  o_st
7325  2    4/13  O
7202  2    4/09  C
7225  2    4/15  C

AMP holding customer 1
o_#   c_#  o_dt  o_st
7384  1    4/12  C
7103  1    4/10  O
7415  1    4/13  C
7188  1    4/13  C

AMP holding customer 3
o_#   c_#  o_dt  o_st
7402  3    4/16  C
7324  3    4/13  O




Row Distribution Using a Highly Non-Unique Primary Index (NUPI): Case 3

Order table (Order Number is the PK; Order Status is the NUPI)

Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/16   C
7188    1         4/13   C
7202    2         4/09   C

Notes:

Values for Order_Status are highly non-unique.

Choice of the Order_Status column is a NUPI.

Only two values exist, so only two AMPs will ever be used for this table.

Table will not perform well in parallel operations.

Highly non-unique columns are generally poor PI choices.

The degree of uniqueness is critical to efficiency.

Rows as distributed across the four AMPs (two AMPs receive no rows):

AMP holding status C
o_#   c_#  o_dt  o_st
7402  3    4/16  C
7202  2    4/09  C
7225  2    4/15  C
7415  1    4/13  C
7188  1    4/13  C
7384  1    4/12  C

AMP holding status O
o_#   c_#  o_dt  o_st
7103  1    4/10  O
7324  3    4/13  O
7325  2    4/13  O
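The effect of index uniqueness on distribution can be explored with a small simulation. The sketch below is illustrative only: it uses Python's `zlib.crc32` as a stand-in hash (an assumption, since Teradata's hashing algorithm is not described here) and a simple modulo to pick one of four AMPs, so the exact placement will differ from the slides. What carries over is the principle that rows with the same PI value always land on the same AMP.

```python
import zlib
from collections import Counter

NUM_AMPS = 4

# (order_number, customer_number, order_date, order_status) from the slides.
ORDERS = [
    (7325, 2, "4/13", "O"), (7324, 3, "4/13", "O"), (7415, 1, "4/13", "C"),
    (7103, 1, "4/10", "O"), (7225, 2, "4/15", "C"), (7384, 1, "4/12", "C"),
    (7402, 3, "4/16", "C"), (7188, 1, "4/13", "C"), (7202, 2, "4/09", "C"),
]


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.
    crc32 is a stand-in hash, not Teradata's algorithm."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % num_amps


def rows_per_amp(pi_column):
    """Count how many rows each AMP receives when the given column
    (by position in the row tuple) is used as the primary index."""
    counts = Counter(amp_for(row[pi_column]) for row in ORDERS)
    return [counts[amp] for amp in range(NUM_AMPS)]


if __name__ == "__main__":
    candidates = (("order_number", 0), ("customer_number", 1), ("order_status", 3))
    for name, column in candidates:
        print(f"PI = {name:16s} rows per AMP: {rows_per_amp(column)}")
```

With order status as the index there are only two distinct values, so at most two AMPs can ever hold rows, whatever hash function is used.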




Primary Index Mechanics





Explain the role of the hashing algorithm and the hash map in locating a row.

Explain the makeup of the Row ID and its role in row storage.

Describe the sequence of events for locating a row given its PI value.




Hashing Down to the AMPs

[Figure: index value(s) -> hashing algorithm -> Row Hash -> DSW or Hash Bucket # -> Hash Map -> AMP #]

The hashing algorithm is designed to ensure even distribution of unique values across all AMPs.

Different hashing algorithms are used for different international character sets.

A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.

The DSW (Destination Selection Word), or Hash Bucket, is represented by the high-order 16 bits of the Row Hash.

A Hash Map is uniquely configured for each system.

It is an array of 65,536 entries (buckets) which associates bucket numbers with specific AMPs.

Two systems with the same number of AMPs will have the same Hash Map.

Changing the number of AMPs in a system requires a change to the Hash Map.




A Hashing Example

Order table (Order Number is the PK and the UPI)

Order   Customer  Order  Order
Number  Number    Date   Status
7325    2         4/13   O
7324    3         4/13   O
7415    3         4/13   O
7415    1         4/13   C
7103    1         4/10   O
7225    2         4/15   C
7384    1         4/12   C
7402    3         4/12   C
7188    1         4/13   C
7202    2         4/09   C

SELECT * FROM order
WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

0110 1001 0001 1011     0001 0100 1010 1110
Destination Selection   Remaining 16 bits
Word (6 9 1 B)          (1 4 A E)
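The path from index value to AMP can be sketched in a few lines. This is a minimal illustration under stated assumptions: `zlib.crc32` stands in for the real hashing algorithm, and the hash map is filled round-robin, which is only one possible way to assign 65,536 buckets to AMPs. The one step taken directly from the slides is that the hash bucket is the high-order 16 bits of the 32-bit row hash.

```python
import zlib

NUM_BUCKETS = 65_536  # 2**16 hash buckets, as stated on the slide


def row_hash(pi_value):
    """Return a 32-bit row hash. crc32 is a stand-in (assumption),
    not the hashing algorithm Teradata uses."""
    return zlib.crc32(repr(pi_value).encode("utf-8")) & 0xFFFFFFFF


def hash_bucket(row_hash_value):
    """The DSW / hash bucket is the high-order 16 bits of the row hash."""
    return row_hash_value >> 16


def build_hash_map(num_amps):
    """Associate every bucket number with an AMP.
    Round-robin assignment is an assumption made for illustration."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for(pi_value, hash_map):
    """Index value -> row hash -> bucket -> AMP number."""
    return hash_map[hash_bucket(row_hash(pi_value))]


if __name__ == "__main__":
    slide_row_hash = 0x691B14AE              # row hash shown for order 7202
    print(f"bucket = {hash_bucket(slide_row_hash):04X}")   # prints 691B
    hash_map = build_hash_map(num_amps=4)
    print("AMP for 7202 with the stand-in hash:", amp_for(7202, hash_map))
```

Because the map is derived only from the number of AMPs, two systems with the same AMP count get identical maps in this sketch, and changing the AMP count changes the map.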




Identifying Rows

Consideration #1

A Row Hash = 32 bits = 4.2 billion possible values.

Because there is an infinite number of possible data values, some data values will have to share the same row hash.

Hash Synonyms: the data values 1254 and 7769 are both hashed to 10A2 2936.

Consideration #2

A Primary Index may be non-unique (NUPI).

Different rows will have the same PI value and thus the same row hash.

NUPI Duplicates: (John) 'Smith' and (Dave) 'Smith' are both hashed to 0016 5557, so the rows have the same hash.

Conclusion

A row hash is not adequate to uniquely identify a row.




The Row ID

To uniquely identify a row, we add a 32-bit uniqueness value.

The combined row hash and uniqueness value is called a Row ID.

Row ID = Row Hash (32 bits) + Uniqueness ID (32 bits)

Each stored row has a Row ID as a prefix.

Rows are logically maintained in Row ID sequence.

Row ID                  Row Data
Row Hash   Unique ID    Emp_No  Last_Name  First_Name
3B11 5032  0000 0001    1018    Reynolds   Jane
3B11 5032  0000 0002    1020    Davidson   Evan
3B11 5032  0000 0003    1031    Green      Jason
3B11 5033  0000 0001    1014    Jacobs     Paul
3B11 5034  0000 0001    1012    Chevas     Jose
3B11 5034  0000 0002    1021    Carnet     Jean
:          :            :       :          :




Storing Rows (2 of 2)

Add a row for 'Fred Smith' (NUPI Duplicate)

'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                  Row Data
Row Hash   Unique ID    Last_Name  First_Name  Etc.
0016 5557  0000 0001    Smith      John
0016 5557  0000 0002    Smith      Fred
1058 9829  0000 0001    Adams      Sam

Add a row for 'Dan Jones' (Hash Synonym)

'Jones' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                  Row Data
Row Hash   Unique ID    Last_Name  First_Name  Etc.
0016 5557  0000 0001    Smith      John
0016 5557  0000 0002    Smith      Fred
0016 5557  0000 0003    Jones      Dan
1058 9829  0000 0001    Adams      Sam

Given the row hash, what other information would be needed to find the 'Dan Jones' row? The 'Fred Smith' row?
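The Row ID mechanics above can be modeled with a small in-memory structure. The following is a sketch only: the `AmpStorage` class and its methods are hypothetical names, the row hashes are taken from the slide rather than computed, and real storage details are not modeled. It shows the uniqueness value incrementing for rows that share a row hash, rows being kept in Row ID order, and why a lookup needs the actual column values in addition to the row hash.

```python
from bisect import insort


class AmpStorage:
    """Toy model of one AMP's rows, kept sorted by Row ID (hypothetical class)."""

    def __init__(self):
        # Sorted list of ((row_hash, uniqueness), row_data) pairs.
        self.rows = []

    def insert(self, row_hash, row_data):
        """Store a row and return its Row ID. The uniqueness value is
        one more than the highest already used for this row hash."""
        used = [row_id[1] for row_id, _ in self.rows if row_id[0] == row_hash]
        row_id = (row_hash, max(used, default=0) + 1)
        insort(self.rows, (row_id, row_data))
        return row_id

    def find(self, row_hash, predicate):
        """Return rows with this row hash whose data satisfies the predicate.
        The row hash alone cannot separate synonyms or NUPI duplicates."""
        return [data for (stored_hash, _), data in self.rows
                if stored_hash == row_hash and predicate(data)]


if __name__ == "__main__":
    amp3 = AmpStorage()
    amp3.insert(0x10589829, ("Adams", "Sam"))
    amp3.insert(0x00165557, ("Smith", "John"))
    amp3.insert(0x00165557, ("Smith", "Fred"))   # NUPI duplicate
    amp3.insert(0x00165557, ("Jones", "Dan"))    # hash synonym
    for (row_hash, uniqueness), data in amp3.rows:
        print(f"{row_hash:08X} {uniqueness:08X} {data}")
    # Finding 'Dan Jones' needs the PI value as well as the row hash.
    print(amp3.find(0x00165557, lambda data: data[0] == "Jones"))
```

In this model the last name is enough to separate 'Jones' from the two 'Smith' rows, but telling 'Fred Smith' from 'John Smith' requires a further column value, since both the row hash and the PI value are the same.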



Non-Unique Secondary Index (NUSI) Access

Create NUSI:

CREATE INDEX (Name) ON Customer;

Access via NUSI:

SELECT *
FROM Customer
WHERE Name = 'Adams';

[Figure: the PE hashes the NUSI value 'Adams' and sends the request (Table ID 100, Row Hash 567, NUSI value Adams) through the MPL to all four AMPs. Each AMP searches its own NUSI subtable and then reads the matching rows from its own base table.]

Customer table (Table ID = 100): Name is the NUSI, Phone is the NUPI.

NUSI subtables (subtable RowID, Name, base table RowID)

AMP 1                     AMP 2
432, 8  Smith  640, 1     432, 1  Smith  147, 1
448, 1  White  107, 1     448, 4  Black  822, 1
567, 3  Adams  638, 1     567, 6  Jones  338, 1
656, 1  Rice   536, 5     770, 1  Young  147, 2

AMP 3                     AMP 4
155, 1  Marsh   915, 9    432, 3  Smith  884, 1
396, 1  Peters  778, 3    567, 2  Adams  471, 1
432, 5  Smith   778, 7                   717, 2
567, 1  Jones   639, 1    852, 1  Brown  555, 6

Base tables (RowID, Cust, Name, Phone), matched to the AMPs by the RowIDs in the subtables

AMP 1
107, 1  37  White   555-4444
536, 5  84  Rice    666-5555
638, 1  31  Adams   111-2222
640, 1  40  Smith   222-3333

AMP 2
147, 1  49  Smith   111-6666
147, 2  12  Young   777-4444
388, 1  27  Jones   222-8888
822, 1  62  Black   444-5555

AMP 3
639, 1  77  Jones   777-6666
778, 3  95  Peters  555-7777
778, 7  56  Smith   555-7777
915, 9  51  Marsh   888-2222

AMP 4
471, 1  45  Adams   444-6666
555, 6  98  Brown   333-9999
717, 2  72  Adams   666-7777
884, 1  74  Smith   555-6666
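The lookup pictured above can be imitated with dictionaries: every AMP keeps a subtable that indexes only its own base rows, and a NUSI query is sent to all AMPs. The sketch below is an illustration, not the actual implementation. The assignment of base rows to AMP numbers is inferred by matching the RowIDs in the subtables, and the subtables are rebuilt from the base rows rather than copied from the slide; the function names are hypothetical.

```python
# Base rows per AMP: {row_id: (cust, name, phone)}. AMP numbers are inferred
# by matching base-table RowIDs against the NUSI subtables (assumption).
BASE_ROWS_BY_AMP = {
    1: {(107, 1): (37, "White", "555-4444"), (536, 5): (84, "Rice", "666-5555"),
        (638, 1): (31, "Adams", "111-2222"), (640, 1): (40, "Smith", "222-3333")},
    2: {(147, 1): (49, "Smith", "111-6666"), (147, 2): (12, "Young", "777-4444"),
        (388, 1): (27, "Jones", "222-8888"), (822, 1): (62, "Black", "444-5555")},
    3: {(639, 1): (77, "Jones", "777-6666"), (778, 3): (95, "Peters", "555-7777"),
        (778, 7): (56, "Smith", "555-7777"), (915, 9): (51, "Marsh", "888-2222")},
    4: {(471, 1): (45, "Adams", "444-6666"), (555, 6): (98, "Brown", "333-9999"),
        (717, 2): (72, "Adams", "666-7777"), (884, 1): (74, "Smith", "555-6666")},
}


def build_nusi_subtable(base_rows):
    """Index one AMP's own rows by name: {name: [base row ids]}."""
    subtable = {}
    for row_id, (_cust, name, _phone) in sorted(base_rows.items()):
        subtable.setdefault(name, []).append(row_id)
    return subtable


# Each AMP builds and owns the subtable for its local rows only.
SUBTABLES_BY_AMP = {amp: build_nusi_subtable(rows)
                    for amp, rows in BASE_ROWS_BY_AMP.items()}


def select_by_name(name):
    """All-AMP operation: every AMP checks its subtable, then reads
    the matching rows from its own base table."""
    results = []
    for amp in sorted(BASE_ROWS_BY_AMP):
        for row_id in SUBTABLES_BY_AMP[amp].get(name, []):
            results.append((amp, BASE_ROWS_BY_AMP[amp][row_id]))
    return results


if __name__ == "__main__":
    for amp, row in select_by_name("Adams"):
        print(f"AMP {amp}: {row}")
```

Unlike a primary index access, no single AMP can be picked in advance, so every AMP takes part; only those whose subtable contains the value go on to read base rows.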




Explain the concept of FALLBACK tables.

List the types and levels of locking provided by Teradata.

Describe the Recovery, Transient, and Permanent Journals and their function.

List the utilities available for archive and recovery.



Disk Arrays

[Figure: Utilities, Applications, Host Operating System, and two DACs (disk array controllers)]

Why Disk Arrays?

High availability through data mirroring or data parity protection.

Better I/O performance through implementation of RAID technology at the hardware level.

Convenience - automatic disk recovery and data reconstruction when mirroring or data parity protection is used.
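
Data parity protection is commonly implemented with XOR parity: the parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the survivors. The Python sketch below illustrates that general idea only. The block contents and function names are made up for the example, and real disk arrays do this in controller hardware.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block from the survivors and the parity block."""
    return parity(list(surviving_blocks) + [parity_block])


data = [b"AAAA", b"BBBB", b"CCCC"]             # three data blocks on three drives
p = parity(data)                               # parity block on a fourth drive
rebuilt = reconstruct([data[0], data[2]], p)   # pretend the second drive failed
```

Here `rebuilt` equals the lost second block, which is the automatic data reconstruction the slide refers to.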



RAID 1 Summary



Characteristics

data is fully replicated

striped mirroring is possible with multiple pairs of disks in a drive group

transparent to operating system

Advantages

maximum data availability

read performance gains

no performance penalty with write operations

fast recovery and restoration

Disadvantages

50% of disk space is used for mirrored data

Summary

RAID 1 provides high data availability and performance, but storage costs are higher.

Striped mirroring is NOT necessary with Teradata.



Teradata RAID 1 and RAID 5



RAID 1 for Teradata

Most useful with typical Teradata data warehouses (e.g., Active Data Warehouses).

RAID 5 for Teradata

Most useful when creating archival data warehouses that require less expensive storage and where performance is not as important.

Why?

RAID 1 provides Superior Performance

Mirroring provides the best read and write throughput.

Maximizes the performance capabilities of controllers and disk drives.

Best performance when a drive has failed.

Less reconstruction impact when a drive has failed.

RAID 1 provides Superior Availability

Less susceptible to a double disk failure in a RAID drive group.

Faster reconstruction of a failed drive - shorter vulnerability period during reconstruction.
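
The storage-cost side of the trade-off can be made concrete with simple arithmetic: RAID 1 keeps half of the raw space for mirror copies, while RAID 5 gives up one drive's worth of space per drive group for parity. The Python sketch below is only an illustration. The function name and drive sizes are hypothetical, and it ignores hot spares and formatting overhead.

```python
def usable_capacity_gb(level, drives, drive_size_gb):
    """Usable space for one drive group under the stated simplifications."""
    if level == "RAID1":
        if drives < 2 or drives % 2:
            raise ValueError("RAID 1 needs an even number of drives (mirrored pairs)")
        return drives // 2 * drive_size_gb      # half the drives hold mirror copies
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_size_gb     # one drive's worth of space holds parity
    raise ValueError(f"unsupported RAID level: {level}")
```

For a hypothetical group of four 100 GB drives, this gives 200 GB usable under RAID 1 and 300 GB under RAID 5, which is why RAID 5 is the less expensive option.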



Teradata Vproc Migration



Vproc Migration: vprocs in the failed node are started in the remaining nodes within the clique.

[Figure: SMP Fails. Four SMP nodes, each shown with DAC-A and DAC-B.]

SMP001-4 AMPs: 0 3 39
SMP001-5 AMPs: 1 4 37.
SMP002-4 AMPs: 2 5 38.
SMP002-5 AMPs: 36
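
Vproc migration can be pictured as reassigning the failed node's vprocs to the surviving nodes of the same clique. The Python sketch below is a toy model under stated assumptions: the round-robin placement policy and the AMP numbers are hypothetical, chosen only to make the example readable, and `migrate_vprocs` is not a Teradata function.

```python
def migrate_vprocs(clique, failed_node):
    """Return a new node -> vprocs mapping with the failed node's vprocs restarted elsewhere."""
    survivors = sorted(node for node in clique if node != failed_node)
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {node: list(clique[node]) for node in survivors}
    # Assumed policy: hand the failed node's vprocs out round-robin.
    for i, vproc in enumerate(clique[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


# Hypothetical AMP numbering for a four-node clique.
clique = {
    "SMP001-4": [0, 4, 8],
    "SMP001-5": [1, 5, 9],
    "SMP002-4": [2, 6, 10],
    "SMP002-5": [3, 7, 11],
}
after = migrate_vprocs(clique, "SMP001-4")
```

After the call, all twelve AMPs are still running, but on three nodes instead of four, so each surviving node carries more work.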



Fallback Clusters



A Fallback cluster is a defined set of AMPs across which fallback is implemented.

All Fallback rows for AMPs in a cluster must reside within the cluster.

Loss of one AMP in the cluster permits continued table access.

Loss of two AMPs in the cluster causes the RDBMS to halt.

Cluster 0        AMP 1        AMP 2        AMP 3        AMP 4
Primary rows     62, 8, 27    34, 22, 50   5, 78, 19    14, 1, 38
Fallback rows    5, 34, 14    19, 38, 8    22, 62, 1    50, 27, 78

Cluster 1        AMP 5        AMP 6        AMP 7        AMP 8
Primary rows     41, 66, 7    58, 93, 20   88, 2, 45    17, 37, 72
Fallback rows    93, 72, 88   45, 7, 17    37, 58, 41   20, 2, 66
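
The two availability statements above can be checked against the Cluster 0 row placement in the diagram. The Python sketch below is an illustration only. The `unreachable_rows` helper is an assumption made for this example, and the row numbers are as read from the diagram.

```python
# Cluster 0 as read from the diagram: AMP -> (primary rows, fallback rows).
CLUSTER_0 = {
    1: ({62, 8, 27}, {5, 34, 14}),
    2: ({34, 22, 50}, {19, 38, 8}),
    3: ({5, 78, 19}, {22, 62, 1}),
    4: ({14, 1, 38}, {50, 27, 78}),
}


def unreachable_rows(cluster, down_amps):
    """Rows whose primary copy and fallback copy both sit on down AMPs."""
    all_rows = set().union(*(primary for primary, _ in cluster.values()))
    reachable = set()
    for amp, (primary, fallback) in cluster.items():
        if amp not in down_amps:
            reachable |= primary | fallback
    return all_rows - reachable
```

With one AMP down, `unreachable_rows` returns an empty set, so table access can continue. With AMP 1 and AMP 3 both down, rows 62 and 5 have no surviving copy, which corresponds to the case where the RDBMS halts.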



Fallback and RAID 1 Example (cont.)



RAID 1 - Mirrored Pair of Physical Disk Drives

Each AMP's Vdisk is stored on a mirrored pair of physical disk drives; both drives in a pair hold the same primary and fallback rows.

Vdisk      Primary rows    Fallback rows
AMP 1      62, 8, 27       5, 34, 14
AMP 2      34, 22, 50      19, 38, 8
AMP 3      5, 78, 19       22, 62, 1
AMP 4      14, 1, 38       50, 27, 78

Assume two disk drives have failed in the same drive group. Is Fallback needed?
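
The question can be explored with a small model that combines the two protection layers. The Python sketch below assumes that a Vdisk is lost only when both drives of its mirrored pair fail. The drive names (`d1a`, `d1b`, and so on) and helper functions are hypothetical, and the row placement is as read from the diagram.

```python
# Hypothetical drive names: each AMP's Vdisk sits on one mirrored pair.
MIRRORED_PAIRS = {1: ("d1a", "d1b"), 2: ("d2a", "d2b"),
                  3: ("d3a", "d3b"), 4: ("d4a", "d4b")}
PRIMARY = {1: {62, 8, 27}, 2: {34, 22, 50}, 3: {5, 78, 19}, 4: {14, 1, 38}}
FALLBACK = {1: {5, 34, 14}, 2: {19, 38, 8}, 3: {22, 62, 1}, 4: {50, 27, 78}}


def down_amps(failed_drives):
    """An AMP loses its Vdisk only if both drives of its mirrored pair have failed."""
    failed = set(failed_drives)
    return {amp for amp, pair in MIRRORED_PAIRS.items() if set(pair) <= failed}


def rows_available(failed_drives, fallback_enabled):
    """Rows still readable from the AMPs whose Vdisks survive."""
    down = down_amps(failed_drives)
    available = set()
    for amp in PRIMARY:
        if amp in down:
            continue
        available |= PRIMARY[amp]
        if fallback_enabled:
            available |= FALLBACK[amp]
    return available
```

Losing a single drive takes no AMP down. Losing both drives of AMP 1's pair leaves rows 62, 8, and 27 readable only if Fallback is in use.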



Recovery Journal for Down AMPs



Recovery Journal is:

  Automatically activated when an AMP is taken off-line.
  Maintained by other AMPs in the cluster.
  Totally transparent to users of the system.

While AMP is off-line:

  Journal is active.
  Table updates continue as normal.
  Journal logs Row IDs of changed rows for down-AMP.

When AMP is back on-line:

  Restores rows on recovered AMP to current status.
  Journal discarded when recovery complete.

Vdisk            AMP 1        AMP 2        AMP 3        AMP 4
Primary rows     62, 8, 27    34, 22, 50   5, 78, 19    14, 1, 38
Fallback rows    5, 34, 14    19, 38, 8    22, 62, 1    50, 27, 78

Recovery Journal: Row ID for 62, Row ID for 34, Row ID for 14
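
The life cycle described above (activate, log Row IDs, restore, discard) can be sketched as a toy model in Python. It assumes AMP 1 is the off-line AMP and uses three rows from the diagram that have a copy on AMP 1. The `Cluster` class is hypothetical, and it keeps the journal in one dictionary for brevity, whereas the slide says the journal is maintained by the other AMPs in the cluster.

```python
# Row -> (AMP holding the primary copy, AMP holding the fallback copy), per the diagram.
COPIES = {62: (1, 3), 34: (2, 1), 14: (4, 1)}


class Cluster:
    def __init__(self):
        self.data = {amp: {} for amp in (1, 2, 3, 4)}  # amp -> {row: value}
        self.down = set()
        self.journal = {}                               # down amp -> set of Row IDs

    def take_offline(self, amp):
        self.down.add(amp)
        self.journal[amp] = set()            # journal is activated automatically

    def write(self, row, value):
        for amp in COPIES[row]:
            if amp in self.down:
                self.journal[amp].add(row)   # log only the Row ID for the down AMP
            else:
                self.data[amp][row] = value  # updates continue on the AMPs that are up

    def bring_online(self, amp):
        self.down.discard(amp)
        for row in self.journal.pop(amp):    # journal is discarded once applied
            other = next(a for a in COPIES[row] if a != amp)
            self.data[amp][row] = self.data[other][row]


cluster = Cluster()
for row in COPIES:
    cluster.write(row, "v1")
cluster.take_offline(1)
for row in COPIES:
    cluster.write(row, "v2")      # changes made while AMP 1 is off-line
logged = set(cluster.journal[1])
cluster.bring_online(1)
```

While AMP 1 is down, the journal holds only the Row IDs 62, 34, and 14. On recovery, the current row contents are copied from the AMP holding the other copy.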



Archiving and Recovering Data



ARC

The Archive/Restore utility (arcmain)

Runs on IBM, UNIX, and Windows 2000 systems

Archives and restores data from/to Teradata RDBMS

Restores or copies data from archive media

Permits data recovery to a specified checkpoint (using Permanent Journals)

ARC 7.0 is required to archive/restore with Teradata V2R5

Open Teradata Backup

Two choices from different NCR Partners

NetVault - from BakBone software

NetBackup - from VERITAS software (limited support)

Provides Windows front end for ARC

Easy creation of scripts for archive/recovery

Provides job scheduling and tape management functions

ASF2 no longer supported with Teradata V2R5



Data Dictionary / Directory



DBC

Sys_Calendar   SysAdmin   SystemFE   Crashdumps   SYSDBA

Data Dictionary / Directory Tables

  Object definitions
  System event logs
  System message table
  Journals and Restart control tables
  Accounting information
  Access control tables

Views of DD/D Tables

  Administrative
  Security
  Supervisory
  End User
  Operational

Macros

  Add calculation sequence
  Generate utilization reports
  Reset accounting values
  Authorize secured functions



System Views



Clarify tables

  Re-title tables and/or columns.
  Reorder and format columns.
  Compute (derive) new column data.

Simplify operations

  Supply join operation syntax.
  Select and project relevant rows and columns.

Limit access to data

  Exclude certain rows and/or columns from selection.
  Limit update to selected table rows and/or columns.

Reduce maintenance

  When you add or drop columns, applications are not affected (unless a view references a dropped column).
  You can drop and recreate tables without affecting access rights granted to views.

[Figure: Applications, Utilities, and Coordinated Products access the Dictionary tables through System Views.]
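
Two of the view benefits listed above, re-titling columns and excluding rows and columns, can be demonstrated with any SQL engine. The Python sketch below uses the standard library's `sqlite3` module as a stand-in, so it shows generic SQL view behavior rather than Teradata syntax. The `employee` table, its data, and the `dept100_staff` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER, last_name TEXT, dept_no INTEGER, salary REAL);
    INSERT INTO employee VALUES (1, 'Adams', 100, 50000), (2, 'Smith', 200, 60000);
    -- Re-title columns, hide salary, and restrict rows to one department.
    CREATE VIEW dept100_staff AS
        SELECT emp_no AS "Employee Number", last_name AS "Last Name"
        FROM employee
        WHERE dept_no = 100;
""")

cursor = conn.execute("SELECT * FROM dept100_staff")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A user who can query only `dept100_staff` sees the re-titled columns and the department 100 row, but never the salary column or the other departments.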



ShowTblChecks View


Provides information about check constraints at the table level and named column constraints.

DBC.ShowTblChecks columns:

DatabaseName   TableName   CheckName   TblCheck   CreatorName   CreateTimeStamp

Example: Display table constraint information.

SELECT TableName (CHAR(10)), CheckName (CHAR(10)), TblCheck
FROM DBC.ShowTblChecks
WHERE DatabaseName = 'TFACT';

Example Results:

TableName    CheckName   TblCheck
DEPARTMENT   Dept_Chk1   CONSTRAINT "Dept_Chk1" CHECK ( "Dept_nu
EMPLOYEE     Emp_Chk1    CONSTRAINT "Emp_Chk1" CHECK ( "Employe
JOB          ?           CHECK ( "Job_code" >= 3000 )

Note: The first two are named constraints and the third is an unnamed constraint. All three of these constraints were created at the table level.



IndexConstraints View

Provides information about partitioned primary index constraints.



This view only displays tables with an index constraint type of "Q".

Q indicates a table with a PPI.

DBC.IndexConstraints columns:

DatabaseName   TableName   IndexName   IndexNumber   ConstraintType   ConstraintText   ConstraintCollation   CollationName   CreatorName   CreateTimeStamp

Example: List all of the partitioning expression constraints for all tables in the current database.

SELECT TableName AS "Table Name", ConstraintText AS "Constraint Text"
FROM DBC.IndexConstraints
WHERE DatabaseName = DATABASE;

Example Results:

Table Name      Constraint Text
Sales_History   CHECK ((RANGE_N("sales_date" BETWEEN ...
Store_Sales     CHECK ((store_id ) BETWEEN 1 and 65535)
Store_Item      CHECK ((((store_id - 1000)* 1000) + (item_id -
Store_Revenue   CHECK ((CASE_N(total_revenue < 2000, ...

AllTempTables View



Provides information about all global temporary tables materialized in the system.

DBC.AllTempTables[X] columns:

HostNo   SessionNo   UserName   B_DatabaseName   B_TableName   E_TableID

Example: Show all temporary tables materialized in the system.

SELECT HostNo, SessionNo, UserName (CHAR(10)),
       B_DatabaseName AS "DataBase", B_TableName AS "Table Name"
FROM DBC.AllTempTables;

Example Results:

HostNo   SessionNo   UserName   Database   Table Name
01       20887       TFACT02    PD         GT_DEPTSALARY
01       20908       TFACT01    PD         GT_DEPTSALARY



Teradata Administrator Object Options



Teradata Administrator can also be used to display object details.

For example, right-click on the object (e.g., Department table) and a menu of options is displayed.

In this example, the Indexes option was selected.

