Introduction to Teradata And How Teradata Works
Author: bigclasses-com
Introduction to Teradata
How Teradata Works
How Does Teradata Store Rows?
Teradata uses a hashing algorithm to distribute data randomly and evenly across all AMPs.
- The rows of every table are distributed among all AMPs, ideally evenly.
- Each AMP is responsible for a subset of the rows of each table.
- Evenly distributed tables result in evenly distributed workloads.
- The data is not placed in any particular order.

The benefits of unordered data include:
- No maintenance is needed to preserve order.
- It is independent of any query being submitted.

The benefits of automatic data placement include:
- Distribution is the same regardless of data volume.
- Distribution is based on row content, not data demographics.
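The idea of hash-based placement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 stands in for Teradata's own hashing algorithm (which is not described in these slides), the system size of 4 AMPs is hypothetical, and `amp_for` and `rows_per_amp` are made-up helper names.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen for illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Map a value to an AMP number using a stand-in hash (MD5, not Teradata's)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # 32-bit stand-in for a row hash
    return row_hash % num_amps


def rows_per_amp(values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP."""
    counts = Counter(amp_for(v, num_amps) for v in values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    # The same rule is applied whatever the data volume.
    for volume in (1_000, 100_000):
        print(volume, rows_per_amp(range(volume)))
```

Because the AMP is derived only from the row's own content, the placement rule does not change as the table grows, and no ordering has to be maintained.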
Primary Indexes
The Primary Index is the mechanism used to assign a row to an AMP.
- A table must have a Primary Index.
- The Primary Index cannot be changed.

UPI
- If the column(s) chosen for the index are unique, we call this a UPI (Unique Primary Index).
- A UPI choice will result in even distribution of the rows of the table across all AMPs.

NUPI
- If the column(s) chosen for the index are not unique, we call this a NUPI (Non-Unique Primary Index).
- A NUPI choice will result in even distribution of the rows of the table proportional to the degree of uniqueness of the index.

UPIs guarantee even data distribution and eliminate duplicate row checking.

Why would you choose an index that is different from the Primary Key?
- Join performance
- Known access paths
Data Storage Based on Primary Index
The value of the Primary Index for a specific row determines its AMP assignment. This is done using the hashing algorithm.

[Figure labels: PI value, PE, hashing algorithm, AMPs; the same path is used for row assignment and row access.]

Accessing the row by its Primary Index value is:
- Always a one-AMP operation
- The most efficient way to access a row
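A toy model can show why access by Primary Index value is a one-AMP operation: the hash that placed the row is the same hash that finds it. The sketch below is an assumption-laden illustration, not Teradata's implementation. MD5 stands in for the real hashing algorithm, and `TinySystem`, `amp_for` and the 4-AMP size are hypothetical.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Stand-in for the hashing algorithm: PI value -> AMP number."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


class TinySystem:
    """Each AMP is modelled as a dict from PI value to the rows it owns."""

    def __init__(self, num_amps=NUM_AMPS):
        self.amps = [dict() for _ in range(num_amps)]
        self.amps_touched = 0  # AMPs used by the most recent request

    def insert(self, pi_value, row):
        # Row assignment: hash the PI value to pick the owning AMP.
        amp = self.amps[amp_for(pi_value, len(self.amps))]
        amp.setdefault(pi_value, []).append(row)

    def select_by_pi(self, pi_value):
        # Row access: the same hash leads straight to the one owning AMP.
        self.amps_touched = 1
        amp = self.amps[amp_for(pi_value, len(self.amps))]
        return amp.get(pi_value, [])

    def full_scan(self, predicate):
        # Without a usable index, every AMP has to look through its rows.
        self.amps_touched = len(self.amps)
        return [row for amp in self.amps
                for rows in amp.values()
                for row in rows if predicate(row)]


if __name__ == "__main__":
    system = TinySystem()
    for order_number in (7202, 7402, 7325):
        system.insert(order_number, {"order_number": order_number})
    print(system.select_by_pi(7402), "AMPs touched:", system.amps_touched)
```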
Row Distribution Using a UPI
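The PK column(s) will often be used as a UPI.
- PI values for Order_Number are known to be unique (it's a PK).
- Teradata will distribute different index values evenly across AMPs.
- Resulting row distribution among AMPs is uniform.

[Figure: rows of the Order table spread uniformly over AMP 1 to AMP 4. The per-AMP placement is not recoverable from the transcript, and the names of the last two columns are not given. Rows shown:]

Order_Number  Customer_Number  Date  Status
7202          2                4/09  C
7402          3                4/16  C
7325          2                4/13  C
7225          2                4/15  C
7188          1                4/13  C
7384          1                4/12  C
7324          3                4/13  C
7103          1                4/10  C
7415          1                4/13  C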
Row Distribution Using a NUPI
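Customer_Number may be the preferred access column for the ORDER table, thus a good index candidate.
- Values for Customer_Number are non-unique, so the index is a NUPI.
- Rows with the same PI value distribute to the same AMP, causing row distribution to be less uniform, or skewed.

[Figure: rows of the Order table placed on AMP 1 to AMP 4 by Customer_Number. The per-AMP placement is not recoverable from the transcript, and the names of the last two columns are not given. Rows shown, as transcribed:]

Order_Number  Customer_Number  Date  Status
7225          2                4/15  C
7325          2                4/13  0
7415          1                4/13  C
7384          1                4/12  C
7324          3                4/13  0
7402          3                4/16  C
7103          1                4/10  C
7202          2                4/09  C
7188          1                4/13  C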
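The difference between the two index choices can be checked with a small simulation over the nine Order rows from the slides. This is a sketch under assumptions: the split of each row into Order_Number and Customer_Number is read off the flattened figure, MD5 stands in for Teradata's hashing algorithm, and `amp_for` and `rows_per_amp` are hypothetical helper names.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # the slides show AMP 1 to AMP 4

# (Order_Number, Customer_Number) pairs as read from the slide figure.
ORDERS = [
    (7202, 2), (7402, 3), (7325, 2), (7225, 2), (7188, 1),
    (7384, 1), (7324, 3), (7103, 1), (7415, 1),
]


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Stand-in hash (MD5, not Teradata's): PI value -> AMP number."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(rows, pi_column, num_amps=NUM_AMPS):
    """Count rows per AMP when the column at index pi_column is the PI."""
    counts = Counter(amp_for(row[pi_column], num_amps) for row in rows)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    print("UPI  on Order_Number:   ", rows_per_amp(ORDERS, 0))
    print("NUPI on Customer_Number:", rows_per_amp(ORDERS, 1))
```

With only three distinct Customer_Number values, the NUPI can occupy at most three of the four AMPs, and the customer with four orders puts all four rows on one AMP. With so few rows the UPI counts in this toy will not necessarily be perfectly even; the evenness claimed for a UPI shows up at realistic volumes.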
Secondary Indexes
Three general ways to access a table:
- Primary index access (one-AMP access)
- Secondary index access (two- or all-AMP access)
- Full table scan (all-AMP access)

A secondary index is an alternate path to the rows of a table. A table can have from 0 to 32 secondary indexes.

Secondary indexes:
- Do not affect table distribution.
- Add overhead, both in terms of disk space and maintenance.
- May be added or dropped dynamically as needed.
- Are chosen to improve table performance.
Unique Secondary Index (USI) Access

Create USI:
    CREATE UNIQUE INDEX (cust) ON customer;

Access via USI:
    SELECT *
    FROM customer
    WHERE cust = 56;

[Figure: USI access path for the customer table (table ID 100) and USI value 56. The PE runs the USI value through the hashing algorithm (table ID 100, row hash 602, USI value 56) and the request travels over the BYNET to AMP 2, whose USI subtable maps the Cust value to the RowID of the base row (table ID 100, row hash 778, uniqueness value 7). A second BYNET message reaches the AMP that holds that base table row. Each of AMP 1 to AMP 4 holds a USI subtable (columns RowID, Cust, RowID) and a slice of the base table (columns RowID, Cust, Name, Phone); the individual cell values are not reliably recoverable from the transcript.]
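The two-AMP nature of USI access can be mimicked with a toy model: one AMP is found by hashing the USI value and holds the subtable entry, and a second AMP is found from the row ID stored in that entry. Everything here is a hedged sketch. MD5 replaces Teradata's hashing algorithm, the sample rows are invented for illustration (the slide's cell values could not be recovered), and `Amp`, `build` and `select_by_usi` are hypothetical names.

```python
import hashlib

NUM_AMPS = 4  # the slide shows AMP 1 to AMP 4


def row_hash(value):
    """Stand-in row hash (MD5, not Teradata's)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Amp:
    def __init__(self):
        self.base = {}  # row ID (row hash, uniqueness value) -> base row
        self.usi = {}   # USI value -> row ID of the base row


def build(rows, pi_col, usi_col, num_amps=NUM_AMPS):
    """Place base rows by PI hash and USI subtable rows by USI-value hash."""
    amps = [Amp() for _ in range(num_amps)]
    uniqueness = {}
    for row in rows:
        h = row_hash(row[pi_col])
        uniqueness[h] = uniqueness.get(h, 0) + 1
        row_id = (h, uniqueness[h])
        amps[h % num_amps].base[row_id] = row
        amps[row_hash(row[usi_col]) % num_amps].usi[row[usi_col]] = row_id
    return amps


def select_by_usi(amps, value):
    """Return (row, AMPs visited); at most two AMP visits are needed."""
    subtable_amp = row_hash(value) % len(amps)
    visited = [subtable_amp]
    row_id = amps[subtable_amp].usi.get(value)
    if row_id is None:
        return None, visited
    base_amp = row_id[0] % len(amps)  # the row ID's hash locates the base row
    visited.append(base_amp)
    return amps[base_amp].base[row_id], visited


if __name__ == "__main__":
    # Invented sample rows; the PI is on "name" (non-unique), the USI on "cust".
    customers = [
        {"cust": 56, "name": "Smith", "phone": "555-0101"},
        {"cust": 37, "name": "Adams", "phone": "555-0102"},
        {"cust": 72, "name": "Adams", "phone": "555-0103"},
    ]
    amps = build(customers, "name", "cust")
    print(select_by_usi(amps, 56))
```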
Non-Unique Secondary Index (NUSI) Access
Create NUSI:
    CREATE INDEX (name) ON customer;

Access via NUSI:
    SELECT *
    FROM customer
    WHERE name = 'Adams';

[Figure: NUSI access path for the customer table (table ID 100) and NUSI value Adams. The PE runs the NUSI value through the hashing algorithm (table ID 100, row hash 567, NUSI value Adams) and the request goes over the BYNET to the AMPs. Each of AMP 1 to AMP 4 holds its own NUSI subtable (columns RowID, Name, RowID) next to its slice of the base table (columns RowID, Cust, Name, Phone), and the subtables on the AMPs that hold Adams rows carry an entry with row hash 567. The individual cell values are not reliably recoverable from the transcript.]
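NUSI access differs from USI access in that each AMP indexes only its own base rows, so the request has to be sent to every AMP. The toy model below sketches that behaviour under assumptions: MD5 stands in for Teradata's hashing algorithm, the sample rows are invented for illustration, and `Amp`, `build` and `select_by_nusi` are hypothetical names.

```python
import hashlib

NUM_AMPS = 4  # the slide shows AMP 1 to AMP 4


def row_hash(value):
    """Stand-in row hash (MD5, not Teradata's)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Amp:
    def __init__(self):
        self.base = {}  # row ID (row hash, uniqueness value) -> base row
        self.nusi = {}  # NUSI value -> row IDs of base rows on this same AMP


def build(rows, pi_col, nusi_col, num_amps=NUM_AMPS):
    """Place base rows by PI hash; index each row on the AMP that owns it."""
    amps = [Amp() for _ in range(num_amps)]
    uniqueness = {}
    for row in rows:
        h = row_hash(row[pi_col])
        uniqueness[h] = uniqueness.get(h, 0) + 1
        row_id = (h, uniqueness[h])
        amp = amps[h % num_amps]
        amp.base[row_id] = row
        amp.nusi.setdefault(row[nusi_col], []).append(row_id)
    return amps


def select_by_nusi(amps, value):
    """Return (matching rows, number of AMPs asked); every AMP is asked."""
    matches = []
    amps_asked = 0
    for amp in amps:
        amps_asked += 1
        # Each AMP consults only its local subtable, then its local base rows.
        for row_id in amp.nusi.get(value, []):
            matches.append(amp.base[row_id])
    return matches, amps_asked


if __name__ == "__main__":
    # Invented sample rows; the PI is on "cust", the NUSI on "name".
    customers = [
        {"cust": 31, "name": "Adams"},
        {"cust": 37, "name": "Smith"},
        {"cust": 72, "name": "Adams"},
        {"cust": 45, "name": "Brown"},
        {"cust": 98, "name": "Adams"},
    ]
    amps = build(customers, "cust", "name")
    rows, asked = select_by_nusi(amps, "Adams")
    print(sorted(r["cust"] for r in rows), "AMPs asked:", asked)
```

Keeping each subtable entry on the same AMP as the base rows it points to means no second hop is needed, at the price of involving all AMPs in every NUSI lookup.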
Comparison of Primary and Secondary Indexes
Primary Keys and Primary Indexes
Primary Key                                   Primary Index
--------------------------------------------  --------------------------------------------
Logical concept of data modeling              Physical mechanism for access and storage
Teradata doesn't need to recognize            Each table must have exactly one
No limit on column numbers                    16-column limit
Documented in data model                      Defined in CREATE TABLE statement
  (optional in CREATE TABLE)
Must be unique                                May be unique or non-unique
Uniquely identifies each row                  Used to place and locate each row on an AMP
Values should not change                      Values may be changed (Del + Ins)
May not be NULL; requires a value             May be NULL
Does not imply an access path                 Defines most efficient access path
Chosen for logical correctness                Chosen for physical performance
Indexes are conceptually different from keys:
- A PK is a relational modeling convention which uniquely identifies each row.
- A PI is a Teradata convention which determines how the rows are stored and accessed.
- A significant percentage of tables may use the same columns for both the PK and PI.
- A well-designed database will use a PI that is different from the PK for some tables.
Learn Teradata Online
Contact: USA: +1 732 325 1626, India: +91 800 811 4040
http://bigclasses.com/teradata-online-training.html
Watch Teradata DEMO Video On YouTube: www.youtube.com/user/bigclassescom