DB-20_Bascom D+L
-
Upload
auperu-dauru -
Category
Documents
-
view
222 -
download
0
Transcript of DB-20_Bascom D+L
-
8/3/2019 DB-20_Bascom D+L
1/40
DB-20: Highly ParallelDump & Load
Tom BascomVP TechnologyWhite Star [email protected]
-
8/3/2019 DB-20_Bascom D+L
2/40
DB-20: Highly Parallel Dump and Load - 2
Agenda
Why Dump and Load?
What Methods are Available?
Using a Highly Parallel Approach
Demo!
-
8/3/2019 DB-20_Bascom D+L
3/40
DB-20: Highly Parallel Dump and Load - 3
Why Dump and Load?
Because it is X months since the last time.
Because the vendor said so.
Because we must in order to upgrade
Progress.
Because it will improve performance.
-
8/3/2019 DB-20_Bascom D+L
4/40
DB-20: Highly Parallel Dump and Load - 4
Why Dump and Load?
Because it is X months since the last time.
Because the vendor said so.
Because we must in order to upgrade
Progress.
Because it will improve performance.
-
8/3/2019 DB-20_Bascom D+L
5/40
DB-20: Highly Parallel Dump and Load - 5
Good Reasons to D&L
Improve sequential data access (reports).
Consolidate fragmented records.
To recover corrupted data.
In order to move to a new platform.
In order to reconfigure a storage management
decision.
In order to take advantage of hot new dbfeatures!
-
8/3/2019 DB-20_Bascom D+L
6/40
DB-20: Highly Parallel Dump and Load - 6
Hot Features That Need D&L
Variable Block Sizes
Storage Areas
Variable Rows Per Block
Data Clusters (Type II Areas)
-
8/3/2019 DB-20_Bascom D+L
7/40
DB-20: Highly Parallel Dump and Load - 7
Some Dump & Load Choices
Classic Dictionary Dump & Load
Bulk Loader
Binary Dump & Load
Vendor Supplied Utilities
Parallelization
Many Benefits Very difficult to effectively balance manually.
Automating the process
-
8/3/2019 DB-20_Bascom D+L
8/40
DB-20: Highly Parallel Dump and Load - 8
Classic Dictionary D&L
Order-lineCust order hist
cust Order-line order
1st the dump
and then the load.
-
8/3/2019 DB-20_Bascom D+L
9/40
DB-20: Highly Parallel Dump and Load - 9
Classic Dictionary D&L
This can take DAYS (or evenWEEKS!)
-
8/3/2019 DB-20_Bascom D+L
10/40
DB-20: Highly Parallel Dump and Load - 10
Multi-threaded D&L
Order-line
cust
order hist
cust
Order-line order
1st the dump
and then the load.
-
8/3/2019 DB-20_Bascom D+L
11/40
DB-20: Highly Parallel Dump and Load - 11
Multi-threaded D&L
Much more reasonable than classic.
Generally can be done in a weekend.
But still too long for many situations: Threads arent easily balanced.
Longest thread determines total time.
Timing varies it is hard to predict.
-
8/3/2019 DB-20_Bascom D+L
12/40
DB-20: Highly Parallel Dump and Load - 12
Overlapping, Parallel, Balanced D&L
Order-line
cust
order hist
cust
Order-line
order
Both the dump
and the load inparallel, multi-threaded
and load-balanced.
-
8/3/2019 DB-20_Bascom D+L
13/40
DB-20: Highly Parallel Dump and Load - 13
Parallel Balanced D&L
Usually a big improvement.
Often can be done in hours.
But difficult to configure and manage: You need an intelligent and dynamic control
program.
-
8/3/2019 DB-20_Bascom D+L
14/40
DB-20: Highly Parallel Dump and Load - 14
Highly Parallel D&L
-
8/3/2019 DB-20_Bascom D+L
15/40
DB-20: Highly Parallel Dump and Load - 15
Demo !!!
-
8/3/2019 DB-20_Bascom D+L
16/40
DB-20: Highly Parallel Dump and Load - 16
Resources
SUSE Linux 9 Server
2 x 2ghz (Xeon w/HT)
512 MB RAM
3 internal disks
Various External Disks
5GB Demo Database
85 Tables
250 Indexes
Various table and record sizes
-
8/3/2019 DB-20_Bascom D+L
17/40
DB-20: Highly Parallel Dump and Load - 17
The Storage Area Design
Separate Tables from Indexes.
Use a single Type 1 Area for tiny
miscellaneous tables that have low activity.
Isolate Small, but very active, tables in Type 2Areas (or standalone areas if v9).
Organize Large Tables by RPB in Type 2 Areas
Any table over 1 million rows
Any table over 1GB
Isolate very large indexes
-
8/3/2019 DB-20_Bascom D+L
18/40
DB-20: Highly Parallel Dump and Load - 18
The Storage Area Design
b exch06.b1 f 384000#
d "Schema Area":6,64 .
#
d "Misc":10,256 exch06_10.d1 f 768
#
d "SmallTables":12,128;8 exch06_12.d1 f 9120
#d "RPB256":14,256;512 exch06_14.d1 f 221280
#
d "RPB128":16,128;512 exch06_16.d1 f 768000
#
d "RPB64":18,64;512 exch06_18.d1 f 1536000
d "RPB64":18,64;512 exch06_18.d2 f 1536000
#d "RPB32":20,32;512 exch06_20.d1 f 2048000
d "RPB32":20,32;512 exch06_20.d2 f 2048000
#
d "RPB16":22,16;512 exch06_22.d1 f 768000
#
d "Indexes":99,8 exch06_99.d1 f 1024000
-
8/3/2019 DB-20_Bascom D+L
19/40
DB-20: Highly Parallel Dump and Load - 19
Highly Parallel Dump & Load
90 threads, 16 concurrent.
Dump threads are the same as load threads.
Easy monitoring and control.
No latency the load finishes when the dumpfinishes.
No intermediate disk IO:
More efficient usage of disk resources. Higher overall throughput.
No index rebuild.
No need for table analysis.
-
8/3/2019 DB-20_Bascom D+L
20/40
DB-20: Highly Parallel Dump and Load - 20
The 80/20 Rule
98% of the tables are done very quickly
2% of the tables take most of the time
and nearly all of the space
These are the tables that we target with
multiple threads!
-
8/3/2019 DB-20_Bascom D+L
21/40
DB-20: Highly Parallel Dump and Load - 21
Multiple Threads
Table t Criteria
document 1 where src.document.app# < 1100000document 2 where src.document.app# >= 1100000 and src.document.app# < 1200000document 3 where src.document.app# >= 1200000 and src.document.app# < 1300000document 4 where src.document.app# >= 1300000 and src.document.app# < 1400000
document 5 where src.document.app# >= 1400000 and src.document.app# < 1500000document 6 where src.document.app# >= 1500000 and src.document.app# < 1600000document 7 where src.document.app# >= 1600000
LOGS 1 where type = "P" and ( logdate = 01/01/2004 and logdate = 07/01/2004 and logdate = 01/01/2005 and logdate = 07/01/2005 and logdate = 01/01/2006 )LOGS 14 where type = "L" and ( logdate 08/01/2002 )LOGS 16 where type = "M"LOGS 17 where type = "U"
-
8/3/2019 DB-20_Bascom D+L
22/40
DB-20: Highly Parallel Dump and Load - 22
Dump Factors
Major factors that contribute to dump time:
Data distribution (scatter factor)
Index Selection
-B, -Bp, -RO
Storage Area Configuration (dedicated areas
and type 2 areas can be very, very fast)
Disk Configuration and Throughput
proutil dbname C tabanalys (when doing a
binary d&l)
-
8/3/2019 DB-20_Bascom D+L
23/40
DB-20: Highly Parallel Dump and Load - 23
Concurrency -- Dumping
Dump Time
0
300
600
900
1,200
1,500
1,800
510
15
20
30
40
50
90
Threads
Time(seconds)
-
8/3/2019 DB-20_Bascom D+L
24/40
DB-20: Highly Parallel Dump and Load - 24
Take a Load Off
Factors that contribute to load time:
Progress version
Build indexes
-i, -bibufs, -bigrow, -B, -TB, -TM, -SG
Storage Area Configuration
Disk Configuration and Throughput
-T files
-noautoreslist (aka forward-only)
The importance of testing
-
8/3/2019 DB-20_Bascom D+L
25/40
DB-20: Highly Parallel Dump and Load - 25
Concurrency -- Loading
Dump & Load Time
0300600900
1,2001,5001,8002,1002,4002,7003,0003,3003,600
510
15
20
30
40
50
90
Threads
Time(seconds
)
-
8/3/2019 DB-20_Bascom D+L
26/40
DB-20: Highly Parallel Dump and Load - 26
Buffer-Copy & Raw-Transfer
Very Fast
Eliminates The Middle Man (temp file IOoperations)
Provides fine-grained, ABL level of control Allows on the fly data manipulation
Useful when merging databases
Can use with remote connections to bridge
version numbers.
In OE10.1 performance is essentially equal
Cannot use RAW-TRANSFER with -RO
-
8/3/2019 DB-20_Bascom D+L
27/40
DB-20: Highly Parallel Dump and Load - 27
The Code
start.p
dlctl.p
dlclear.p
dlwrap.p
dlx.p
-
8/3/2019 DB-20_Bascom D+L
28/40
DB-20: Highly Parallel Dump and Load - 28
start.p
/* start.p
*
*/
run ./dotp/dlclear.p.
pause 1 no-message.
run ./dotp/dlctl.p.
pause 1 no-message.
run ./dotp/dlmon.p.
return.
-
8/3/2019 DB-20_Bascom D+L
29/40
DB-20: Highly Parallel Dump and Load - 29
dlclear.p
/* dlclear.p**/
define new global shared variabledlstart as character no-undo format "x(20)".
dlstart = "".
for each dlctl exclusive-lock:assigndlctl.dumped = 0dlctl.loaded = 0
dlctl.err_count = 0dlctl.note = ""dlctl.xstatus = ""
.end.
return.
-
8/3/2019 DB-20_Bascom D+L
30/40
DB-20: Highly Parallel Dump and Load - 30
dlctl.p
/* dlctl.p**/
for each dlctl no-lock where dlctl.active = yesby dlctl.expected descending:
os-command silentvalue("mbpro -pf dl.pf -p ./dotp/dlwrap.p -param " + '"' +string( dlctl.thread ) + "|" +dlctl.tblname + "|" +dlctl.criteria +"|" +
dlctl.pname + '"' +" >> /tmp/dl/error.log 2>&1 ~&").
end.
return.
-
8/3/2019 DB-20_Bascom D+L
31/40
DB-20: Highly Parallel Dump and Load - 31
dlwrap.p
/* dlwrap.p**/
define variable thr as character no-undo.define variable tbl as character no-undo.define variable cri as character no-undo.
define variable pnm as character no-undo.
thr = entry( 1, session:parameter, "|" ).tbl = entry( 2, session:parameter, "|" ).cri = entry( 3, session:parameter, "|" ).
pnm = entry( 4, session:parameter, "|" ).
if pnm = "" then pnm = "dlx".
pnm = "./dotp/" + pnm + ".p".
run value( pnm ) value( thr ) value( tbl ) value( cri ).
return.
-
8/3/2019 DB-20_Bascom D+L
32/40
DB-20: Highly Parallel Dump and Load - 32
dlx.p - part 1
/* {1} = thread number* {2} = table name* {3} = WHERE clause */
define variable flgname as char no-undo initial /tmp/dl/{2}.{1}.flg".define variable logname as char no-undo initial "/tmp/dl/{2}.{1}.log".
define stream unloaded.output stream unloaded to value( "/tmp/dl/{2}.{1}.d" ).
output to value( logname ) unbuffered.put today " " string( time, "hh:mm:ss" ) " {2} {1}" skip.
disable triggers for dump of src.{2}.disable triggers for load of dst.{2}.
define query q for src.{2}.open query q for each src.{2} no-lock {3}.
-
8/3/2019 DB-20_Bascom D+L
33/40
DB-20: Highly Parallel Dump and Load - 33
dlx.p - part 2
outer_loop: do for dlctl, dst.{2} while true transaction:inner_loop: do while true:
get next q no-lock.if not available src.{2} then leave outer_loop. /* end of table */create dst.{2}.
buffer-copy src.{2} to dst.{2} no-error.
d = d + 1.if error-status:num-messages = 0 thenl = l + 1.
elsedo:delete dst.{2}.e = e + 1.export stream unloaded src.{2}.find dl.dlctl exclusive-lock
where dlctl.tblname = "{2}" and dlctl.thread = {1}.dlctl.err_count = e.next inner_loop.
end.
-
8/3/2019 DB-20_Bascom D+L
34/40
DB-20: Highly Parallel Dump and Load - 34
dlx.p - part 3
if d modulo 100 = 0 thendo:find dl.dlctl exclusive-lock
where dlctl.tblname = "{2}" and dlctl.thread = {1}.assigndlctl.dumped = ddlctl.loaded = l
.file-info:file-name = flgname.if file-info:full-pathname = ? thendo:dlctl.note = "stopped".leave inner_loop.
end.leave inner_loop.
end.end. /* inner_loop */
end. /* outer_loop */
-
8/3/2019 DB-20_Bascom D+L
35/40
DB-20: Highly Parallel Dump and Load - 35
dlx.p - part 4
do for dl.dlctl transaction:
find dl.dlctl exclusive-lockwhere dlctl.tblname = "{2}" and dlctl.thread = {1}.
assigndlctl.dumped = ddlctl.loaded = l
dlctl.err_count = edlctl.xstatus = "done"dlctl.note =( if dlctl.note "stopped" then "complete" else "stopped" )
.if dlctl.note "stopped" thendo:
if d < dlctl.expected then dlctl.note = "short".if d > dlctl.expected then dlctl.note = "long".
end.
if e > 0 then dlctl.note = dlctl.note + "+errors".
end.
-
8/3/2019 DB-20_Bascom D+L
36/40
DB-20: Highly Parallel Dump and Load - 36
Validation
Belt & Braces!!!
Compare Record Counts
dlcompare.p (for binary d&l)
Check Logs for Known & Unknown Errors grep i f error.list
fail, error, (1124)
Check for Success
grep 0 errors /tmp/dl/*.log | wc l grep dumped /tmp/dl/*.log
grep loaded /tmp/dl/*.log
ls l /tmp/dl/*.d
-
8/3/2019 DB-20_Bascom D+L
37/40
DB-20: Highly Parallel Dump and Load - 37
Post Load Monitoring
Watch for very rapid growth:
May be a rows-per-block assignment error (i.e.
data areas with RPB = 1).
May be a cluster-size error (i.e. index areas with512 blocks per cluster).
May be an unannounced application change.
It is not unusual for there to be somewhatelevated IO for a short period of time (perhaps
a week or two).
-
8/3/2019 DB-20_Bascom D+L
38/40
DB-20: Highly Parallel Dump and Load - 38
Tip Sheet!
Dont rely on announcements to keep usersout of your system!
Shut down unnecessary services.
Dont forget cron jobs!
Disable remote access.
Rename the database, the directory, theserver
Take the server off the network.
Always perform NON-DESTRUCTIVE Dump& Load!
-
8/3/2019 DB-20_Bascom D+L
39/40
DB-20: Highly Parallel Dump and Load - 39
Recap
Excellent Reasons to Dump & Load:
Move to a new platform!
Reconfigure a storage configuration!!
Take advantage of a hot new db features!!!
Especially Type 2 Areas!!!!
How To Get There Safely
and Quickly!
-
8/3/2019 DB-20_Bascom D+L
40/40
DB-20: Highly Parallel Dump and Load - 40
Questions
http://www.greenfieldtech.com/downloads.shtml