DB-20_Bascom D+L

download DB-20_Bascom D+L

of 40

Transcript of DB-20_Bascom D+L

  • 8/3/2019 DB-20_Bascom D+L

    1/40

    DB-20: Highly ParallelDump & Load

    Tom BascomVP TechnologyWhite Star [email protected]

  • 8/3/2019 DB-20_Bascom D+L

    2/40

    DB-20: Highly Parallel Dump and Load - 2

    Agenda

    Why Dump and Load?

    What Methods are Available?

    Using a Highly Parallel Approach

    Demo!

  • 8/3/2019 DB-20_Bascom D+L

    3/40

    DB-20: Highly Parallel Dump and Load - 3

    Why Dump and Load?

    Because it is X months since the last time.

    Because the vendor said so.

    Because we must in order to upgrade

    Progress.

    Because it will improve performance.

  • 8/3/2019 DB-20_Bascom D+L

    4/40

    DB-20: Highly Parallel Dump and Load - 4

    Why Dump and Load?

    Because it is X months since the last time.

    Because the vendor said so.

    Because we must in order to upgrade

    Progress.

    Because it will improve performance.

  • 8/3/2019 DB-20_Bascom D+L

    5/40

    DB-20: Highly Parallel Dump and Load - 5

    Good Reasons to D&L

    Improve sequential data access (reports).

    Consolidate fragmented records.

    To recover corrupted data.

    In order to move to a new platform.

    In order to reconfigure a storage management

    decision.

    In order to take advantage of hot new dbfeatures!

  • 8/3/2019 DB-20_Bascom D+L

    6/40

    DB-20: Highly Parallel Dump and Load - 6

    Hot Features That Need D&L

    Variable Block Sizes

    Storage Areas

    Variable Rows Per Block

    Data Clusters (Type II Areas)

  • 8/3/2019 DB-20_Bascom D+L

    7/40

    DB-20: Highly Parallel Dump and Load - 7

    Some Dump & Load Choices

    Classic Dictionary Dump & Load

    Bulk Loader

    Binary Dump & Load

    Vendor Supplied Utilities

    Parallelization

    Many Benefits Very difficult to effectively balance manually.

    Automating the process

  • 8/3/2019 DB-20_Bascom D+L

    8/40

    DB-20: Highly Parallel Dump and Load - 8

    Classic Dictionary D&L

    Order-lineCust order hist

    cust Order-line order

    1st the dump

    and then the load.

  • 8/3/2019 DB-20_Bascom D+L

    9/40

    DB-20: Highly Parallel Dump and Load - 9

    Classic Dictionary D&L

    This can take DAYS (or evenWEEKS!)

  • 8/3/2019 DB-20_Bascom D+L

    10/40

    DB-20: Highly Parallel Dump and Load - 10

    Multi-threaded D&L

    Order-line

    cust

    order hist

    cust

    Order-line order

    1st the dump

    and then the load.

  • 8/3/2019 DB-20_Bascom D+L

    11/40

    DB-20: Highly Parallel Dump and Load - 11

    Multi-threaded D&L

    Much more reasonable than classic.

    Generally can be done in a weekend.

    But still too long for many situations: Threads arent easily balanced.

    Longest thread determines total time.

    Timing varies it is hard to predict.

  • 8/3/2019 DB-20_Bascom D+L

    12/40

    DB-20: Highly Parallel Dump and Load - 12

    Overlapping, Parallel, Balanced D&L

    Order-line

    cust

    order hist

    cust

    Order-line

    order

    Both the dump

    and the load inparallel, multi-threaded

    and load-balanced.

  • 8/3/2019 DB-20_Bascom D+L

    13/40

    DB-20: Highly Parallel Dump and Load - 13

    Parallel Balanced D&L

    Usually a big improvement.

    Often can be done in hours.

    But difficult to configure and manage: You need an intelligent and dynamic control

    program.

  • 8/3/2019 DB-20_Bascom D+L

    14/40

    DB-20: Highly Parallel Dump and Load - 14

    Highly Parallel D&L

  • 8/3/2019 DB-20_Bascom D+L

    15/40

    DB-20: Highly Parallel Dump and Load - 15

    Demo !!!

  • 8/3/2019 DB-20_Bascom D+L

    16/40

    DB-20: Highly Parallel Dump and Load - 16

    Resources

    SUSE Linux 9 Server

    2 x 2ghz (Xeon w/HT)

    512 MB RAM

    3 internal disks

    Various External Disks

    5GB Demo Database

    85 Tables

    250 Indexes

    Various table and record sizes

  • 8/3/2019 DB-20_Bascom D+L

    17/40

    DB-20: Highly Parallel Dump and Load - 17

    The Storage Area Design

    Separate Tables from Indexes.

    Use a single Type 1 Area for tiny

    miscellaneous tables that have low activity.

    Isolate Small, but very active, tables in Type 2Areas (or standalone areas if v9).

    Organize Large Tables by RPB in Type 2 Areas

    Any table over 1 million rows

    Any table over 1GB

    Isolate very large indexes

  • 8/3/2019 DB-20_Bascom D+L

    18/40

    DB-20: Highly Parallel Dump and Load - 18

    The Storage Area Design

    b exch06.b1 f 384000#

    d "Schema Area":6,64 .

    #

    d "Misc":10,256 exch06_10.d1 f 768

    #

    d "SmallTables":12,128;8 exch06_12.d1 f 9120

    #d "RPB256":14,256;512 exch06_14.d1 f 221280

    #

    d "RPB128":16,128;512 exch06_16.d1 f 768000

    #

    d "RPB64":18,64;512 exch06_18.d1 f 1536000

    d "RPB64":18,64;512 exch06_18.d2 f 1536000

    #d "RPB32":20,32;512 exch06_20.d1 f 2048000

    d "RPB32":20,32;512 exch06_20.d2 f 2048000

    #

    d "RPB16":22,16;512 exch06_22.d1 f 768000

    #

    d "Indexes":99,8 exch06_99.d1 f 1024000

  • 8/3/2019 DB-20_Bascom D+L

    19/40

    DB-20: Highly Parallel Dump and Load - 19

    Highly Parallel Dump & Load

    90 threads, 16 concurrent.

    Dump threads are the same as load threads.

    Easy monitoring and control.

    No latency the load finishes when the dumpfinishes.

    No intermediate disk IO:

    More efficient usage of disk resources. Higher overall throughput.

    No index rebuild.

    No need for table analysis.

  • 8/3/2019 DB-20_Bascom D+L

    20/40

    DB-20: Highly Parallel Dump and Load - 20

    The 80/20 Rule

    98% of the tables are done very quickly

    2% of the tables take most of the time

    and nearly all of the space

    These are the tables that we target with

    multiple threads!

  • 8/3/2019 DB-20_Bascom D+L

    21/40

    DB-20: Highly Parallel Dump and Load - 21

    Multiple Threads

    Table t Criteria

    document 1 where src.document.app# < 1100000document 2 where src.document.app# >= 1100000 and src.document.app# < 1200000document 3 where src.document.app# >= 1200000 and src.document.app# < 1300000document 4 where src.document.app# >= 1300000 and src.document.app# < 1400000

    document 5 where src.document.app# >= 1400000 and src.document.app# < 1500000document 6 where src.document.app# >= 1500000 and src.document.app# < 1600000document 7 where src.document.app# >= 1600000

    LOGS 1 where type = "P" and ( logdate = 01/01/2004 and logdate = 07/01/2004 and logdate = 01/01/2005 and logdate = 07/01/2005 and logdate = 01/01/2006 )LOGS 14 where type = "L" and ( logdate 08/01/2002 )LOGS 16 where type = "M"LOGS 17 where type = "U"

  • 8/3/2019 DB-20_Bascom D+L

    22/40

    DB-20: Highly Parallel Dump and Load - 22

    Dump Factors

    Major factors that contribute to dump time:

    Data distribution (scatter factor)

    Index Selection

    -B, -Bp, -RO

    Storage Area Configuration (dedicated areas

    and type 2 areas can be very, very fast)

    Disk Configuration and Throughput

    proutil dbname C tabanalys (when doing a

    binary d&l)

  • 8/3/2019 DB-20_Bascom D+L

    23/40

    DB-20: Highly Parallel Dump and Load - 23

    Concurrency -- Dumping

    Dump Time

    0

    300

    600

    900

    1,200

    1,500

    1,800

    510

    15

    20

    30

    40

    50

    90

    Threads

    Time(seconds)

  • 8/3/2019 DB-20_Bascom D+L

    24/40

    DB-20: Highly Parallel Dump and Load - 24

    Take a Load Off

    Factors that contribute to load time:

    Progress version

    Build indexes

    -i, -bibufs, -bigrow, -B, -TB, -TM, -SG

    Storage Area Configuration

    Disk Configuration and Throughput

    -T files

    -noautoreslist (aka forward-only)

    The importance of testing

  • 8/3/2019 DB-20_Bascom D+L

    25/40

    DB-20: Highly Parallel Dump and Load - 25

    Concurrency -- Loading

    Dump & Load Time

    0300600900

    1,2001,5001,8002,1002,4002,7003,0003,3003,600

    510

    15

    20

    30

    40

    50

    90

    Threads

    Time(seconds

    )

  • 8/3/2019 DB-20_Bascom D+L

    26/40

    DB-20: Highly Parallel Dump and Load - 26

    Buffer-Copy & Raw-Transfer

    Very Fast

    Eliminates The Middle Man (temp file IOoperations)

    Provides fine-grained, ABL level of control Allows on the fly data manipulation

    Useful when merging databases

    Can use with remote connections to bridge

    version numbers.

    In OE10.1 performance is essentially equal

    Cannot use RAW-TRANSFER with -RO

  • 8/3/2019 DB-20_Bascom D+L

    27/40

    DB-20: Highly Parallel Dump and Load - 27

    The Code

    start.p

    dlctl.p

    dlclear.p

    dlwrap.p

    dlx.p

  • 8/3/2019 DB-20_Bascom D+L

    28/40

    DB-20: Highly Parallel Dump and Load - 28

    start.p

    /* start.p

    *

    */

    run ./dotp/dlclear.p.

    pause 1 no-message.

    run ./dotp/dlctl.p.

    pause 1 no-message.

    run ./dotp/dlmon.p.

    return.

  • 8/3/2019 DB-20_Bascom D+L

    29/40

    DB-20: Highly Parallel Dump and Load - 29

    dlclear.p

    /* dlclear.p**/

    define new global shared variabledlstart as character no-undo format "x(20)".

    dlstart = "".

    for each dlctl exclusive-lock:assigndlctl.dumped = 0dlctl.loaded = 0

    dlctl.err_count = 0dlctl.note = ""dlctl.xstatus = ""

    .end.

    return.

  • 8/3/2019 DB-20_Bascom D+L

    30/40

    DB-20: Highly Parallel Dump and Load - 30

    dlctl.p

    /* dlctl.p**/

    for each dlctl no-lock where dlctl.active = yesby dlctl.expected descending:

    os-command silentvalue("mbpro -pf dl.pf -p ./dotp/dlwrap.p -param " + '"' +string( dlctl.thread ) + "|" +dlctl.tblname + "|" +dlctl.criteria +"|" +

    dlctl.pname + '"' +" >> /tmp/dl/error.log 2>&1 ~&").

    end.

    return.

  • 8/3/2019 DB-20_Bascom D+L

    31/40

    DB-20: Highly Parallel Dump and Load - 31

    dlwrap.p

    /* dlwrap.p**/

    define variable thr as character no-undo.define variable tbl as character no-undo.define variable cri as character no-undo.

    define variable pnm as character no-undo.

    thr = entry( 1, session:parameter, "|" ).tbl = entry( 2, session:parameter, "|" ).cri = entry( 3, session:parameter, "|" ).

    pnm = entry( 4, session:parameter, "|" ).

    if pnm = "" then pnm = "dlx".

    pnm = "./dotp/" + pnm + ".p".

    run value( pnm ) value( thr ) value( tbl ) value( cri ).

    return.

  • 8/3/2019 DB-20_Bascom D+L

    32/40

    DB-20: Highly Parallel Dump and Load - 32

    dlx.p - part 1

    /* {1} = thread number* {2} = table name* {3} = WHERE clause */

    define variable flgname as char no-undo initial /tmp/dl/{2}.{1}.flg".define variable logname as char no-undo initial "/tmp/dl/{2}.{1}.log".

    define stream unloaded.output stream unloaded to value( "/tmp/dl/{2}.{1}.d" ).

    output to value( logname ) unbuffered.put today " " string( time, "hh:mm:ss" ) " {2} {1}" skip.

    disable triggers for dump of src.{2}.disable triggers for load of dst.{2}.

    define query q for src.{2}.open query q for each src.{2} no-lock {3}.

  • 8/3/2019 DB-20_Bascom D+L

    33/40

    DB-20: Highly Parallel Dump and Load - 33

    dlx.p - part 2

    outer_loop: do for dlctl, dst.{2} while true transaction:inner_loop: do while true:

    get next q no-lock.if not available src.{2} then leave outer_loop. /* end of table */create dst.{2}.

    buffer-copy src.{2} to dst.{2} no-error.

    d = d + 1.if error-status:num-messages = 0 thenl = l + 1.

    elsedo:delete dst.{2}.e = e + 1.export stream unloaded src.{2}.find dl.dlctl exclusive-lock

    where dlctl.tblname = "{2}" and dlctl.thread = {1}.dlctl.err_count = e.next inner_loop.

    end.

  • 8/3/2019 DB-20_Bascom D+L

    34/40

    DB-20: Highly Parallel Dump and Load - 34

    dlx.p - part 3

    if d modulo 100 = 0 thendo:find dl.dlctl exclusive-lock

    where dlctl.tblname = "{2}" and dlctl.thread = {1}.assigndlctl.dumped = ddlctl.loaded = l

    .file-info:file-name = flgname.if file-info:full-pathname = ? thendo:dlctl.note = "stopped".leave inner_loop.

    end.leave inner_loop.

    end.end. /* inner_loop */

    end. /* outer_loop */

  • 8/3/2019 DB-20_Bascom D+L

    35/40

    DB-20: Highly Parallel Dump and Load - 35

    dlx.p - part 4

    do for dl.dlctl transaction:

    find dl.dlctl exclusive-lockwhere dlctl.tblname = "{2}" and dlctl.thread = {1}.

    assigndlctl.dumped = ddlctl.loaded = l

    dlctl.err_count = edlctl.xstatus = "done"dlctl.note =( if dlctl.note "stopped" then "complete" else "stopped" )

    .if dlctl.note "stopped" thendo:

    if d < dlctl.expected then dlctl.note = "short".if d > dlctl.expected then dlctl.note = "long".

    end.

    if e > 0 then dlctl.note = dlctl.note + "+errors".

    end.

  • 8/3/2019 DB-20_Bascom D+L

    36/40

    DB-20: Highly Parallel Dump and Load - 36

    Validation

    Belt & Braces!!!

    Compare Record Counts

    dlcompare.p (for binary d&l)

    Check Logs for Known & Unknown Errors grep i f error.list

    fail, error, (1124)

    Check for Success

    grep 0 errors /tmp/dl/*.log | wc l grep dumped /tmp/dl/*.log

    grep loaded /tmp/dl/*.log

    ls l /tmp/dl/*.d

  • 8/3/2019 DB-20_Bascom D+L

    37/40

    DB-20: Highly Parallel Dump and Load - 37

    Post Load Monitoring

    Watch for very rapid growth:

    May be a rows-per-block assignment error (i.e.

    data areas with RPB = 1).

    May be a cluster-size error (i.e. index areas with512 blocks per cluster).

    May be an unannounced application change.

    It is not unusual for there to be somewhatelevated IO for a short period of time (perhaps

    a week or two).

  • 8/3/2019 DB-20_Bascom D+L

    38/40

    DB-20: Highly Parallel Dump and Load - 38

    Tip Sheet!

    Dont rely on announcements to keep usersout of your system!

    Shut down unnecessary services.

    Dont forget cron jobs!

    Disable remote access.

    Rename the database, the directory, theserver

    Take the server off the network.

    Always perform NON-DESTRUCTIVE Dump& Load!

  • 8/3/2019 DB-20_Bascom D+L

    39/40

    DB-20: Highly Parallel Dump and Load - 39

    Recap

    Excellent Reasons to Dump & Load:

    Move to a new platform!

    Reconfigure a storage configuration!!

    Take advantage of a hot new db features!!!

    Especially Type 2 Areas!!!!

    How To Get There Safely

    and Quickly!

  • 8/3/2019 DB-20_Bascom D+L

    40/40

    DB-20: Highly Parallel Dump and Load - 40

    Questions

    http://www.greenfieldtech.com/downloads.shtml