Experiences with D/R Procedures
Of ADABAS Data on Mainframes
Natural Conference Boston
Dieter W. Storr May 2004
May 2004 Dieter W. Storr -- [email protected]
Different Disaster, Different Action
Unplanned downtime
  Machine outages
  Network outages
  Software failures
Disaster
  Site / data center loss
  Catastrophic failure
Leading Causes of Downtime
Source: DRJ Summer 2002, Volume 15, Number 3
Power outage: 29%
Storm damage: 11%
Flood: 10%
Terrorism / sabotage: 8%
Other Causes of Downtime
Fire
Earthquake
Computer Crime
LA Times Downtime
Flood Damage, 21 April 2002
Water flooded the Orange County facility after a 14-inch pipe that supplies the fire-sprinkler system burst; half the facility stood in more than a foot of muddy water
Affected areas: editorial, ad ops, IT, HR
ADABAS was not affected
LA Times Downtime
Bomb Alarm, 14 June 2002
A bomb was believed to have been left in the Bank of America branch that’s set into the Times Building
Security swept the building
DBAs observed the system from home
LA Times Downtime
Bomb Alarm, 29 July 2002
An intruder claimed to have a bomb and darted into the garage
Security swept the building
OP stopped CA7, so PLOGCOPY couldn’t start automatically; two PLOGs filled up and ADABAS was locked; DBAs later started the PLCOPY jobs manually
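The lock-up described on this slide can be pictured with a toy model of a dual protection log. This is only a sketch under stated assumptions: the class `DualPlog`, its capacity in blocks, and its method names are hypothetical and do not reflect how the ADABAS nucleus is implemented. It shows why updates stop once both logs are full and no copy job has run.

```python
class DualPlog:
    """Toy model of two alternating protection logs (hypothetical, not ADABAS code)."""

    def __init__(self, capacity):
        self.capacity = capacity          # blocks per log (assumed value)
        self.used = [0, 0]                # blocks written to each log
        self.needs_copy = [False, False]  # True once a log is full and not yet copied
        self.active = 0                   # index of the log currently written
        self.locked = False               # True when no log can accept updates

    def write(self, blocks=1):
        """Write update blocks; return False if the database is locked."""
        if self.locked:
            return False
        if self.used[self.active] + blocks > self.capacity:
            # Active log is full: mark it for copying and try to switch.
            self.needs_copy[self.active] = True
            other = 1 - self.active
            if self.needs_copy[other]:
                # The other log was never copied, so nothing can be written.
                self.locked = True
                return False
            self.active = other
        self.used[self.active] += blocks
        return True

    def plcopy(self):
        """Stand-in for the copy job: empty every full log and release the lock."""
        for i in (0, 1):
            if self.needs_copy[i]:
                self.used[i] = 0
                self.needs_copy[i] = False
        self.locked = False


log = DualPlog(capacity=2)
results = [log.write() for _ in range(5)]  # no copy job runs in between
print(results, log.locked)
log.plcopy()                               # the DBA starts the copy manually
print(log.write())
```

In the model, the fifth write fails because both logs are full; running the copy step frees them again, which mirrors the manual start of the PLCOPY jobs.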
LA Times Downtime
Power Outage - 29 August 2002 (3:43 P.M.)
The city utility (DWP) had a power grid problem: flood water leaked into a DWP transformer
There were actually 2 spikes/outages: the first started the UPS switchover, which was interrupted by the second, which took the UPS down.
LA Times Downtime
Power Outage - cont’
The network was back in service after a short delay. Our Unix-based servers were restarted and checked.
There was no evidence of damage to the Sybase Adaptive Server Enterprise (ASE, formerly: Sybase SQL Server) servers.
LA Times Downtime
Power Outage - cont’
Mainframe recovery was delayed due to corruption of the Hardware Management Console (HMC)
OP did a power-on reset, which restored the HMC
Operations IPLed, and Technical Support proceeded with system checkout procedures.
Although the Enterprise Storage Server (ESS) had an error indicator, it was still up and did not add to any outages
IBM reset the error indicator without impact.
LA Times Downtime
Power Outage - cont’
Started ADABAS servers manually: Parm Error 23, DIB block remained after an abnormal termination
Started all servers with IGNDIB=YES
18:25 ADABAS IS ACTIVE
NO ADAN58 Message
LA Times Downtime
ADAN58 Message (ADA71: ADAN5A)
ADAN58 BUFFER-FLUSH START RECORD DETECTED
DURING AUTORESTART.
THE NUCLEUS WILL T E R M I N A T E AFTER AUTORESTART. IN CASE OF POWER FAILURE, THE DATABASE MIGHT BE INCONSISTENT BECAUSE OF PARTIALLY WRITTEN BLOCKS.
O N L Y IN THIS CASE, REPAIR THE DATABASE BY RESTORE AND REGENERATE; OTHERWISE RESTART THE NUCLEUS.
ADAN5A: FILES MODIFIED DURING AUTORESTART: files
Power Failure During Buffer Flush
old block:                       A B C D
updated block:                   E F C H
partially updated block on disk: E F C D
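The figure can be reproduced with a few lines of code. This is a minimal sketch, assuming a block is made of four independently written parts and that power is lost after some of them have reached the disk; the function names are hypothetical and the block contents are the ones shown on the slide.

```python
def flush_block(disk_block, new_block, parts_written):
    """Simulate a buffer flush interrupted after `parts_written` parts of the block."""
    # Parts already written carry the new content; the rest keep the old content.
    return new_block[:parts_written] + disk_block[parts_written:]


def is_partially_written(block, old_block, new_block):
    """A block is inconsistent if it matches neither the old nor the new image."""
    return block != old_block and block != new_block


old = ["A", "B", "C", "D"]
new = ["E", "F", "C", "H"]

partial = flush_block(old, new, parts_written=2)  # power fails halfway through
print(partial, is_partially_written(partial, old, new))
```

A block in this state matches neither the before-image nor the after-image, which is why the ADAN58 text recommends restore and regenerate after a power failure rather than relying on autorestart alone.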
Nucleus Restart After Power Failure - IGNDIB=YES
<snip>
ADA200 00230 User exit 2 active.
ADA201 00230 PLOG2 closed. ADAP3X2P submitted.
ADAN21 00230 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00230 NUCLEUS-RUN WITH PROTECTION-LOG 00677
ADAL02 00230 2002-08-29 18:25:18 CLOGRS IS ACTIVE
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART:
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00230 FILE-LEVEL CACHING INITIALIZED
ADAN80 00230 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00230 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00230 MODE = MULTI I S O L A T E D
ADAN01 00230 RUNNING WITHOUT RECOVERY-LOG
ADA800 00230 User exit 8 active.
<snip>
LA Times Downtime
Power Outage - cont’
Switched all PLOGs
Checked batch and online
There was no evidence of damage to any of the ADABAS components.
Other LA Times Disasters
1965: Watts riots
1971: Sylmar quake 6.5
1987: Whittier punch 5.9
1992: LA riots
1994: Northridge quake 6.7
6 Feb 1998: El Niño, flooding in B-1 computer room
15 April 1999: Power failure ‘news editing’
ADABAS Recovery
Command Log (CLOG) Failure - I/O Error
Restore or reallocate/format the CLOG
ADABAS will come up through Autorestart normally
No data loss if CLOG is not used
ADABAS Recovery
Protection Log (PLOG) Failure - I/O Error
Restore or reallocate/format the PLOG
Take a full back-up of the database
ADABAS will come up through Autorestart normally
Restart batch jobs
  Restartable batch jobs = OK
  Non-restartable batch jobs = check
ADABAS Recovery
TEMP and SORT Failure - I/O Error
Restore or reallocate/format the TEMP/SORT dataset
Different actions for the utilities
  See the ADABAS Utilities manuals
ADABAS Recovery
DSIM Failure - I/O Error
Restore or reallocate/format a DSIM dataset
Different actions for the utilities
  See the ADABAS Utilities manuals
ADABAS Recovery
Recovery Aid Dataset Failure - I/O Error
Restore or reallocate/format a RLOG dataset
Prepare the RLOG dataset
  ADARAI PREPARE RLOGSIZE / RLOGDEV….
Different actions for the utilities
  See the ADABAS Utilities manuals
Take a full back-up of the database
  This will start the first generation of the RLOG dataset
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Copy PLOG twice - ADARES PLCOPY
Restore or reallocate/format DATA dataset(s)
Instead of reallocating/formatting and restoring all DATA volumes, system specialists can
  Reallocate and format the new volume
  Restore the VTOC chain
  Restore and regenerate only files that were located on the failed volume
Otherwise, . . .
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Restore entire database
  ADASAV RESTORE [OVERWRITE = for GCB]
  ADASAV RESTONL [OVERWRITE] include PLOG
Start nucleus with UTIONLY=YES
Regenerate updates from end of last save (SYN2)
  ADARES REGENERATE PLOGNUM=xxx
  ADARES FROMCP=SYN2,FROMBLK=xxx
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Utilities that may need to be rerun (see ADARES):
  ADALOD LOAD FILE=xxx
  ADALOD UPDATE FILE=xxx
  ADALOD UPDATE FILE=xxx,DDISN
  ADAINV INVERT FILE=xxx,FIELD=xx
Lock files to rerun utilities
  ADADBS OPERCOM LOCKU=xx
Unlock utility-only status
  ADADBS OPERCOM UTIONLY=NO
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Rerun the regenerate function for the relevant files
Unlock the regenerated files
  ADADBS OPERCOM UNLOCKU=xx
Don’t repeat these steps if ADARES points out:
  ADALOD LOAD FILE=nn
  ADARES REGENERATE FILE=nn
  ADADBS REFRESH FILE=nn
Nucleus is ready
ADABAS Recovery
WORK 1 Failure - I/O Error
Restore or reallocate/format the WORK dataset
Restore and regenerate the entire database to avoid inconsistencies: open transactions
See ASSO/DATA failure
ADABAS Recovery
WORK 2/3 Failure - I/O Error
End the database normally (ADAEND) to avoid open transactions in part 1 of WORK
Restore or reallocate/format the WORK dataset
Restart the database normally
If the database abends, restore and regenerate the entire database - see ASSO/DATA failure
ADABAS Recovery
Failure in Data Storage Blocks
//DDSIIN DD DSN=SAVE.SIBA….
// DD DSN=PLCOPY.LOG1…
// DD DSN=PLCOPY.LOG2…
//DDCARD DD *
ADARES REPAIR DSRABN=xxx-yyy
ADARES FILE=n1,n2,n3
Failure in DSST
ADADCK DSCHECK FILE=xxx
ADADCK REPAIR
CALL SAG ! !
ADABAS Recovery
Nucleus Ends With RC 77
Not restartable
No more space for Checkpoint File (CP)
Rename old WORK
Allocate/format new WORK with old space
Change high-used RABN and high-used ISN
Restart nucleus with new WORK and UTIONLY=YES
Nucleus is in “crippled mode” - no user has access
Expand the database
Stop the nucleus normally
Rename old WORK and restart the nucleus with old WORK (autorestart)
ADABAS Recovery
Nucleus Ends With RC 77
Not restartable
No more space for user files
Rename old WORK
Allocate/format new WORK with old space
Restart nucleus with new WORK and UTIONLY=YES
Nucleus is in “crippled mode” - no user access
Expand database
Stop nucleus normally
Rename old WORK and restart nucleus with old WORK (autorestart)
ADABAS Recovery
Nucleus Abends - Missed DE Values
Descriptor is marked in FDT as DE; value doesn’t exist in ASSO, but in DATA.
Check:
  ADAICK ICHECK FILE=xxx[,NOOPEN]
  ADAVAL VALIDATE FILE=xxx,DESCRIPTOR=yy
Solution 1:
  ADAULD UNLOAD FILE=xxx,UTYPE=EXF
  ADALOD LOAD FILE=xxx,LWP=yyyyK
Solution 2:
  ADADBS RELEASE FILE=xxx,DESCRIPTOR=yy
  ADAINV INVERT FILE=xxx,FIELD=yy,LWP=...
CALL SAG ! !
Back-up Possibilities
ADASAV to tape / disk
  Including Fast Dump Restore, DFDSS
Delta Save Facility (DSF)
Delta Save QDUMP (Legent)
Disk mirroring (hardware level)
  FlashCopy of Enterprise Storage Server (ESS)
  Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
  OC-3 links two EMC disc arrays
Replication
Stand-by systems
Restore and Regenerate
Entire Transaction Server
ADABAS Disaster Recovery
How to back-up
Collect recovery data
Restore w/o nucleus
Start nucleus w/ UTIONLY=YES
Regenerate w/ nucleus
Switch UTIONLY=NO
ADABAS 6.2.2 Back-up at LA Times
[Timeline figure; the placement of jobs on the time axis is not recoverable]
Time axis: 21:00, 01:00, 02:00, 03:00, 8:00 - 11:00, 12:00
Jobs shown: ADAP1BKF (online SAVE); ADAP1PLC (FEOFPL); ADAP1PLC (PLOG switch); BRM/ABARS (several jobs); ADAP1BKO (copy tapes); DFDSS (full-volume back-up)
Other labels: ASSO / DATA / WORK / etc.; PDS, GDGs etc.; Pick-up by Recall; Weekly
Production Database Back-ups
ADASAV SAVE BUFNO=2,TTSYN=60
Record format: VB; record length: 27994; block size: 27998; BUFNO=30

DB  GB (4/03, 8/03)  Cartridge 3490 silo  Number of 3490 carts  Disk 3390 (3399) (4/03, 8/03)
1   4.9, 4.9         15 min               2                     < 2 min, < 2 min
2   30.0, 36.7       150+ min, 224+ min   42                    < 35 min, < 45 min
3   11.6, 17.1       110+ min             19                    < 15 min, < 22 min
4   9.7, 9.9         90+ min              9                     < 15 min, < 15 min
5   5.2, 7.3         28 min               5                     < 5 min, < 7 min
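The run times in this table can be turned into a rough percentage saving for disk versus cartridge. The sketch below makes two assumptions that are not stated on the slide: bounds such as "150+" and "< 35" are treated as the plain values 150 and 35, and each cartridge time is paired with the first disk time listed for the same database. Because the cartridge figures are lower bounds and the disk figures are upper bounds, the results are conservative estimates.

```python
def reduction_percent(cartridge_min, disk_min):
    """Percentage by which the disk run time is lower than the cartridge run time."""
    return (cartridge_min - disk_min) / cartridge_min * 100


# (cartridge minutes, disk minutes) per database, read from the table.
# Assumption: "+" and "<" bounds are taken at face value.
RUNS = {1: (15, 2), 2: (150, 35), 3: (110, 15), 4: (90, 15), 5: (28, 5)}

for db, (cartridge, disk) in RUNS.items():
    print(f"DB {db}: at least about {reduction_percent(cartridge, disk):.1f}% lower")
```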
Back-up to SMS Disk Pool
Run times are consistently at least 80% lower when writing to disk instead of cartridge
Run times are consistently around 60% lower when copying from disk to cartridge (compared with cart to cart)
DFSMShsm automates your storage management tasks
[Figure labels: SMS Production Storage Pool, DFSMShsm]
Back-up to Disk Pool
No cartridge errors
No cartridge drive errors
No cartridges get accidentally ejected from the silo
Smaller back-up window
Smaller maintenance windows
Less impact to application processes
Greater confidence that the data you need will be
there when you need it
IBM Magstar 3494/Virtual Tape Server
Linear design
  1 - 18 frames
Configuration flexibility
  SCSI, FC, ESCON, FICON
  3590, 3490E, VTS
High availability
  Dual robotics
  Dual library manager
>42 old 3490 carts will fit on 1 new 3494 cart
5 x 3390 volumes fit on one 3494 cart
One 3494 cart can be read in 45 seconds into the VTS disk cache (RAID-5)
Virtual Tape Concept
Virtual tape drives
  Appear as multiple 3490E tape drives
  3490E Media 1 and 2 support
  Shared / partitioned like real tape drives
Tape Volume Caching
  All data access is to cache
  Improves ‘mount’ performance
  LRU cache management
Volume Stacking
  Fully utilizes physical cart capacity
  Reduces physical cart requirement
  Reduces footprint requirement
[Figure labels: Virtual Drive 1, 2, ... n (180, 181, ... 19F); Tape Volume Cache; Virtual Volume 1, 2, ... n; Logical Volume 1 ... n; Magstar 3590, 30/60 GB capacity*]
* assumes 3:1 compression
Performance Tests
Input   Output  MM.SS   Storage
Adabas  Disk    42.63   526125 tracks 3390
Adabas  VTS     46.43   31 log. 3490 tapes
Disk    VTS     42.47   31 log. 3490 tapes
VTS     VTS     48.38   31 log. 3490 tapes
Disk    VTS     39.39   31 log. 3490 tapes
VTS     3590    47.86   1 phys. 3590 tape
Adabas  3490    216.27  51 phys. 3490 tapes
Adabas  VTS     52.47   39 log. 3490 tapes
Collecting Data For Recovery
Block Ranges SYN1 - SYN2
For ADASAV RESTORE
From ADASAV SAVE
PROTECTION LOG PLOGNUM=64, SYN1=4695, SYN2=4698
From ADAREP
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
<snip>
EOD 00 ET 2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET 2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
Collecting Data For Recovery
Block Ranges SYN2 - End
For ADARES REGENERATE
From ADAREP
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
<snip>
EOD 00 ET 2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET 2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
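Picking the SYN1 and SYN2 block numbers out of a checkpoint list by eye is error-prone, so it may be worth scripting. The sketch below is an illustration only: it assumes the nine whitespace-separated columns seen in the sample lines (checkpoint name first, PLOG number sixth, block number seventh, job name last), and the names `Checkpoint`, `parse_checkpoints` and `save_block_range` are hypothetical helpers, not part of any ADABAS utility.

```python
from typing import NamedTuple

# Sample lines copied from the checkpoint list on the slide.
REPORT = """\
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
"""


class Checkpoint(NamedTuple):
    name: str      # e.g. SYN1, SYN2, SYNP
    plog_num: int  # protection log (session) number
    block: int     # PLOG block number of the checkpoint
    job: str       # job that wrote the checkpoint


def parse_checkpoints(report):
    """Parse report lines, skipping anything that does not have the assumed layout."""
    checkpoints = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 9 or not (fields[5].isdigit() and fields[6].isdigit()):
            continue
        checkpoints.append(
            Checkpoint(fields[0], int(fields[5]), int(fields[6]), fields[8])
        )
    return checkpoints


def save_block_range(checkpoints, job):
    """Return (PLOG number, SYN1 block, SYN2 block) for the last save by `job`."""
    syn1 = [c for c in checkpoints if c.name == "SYN1" and c.job == job][-1]
    syn2 = next(
        c for c in checkpoints
        if c.name == "SYN2" and c.job == job and c.block > syn1.block
    )
    return syn1.plog_num, syn1.block, syn2.block


print(save_block_range(parse_checkpoints(REPORT), "ADAP1BKF"))  # (64, 4695, 4698)
```

The result matches the values reported by the save job itself (PLOGNUM=64, SYN1=4695, SYN2=4698), which gives a simple cross-check between the two sources.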
Collecting Data For Recovery
Dataset Name From Back-up Job (GDG)
For ADASAV RESTORE
ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00 CATALOGED
Collecting Data For Recovery
Dataset Names From PLOG Copy Jobs (GDG)
Matching block numbers 4695 - End
For ADASAV RESTORE and ADARES REGENERATE
DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 1214, FROMTIME=2002-09-23 03:30:24
TOBLK= 4701, TOTIME= 2002-09-23 21:01:42
ADABAS.PROD.DB1.PLOG.COPY.G7170V00
DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 4702, FROMTIME=2002-09-23 21:02:08
TOBLK= 4748, TOTIME= 2002-09-23 23:30:03
ADABAS.PROD.DB1.PLOG.COPY.G7171V00
DDSIAUS1 OUTPUT VOLUME=WRK004, SESSION NR=64
FROMBLK= 4749, FROMTIME=2002-09-23 23:30:25
TOBLK= 4791, TOTIME= 2002-09-24 03:30:33
ADABAS.PROD.DB1.PLOG.COPY.G7172V00
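Matching block numbers to PLOG copy generations can also be sketched in code. The example below uses the three block ranges from this slide and assumes they all belong to the same PLOG session (64); the helper names `datasets_covering` and `is_contiguous` are hypothetical.

```python
# Block ranges copied from the PLOG copy job output on the slide.
PLOG_COPIES = [
    {"dsn": "ADABAS.PROD.DB1.PLOG.COPY.G7170V00", "from": 1214, "to": 4701},
    {"dsn": "ADABAS.PROD.DB1.PLOG.COPY.G7171V00", "from": 4702, "to": 4748},
    {"dsn": "ADABAS.PROD.DB1.PLOG.COPY.G7172V00", "from": 4749, "to": 4791},
]


def datasets_covering(copies, from_block, to_block=None):
    """Return dataset names whose block range overlaps from_block..to_block.

    With to_block=None every dataset from from_block to the end is returned.
    """
    ordered = sorted(copies, key=lambda c: c["from"])
    return [
        c["dsn"] for c in ordered
        if c["to"] >= from_block and (to_block is None or c["from"] <= to_block)
    ]


def is_contiguous(copies):
    """Check that no PLOG blocks are missing between consecutive copies."""
    ordered = sorted(copies, key=lambda c: c["from"])
    return all(a["to"] + 1 == b["from"] for a, b in zip(ordered, ordered[1:]))


print(datasets_covering(PLOG_COPIES, 4695, 4698))  # input for ADASAV RESTONL
print(datasets_covering(PLOG_COPIES, 4698))        # input for ADARES REGENERATE
print(is_contiguous(PLOG_COPIES))
```

The contiguity check is a cheap guard against the "missed protection logs" kind of problem: a gap between one copy's last block and the next copy's first block means a generation is missing.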
Recovery - Part 1 - W/O Nucleus
ADASAV RESTONL
<snip>
//RESTONL EXEC ADASAVRD
//DDREST1 DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00
//DDPLOG DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//DDKARTE DD *
ADASAV RESTONL BUFNO=2,OVERWRITE
//REPORT EXEC ADAREP
//DDKARTE DD *
ADAREP NOFILE
//
Recovery - Part 2
Start the ADABAS nucleus with normal JCL (UTIONLY=YES)
<snip>
ADAN21 00215 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00215 NUCLEUS-RUN WITH PROTECTION-LOG 00064
ADAL02 00215 2002-09-21 21:20:29 CLOGRS IS ACTIVE
ADAN03 00215 ADABAS COMING UP
ADAN19 00215 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00215 FILE-LEVEL CACHING INITIALIZED
ADAN80 00215 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00215 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00215 MODE = MULTI I S O L A T E D
ADAN01 00215 RUNNING WITHOUT RECOVERY-LOG
ADA800 00215 User exit 8 active.
ADA801 00215 ADAP1PLC submitted.
Recovery - Part 2 - With Nucleus
ADARES REGENERATE
<snip>
//REGEN EXEC ADARES
//DDSIIN DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
// DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PROD.DB1.PLOG.COPY.G7171V00
// DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PROD.DB1.PLOG.COPY.G7172V00
//DDKARTE DD *
ADARES REGENERATE PLOGDBID=215,PLOGNUM=64
ADARES FROMCP=SYN2,FROMBLK=4698
ADARES TOCP=EOD,TOBLK=00000
Note: ADARES TOCP=EOD,TOBLK=00000 not needed
<snip>
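The control statements for the regenerate step follow directly from the collected recovery data, so they can be generated rather than typed. The sketch below only formats the two statements shown on the slide (the TOCP statement is left out because the slide marks it as not needed); `regenerate_cards` is a hypothetical helper name, and the values are the ones from this example.

```python
def regenerate_cards(dbid, plog_num, from_block):
    """Build the ADARES REGENERATE control statements from collected recovery data."""
    return [
        f"ADARES REGENERATE PLOGDBID={dbid},PLOGNUM={plog_num}",
        f"ADARES FROMCP=SYN2,FROMBLK={from_block}",
    ]


for card in regenerate_cards(dbid=215, plog_num=64, from_block=4698):
    print(card)
```

Generating the statements from the same data used to select the PLOG copies reduces the chance of a typo in the block number during a recovery.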
Recovery - Part 3 - With Nucleus
Lock files to re-run utilities
  See regenerate report
  ADADBS OPERCOM LOCKU=fnr
  or SYSAOS: A / I / L / F
  or modify command /F jobname,LOCKU=fnr
Unlock utility-only status for users
  ADADBS OPERCOM UTIONLY=NO
  or SYSAOS: A / I / L / U
  or modify command /F jobname,UTIONLY=NO
Recovery - Part 3 - With Nucleus
Re-run the utilities - if necessary
  ADALOD LOAD / UPDATE / DDISN
  ADAINV INVERT FILE=xxx,FIELD=xx
Unlock files
  ADADBS OPERCOM UNLOCKF=fnr
  or SYSAOS: A / I / L / F / N
  or modify command /F jobname,UNLOCKF=fnr
Delta Save Facility (DSF)
[Figure labels: NUCLEUS, DSF=YES, Buffer Pool, Delta Log (RABN), changed RABN, DLOG, Dual Protection Log, DDPLOGR1, DDPLOGR2, ADARES PLCOPY, PLOG copy, DDSIAUS1, DSIM, DDDSIM, Extracted Blocks, ADASAV SAVE DELTA, changed blocks, Delta Save, DDSAVE1, ASSO, DATA]
Delta Save Facility
[Figure labels: ADASAV RESTORE, DSF=YES, Full Image Save, Online/Offline, DDREST1, Delta Save, Online Images, DDDELT1-8, DSIM, DDDSIM, RABN extracted from PLOG, ASSO, DATA]
Delta Save QDUMP (CCA - now: TSI)
[Figure labels: MPM, ADABAS and Utilities, ADAIOR, QDUMP RABN-WRITE, CSA, QDUMP Front End, Control Program, Internal Buffer, Read Subtask, Write Subtask, ASSO, DATA]
http://www.treehouse.com/qdump.shtml
Disk Mirroring
Benefits
Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure
Disk Mirroring
Limitations
No protection from data corruption introduced by the hardware / software
Secondary site is not guaranteed to be transactionally consistent, because data is moved at the disk/track/sector or bit level (in the case of asynchronous mirroring).
Client applications must be re-started after a failure and need to be aware of the failure
Disk Mirroring
Limitations
Synchronous mirroring and RAID devices can add overhead to application performance.
Redundant/specialized high availability hardware/software can be expensive and restricted to use for backup purposes only.
Secondary copy of data is not available for use – low hardware utilization.
Need to replicate everything on disk, no selectivity of data replication
Example For Disk Mirroring
[Figure: Main Platform (S/390, UNIX, EMC 5700) and Back Up / Hot Site (S/390, UNIX, EMC 5700), 12-15 miles apart, connected by an OC-3 link; SRDF remote mirrored, synchronized]
Dedicated line broadband speeds and prices
T-1 - 1.544 megabits per second (24 DS0 lines), avg. cost $400-$650/mo.
T-3 - 43.232 megabits per second (28 T1s), avg. cost $6,000-$16,000/mo.
OC-3 - 155 megabits per second (100 T1s), avg. cost $20,000-$45,000/mo.
OC-12 - 622 megabits per second (4 OC3s), no price
OC-48 - 2.5 gigabits per second (4 OC12s), no price
OC-192 - 9.6 gigabits per second (4 OC48s), no price
Source: http://www.infobahn.com/research-information.htm
Prices updated: 16 March 2004
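To put these line speeds in perspective, the sketch below estimates how long it would take to move 61.4 GB (the combined size of the five databases restored in the D/R test) over each line. It is a back-of-the-envelope calculation under stated assumptions: decimal gigabytes, the nominal rates listed above, and no allowance for protocol overhead, compression, or line sharing.

```python
# Nominal line rates in megabits per second, as listed on the slide.
LINE_MBPS = {"T-1": 1.544, "T-3": 43.232, "OC-3": 155.0, "OC-12": 622.0}


def transfer_hours(gigabytes, mbps):
    """Hours needed to send `gigabytes` over a line running at `mbps` megabits/s."""
    bits = gigabytes * 8 * 1000**3   # assumption: 1 GB = 10^9 bytes
    seconds = bits / (mbps * 1_000_000)
    return seconds / 3600


for line, mbps in LINE_MBPS.items():
    print(f"{line}: {transfer_hours(61.4, mbps):.1f} hours for 61.4 GB")
```

Under these assumptions a T-1 needs several days for a full copy, while an OC-3 needs less than an hour, which is consistent with the choice of an OC-3 link in the mirroring example.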
Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
PPRC = 60 miles - PPRC-XD = continent
IBM ESS DASD; HDS also supports PPRC
[Figure labels: ESS Shark, ESS Shark, FlashCopy]
Also see TimeFinder from EMC
External Back-up Systems
Fast Copy of Data
Snapshot
  No data movement
  A virtual copy by copying pointers
Copy Process
  Physical copy is made asynchronously from the logical copy
  No impact on applications using the original data
Specific Hardware Required
  Software works only with the hardware
Work on Volume Level
  Some snapshot-only tools also work on dataset level
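The idea of a virtual copy made by copying pointers, with the physical copy produced afterwards, can be illustrated with a generic copy-on-write sketch. This is not how any particular storage product is implemented internally; the classes `Volume` and `Snapshot` and their methods are hypothetical and only show the principle.

```python
class Snapshot:
    """Point-in-time view of a volume; stores only blocks changed since the snap."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> content at snapshot time

    def preserve(self, index):
        # Keep the original content the first time a block is overwritten.
        if index not in self.saved:
            self.saved[index] = self.source.blocks[index]

    def read(self, index):
        # Unchanged blocks are read from the source; changed ones from the saved copy.
        return self.saved.get(index, self.source.blocks[index])

    def physical_copy(self):
        """Materialize the snapshot, e.g. as input for a later back-up to tape."""
        return [self.read(i) for i in range(len(self.source.blocks))]


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # No data movement at snap time: the snapshot only references the volume.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            snap.preserve(index)
        self.blocks[index] = data


volume = Volume(["a", "b", "c"])
snap = volume.snapshot()       # instant, nothing copied yet
volume.write(1, "B")           # application keeps updating the source
print(volume.blocks)           # ['a', 'B', 'c']
print(snap.physical_copy())    # ['a', 'b', 'c']
```

The source volume stays available for updates while the snapshot still returns the state at snap time, which is the property that keeps the impact on applications low.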
Snapshot & Physical Copy
IBM
  Hardware: Enterprise Storage Server
  Software: FlashCopy
  http://www.share.org/proceedings/sh98/data/S3087.PDF
EMC
  Hardware: Symmetrix Remote Data Facility
  Software: EMC TimeFinder
  http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html
How It Works
[Figure labels: Source Data (read / update), Suspend, Pre-defined time window, snap, Resume, Snapshot, Physical Backup]
Read only: update requests are queued
Source: SAG
Replication
Benefits
Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
Ability to more quickly swap to the standby system in the event of failure, as backup database is already on-line.
Data corruption is typically not replicated as transactions are logically reproduced rather than I/O blocks mirrored.
Replication
Benefits
Automatic switch over for clients using a switching mechanism, no client restart needed.
Originating applications are minimally impacted as replication takes place asynchronously after commit of the originating transaction.
The warm standby database is available for read-only operations, allowing better utilization of backup systems.
Replication
Benefits
Ability to resynchronize and easily switch back to primary system when it becomes available without loss of data.
Replication
Limitations
Warm standby system will be out-of-date by transactions committed at the active database that have not been applied to the standby.
Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected but file systems may not be supported).
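The first limitation, a standby that lags behind by the transactions not yet applied, can be shown with a small model of asynchronous replication. This is a generic sketch, not the behaviour of any specific replication product; the classes `Primary` and `Standby` and their methods are hypothetical.

```python
from collections import deque


class Primary:
    """Active database: commits locally, then queues the transaction for shipping."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # committed but not yet applied on the standby

    def commit(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Standby:
    """Warm standby: applies queued transactions some time after the commit."""

    def __init__(self):
        self.data = {}

    def apply(self, primary, max_transactions):
        applied = 0
        while primary.pending and applied < max_transactions:
            key, value = primary.pending.popleft()
            self.data[key] = value
            applied += 1
        return applied


primary, standby = Primary(), Standby()
for n in range(3):
    primary.commit(f"rec{n}", n)

standby.apply(primary, max_transactions=2)
print(len(primary.pending), standby.data == primary.data)  # 1 False

standby.apply(primary, max_transactions=10)
print(len(primary.pending), standby.data == primary.data)  # 0 True
```

If the primary site is lost while the queue is not empty, the queued transactions are exactly the ones missing on the standby.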
Entire Transaction Propagator
The Entire Transaction Propagator allows for asynchronous data replication.
Replicated data can be updated and synchronized with master data at user-specified intervals.
OS/390 Recovery Procedures
Prepared by the Mainframe Recovery Team
Recovering The OS/390 platform
The ABARS aggregates
The ADABAS databases
Mainframe Recovery Procedures
M. Makofske, 77263; Draft of January 24, 2002
[Flowchart; boxes are listed as extracted, their order and connections are not recoverable]
Initial Setup
Pre-IPL Procedures
Post-IPL Procedures
Reserve Cypress Tape Drives
IPL SunGard Floor System; Check Settings
Restore SYSR.DRP Libraries
Connect Times and SunGard Catalogs
Restore SYS002 and OS7PC0
Copy and Print SYSLOG
Check Clock and Reset, if Needed
Change JES2 parm to P=NOREQ
RHSMTREP, RHSMDISM, RHSMDELV
RSMSWORK, RSMSPRM
Verify Shipments from Recall
Load OS/390 Documentation into BookManager
Go to Pre-IPL Procedures
Initialize Production Volumes
Initialize Work Volumes
IPL Times System
Begin Application (ABARS) Restores
Recover Remaining System Volumes
Restore Remaining System Catalogs
Restore HSM and TMC Datasets
Restore Page Volumes
Insert Third-Party Software Passwords
Import MVSCAT Catalog Entries
VARY OFF Work, Production and Page Packs
Restore Times PROCLIBs
Restore ADABAS Production Volumes
OS/390 D/R Times (SUNGARD)
About 2400 tapes
  Shipping time from storage to the mainframe?
  4 hours ahead for tape staging
OS/390 and ABARS aggregates
  5 hours planned, 7+ hours with problems
ADABAS databases
  Approx. 2-3 hours for tape restore and regenerate
  Next test Nov 1: approx. 45 minutes from disk pool
Experiences From D/R Tests
Problems IPLing on an unfamiliar CPU (6 hours duration)
  Initial setup (restore SYS.. libraries)
  Pre-IPL procedures (restore Adabas, work, spool volumes, etc.)
  Post-IPL procedures (DFHSM in disaster mode, etc.)
  Application restores
Tape drive offline problems, Import MVSCAT typo errors, etc.
Recovered wrong volumes, generation errors
Initialize work volumes - conversion to SMS (DFSMShsm)
TMC recovery problems caused BRM recovery problems, too
Experiences From D/R Tests
Sent wrong cartridges with system dates to storage
Fewer tape channels at our offsite (2 instead of 4) = double restore time
Experiences From D/R Tests
RESTONL abended with SB00, no PLOG restored; the Recovery Aid flag was on in the saved database.
REGENERATE deleted a file and pointed out to repeat the ADALOD job, but the input dataset was not saved
We did a full-volume restore (DFDSS), restored the database and forgot to format the dual protection logs.
Missed protection logs
BRM restored wrong aggregates
Missing full-volume restores - (Database 2)
Missing volumes in Work Storage Pool - (Database 3)
Experiences From D/R Tests
BRM: Back-up and Recovery Manager
ABARS: Aggregate Back-up and Recovery Support
(ABARS = not: Air conditioning and refrigeration industry services <smile> )
Recovered (-1) Aggregates instead of (0) – (all Databases)
Recovered only SOME files on Aggregate (0) - (Database 1)
BRM/ABARS was not properly recovered (wrong version of BRM database)
Once those problems were resolved (several hours later), the ADABAS recovery ran smoothly.
5 Databases (61.4GB) restored and regenerated in 3.5 hours (tape/cart)
How Far is ‘Far Enough?’
(http://www.drj.com/articles/spr03/1602-02.html)
[Figure labels: Alternate Facility, Offsite Storage Facility]
Answer = 105 miles, according to the survey
Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)
Distance is key
  Streets, bridges, tunnels, airports are closed
Tape recovery is not effective
All applications are critical
Inconsistent back-up is no back-up at all
People-dependent processes do not suffice
Two sites are not enough
People are irreplaceable; so is information
Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)
Companies that relied on tape or on a third-party provider found in many cases that they had difficulty meeting their recovery time objectives
All disasters are possible
Helpful Links
Software AG - ADABAS Recovery
  http://www.softwareag.com/adabas/news/vers_7.htm
  http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - ADABAS>
ADABAS Restart and Recovery (Operations Manual)
  http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - Product Documentation>
University of Arkansas - D/R Plan
  http://www.uark.edu/staff/drp/
Disaster Recovery Journal
  http://www.drj.com
Helpful Links
FlashCopy
  http://www.share.org/proceedings/sh97/data/S9111.PDF
  http://www.storage.ibm.com/hardsoft/products/ess/pubs/f2ahs05.pdf
Shark (ESS)
  http://www.almaden.ibm.com/cs/shark/
  http://www.storage.ibm.com/hardsoft/disk/index.html
State of the Art Storage
  http://www.networkmagazine.com/article/NMG20010104S0002/2
EMC TimeFinder
  http://www.emc.com/products/software/timefinder.jsp
Entire Transaction Propagator (SAG)
  http://servline24.softwareag.com/SecuredServices/document/html/etp151/pdf/man.pdf