Exploring Database Archival Strategies
With emphasis on “Why” and “When” before delving into “How to” options
By: Ben Aminnia
President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com
2
Objectives
What “is” and what “is not” covered in this presentation
The first question we should ask ourselves is not “How to archive” but rather …
Why do we need to archive and what happens if we don’t?
And then …
3
Objectives
The “How to” part will encompass multiple questions …
How do we plan and design the archive from an architect’s perspective?
How do we look for alternative approaches?
How do we choose among those alternative approaches?
And more importantly …
Remember that we’re not alone in the decision process.
4
Management Concerns
Before the archive process starts, management must approve the approach.
How do we defend and justify our selected approach?
  When talking to IT Management
  When talking to Non-IT Management; CEOs; CFOs
How do we put ourselves in their shoes?
How do we build a decision matrix to compare various alternative approaches?
What criteria columns should we put on the decision matrix?
What’s the cost / benefit summary of different alternatives on our decision matrix?
5
Management Concerns
How do we measure cost / benefit?
  One-time / Initial Costs
  Recurring / Annual Costs
  What about measuring benefits? Have you thought about this?
Do we have a Service Level Agreement (SLA)?
6
Management Concerns
But the challenge of seeking management approval may go well beyond that;
They may also ask:
How did we get here?
Why didn’t we do this and that earlier?
Why can’t we save nothing and just recreate?
Trying to solve one problem while creating other problems …
  L.A. Traffic is Bad
  Make all highways one-way to Big Bear Lake
  That will solve L.A.’s traffic problem
  Let the mayor of Big Bear Lake worry about their traffic problem; that’s not my problem!
7
Why Do We Archive?
Increasing Cost of Storage / Hardware
Performance Degradation / Response Time
Regulatory / Government Requirements
Application Requirements
  Must show current year only
  Data transfer to disconnected users
  Part of a bigger picture, beyond the scope of our role in the project
It’s part of SLA!
8
When Do We Archive?
Once a year
  During spring cleaning season
When something breaks unexpectedly and then everyone wakes up and says “Oops! We forgot to archive.”
When we have budget
When we have nothing else to do
When we are told to get it done by Monday
9
When Do We Archive?
When DB size approaches a predefined threshold …
  1 GB
  10 GB
  100 GB
  1 TB
The important point is to understand the issue and to have a strategy for addressing it.
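The threshold idea can be sketched in a few lines of Python. This is only an illustration: the function name, the status labels, and the 80% early-warning ratio are assumptions made for the example, not part of the presentation.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention, chosen for this sketch)


def archive_status(db_size_bytes: int, threshold_bytes: int, warn_ratio: float = 0.8) -> str:
    """Classify the current database size against a predefined archival threshold.

    warn_ratio is a hypothetical early-warning level, so that archival
    planning starts before the threshold is actually reached.
    """
    if threshold_bytes <= 0:
        raise ValueError("threshold_bytes must be positive")
    if db_size_bytes >= threshold_bytes:
        return "archive-now"
    if db_size_bytes >= warn_ratio * threshold_bytes:
        return "plan-archive"
    return "ok"


# Example: a 10 GB threshold checked at three different database sizes.
for size_gb in (5, 9, 12):
    print(size_gb, "GB ->", archive_status(size_gb * GB, 10 * GB))
```

Whatever threshold is chosen, the point of a check like this is that the size is watched on a schedule, so the need to archive never arrives as a surprise.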
Using ASP.NET, XSLT, and XML to Take SQL Server to a New Height
SQL Server as a Document Repository
Part 4 - The Database Archival Challenge
By: Ben Aminnia
President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com
12
Agenda
Part 1 Review Summary: Background and Overview of the VIP System Architecture
Part 2 Review Summary: Generating Reports & Graphs with SSRS and MS-Chart
Part 3 Review Summary: The Road Ahead – Using SQL Server as a Document Repository
Part 4: The Database Archival Challenge!
Questions and Answers
13
Architectural Notes and Challenge for the DBA
Each record is about 100 KB in size;
So it takes about ten thousand records to reach one GB in DB size;
There’s no physical deletion; deleted records are only marked for deletion (with [isdeleted]=1);
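The sizing arithmetic can be checked with a short Python sketch. The helper name is made up for this example, and both the decimal and the binary reading of "KB" and "GB" are shown because the slide does not say which one is meant.

```python
def records_to_reach(target_bytes: int, record_bytes: int) -> int:
    """Return how many fixed-size records it takes to reach target_bytes (rounded up)."""
    if record_bytes <= 0:
        raise ValueError("record_bytes must be positive")
    return -(-target_bytes // record_bytes)  # ceiling division on integers


# Decimal units: 100 KB = 100,000 bytes and 1 GB = 1,000,000,000 bytes.
decimal_count = records_to_reach(10 ** 9, 100 * 1000)

# Binary units: 100 KB = 102,400 bytes and 1 GB = 1,073,741,824 bytes.
binary_count = records_to_reach(2 ** 30, 100 * 1024)

print(decimal_count, binary_count)
```

Either way the result is on the order of ten thousand records per GB. Because deleted records are only flagged and never physically removed, they keep counting toward that total.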
14
Some Fundamental Questions
There are many questions on How, What, and Why to deal with the database archival process. Who’s on 1st?
Should we plan archival when we’re running out of space? … or performance has gone down? … or some other company policy mandates it?
OR … as the famous database architect, Julie Andrews, sings in The Sound of Music …
LET’S START AT THE VERY BEGINNING!
15
The First Question: HOW?, WHAT?, or WHY?
There are six different possible orders to ask these questions …
I think the answer is:
• WHY?
• WHAT?
• HOW?
16
Why?
Increasing cost of storage / hardware
Performance Degradation / Response Time
Application requirements (e.g. must show current year only)
17
What and Where to?
The whole record is moved to another location and deleted from the main location
Only part of the record is moved
Destination could be …
  To another DB
  To the file system
  No longer online
18
How?
How to archive?
How to retrieve the archived record?
What are the possible alternatives from an architectural perspective?
19
Four Alternative Ways …
Method 1: Store in archived location (e.g. on the network file system) from the beginning.
There’s nothing to archive periodically or at a later time.
This used to be the most common way for document archival, before the XML technology which we started to use in the VIP Letters system.
I stored over 50,000 documents from one of my applications this way.
20
Four Alternative Ways …
Method 2: Periodic archive of the whole record to a different database.
Move last year’s data to a different database with identical format.
Delete the entire archived record from the main database.
Main database remains small and portable.
  Helps with response time.
  Also helps with portability (e.g. when laptop users need to have a small version of database on their local drive, while disconnected from corporate network or the internet).
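A minimal sketch of Method 2 follows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it can run anywhere. The table name, the column names, and the "archive" schema alias are hypothetical, and a real SQL Server job would look different.

```python
import sqlite3


def archive_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows created before `cutoff` (ISO date) from main.letters to archive.letters.

    Returns the number of rows removed from the main database.
    """
    with conn:  # copy and delete commit together, or both roll back on error
        conn.execute(
            "INSERT INTO archive.letters SELECT * FROM main.letters WHERE created < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM main.letters WHERE created < ?", (cutoff,))
    return cur.rowcount


# Demo: two in-memory databases with an identical table format.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")
schema = "(id INTEGER PRIMARY KEY, created TEXT NOT NULL, body TEXT)"
conn.execute(f"CREATE TABLE main.letters {schema}")
conn.execute(f"CREATE TABLE archive.letters {schema}")
conn.executemany(
    "INSERT INTO main.letters (id, created, body) VALUES (?, ?, ?)",
    [(1, "2004-06-01", "old letter"), (2, "2005-03-15", "current letter")],
)
conn.commit()

moved = archive_before(conn, "2005-01-01")
print("rows moved:", moved)
```

Running the copy and the delete in one transaction is the design choice that matters here: a failure halfway through should not leave a record in both databases or in neither.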
21
Four Alternative Ways …
Method 3: Partial archive of the old records …
This is the case of our VIP Letters architecture.
  Each record is about 100 KB.
  10,000 records are almost 1.0 GB.
  Most of it is the XML column data which holds the saved letter.
The archive process will move the XML part of it to the network share on the file system …
  Keeping the other data columns in the main DB.
  You then set the “IsArchived” column to 1.
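The partial archive can be sketched as follows, again with Python's sqlite3 standing in for SQL Server and a temporary folder standing in for the network share. Only the "IsArchived" column comes from the slide. The table name, the other column names, and the file naming scheme are assumptions made for this example.

```python
import sqlite3
import tempfile
from pathlib import Path


def archive_xml(conn: sqlite3.Connection, share: Path, letter_id: int) -> Path:
    """Copy one letter's XML to the share, then empty the column and set IsArchived."""
    row = conn.execute(
        "SELECT XmlBody FROM Letters WHERE Id = ? AND IsArchived = 0", (letter_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"letter {letter_id} is missing or already archived")
    target = share / f"letter_{letter_id}.xml"
    # Write the file first: if this step fails, the database copy is still intact.
    target.write_text(row[0], encoding="utf-8")
    with conn:  # commit the update, or roll it back on error
        conn.execute(
            "UPDATE Letters SET XmlBody = NULL, IsArchived = 1 WHERE Id = ?",
            (letter_id,),
        )
    return target


# Demo: one letter in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters (Id INTEGER PRIMARY KEY, Recipient TEXT, "
    "XmlBody TEXT, IsArchived INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO Letters (Id, Recipient, XmlBody) "
    "VALUES (1, 'A. Reader', '<letter>Hello</letter>')"
)
conn.commit()
share = Path(tempfile.mkdtemp())  # stand-in for the network share
archived_file = archive_xml(conn, share, 1)
print(archived_file.name)
```

The small columns stay in the main table, so searches and reports on them keep working. Only the large XML payload leaves the database.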
22
Four Alternative Ways …
Method 4: No longer online …
Very common practice when a government or regulatory agency mandates only x number of years to keep records online.
The archived records are then scanned and stored offline (on a tape or in paper form).
24
In Summary
Archival is done in Phase 4
And that’s when it should be done
Nobody said archival should be done in Phase 1
But we should have it in sight – on the horizon – from the beginning
Archival planning / implementation should not come as a surprise!
Example: DB and Website Size Tracking (before the archival time)
25
Four Methods to Archive
Again, from my other presentation, we looked into four methods to archive:
Method 1: Store in archived location (e.g. on the network file system) from the beginning.
Method 2: Periodic archive of the whole record to a different database.
Method 3: Partial archive of the old records
Method 4: No longer online
26
A Closer Look at Method 3
Method 3: Partial archive of the old records
Save a 2nd copy of the document on the file system and then delete it from the XML column of the database
What tracking columns do we add to the main Document Archive table?
How / when to copy document(s) to the file system …
  During the original creation
  Later; one-by-one; on demand
  Later; in batches (e.g. older than 1/1/2005)
How to retrieve it back from the file system
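One possible answer to the tracking-column and retrieval questions is sketched below in Python, with sqlite3 standing in for SQL Server. The "ArchivedOn" and "ArchivePath" columns are hypothetical examples of tracking columns, and the table layout is an assumption made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_letter(conn: sqlite3.Connection, letter_id: int) -> str:
    """Return a letter's XML from the database, or from the file system if archived."""
    row = conn.execute(
        "SELECT XmlBody, IsArchived, ArchivePath FROM Letters WHERE Id = ?",
        (letter_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"letter {letter_id} not found")
    xml_body, is_archived, archive_path = row
    if is_archived:
        # The tracking column tells us where the archived copy lives.
        return Path(archive_path).read_text(encoding="utf-8")
    return xml_body


# Demo: letter 1 was archived earlier, letter 2 is still online.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters (Id INTEGER PRIMARY KEY, XmlBody TEXT, "
    "IsArchived INTEGER NOT NULL DEFAULT 0, ArchivedOn TEXT, ArchivePath TEXT)"
)
share = Path(tempfile.mkdtemp())  # stand-in for the network share
old_file = share / "letter_1.xml"
old_file.write_text("<letter>2004</letter>", encoding="utf-8")
conn.execute(
    "INSERT INTO Letters VALUES (1, NULL, 1, '2005-01-02', ?)", (str(old_file),)
)
conn.execute("INSERT INTO Letters (Id, XmlBody) VALUES (2, '<letter>2006</letter>')")
conn.commit()

print(load_letter(conn, 1))
print(load_letter(conn, 2))
```

With a lookup like this, the calling application does not need to know whether a letter has been archived. The turnaround time differs, but the interface stays the same.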
27
A Closer Look at Method 4
Method 4: No longer online
That is, neither in the XML column nor on the file system
How do we recreate it later when needed?
Scan the paper copy?
I don’t think so!
Regenerate using the original letter’s parameter values, which are still in the database?
What if the original template (XSLT) has changed, so that the recreated letter no longer looks like the original or the regeneration fails?
28
Beyond Four Methods
Method 5: Totally move from primary database to another (non-DB) medium (e.g. network share / tape)
More common in legacy systems; not one of my options.
Method 6: Don’t move anything; just set the archived flag and use a VIEW to filter out archived records.
Makes sense if the objective is “visibility” and not “space” or “response time” or “data transmission between network and local for disconnected mode”
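Method 6 can be sketched with a flag column and a view. The example below uses Python's sqlite3 as a stand-in for SQL Server. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters (Id INTEGER PRIMARY KEY, Subject TEXT, "
    "IsArchived INTEGER NOT NULL DEFAULT 0)"
)
# The application reads from the view, so archived rows simply disappear from it.
conn.execute(
    "CREATE VIEW ActiveLetters AS "
    "SELECT Id, Subject FROM Letters WHERE IsArchived = 0"
)
conn.executemany(
    "INSERT INTO Letters (Id, Subject) VALUES (?, ?)",
    [(1, "Welcome"), (2, "Renewal")],
)
conn.commit()


def mark_archived(conn: sqlite3.Connection, letter_id: int) -> None:
    """Set the archived flag; nothing is moved or deleted."""
    with conn:
        conn.execute("UPDATE Letters SET IsArchived = 1 WHERE Id = ?", (letter_id,))


mark_archived(conn, 1)
active = conn.execute("SELECT Id, Subject FROM ActiveLetters ORDER BY Id").fetchall()
total = conn.execute("SELECT COUNT(*) FROM Letters").fetchone()[0]
print(active, total)
```

The base table still holds every row, which is why this approach addresses visibility only and does nothing for storage cost or response time.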
29
Beyond Four Methods
Method 7: Use “The Cloud”
A viable alternative for space limitations in a hosted environment or cost considerations
It’s really NOT an archival alternative.
  Gave it a shot
  Asked for a compatibility test
  Nice presentation, but not compatible!
  Analogy with a personal power-generator vs. electrical outlet connected to DWP …
30
The “Cloud” - Before
My Personal Power Generator
32
The “Cloud” - After
X
What if I need a 150 V generator?
33
Final Thoughts
Before Meeting with Management
Be Prepared to Answer Questions
Which functions will NOT work on archived records? (e.g. Full-Text Search)
Have a Decision Matrix, showing all options with pros and cons of each
User Interface to Retrieve an Archived Record
  From the main application
  From a secondary application solely for the purpose of archive retrieval
  By sending a request to a service application or email to a designated contact
Turnaround time and other pros and cons for each of the above approaches
34
The Solution Matrix
Problems Solved, by Solution Alternative (√ = solved, X = not solved, ? = open question)
Solution Alternatives | Cost of Storage | Performance / Response Time | Application Requirements
1. Store in archive location from beginning | ? | ? | ?
2. Move from primary to history DB | √ | √ | √
3. Partially move; e.g. keep the record but empty the XML column | √ | √ | X
4. Just delete the XML column without moving anything; rebuild (based on most recent template) if necessary | √ | √ | X
5. Move from primary table to another (non-DB) medium (e.g. network / tape) | √ | √ | √
6. Don’t move anything; use filters | X | X | √
7. Use “the cloud” | √ | X | X
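The matrix lends itself to a small lookup structure. The Python sketch below encodes the marks from the slide, with shortened labels, and filters the alternatives by the problems that must be solved. The names PROBLEMS, MATRIX, and candidates are made up for this example, and "?" entries are treated as unconfirmed.

```python
from typing import Dict, Iterable, List, Optional, Tuple

PROBLEMS = ("storage", "performance", "application")

# True = solved (√), False = not solved (X), None = open question (?).
MATRIX: Dict[str, Tuple[Optional[bool], Optional[bool], Optional[bool]]] = {
    "1. Store in archive location from beginning": (None, None, None),
    "2. Move from primary to history DB": (True, True, True),
    "3. Partially move; empty the XML column": (True, True, False),
    "4. Delete the XML column; rebuild if necessary": (True, True, False),
    "5. Move to another (non-DB) medium": (True, True, True),
    "6. Don't move anything; use filters": (False, False, True),
    '7. Use "the cloud"': (True, False, False),
}


def candidates(required: Iterable[str]) -> List[str]:
    """Return the alternatives marked as solving every required problem."""
    indexes = [PROBLEMS.index(name) for name in required]
    return [
        option
        for option, marks in MATRIX.items()
        if all(marks[i] is True for i in indexes)
    ]


# Example: which alternatives address both storage cost and response time?
for option in candidates(["storage", "performance"]):
    print(option)
```

A structure like this only captures the "problems solved" columns. Cost columns would need to be added before it could support a full cost / benefit comparison.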
35
The Solution Matrix
Final Thoughts
Ideally, each solution alternative on the matrix needs the following:
  Development Costs
  Infrastructure Costs
  Maintenance Costs
  Which problem(s) are being addressed
  Which problem(s) are NOT being addressed
  What new problems might be introduced
Eventually, as the IT architects, we will be responsible for the outcome!
37
Contact Information
• Emails: [email protected] [email protected]
• Websites: www.sql.la www.pointercorp.com www.vipletters.com