13.30 – 15.15 : 17 May 2016
Oxford Room
Preserving Transactional Data: a DPC Study for the Big Data Network
Sara Day Thomson
Project Officer, Digital Preservation Coalition
Digital Preservation & Web Archiving Café
Breakout F1-1
www.dpconline.org
Preserving Transactional Data
IRMS Conference 2016
Sara Day Thomson | @sdaythomson
17 May 2016
Our digital memory accessible tomorrow…
UK Data Service
Outline
Background
Defining Transactional Data
Databases: SQL v NoSQL
Preservation Challenges
Legal Review & Anonymization
Conclusions
Tweet
Share your thoughts on Preserving Transactional Data
Me @sdaythomson
DPC @dpc_chat
UKDS @UKDSBigData
Administrative Data Research Network: N Ireland, England, Scotland, Wales
Big Data Research Centres: Urban Big Data Centre; ESRC Business and Local Gov’t Data; Consumer Data Research Centre
UK Data Service: Big Data Network Support
DPC Technology Watch: Preserving Social Media; Preserving Transactional Data
Preserving Transactional Data
DPC Technology Watch Report
http://dpconline.org/publications/technology-watch-reports
Available summer 2016
Defining Transactional Data
• Individual interactions with a database
• Raw data: numbers, symbols
• Information: conclusions based on combinations of raw data
• Computationally combined to form richer data
• Too large or volatile for traditional processing applications to handle
• Greater number of re-uses than imagined for archived data in the past
Computationally combined to form richer data
The value of ‘Big’ Data is the ability to combine different data sources
[Figure: data sources combined (+) to give (=) richer data]
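As a minimal sketch of what "combining" can mean in practice, the snippet below joins two small data sources on a shared date field. Both sources, their field names, and their values are invented for illustration and are not drawn from the study.

```python
# Hypothetical example: neither data source comes from the DPC study.

# Source A: till transactions (one row per individual interaction).
transactions = [
    {"date": "2016-05-16", "item": "umbrella", "qty": 3},
    {"date": "2016-05-17", "item": "umbrella", "qty": 11},
    {"date": "2016-05-17", "item": "sun cream", "qty": 1},
]

# Source B: daily weather observations from a separate provider.
weather = {
    "2016-05-16": {"rain_mm": 0.0},
    "2016-05-17": {"rain_mm": 12.4},
}


def combine(transactions, weather):
    """Attach the weather observation for the same date to each transaction."""
    combined = []
    for row in transactions:
        observation = weather.get(row["date"])
        if observation is None:
            continue  # no matching record in the second source
        combined.append({**row, **observation})
    return combined


richer = combine(transactions, weather)
```

Neither source alone relates rainfall to sales; the combined rows do. For preservation, that suggests keeping both sources and the key that links them.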
Greater number of re-uses than imagined for archived databases in the past
Compare archived data with current data
Compare data from different sources
Consumer Analysis
Spread of Disease
Network Analysis
Business Analysis
Geo-spatial Analysis
Human Behaviour
Cyber Crime
Economics
Energy Usage
Urban Planning
Regulatory Compliance
Public Services
Heritage Collections
Healthcare
Environmental Science
Education
Governance
Transport
Computer Science
Motivations for Long-term Preservation
• Reproducibility of analysis
• Availability of historical data
• Compliance and records management
Preserving Databases
SQL (Relational)
• SQL somewhat standardised
• Established and supported
• Tested against ACID properties
• Does not always scale up
• Does not always support data arranged in a hierarchy
• Not necessary if reading is higher priority than writing
• Often interdependent on other databases
NoSQL (Other than relational)
• Scales up (usually multiple nodes)
• Allows relationships between objects that are not the same
• Relies more heavily on application layer for storing information
• Not standardised
• Prioritises availability over ACID properties
• Fewer tested methods for archiving
[Colour key on the original slide: preservation friendly, less preservation friendly, preservation neutral]
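Two of the relational points above can be illustrated with a short sketch using Python's built-in `sqlite3` module: a transaction that fails part-way is rolled back as a whole (the "A" in ACID), and the database can be exported as plain SQL text and rebuilt from it. The `accounts` table and the `transfer` helper are hypothetical, and a plain-text dump is only one possible archiving route, not a method the study prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move funds between two accounts as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(connection):
    return connection.execute(
        "SELECT id, balance FROM accounts ORDER BY id"
    ).fetchall()


failed = transfer(conn, 1, 2, 500)    # would overdraw account 1
after_failure = balances(conn)        # the credit to account 2 is undone too
succeeded = transfer(conn, 1, 2, 30)

# Export schema and data as plain SQL text, then rebuild from that text alone.
dump = "\n".join(conn.iterdump())
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
restored_rows = balances(restored)
```

The dump carries the schema, including the constraint, alongside the rows, so the rebuilt copy does not depend on the original application to be read.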
NoSQL: the Future?
Image source: DB-Engines, ‘RDBMS dominate the database market, but NoSQL systems are catching up’, by Paul Andlinger, 21 Nov 2013, http://db-engines.com/de/blog_post/23
Image source: DB-Engines, ‘DB-Engines Ranking - Trend Popularity’, May 2016, http://db-engines.com/en/ranking_trend
NoSQL?
Key-Value Database
Document Database
Column Family Store
Graph Database
Image source: ThoughtWorks, ‘NoSQL Databases: An Overview’ by Pramod Sadalage, 1 Oct 2014 https://www.thoughtworks.com/insights/blog/nosql-databases-overview
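To make the four families more concrete, the sketch below expresses one invented customer record in each model using plain Python structures. It does not use any real NoSQL product or API; the record, key names, and helper functions are all assumptions for illustration.

```python
import json

# Key-value: one opaque value per key; the store knows nothing about its contents.
key_value = {"customer:42": b'{"name": "A. Smith", "city": "York"}'}

# Document: a nested, self-describing record; fields may differ between documents.
document = {
    "_id": 42,
    "name": "A. Smith",
    "address": {"city": "York"},
    "orders": [{"sku": "X1", "qty": 2}],
}

# Column family: rows keyed by id, with columns grouped into named families.
column_family = {
    42: {
        "profile": {"name": "A. Smith", "city": "York"},
        "orders": {"2016-05-17": "X1 x2"},
    }
}

# Graph: nodes plus typed edges between them.
graph = {
    "nodes": {
        "customer:42": {"name": "A. Smith"},
        "city:york": {"name": "York"},
        "product:X1": {"name": "Widget"},
    },
    "edges": [
        ("customer:42", "LIVES_IN", "city:york"),
        ("customer:42", "PURCHASED", "product:X1"),
    ],
}


def city_from_key_value(store):
    # The caller must already know the value is JSON with a "city" field.
    return json.loads(store["customer:42"])["city"]


def city_from_document(doc):
    return doc["address"]["city"]


def city_from_column_family(table):
    return table[42]["profile"]["city"]


def city_from_graph(g):
    for source, label, target in g["edges"]:
        if source == "customer:42" and label == "LIVES_IN":
            return g["nodes"][target]["name"]
    return None
```

The same fact is reached by a different access path in each model, which is one reason a single archiving method is unlikely to fit all four.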
Summary of Challenges
Volatility
Volume and capacity
Multiple entry points
Context
Data purpose
Legalities
Interdependence
Information stored in Application Layer
[Diagram: People Directory, Customer Queries, and Inventory databases, each linked to an Application]
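A small sketch of why information held in the application layer matters for preservation: below, a hypothetical application writes customer queries to a key-value store as packed bytes. The byte layout (`RECORD_FORMAT`) and the meaning of the status codes (`STATUS_LABELS`) exist only in the application code, so an archive of the store alone would not be interpretable. All names and values are invented.

```python
import struct

# Knowledge that lives in the application, not in the database:
RECORD_FORMAT = "<IHd"  # customer id (uint32), status code (uint16), amount (float64)
STATUS_LABELS = {1: "open", 2: "resolved", 3: "escalated"}


def save_query(store, key, customer_id, status_code, amount):
    """Write one customer query to the store as opaque packed bytes."""
    store[key] = struct.pack(RECORD_FORMAT, customer_id, status_code, amount)


def load_query(store, key):
    """Decode a stored record; only possible with the format and the labels."""
    customer_id, status_code, amount = struct.unpack(RECORD_FORMAT, store[key])
    return {
        "customer_id": customer_id,
        "status": STATUS_LABELS[status_code],
        "amount": amount,
    }


store = {}  # stands in for a key-value database
save_query(store, "query:1001", 42, 2, 19.99)

raw = store["query:1001"]                # what an archive of the store alone holds
record = load_query(store, "query:1001")  # what the application can recover
```

Preserving `raw` without `RECORD_FORMAT` and `STATUS_LABELS` keeps the bytes but loses the meaning, which is the sense in which preservation may need to capture more than the data.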
Legal Review
EU Directive 96/9/EC (Database Directive) (1996)
UK Human Rights Act (1998)
UK Data Protection Act (1998)
European General Data Protection Regulation (GDPR) (2018)
EU Directive 96/9/EC ‘Database Directive’
• Exclusive rights holder
• Copyright protection
• Sui generis
UK Data Protection Act (1998)
• Protects personal data
• Data collected for one purpose cannot necessarily be re-used for another
There is no anonymization in big data
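One commonly cited mechanism behind this claim is linkage: a dataset with names removed can still be matched against another source on shared attributes such as postcode, birth year, and sex. The sketch below assumes that mechanism; every record, name, and postcode in it is invented, and the choice of quasi-identifiers is an assumption rather than something taken from the study.

```python
# All records below are invented for illustration.

# "Anonymised" release: names removed, quasi-identifiers retained.
released = [
    {"postcode": "ZZ1 1AA", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "ZZ1 1AA", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "ZZ9 2BB", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# A second, separately available source that does carry names.
public_register = [
    {"name": "Jane Example", "postcode": "ZZ1 1AA", "birth_year": 1975, "sex": "F"},
    {"name": "John Sample", "postcode": "ZZ1 1AA", "birth_year": 1982, "sex": "M"},
    {"name": "Ann Other", "postcode": "ZZ9 2BB", "birth_year": 1975, "sex": "F"},
    {"name": "Amy Nother", "postcode": "ZZ9 2BB", "birth_year": 1975, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def reidentify(released, register):
    """Return {name: diagnosis} for released rows matching exactly one person."""
    names_by_key = {}
    for person in register:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = {}
    for row in released:
        key = tuple(row[field] for field in QUASI_IDENTIFIERS)
        candidates = names_by_key.get(key, [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches[candidates[0]] = row["diagnosis"]
    return matches


reidentified = reidentify(released, public_register)
```

Two of the three released rows match exactly one person in the register; the third stays ambiguous only because two people happen to share the same attributes.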
Thomas and Walport, ‘Data Sharing Review Report’, 11 July 2008
‘…in the vast majority of cases…the complexity of the law, amplified by a plethora of guidance, leaves those who may wish to share data in a fog of confusion’.
Laurie and Stevens, ‘The Administrative Data Research Centre Scotland: A scoping report on the legal & ethical issues arising from access & linkage of administrative data’, Research Paper Series No 2014/35
Conclusions
• New approaches to preservation
• Selection, compatibility, metadata, and documentation
• Preserving more than data
• Planning for broader uses
Actual Experience?
Nook, Google Code, GeoCities, Google Wave, knol
Yahoo 360, del.icio.us, MyBlogLog, BeBo…
IS CORPORATE ABANDONMENT AS BIG A THREAT TO THE DIGITAL ESTATE AS OBSOLESCENCE?
Friends ReUnited, Yahoo Mail Classic, Blipfoto, MySpace Blogs…
@WilliamKilbride
NEW! Digital Preservation Handbook
handbook.dpconline.org
Getting Started & Making Progress in Digital Preservation
dpconline.org/events
Looking Forward
New Suite of Events
Incl. Digital Preservation for Records [email protected]
E-Ark
www.eark-project.com
Preserving Transactional Data
DPC Technology Watch Report
http://dpconline.org/publications/technology-watch-reports
Available summer 2016
Preserving Social Media
DPC Technology Watch Report
dx.doi.org/10.7207/twr16-01
Thanks!
Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Questions? Comments?