Dirk Roorda, coordinator infrastructure
Posted: 19-Dec-2015
Overview
Part 1: The rising role of data
Part 2: The free use of data
Part 3: The care for data
Part 4: The re-use of data
Part 1: The rising role of data
http://en.wikipedia.org/wiki/Exabyte
Internet size (May 2009): 500 EB
500,000 PB
500 million TB
500 million fat USB disks of 1 TB
500 billion memory cards of 1 GB
70 memory cards per person
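The conversions above use decimal (SI) prefixes, where each step is a factor of 1000. The short Python sketch below reproduces them. The world population of about 6.8 billion in 2009 is an assumption of this example, not a figure from the slide.

```python
# Decimal (SI) byte units: each prefix is 1000 times the previous one.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# Assumption for this sketch: world population in 2009, roughly 6.8 billion.
WORLD_POPULATION_2009 = 6_800_000_000

internet_size = 500 * EB  # slide figure for May 2009


def in_units(n_bytes: int, unit: int) -> int:
    """Express a byte count as a whole number of the given unit."""
    return n_bytes // unit


petabytes = in_units(internet_size, PB)  # 500,000 PB
terabytes = in_units(internet_size, TB)  # 500 million TB, i.e. 1 TB disks
gb_cards = in_units(internet_size, GB)   # 500 billion 1 GB memory cards
cards_per_person = gb_cards / WORLD_POPULATION_2009

print(f"{petabytes:,} PB")
print(f"{terabytes:,} TB")
print(f"{gb_cards:,} cards of 1 GB")
print(f"about {cards_per_person:.0f} cards per person")
```

With this population estimate the last line comes out at about 74, consistent with the slide's rounded "70".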
Data deluge
http://www.datadeluge.com/ http://en.wikipedia.org/wiki/File:Tree_of_life_SVG.svg
http://tolweb.org/tree/
Where does it come from?
• Instruments: satellites, sensors, DNA sequencing
• Records: administrations, censuses, surveys
• Digitisation: the analog legacy
• Hobby: pictures, movies, genealogy
• Integration: better interoperability of existing data
The driving force
Information and Communication Technology
Babbage's Analytical Engine, 1870
A datacenter
Genealogy
2.5 PB
5,328 servers
1.12 MW
http://blog.familytreemagazine.com/insider/Inside+Ancestrycoms+TopSecret+Data+Center.aspx
http://www.ancestry.com/
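To give a feel for these figures, a rough per-server average can be computed. This is only a sketch: it assumes decimal prefixes and that storage and power are spread evenly over all servers, which the slide does not claim (the 1.12 MW may well include cooling and other overhead).

```python
# Figures from the slide on a genealogy data center.
storage_pb = 2.5
servers = 5328
power_mw = 1.12

# Convert to smaller units (decimal prefixes) before dividing.
storage_gb = storage_pb * 1_000_000  # 1 PB = 1,000,000 GB
power_w = power_mw * 1_000_000       # 1 MW = 1,000,000 W

# Simple averages; real data centers are not evenly loaded.
gb_per_server = storage_gb / servers
watts_per_server = power_w / servers

print(f"about {gb_per_server:.0f} GB of storage per server")
print(f"about {watts_per_server:.0f} W of power per server")
```

Under these assumptions each server accounts for roughly 469 GB of storage and 210 W of power.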
A closer look
• Linguistics: text corpora, automatic translation
• Philology: how to read a million books?
• History: historical census data
• Archaeology: archive law, commercial research
Linguistics and Philology
• A chronometric approach to Indian alchemical literature
• Assessing frequency changes in multistage diachronic corpora
• Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
• A Corpus Study of the Rigveda
• Dictionary generation for less-frequent language pairs using WordNet
• An exercise in non-ideal authorship attribution: the mysterious Maria Ward
http://llc.oxfordjournals.org/
Archaeology
http://edna.itor.org/nl/intern/upload_directory/a00002/downloads/IMG0013.tif
Archaeology (2)
http://edna.itor.org/nl/oai/oai_addi/oai_addi/OAI:EVALMA:a00002.xml/
Part 2: The free use of data
Open Access
Data is information
Information is knowledge
Knowledge is power
Why share it?
Open Access
Shared knowledge is double knowledge
Without free sharing of knowledge,
scientific progress will halt
Tensions between sharing and not sharing remain, though
Work to do
• organise your data
• let your data work together with those of others (colleagues, future scientists, the public)
• ask new questions of the data, because there is so much of it
• create new (virtual) data collections
Part 3: The care for data
Research Data Recycling
• existing data: collected through experiments and surveys
• primary research data: verification of results by others, preservation of unique data from experiments
• compilation, aggregation, annotation: databanks
• data mining, analysis, visualisation: new data as research input
Challenge: Software
Operating systems (DOS, Windows 95, ...)
Programming languages (BASIC, Pascal)
File formats (WordPerfect, dBase)
Applications (Addressbook, Websites)
Old data may be locked up in old software.
Meeting the challenge
To prevent the problem in the future:
• Backward compatibility
• Open standards
• Open source applications
• Modular software engineering: keep data separated from interface and business logic
To remedy the problems of the past:
• Emulation
• Migration
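Migration and the separation of data from logic can be illustrated together. The sketch below assumes a hypothetical fixed-width export from an old address-book program (the layout, field names and records are all made up) and migrates it to CSV, an open format, with Python's standard `csv` module. Parsing and writing live in separate functions, so the data no longer depends on any one application.

```python
import csv
import io

# Hypothetical legacy layout: (field name, start column, end column).
LEGACY_LAYOUT = [("name", 0, 20), ("city", 20, 35), ("year", 35, 39)]
FIELDS = [field for field, _, _ in LEGACY_LAYOUT]


def parse_legacy(lines):
    """Data layer: turn fixed-width lines into plain dictionaries."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield {field: line[start:end].strip()
                   for field, start, end in LEGACY_LAYOUT}


def migrate_to_csv(records, out):
    """Write the records as CSV, an open and widely readable format."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Made-up sample lines in the assumed fixed-width layout.
legacy_export = [
    f"{'J. Jansen':<20}{'Utrecht':<15}1994",
    f"{'M. de Vries':<20}{'Leiden':<15}1997",
]

out = io.StringIO()
migrate_to_csv(parse_legacy(legacy_export), out)
print(out.getvalue())
```

Emulation would instead keep the old program running on simulated hardware; migration, as here, rescues the data itself into a format that current software can read.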
Challenge: Human organisation
Forgotten jargon
Forgotten knowledge
No metadata
Websites with broken links
Jargon
• II.17. Posterior berry aneurysm with subarachnoid bleed.
• II.18. Subarachnoid bleed with extension into the ventricles.
• II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture.
• II.22. Subarachnoid hemorrhage.
http://www.pathguy.com/morgagni.htm
Meeting the challenge
Persistent Identifiers
Enough Metadata
Codification of knowledge and practices
Wikipedia
Data management early on
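Persistent identifiers tackle the "websites with broken links" problem by adding one level of indirection: publications cite a stable identifier, and a resolver maps it to wherever the data currently lives. The sketch below assumes a simple in-memory lookup table; real PID systems such as DOI or Handle run dedicated resolver services, and the identifier scheme and URLs here are invented for illustration.

```python
# Hypothetical resolver table: persistent identifier -> current location.
resolver = {
    "pid:example/0001": "http://old-server.example.org/data/survey1.csv",
}


def resolve(pid):
    """Return the current location registered for a persistent identifier."""
    try:
        return resolver[pid]
    except KeyError:
        raise LookupError(f"unknown identifier: {pid}") from None


def relocate(pid, new_url):
    """Record a new location when the data moves; the PID itself is unchanged."""
    if pid not in resolver:
        raise LookupError(f"unknown identifier: {pid}")
    resolver[pid] = new_url


cited = "pid:example/0001"  # what a publication would cite
print(resolve(cited))
relocate(cited, "https://archive.example.org/datasets/survey1.csv")
print(resolve(cited))       # same citation, new location
```

Only the resolver entry changes when a server is retired, so every citation of the identifier keeps working.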
Part 4: The re-use of data
Data management
Use common infrastructure rather than private means
Use open formats rather than proprietary formats
Use open source software rather than closed software
Use standard ways of documenting data
taxonomies, ontologies, metadata schemes
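As an example of a standard metadata scheme, the sketch below describes a dataset with elements from the Dublin Core Metadata Element Set 1.1, using Python's standard `xml.etree.ElementTree`. The wrapper element name `metadata` and all field values are hypothetical choices for this illustration; harvesting protocols such as OAI-PMH define their own container element (`oai_dc:dc`).

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix


def dublin_core_record(fields):
    """Build a simple metadata record from (element, value) pairs."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Description of a hypothetical dataset; all values are made up.
record = dublin_core_record([
    ("title", "Example survey of reading habits"),
    ("creator", "J. Jansen"),
    ("date", "2009"),
    ("format", "text/csv"),
    ("language", "nl"),
])
print(ET.tostring(record, encoding="unicode"))
```

Because the element names come from a shared standard rather than a private convention, other repositories and tools can interpret the record without knowing anything about the program that produced it.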
Common Infrastructure
Local file shares
University repository
DANS
European Infrastructures
EASY
Dataset
Datafiles
Metadata
Linguists make their technology accessible: resources, algorithms, techniques
Humanities and social sciences: they are the target users
Geleerdenbrieven (scholars' letters) = Circulation of Knowledge
Archiving = circulation of information
Keep imagining