Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation
![Page 1: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/1.jpg)
Peter Fox
Data Science – CSCI/ERTH/ITWS-4350/6350
Week 12, November 26, 2013
Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation
![Page 2: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/2.jpg)
Contents
• Review of reading assignment
• Webs of data and semantic web
• Data on the web, linked data
• Deep web
• Data discovery
• Data citation
• Summary
• Next week
![Page 3: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/3.jpg)
Reading
• Mealy
• Wickett et al.
• Data Quality European Union Presentation
• ISO Technical Standards - General Reference
![Page 4: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/4.jpg)
Webs of data (science)
• Early Web - Web of pages
• http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html
• Semantic web started as a way to facilitate “machine accessible content”
– Initially was available only to those with familiarity with the languages and tools, e.g. your parents could not use it
• Webs of data grew out of this
– One specific example is W3C’s Linked Open Data
![Page 5: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/5.jpg)
Semantic Web
• http://www.w3.org/2001/sw/
• “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF)...”
![Page 6: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/6.jpg)
Terminology
• Semantic Web
– An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
![Page 7: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/7.jpg)
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
![Page 8: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/8.jpg)
Application Areas for SW
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification …. and the list goes on
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
![Page 9: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/9.jpg)
Semantic Web Basics
• The triple: {subject-predicate-object}
– Interferometer is-a optical instrument
– Optical instrument has focal length
• W3C is the primary (but not sole) governing org.
– RDF
– OWL 1.0 and 2.0 - Web Ontology Language
• RDF – programming environment for 14+ languages, including C, C++, Python, Java, Javascript, Ruby, PHP,... (no Cobol or Ada yet ;-( )
• OWL programming for Java
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/evolving, SW promotes this
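The triple and the open-world idea above can be sketched in a few lines. This is a minimal illustration using plain Python tuples, not an RDF library; the helper names `match` and `known` are hypothetical and exist only for this sketch.

```python
# Illustrative only: triples as plain tuples, not a real RDF library.
triples = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "focalLength"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def known(store, s, p, o):
    """Open-world reading: a missing triple is 'unknown' (None), not False."""
    return True if (s, p, o) in store else None


print(match(triples, p="is-a"))
print(known(triples, "Interferometer", "has", "focalLength"))
```

A closed-world system would answer the last question with "false"; here the absence of a statement only means nothing has been asserted yet.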
![Page 10: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/10.jpg)
Ontology Spectrum
[Figure: spectrum of ontology types, from simple to expressive: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints]
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
![Page 11: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/11.jpg)
Semantic Web Myths
• ‘the Semantic Web is a reincarnation of Artificial Intelligence on the Web’ (closed world versus open world)
• ‘it relies on giant, centrally controlled ontologies for "meaning" (as opposed to a democratic, bottom-up control of terms)’
• ‘one has to add metadata to all Web pages, convert all relational databases, and XML data to use the Semantic Web’
• ‘one has to learn formal logic, knowledge representation techniques, description logic, etc, to use it’
• ‘it is, essentially, an academic project, of no interest for industry’
![Page 12: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/12.jpg)
Integrating Multiple Data Sources
• The Semantic Web lets us merge statements from different sources
• The RDF Graph Model allows programs to use data uniformly regardless of the source
• Figuring out where to find such data is a motivator for Semantic Web Services
[Figure: RDF graph merged from several sources; different line and text colors represent different data sources. Nodes: #Ionosphere, #magnetic. Literals: “TerrestrialIonosphere”, “100”, “km”. Properties: name, hasCoordinates, hasLowerBoundaryValue, hasLowerBoundaryUnit]
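Merging statements from different sources can be sketched as a set union over triples. The split of statements between the two sources, and which property attaches to which node, are assumptions read off the figure labels; `describe` is a hypothetical helper.

```python
# Two sources describing the same resource (assignment of statements to
# sources is assumed for illustration).
source_a = {
    ("#Ionosphere", "name", "TerrestrialIonosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Because both use the same identifier, merging is just a union of statements.
merged = source_a | source_b


def describe(graph, subject):
    """Collect every property/value pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


ionosphere = describe(merged, "#Ionosphere")
print(ionosphere)
```

A program reading `merged` no longer needs to know which source contributed which statement, which is the uniformity the slide describes.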
![Page 13: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/13.jpg)
Drill Down / Focused Perusal
• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
• These can typically be resolved to get more information about the resource
• This essentially creates a web of data analogous to the web of text created by the World Wide Web
• Ontologies are represented using the same structure as content
– We can resolve class and property URIs to learn about the ontology
[Figure: resources linked across the Internet. Nodes: …#NeutralTemperature, ...#ISR, …#Norway, …#EISCAT, ...#FPI, ...#MilllstoneHill. Properties: measuredby, type, locatedIn, operatedby]
![Page 14: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/14.jpg)
Statements about Statements
• The Semantic Web allows us to make statements about statements
– Timestamps
– Provenance / Lineage
– Authoritativeness / Probability / Uncertainty
– Security classification
– …
• This is an unsung virtue of the Semantic Web
[Figure: the statement “#Aurora hascolor Red” annotated with hasSource #Danny’s and hasDateTime 20031031]
Ontologies Workshop, APL May 26, 2006
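One way to picture a statement about a statement is to let a whole triple act as the subject of further triples. The sketch below does this with a frozen dataclass; it illustrates the idea only and is not the RDF reification vocabulary itself. The class `Statement` and the helper `about` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be subjects
class Statement:
    subject: str
    predicate: str
    obj: str


claim = Statement("#Aurora", "hascolor", "Red")

# Statements about the statement: the claim itself is the subject.
annotations = {
    (claim, "hasSource", "#Danny's"),
    (claim, "hasDateTime", "20031031"),
}


def about(statement, notes):
    """Return the provenance-style annotations attached to one statement."""
    return {p: o for s, p, o in notes if s == statement}


print(about(claim, annotations))
```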
![Page 15: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/15.jpg)
‘Collecting’ the ‘data’
• Part of the (meta)data information is present in tools ... but thrown away at output
– e.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but, usually, this information is lost
– storing it in web data would be easy!
• SW-aware tools are around (even if you do not know it...), though more would be good:
– Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
– RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!)
![Page 16: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/16.jpg)
‘Collecting’ the ‘data’
• Scraping - different tools, services, etc, come around every day:
– get RDF data associated with images, for example: service to get RDF from flickr images
– service to get RDF from XMP
– XSLT scripts to retrieve microformat data from XHTML files
– scripts to convert spreadsheets to RDF – e.g. see csv2rdf4lod and the tools, tutorials, demos at http://logd.tw.rpi.edu
– schema.org and the datasets extension
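The spreadsheet-to-RDF idea can be sketched with the standard library alone. This is not how csv2rdf4lod works; it is a minimal stand-in that turns each cell into one N-Triples line. The namespace `http://example.org/`, the function name, and the sample rows are all invented for illustration, and column names are assumed to be safe to use inside an IRI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace


def csv_to_ntriples(text, key_column):
    """Turn each non-key cell of a CSV table into one N-Triples statement."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}id/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}vocab/{column}> "{literal}" .')
    return lines


sample = "station,country,instrument\nEISCAT,Norway,ISR\n"
result = csv_to_ntriples(sample, "station")
for line in result:
    print(line)
```

The key column supplies the subject identifier and every other column becomes a predicate, which is the simplest of several possible table-to-graph mappings.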
![Page 17: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/17.jpg)
‘Collecting’ the ‘data’
• SQL - A huge amount of data in Relational Databases
– Although tools exist, it is not feasible to convert that data into RDF
– Instead: SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is transformed into SQL on-the-fly
– Reading for this week, article by Berners-Lee and Sahoo et al.
– RDB2RDF W3C working group - http://www.w3.org/2001/sw/rdb2rdf/
– D2RQ/ D2RServer
– Commercial solutions appearing
• NoSQL
• Other ‘graph’ forms…
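The ‘bridge’ idea, where a query over triples is rewritten into SQL on the fly rather than converting the database, can be sketched with `sqlite3`. The table, its columns, the `MAPPING` dictionary, and the function names are hypothetical; real bridges such as D2RQ use their own mapping languages and handle far more than this single-predicate case.

```python
import sqlite3

# Hypothetical mapping: each predicate corresponds to one column of one table.
MAPPING = {"name": ("instrument", "name"), "locatedIn": ("instrument", "site")}


def pattern_to_sql(predicate, obj=None):
    """Rewrite the triple pattern (?s, predicate, obj) as a SQL query."""
    table, column = MAPPING[predicate]
    sql = f"SELECT id, {column} FROM {table}"
    params = ()
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params = (obj,)
    return sql, params


def query(conn, predicate, obj=None):
    """Run the rewritten query and present the rows as triples again."""
    sql, params = pattern_to_sql(predicate, obj)
    return [(f"#{row_id}", predicate, value)
            for row_id, value in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instrument (id TEXT, name TEXT, site TEXT)")
conn.executemany("INSERT INTO instrument VALUES (?, ?, ?)",
                 [("isr1", "ISR", "Norway"), ("fpi1", "FPI", "MillstoneHill")])
result = query(conn, "locatedIn", "Norway")
print(result)
```

The data never leaves the relational store; only the query and the shape of the answer change.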
![Page 18: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/18.jpg)
More Collecting
• RDFa extends XHTML by:
– extending the link and meta to include child elements
– add metadata to any elements (a bit like the class in microformats, but via dedicated properties)
– Used in schema.org/ datasets
• It is very similar to microformats, but with more rigor:
– it is a general framework (instead of an ‘agreement’ on the meaning of, say, a class attribute value)
– terminologies can be mixed more easily
• GRDDL - Gleaning Resource Descriptions from Dialects of Languages
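To make ‘metadata via dedicated properties’ concrete, the sketch below pulls `about`, `property`, and `content` attributes out of a small XHTML fragment using the standard `html.parser` module. It covers only a tiny, simplified slice of RDFa (no prefixes, no nested subject scoping), and the class name and sample page are invented for illustration.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (subject, property, value) from about/property/content attributes."""

    def __init__(self):
        super().__init__()
        self.subject = ""
        self.pending = None  # property still waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            if "content" in attrs:
                # Value given explicitly in the attribute.
                self.triples.append((self.subject, attrs["property"], attrs["content"]))
            else:
                # Value is the element's text, seen later in handle_data.
                self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


page = """
<div about="http://example.org/dataset/1">
  <span property="dc:title">Aurora images</span>
  <meta property="dc:date" content="2003-10-31" />
</div>
"""
parser = PropertyExtractor()
parser.feed(page)
print(parser.triples)
```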
![Page 19: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/19.jpg)
Linked open data
• http://linkeddata.org/guides-and-tutorials
• http://tomheath.com/slides/2009-02-austin-linkeddata-tutorial.pdf (we will look at some of these slides now, #1-25 and 30-37)
• And of course:
– http://logd.tw.rpi.edu/
![Page 20: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/20.jpg)
September 2011 - http://lod-cloud.net/
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
![Page 21: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/21.jpg)
(Class 2) Management
• Creation of logical collections
• Physical data handling
• Interoperability support
• Security support
• Data ownership
• Metadata collection, management and access.
• Persistence
• Knowledge and information discovery
• Data dissemination and publication
![Page 22: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/22.jpg)
Data Management and WOD
• How is the data managed?
– Found?
– Curated?
• What about the metadata?
• What problems are introduced/ solved?
• See discussion in: Parsons and Fox (2012): http://mp-datamatters.blogspot.com/
![Page 23: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/23.jpg)
Data on the Web, Internet
• Data behind web services
• Data files on web sites
• We have covered data as service approaches (week 11)
• Thinking you have found data when you have really only found information and metadata
• The real difference between this topic and the next one is:
– Access and dissemination
– Level of curation (and often description)
![Page 24: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/24.jpg)
Data on the internet
• http://www.dataspaceweb.org/
• Data files on other protocols
– FTP
– RFTP
– GridFTP
– SABUL
– XMPP/AMQP
– Others…
![Page 25: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/25.jpg)
Deep web
• Data behind web services
• Data behind query interfaces (databases or files)
• Introduces a different curation problem
![Page 26: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/26.jpg)
The loose definition
• Something that a crawler cannot find and/or index
– Creates the other definition of shallow web
• Has many implications for discovery, access and use
• Curation is more complex to satisfy this definition, i.e. not a matter of just putting files ‘on the web’
• 50, 100, 1000 times the ‘shallow web’?
![Page 27: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/27.jpg)
Managing (in) the deep web
• Sometimes, the deep web aspect of a data source can be due to extreme obscurity, language peculiarities, NO metadata, NO documentation
• There are no known studies of how effective data management (what you are learning) could change the percentage of deep/ shallow
• Semantics are often put forward as a solution http://www.mkbergman.com/458/new-currents-in-the-deep-web/
![Page 28: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/28.jpg)
Internet impacts on management
• Management of data that is… on the Internet!
• Web –> ‘stateless’
• Curation, Preservation –> highly stateful (by definition)
• You will hear terms such as digital curation and digital preservation but what about internet curation and internet preservation (Internet Archive)?
![Page 29: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/29.jpg)
Thus data frameworks are appearing
• Many – meaning they go beyond web sites, they incorporate many of the data management functions
• Initially syntactic – e.g. OPeNDAP, ADDE, ODATA, OODT
• Application oriented – e.g. virtual observatories
– Semantic – e.g. Virtual Solar-Terrestrial Observatory
• ALL of these are changing the nature of data management and role of data ‘providers’
![Page 30: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/30.jpg)
![Page 31: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/31.jpg)
BOM, Melbourne, VIC 20071015 (Fox)
Some Definitions
DAP = Data Access Protocol: Model used to describe the data; Request syntax and semantics; and Response syntax and semantics.
OPeNDAP: The software; Numerous reference implementations; Core/libraries and services (servers and clients).
OPeNDAP Inc.: OPeNDAP is a 501(c)(3) non-profit corporation; Formed to maintain, evolve and promote the discipline neutral DAP that was the DODS core infrastructure.
![Page 32: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/32.jpg)
Considerations with regard to the development of DAP and OPeNDAP
Many data providers
Many data formats
Many different semantic representations of the data
Many different security requirements
Many different client types
![Page 33: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/33.jpg)
Broad Vision
A world in which a single data access protocol is used for the exchange of data between network based applications regardless of discipline.
A layer above TCP/IP providing for syntactic and semantic consistency not available in existing protocols such as FTP.
![Page 34: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/34.jpg)
Practical Considerations
The broad vision:
Is syntactically achievable, but
Was not semantically achievable, at least not fully, but perhaps in the near term.
![Page 35: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/35.jpg)
The Data Access Protocol (DAP)
The DAP has been designed to be as general as possible without being constrained to a particular discipline or world view.
The DAP is a discipline neutral data access protocol; it is being used in astronomy, medicine, earth science,…
Provides data format and location, and data organization transparency
Is metadata neutral
![Page 36: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/36.jpg)
OPeNDAP V4 (Hyrax) Architecture
[Figure: Client, OLFS, BES, and Data connected in sequence]
OPeNDAP Lightweight Front end Server (OLFS): Receives requests and asks the BES to fill them; Uses Java Servlets; Does not directly ‘touch’ data; Multi-protocol
Back End Server (BES): Reads data files, databases, etc., returns info; May return DAP2 objects or other data; Does not require web server
![Page 37: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/37.jpg)
OPeNDAP Clients
[Figure: clients reaching OPeNDAP servers over the Internet. Labels shown: netCDF C, Ferret, GrADS, netCDF Java, IDV, VisAD, ncBrowse, Matlab, Matlab Client, Access, Excel, IDL, IDL Client, ArcGIS, pyDAP, NCL, NCL Client, Web Browser, OPeNDAP Data Connector]
![Page 38: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/38.jpg)
OPeNDAP Servers
[Figure: OPeNDAP servers exposing data over the Internet. Labels shown: netCDF, HDF4, HDF5, JDBC, FreeForm, FITS, CDF, CEDAR, DSP, JGOFS, Tables, SQL, Flat Binary, General, ESML, CDM]
![Page 39: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/39.jpg)
OPeNDAP Servers (specialized processing)
[Figure: specialized OPeNDAP servers exposing data over the Internet. Labels shown: GRIB, BUFR, GDS, CODAR, FDS, DAPPER, TDS, ESG, pyDAP, General, netCDF, OPeNDAP]
![Page 40: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/40.jpg)
Servers
Servers may also provide other services
Directory traversal.
Browser-based form to build URL.
Ascii or other representations of data.
Metadata associated with the data.
Server side functions.
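What a ‘form to build URL’ produces can be sketched as a small helper. This assumes the common DAP2 convention of appending a response suffix (`dds`, `das`, `dods`, and on many servers `ascii`) and a constraint expression of the form `variable[start:stride:stop]`; the server address, file name, and variable name below are invented, and individual servers may support different suffixes.

```python
from urllib.parse import quote


def dap_url(dataset_url, response, variable=None, slices=()):
    """Build a DAP2-style request URL (a sketch, not a full client).

    response: suffix naming the wanted response, e.g. 'dds', 'das', 'dods', 'ascii'.
    slices:   (start, stride, stop) tuples, one per dimension of the variable.
    """
    url = f"{dataset_url}.{response}"
    if variable is not None:
        hyperslab = "".join(f"[{a}:{b}:{c}]" for a, b, c in slices)
        # Keep the bracket/colon syntax readable; escape anything else unusual.
        url += "?" + quote(variable + hyperslab, safe="[]:,.")
    return url


# Hypothetical dataset and variable names.
structure_url = dap_url("http://example.org/opendap/sst.nc", "dds")
subset_url = dap_url("http://example.org/opendap/sst.nc", "ascii",
                     "sst", [(0, 1, 0), (10, 1, 20)])
print(structure_url)
print(subset_url)
```

The same dataset address yields structure, attributes, or a subset of values depending only on the suffix and constraint, which is what gives clients format and location transparency.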
![Page 41: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/41.jpg)
Summary
Tetherless World Constellation
![Page 42: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/42.jpg)
Data discovery
• Free text search on the internet/ web
• Data portals
• What makes discovery work?
– For Deep Web?
– For Linked Data?
![Page 43: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/43.jpg)
Data discovery
• What makes discovery work?
– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– It turns out that file types are a key enabler or inhibitor to discovery
• What does not work?
– Result ranking using *any* conventional algorithms
![Page 44: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/44.jpg)
Smart search
• Semantically aware search, e.g. http://noesis.itsc.uah.edu
• Faceted search, e.g.
– mspace (http://mspace.fm)
– jSpace
– Exhibit (MIT)
– S2S – e.g. International Open Government Dataset Catalog (IOGDC; http://logd.tw.rpi.edu)
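Faceted search narrows a result set by metadata values and shows how many records remain under each value. The sketch below is a generic illustration, not how mspace, Exhibit, or S2S are implemented; the catalog records and function names are invented.

```python
from collections import Counter

catalog = [  # invented records
    {"title": "Sea surface temperature", "format": "netCDF", "country": "US"},
    {"title": "Budget 2012", "format": "CSV", "country": "UK"},
    {"title": "Air quality", "format": "CSV", "country": "US"},
]


def facet_counts(records, field):
    """Count how many records fall under each value of one facet."""
    return Counter(r[field] for r in records)


def refine(records, **selected):
    """Keep only records matching every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


csv_only = refine(catalog, format="CSV")
remaining = facet_counts(csv_only, "country")
print(facet_counts(catalog, "format"))
print(remaining)
```

Everything here depends on consistent metadata fields, which is the point the discovery slides make.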
![Page 45: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/45.jpg)
NOESIS
![Page 46: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/46.jpg)
Search Application integration!
![Page 47: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/47.jpg)
Deep web dashboards…
![Page 48: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/48.jpg)
Intl. Open Govt. Data Cat.
http://logd.tw.rpi.edu
![Page 49: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/49.jpg)
Federated search
• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
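The ‘simultaneous search of multiple sources’ can be sketched with a thread pool that sends one keyword to every source and merges the answers. The two in-memory sources stand in for remote catalogs and are invented; a real system would speak a protocol such as Z39.50 to each one. Note how the sketch only works because both sources expose the same `keywords` field.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, records):
    """Create a search function over one (in-memory, stand-in) catalog."""
    def search(keyword):
        return [(name, r["title"]) for r in records
                if keyword.lower() in r["keywords"]]
    return search


sources = [
    make_source("library_a", [
        {"title": "Ionosphere survey", "keywords": {"ionosphere"}},
    ]),
    make_source("library_b", [
        {"title": "Aurora atlas", "keywords": {"aurora", "ionosphere"}},
        {"title": "Tide tables", "keywords": {"ocean"}},
    ]),
]


def federated_search(keyword):
    """Query every source concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda search: search(keyword), sources)
        return sorted(hit for hits in per_source for hit in hits)


hits = federated_search("Ionosphere")
print(hits)
```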
![Page 50: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/50.jpg)
Data Citation
• “Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.” (http://www.force11.org/datacitation)
![Page 51: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/51.jpg)
![Page 52: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/52.jpg)
![Page 53: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/53.jpg)
Landing page – a short form
http://data.rpi.edu/repository/handle/10833/24
![Page 54: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/54.jpg)
Long form
http://data.rpi.edu/repository/handle/10833/24?show=full
![Page 55: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/55.jpg)
Conneg (content negotiation)
• Many examples, but what follows is ~ from: http://www.crosscite.org/cn/
• Also see - http://labs.crossref.org/ and http://data.datacite.org/
• What is it?
– Est-ce que vous parlez français?
– Do you speak html or JSON or RDF?
![Page 56: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/56.jpg)
Conneg
![Page 57: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/57.jpg)
Application level, e.g. as JSON
![Page 58: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/58.jpg)
Supported content types..
![Page 59: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/59.jpg)
Example formatting..
![Page 60: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/60.jpg)
Other coolness…
• curl -LH "Accept: application/vnd.crossref.unixref+xml;q=1, application/rdf+xml;q=0.5" http://dx.doi.org/10.1126/science.169.3946.635
• curl http://data.datacite.org/application/x-datacite+text/10.5524/100005
– Li, J; Zhang, G; Lambert, D; Wang, J; (2011): Genomic data from the Emperor penguin (Aptenodytes forsteri); GigaScience. http://dx.doi.org/10.5524/100005
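The first curl command above can be mirrored with the standard library. The sketch only builds the request, with the `Accept` header carrying the same media types and q-values; the actual fetch is left commented out because it needs network access and the service's current behavior is not something this sketch can vouch for. The helper names are hypothetical.

```python
import urllib.request


def accept_header(preferences):
    """Format (media_type, q) pairs as an HTTP Accept header value."""
    return ", ".join(f"{media};q={q:g}" for media, q in preferences)


def conneg_request(url, preferences):
    """Build a request asking the server for the preferred representations."""
    return urllib.request.Request(url, headers={"Accept": accept_header(preferences)})


request = conneg_request(
    "http://dx.doi.org/10.1126/science.169.3946.635",
    [("application/vnd.crossref.unixref+xml", 1), ("application/rdf+xml", 0.5)],
)
print(request.get_header("Accept"))

# To actually fetch (network access needed; redirects are followed by default):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

The q-values express preference: the server is asked for the first type if it can supply it and the second as a fallback, all against the same identifier.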
![Page 61: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/61.jpg)
Further integration..
![Page 62: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/62.jpg)
Additional refs.
• EPIC for identifier conventions: http://pidconsortium.eu
• DSpace and Handle - http://tw.rpi.edu/web/project/Data.rpi.edu/Architecture (install/ config notes in PDF and in Section 4.4.4, page 55 of Handle installation manual for V 1.8)
![Page 63: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/63.jpg)
Aren’t you happy this is the last lecture?
• Otherwise – we’d go into a long discussion of the merit of data citation to achieve the business case in an earlier slide
• It would be enlightening but torture…
![Page 64: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/64.jpg)
Summary
• Theme of data management in the chaotic and enabling environment of the web, internet
• Emergence of frameworks that encompass some aspects of data management
• Unlocking data in a useful way is an immense challenge (discovery, citation?)
• Anything/ everything you can do by following what you have learned in this course will help
![Page 65: Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Citation](https://reader035.fdocuments.net/reader035/viewer/2022062407/56812af8550346895d8edf24/html5/thumbnails/65.jpg)
What is next
• Dec. 3 – project presentations
• Final assignment to be handed in today!
• Reading for this week:
– Semantic Deep Web, James Geller, Soon Ae Chun, and Yoo Jung An
– The Deep Web (Internet Tutorials)
– Digital Image Resources on the Deep Web
– Parsons and Fox: Is Data Publication the Right Metaphor?
• Class evaluations…