In search of lost knowledge: joining the dots with Linked Data

52
In search of lost knowledge: joining the dots with Linked Data Jon Blower [email protected] Department of Meteorology Reading e-Science Centre National Centre for Earth Observation University of Reading With thanks to all CHARMe project partners!

description

These slides are from my seminar to the University of Reading Department of Meteorology, November 2013. They contain a (hopefully not very technical) introduction to the concepts of Linked Data and how we are applying them in the CHARMe project (http://www.charme.org.uk). In CHARMe we are using Open Annotation to connect users of climate data with community-generated "commentary information" that helps them to understand a dataset's strengths and weaknesses. The slide notes contain some helpful context, so you might like to download the PPT file! The slides are licensed as "Creative Commons Attribution 3.0", meaning that you can do what you like with these slides provided that you credit the University of Reading for their creation. See http://creativecommons.org/licenses/by/3.0/.

Transcript of In search of lost knowledge: joining the dots with Linked Data

Page 1: In search of lost knowledge: joining the dots with Linked Data

In search of lost knowledge: joining the dots with Linked Data

Jon [email protected]

Department of MeteorologyReading e-Science Centre

National Centre for Earth ObservationUniversity of Reading

With thanks to all CHARMe project partners!

Page 2: In search of lost knowledge: joining the dots with Linked Data

platform

instrument

algorithm

dataset

publicationscientists

Page 3: In search of lost knowledge: joining the dots with Linked Data
Page 4: In search of lost knowledge: joining the dots with Linked Data
Page 5: In search of lost knowledge: joining the dots with Linked Data

How do you fill in the blanks?

• Hope the documentation contains pointers• … to the right versions of things• … and using consistent terminology

• In restricted communities, we can gather all information into a central location and harmonize it

• OR we hope that a Google search throws up the right results

Page 6: In search of lost knowledge: joining the dots with Linked Data

Problems

1. Crossing communities is very hard

Page 7: In search of lost knowledge: joining the dots with Linked Data

www.fooddeserts.org

Page 8: In search of lost knowledge: joining the dots with Linked Data

Problems

1. Crossing communities is very hard2. Lots of useful stuff isn’t formally documented

Page 9: In search of lost knowledge: joining the dots with Linked Data

1000 hypotheses

100 are true 900 are false

95 truepositives

5 falsenegatives

45 falsepositives

855 truenegatives

Test accuracy 95%

Lots of useful negative results, which aren’t publishedNearly 1/3 of positive results are false

Page 10: In search of lost knowledge: joining the dots with Linked Data
Page 11: In search of lost knowledge: joining the dots with Linked Data

Problems

1. Crossing communities is very hard2. Lots of useful stuff isn’t formally documented3. Unknown unknowns…

Page 12: In search of lost knowledge: joining the dots with Linked Data

“There is a spurious decline in low-cloud fraction in the ISCCP cloud database due to the viewing angle.

It’s in the literature, but you might not find it if you’re not specifically looking for it. For example if you’re using the data for model validation you may not spot the problem.”

(Claire Barber, paraphrased)

Page 13: In search of lost knowledge: joining the dots with Linked Data

Consequences

• We use the data that are most easily available, not necessarily the best

• Constant re-discovery of the same issues

• Very hard to share information outside communities

Page 14: In search of lost knowledge: joining the dots with Linked Data

The “Web of Data”

Page 15: In search of lost knowledge: joining the dots with Linked Data

platform

instrument

algorithm

dataset

publicationscientists

Page 16: In search of lost knowledge: joining the dots with Linked Data

The Web is the“Web of Documents”

Page 17: In search of lost knowledge: joining the dots with Linked Data
Page 18: In search of lost knowledge: joining the dots with Linked Data

What is being cited?Why?

Page 19: In search of lost knowledge: joining the dots with Linked Data

Fingers crossed!

• We hope that everything has a document written about it, or we have nothing to cite

• … and we hope someone writes a new document when “it” changes

• We hope that we are citing the right document (i.e. the most authoritative etc)

Page 20: In search of lost knowledge: joining the dots with Linked Data

Photo by Susan Lesch, from Wikipedia

Sir Tim Berners-Lee“If … the Web made all the online documents

look like one huge book,

[the Semantic Web] will make all the data in the world look like one

huge database.”

Page 21: In search of lost knowledge: joining the dots with Linked Data

Why is the Semantic Web different?

• Focuses on things, not documents-about-things

• Links things together in a meaningful way

• Information is readable by computers

• Here’s how it works…

Page 22: In search of lost knowledge: joining the dots with Linked Data

1. Give things unique identifiers

Must be globally unique and persistent

http://www.reading.ac.uk/people/jon.blowermight represent me

http://nasa.gov/instruments/MODISmight represent the MODIS instrument

–(these are illustrative, not accurate!)

Page 23: In search of lost knowledge: joining the dots with Linked Data

2. Express relationships between things

The key concept is the triple

subject predicate object

E.g. “MODIS is carried by the Terra satellite”

all three things need to be unique identifiers (we don’t use plain English!)

Page 24: In search of lost knowledge: joining the dots with Linked Data

Examples of realistic triples(identifiers are illustrative, not necessarily correct!)

http://www.reading.ac.uk/people/jon.blowerhttp://xmlns.com/foaf/0.1/knowshttp://www.durham.ac.uk/staff/ed.llewellin

http://dx.doi.org/12345.678910 http://purl.org/spar/cito/citesAsDataSourcehttp://www.badc.ac.uk/datasets/sst

Page 25: In search of lost knowledge: joining the dots with Linked Data

Aside: how to use the Semantic Web to ruin jokes

“A hundred kilopascals go into a bar”

“100000”^^unit:Pascalowl:sameAs“1”^^unit:Bar

(Using OWL and QUDT ontologies)

Page 26: In search of lost knowledge: joining the dots with Linked Data

3. Publish your triples

• Can be as simple as putting a document in the right format on your website– (triples can be expressed in many formats)

• … or as complex as publishing a multi-billion-triple database– E.g. Ordnance Survey

• Either way, your data become part of the Web, not just on the web

Page 27: In search of lost knowledge: joining the dots with Linked Data

4. Profit!• Everyone can publish their own “view of the world”

– Don’t need a single data structure “to rule them all”

• Use common identifiers and common vocabularies to link between communities

• Different vocabularies can be mapped to each other– E.g. map CF standard names to GRIB codes

• Gives a means to record provenance of information– “Direct line of sight from decisions to data” - NASA

• Reasoning engines can traverse the graph of links and discover new information

Page 28: In search of lost knowledge: joining the dots with Linked Data

Example: SemantEco

Wildlife observationsEcosystem impactsAdministrative areasPollution data and policies

“Find me all the sites where chloride pollution exceeds the local policy limits”“Show me the species over time in this region”

Page 29: In search of lost knowledge: joining the dots with Linked Data

So what’s “Linked Data” then?

• The Web is the “web of documents”• The Semantic Web is the “web of data”• “Linked Data” is really a set of principles for

using the Semantic Web– http://www.w3.org/DesignIssues/LinkedData.html

• (Many people use “Linked Data” and “Semantic Web” somewhat interchangeably)

Page 30: In search of lost knowledge: joining the dots with Linked Data

carries

produced

describes / uses / queries …

produced

authored

The “Web of Data”

Page 31: In search of lost knowledge: joining the dots with Linked Data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 32: In search of lost knowledge: joining the dots with Linked Data
Page 33: In search of lost knowledge: joining the dots with Linked Data

CHARMe:Sharing climate knowledge through

Linked Data(Jan 2013 – Dec 2014)

Page 34: In search of lost knowledge: joining the dots with Linked Data

Satellites

Climate Record

Creation and

Curation

ObservationsAnalysis

and applications

Climate DataRecords

Decision making

Reports Decisions,Policies

ClimateModelling

Model results(obs. sometimes

used for model initialization)

From observations to decisions

(Adapted from Dowell et al., 2013), “Strategy Towards an Architecture for Climate Monitoring from Space”)

Page 35: In search of lost knowledge: joining the dots with Linked Data

Where can users go for help?• Scientific literature

– Huge, verbose and inaccessible to some communities– Not well linked to source data

• Technical reports and conference proceedings– Hard to find, scattered or inaccessible

• Data centres– increasingly strong at providing some important metadata, but don’t

usually include community feedback– Not all countries and communities have data centres!

• Websites and blogs– From CEOS Handbook to a scientist’s blog– Increasingly useful, but scattered

Page 36: In search of lost knowledge: joining the dots with Linked Data

How can climate data users decide whether a dataset is fit for their purpose?

(N.B. We consider that “data quality” and “fitness for purpose” are the same thing)

Not specific to climate data!

Page 37: In search of lost knowledge: joining the dots with Linked Data

“Commentary metadata”

37

Page 38: In search of lost knowledge: joining the dots with Linked Data

Examples of commentary metadata• Post-fact annotations, e.g. citations, ad-hoc comments and notes;• Results of assessments, e.g. validation campaigns,

intercomparisons with models or other observations, reanalysis;• Provenance, e.g. dependencies on other datasets, processing

algorithms and chain, data source;• Properties of data distribution, e.g. data policy and licensing,

timeliness (is the data delivered in real time?), reliability;• External events that may affect the data, e.g. volcanic eruptions, El-

Nino index, satellite or instrument failure, operational changes to the orbit calculations.

General rule: information originates from users or external entities, not original data providers– However, sometimes information is not available from the data

provider!

Page 39: In search of lost knowledge: joining the dots with Linked Data

How will this be done?

• CHARMe will create connected repositories of commentary information– Stored as triples in

“CHARMe nodes”,• Information can be read

and entered through websites or Web Services

• Using principles of Open Linked Data and the Semantic Web

CHARMe node CHARMe

node

CHARMe node

3rd party system

3rd party system

Data provider website

Page 40: In search of lost knowledge: joining the dots with Linked Data

Open Annotation• We are using Open Annotation for

recording commentary– World Wide Web Consortium standard

• Associates a body with a target

• E.g. a publication could be the body, an EO dataset could be the target

• Can record the motivation behind an annotation– Bookmarking, classifying, commenting,

describing, editing, highlighting, questioning, replying… (lots more)

– Covers a lot of CHARMe use cases!

• An annotation can have multiple targets– A key requirement from users

Page 41: In search of lost knowledge: joining the dots with Linked Data

Satellites

Climate Record

Creation and

Curation

ObservationsAnalysis

and applications

Analysis and

applications

Climate DataRecords

Decision making

Reports Decisions,Policies

ClimateModelling

(e.g. CMIP5)

Model results(obs. sometimes

used for model initialization)

Where does CHARMe fit in?

Supports analysts and scientists in production of information for

decision-makers

Page 42: In search of lost knowledge: joining the dots with Linked Data

www.fooddeserts.org

Page 43: In search of lost knowledge: joining the dots with Linked Data

Challenges• Ensuring community adoption:– We are “injecting” CHARMe capabilities into existing

websites used in the community– We will “seed” the CHARMe system with information

to attract users (e.g. links between publications and datasets)

• Ensuring quality of commentary metadata– Moderation is a strong user requirement– We will provide guides to creators of commentary –

what makes a comment helpful?

Page 44: In search of lost knowledge: joining the dots with Linked Data

What CHARMe will enable(some examples)

Users: - “Find me all the documents that have been written about this dataset” - “… in both peer-reviewed journals and the grey literature” - “… and specifically about precipitation in Africa” - “… in both NEODC’s and Astrium’s archives”

- “What factors might affect the quality of this dataset?”e.g. upstream datasets, external events

- “What have other users already discovered about this dataset?”

- “I want to find information related to the dataset I’m looking at”

Data providers: - “Who is using my dataset and what are they saying about it?” - “Let me subscribe to new user comments and reply to them”

Page 45: In search of lost knowledge: joining the dots with Linked Data

What CHARMe will not enable• “Give me the best dataset on sea surface temperature”

– The “best” dataset depends on the application

• CHARMe will not provide a new “quality stamp” for datasets– But will be able to link to such things if other people publish

them, e.g. Bates maturity index, QA4EO certification

• CHARMe will not provide access to actual data– (but will enable discovery of data)

• Not planning to create (another) “one-stop shop” for information– We want the information to appear where users are already

looking

Page 46: In search of lost knowledge: joining the dots with Linked Data

Beware of 5-star ratings…

(words of wisdom from xkcd.com)

TERRIBLE

Page 47: In search of lost knowledge: joining the dots with Linked Data

How will you use CHARMe?

• Search for datasets on existing data provider website• Use the “CHARMe button” to view or record

commentary on a particular dataset

Page 48: In search of lost knowledge: joining the dots with Linked Data

“Significant events” viewer

• Google Finance (above) matches stock prices with news events• CHARMe tool will match climate reanalysis data with “significant events”

– Algorithm changes, instrument failures, new data sources• Will allow user annotations on the data and events• Will be available on the Web

Page 49: In search of lost knowledge: joining the dots with Linked Data

Fine-grained commentary

• Will allow creation and discovery about specific parts of datasets

• E.g. variables, geographic locations, time ranges• cf http://maphub.github.io

Page 50: In search of lost knowledge: joining the dots with Linked Data

• Using Linked Data to combine EO and socioeconomic data in 8 application areas• 16-partner EU FP7 project, just started!

– Coordinated from Reading

Jon BlowerDebbie CliffordTristan QuaifeSam Williams

Page 51: In search of lost knowledge: joining the dots with Linked Data

Summary• Linked Data can join up information

from all over the Web

• Makes disparate data sources discoverable and processable

• CHARMe is using Linked Data techniques to help users of climate data to connect with all the experience in the community– Techniques are not specific to

climate data

• MELODIES project will combine environmental data and socioeconomic data to develop real-world services using Linked Data

Page 52: In search of lost knowledge: joining the dots with Linked Data

Thank you!

[email protected]@Jon_Blower

http://www.charme.org.uk@MelodiesProject