Semantic Search: We're Living in a Golden Age for Information

Semantic Search: We’re Living in a Golden Age for Information. Bernadette Hyland, CEO & co-founder; co-chair, W3C Government Linked Data Working Group. NKOS-CENDI Semantic Search Workshop, Washington DC, Thursday, 6 December 2012.

description

This talk outlines semantic search and shows how we're living in a Golden Age for Information. The focus is on how government agencies can most effectively leverage the architecture of the Web to improve publication & consumption of high value open government data sets.

Transcript of Semantic Search: We're Living in a Golden Age for Information

Page 1: Semantic Search: We're Living in a Golden Age for Information

Semantic Search: We’re Living in a Golden Age for Information

Bernadette Hyland, CEO & co-founder; co-chair, W3C Government Linked Data Working Group

NKOS-CENDI Semantic Search Workshop - Washington DC, Thursday, 6 December 2012

Page 2: Semantic Search: We're Living in a Golden Age for Information

Photo credit: http://www.flickr.com/photos/sjungling/5974860/

We’re living in a digital golden age.

• In a London Times op-ed written two years ago, Tim Berners-Lee and Nigel Shadbolt said the current information age will boost the economy and make life easier.

• In today’s Web-centric world, retail, banking and all forms of research depend on the Web.

• Some agencies now offer open data via APIs and as Linked Data, in addition to PDFs and CSV files via data.gov. Published datasets include much more than weather, GPS & energy data.

• However ... the application developer community, software companies and publishers are anxious to build products and services, but they need to FIND, ACCESS and be able to RE-USE the content with CONFIDENCE.

Page 3: Semantic Search: We're Living in a Golden Age for Information

Headlines and agency memos about government transparency with open data and various government Web sites.... innovation challenges based on open government data

... High-energy datapaloozas are emerging, with awards ranging from a couple thousand dollars to $100k+. These challenges open the doors to innovation for better healthcare solutions and more efficient use of energy, to name but a few. They all require access to and re-use of HIGH QUALITY DATA.

In 2012, we read many headlines about big data and the world’s search engines and social media sites.

Page 4: Semantic Search: We're Living in a Golden Age for Information

Photo credit: http://www.flickr.com/photos/glennharper/4452247708/

However, while there is lots of gold to be mined from public data, it is an uncomfortable time for Government IT and business managers who are tasked with data management programs.

Most people are having a difficult time keeping up. If you feel like you are hanging on while the world changes too fast, you are not alone.

Page 5: Semantic Search: We're Living in a Golden Age for Information

• Big data

• Simple data

• Complex data

• Legacy data

KEY POINT: Search, discovery and data access approaches have evolved over the last decade and techniques are beginning to come together. GoPubMed was launched in 2002 as the first semantic search portal. Later, Microsoft’s Bing and Google (with its Knowledge Graph) became two of the other well-known search engines employing semantic techniques.

Semantic search systems generally consider the context of the search, location, intent, variation of words, synonyms and concepts. Semantic search has roots in linguistic research and NLP.

Big data research has grown to include the MapReduce programming model for handling really large data sets, often measured in terabytes or greater. This is the kind of data that people at the Large Hadron Collider at CERN are working on to provide insights into how the universe works, including the recent discovery of the Higgs boson, the particle that gives mass to matter.

Under the big top tent of semantic search we’re dealing with different types of content: big, public, complex and legacy data. Simple, complex and legacy data come in small, medium and large sizes.

Many government agencies, by contrast, have lots of small to medium data sets in structured databases, like Oracle. These databases (and the systems that depend upon them) are not going away; however, fewer new data warehouse projects are likely to be started. Data warehouses are widely recognized to be costly to create and maintain, and they change SLOWLY.

The biggest win for governments worldwide who adopt a Web architecture for data publishing is combining data sets to discover new or previously uncontemplated relationships.

Page 6: Semantic Search: We're Living in a Golden Age for Information

“Big Data Is Important, but Open Data Is More Valuable”

As change agents, enterprise architects can help their organizations become richer through strategies such as open data.

David Newman, VP Research, Gartner

Open data refers to the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other forms of control.

The term “open data” has gained popularity with open data initiatives including data.gov.uk, data.gov and other government data catalog sites.

Enterprise architects are playing an important role in fostering information-sharing practices. Access to, and use of, open data will be particularly critical for businesses that operate on the Web; organizations should focus on using open data to enhance business practices that generate growth and innovation.

Page 7: Semantic Search: We're Living in a Golden Age for Information

A sound government information management strategy requires providing CONTEXT and CONFIDENCE to those accessing and potentially re-using your data.

When people have timely access to information for disaster preparedness, scientific research and policy, the network effect of people helping people is our greatest hope.

On the heels of the recent East Coast hurricane that devastated parts of New York and New Jersey, government executives suggested that fear of cyber-doom scenarios may be taking up too much of our thinking & planning. According to Secretary Panetta, it may be driving us to unrealistic and potentially dangerous responses to threats that don’t exist.

The reality is that when disaster strikes, people come together and help one another. We don’t see paralysis, panic and social collapse.

During today’s session, I’ll describe how several agencies and private sector organizations are using Web technologies and semantics to improve information access and discovery. Simply put, semantic technologies provide CONTEXT.

Page 8: Semantic Search: We're Living in a Golden Age for Information

Open Government Data

Page 9: Semantic Search: We're Living in a Golden Age for Information

“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

Growing chorus ...

The Digital Government Strategy sets out to accomplish three things: enable access to high quality digital information & services; procure and manage devices, applications, and data in smart, secure and affordable ways; and unlock the power of government data to spur innovation.

Governments around the world are defining detailed digital services plans based on open data, open APIs and open source data platforms. They are defining how governments are publishing data with an eye towards improving access and re-use. Administrators and program managers are committing to delivery of digital services using semantic technologies broadly, and Linked Data specifically.

Page 10: Semantic Search: We're Living in a Golden Age for Information

If your agency is like many others, the reality is more like this ... many systems coming together, often with issues of data quality, coming from different systems, supported by different contractors and offices.

You’re being asked to deliver new services using new technologies with shrinking budgets and fewer people.

Government agencies are spending hundreds of millions of dollars annually on pipelines that slurp in & transform data from different systems into yet another data warehouse. The typical time allocated for ETL of a new database is 3 months.

Page 11: Semantic Search: We're Living in a Golden Age for Information

Image credit: http://www.flickr.com/photos/fdecomite/3398523464/

What we wish we had on our hands is neatly ordered information that we could integrate with our datasets, share with other agencies, the scientific community and if appropriate, the general public.

Page 12: Semantic Search: We're Living in a Golden Age for Information

The Web of Data

• A global network of linked statements

• A place where anyone can say anything about anything

• A vast collection of machine-readable knowledge (and opinion)

• Statements are linked, and links are qualified

Page 13: Semantic Search: We're Living in a Golden Age for Information

• Linked Data is about publishing and consuming data using international data standards

• Based on a 20-year-old idea

• A system of linked information systems

Page 14: Semantic Search: We're Living in a Golden Age for Information

Who is sharing their data as Linked Data? Small and large commercial and government organizations, NGOs, Non-profits ... plus many universities. Governments in the last few years have been responding to Open Government initiatives that mandate publishing open government data. Some are careful, slow-moving entities who simply needed to find real solutions to real problems.

Page 15: Semantic Search: We're Living in a Golden Age for Information

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)

Page 16: Semantic Search: We're Living in a Golden Age for Information

Open data + open standards + open platforms

• Highly scalable computing & hosting via the Cloud

• International Data Exchange Standards

• 5 Star Data (Linked Data)

• Open Source tools

A Web-oriented approach to information sharing has changed how scientists, researchers, regulators and the public interact with government.

Linked data lowers the barriers to re-use and interoperability among multiple, distributed and heterogeneous data sources.

Access to high-quality Linked Open Data via the Web means millions of researchers and developers will be able to shorten the time-consuming research process involving data cleansing and modeling.
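To make the interoperability point concrete, here is a minimal Python sketch of why shared identifiers lower the cost of combining sources. It assumes two independently published datasets that happen to describe the same place with the same URI; the `example.org` URIs, property names and values are all made up for illustration and do not come from any real agency feed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Statement = Tuple[str, str, str]  # (subject URI, property URI, value)

# Hypothetical URI that both publishers use for the same place.
PLACE = "http://example.org/id/place/norfolk-va"

# Two datasets published separately (invented values).
air_quality: Set[Statement] = {
    (PLACE, "http://example.org/def/airQualityIndex", "42"),
}
weather: Set[Statement] = {
    (PLACE, "http://example.org/def/uvIndex", "7"),
}

def combine(*datasets: Iterable[Statement]) -> Dict[str, List[Tuple[str, str]]]:
    """Merge statements from any number of datasets, grouped by subject URI."""
    by_subject: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for dataset in datasets:
        for subject, prop, value in dataset:
            by_subject[subject].add((prop, value))
    # Sort each group so the output order is stable.
    return {subject: sorted(pairs) for subject, pairs in by_subject.items()}

combined = combine(air_quality, weather)
for prop, value in combined[PLACE]:
    print(prop, value)
```

No mapping table or ETL step is needed here: because both sources name the place with the same URI, a plain union of the statements is already an integrated view.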

Page 17: Semantic Search: We're Living in a Golden Age for Information

How do we get a loose coupling of shared data over Web architectures? By using the structured data model for the Web: RDF.

There is a project to create freely available data on the Web in this way, which is known as the Linked Open Data project.

W3C sees Linked Data as the set of best practices and technologies to support worldwide data access, integration and creative re-use of authoritative data.

Page 18: Semantic Search: We're Living in a Golden Age for Information

Linked Data - refers to a set of best practices for publishing and interlinking structured data for access by both humans and machines via the use of the RDF family of syntaxes (e.g., RDF/XML, N3, Turtle and N-Triples) and HTTP URIs.

Linked Data can be published by a person or organization behind the firewall or on the public Web. If Linked Data is published on the public Web, it is generally called Linked Open Data.

- W3C Linked Data Glossary: https://dvcs.w3.org/hg/gld/raw-file/default/glossary
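As a rough illustration of the definition above, the following Python sketch writes two statements in N-Triples, one of the RDF syntaxes named in the glossary entry. The `Triple` class, the `to_ntriples` helper and every `example.org` / `example.com` URI are invented for illustration; only the `rdfs:label` and `owl:sameAs` property URIs are real W3C identifiers.

```python
from typing import NamedTuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

class Triple(NamedTuple):
    subject: str      # HTTP URI naming the thing being described
    predicate: str    # HTTP URI naming the property
    obj: str          # another URI, or a plain text value
    obj_is_uri: bool  # True when obj is a URI rather than a literal

def to_ntriples(t: Triple) -> str:
    """Serialize one statement as a single N-Triples line."""
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        # Escape the characters that N-Triples string literals require.
        escaped = (t.obj.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."

# Hypothetical agency record: one literal property, one link to another site.
statements = [
    Triple("http://example.org/id/agency/1", RDFS_LABEL, "Example Agency", False),
    Triple("http://example.org/id/agency/1", OWL_SAME_AS,
           "http://example.com/org/example-agency", True),
]

for statement in statements:
    print(to_ntriples(statement))
```

The second statement is the "link" in Linked Data: its object is an HTTP URI maintained by someone else, which a person or a program can follow to find more information.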

Linked Data Improves Discoverability & Access

Lots of people use the term “open data” to mean a range of things. The Administration, through data.gov and other sites, is committed to increasing public access to high value, machine readable datasets generated by the Executive Branch, agencies and offices.

“The primary goal is to improve access to Federal data and expand creative use of those data beyond the walls of government by encouraging innovative ideas (e.g., Web applications),” - About Data.gov.

The term “Linked Open Data” refers to the best practices and standards for achieving that goal, specifically, to publish data on the Web for access and re-use.

Page 19: Semantic Search: We're Living in a Golden Age for Information

Integrating ...

• Big data

• Simple data

• Complex data

• Legacy data

We need to find ways to fit things together that weren’t originally intended to fit together.

NB: This is the Musée du Louvre, which has evolved from a late 12th Century fortress under Philip II, extended over centuries to incorporate the landmark Inverted Pyramid architected by I.M. Pei that was completed in 1993.

Its new galleries for Islamic art, the result of a recent competition, opened this year, 2012. It continues to accommodate new works of art & galleries in new & previously unanticipated ways.

Today, we need to bring big data + complex data + public data and legacy data, together with their context, into one consistent whole.

Page 20: Semantic Search: We're Living in a Golden Age for Information

September 2011: 295 datasets meet the LOD Cloud criteria. Together they consist of over 31 billion RDF triples and are interlinked by around 504 million links.
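For a sense of scale, a quick back-of-the-envelope calculation on those figures can be sketched in Python. The inputs are the rounded numbers quoted above ("over 31 billion", "around 504 million"), so the results are rough estimates only; the variable names are arbitrary.

```python
# Rounded figures from the September 2011 LOD Cloud summary quoted above.
DATASETS = 295
TRIPLES = 31_000_000_000  # "over 31 billion", so treat as a lower bound
LINKS = 504_000_000       # "around 504 million"

avg_triples = TRIPLES / DATASETS  # mean size of a dataset, in triples
link_share = LINKS / TRIPLES      # links between datasets relative to all triples

print(f"average triples per dataset: about {avg_triples / 1e6:.0f} million")
print(f"links as a share of triples: about {link_share:.1%}")
```

On these rounded inputs the average dataset holds roughly 105 million triples, and cross-dataset links amount to about 1.6% of the triple count. An average hides a very uneven distribution, so it should not be read as the size of a typical dataset.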

Page 21: Semantic Search: We're Living in a Golden Age for Information

THERE IS A PROCESS

Identify → Model → Name → Describe → Convert → Publish

Take comfort in the fact that there is a familiar process. It is similar to the process & roles of traditional data modeling.

Creating Linked Data requires that we identify the data, model exemplar records -- what you are going to carry forward & what you are going to leave behind.

Name all of the NOUNs. Turn the records into URIs.

Next, describe RESOURCES with vocabularies.

Write a script or process to convert from canonical form to RDF. Then publish. Maintain over time.
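The "write a script" step can be quite small. Below is a minimal Python sketch, assuming the canonical form is a CSV file with `id`, `name` and `state` columns. The column names, the `example.org` namespaces and the sample record are hypothetical, and a real conversion would normally reuse established vocabularies rather than an invented one.

```python
import csv
import io
from typing import List
from urllib.parse import quote

# Hypothetical namespaces chosen during the "Name" and "Describe" steps.
BASE = "http://example.org/id/facility/"   # one URI per record
VOCAB = "http://example.org/def/"          # invented vocabulary for properties
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def convert(csv_text: str) -> List[str]:
    """Turn each CSV record into N-Triples lines about one named resource."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Name: turn the record's key into an HTTP URI.
        subject = f"<{BASE}{quote(row['id'], safe='')}>"
        # Describe: attach properties using vocabulary terms.
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f"{subject} <{VOCAB}state> {literal(row['state'])} .")
    return lines

# Made-up sample record standing in for the canonical form.
SAMPLE = "id,name,state\n42,Riverside Plant,VA\n"

for line in convert(SAMPLE):
    print(line)
```

Publishing then means serving the resulting file, or loading it into an RDF database, at the URIs that were minted, and re-running the conversion as the source data changes.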

Page 22: Semantic Search: We're Living in a Golden Age for Information

THERE IS A PROCESS

Identify → Model → Name → Describe → Convert → Publish → Maintain

Page 23: Semantic Search: We're Living in a Golden Age for Information

Page 24: Semantic Search: We're Living in a Golden Age for Information

“Cooperation without coordination”, a meme started by David Wood when describing the profound power of Linked Data.

Data reuse breaks the back of API gridlock.

... explain why screen-scraping approaches are brittle and unreliable.

Page 25: Semantic Search: We're Living in a Golden Age for Information

Under the covers, here is what linked data for consumption by a machine looks like ...

Page 26: Semantic Search: We're Living in a Golden Age for Information

We’ve Seen This Before

Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.

Page 27: Semantic Search: We're Living in a Golden Age for Information

LINKED DATA PLATFORM

• Callimachus is an open source, open standards platform for Web developers to create data-driven applications

• http://callimachusproject.org

Wiki systems don't handle structured content well, nor do they propagate change well. A tool for Web 2.0 developers creating DATA RICH web sites was needed … We created Callimachus, a triples up & down solution (no MySQL under the covers). HIGHLY SCALABLE for real world use.

Callimachus is named for the father of bibliography, author of the Pinakes at the Great Library of Alexandria, who lived from 305 to c. 240 BCE. He could not categorize his own work using Aristotle's hierarchical system. He was the first person who defined the use case for a graph database.

Page 28: Semantic Search: We're Living in a Golden Age for Information

[Architecture diagram: Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant), backed by an RDF Database. Interfaces: SPARQL endpoint, REST API, Resource URIs. Clients: Web Browser; Application, Script or automated client. Actors: Public, Registered developer, Contractor (3 Round Stones, Inc.).]

Introduce Callimachus, an open source, open data platform based on open standards. 3 Round Stones provides commercial support for Callimachus and is a major contributor to the open source project.

Users of Callimachus see a generated Web interface, but can also directly access the data via REST or SPARQL.

SPARQL Named Queries (like stored procedures) allow for automated conversion to different formats for reuse in non-RDF environments.
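For the "automated client" path, a query can be sent over plain HTTP. The sketch below only builds the request and does not send it. It follows the general W3C SPARQL Protocol convention (a `query` parameter on a GET request, with an `Accept` header naming the result format) rather than anything specific to Callimachus. The endpoint URL is a placeholder, and `build_sparql_request` is a name made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; substitute the SPARQL endpoint of a real service.
ENDPOINT = "http://example.org/sparql"

# Ask for up to ten resources together with their human-readable labels.
QUERY = """
SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    # Ask for the standard JSON serialization of SPARQL query results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Changing the `Accept` header is the usual way to ask an endpoint for another result format, such as CSV, when the endpoint supports it.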

Page 29: Semantic Search: We're Living in a Golden Age for Information

• US HHS committed to making a vast array of open data more readily available to improve health care delivery & reduce costs in 2013 and beyond.

• In 2012, Sentara created a Web application that integrates authoritative data from 5 different sources including content from NLM, NOAA, EPA and DBpedia

• This application utilizes open data, open standards and an open source data platform

Page 30: Semantic Search: We're Living in a Golden Age for Information

[Diagram: a User connected to five data sources: NOAA, US EPA AirNow, US EPA SunWise, DBpedia and the National Library of Medicine.]

Page 31: Semantic Search: We're Living in a Golden Age for Information

Data driven Web apps using Callimachus

• Clinical Trials + enterprise linked data

• US Legislation + enterprise data

• DBpedia + enterprise datasets

Callimachus integrates (very) well with other enterprise systems as well as Web content. It can form an entire application or part of one.

NB: Mention Documentum, Oracle via HTTP

Page 32: Semantic Search: We're Living in a Golden Age for Information

Key points

• No ‘one size fits all’

• Simple, complex and legacy data require different approaches

• Search, discovery and access are coming together

• Open Standards

• Linked Data improves search, discovery & access

• HTML5 for great user experience, mobile

• Biggest win for government & the public is ease of combining data sets

Key points from this talk ...

When it comes to search, we’re really talking about much more. Search + discovery + access for re-use.

Major Web search engines like Google and Bing incorporate some elements of semantic search. Siri used all of these techniques in combination.

Open standards for the Web are critically important.

Sensible information management strategies for government are key. Recognize that you’re entering into a social contract when you publish data; similarly, plan accordingly for consumption of government data.

Page 33: Semantic Search: We're Living in a Golden Age for Information

Page 34: Semantic Search: We're Living in a Golden Age for Information

The mission of the Government Linked Data (GLD) Working Group is to provide standards and other information which help governments around the world publish their data as effective and usable Linked Data using Semantic Web technologies.

We are 16 months into the Government Linked Data Working Group’s two-year charter.

Page 35: Semantic Search: We're Living in a Golden Age for Information

• Additional information available on the Web, in books ...

http://linkeddatabook.com/editions/1.0/

http://3roundstones.com/linking-enterprise-data/

http://3roundstones.com/linking-government-data/

http://www.manning.com/dwood/

http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

• Commercially supported Open Source Web data platform http://3roundstones.com/products

Bernadette Hyland

@BernHyland

[email protected] in Virginia

Page 36: Semantic Search: We're Living in a Golden Age for Information

This work is Copyright © 2011-2012 3 Round Stones Inc.

It is licensed under the Creative Commons Attribution 3.0 Unported License.

Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

This presentation is licensed under a Creative Commons BY-SA license, allowing you to share and remix its contents as long as you give us attribution and share alike.

Page 37: Semantic Search: We're Living in a Golden Age for Information

Credits

David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects”, published 15 December 2011

David Wood, ed. Linking Government Data, Springer (2011) http://3roundstones.com/linking-government-data/

US Executive Branch

Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license