Center for E-Business Technology, Seoul National University

Seoul, Korea

Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor

Metaweb Technologies, Inc.

San Francisco

International Conference on Management of Data (2008)

2008. 11. 12.

Summarized & presented by Babar Tareen, IDS Lab., Seoul National University

Copyright 2008 by CEBT

Motivation – Wikipedia

Free multilingual encyclopedia

Supports 264 languages

854 Volumes of English articles

Motivation – English Wikipedia Growth

Introduction

A public repository of the world's knowledge

Inspired by The Semantic Web and Wikipedia

Supports highly diverse and heterogeneous data

Tries to merge the scalability of structured databases with the diversity of collaborative wikis into a practical, scalable database of structured general human knowledge

The information contained in Freebase is open to anyone

However, the Freebase backend database is not open

Data Sources

User Contribution

Metaweb Bots

Incorporates facts from many large, publicly available information sources

Data Model

Freebase is a graph database

A set of nodes and a set of links that establish relationships between the nodes

Key Concepts

Domains

– Bases: collections of topics created by users

– Commons: similar to bases but more general

– Examples: Film, Religion, Computers

Types

– Analogous to classes

– Examples: Film Actor, Film Festival, Film Distribution, Film Rating, Film Format

Properties

– Specific information elements within a type

– Examples: Film Performances, Film Dubbing Performances, IMDb Entry

Topics

– Analogous to objects

– Instances of a type

– Topics can be linked to other domains or other topics
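
The concepts above can be sketched as a tiny in-memory graph. This is only an illustration under stated assumptions, not Freebase's actual storage format: the `Node` and `Graph` classes, the GUID strings, the link labels, and the topics "Example Actor" and "Example Film" are all hypothetical. Domains, types, and topics are modelled as plain nodes, and links carry the property name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the graph; domains, types, and topics are all nodes here."""
    guid: str
    name: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # guid -> Node
    links: list = field(default_factory=list)  # (source guid, property, target guid)

    def add_node(self, guid, name):
        self.nodes[guid] = Node(guid, name)
        return self.nodes[guid]

    def link(self, source, prop, target):
        """Record a labelled relationship between two existing nodes."""
        self.links.append((source, prop, target))

    def targets(self, source, prop):
        """Names of the nodes reached from `source` over links labelled `prop`."""
        return [self.nodes[t].name
                for s, p, t in self.links
                if s == source and p == prop]


g = Graph()
g.add_node("d1", "Film")            # a domain
g.add_node("t1", "Film Actor")      # a type inside that domain
g.add_node("x1", "Example Actor")   # hypothetical topic
g.add_node("x2", "Example Film")    # hypothetical topic
g.link("t1", "domain", "d1")
g.link("x1", "type", "t1")                 # a topic is an instance of a type
g.link("x1", "film_performances", "x2")    # a property links two topics

print(g.targets("x1", "type"))
print(g.targets("x1", "film_performances"))
```

Because a type assignment is just another link, a topic can carry several types by adding more `type` links.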

Data Model (2)

Key Components

A scalable Tuple Store

An HTTP/JSON-Based API

MQL for read / write operations

A Lightweight, Collaborative Typing System

Loose collection of structuring mechanisms and conventions

A Large, Diverse Data Set

100 million assertions

4000 types

A Philosophy of “Complete Normalization”

Only one GUID for a real world object
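
A minimal sketch of how a tuple store and the "one GUID per real-world object" rule fit together. Everything here is an assumption made for illustration: the `TupleStore` class, its method names, and especially the name-based matching, which is far cruder than any real reconciliation of duplicate topics would be.

```python
import uuid


class TupleStore:
    """Toy in-memory store of (subject GUID, property, value) assertions."""

    def __init__(self):
        self.guid_by_key = {}  # normalized name -> GUID
        self.tuples = set()

    def guid_for(self, name):
        """Return the single GUID for an object, minting it only once."""
        # Hypothetical normalization: case-fold and collapse whitespace.
        key = " ".join(name.lower().split())
        if key not in self.guid_by_key:
            self.guid_by_key[key] = uuid.uuid4().hex
        return self.guid_by_key[key]

    def assert_fact(self, subject, prop, value):
        """Store one assertion about the object identified by `subject`."""
        self.tuples.add((self.guid_for(subject), prop, value))

    def facts_about(self, subject):
        """All (property, value) pairs asserted about `subject`, sorted."""
        guid = self.guid_for(subject)
        return sorted((p, v) for s, p, v in self.tuples if s == guid)


store = TupleStore()
store.assert_fact("Spider-Man", "type", "/fictional_universe/fictional_character")
store.assert_fact("spider-man ", "character_created_by", "Stan Lee")
store.assert_fact("Spider-Man", "character_created_by", "Steve Ditko")

for prop, value in store.facts_about("Spider-Man"):
    print(prop, value)
```

All three assertions land on the same GUID even though the subject is spelled two ways, so every fact about the object is found in one place.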

Data Entry

Schema Creation

Data Evaluation

Metaweb Query Language

Who created the comic character Spider-Man?

Query:

[
  {
    "character_created_by": null,
    "name": "Spider-Man",
    "type": "/fictional_universe/fictional_character"
  }
]

Response (an error, because the query asks for a single creator but two match):

{
  "code": "/api/status/ok",
  "q1": {
    "code": "/api/status/error",
    "messages": [
      {
        "code": "/api/status/error/mql/result",
        "info": {
          "count": 2,
          "result": ["Steve Ditko", "Stan Lee"]
        },
        "message": "Unique query may have at most one result. Got 2",
        "path": "character_created_by",
        "query": [
          {
            "character_created_by": null,
            "error_inside": "character_created_by",
            "name": "Spider-Man",
            "type": "/fictional_universe/fictional_character"
          }
        ]
      }
    ]
  },
  "status": "200 OK",
  "transaction_id": "cache;cache01.p01.sjc1:8101;2008-11-11T05:54:45Z;0021"
}
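
The error in the response comes from MQL's query-by-example convention: a `null` value asks for exactly one answer, while an empty list `[]` asks for all of them. The sketch below imitates that behaviour over a one-topic toy data set. It is not the Freebase service or its API; `run_query`, `UniquenessError`, and `TOPICS` are hypothetical names, and only flat, one-level queries are handled.

```python
class UniquenessError(Exception):
    """Raised when a None (JSON null) slot matches more than one value."""


# Toy data: each topic maps a property to the list of its values.
TOPICS = [
    {
        "name": ["Spider-Man"],
        "type": ["/fictional_universe/fictional_character"],
        "character_created_by": ["Steve Ditko", "Stan Lee"],
    },
]


def run_query(query, topics=TOPICS):
    """Answer a flat query-by-example.

    A string value is a constraint, None asks for exactly one value,
    and [] asks for every value.
    """
    results = []
    for topic in topics:
        constraints = {k: v for k, v in query.items() if isinstance(v, str)}
        if not all(v in topic.get(k, []) for k, v in constraints.items()):
            continue  # this topic does not match the example
        answer = dict(query)
        for key, wanted in query.items():
            values = topic.get(key, [])
            if wanted is None:
                if len(values) > 1:
                    raise UniquenessError(
                        f"Unique query may have at most one result. Got {len(values)}"
                    )
                answer[key] = values[0] if values else None
            elif wanted == []:
                answer[key] = list(values)
        results.append(answer)
    return results


base = {"name": "Spider-Man", "type": "/fictional_universe/fictional_character"}

try:
    run_query({**base, "character_created_by": None})
except UniquenessError as err:
    print("error:", err)

print(run_query({**base, "character_created_by": []}))
```

Replacing `None` with `[]` in the `character_created_by` slot turns the failing query into one that returns both creators.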

MQL Queries

Characters created by Stan Lee

Foreign donations to 2008 US Political Candidates

Nikon Cameras in order of Resolution

Tropical Storms in the 90's

Mountains of the Himalayas

African American authors and their books

Web Browsers that run on the Mac

US cities named Canton

Applications

Parallax: Freebase Browser (http://mqlx.com/~david/parallax/index.html)

Powerset: Semantic Search Engine (http://www.powerset.com/)

ArchiPortal (http://dev.mqlx.com/~zak/arch/)

Dipity Timelines (http://www.dipity.com/)

Discussion

Simple architecture

Topics can be associated to multiple types

Analogous to having a database of knowledge

But now we have two knowledge bases to maintain:

Wikipedia

Freebase
