Post on 29-Mar-2015
Abiteboul - EDBT keynote, Athenes 1
Knowledge out there on the Web
Serge Abiteboul
2014
Abiteboul - EDBT keynote, Athenes 2
• Knowledge out there on the Web:
Video of the talk at the Royal Society
2014
Knowledge out there on the Web
Personal knowledge out there
Abiteboul - EDBT keynote, Athenes 3
Organization
1. The context2. The personal information management system
1. The concept of Pims2. Pims are coming3. Advantages
3. From information to knowledge4. The Webdamlog language
1. The language in brief2. Probabilities3. Access control
5. Conclusion: some research issues2014
Abiteboul - EDBT keynote, Athenes 4
1. The context
2014
Abiteboul - EDBT keynote, Athenes 5
Data explosion• data: pictures, music, movies, reports, email, tweets, contacts, schedules…• social interactions: opinions, annotations, recommendation…• metadata: on photos, documents, music…• ontologies: Alice’s ontology and mapping with other ontologies• web localizations: friends account on FB, twitter, lists of blogs… • security: credentials on various systems• data in various organizations
– jobs, schools, insurances, banks, taxes, medical, retirement…• data in various vendors
– amazon, retailers, netflix, applestore… • data that software or hardware sensors capture
– with or without our knowledge– web navigation, phone use, geolocation, "quantified self" measurements, contactless card
readings, surveillance camera pictures, •
…2014
Abiteboul - EDBT keynote, Athenes 6
Data dispersion
• Laptop, desktop, smartphone, tablet, car computer• Residential boxes (tvbox), NAS, electronic vaults…• Mail, address book, agenda, todo-lists• Facebook, LinkedIn, Picasa, YouTube, Tweeter• Svn, Google docs, Dropbox• Government services• Business services• Also machine and systems from
– family, friends, associations, work
• Systems even unknown to the user – third party cookies
2014
Abiteboul - EDBT keynote, Athenes 7
Data heterogeneity
Type: text, relational, HTML, XML, pdf…Terminology/structure/ontologySystems: MS, Linux, IOS, AndroidDistributionSecurity protocolsQuality: incomplete / inconsistent information
2014
Abiteboul - EDBT keynote, Athenes 8
Bad news
• Limited functionalities because of the silos– Difficult to do global search, synchronization, task
sequencing over distinct systems…
• Loss of control over the data– Difficult to control privacy– Leaks of private information
• Loss of freedom– Vendor lock-in
2014
Abiteboul - EDBT keynote, Athenes 9
Growing resentment
• Against companies– Intrusive marketing, cryptic personalization and business
decisions (e.g., on pricing), and automated customer service with no real channel for customers' voices
– Creepy "big data" inferences• Against governments – NSA and its European counterparts
• Dissymmetry between what these systems know about a person, and what the person actually knows
2014
Abiteboul - EDBT keynote, Athenes 10
Future alternatives (for normal people)
1. Continue with this increasing mess– Use a shrink to overcome frustration
2. Regroup all your data on the same platform– Google, Apple, Facebook, …, a new comer– Use a shrink to overcome resentment
3. Study 2 years to become a geek– Geeks know how to manage their information – Use a shrink to survive the experience
4. And, of course, there is the Pims’ way
2014
Abiteboul - EDBT keynote, Athenes 11
2. The personal information management system
2.1 Introduction
2014
Abiteboul - EDBT keynote, Athenes 12
The Pims
• Personal information management system• What is a successful Web service today– Some great software– Some machines on which it runs(and a business model)
• Separate the two facets– Some company provides the software– It runs on your machinewith another business model
2014
Abiteboul - EDBT keynote, Athenes 13
The Pims (1)• The Pims runs software
– The user chooses the code to deploy on the server.– The software is open source, a requirement for security.
• With the user's data– All the user’s personal information
• 0n the user’s server(s) – The user owns it or pays for a hosted server– The server may be a physical or a virtual machine– It may be physically located at the user’s home (e.g., a tvbox) or not– It may run on a single machine or be distributed among several
machines– The server is in the cloud, i.e., it can be reached from everywhere -
personal cloud
2014
Abiteboul - EDBT keynote, Athenes 14
The Pims: the 2 main issues
• Security– Enforced by the Pims: guaranteed by the contract the user
has with the Pims• Reasonably small piece of code; possible to verify it
– Enforced by the services running on it: open source so that we don’t need to trust the providers of these systems
– A higher level of security than now • The management
– Should be epsilon-work– Should require little competence– A company can be paid to do it (in the cloud)
2014
Abiteboul - EDBT keynote, Athenes 15
2. The personal information management system
2.2 This is arriving
2014
Abiteboul - EDBT keynote, Athenes 16
It is becoming possible
• System administration is easier– Abstraction technologies for servers– Virtualization and configuration management tools.
• Open source is very active– Open source technology more and more available
• Price of machines is going down– A hosted-low cost server is as cheap as 5€/month– Paying is no longer a barrier for a majority of people
Indeed I am sure you have friends already doing it2014
Abiteboul - EDBT keynote, Athenes 17
Many people are working on it
• Many systems & projects– Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits,
Connections, Seetrieve, Personal Dataspaces, or deskWeb.
– YounoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud• Some on particular aspects– Mailpile for mail– Lima for a Dropbox-like service, but at home.– Personal NAS (network-connected storage) e.g. Synologie– Personal data store SAMI of Samsung...
• Many more2014
Abiteboul - EDBT keynote, Athenes 18
Data disclosure movement
• Smart Disclosure in the US• MiData in the UK• MesInfos in France
Several large companies (network operators, banks, retailers, insurers…) have agreed to share with a panel of customers the personal data that they have about them
2014
Abiteboul - EDBT keynote, Athenes 19
Big companies are interested (1) Pre-digital companies
• E.g., hotels or banks • Disintermediated from their customers by pure
Internet players such as Google, Amazon, Booking.com, Mint.
• In Pims, they can rebuild direct interaction • The playing field is neutral – Unlike on the Internet where they have less data
• They can offer new services without compromising privacy
2014
Abiteboul - EDBT keynote, Athenes 20
Big companies are interested (2) Home appliances companies
• Many boxes deployed at home or in datacenters– Internet access provider "boxes”, NAS servers,
"smart" meters provided by energy vendors, home automation systems, "digital lockers”…
• Personal data spaces dedicated to specific usage
• Could evolve to become more generic• Control of private Internet of objects2014
Abiteboul - EDBT keynote, Athenes 21
2. The personal information management system
2.3 Advantages
2014
Abiteboul - EDBT keynote, Athenes 22
Advantages
• User control over their data– Who has access to what, under what rules, to do what
• User empowerment– They choose freely services & they can leave a service
• Participation to a more “neutral” Web– With the "network effects", the main platforms are
accumulating data/customers and distorting competition– The Pims bring back fairness on the Web– Good practices are encouraged, e.g., interoperability,
portability
2014
Abiteboul - EDBT keynote, Athenes 23
Advantages – New functionalities
• Single identity/login• Semantic global search with (personal) ontology• Synchronization/backups across services• Access control management across services• Task sequencing across services• Exchange of information between “friends” • Connected objects control, a hub for the IoT• Personal big data analysis
2014
Abiteboul - EDBT keynote, Athenes 24
3. From information to knowledge
2014
(aka let’s move a tad more technical)
Abiteboul - EDBT keynote, Athenes 25
Machines prefer knowledge
• Integration of data & information sources– It is easier to integrate knowledge than information
• Collaboration between services & devices– It is easier for services to collaborate using
knowledge than with information• Problem solving based on knowledge inference
2014
Abiteboul - EDBT keynote, Athenes 26
Humans as well
• The users of the system are human beings– They want support for managing information– But they are not geeks– They don’t want to program
• To facilitate the interactions between humans and machines,
We should use declarative languages !
2014
Abiteboul - EDBT keynote, Athenes 27
It all started with datalog
• Popular in the 90’s• Some followers in 00’s
– A., Afrati, Atzeni, Cali, Greco, Gotloeb, Milo, Sacca, Ullman…• Recent revival
– 2010 Oege de Moor’s workshop @oxford• Datalog 2.0
– 2010 Joe Hellerstein’s keynote @pods• Datalog Redux: Experience and Conjecture
– 2014 Frank Neven’s keynote @icdt• Remaining CALM in declarative networking
• Now featuring: Webdamlog2014
Abiteboul - EDBT keynote, Athenes 29
Requirement 1: Distribution
• Different machines• Different users
• We use the notion of principal here– family@alice(Bob)– agenda@Alice-iPhone(…)– friends@Alice-FaceBook(…)
• A principal comes with identity and privileges
2014
Abiteboul - EDBT keynote, Athenes 30
Requirement 2: Privacy
• Control of who sees what in a distributed environment
• Access control• Should be clear from the first part of the talk
this is a most important issueTutorial on privacy by Nicolas Anciaux, Benjamin Nguyen, Iulian Sandu Popa – Today at 2:00
2014
31
Requirement 3: Probabilities
• We have to deal with negation– Elvis was not French
• With negations, come contradictions– Elvis Presley died in 1977; The King is alive
• There are different points of view– Elvis’s music is the best; it stinks
• Measure uncertainty with probabilities
2014 Abiteboul - EDBT keynote, Athenes
The more I see, the less I know for sure. John Lennon
Abiteboul - EDBT keynote, Athenes 32
So, what is the goal
• A datalog-style language with distributionaccess controlprobabilities
We are lucky, there is such a language:Webdamlog
2014
Abiteboul - EDBT keynote, Athenes 33
4. The Webdamlog language
(aka let’s be serious)4.1 Webdamlog in brief
2014
Abiteboul - EDBT keynote, Athenes 34
Facts and rules
Facts are of the form R@p(a1,…,an)– p is a principal, i.e., Serge, Serge’s-iPhone, Facebook/Serge, s.ab@gmail.com
Rules are of the form
$R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un)– $R, $Ri are relation terms– $P, $Pi are peer terms – $U, $Ui are tuples of terms– Safety condition– Also negations: ignored here
2014
Abiteboul - EDBT keynote, Athenes 35
The semantics of rules
Classification based on locality and nature of head predicates (intentional or extensional)
• Local rule at my-laptop: all predicates in the body of the rules are from my-laptop
Local with local intentional head datalogLocal with local extensional head database updateLocal with non-local extensional head messaging between peersLocal with non-local intentional head view definitionNon-localgeneral delegation
2014
Abiteboul - EDBT keynote, Athenes 36
Local rules with local head
Intensional local head – datalog [at my-iphone]
fof@my-iphone($x, $y) :- friend@my-iphone($x,$y)fof@my-iphone($x,$y) :- friend@my-iphone($x,$z),
fof@my-iphone($z,$y)
Extensional local head – database updates[at my-iphone]
believe@my-iphone(“Alice”, $loc) :-tell@my-iphone($p,”Alice”, $loc), friend@my-iphone($p)
2014
Abiteboul - EDBT keynote, Athenes 37
Local rules & non-local extensional head
Messaging between peers$message@$peer($name, “Happy birthday!”) :-
today@my-iphone($date),birthday@my-iphone($name, $message, $peer, $date)
Example
– today@my-iphone(3/25)
– birthday@my-iphone("Manon”, “sendmail”, “gmail.com”, 3/25)
– sendmail@gmail.com("Manon”, “Happy birthday”)
2014
Abiteboul - EDBT keynote, Athenes 38
Local rules & non-local intentional head
View definitionboyMeetsGirl@gossip-site($girl, $boy) :-
girls@my-iphone($girl, $event),boys@my-iphone($boy, $event)
• Semantics of boyMeetGirl@gossip-site is a join of relations girls and boys from my-iphone
• Defines a view at some other peer
2014
Abiteboul - EDBT keynote, Athenes 39
Non-local rules
General delegation(at my-iphone): boyMeetsGirl@gossip-site($girl, $boy) :-
girls@my-iphone($girl, $event),
boys@alice-iphone($boy, $event)
Example: girls@my-iphone(“Alice”, “Julia's birthday”) – my-iphone installs the following rule at alice-iphoneboyMeetsGirl@gossip-site(“Alice”, $boy) :-
boys@alice-iphone($boy, “Julia's birthday”)
Useful to distribute work and exchange knowledge2014
Abiteboul - EDBT keynote, Athenes 40
The thesis
The Web should turn into a distributed knowledge base where peers share facts and rules, and collaborateThe language Webdamlog is a first step towards that goal Missing– Probabilities– Access control
2014
Abiteboul - EDBT keynote, Athenes 41
4. The Webdamlog language
4.2 Probabilities
2014
Abiteboul - EDBT keynote, Athenes 42
Advertisement
Deduction with Contradictions in DatalogS.A., Daniel Deutch and Victor Vianu– Tomorrow, 11:00
2014
Abiteboul - EDBT keynote, Athenes 43
4. The Webdamlog language
4.3 Access control
2014
Abiteboul - EDBT keynote, Athenes 44
Requirements
Data access Users would like to control who can read and modify their information
Data dissemination Users would like to control how their data are transferred from one participant to another
Application control Users would like to control which applications can run on their behalf, and what information these applications can access.
2014
Abiteboul - EDBT keynote, Athenes 45
The general picture
• Coarse grain for extensional relations– read access to the relation
• Fine grain for intensional relations– read access to tuple t requires read access to the
tuples that lead to deriving t• Delegation controlled in a sandbox• Focus on read privilege here
2014
Abiteboul - EDBT keynote, Athenes 46
Read: default
• Extensional relations– if you have read privilege to the relation
• Intensional relations– if you have read privilege to the relation &– if you can read all the tuples that have been used
to create this fact – provenance of the fact
2014
Abiteboul - EDBT keynote, Athenes 48
Fine grain access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f)[at Sue] album@Alice($p,$f) :- photo@Sue($p,$f)
– album@Alice is intensional– Both Bob and Sue contribute to it– Peter who has read privilege to album@Alice and
photo@Bob only does not see the photos of Sue
2014
Abiteboul - EDBT keynote, Athenes 49
Paranoiac access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f), friends@Bob($f)
– Issue: you can read Bob’s photos only if you have read privilege on friends@Bob that Bob wants to keep private
2014
Abiteboul - EDBT keynote, Athenes 50
Declassification
[at Bob] photo@Alice($p,$f) :- photo@Bob($p,$f), [ hide friends@Bob($f) ]
– Hide: blocks the provenance from friends@Bob– Bob declassify this data just for the evaluation of
this rule– You can declassify only tuples you own grant ↦
privilege2014
Abiteboul - EDBT keynote, Athenes 51
Issues with non local rules
[at Bob] message@Sue(“I hate you”) :- date@Alice(d)
aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x)Ignoring access rights, by delegation, this results in running[at Alice]
message@Sue(“I hate you”) :- date@Alice(d)aliceSecret@Bob(x) :- date@Alice(d),
secret@Alice(x)
2014
Abiteboul - EDBT keynote, Athenes 52
Default solution: sand box
We run the rule at Alice in a Sandbox• We use the access rights of Bob
So the second rule does not succeed in sending secrets
• The message specifies that this is done at Bob’s requestSo requires authentication/signatures
Alternative: delegation without sandbox2014
Abiteboul - EDBT keynote, Athenes 53
5. Conclusion: some research issues
2014
Abiteboul - EDBT keynote, Athenes 55
Explaining
• Users want to understand the informationthey see, the answers they are given
– In their professional/social life • Difficulties– Reasoning with large number of facts – Information is often probabilistic and not public– Requires knowing how the information was
obtained (its provenance)
2014
Abiteboul - EDBT keynote, Athenes 57
Serendipity
• You may hear by chance a song that is going to totally obsess you
• A librarian may suggest your reading an article that will transform your research
This is serendipity
• A perfect search engine • A perfect recommendation
system• A perfect computer assistantSuch systems are boring
They lack serendipity
2014
Design programs that would introduce serendipity in our lives
Abiteboul - EDBT keynote, Athenes 58
Hypermnesia
Exceptionally exact or vivid memory, especially as associated with certain mental illnesses
For a user: We cannot live knowing that any word, any move will leave a trace?
For the ecosystem: We cannot store all the data we produce – lack of storage resources
2014
Forgetting is Key to a Healthy MindScientific AmericanImage: Aaron Goodman
A main issue is to select the information we choose to keep
Abiteboul - EDBT keynote, Athenes 59
Babel of human-machine-interaction
• Each time a user interacts with a data source, does he have to use the ontology of that source ?
• No! • Instead of a user adapting to the ontologies of
the N systems he uses each day• We want the N systems to adapt to the user’s
ontology
2014
Abiteboul - EDBT keynote, Athenes 60
Religion…science…machines
• Knowledge used to be determined by religion• Knowledge used to be determined scientifically• Knowledge will now be determined by machines?
• Decisions are increasingly made by machines– Stock market (automatic trading)– Fully automated factory– Fully automated metros– Death penalty (killer drones)…
2014
Abiteboul - EDBT keynote, Athenes 61
to the digital world!
• We will soon be living in a world surrounded by machines that – acquire knowledge and decide for us
• What will we do with that technology? • Will we become smarter? • Will we become master or slave of the new
technology?
2014
Abiteboul - EDBT keynote, Athenes 622014
σας ευχαριστώ