Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun — Switzerland
Reclaim your Digital Life
Motivation (1/3)
Commoditization of digital equipment
■ Desktops, laptops, netbooks, mobile phones, tablets, e-book readers, set-top boxes, personal GPSs, digital cameras, TVs, etc.
Fragmentation of information across devices
Motivation (2/3)
The story of my life...
■ Where are the pictures of my niece’s birthday?
■ How should I consolidate/back up my emails?
Fortunately there’s the cloud, right?
Motivation (3/3)
2014 twist on Personal Information Management: lifelogging, health monitoring
■ Everylog, Memoto, Google Glass, Nike's FuelBand, FitBit, Samsung GearFit & competitors...
➡ Urgent need to index & integrate continuous personal feeds for automated processing
Problem Definition
Personal digital information is today fragmented and externalized
➡ “Each site is a silo, walled off from the others…” [TBL 10.2010]
■ Data partitioning
■ Loss of governance
How shall one automatically reclaim and meaningfully organize his/her digital information dispersed online and on various devices to generate useful digital memories?
MEM0R1ES
... a highly available, secure, scalable, and semantically rich platform to extract, preserve, integrate and expose personal information for a smarter world
The Team
Prof. Dr. Philippe Cudré-Mauroux
Prof. Dr. Karl Aberer
Prof. Dr. Maria Sokhn
Julien Tscherrig
Joël Dumoulin
Michele Catasta
Dr. Gianluca Demartini
Alberto Tonon
Last Year…
Device & Service Wrappers [EIA-FR]
■ Generic Wrapper Architecture: SMTP, Gmail, Google Drive, Facebook, DBPedia, Flickr, LinkedIn
■ Browser wrapper [EPFL]: lifelogging rich features (context, user activities and focus, etc.) from the browser
Storage Infrastructure
■ Multi-purpose, declarative & elastic storage layer [UNIFR]
Result from the Digital Reclaiming
➡ Heterogeneous graphs of entities
■ Information duplication
■ Sometimes with different facets
■ Missing information
Today’s Focus
Meaningful information integration from heterogeneous graphs of entities
1. Entity Search (AOR)
2. Entity Typing (TRank)
3. Entity Clustering (ZenCrowd, MemorySense, PREDIct)
4. Entity Elicitation (Transactive Search)
Use-case: leveraging digital mem0r1es from a conference participation (demonstrators)
1. Entity Search [UNIFR]
Main idea: combine unstructured and structured search to find relevant entities in the graph
■ Inverted index to locate first candidates
■ Graph queries to refine the results
  ■ Graph traversals (queries on object properties)
  ■ Graph neighborhoods (queries on data type properties)
1. Entity Search
➡ up to 25% MAP improvement over BM25!
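The two-stage idea on the Entity Search slides can be illustrated with a minimal sketch. It assumes a toy in-memory graph; the entity IDs, labels, edges, the token-overlap scoring and the `neighbor_weight` parameter are all hypothetical and are not taken from the project's actual implementation (which is compared against BM25 on the slide).

```python
from collections import defaultdict

# Toy entity graph (all identifiers and data are hypothetical).
LABELS = {
    "e1": "Philippe Cudre-Mauroux professor Fribourg",
    "e2": "eXascale Infolab research group Fribourg",
    "e3": "TRank entity type ranking paper",
}
# Object properties: edges between entities.
EDGES = {
    "e1": ["e2", "e3"],
    "e2": ["e1"],
    "e3": ["e1"],
}


def build_inverted_index(labels):
    """Map each lowercase token to the set of entities whose label contains it."""
    index = defaultdict(set)
    for entity, text in labels.items():
        for token in text.lower().split():
            index[token].add(entity)
    return index


def search(query, index, edges, neighbor_weight=0.5):
    """Stage 1: score first candidates by token overlap via the inverted index.
    Stage 2: one-hop graph traversal that also credits neighbors of candidates."""
    scores = defaultdict(float)
    for token in query.lower().split():
        for entity in index.get(token, ()):
            scores[entity] += 1.0
    # Iterate over a snapshot so that scores added here are not propagated again.
    for entity, score in list(scores.items()):
        for neighbor in edges.get(entity, ()):
            scores[neighbor] += neighbor_weight * score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


INDEX = build_inverted_index(LABELS)
RESULTS = search("trank paper", INDEX, EDGES)
```

In this sketch the query matches only the paper entity in the index, and the graph step then surfaces the author linked to it, which a purely textual search would miss.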
2. Entity Typing [UNIFR+EPFL]
Entities can have many types (facets)
■ Which fine-grained types are most relevant given the context?
[Figure: example types of a single entity, from generic to fine-grained: Thing, Agent, Person, Living People, American Billionaires, American Philanthropists, American Computer Programmers, American People of Scottish Descent, Harvard University People, Windows People, People from King County, People from Seattle]
2. Entity Typing
Integrates Big Data types from the Web of data
■ Tree of 447’260 types
■ Rooted on <owl:Thing>
■ Depth of 19
Ranks relevant types by analyzing the context
■ Textual context
■ Graph context
■ Decision trees
■ Linear regression
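As a rough illustration of ranking types with a hierarchy plus context, the sketch below scores each type by its depth under the root and adds a bonus when the type also occurs among co-mentioned entities. This is one simple heuristic under stated assumptions, not the TRank method itself: the hierarchy fragment, the `context_weight` parameter and the scoring formula are hypothetical.

```python
# Hypothetical fragment of a type hierarchy: child -> parent.
PARENT = {
    "Agent": "owl:Thing",
    "Person": "Agent",
    "Living People": "Person",
    "American Computer Programmers": "Living People",
    "American Philanthropists": "Living People",
}


def depth(type_name, parent=PARENT):
    """Number of edges from the type up to the root (owl:Thing has depth 0)."""
    d = 0
    while type_name in parent:
        type_name = parent[type_name]
        d += 1
    return d


def rank_types(entity_types, context_types, context_weight=2.0):
    """Rank an entity's types: deeper (more specific) types score higher, and
    types shared with other entities in the same context get a bonus."""
    def score(t):
        return depth(t) + context_weight * context_types.count(t)
    # Ties are broken alphabetically to keep the order deterministic.
    return sorted(entity_types, key=lambda t: (-score(t), t))


ENTITY_TYPES = ["Person", "American Computer Programmers",
                "American Philanthropists", "Agent"]
# Types of other entities mentioned in the same (hypothetical) text.
CONTEXT_TYPES = ["American Philanthropists", "Person"]
RANKED = rank_types(ENTITY_TYPES, CONTEXT_TYPES)
```

A learned model such as the decision trees or linear regression named on the slide would replace the hand-written `score` function with one fitted on labeled examples.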
3. Entity Clustering
Several efforts to cluster entities into meaningful groups depending on context:
PREDIct [EIA-FR]
■ Extracts Web information through wrappers
■ Models topics through Latent Dirichlet Allocation
■ Predictions based on topic trends
3. Entity Clustering
MemorySense [EPFL]
■ Clusters mobile data into macro-activities
■ Leverages location, machine-learning and an activity ontology
B-hist [UNIFR+EPFL+EIA-FR]
■ Better browser history clustering through entity typing and machine-learning
4. Entity Elicitation [EPFL+UNIFR]
Filling the gaps in mem0r1es entity graphs
■ e.g., ‘who also attended WWW03 last year?’
■ Traditional methods (Web crawling, machine-learning, micro-task crowdsourcing) are insufficient
  ■ Errors and lack of discriminative features (➘ precision)
  ■ Lack of public data (➘ recall)
4. Entity Elicitation
Adapting the concept of transactive memories (group memories) from psychology
➡Transactive search methods to elicit information
■ Social network analysis (to direct the search)
■ Crowdsourcing (to get the information)
■ 46% improvement (F1) over best alternative
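The two ingredients listed above (directing the search through the social network, then collecting answers from people) can be sketched as follows. This is only an illustrative toy under stated assumptions: the contact names, the event-overlap ranking, the simulated replies and the `min_votes` threshold are hypothetical and do not describe the Hippocampus system.

```python
from collections import Counter

# Hypothetical social data: events each contact shared with the asker.
SHARED_EVENTS = {
    "alice": {"WWW", "ISWC", "SIGIR"},
    "bob": {"WWW"},
    "carol": {"VLDB"},
}


def pick_experts(topic_events, shared_events, k=2):
    """Direct the search: rank contacts by overlap with the events the
    question is about, and keep the top k with non-zero overlap."""
    ranked = sorted(
        shared_events,
        key=lambda person: (-len(shared_events[person] & topic_events), person),
    )
    return [p for p in ranked if shared_events[p] & topic_events][:k]


def aggregate(answers, min_votes=2):
    """Get the information: keep candidate answers named by enough people."""
    votes = Counter(name for names in answers.values() for name in set(names))
    return sorted(name for name, n in votes.items() if n >= min_votes)


EXPERTS = pick_experts({"WWW"}, SHARED_EVENTS)
# Simulated replies from the selected contacts (no real crowdsourcing call).
REPLIES = {"alice": ["dave", "erin"], "bob": ["dave"]}
ATTENDEES = aggregate({p: REPLIES[p] for p in EXPERTS})
```

Requiring agreement between several respondents trades recall for precision; lowering `min_votes` to 1 would accept every name mentioned by anyone.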
Demo
Use-case on scientific conference memories
Based on 4 demonstrators:
■ Visualizing clustered mobile data (MemorySense)
■ Information elicitation through Transactive Search (Hippocampus)
■ Browsing clustered Web history (B-hist)
■ Clustering and prediction of topics based on extracted information (PREDIct)
Dissemination (1)
Papers at top research venues:
■ Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.
■ Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank, Ranking Entity Types Using the Web of Data. International Semantic Web Conference ISWC 2013.
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: Hippocampus, answering memory queries using transactive search. WWW 2014.
■ Michele Catasta, Alberto Tonon, Vincent Pasquier, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Better Entity-Centric Search over Personal Web Browsing History. International Semantic Web Conference ISWC 2014.
■ Michele Catasta, Alberto Tonon, Gianluca Demartini, Jean-Eudes Ranvier, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Entity-Centric Search over Personal Web Browsing History. Journal of Web Semantics, 2014 (to appear).
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: TransactiveDB: Tapping into Collective Human Memories. PVLDB, 2014 (in revision).
■ Julien Tscherrig, Philippe Cudré-Mauroux, Elena Mugellini, Omar Abou Khaled, Maria Sokhn: SemantiConverter: A Flexible Framework to Convert Semi-Structured Data into RDF. Submitted for publication.
Dissemination (2)
Android app on Google Play
Open-source release of most components
■ https://github.com/MEM0R1ES
ISWC 2013 Best-Paper Award nominee (TRank)
Semantic Web Challenge 2013 Finalist (B-hist)
Wall Street Journal mention (B-hist, 30.10.2013)
Technology transfer
■ Extracting entities (Google Zurich)
■ MemorySense (Samsung)
■ TRank (Yahoo!)
Start-up (?)
Current Research Directions
Modelling tail entities
Transactive DB operator
Automatic capture of important memories
■ Google Glass
Software integration
Conclusions
Exciting project
■ Important, timely societal issues
■ Fundamental research questions
  ■ Data Storage, Data Integration, Data Clustering, Data Elicitation
Stimulating collaboration
■ Involving 3 (4) institutions
➡ Thanks to all partners for their contributions!
A number of tangible results already
■ Open-source software components
■ Publications at top research venues
■ Industry transfer
Thanks a lot for your attention,
… and many thanks to the Hasler Stiftung for funding this project!
Questions?