Hasler2014

Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun, Switzerland

Reclaim your Digital Life


Mem0r1es platform overview (a platform to meaningfully share and consolidate digital memories and personal information), Hasler Stiftung, 2014. Prof. Philippe Cudré-Mauroux, eXascale Infolab, http://exascale.info/


Page 1: Hasler2014

Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun — Switzerland

Reclaim your Digital Life

Page 2: Hasler2014

Motivation (1/3)

Commoditization of digital equipment
■ Desktops, laptops, netbooks, mobile phones, tablets, e-book readers, set-top boxes, personal GPSs, digital cameras, TVs, etc.

Fragmentation of information across devices

Page 3: Hasler2014

Motivation (2/3)

The story of my life...
■ Where are the pictures of my niece’s birthday?
■ How should I consolidate/back up my emails?

Fortunately there’s the cloud, right?

Page 4: Hasler2014

Motivation (3/3)

2014 twist on Personal Information Management: lifelogging, health-monitoring
■ Everylog, Memoto, Google Glass, Nike's FuelBand, FitBit, Samsung GearFit & competitors...

➡ Urgent need to index & integrate continuous personal feeds for automated processing

Page 5: Hasler2014

Problem Definition

Personal digital information is today fragmented and externalized

➡ “Each site is a silo, walled off from the others…” [TBL 10.2010]
■ Data partitioning
■ Loss of governance

How can people automatically reclaim and meaningfully organize their digital information, dispersed online and across various devices, to generate useful digital memories?

Page 6: Hasler2014

MEM0R1ES

... a highly available, secure, scalable, and semantically rich platform to extract, preserve, integrate, and expose personal information for a smarter world

Page 7: Hasler2014

The MEM0R1ES Team

Prof. Dr. Philippe Cudré-Mauroux

Prof. Dr. Karl Aberer

Prof. Dr. Maria Sokhn

Julien Tscherrig

Joël Dumoulin

Michele Catasta

Dr. Gianluca Demartini

Alberto Tonon

Page 8: Hasler2014

Last Year…

Device & Service Wrappers [EIA-FR]
■ Generic Wrapper Architecture: SMTP, Gmail, Google Drive, Facebook, DBpedia, Flickr, LinkedIn
■ Browser wrapper [EPFL]: lifelogging rich features (context, user activities and focus, etc.) from the browser

Storage Infrastructure
■ Multi-purpose, declarative & elastic storage layer [UNIFR]

Page 9: Hasler2014

Result from the Digital Reclaiming

➡ Heterogeneous Graphs of Entities

Information duplication

Sometimes with different facets

Missing information

Page 10: Hasler2014

Today’s Focus

Meaningful information integration from heterogeneous graphs of entities

1. Entity Search (AOR)

2. Entity Typing (TRank)

3. Entity Clustering (ZenCrowd, MemorySense, PREDIct)

4. Entity Elicitation (Transactive Search)

Use-case: leveraging digital mem0r1es from a conference participation (demonstrators)

Page 11: Hasler2014

1. Entity Search [UNIFR]

Main idea: combine unstructured and structured search to find relevant entities in the graph
■ Inverted index to locate first candidates
■ Graph queries to refine the results
  ■ Graph traversals (queries on object properties)
  ■ Graph neighborhoods (queries on data type properties)
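The two-stage idea on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual implementation: the toy graph, the entity identifiers, the `neighbor_weight` value, and the function names `build_inverted_index` and `hybrid_search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy entity graph: labels stand in for datatype properties,
# links stand in for object properties pointing at other entities.
LABELS = {
    "e1": "semantic web conference talk",
    "e2": "conference dinner photos",
    "e3": "slides of the keynote",
    "e4": "grocery list",
}
LINKS = {
    "e1": ["e3"],  # the talk links to its slides
    "e2": [],
    "e3": [],
    "e4": [],
}


def build_inverted_index(labels):
    """Map each token to the set of entities whose label contains it."""
    index = defaultdict(set)
    for entity, text in labels.items():
        for token in text.lower().split():
            index[token].add(entity)
    return index


def hybrid_search(query, labels, links, neighbor_weight=0.5):
    """Stage 1: score candidates by keyword overlap via the inverted index.
    Stage 2: follow links from each candidate and give its neighbors a
    discounted share of the candidate's score."""
    index = build_inverted_index(labels)
    scores = defaultdict(float)
    for token in query.lower().split():
        for entity in index.get(token, ()):
            scores[entity] += 1.0
    # Graph refinement: one-hop traversal from the first candidates only.
    for entity, score in list(scores.items()):
        for neighbor in links.get(entity, ()):
            scores[neighbor] += neighbor_weight * score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


results = hybrid_search("conference talk", LABELS, LINKS)
```

Here `e3` shares no keyword with the query, yet it appears in `results` because it is linked from the best keyword match; this is the kind of entity a purely text-based ranking would miss.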

Page 12: Hasler2014

1. Entity Search

➡ up to 25% MAP improvement over BM25!
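For readers unfamiliar with the metric, MAP (mean average precision) averages, over all queries, the precision measured at each rank where a relevant result appears. The sketch below uses made-up rankings purely to show the computation; the data is illustrative and unrelated to the evaluation reported on this slide.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k that hold a relevant item."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```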

Page 13: Hasler2014

2. Entity Typing [UNIFR+EPFL]

Entities can have many types (facets)
■ Which fine-grained types are most relevant given the context?

[Figure: example type hierarchy for a single entity. Types shown: Thing, Agent, Person, Living People, American Billionaires, People from King County, People from Seattle, Windows People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists]

Page 14: Hasler2014

2. Entity Typing

Integrates Big Data types from the Web of Data
■ Tree of 447’260 types
■ Rooted on <owl:Thing>
■ Depth of 19

Ranks relevant types by analyzing the context
■ Textual context
■ Graph context
■ Decision trees
■ Linear regression
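To make the ranking idea concrete, here is a deliberately simplified sketch that combines one hierarchy signal (depth in the type tree) with one textual signal (word overlap with the context). It is not the TRank implementation: the tree fragment, the prefix-matching heuristic, and the hand-set weights `w_depth` and `w_text` are assumptions, with the weights standing in for coefficients that a regression model would learn.

```python
# Hypothetical fragment of a type tree rooted at owl:Thing (child -> parent).
PARENT = {
    "Agent": "owl:Thing",
    "Person": "Agent",
    "American Computer Programmers": "Person",
    "American Philanthropists": "Person",
}


def depth(type_name):
    """Number of edges between a type and the root of the tree."""
    d = 0
    while type_name in PARENT:
        type_name = PARENT[type_name]
        d += 1
    return d


def context_overlap(type_name, context):
    """Count type-label words whose first five letters also start a context word."""
    stems = {w[:5] for w in context.lower().split()}
    return sum(1 for w in type_name.lower().split() if w[:5] in stems)


def rank_types(types, context, w_depth=1.0, w_text=2.0):
    """Score = w_depth * depth + w_text * overlap, highest score first."""
    scored = [(w_depth * depth(t) + w_text * context_overlap(t, context), t)
              for t in types]
    return [t for _, t in sorted(scored, key=lambda st: (-st[0], st[1]))]


types = ["owl:Thing", "Person", "American Philanthropists",
         "American Computer Programmers"]
context = "he wrote computer programs before founding a software company"
ranking = rank_types(types, context)
```

With this context the most specific, context-matching type is ranked first and the root type last; a different context (for example, one about charitable donations) would need a different textual signal to promote "American Philanthropists".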

Page 15: Hasler2014

3. Entity Clustering

Several efforts to cluster entities into meaningful groups depending on context:

PREDIct [EIA-FR]

■ Extracts Web information through wrappers

■ Models topics through Latent Dirichlet Allocation

■ Predictions based on topic trends
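The last step, predicting from topic trends, can be illustrated with a simple least-squares extrapolation. This is only a sketch under assumptions: the slide does not say which prediction model PREDIct uses, the topic-modelling step is not shown, and the yearly topic shares below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_share(history, year):
    """Extrapolate a topic's share of documents to a future year."""
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    # Clamp to [0, 1] because the value is a proportion.
    return min(1.0, max(0.0, slope * year + intercept))


# Hypothetical yearly shares of two topics, e.g. as output by a topic model.
topic_history = {
    "crowdsourcing": {2011: 0.10, 2012: 0.15, 2013: 0.20},
    "p2p": {2011: 0.20, 2012: 0.15, 2013: 0.10},
}
forecast = {t: predict_share(h, 2014) for t, h in topic_history.items()}
```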

Page 16: Hasler2014

3. Entity Clustering

MemorySense [EPFL]

■ Clusters mobile data into macro-activities

■ Leverages location, machine-learning and an activity ontology

B-hist [UNIFR+EPFL+EIA-FR]

■ Better browser history clustering through entity typing and machine-learning

Page 17: Hasler2014

4. Entity Elicitation [EPFL+UNIFR]

Filling the gaps in mem0r1es entity graphs
■ e.g., ‘who also attended WWW03 last year?’
■ Traditional methods (Web crawling, machine learning, micro-task crowdsourcing) are insufficient
■ Errors and lack of discriminative features (➘ precision)

■ Lack of public data (➘recall)

Page 18: Hasler2014

4. Entity Elicitation

Adapting the concept of transactive memories (group memories) from psychology

➡ Transactive search methods to elicit information
■ Social network analysis (to direct the search)
■ Crowdsourcing (to get the information)
■ 46% improvement (F1) over best alternative
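The two steps and the F1 measure can be sketched as follows. This is a toy illustration, not the Hippocampus system: the attendance data, the shared-event heuristic for choosing whom to ask, the `min_votes` threshold, and the replies are all hypothetical.

```python
from collections import Counter

# Hypothetical data: which events each contact attended.
ATTENDANCE = {
    "alice": {"WWW", "ISWC", "SIGIR"},
    "bob": {"WWW", "ISWC"},
    "carol": {"VLDB"},
    "dave": {"WWW"},
}


def pick_experts(asker_events, attendance, k=2):
    """Social-analysis step: prefer contacts sharing the most events with
    the asker, on the assumption that they remember the same context."""
    ranked = sorted(attendance,
                    key=lambda p: (-len(attendance[p] & asker_events), p))
    return ranked[:k]


def aggregate(answers, min_votes=2):
    """Crowdsourcing step: keep names reported by at least min_votes people."""
    votes = Counter(name for reply in answers.values() for name in set(reply))
    return {name for name, n in votes.items() if n >= min_votes}


def f1(predicted, truth):
    """Harmonic mean of precision and recall."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


experts = pick_experts({"WWW", "ISWC"}, ATTENDANCE)
# Hypothetical replies from the selected experts to "who also attended?"
replies = {"alice": ["erin", "frank"], "bob": ["erin", "grace"]}
attendees = aggregate(replies)
score = f1(attendees, {"erin", "frank"})
```

Requiring agreement between respondents trades recall for precision: only "erin" survives the vote, so precision is perfect while one true attendee is missed.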

Page 19: Hasler2014

Demo

Use-case on scientific conference memories

Based on 4 demonstrators:
■ Visualizing clustered mobile data (MemorySense)
■ Information elicitation through Transactive Search (Hippocampus)
■ Browsing clustered Web history (B-hist)
■ Clustering and prediction of topics based on extracted information (PREDIct)

Page 20: Hasler2014

Dissemination (1)

Papers at top research venues:

■ Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.

■ Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank, Ranking Entity Types Using the Web of Data. International Semantic Web Conference ISWC 2013.

■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: Hippocampus, answering memory queries using transactive search. WWW 2014.

■ Michele Catasta, Alberto Tonon, Vincent Pasquier, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Better Entity-Centric Search over Personal Web Browsing History. International Semantic Web Conference ISWC 2014.

■ Michele Catasta, Alberto Tonon, Gianluca Demartini, Jean-Eudes Ranvier, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Entity-Centric Search over Personal Web Browsing History. Journal of Web Semantics, 2014 (to appear).

■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: TransactiveDB: Tapping into Collective Human Memories. PVLDB, 2014 (in revision).

■ Julien Tscherrig, Philippe Cudré-Mauroux, Elena Mugellini, Omar Abou Khaled, Maria Sokhn: SemantiConverter: A Flexible Framework to Convert Semi-Structured Data into RDF. Submitted for publication.

Page 21: Hasler2014

Dissemination (2)

Android app on Google Play

Open-source release of most components
■ https://github.com/MEM0R1ES

ISWC 2013 Best-Paper Award nominee (TRank)

Semantic Web Challenge 2013 Finalist (B-hist)

Wall Street Journal mention (B-hist, 30.10.2013)

Technology transfer
■ Extracting entities (Google Zurich)
■ MemorySense (Samsung)
■ TRank (Yahoo!)

Start-up (?)

Page 22: Hasler2014

Current Research Directions

Modelling tail-entities

Transactive DB operator

Automatic capture of important memories
■ Google Glass

Software integration

Page 23: Hasler2014

Conclusions

Exciting project
■ Important, timely societal issues
■ Fundamental research questions
  ■ Data Storage, Data Integration, Data Clustering, Data Elicitation

Stimulating collaboration
■ Involving 3 (4) institutions
➡ Thanks to all partners for their contributions!

A number of tangible results already
■ Open-source software components
■ Publications at top research venues
■ Industry transfer

Page 24: Hasler2014

Thanks a lot for your attention,

… and many thanks to the Hasler Stiftung for funding this project!

Questions?
