Towards Cognitive Agents for BigData Discovery

Towards Cognitive Agents for BigData Discovery Finding Solutions to Complex, Urgent Problems Jack Park BigData Science Meetup Fremont, CA: 19 April, 2014 Shyam Sarkar, Organizer © 2014, TopicQuests Foundation Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

description

Cognitive Agents to augment BigData analysis

Transcript of Towards Cognitive Agents for BigData Discovery

Page 1: Towards Cognitive Agents for BigData Discovery

Towards Cognitive Agents for BigData Discovery

Finding Solutions to Complex, Urgent Problems

Jack Park

BigData Science Meetup

Fremont, CA: 19 April, 2014

Shyam Sarkar, Organizer

© 2014, TopicQuests Foundation

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Page 2: Towards Cognitive Agents for BigData Discovery

A Narrative Arc

Page 3: Towards Cognitive Agents for BigData Discovery

Context: Two Kinds of Discovery

• Data-based

– Harvesting nuggets from collected data

• Literature-based: Deep Question Answering

– Discovering connections between dots in the literature

Page 4: Towards Cognitive Agents for BigData Discovery

Target: Deep Question Answering

[Diagram labels: Breadth, Depth, Information Retrieval, Semantic Representation, Goal]

Diagram adapted from a talk by Percy Liang at Stanford, 7 April 2014

Page 5: Towards Cognitive Agents for BigData Discovery

Our Goals

• Improve Human-Tool Capabilities

• Augment existing analytic methods

– Increase opportunities for discovery

– Improve already sophisticated methods

“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”

–Albert Szent-Györgyi

Page 6: Towards Cognitive Agents for BigData Discovery

Our Approach

• Explore and develop the technologies of so-called Cognitive Agents

– Current examples

• IBM’s Watson

• SIRI

• An opportunity

– Couple two platforms

• Berkeley Data Analytics Stack (BDAS)

• SolrSherlock

Page 7: Towards Cognitive Agents for BigData Discovery

Berkeley Data Analytics Stack Deep QA Issues*

• Low latency queries

– Perform faster inferences

– Explore larger spaces

– Better decisions

• Sophisticated analysis

– Better forecasts

– Better decisions

• Unification of existing data computation models

– Integrate interactive queries, batch and streaming processing

*http://strata.oreilly.com/2013/02/the-future-of-big-data-with-bdas-the-berkeley-data-analytics-stack.html

Page 8: Towards Cognitive Agents for BigData Discovery

An Observation

In this context, interesting literature is about the social lives of data

Page 9: Towards Cognitive Agents for BigData Discovery

Literature-based Discovery

• Forming bisociative links* between information in different literature sources which are not known to be related

• Swanson example (simplified)**:

– Literature associated with Raynaud’s

• Raynaud’s therapy linked to blood thinners

– Literature associated with fish oils

• Fish oil linked to blood thinners

– “Blood thinners” as an implicit link between fish oil and Raynaud’s Syndrome

• Akin to the wormholes formed by tags on web pages or hashtags

*Arthur Koestler (1964). The Act of Creation

**Swanson, Don (1986) "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
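
The simplified Swanson example can be sketched in a few lines of Python. This is only an illustration of the idea of an implicit link through a shared term, not the method used by SolrSherlock or by Swanson; the dictionaries and the function name `implicit_links` are hypothetical, and the toy data simply restates the bullets on this slide.

```python
# Hypothetical toy data: each literature maps a topic to the terms it is linked to.
raynaud_literature = {"Raynaud's syndrome": {"blood thinners"}}
fish_oil_literature = {"fish oil": {"blood thinners"}}


def implicit_links(literature_a, literature_c):
    """Return {(a, c): shared_terms} for topic pairs connected by shared terms."""
    links = {}
    for a, a_terms in literature_a.items():
        for c, c_terms in literature_c.items():
            shared = a_terms & c_terms  # terms that appear in both literatures
            if shared:
                links[(a, c)] = shared
    return links


links = implicit_links(fish_oil_literature, raynaud_literature)
for (a, c), shared in links.items():
    print(f"{a} <-> {c} via {sorted(shared)}")
```

A real system would have to extract the linking terms from text and rank the many candidate links; the set intersection here only shows where the implicit connection comes from.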

Page 10: Towards Cognitive Agents for BigData Discovery

Cognitive Agents

• Examples

– Proprietary

• IBM’s Watson

• SIRI

• SRI’s CALO

– Part of which, IRIS, was made open source as OpenIRIS

• Others…

– Open Source

• Cougaar

– http://www.cougaar.org/

• Open Cog

– http://opencog.org/

• Open Advancement of Question Answering Systems

– Closely related to IBM’s Watson

– http://oaqa.github.io/

• SolrSherlock

– http://debategraph.org/SolrSherlock

• Many others…

Page 11: Towards Cognitive Agents for BigData Discovery

Use Cases for Big Data Harvesting

• Resource Collection

– Federation

• bring together and organize without filters

• Resource Augmentation

– Tagging

– Annotating

– Debate

• Knowledge Cartography

– Connecting resources

– Map maintenance

– More Debate

• Research Augmentation

– Crowd-sourced discovery

– Harvesting

– Automated inferences/reasoning

– Knowledge sharing

[Diagram labels: Federated Information Resources, Harvesting Activities]

Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 29

Page 12: Towards Cognitive Agents for BigData Discovery

A Strong Conjecture

• A Knowledge Federation’s topic map provides a Rosetta Stone-like substrate

– Reasoning by analogy

– Big Data mined for clues

– Map:

• Where we have been

• Where we haven’t (Dragons be here)

Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 33

Page 13: Towards Cognitive Agents for BigData Discovery

Topic Maps for Knowledge Federation

• Maintain a structure that is well organized by topic

• Key issue:

– For any given information resource added to a map:

• Agents must answer this question:

– Have I seen this before by any other name or description?
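
The question "Have I seen this before by any other name?" can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class `TopicMap`, its methods, and the rule that names are compared after lowercasing and collapsing whitespace. Real topic map merging relies on richer identity rules than name matching.

```python
class TopicMap:
    """Toy topic map that merges resources describing the same subject."""

    def __init__(self):
        self.topics = []   # each topic: {"names": set of str, "resources": list}
        self._index = {}   # normalized name -> topic

    @staticmethod
    def _normalize(name):
        # Assumption: case and whitespace differences do not change identity.
        return " ".join(name.lower().split())

    def add(self, names, resource):
        """Attach a resource to the topic known by any of the names, or create one."""
        keys = {self._normalize(n) for n in names}
        # Collect every existing topic that already carries one of these names.
        matches = []
        for key in keys:
            topic = self._index.get(key)
            if topic is not None and not any(topic is m for m in matches):
                matches.append(topic)
        if matches:
            topic = matches[0]
            # The new names may bridge topics that were separate until now.
            for other in matches[1:]:
                topic["names"] |= other["names"]
                topic["resources"] += other["resources"]
                self.topics = [t for t in self.topics if t is not other]
        else:
            topic = {"names": set(), "resources": []}
            self.topics.append(topic)
        topic["names"] |= keys
        topic["resources"].append(resource)
        for key in topic["names"]:
            self._index[key] = topic
        return topic


tm = TopicMap()
first = tm.add(["Raynaud's syndrome"], "paper-1")
second = tm.add(["Raynaud's  Syndrome", "Raynaud's phenomenon"], "paper-2")
print(first is second, len(tm.topics))
```

Because the second resource shares a name with the first, both end up on one topic, and the new alias becomes another way to find it.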

Page 14: Towards Cognitive Agents for BigData Discovery

Are We There Yet?

• We are now at the edges of discovery:

– Deeper ways of representing

– Deeper ways of knowing

• Relational Biology

Page 15: Towards Cognitive Agents for BigData Discovery

Relational Biology

• Paraphrasing Nicholas Rashevsky*:

– We can tease open a living cell and count all its components, but we cannot put it back together and we have no clue why

• Interpreting Robert Rosen**:

– Rashevsky’s quest for a relational mathematics for biology (complex systems) entails topological algebras (Category Theory)

• Category theory is said to facilitate modeling the social lives of members of the categories

*http://en.wikipedia.org/wiki/Nicholas_Rashevsky

**http://en.wikipedia.org/wiki/Robert_Rosen_(theoretical_biologist)

Page 16: Towards Cognitive Agents for BigData Discovery

Relational Modeling 1

• Starts with Ontologies

– Ontologies grant uniform vocabularies to universes of discourse

• Including describing data

– Ontology-based frameworks provide ways to model social and other relational structures

• SIOC: Semantically Interlinked Online Communities*

• SWAN: Semantic Web Applications in Neuromedicine**

*http://www.sioc-project.org/ **http://www.w3.org/TR/hcls-swan/

Page 17: Towards Cognitive Agents for BigData Discovery

SIOC Closer Look

• A way to model components entailed by a situation (blog post in this case)

– Uniform vocabulary

– Structural relations

• Creates a foundation for much deeper modeling

– Including:

• Other ontologies

• Other structures

• Feedback loops

SIOC Blog Post*

*http://rdfs.org/sioc/spec/
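
As a rough illustration of the uniform vocabulary and structural relations mentioned on this slide, a blog post and one reply can be written as subject-predicate-object triples. The `ex:` identifiers and the helper functions are hypothetical; the `sioc:` terms (`Post`, `UserAccount`, `Container`, `has_creator`, `has_container`, `reply_of`) are taken from the SIOC Core Ontology, and plain Python tuples stand in for a real RDF library.

```python
# Hypothetical resources (ex:...) described with SIOC Core Ontology terms.
triples = [
    ("ex:post1", "rdf:type", "sioc:Post"),
    ("ex:post1", "sioc:has_creator", "ex:alice"),
    ("ex:post1", "sioc:has_container", "ex:blog"),
    ("ex:alice", "rdf:type", "sioc:UserAccount"),
    ("ex:blog", "rdf:type", "sioc:Container"),
    ("ex:reply1", "rdf:type", "sioc:Post"),
    ("ex:reply1", "sioc:has_creator", "ex:bob"),
    ("ex:reply1", "sioc:reply_of", "ex:post1"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


creator = objects(triples, "ex:post1", "sioc:has_creator")
replies = subjects(triples, "sioc:reply_of", "ex:post1")
print(creator, replies)
```

Following `sioc:reply_of` backwards from a post is one simple way to see a feedback loop in the data: replies point at the post that prompted them.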

Page 18: Towards Cognitive Agents for BigData Discovery

Massive Connectivity and Feedback

http://geography.oii.ox.ac.uk/?page=home

Complex Communication Processes

Page 19: Towards Cognitive Agents for BigData Discovery

Feedback Loops: Crucial to Learning

Image: FEDERAL HEALTH FUTURES SUMMIT LEADERSHIP LEARNING for TRANSFORMATIONAL CHANGE. September 10-11, 2012 Washington DC Metro Region Page 23

Page 20: Towards Cognitive Agents for BigData Discovery

Relational Biology: Context

• Context is about Relations among the components themselves

• Context is about Relations among the components and their environment

• Context is about Feedback

Page 21: Towards Cognitive Agents for BigData Discovery

Example from Breast Cancer 1

Extracellular Matrix (EM) as Context

Complex Communication Processes

Milk producing tissue

http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer

Page 22: Towards Cognitive Agents for BigData Discovery

Example from Breast Cancer 2

Cells missing their EM

Cells with restored EM

http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer

Page 23: Towards Cognitive Agents for BigData Discovery

Towards Cognitive Agents

• Harvest and represent

– Patterns

• Actors

• Relations

• States

– Context in which patterns exist

• Discover

– Processes

– Unrecognized connections

– …

Page 24: Towards Cognitive Agents for BigData Discovery

Watson’s Architecture* (Simplified)

• Analysis determines answer type and topics in play

• Hypothesis formation seeks candidate answers from sources

– Pattern matching

• Hypothesis scoring weighs evidence for each hypothesis

• Answer ranking uses models to select answer

[Diagram: Question → Analysis → Hypothesis Formation → Hypothesis Scoring → Answer Ranking → Answer; Answer Sources and Evidence Sources feed the pipeline]

*http://www.aaai.org/Magazine/Watson/watson.php
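
The four stages listed above can be sketched as a toy Python pipeline. This is not Watson's implementation: the function names, the keyword heuristics, and the data layout are all assumptions chosen to make the flow of analysis, hypothesis formation, scoring, and ranking visible. The toy sources reuse the simplified Swanson example from earlier in the deck.

```python
STOPWORDS = {"who", "what", "the", "a", "of", "is", "was", "to"}


def analyze(question):
    """Analysis: guess an answer type and the topics in play (toy heuristic)."""
    words = [w.strip("?.,").lower() for w in question.split()]
    answer_type = "person" if words and words[0] == "who" else "thing"
    topics = {w for w in words if w and w not in STOPWORDS}
    return answer_type, topics


def form_hypotheses(answer_type, topics, answer_sources):
    """Hypothesis formation: keep candidates whose description mentions a topic."""
    return [s["answer"] for s in answer_sources
            if s["type"] == answer_type and topics & set(s["text"].lower().split())]


def score(hypothesis, topics, evidence_sources):
    """Hypothesis scoring: count passages naming the hypothesis and a topic."""
    total = 0
    for passage in evidence_sources:
        text = passage.lower()
        if hypothesis.lower() in text and topics & set(text.split()):
            total += 1
    return total


def answer_question(question, answer_sources, evidence_sources):
    """Answer ranking: return the best-supported hypothesis, or None."""
    answer_type, topics = analyze(question)
    hypotheses = form_hypotheses(answer_type, topics, answer_sources)
    ranked = sorted(hypotheses,
                    key=lambda h: score(h, topics, evidence_sources),
                    reverse=True)
    return ranked[0] if ranked else None


# Hypothetical toy sources restating the simplified Swanson example.
answer_sources = [
    {"answer": "blood thinners", "type": "thing",
     "text": "fish oil linked to blood thinners"},
    {"answer": "fish oil", "type": "thing",
     "text": "literature associated with fish oil"},
]
evidence_sources = [
    "Raynaud's therapy linked to blood thinners",
    "Fish oil linked to blood thinners",
]
best = answer_question("What links fish oil to Raynaud's therapy?",
                       answer_sources, evidence_sources)
print(best)
```

Keeping each stage as a separate function mirrors the slide's separation between the sources used to propose answers and the sources used to weigh them.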

Page 25: Towards Cognitive Agents for BigData Discovery

SolrSherlock Architecture (Simplified)

[Diagram labels: Topic Map; Conceptual Graphs; Harvested Documents; Harvester: HyperMembrane; Information Fabrics, Agents]

Literature-based Discovery: Process documents into structures (information fabrics) from which patterns are harvested.

Federate Data Analysis with Literature: Federate data observations and predictions with concepts and relations harvested from the literature.

Model Processes, Structures, and Analogies

Page 26: Towards Cognitive Agents for BigData Discovery

SolrSherlock Component Diagram

[Diagram labels: Topic Map, Conceptual Graph, Information Fabrics, TSC, TM Provider, CG Provider, Machine Reader, TSC Provider, Open DeepQA, Harvester, Persistence Providers, Agents]

Page 27: Towards Cognitive Agents for BigData Discovery

Looking Forward

• Coupling Literature-based research with BigData analysis

– Common ontologies

– Hypothesis formation

– Evidence gathering

– Relation discovery

Page 28: Towards Cognitive Agents for BigData Discovery

Completed Representation

[Diagram: nodes Bacterial Infection, Antioxidants, and Compromised Host; statements "antioxidants kill free radicals" and "macrophages use free radicals to kill bacteria"; relation labels Contraindicates, Because, and Appropriate For]

Let us co-create Cognitive Agents for Discovery [email protected]

Thanks to Martin Radley, Patrick Durusau, Sherry Jones, and Mark Szpakowski for valuable comments

SolrSherlock at: http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock