Big Data and Knowledge Engineering for Health
Eduserv Symposium 2012: Big Data, Big Deal? (May 2012, London)
Prof. Anthony J Brookes, University of Leicester, UK
Different or No Big Data problems?
- Changing or stable rate of data generation / availability
- Changing or stable complexity of data
- Changing or stable requirement to use the data
- Changing or stable tooling to use the data
- Changing or stable mass of ‘useless’ data (vs knowledge)
Knowledge engineering was first defined in 1983 as “an engineering discipline that involves integrating knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise” (Feigenbaum and McCorduck, 1983).
‘KNOWLEDGE ENGINEERING’ for HEALTH
Knowledge Engineering
Building and engaging with the community:
- presentation & discussion at many international meetings and forums
- half-day workshop as a satellite to ESHG (6 invited speakers)
- workshop session at MIE2011 (3 invited speakers, audience discussion)
- I-Health 2011 workshop in Brussels, 3-4 Oct 2011
- growing community, currently >150 academics, companies, healthcare providers
Integration and Interpretation of Information for Individualised Healthcare http://www.i4health.eu/
150,000 published vs. <100 routinely used; mostly unknown to healthcare
[Slide diagram: I4HEALTH bridges the RESEARCH world (bio-informatics, academics, data, biobanks, registries) and the HEALTHCARE world (med-informatics, companies).]
‘KNOWLEDGE ENGINEERING’ for HEALTH
- Research world: ‘knowledge generation’ ...make sense of these entities
- Clinical world: ‘knowledge engineering’ ...identify & use the bits you understand
STANDARDS
• Semantic Standards (to allow unambiguous understanding of the data)
  – Terminologies, Ontologies, Vocabularies, Coding systems
  – Need cross-mapping between semantic standards, and across languages
• Syntactic Standards (to make data structures interoperable)
  – Data and Metadata object models, and Exchange formats
  – Minimal content specifications, harmonised across domains
  – Robust core requirements, with general principles that bring flexibility
• Technical Standards (to build a system that works efficiently)
  – Database models, Search systems, and User interfaces (e.g., browsers)
  – Web-service specifications, Web 2.0 technologies
  – ID solutions for data, databases, publications, biobanks, researchers
  – Technologies for controlling data access and user permissions
  – Ethical and Legal policies, implementation, and recognition-reward structures
• Quality Standards (to match data to needs)
  – Measuring and representing quality in a meaningful way
  – Important role here for metadata
  – Recording and standardising SOPs
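The cross-mapping requirement above can be pictured in code. A minimal sketch, assuming two invented coding systems ("SystemA", "SystemB") with made-up codes; real mappings (e.g., between terminologies such as ICD and SNOMED CT) are far larger and often many-to-many:

```python
from typing import Optional

# Minimal sketch of cross-mapping between two hypothetical coding systems.
# The codes and mappings below are illustrative only, not real terminology content.
CROSS_MAP = {
    "A-001": "B-9001",  # the same concept coded in SystemA and SystemB
    "A-002": "B-9002",
}

def translate(code: str, default: Optional[str] = None) -> Optional[str]:
    """Translate a SystemA code to its SystemB equivalent, if a mapping exists."""
    return CROSS_MAP.get(code, default)

record = {"patient_id": "P1", "diagnosis": "A-001"}
record["diagnosis_b"] = translate(record["diagnosis"])
print(record["diagnosis_b"])  # B-9001
```

In practice a default such as "unmapped" makes gaps in the cross-map visible rather than silently dropping codes.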
2012-02-07 DCC roadshow East Midlands - CC-BY-SA
..personal data
[Slide diagram: Electronic Healthcare Records (EHR). Building blocks: Terminology, Information Models, Communication Models, Collection Models, Search and Retrieval Models, Registration and Location Models, Classifications. Qualities and uses: expressiveness, precision/rigour, searchability, comparability, best practice, structure, detail, search, storage, interoperability, utility, categorisation, secondary use, decision making, recording, notify, find.]
Data sharing
- Incentive/reward systems
- 3 categories of risk, with ‘speed pass’ access control
- Compulsion/sanctions
- Researcher IDs (ORCID)
- Open data discovery (e.g., Café Variome)
- Remote pooled analysis (e.g., DataSHIELD, EU-ADR/EMIF)
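The remote pooled analysis idea (as in DataSHIELD) can be sketched as follows: each site computes local, non-disclosive aggregates, and only those aggregates travel to the coordinator, never the row-level data. The site names and measurements below are invented for illustration:

```python
# Toy sketch of remote pooled analysis: each site discloses only
# aggregate statistics (count, sum); individual records never leave the site.
site_data = {
    "site_1": [5.1, 6.3, 5.8],           # e.g., one biomarker value per patient
    "site_2": [6.0, 5.5, 6.2, 5.9],
}

def local_summary(values):
    """Run at the data source: return only non-identifying aggregates."""
    return {"n": len(values), "total": sum(values)}

# The coordinator pools the aggregates to obtain the overall mean.
summaries = [local_summary(v) for v in site_data.values()]
n = sum(s["n"] for s in summaries)
mean = sum(s["total"] for s in summaries) / n
print(round(mean, 3))  # 5.829
```

The same pattern extends to regression and other analyses, as long as each exchanged quantity is an aggregate that cannot identify an individual.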
Modelling:
‘Patient Avatars’ / ‘Virtual Patients’
Personalised medicine
Stratified medicine
BIG DATA:
The answer is not a data warehouse!
ARCHITECTURE:
[Slide diagram: an emerging, self-optimising architectural concept. Sources (Biosensors, EHR, Modalities, Systems data, Text & Web pages, Computer Models, Decision Support Systems, BioScience & Omics Databases) are connected through a Feedback/Optimisation loop.]
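The feedback/optimisation loop behind this self-optimising concept can be sketched in toy form: a decision-support parameter is repeatedly adjusted using the error between an observed aggregate and a target. The proportional update rule and the alert-rate example are assumptions for illustration, not from the talk:

```python
# Generic feedback/optimisation loop: measure, compare to target, adjust.
# The update rule and the alert-threshold example are illustrative only.
def feedback_loop(initial, target, measure, adjust, cycles=50):
    """Repeatedly feed the measured error back into the system state."""
    state = initial
    for _ in range(cycles):
        error = target - measure(state)
        state = adjust(state, error)
    return state

# Example: tune an alert threshold so decision support flags ~10% of cases.
scores = [i / 1000 for i in range(1000)]                     # synthetic risk scores
def alert_rate(t):
    return sum(s > t for s in scores) / len(scores)
def nudge(t, err):
    return t - 0.5 * err                                     # too many alerts -> raise t

threshold = feedback_loop(0.5, 0.10, alert_rate, nudge)
print(round(alert_rate(threshold), 2))  # converges close to the 0.10 target
```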
[Slide diagram: Disorganised digital information relevant to personalised healthcare (Personal, Imaging, Instrumentation, Omics, Clinical, Population Models) is refined as Data + Information + Knowledge and delivered through Knowledge Portals, providing healthcare utility and Optimised Healthcare.]
Big Data can mainly stay at ‘source’, feeding the Knowledge Extraction process
Knowledge Extraction/Distillation filters therefore need to be created
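One way to picture such a distillation filter in toy form: the bulk records stay in their sources, and only high-confidence, question-relevant fragments are extracted as knowledge. All source names, records, and evidence scores below are invented:

```python
# Sketch of a knowledge extraction/distillation filter: raw (Big) Data stays
# at its source; only small, high-confidence knowledge items are distilled out.
# Sources, findings, and evidence scores are illustrative only.
SOURCES = {
    "omics_db": [
        {"finding": "variant X raises drug-Y toxicity risk", "evidence": 0.92},
        {"finding": "variant Z of unknown significance", "evidence": 0.10},
    ],
    "literature": [
        {"finding": "drug Y contraindicated with condition Q", "evidence": 0.88},
    ],
}

def distil(sources, min_evidence=0.8):
    """Keep only high-confidence items; the underlying data never moves."""
    knowledge = []
    for name, records in sources.items():
        for rec in records:
            if rec["evidence"] >= min_evidence:
                knowledge.append({"source": name, **rec})
    return knowledge

for item in distil(SOURCES):
    print(item["source"], "->", item["finding"])
```

The `min_evidence` threshold stands in for whatever clinical-grade validation criterion a real filter would apply before knowledge reaches the point of care.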
Policy and Strategy
- To kick-start the field: Put money into research, development, and application projects based upon the Knowledge Engineering concept
- To create the needed expertise: Cross-train people who have a talent for engineering in computer science + bioscience + healthcare
- To ensure interoperability across the total system: Organise activities on a middle-out basis, rather than the usual top-down or bottom-up approaches
- To ensure innovation and sustainability: Explore ways to get academic and commercial players working together
- To start bringing the system to life: Emphasise knowledge ‘filtration’, ‘distillation’, and ‘provision’ from sources of (Big) Data
Knowledge Engineering
Acknowledgments
• GEN2PHEN Partners
• My team: Robert Free, Rob Hastings, Adam Webb, Tim Beck, Sirisha Gollapudi, Gudmundur Thorisson, Owen Lancaster
• Some key discussants: Søren Brunak, Debasis Dash, Carlos Diaz, Norbert Graf, Johan van der Lei, Heinz Lemke, Ferran Sanz
This work received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
“Data-to-Knowledge-for-Practice” (D2K4P) Center