Cytogenetics payne lab_presentation_08282013
CytoGPS (CytoGenetic Pattern Sleuth)
Arka Pattanayak, Zachary Abrams
Informatics Research & Development, Department of Biomedical Informatics, The Ohio State University
08/28/2013
MOTIVATION: DATA
• Complex chromosomal aberration data: structure and knowledge.
• Inherently descriptive grammar: International System for Human Cytogenetic Nomenclature (ISCN).
SOLUTION: CytoGPS
• Parse karyotypes using Context-Free Grammar (CFG) rules.
• Extract morphological phrases.
• Map phrases to an abstract biological meta-model.
APPLICATIONS: MULTIPLE
• Discovery of important, obfuscated patterns in cytogenetic data.
• Targeted treatment.
• In-silico drug studies.
CytoGPS: 3-month status report
MOTIVATION
Existing cytogenetic data is:
• Structured.
• ISCN-conformant.
• Multi-dimensional.
It remains minimally exploited because of its informational complexity:
• Syntactic variability.
• Information density.
• Human error.
The Rulebook
CytoGPS Platform
• Smart Parser of Karyotypes: EBNF grammar rules; parser generators and parse-tree visitors.
• Biologically Abstracted Meta-Model: mapper DSL.
• Genetic Pattern Matching: ML algorithms; phenotype-phenotype matching.
CytoGPS: CytoGenetic Pattern Sleuth
Parsing Complex Karyotypes using SPoK (Smart Parser of Karyotypes)
SPoK (Smart Parser of Karyotypes)
• Enables in-silico analyses of complex karyotypes.
• Based on well-studied fundamentals in computational parsing (CFG, EBNF).
• Disease-agnostic.
• Multidisciplinary effort: Biomedical Informatics, Cytogenetics, Hematology.
• ~76% of 3000 publicly available ISCN 2009 karyotypes were successfully parsed with this method.
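SPoK itself is generated by ANTLR from EBNF rules. The Python sketch below illustrates what grammar-driven parsing of a karyotype involves. It is a hand-written recursive-descent parser for a tiny, hypothetical ISCN subset: chromosome count, sex chromosomes, and `del`/`dup`/`inv`/`t` aberrations with one band per chromosome. The grammar, the `KaryotypeParser`/`Aberration`/`Karyotype` names and the `ParseError` type are assumptions made for illustration, not SPoK's actual rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy grammar (EBNF), far smaller than real ISCN:
#   karyotype  = count "," sex { "," aberration } ;
#   count      = digit { digit } ;
#   sex        = ( "X" | "Y" ) { "X" | "Y" } ;
#   aberration = op "(" chrom { ";" chrom } ")" "(" band { ";" band } ")" ;
#   op         = "del" | "dup" | "inv" | "t" ;
#   chrom      = digit { digit } | "X" | "Y" ;
#   band       = ( "p" | "q" ) digit { digit } [ "." digit { digit } ] ;


@dataclass
class Aberration:
    op: str
    chromosomes: list[str]
    bands: list[str]


@dataclass
class Karyotype:
    count: int
    sex: str
    aberrations: list[Aberration]


class ParseError(ValueError):
    pass


class KaryotypeParser:
    """Recursive-descent parser: roughly one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def expect(self, literal: str) -> None:
        if not self.text.startswith(literal, self.pos):
            raise ParseError(f"expected {literal!r} at position {self.pos}")
        self.pos += len(literal)

    def match(self, pattern: str) -> str:
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise ParseError(f"expected /{pattern}/ at position {self.pos}")
        self.pos = m.end()
        return m.group()

    def parse(self) -> Karyotype:
        count = int(self.match(r"\d+"))
        self.expect(",")
        sex = self.match(r"[XY]+")
        aberrations = []
        while self.pos < len(self.text):
            self.expect(",")
            aberrations.append(self.aberration())
        return Karyotype(count, sex, aberrations)

    def aberration(self) -> Aberration:
        op = self.match(r"del|dup|inv|t")
        chromosomes = self.group(r"\d+|X|Y")
        bands = self.group(r"[pq]\d+(?:\.\d+)?")
        # Semantic check outside the context-free grammar: one band per chromosome.
        if len(chromosomes) != len(bands):
            raise ParseError(f"{op}: {len(chromosomes)} chromosomes but {len(bands)} bands")
        return Aberration(op, chromosomes, bands)

    def group(self, item: str) -> list[str]:
        """Parse "(" item { ";" item } ")"."""
        self.expect("(")
        items = [self.match(item)]
        while self.text.startswith(";", self.pos):
            self.pos += 1
            items.append(self.match(item))
        self.expect(")")
        return items


if __name__ == "__main__":
    print(KaryotypeParser("46,XY,del(17)(p12),t(12;15)(p13;q20)").parse())
```

A generated parser such as ANTLR's builds this kind of machinery from the grammar file automatically. Extending the grammar then means editing rules rather than hand-written methods.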
SPoK: Context-Free Grammar Rules
SPoK: Parser Generation
ANTLR generates a deterministic parser from the grammar rules.
SPoK: A Parse Tree Showing the Morphological Deconstruction of a Complex Karyotype
46,XY,del(17)(p12),t(12;15)(p13;q20)
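The platform overview lists parse-tree visitors as the step that turns a parse tree into usable phrases. Below is a minimal Python sketch of the visitor pattern for the karyotype above. It assumes the tree is written by hand as nested tuples `(label, *children)` with string leaves. The labels, `TreeVisitor` and `PhraseCollector` are hypothetical stand-ins, not SPoK's generated classes. The collector emits one "ID and chromosomal locations" phrase per aberration, such as `del 17p12`.

```python
# Parse tree for 46,XY,del(17)(p12),t(12;15)(p13;q20), written by hand as
# nested tuples (label, *children); leaves are plain strings.
TREE = (
    "karyotype",
    ("count", "46"),
    ("sex", "XY"),
    ("aberration", ("op", "del"), ("chromosomes", "17"), ("bands", "p12")),
    ("aberration", ("op", "t"), ("chromosomes", "12", "15"), ("bands", "p13", "q20")),
)


class TreeVisitor:
    """Dispatches each node to visit_<label>, falling back to its children."""

    def visit(self, node):
        if isinstance(node, str):  # leaf token: nothing to do by default
            return
        handler = getattr(self, f"visit_{node[0]}", self.generic_visit)
        handler(node)

    def generic_visit(self, node):
        for child in node[1:]:
            self.visit(child)


class PhraseCollector(TreeVisitor):
    """Collects 'op chrom+band ...' phrases, e.g. 'del 17p12'."""

    def __init__(self):
        self.phrases = []

    def visit_aberration(self, node):
        fields = {child[0]: child[1:] for child in node[1:]}
        (op,) = fields["op"]
        locations = [c + b for c, b in zip(fields["chromosomes"], fields["bands"])]
        self.phrases.append(" ".join([op, *locations]))


collector = PhraseCollector()
collector.visit(TREE)
print(collector.phrases)  # ['del 17p12', 't 12p13 15q20']
```

Keeping traversal in a base class and the per-node logic in subclasses lets several extraction passes reuse one tree.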
Functional Abstraction using LossGainFusion (Biologically Abstracted Meta-Model)
LGF (Biologically Abstracted Meta-Model)
• Abstraction of ISCN aberrations observed in chromosomal bands to their functional biological outcomes.
• Implemented with a custom Domain-Specific Language (DSL).
• Karyotype complexity-agnostic.
• Converts human-readable karyotypes into a machine-readable construct.
• ~90% of parsed karyotypes were successfully mapped using this model.
LGF: Understanding Oncogenic Effects with an Abstracted Meta-Model
1. Complete karyotype: 46,XY,del(17)(p12),t(12;15)(p13;q20)
2. Chromosomal aberrations: del(17)(p12) and t(12;15)(p13;q20)
3. ID and chromosomal locations: del 17p12 and t 12p13 15q20
Meta-model annotations: del1:L and t2:F,F
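The two examples (del 17p12 → del1:L, t 12p13 15q20 → t2:F,F) suggest a simple pattern. Each location receives one outcome determined by the aberration type. The annotation is written as the aberration ID, the number of locations, and the comma-separated outcomes. The Python sketch below encodes that reading, with a hypothetical `RULES` table standing in for the mapper DSL. It assumes L, G and F stand for loss, gain and fusion (from the LossGainFusion name). The `dup` → gain rule is an illustrative assumption that does not appear on the slide.

```python
# Hypothetical stand-in for the mapper DSL: outcome per location, by aberration ID.
# L = loss, G = gain, F = fusion. "del" and "t" follow the slide's examples;
# "dup" -> G is an illustrative assumption.
RULES = {"del": "L", "dup": "G", "t": "F"}


def to_lgf(phrase: str) -> str:
    """Map an 'ID location...' phrase (e.g. 'del 17p12') to an LGF annotation."""
    aberration_id, *locations = phrase.split()
    outcome = RULES[aberration_id]  # KeyError for IDs the table does not cover
    return f"{aberration_id}{len(locations)}:{','.join([outcome] * len(locations))}"


for phrase in ("del 17p12", "t 12p13 15q20"):
    print(phrase, "->", to_lgf(phrase))
# del 17p12 -> del1:L
# t 12p13 15q20 -> t2:F,F
```

One outcome per aberration type cannot express derivative chromosomes such as the der(4) example on the next slide. There, individual locations receive different annotations (F3:,FL,FG), which is why a richer DSL is needed.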
LGF: Morphological Decomposition of Karyotypes
der(4)t(4;13)(p14;p18)
der(4) t(4;13)(p14;p18)
t 4p14 13p18 der 4
A B C D E
A + C = F
B, D, E add up to 3
F3:B,D,E
We don’t need B, so we don’t put an annotation at that location. We need to put the biological responses for D and E in their respective locations.
F3:,FL,FG
LGF: Morphological Decomposition of Karyotypes (more complex example)
LGF: Domain-Specific Language for Mapping ISCN Aberrations to the Meta-Model
Genetic Pattern Matching
GPM: Genetic Pattern Matching using the CART (Classification and Regression Tree) Algorithm
Features (band locations) on the X-axis; karyotypes on the Y-axis.
1p36.3  1p36.2  1p36.1  …  1q44  2p25  …  yp12
1       1       0       0  0     0     0  0
1       1       0       0  0     0     0  0
1       1       0       0  0     0     0  0
0       0       1       0  0     0     0  0
0       0       1       0  0     0     0  0
0       0       1       0  0     0     0  0
0       0       1       0  0     0     0  0
0       0       0       0  1     1     0  0
0       0       0       0  1     1     0  0
First cut
Second cut
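CART grows a tree by repeatedly choosing the binary split that most reduces impurity. The first cut is the best split over all karyotypes, and each later cut repeats the search inside one branch. The pure-Python sketch below performs that first search on a 0/1 band matrix like the one above, using Gini impurity (the standard CART criterion for classification). The band subset, the four karyotypes and their class labels are hypothetical; they illustrate the mechanics only and are not study data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Find the band whose presence/absence split gives the lowest weighted Gini."""
    best = None
    for j, band in enumerate(features):
        present = [y for row, y in zip(rows, labels) if row[j] == 1]
        absent = [y for row, y in zip(rows, labels) if row[j] == 0]
        if not present or not absent:
            continue  # band does not separate the karyotypes
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(rows)
        if best is None or score < best[1]:
            best = (band, score)
    return best


# Hypothetical data: rows are karyotypes, columns are bands (1 = aberration observed).
features = ["1p36.3", "1p36.1", "1q44"]
rows = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
labels = ["A", "A", "B", "B"]
print(best_split(rows, labels, features))  # ('1p36.3', 0.0)
```

The second cut applies `best_split` again to the rows on one side of the first cut. In practice a library implementation would handle the recursion, stopping rules and pruning.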
Applied Biomedical Informatics using CytoGPS: A Case Study
Case Study: In-silico Drug Studies
1. Raw ISCN karyotypes.
2. Parse into a machine-readable construct.
3. Map ISCN aberrations to gene sets.
4. Map gene sets to known chemical reagent databases.
An end-to-end in-silico solution for drug studies:
• Significant cost savings.
• Rapid.
• A flatter learning curve to operate such a system compared to wet-lab testing.
Case Study: Map ISCN Aberration to Gene-Set
Ensembl RESTful web service endpoint (see http://beta.rest.ensembl.org/documentation/info/feature_region):
• Speaks JSON.
• A RESTful request looks like this: http://beta.rest.ensembl.org/feature/region/human/17:15700000-16000000?feature=gene;content-type=application/json
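The Python sketch below builds the request shown above for a genomic region and pulls gene symbols out of the JSON reply. The URL template is copied from the slide. The 2013 `beta.rest.ensembl.org` host and `feature/region` path have since been superseded (today's service is `rest.ensembl.org` with an `overlap/region` endpoint), so the template needs updating before use against the live service. The function names are hypothetical. Using the `external_name` field for gene symbols is an assumption based on current Ensembl REST output.

```python
import json
from urllib.request import urlopen

# Host from the slide (2013 beta service).
BASE_URL = "http://beta.rest.ensembl.org"


def region_url(chromosome: str, start: int, end: int,
               species: str = "human", base_url: str = BASE_URL) -> str:
    """Build a gene-feature request for chromosome:start-end, as on the slide."""
    return (f"{base_url}/feature/region/{species}/{chromosome}:{start}-{end}"
            "?feature=gene;content-type=application/json")


def fetch_features(url: str) -> list:
    """GET the URL and decode the JSON list of features (network access required)."""
    with urlopen(url) as response:
        return json.load(response)


def gene_symbols(features: list) -> list:
    """Sorted, de-duplicated gene symbols; assumes genes carry 'external_name'."""
    return sorted({f["external_name"] for f in features if f.get("external_name")})


print(region_url("17", 15700000, 16000000))
```

Separating URL construction from the network call keeps the band-to-region step testable offline. Only `fetch_features` touches the service.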
Case Study: Extracting Genetic Information
Zachary Abrams
Lori Dalton, PhD
Philip R. O. Payne, PhD
Arka Pattanayak
Raj Muthusamy, PhD
Nyla Heerema, PhD
William Kenworthy
Sarah Yousef
Alex Mysiw
Yuxiang Kou
Michael Berkovich