Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification...
Transcript of Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification...
![Page 1: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/1.jpg)
Uncovering the dynamics of genome function at the organismal scale using machine vision and computation
Tessa Durham Brooks, Ph.D. Doane College
Department of Biology
![Page 2: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/2.jpg)
Anticipation at the dawn of the Genomic Era
“Within the next few years, technologies developed for the Human Genome Project and similar sequencing efforts will revolutionize medicine, agriculture, crimefighting, and other fields.” – Gwynne and Page, Science, 2000
Genetronic Replicator – Star Trek: TNG, 1992
![Page 3: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/3.jpg)
The genomic powerhouses
~20,000 genes 97 mil base pairs Sequence finished 1998
~25,000 genes 100 mil base pairs Sequenced finished 2000
~22,000 genes 137 mil base pairs Sequence finished 2000
For reference: Humans have about 25,000 genes, 3.2 bil base pairs.
![Page 4: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/4.jpg)
The problem: at best functional roles have been assigned for 15% of predicted genes of a genome
For reference: Humans have about 25,000 genes, 3.2 bil base pairs.
~20,000 genes 97 mil base pairs Sequence finished 1998
~25,000 genes 100 mil base pairs Sequenced finished 2000
~22,000 genes 137 mil base pairs Sequence finished 2000
![Page 5: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/5.jpg)
Why has determining gene function in multicellular organisms been difficult?
![Page 6: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/6.jpg)
Why has determining gene function in multicellular organisms been difficult?
![Page 7: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/7.jpg)
Our goal: Describe genome function at the organismal scale.
Requirements:
• Observations should be made at sufficiently high spatial and temporal resolution
• Methods should be relatively high-throughput to allow genomic survey
• Observations should be able to be made over time and in many environmental contexts
~25,000 genes
![Page 8: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/8.jpg)
Root gravitropism: a model for image analysis approaches in functional genomics
![Page 9: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/9.jpg)
Root gravitropism: a model for image analysis approaches in functional genomics
0 2 4 6 8 10
0
20
40
60
80
100
glr3.3-1
glr3.3-2
SalkColT
ip A
ngle
(deg
)
Time (h)
First Order (swing rate)
Scal
e Sc
ale
Second Order (acceleration)
Time (h)
glr3.3-1 vs wt
glr3.3-2 vs wt
Miller, Durham Brooks, and Spalding 2010, Genetics
![Page 10: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/10.jpg)
Developing tools to detect genome function at the organismal scale.
Requirements:
Observations should be made at sufficiently high spatial and temporal resolution
• Methods should be relatively high-throughput to allow genomic survey
• Observations should be able to be made over time and in many environmental contexts
~25,000 genes
![Page 11: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/11.jpg)
Automation and High Throughput ver. 1
![Page 12: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/12.jpg)
Automation and High Throughput ver. 2 Doane Phytomorph
![Page 13: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/13.jpg)
• Recombinant inbred lines (RILs)
• Ecotypes (e.g. 1001 genomes project)
High throughput genetic stocks
Matthieu Reymond Max-Planck Institute
S T S T
QTL Analysis
![Page 14: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/14.jpg)
Developing tools to detect genome function at the organismal scale.
Requirements:
Observations should be made at sufficiently high spatial and temporal resolution
Method should be relatively high-throughput to allow genomic survey
• Observations should be able to be made over time and in many environmental contexts
~25,000 genes
![Page 15: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/15.jpg)
Phenotypes are plastic
Durham Brooks, Miller, and Spalding 2010, Plant Physiology
One ecotype
![Page 16: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/16.jpg)
Po
siti
on
(cM
)
Time (minutes)
LOD
Genomics analysis in a multi-dimensional condition space
Seed Size
Small Large
See
dlin
g A
ge
2d 164 lines X
15 indiv.
3d
4d
Moore, et al., unpublished result
![Page 17: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/17.jpg)
Developing tools to detect genome function at the organismal scale.
Requirements:
Observations should be made at sufficiently high resolution
Method should be relatively high-throughput to allow genomic survey
Observations should be able to be made over time and in many environmental contexts
• Cyberinfrastructure must facilitate the above
~25,000 genes
![Page 18: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/18.jpg)
Workflow
Data Capture
Data Capture
Data Storage (30 TB)
Data Compression and Analysis
Feature Extraction
Data Storage (X TB)
Schorr Center and OSG
0.5 TB/day
Data Compression
QTL analysis
![Page 19: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/19.jpg)
Data Compression
Scan Root Response
•Time Series of 220 uncompressed TIF’s •~225 MB each
Image Compression
(Condor on OSG)
•Time series of lossless -compressed PNG’s •~195 MB each
Auto Crop & Time Stamp
(Condor on OSG)
•Equalize dimensions •Insert the time stamp into the least significant bits of the first 14 pixels of each image
Video Compression (Local Grid)
• Compress to video using FFV1 codec •Uses a lossless intraframe codec •~160 MB/frame
Ship out to UW
![Page 20: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/20.jpg)
Feature Extraction
Compressed Data
•Decompress Data •Currently using FFV1 codec
Root tip Identification
•Isolate and track root tip •Currently image curvature features are used
Machine Learning
•Isolate plant’s Arial and ground tissue from image background •Currently Bayesian and SVM methods work well
Tip angle Measurement
• Linear regression on the root’s meristematic tissue Ship out to Doane
![Page 21: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/21.jpg)
QTL Analysis - Association of a phenotypic
value (e.g. root tip angle) with a genetic element
Compile Tip Angle Data
•Find mean tip angle at each time point for each genetic line
QTL Detection (Local Grid)
•Choose detection method •Optimize maximum likelihoods of trait data •Determine significance (LOD)
Determine Significance Threshold
(Condor on OSG)
•Permutation testing •Haley Knott or Multiple imputation (0.24 vs 63 CPU years per time point) •Determine max likelihood and LOD score of randomized data
Additional analysis
•Interactions, additive effects •Plasticity QTL •Conditional QTL (GxE effects)
Ship out to Doane
Ship out to Doane •Launch analysis locally •Bleed out to OSG
![Page 22: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/22.jpg)
Progress and Future Directions
• One small college has collected over 14,500 individual root gravitropic responses in six conditions (32 TB) in RIL population in 6 mo.
• We will finish collection from NILs (near-isogenic lines) - an additional 8,700 individuals, 19 TB in 1-2 mo.
• Begin image analysis and QTL analysis – dataset opens new doors in visualizing genomes
• Interdisciplinary undergraduate training opportunities
![Page 23: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/23.jpg)
Acknowledgments Doane College
Mike Carpenter (CIO), David Andersen, Dan Schneider Chris Wentworth (Physics) and Alec Engebretson (IST) Students: Amy Craig and Brad Higgins (Physics), Autumn Longo and Grant Dewey (Biochemistry), Tracy Guy, Miles Mayer, Halie Smith, Anthony Bieck, Sarah Merithew, Devon Niewohner, Muijj Ghani, Julie Wurdeman (Biology)
University of Wisconsin Edgar Spalding’s Laboratory: Nathan Miller, Candace Moore, Logan Johnson Miron Livny (CHTC)
UNL – Schorr Center and HCC David Swanson, Brian Bockelman, Carl Lundstedt
University of Florida Mark Settles
Computing Resources
Funding
PGRP - 1031416 EPSCoR - URE NE-INBRE
![Page 24: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning](https://reader035.fdocuments.net/reader035/viewer/2022063007/5fba61f144275c40b430efaf/html5/thumbnails/24.jpg)
Questions?