Bioinformatics and Connection to Computational Pathology
Fayyaz Minhas
Department of Computer Science
University of Warwick
Why Bioinformatics?
• How do we know that humans and chimpanzees share more than 95% of their DNA?
• Human Genome Project
How to compare?
Why Bioinformatics?
• The knapsack problem
  • Uses dynamic programming
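As a minimal sketch of the dynamic-programming idea named on this slide, the 0/1 knapsack problem can be solved by filling a table of best values per capacity. The function name `knapsack` and the example weights and values are illustrative assumptions, not taken from the slides.

```python
def knapsack(weights, values, capacity):
    """Return the best total value for a 0/1 knapsack via dynamic programming."""
    # best[c] = best value achievable with total weight <= c using the items seen so far
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Walk capacities downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Hypothetical items: weights 1, 3, 4, 5 with values 1, 4, 5, 7 and capacity 7.
print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # prints 9 (items of weight 3 and 4)
```

The same table-filling pattern underlies classic sequence alignment algorithms such as Needleman-Wunsch, which is one way to approach the "how to compare?" question for DNA.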
Why Bioinformatics?
• Tree of life
Why Bioinformatics?
• How are humans across the Earth related to each other?
• Human Genographic project
Why Bioinformatics?
• How can we screen for disease?
Why Bioinformatics?
• Personalized medicine
Why Bioinformatics?
• How can we fight against diseases like Cancer?
Handling viruses
• In Silico Prediction and Validations of Domains Involved in Gossypium hirsutum SnRK1 Protein Interaction with Cotton leaf curl Multan betasatellite encoded βC1
• βC1, pathogenicity determinant encoded by Cotton leaf curl Multan betasatellite interacts with calmodulin-like protein 11 (CML11) in Gossypium hirsutum
References:
Kamal, Hira, Fayyaz ul Amir Afsar Minhas, Hanu Pappu, Imran Amin, et al. "In Silico Prediction and Validations of Domains Involved in Gossypium hirsutum SnRK1 Protein Interaction with Cotton leaf curl Multan betasatellite encoded βC1." Frontiers in Plant Science 10 (2019): 656.
Kamal, Hira, Fayyaz Minhas, et al. "Bioinformatics and molecular analysis of Gossypium hirsutum calmodulin-like protein (CML11) interaction with begomovirus-transcription activator protein C2." PLoS ONE (in press).
Why Bioinformatics?
• How can we find out what the effects of a certain disease are?
Why Bioinformatics?
• How can we design new life?
https://www.ted.com/talks/craig_venter_unveils_synthetic_life
Molecular Biology Fundamentals
Genome: The ‘Program’
• Genome is the genetic material of an organism
• Deoxyribonucleic acid (DNA)
  • Encodes these genetic instructions
How is the program stored?
Watson & Crick with DNA model
Rosalind Franklin with X-ray image of DNA
Program size: DNA base pairs (bp)
| Group | Organism | # of base pairs | # of chromosomes |
|---|---|---|---|
| Virus | HIV | 9193 | 1 |
| Virus | SARS | 29751 | 1 |
| Virus | Porcine circovirus | 1759 | 1 |
| Prokaryotic | Haemophilus influenzae | 1.8×10^6 | 1 |
| Prokaryotic | Escherichia coli (bacterium) | 4.6×10^6 | 1 |
| Prokaryotic | Carsonella ruddii | 159,662 (0.16M) | 1 |
| Eukaryotic | S. cerevisiae (yeast) | 1.35×10^7 | 17 |
| Eukaryotic | Drosophila melanogaster (fly) | 1.65×10^8 | 4 |
| Eukaryotic | Homo sapiens (human) | 2.9×10^9 | 23 |
| Eukaryotic | Paris japonica | 150×10^9 | - |

http://www.nature.com/news/2006/061009/full/news061009-10.html
http://en.wikipedia.org/wiki/Genome
Execution of the program: Central dogma of molecular biology
• DNA → mRNA (RNA polymerase): (A,T,G,C) to (A,U,G,C)
• pre-mRNA → mRNA (splicing)
• mRNA → Protein (ribosome): RNA bases to amino acids, (A,U,G,C) to (A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y)
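The two alphabet changes on this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is taken to be the coding strand, splicing is ignored, translation starts at the first base, and the names `transcribe`, `translate`, and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA coding strand -> mRNA: the same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")  # toy sequence, chosen only for illustration
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```

Building the table from the 64-character string keeps the sketch short; a real pipeline would typically rely on an established library rather than a hand-rolled table.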
Proteins
Nonsense Mutation
• A point mutation in a sequence of DNA that results in a premature stop codon
• Protein product is incomplete or non-functional
• Beta-Thalassemia
  • Results from a single point mutation
  • HBB gene on chromosome 11
  • Reduction in production of hemoglobin
  • HBB blockage over time leads to decreased beta-chain synthesis
  • Having a single gene for thalassemia may protect against malaria
  • One of the most commonly inherited disorders in Pakistan
  • With a prevalence rate of 6% in the Pakistani population
  • 5000-9000 children every year
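To make the idea of a premature stop codon concrete, here is a small sketch that applies a single-base change to a toy coding sequence and compares the translated products. The sequence is invented for illustration and is not the real HBB gene; the helper names `translate` and `point_mutate` are assumptions.

```python
from itertools import product

# Standard genetic code over the DNA alphabet (TCAG order); "*" marks a stop codon.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
TABLE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        amino_acid = TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def point_mutate(dna, position, base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + base + dna[position + 1:]


normal = "ATGGTGCAGCTGACTTAA"          # toy sequence: ATG GTG CAG CTG ACT TAA
mutant = point_mutate(normal, 6, "T")  # CAG (Gln) becomes TAG (stop)

print(translate(normal))  # MVQLT
print(translate(mutant))  # MV
```

A single C to T substitution shortens the product from five residues to two, which is the sense in which the protein product is incomplete.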
What is going on in your body?
What can the cell do?
What is it doing?
How is it doing it?
Sequencing Technologies & Algorithms
Human Genome Project
• Started in 1990
• Objective: Sequence the human genome by 2005
• Achieved: 2000
• Government consortium
• Cost: $3 Billion
• Craig Venter’s Celera / Solexa
• $1000 genome project
• 1000 genomes project
Sequencing Technologies
First Generation
• Sanger Sequencing
2nd Generation (Next Generation Sequencing, NGS; Deep Sequencing; High-throughput sequencing): amplified single molecule sequencing, most widely used right now
• 454 Sequencing / Roche
  • GS Junior System
  • GS FLX+ System
• Illumina (Solexa)
  • HiSeq System
  • Genome Analyzer IIx
  • MiSeq
• Applied Biosystems - Life Technologies
  • SOLiD 5500 System
  • SOLiD 5500xl System
• Ion Torrent - Life Technologies
  • Personal Genome Machine (PGM)
  • Proton
3rd Generation (Next Next Generation Sequencing): single molecule sequencing
• Helicos
  • Helicos Genetic Analysis System
• Pacific Biosciences
  • PacBio RS
• Oxford Nanopore Technologies
  • GridION System
  • MinION
Pictured: HiSeq 2000
Steps in Sequencing
• DNA Extraction
• Preprocessing (Amplification, …)
• Sequencing
  • Shotgun sequencing
  • Reads
• Assembly
• Data analysis
Shotgun Sequencing: The case of exploding newspapers
Joining overlapping reads
Completing the overlap puzzle
Sequencing and newspaper explosions: 1
• Take (millions of) copies of the DNA you want to sequence
Sequencing and newspaper explosions: 2
• Fragment the DNA into smaller pieces
  • Because our sequencing technologies can only read very short fragments reliably
Sequencing and newspaper explosions: 3
• The short fragments resulting from DNA fragmentation are called reads
• Some reads disappear
Sequencing and newspaper explosions: 3
• We get the reads but we have no idea where they came from in the DNA
  • No position information
• Need to reconstruct the DNA sequence
Sequencing and newspaper explosions: 4
• Solve it as an overlap puzzle
Sequencing and newspaper explosions: 5
• Reconcile the pieces
Sequencing as a computational problem
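The overlap puzzle from the newspaper analogy can be sketched as a toy greedy assembler: repeatedly merge the pair of reads with the longest suffix-prefix overlap. This is only an illustrative heuristic under simplifying assumptions (error-free reads, no repeats, a single strand); practical assemblers use overlap or de Bruijn graphs. The function names and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping reads with no position information, in arbitrary order.
print(greedy_assemble(["GGCCAT", "AGCTTAG", "TTAGGCC"]))  # ['AGCTTAGGCCAT']
```

The quadratic pair search is fine for a handful of reads but would not scale to millions, which is part of why assembly is treated as a serious computational problem.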
Comparison of de novo assemblers
• Zhang, Wenyu, Jiajia Chen, Yang Yang, Yifei Tang, Jing Shang, and Bairong Shen. “A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies.” PLoS ONE 6, no. 3 (March 14, 2011). doi:10.1371/journal.pone.0017915.
Sequencing: Costs & Amount
http://sulab.org/2013/06/sequenced-genomes-per-year/
Applications of Sequencing
How sequencing machines work
• Input DNA/RNA Sample
• Output
  • FASTQ files: reads stored
  • Also have quality information
    • Phred quality scoring
    • Different machines use different encodings for quality scores
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
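The character run above is the printable ASCII range that FASTQ files use to encode per-base quality. As a minimal sketch, a quality string can be decoded by subtracting an offset from each character code; Phred+33 is assumed here (older Illumina pipelines used an offset of 64), and the record shown is a made-up example.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# A hypothetical 4-line FASTQ record: identifier, bases, separator, qualities.
record = ["@read1", "GATTACA", "+", "II5+!~?"]
name, bases, _, quality = record

for base, q in zip(bases, phred_scores(quality)):
    print(base, q, error_probability(q))
```

A score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1000.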
Genome Assembly
• Based on reads, assemble them into a genome
• Whole Genome Sequencing
  • Input: FASTQ file
  • Output: Genome
• Whole Exome or Targeted Sequencing
  • Input: FASTQ file of reads, Reference Sequence
  • Output: Genome
RNA-Seq: What are you doing?
• Input: Reference Genome, RNA reads
• Output: Alignment File (SAM or BAM)
  • Tells where each read is aligned
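To show what "where each read is aligned" looks like in practice, the sketch below parses the eleven mandatory tab-separated columns of SAM alignment lines and counts mapped reads per reference sequence. The example lines are invented, the helper names are assumptions, and BAM (the compressed binary form) would need a dedicated library instead.

```python
from collections import Counter

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one alignment line into a dict of the mandatory fields."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    return record


def count_mapped(lines):
    """Count mapped reads per reference name, skipping header lines that start with '@'."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        record = parse_sam_line(line)
        if not record["FLAG"] & 0x4:  # FLAG bit 0x4 means the read is unmapped
            counts[record["RNAME"]] += 1
    return counts


# Hypothetical SAM content: one header line, two mapped reads, one unmapped read.
EXAMPLE = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped(EXAMPLE))
```

Counting reads per gene rather than per reference sequence, which is what expression analysis needs, additionally requires gene coordinates from an annotation file.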
RNA-Seq
Differential expression
http://www.fejes.ca/labels/figures.html
Gene Expression meets Pathology
https://www.nejm.org/doi/full/10.1056/NEJMoa021967
Knowledge Transfer between spaces
• Platinum-based combination chemotherapy response in Ovarian Cancer
• 224 Cases (159 Sensitive, 65 Resistant)
• Both Gene Expression and H&E WSIs available
Patient-wise bootstrap ROC
Gene expression based: selected 227 genes, RBF SVM
Pathology imaging based (MIL-CNN):
• WSI (input space): 80K×80K at 20X
• Stain normalization (reference, source, stain normalized)
• Patch extraction: ROI patches of 512×512, top 50 scoring patches per slide
  • Score from: color factor (pink/purple), saturation and value factor, tissue quantity factor, tissue percentage
• Formation of bags for MIL
  • Positive bag (1 per “sensitive” patient)
  • Negative bag (1 per “resistant” patient)
  • If a tumor is sensitive, not all of it need be sensitive
  • If a tumor is resistant, then all (or at least most) of it is not sensitive
• MIL training for CNN, pretrained on breast cancer classification
• MIL-based loss: $\max\left(0,\ 1 - Y_B \max_{i \in B} f(\mathbf{x}_i; \boldsymbol{\theta})\right)$, where $B$ is a bag of patches $\mathbf{x}_i$, $Y_B$ is the bag label, and $f$ is the CNN with parameters $\boldsymbol{\theta}$
https://github.com/deroneriksson/python-wsi-preprocessing/blob/master/docs/wsi-preprocessing-in-python/index.md
https://arxiv.org/abs/1803.04054
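The MIL-based loss on this slide is a hinge loss on the highest-scoring patch in a bag. The sketch below evaluates it for one bag in plain Python, assuming bag labels of +1 (sensitive) and -1 (resistant); the patch scores stand in for CNN outputs and are made-up numbers, and `mil_hinge_loss` is a hypothetical name.

```python
def mil_hinge_loss(patch_scores, bag_label):
    """Hinge loss on the top-scoring patch: max(0, 1 - Y_B * max_i f(x_i))."""
    return max(0.0, 1.0 - bag_label * max(patch_scores))


# Stand-ins for CNN outputs f(x_i; theta) on the patches of one slide.
scores = [-0.5, 2.0, 0.1]

print(mil_hinge_loss(scores, +1))        # 0.0: one confident patch satisfies a positive bag
print(mil_hinge_loss(scores, -1))        # 3.0: a negative bag is penalized for that same patch
print(mil_hinge_loss([-2.0, -1.5], -1))  # 0.0: every patch scores below -1
```

Taking the maximum over patches matches the two assumptions on the slide: a positive bag needs only one patch to score high, while a negative bag is penalized unless all of its patches score low.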
Integration of genetic information changes tile selection
Figure: tiles selected in pathology space only vs. heterogeneous feature space, for cases TCGA-13-1488, TCGA-13-1497, TCGA-29-1705, and TCGA-25-2393.
Thoughts on the future
• Multi-view prediction!
• Understanding similarity in gene space and pathology space
• Linking pathways to pathology
• Understanding causal connections
Biology easily has 500 years of exciting problems to work on.
Knuth.