Comparative Assembly for Cancer Human Genome
Gao Song
2010/02/03
Content
◦ Background Knowledge
◦ Problem Description
◦ Framework of Solution
◦ Own Methods
◦ Results
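Background Knowledge
Paired-End Tag (PET)
◦ Concordant PET (CPET): both ends map to the reference with the expected distance and orientation
◦ Discordant PET (DPET):
  ◦ Distance or orientation is incorrect, or
  ◦ The two ends map to different chromosomes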
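A minimal sketch of the CPET/DPET distinction. The `PET` record, the `is_concordant` name, the 8-12 kb span window (chosen to suit a 10k library) and the '+'/'-' orientation rule are all assumptions for illustration; each real library defines its own expected span and orientation.

```python
from dataclasses import dataclass


@dataclass
class PET:
    """One paired-end tag: chromosome, position and strand of each end."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def is_concordant(pet: PET, min_span: int = 8_000, max_span: int = 12_000) -> bool:
    """True if the PET fits the assumed library model (a CPET), else it is a DPET."""
    # Ends on different chromosomes can never be concordant.
    if pet.chrom1 != pet.chrom2:
        return False
    # Order the ends by position so the check ignores read order.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pet.pos1, pet.strand1), (pet.pos2, pet.strand2)]
    )
    # Assumed orientation rule: left end on '+', right end on '-'.
    if (left_strand, right_strand) != ("+", "-"):
        return False
    # Assumed span window for a ~10 kb library.
    return min_span <= right_pos - left_pos <= max_span
```

Any PET that fails one of the three checks (chromosome, orientation, distance) is treated as a DPET.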
DPET Cluster
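One simple way to form DPET clusters, sketched under assumptions: each DPET is a hypothetical `(chrom1, pos1, chrom2, pos2)` tuple with its ends already in a consistent order, and a DPET joins the current cluster when both ends lie within an assumed `max_gap` of the previous member. This greedy grouping is a stand-in, not necessarily the clustering used in this work.

```python
from collections import defaultdict


def cluster_dpets(dpets, max_gap=1_000):
    """Greedily cluster DPETs given as (chrom1, pos1, chrom2, pos2) tuples.

    A DPET joins the current cluster when both of its ends lie within
    max_gap of the previous member's ends; otherwise it starts a new one.
    """
    # Only DPETs linking the same chromosome pair can share a cluster.
    by_pair = defaultdict(list)
    for chrom1, pos1, chrom2, pos2 in dpets:
        by_pair[(chrom1, chrom2)].append((pos1, pos2))

    clusters = []
    for (chrom1, chrom2), ends in by_pair.items():
        ends.sort()  # by first-end position, then second-end position
        current = [ends[0]]
        for pos1, pos2 in ends[1:]:
            last1, last2 = current[-1]
            if pos1 - last1 <= max_gap and abs(pos2 - last2) <= max_gap:
                current.append((pos1, pos2))
            else:
                clusters.append((chrom1, chrom2, current))
                current = [(pos1, pos2)]
        clusters.append((chrom1, chrom2, current))
    return clusters
```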
Problem Description
Given:
◦ Frequency of DPETs and CPETs along the reference genome
◦ DPET clusters
Requirement:
◦ Find rearrangements of the cancer genome compared with the normal human genome
◦ Current focus: amplicons
Framework of Solution
◦ Cut the reference genome wherever the CPET count is 0, which yields a set of large contigs
◦ Use DPETs to find breakpoints
◦ Use CPETs to check whether breakpoints are connected to each other
◦ Convert DPET clusters into edges in the graph
◦ Use high-copy edges to form the subgraphs of amplicons
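The first step of the framework, cutting the reference wherever CPET is 0, can be sketched as a single scan over per-bin CPET counts. The binned input and the `contigs_from_cpet` name are assumptions; contigs are returned as half-open bin intervals.

```python
def contigs_from_cpet(cpet_counts):
    """Split a chromosome into contigs at every bin whose CPET count is 0.

    cpet_counts[i] is the number of CPETs covering bin i. Returns half-open
    (start_bin, end_bin) intervals of consecutive bins with non-zero coverage.
    """
    contigs = []
    start = None
    for i, count in enumerate(cpet_counts):
        if count > 0 and start is None:
            start = i  # entering a covered run
        elif count == 0 and start is not None:
            contigs.append((start, i))  # a zero bin cuts the run
            start = None
    if start is not None:
        contigs.append((start, len(cpet_counts)))  # run reaches the end
    return contigs
```

For example, `contigs_from_cpet([0, 3, 4, 0, 0, 2, 1])` returns `[(1, 3), (5, 7)]`.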
[Figure: framework diagram linking Reference Genome, DPET, CPET, Original Contigs, DPET Start and End Breakpoints, Filtered Breakpoints, Small Contigs, Nodes, Edges and Graph]
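The last two steps (DPET clusters as edges, high-copy edges forming amplicon subgraphs) can be sketched as filtering edges by support and taking connected components. Here the nodes stand for small contigs, `copy` is assumed to be the number of DPETs behind an edge, and `min_copy=5` is a hypothetical cut-off for "high copy".

```python
from collections import defaultdict


def amplicon_subgraphs(edges, min_copy=5):
    """Keep edges with copy >= min_copy and return the connected
    components of the remaining graph as sets of nodes.

    edges: iterable of (node_a, node_b, copy) tuples.
    """
    adj = defaultdict(set)
    for a, b, copy in edges:
        if copy >= min_copy:
            adj[a].add(b)
            adj[b].add(a)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components
```

Low-copy edges are dropped before the search, so a weakly supported link cannot merge two amplicons into one subgraph.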
Own Methods - Naive
◦ Use the DPET frequency directly: choose a threshold and select breakpoints where the frequency exceeds it
Problem:
◦ How to choose the threshold
◦ Within amplicon regions, breakpoints are hard to find because the baseline DPET frequency is already high
[Figure: DPET frequency curve, chromosome 9]
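A sketch of the naive method, assuming the DPET frequency is given per bin; `naive_breakpoints` is a hypothetical name. It reports the first bin of every run at or above a fixed threshold, and the second call reproduces the amplicon problem: the raised baseline swallows the peak.

```python
def naive_breakpoints(dpet_freq, threshold):
    """Report the first bin of every run of bins whose DPET frequency
    is at or above a fixed threshold."""
    hits = []
    above = False
    for i, f in enumerate(dpet_freq):
        if f >= threshold and not above:
            hits.append(i)  # start of a new above-threshold run
        above = f >= threshold
    return hits


# Outside an amplicon a clear peak stands out...
print(naive_breakpoints([1, 1, 9, 1, 1], threshold=5))  # [2]
# ...but inside an amplicon the raised baseline hides the peak at bin 2.
print(naive_breakpoints([6, 6, 12, 6, 6], threshold=5))  # [0]
```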
Own Methods - Slope
◦ Use the slope (first derivative) of the DPET frequency curve
Problem:
◦ How to define the threshold
◦ Too many false positives
◦ Some DPET clusters are still missed
[Figure: chromosome 9]
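The slope variant can be sketched as a first-order finite difference between neighbouring bins; `slope_threshold` is an assumed parameter and only upward jumps are flagged. Because it compares neighbours rather than absolute levels, the peak on a raised baseline is found, but any noisy jump of the same size is reported too.

```python
def slope_breakpoints(dpet_freq, slope_threshold):
    """Flag bins where the DPET frequency rises by at least slope_threshold
    relative to the previous bin (a first-order finite difference)."""
    return [
        i
        for i in range(1, len(dpet_freq))
        if dpet_freq[i] - dpet_freq[i - 1] >= slope_threshold
    ]


# The peak on top of an amplicon baseline is now detected.
print(slope_breakpoints([6, 6, 12, 6, 6], slope_threshold=4))  # [2]
```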
Own Methods - Consider Ratio
◦ At a breakpoint, DPET frequency increases while CPET frequency decreases
◦ This can be used as an additional criterion
Problem:
◦ Another parameter!
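Adding the ratio criterion gives a second filter on top of the slope. In this sketch a bin is kept only if DPET rises by at least `slope_threshold` and DPET / (CPET + pseudocount) reaches `ratio_threshold`; both thresholds and the pseudocount are assumptions, and `ratio_threshold` is exactly the extra parameter noted above.

```python
def ratio_breakpoints(dpet_freq, cpet_freq, slope_threshold, ratio_threshold, pseudo=1.0):
    """Report bins where DPET rises sharply AND DPET is high relative to CPET."""
    hits = []
    for i in range(1, len(dpet_freq)):
        rise = dpet_freq[i] - dpet_freq[i - 1]
        # The pseudocount avoids division by zero where CPET coverage vanishes.
        ratio = dpet_freq[i] / (cpet_freq[i] + pseudo)
        if rise >= slope_threshold and ratio >= ratio_threshold:
            hits.append(i)
    return hits


# Breakpoint: DPET jumps while CPET drops, so it passes both tests.
print(ratio_breakpoints([6, 6, 12, 6, 6], [20, 20, 3, 20, 20], 4, 1.0))  # [2]
# Noise: DPET jumps but CPET stays high, so the ratio test rejects it.
print(ratio_breakpoints([2, 8, 2], [30, 30, 30], 4, 1.0))  # []
```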
New Methods of Finding Breakpoints
◦ Use the slope to find the threshold
◦ The previously missed point can now be found
Own Methods - Hypothesis Testing
◦ Localized checking using two consecutive windows
◦ Each window has a mean μ and a standard deviation σ
◦ Null hypothesis: σ2 (window2) is not significantly larger than σ1 (window1)
◦ Test with a binomial test
◦ Significance level: 0.05
[Figure: two consecutive windows, window1 and window2]
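The slides do not spell out how the binomial test is applied to the two windows, so the following is one possible reading, not the method itself: count how many window2 positions fall more than σ1 away from μ1, and test (one-sided, α = 0.05) whether that count is larger than the rate observed in window1 would predict. The function names and the "outside μ1 ± σ1" rule are hypothetical.

```python
from math import comb
from statistics import mean, pstdev


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def window2_exceeds(window1, window2, alpha=0.05):
    """Hypothetical one-sided binomial test on two consecutive windows.

    H0: window2 positions fall outside mu1 +/- sigma1 no more often than
    window1 positions do. Returns (reject_h0, p_value).
    """
    mu1, sigma1 = mean(window1), pstdev(window1)

    def outside(x):
        return abs(x - mu1) > sigma1

    # Background rate from window1, floored so it is never zero.
    p0 = max(sum(map(outside, window1)) / len(window1), 1 / len(window1))
    k = sum(map(outside, window2))
    p_value = binom_sf(k, len(window2), p0)
    return p_value < alpha, p_value


w1 = [5, 5, 6, 5, 6, 5, 6, 5, 6, 5]
w2 = [1, 12, 0, 15, 2, 14, 1, 13, 0, 16]
print(window2_exceeds(w1, w2)[0])  # True: window2 is far more spread out
```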
Some details:
◦ Check whether the cluster region is included in the window
◦ Not finished yet: calculating σ is time-consuming, because it has to be recalculated after each step
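One standard way to avoid recomputing σ from scratch after each step is to precompute prefix sums of x and x²; the mean and population standard deviation of any window then cost O(1). `window_stats_factory` is a hypothetical helper. With integer counts the prefix sums stay exact, whereas large floating-point values can lose precision in E[x^2] - μ^2.

```python
from itertools import accumulate
from math import sqrt


def window_stats_factory(values):
    """Precompute prefix sums of x and x**2 once, so the mean and standard
    deviation of any window values[start:end] cost O(1) instead of O(width)."""
    s1 = [0, *accumulate(values)]
    s2 = [0, *accumulate(x * x for x in values)]

    def stats(start, end):
        n = end - start
        total = s1[end] - s1[start]
        total_sq = s2[end] - s2[start]
        mu = total / n
        # Population variance E[x^2] - mu^2; clamp tiny negative rounding errors.
        var = max(total_sq / n - mu * mu, 0.0)
        return mu, sqrt(var)

    return stats


stats = window_stats_factory([5, 5, 6, 5, 6, 5, 6, 5, 6, 5])
mu, sigma = stats(0, 10)  # sliding the window only changes start/end
```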
Results (Slope)
Metric                              10k lib   20k lib
Number of subgraphs                 72        35
Max chromosomes in one subgraph     4         4
Average chromosomes per subgraph    1.18      1.23
Max edges in one subgraph           42        44
Average edges per subgraph          5.47      5.77
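The table's metrics can be computed from the subgraphs as sketched below. The representation is an assumption: each subgraph is a list of edges, and each node is a hypothetical `(chromosome, start, end)` tuple.

```python
def subgraph_summary(subgraphs):
    """Summarise subgraphs using the same metrics as the results table.

    Each subgraph is a list of (node_a, node_b) edges; each node is a
    (chromosome, start, end) tuple.
    """
    # Distinct chromosomes touched by any node of each subgraph.
    chroms = [len({node[0] for edge in g for node in edge}) for g in subgraphs]
    edges = [len(g) for g in subgraphs]
    n = len(subgraphs)
    return {
        "subgraphs": n,
        "max_chromosomes": max(chroms),
        "avg_chromosomes": sum(chroms) / n,
        "max_edges": max(edges),
        "avg_edges": sum(edges) / n,
    }
```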
One Special Case
[Figure: 10k library vs. 20k library]
Another example