Computational Skills for Biology Research, Lecture 11

Posted on 13-Feb-2017



Computational Skill for Modern Biology Research

Department of Biology, Chungbuk National University

11th Lecture 2015.11.24

NGS Analysis IV : Gene Set Analysis

Syllabus (week - topic)

Week 1 - Introduction : Why do we need to learn this stuff?

Week 2 - Basics of Unix and running BLAST on your PC

Week 3 - Unix Command Prompt II and shell scripts

Week 4 - Basics of programming (Python programming)

Week 5 - Python Scripting II and sequence manipulation

Week 6 - IPython Notebook and Pandas

Week 7 - Basics of Next Generation Sequencing and Tutorial

Week 8

Week 9 - Next Generation Sequencing Analysis I

Week 10 - Next Generation Sequencing Analysis II

Week 11 - Next Generation Sequencing Analysis III

Week 12 - Next Generation Sequencing Analysis IV

Week 13

Week 14

Differential Expression Data Mining

Sleuth

Analysis-Test Table

Download Table

Data Mining with IPython Notebook

Read ‘test_table.csv’ as a DataFrame
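
A minimal pandas sketch of this step. The real input is the table downloaded from sleuth; here a few hypothetical toy rows stand in for test_table.csv so the example runs on its own. The column names (target_id, pval, qval, b, mean_obs) are an assumption based on sleuth's results table, so check them against your own file.

```python
import io

import pandas as pd

# Hypothetical toy rows standing in for the downloaded 'test_table.csv'.
# Column names are assumed to follow sleuth's results table; verify them in your file.
TOY_CSV = """target_id,pval,qval,b,mean_obs
tx1,0.0001,0.001,2.5,5.1
tx2,0.2,0.4,0.3,3.2
tx3,,,,
tx4,0.00001,0.0005,-3.1,2.8
"""


def load_test_table(source) -> pd.DataFrame:
    """Read a sleuth-style test table from a path or a file-like object."""
    return pd.read_csv(source)


# With the real file you would call: df = load_test_table("test_table.csv")
df = load_test_table(io.StringIO(TOY_CSV))
print(df.head())
```

Empty fields (the tx3 row) become NaN, which is what the "remove datasets without data" step has to clean up.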

P Values

FDR (False Discovery Rate)

Mean expression level (logged)

Fold Change

Remove rows without data (missing values)

Filtering

Fold change is greater than 2 and FDR is less than 0.01

Observation (expression level is higher than 2)
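
The cleaning and filtering steps can be sketched as one function. Assumptions, all hypothetical: the fold change is stored as a log2 value in a column named log2fc (sleuth's b is also a log-scale effect, so convert it to the base you want first), "fold change greater than 2" is applied in both directions as |log2 FC| > 1 so down-regulated transcripts are kept as well, FDR is the qval column, and the observation cut-off uses the logged mean expression in mean_obs.

```python
import numpy as np
import pandas as pd


def filter_de(df: pd.DataFrame,
              fc_col: str = "log2fc",      # hypothetical column: log2 fold change
              fdr_col: str = "qval",       # FDR-adjusted p-value
              mean_col: str = "mean_obs",  # logged mean expression
              min_fold: float = 2.0,
              max_fdr: float = 0.01,
              min_mean: float = 2.0) -> pd.DataFrame:
    """Drop rows with missing values, then keep rows passing all three cut-offs."""
    clean = df.dropna(subset=[fc_col, fdr_col, mean_col])
    # |log2 FC| > log2(2) == 1 keeps both up- and down-regulated transcripts.
    passes_fc = clean[fc_col].abs() > np.log2(min_fold)
    passes_fdr = clean[fdr_col] < max_fdr
    passes_mean = clean[mean_col] > min_mean
    return clean[passes_fc & passes_fdr & passes_mean]


toy = pd.DataFrame({
    "target_id": ["tx1", "tx2", "tx3", "tx4", "tx5"],
    "log2fc": [2.5, 0.3, None, -3.1, 1.8],
    "qval": [0.001, 0.4, None, 0.0005, 0.02],
    "mean_obs": [5.1, 3.2, None, 2.8, 4.0],
})
print(filter_de(toy)["target_id"].tolist())  # ['tx1', 'tx4']
```

The target_id values of the result are the transcript IDs used in the later "extract transcripts" step.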

Read the abundance table for each sample

Save them as abundance.csv
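
A sketch of reading every sample's abundance table and saving one combined abundance.csv. It assumes each sample was quantified with kallisto into its own output directory containing abundance.tsv (columns target_id, length, eff_length, est_counts, tpm); the sample names and directory paths you pass in are hypothetical. The result is a long table in which the same transcripts repeat once per sample, with a sample column telling the samples apart.

```python
from pathlib import Path

import pandas as pd


def combine_abundances(sample_dirs: dict, out_csv: str = "abundance.csv") -> pd.DataFrame:
    """Stack each sample's kallisto abundance.tsv into one long table and save it as CSV.

    sample_dirs maps a (hypothetical) sample name to that sample's output directory.
    """
    frames = []
    for sample, directory in sample_dirs.items():
        table = pd.read_csv(Path(directory) / "abundance.tsv", sep="\t")
        table["sample"] = sample  # same transcripts, different samples: tag each row
        frames.append(table)
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_csv, index=False)
    return combined
```

For example, combine_abundances({"ctrl1": "kallisto/ctrl1", "treat1": "kallisto/treat1"}) with hypothetical paths writes abundance.csv, which pd.read_csv("abundance.csv") reads back in the next step.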

Read abundance Table in Pandas

Same Transcripts

Different Samples

Extract the IDs of transcripts with differential expression

Select the transcripts that meet the differential expression criteria

Reshape the DataFrame using ‘pivot’

Calculate the average TPM of each treatment across its samples, and filter the transcripts
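
These steps can be sketched as follows, assuming a long table with target_id, sample and tpm columns (like the combined abundance.csv) and a list of differentially expressed IDs from the filtering step. The sample-to-group mapping and the min_tpm=1.0 cut-off are hypothetical choices for illustration, not values from the lecture.

```python
import pandas as pd


def mean_tpm_by_group(abundance: pd.DataFrame, de_ids, groups: dict,
                      min_tpm: float = 1.0) -> pd.DataFrame:
    """Average TPM per group for differentially expressed transcripts, then filter.

    abundance: long table with 'target_id', 'sample' and 'tpm' columns.
    groups: hypothetical mapping such as {"control": ["ctrl1", "ctrl2"], ...}.
    """
    # Keep only transcripts that passed the differential-expression filter.
    subset = abundance[abundance["target_id"].isin(de_ids)]
    # Long -> wide: one row per transcript, one column per sample.
    wide = subset.pivot(index="target_id", columns="sample", values="tpm")
    # Average the replicate columns that belong to each group.
    means = pd.DataFrame({name: wide[cols].mean(axis=1) for name, cols in groups.items()})
    # Drop transcripts whose mean TPM is at or below min_tpm in every group.
    return means[(means > min_tpm).any(axis=1)]
```

pivot needs each (target_id, sample) pair to appear only once; if a sample was loaded twice, pivot raises an error instead of silently averaging.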

Draw Clustermap

Use the package ‘seaborn’ (install it if it is not already there)

From the command line: conda install seaborn
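
A minimal clustermap sketch, assuming a transcript-by-sample (or transcript-by-group) TPM table such as the pivoted table from the previous steps. The log2(TPM + 1) transform, the row z-scoring and the RdBu_r colour map are assumptions for illustration; with RdBu_r, red marks values above a transcript's own mean and blue values below it, which matches the red/blue reading of the slide.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; not needed inside a notebook

import numpy as np
import pandas as pd
import seaborn as sns


def draw_clustermap(tpm: pd.DataFrame, out_png: str = "clustermap.png"):
    """Hierarchically cluster rows (transcripts) and columns (samples) of a TPM table."""
    logged = np.log2(tpm + 1)  # pseudocount of 1 avoids log2(0)
    grid = sns.clustermap(
        logged,
        z_score=0,      # standardize each row, so colour shows high/low per transcript
        cmap="RdBu_r",  # red = above the row mean, blue = below
        center=0,
    )
    grid.savefig(out_png)
    return grid
```

Rows with identical values in every column become NaN after z-scoring and break the clustering, so remove them before calling the function.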

Clustermap

Red : overexpressed genes

Blue : downregulated genes

Zoom in on these regions

Find the gene names corresponding to the upper regions

Application of NGS technology

- DNA : Genome Sequencing

• Genome Sequence

• Personal Genome Sequencing : Variant Discovery

- RNA : RNA-Seq

• Expression levels of mRNA

- Anything Else?

- Epigenetic States of the Cell

• DNA methylation

• Histone methylation

- Transcription Factor Binding : ChIP Sequencing

- Chromatin Status

- RIP-Seq : RNA-Protein Interactions

Application of NGS technology

Application of NGS on Epigenetics

Epigenetics : changes in gene expression without sequence changes

During the development of an organism, cells go through various differentiation stages. Although they share the same DNA, they have different expression patterns.

How are these different expression patterns determined?

DNA Methylation

Histone Modification

Two Factors in epigenetics

Histone Modification

NGS and Epigenetics

- How can we detect DNA methylation or histone marks?

- DNA methylation : Bisulfite Sequencing

- Histone Mark : Chromatin Immunoprecipitation Sequencing (ChIP-Seq)

* How can we tell which Cs are methylated?

Bisulfite Sequencing

• Bisulfite treatment of DNA converts cytosine to uracil (read as T after PCR amplification)

• Methylated cytosine is resistant to bisulfite treatment and remains C
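
To make the chemistry concrete, a toy simulation of one strand: unmethylated C becomes T, methylated C stays C, and comparing an aligned read with the reference recovers the methylation state at each reference C. It assumes complete conversion, a single strand, 0-based positions and reads already aligned base-for-base without indels; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Simulate bisulfite treatment of one strand.

    Unmethylated C reads as T (C -> U, then T after PCR); a C at a 0-based
    position listed in `methylated` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, read: str) -> dict:
    """For each reference C, report True if the aligned read shows C (methylated), False if T."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


read = bisulfite_convert("ACGTCCGA", methylated={1})
print(read)                                # ACGTTTGA
print(call_methylation("ACGTCCGA", read))  # {1: True, 4: False, 5: False}
```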

Genome Wide BS-Seq

Analysis of Bisulfite Sequencing

1. Reference Sequence

CGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGG

2. C-to-T conversion of every C except those in CG (CpG) context

3. Converted sequence

CGGGCGTGGTGGCGCGCGTTTGTAATTTTAGTTATTCGGGAGGTTGAGGTAGGAGAATCGTTTGAATTCGGGAGGCGGAGGTTGTAGTGAGTCGAGATCGCGTTATTGTATTTTAGTTTGG

4. Align the sequencing reads to the converted sequence
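
The conversion step can be sketched as a small function: every C becomes T unless the next base is G. This encodes the simplifying assumption behind the converted sequence on this slide, namely that CpG cytosines are methylated and all other cytosines are not; it is an illustration, not the internals of any particular bisulfite aligner. Applied to the reference sequence above, it yields exactly the converted sequence shown.

```python
def convert_except_cpg(reference: str) -> str:
    """In-silico bisulfite conversion: C -> T everywhere except C followed by G (CpG)."""
    ref = reference.upper()
    converted = []
    for i, base in enumerate(ref):
        is_cpg = base == "C" and i + 1 < len(ref) and ref[i + 1] == "G"
        converted.append("T" if base == "C" and not is_cpg else base)
    return "".join(converted)


print(convert_except_cpg("CCGA"))  # TCGA
```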

Analysis of a small number of sequences

http://services.ibc.uni-stuttgart.de/BDPC/index.php

http://services.ibc.uni-stuttgart.de/BDPC/BISMA/examples_unique.php

<- sequencing data

<- reference data

DNA Methylation in Genome Browser

- Histone Mark : Chromatin Immunoprecipitation Sequencing (ChIP-Seq)

Histone Methylation

Histone Acetylation

-> align to the reference genome

We can determine which histone marks are present in which regions of the genome

Histone Mark

After Sequencing

Quality Control

Align to reference Genome

Analysis of alignment file (Finding Peak)

Motif Discovery / Secondary Analysis

ChIP results in Genome Browser

H3K4me3 : Mark for active Promoter

H3K27ac : Mark for active promoters and enhancers

Transcription Start

H3K27me3 : Inactive chromatin

ChIP with other factors

Transcription Factors

“Yamanaka Factors”

- Oct4, Sox2, Klf4, c-Myc (OSKM)

- Transcription factors that are abundantly expressed in embryonic stem cells

- Screened from 24 transcription factors expressed in ESCs

- Retroviral expression of these 4 genes in embryonic/adult fibroblasts transforms the cells into ‘stem cell-like’ cells

iPSC (induced Pluripotent Stem Cell)

Molecular event of induced pluripotency

Questions

How do we know which DNA sequences a specific transcription factor binds?

Electrophoretic Mobility Shift Assay (EMSA)

Binding of a protein to DNA slows down the DNA's migration in the gel

Label the DNA with a radioactive isotope

Drawbacks : low throughput; you cannot test at the genome-wide level

Genome Sequence

Target Site of Transcription Factor

Chromatin Immunoprecipitation Sequencing

Genome Sequence

Transcription factor binding sites are identified from the read depth (how many sequencing reads pile up at a given position)
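
A naive sketch of the read-depth idea: count how many mapped reads cover each position, then report stretches where the depth reaches a threshold. The reads are hypothetical (start, end) intervals, 0-based and half-open, on a single chromosome, and the threshold is arbitrary. Dedicated peak callers instead compare against an input control and a statistical background model, so treat this only as an illustration of why piled-up reads mark a binding site.

```python
def read_depth(reads, genome_length: int) -> list:
    """Depth at each position; reads are 0-based, half-open (start, end) intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1  # coverage begins here
        diff[end] -= 1    # and stops just before here
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def call_peaks(depth: list, min_depth: int) -> list:
    """Half-open (start, end) intervals where depth >= min_depth."""
    peaks, start = [], None
    for i, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = i
        elif d < min_depth and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(depth)))
    return peaks


reads = [(2, 6), (3, 7), (4, 8), (15, 18)]  # hypothetical mapped reads
depth = read_depth(reads, genome_length=20)
print(call_peaks(depth, min_depth=2))  # [(3, 7)]
```

The three overlapping reads pile up into one peak, while the lone read at 15-18 stays below the threshold.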

Sequence Mapping

Transcription Factor

Gene Expressed by Estrogen Stimulations

Transcription Factor Binding

Transcription

Chromatin Status

“Identifying which regions are tightly wound and which are not”

RIP-Seq

ChIP-Seq : find the DNA regions bound by a specific protein

Then, what about RNA? How can we find the RNAs bound by specific proteins?

RIP-Seq : RNA Immunoprecipitation Sequencing

http://rbpdb.ccbr.utoronto.ca//

Higher organisms have about 200-400 RNA-binding proteins

http://cistrome.org/dc