BIOINFORMATICS I, EXERCISE 2. mRNA processing.

http://icbi.at/bioinf

mRNA processing

splicing

Spliceosome assembly

[Figure: spliceosome assembly on the pre-mRNA. Labels: 5' splice site (GU), branch point (A), 3' splice site (YAG); U2AF; snRNPs U1, U2, U4, U5, U6]

+ ~200 non-snRNP proteins: hnRNP, SR proteins, RNA helicases, kinases and phosphatases, cyclophilins

Different levels of regulation

Regulation of transcription

Farnham, Nature Rev Genetics, 2009

ChIP procedure

[Figure: PPAR and RXR (each with domains A/B, C and E/F) bound to a PPRE in the DNA; PPRE sequence shown: AACTAGGTCAAAGGTCA]

microRNAs

http://www.mirbase.org/

Ensembl BioMart

UCSC Table Browser

Notepad++ and regular expressions

^>.*\r\n

^ beginning of line

> the character ">"

. any symbol

* 0 or more times

\r carriage return (CR)

\n line feed (LF)

Notepad++ and regular expressions

character meaning

\ escape; used to make specials non-special

() group; you can retrieve its contents e.g. with \1 for the first group

[] any character inside is considered a match

. matches any character

* match the previous character 0 or more times

+ match the previous character 1 or more times

{n} match the previous character n times

^ as the first character in the regex, means “beginning of line”; as the first character inside [], means “not”

$ as the last character in the regex, means “end of line”

\s any space character (space, tab)

\t tab (-->)

\r carriage return (CR)

\n line feed (LF)
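The elements in the table above can be tried out outside Notepad++ as well. The following is a minimal sketch using Python's `re` module as a stand-in; the helper names and the example strings are made up for illustration, and Python's `\s` also matches line breaks, which is slightly broader than the table entry.

```python
import re

def strip_non_bases(seq):
    # [^ACGT]: ^ inside [] means "not", so this removes everything except A, C, G, T
    return re.sub(r"[^ACGT]", "", seq)

def first_n(seq, n):
    # (.{n}) captures the first n characters; \1 puts the group back
    pattern = "^(.{" + str(n) + "}).*$"
    return re.sub(pattern, r"\1", seq)

def is_codon(seq):
    # [ACGT]{3}: exactly three characters from the class
    return re.fullmatch(r"[ACGT]{3}", seq) is not None

def split_fields(line):
    # \s+ : one or more whitespace characters (space, tab)
    return re.split(r"\s+", line.strip())

print(strip_non_bases("ACGTNNACGT"))   # ACGTACGT
print(first_n("ACGTNNACGT", 4))        # ACGT
print(is_codon("ACG"), is_codon("ACGT"))
print(split_fields("chr11\t 100"))
```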

Notepad++ and regular expressions

^[ACGT].*\r\n replace with (empty)

^(.{20}).*\r\n replace with \1\r\n

^>.*\r\n replace with (empty)

\r\n replace with (empty)

> replace with \r\n>

repeatMasking=none replace with \r\n

^>.*\r\n replace with (empty)

.*(.{20})$ replace with \1
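The search-and-replace steps above can be sketched in Python to see what each one does to a FASTA text with Windows line endings. This is only an illustration: the function names, the toy input and the order in which the steps are chained are assumptions, and the "last 20" pattern is adapted to end in `\r\n` because Python's `$` does not treat `\r` specially the way Notepad++ does.

```python
import re

def one_line_per_record(text):
    joined = text.replace("\r\n", "")                      # \r\n -> (empty)
    split = joined.replace(">", "\r\n>")                   # >    -> \r\n>
    split = split.replace("repeatMasking=none", "\r\n")    # header end -> line break
    return split.lstrip("\r\n") + "\r\n"

def drop_headers(text):
    # ^>.*\r\n -> (empty): delete every header line
    return re.sub(r"^>.*\r\n", "", text, flags=re.MULTILINE)

def first_20(text):
    # ^(.{20}).*\r\n -> \1\r\n: keep the first 20 characters of each line
    return re.sub(r"^(.{20}).*\r\n", r"\1\r\n", text, flags=re.MULTILINE)

def last_20(text):
    # adapted from .*(.{20})$ -> \1: keep the last 20 characters of each line
    return re.sub(r"^.*(.{20})\r\n", r"\1\r\n", text, flags=re.MULTILINE)

# Toy input (hypothetical record names, not real UCSC output)
fasta = (">s1 repeatMasking=none\r\nACGT\r\nTTAA\r\n"
         ">s2 repeatMasking=none\r\nGGGG\r\n")
sequences = drop_headers(one_line_per_record(fasta))
print(repr(sequences))
```

Lines shorter than 20 characters are left unchanged by `first_20` and `last_20`, because the patterns do not match them.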

Sequence Logo

http://icbi.at/logo

KEGG

Protein domains

UniProt, PROSITE, InterPro, Pfam, CD, SMART

Gene Ontology

• cellular component (e.g. mitochondrion)

• biological process (e.g. lipid metabolism)

• molecular function (e.g. hydrolase activity)

Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn, and a GO term

The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.

ISS Inferred from Sequence Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available

3 organizing principles

Evidence code

Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)
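The identifier format and the graph structure can be illustrated with a small sketch. The term names and edges below are a simplified toy graph made up for illustration, not the actual GO content; the point is only that a term may have more than one parent (a DAG, not a tree) and that ancestors are found by following both is_a and part_of edges.

```python
import re

def is_go_id(identifier):
    # GO:nnnnnnn -> "GO:" followed by exactly seven digits
    return re.fullmatch(r"GO:\d{7}", identifier) is not None

# Toy DAG: term -> list of (relation, parent term). Illustrative only.
TOY_DAG = {
    "mitochondrial inner membrane": [("is_a", "membrane"),
                                     ("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle")],
    "membrane": [("is_a", "cellular component")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}

def ancestors(term, dag):
    # Collect all terms reachable by following is_a / part_of edges upwards
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in dag.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(is_go_id("GO:0005739"))
print(sorted(ancestors("mitochondrial inner membrane", TOY_DAG)))
```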

Orthologs

Homologs: A – B – C

Orthologs: B1 – C1

Paralogs: C1 – C2 – C3

Inparalogs: C2 – C3

Outparalogs: B2 – C1

Xenologs: A1 – AB1

Protein A

Ortholog prediction

Ortholog databases

• YOGY (eukarYotic OrtholoGY) is a web-based resource that integrates 5 independent resources (Sanger)

• COG (Clusters of Orthologous Groups of proteins) and KOG for 7 eukaryotic genomes (NCBI)

• Inparanoid (Stockholm Bioinformatics Center)

• HomoloGene (NCBI)

• OrthoMCL uses the Markov Clustering algorithm (University of Pennsylvania)

Multiple sequence alignment (CLUSTALW)

Progressive tree alignment

Jalview

Exercise 2-1: REGULATORY GENOMICS

Pyruvate Carboxylase as example

Ensembl BioMart

1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, number of exons, Ensembl transcript ID, Ensembl gene ID, 3'UTR sequence as a FASTA file, and length of the 3'UTR.

microRNA target prediction

1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 of the microRNA hsa-mir-182?
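A seed-match search of this kind can be sketched as follows: take positions 2-8 of the microRNA, build the reverse complement, and look for it in the 3'UTR. The microRNA and UTR strings below are made-up placeholders, not the hsa-mir-182 or PC sequences; the real sequences would come from miRBase and from the BioMart export. The function names are hypothetical.

```python
def seed_site(mirna, start=2, end=8):
    # Positions are 1-based and inclusive, as in the exercise text
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    reverse_complement = "".join(complement[base] for base in reversed(seed))
    # Return the site in the DNA alphabet used by FASTA exports
    return reverse_complement.replace("U", "T")

def find_seed_sites(utr, mirna):
    # Return all 0-based start positions of the seed site in the UTR
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

toy_mirna = "UACGGAUCCAGUACGAUCGAAC"   # placeholder, not hsa-mir-182
toy_utr = "AAAGATCCGTTTT"              # placeholder, not the PC 3'UTR
print(seed_site(toy_mirna))
print(find_seed_sites(toy_utr, toy_mirna))
```

This only checks perfect Watson-Crick complementarity to the seed; dedicated target prediction tools apply further criteria.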

UCSC genome browser

1.3 Find the positions of the transcription start site and transcription end of pyruvate carboxylase (NM_000920) in the hg19 assembly.

Find splicing signals

1.4 Get the sequences (+10 bp/-10 bp) around the intron-exon borders and exon-intron borders of pyruvate carboxylase using the UCSC Table Browser and Notepad++.

1.5 Construct a sequence logo and a frequency plot in both cases. Can you identify (regulatory) sequence motifs?
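The numbers behind a frequency plot and a sequence logo can be sketched in a few lines: per-position base frequencies, and per-position information content in bits (2 minus the Shannon entropy, without small-sample correction). The aligned toy sequences and function names are assumptions for illustration, not the pyruvate carboxylase borders.

```python
import math
from collections import Counter

def position_frequencies(seqs):
    # All sequences must be aligned and of equal length
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts.values())
        freqs.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return freqs

def information_content(column):
    # 2 bits maximum for DNA, minus the Shannon entropy of the column
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

toy_sites = ["AGGT", "AGGT", "CGGA", "TGGT"]   # made-up aligned sequences
for i, column in enumerate(position_frequencies(toy_sites), start=1):
    print(i, column, round(information_content(column), 2))
```

Positions with high information content are the tall stacks in a logo, which is where conserved signals such as GU or AG would be expected to show up.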

Regulatory motifs (transcription factor binding sites)

1.6 Chromatin immunoprecipitation (ChIP-seq) experiments in a mouse cell line show that the transcription factor Pparg binds near the pyruvate carboxylase gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as a custom track in the UCSC genome browser and extract the sequence.
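Once the sequence of the binding region is extracted, it can be scanned for PPRE-like sites. The sketch below searches both strands for an exact direct repeat of AGGTCA separated by one nucleotide (DR1); this strict pattern and the function names are simplifying assumptions, since real binding sites usually deviate from the consensus and are better scored with a position weight matrix.

```python
import re

# Lookahead so that overlapping sites are also reported
DR1 = re.compile(r"(?=(AGGTCA[ACGT]AGGTCA))")

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_dr1(seq):
    # Returns (0-based start on the forward strand, strand, matched site)
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in DR1.finditer(seq)]
    for m in DR1.finditer(reverse_complement(seq)):
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return hits

# PPRE example sequence from the slide above
print(find_dr1("AACTAGGTCAAAGGTCA"))
```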

Exercise 2-2: PROTEIN FUNCTION

Identify function/processes/pathways for a protein

2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show pathway maps and find the enzyme ID (EC) using KEGG. Identify functional domains and the Gene Ontology annotation of the protein sequence using UniProt, PROSITE and Pfam.

Find orthologs and perform multiple sequence alignment

2.2 Find orthologous protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.