Machine Learning & Bioinformatics 1 Tien-Hao Chang (Darby Chang)

Molecular biology

Nucleic acid
– DNA
– RNA

Central dogma
– Transcription
– Translation

Protein
– Amino acid
– Primary structure
– Secondary structure
– Tertiary structure

Nucleic acid

A nucleic acid is a macromolecule composed of chains of monomeric nucleotides. In biochemistry, these molecules carry genetic information or form structures within cells. The most common nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).

http://juang.bst.ntu.edu.tw/BC2008/images/NA%20Fig1.jpg

Nucleic acid components

Sugar

http://www.mun.ca/biology/scarr/Fg10_09b_revised.gif

Nucleic acid components

Base
Purine
– Adenine (A) and guanine (G)
Pyrimidine
– Thymine (T) and cytosine (C)
– Uracil (U, only in RNA)

http://www.elmhurst.edu/~chm/vchembook/images/580bases.gif

http://fig.cox.miami.edu/~cmallery/150/chemistry/sf3x14a.jpg

DNA

Chemically, DNA is a long polymer of simple units called nucleotides, with a backbone made of sugars and phosphate groups joined by ester (phosphodiester) bonds. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information.

http://upload.wikimedia.org/wikipedia/commons/8/87/DNA_orbit_animated_small.gif
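
Since the information is carried by the order of the four bases, bioinformatics software usually treats DNA as a text string over the alphabet A, C, G, T. The following is a minimal Python sketch, not taken from the course material: the function name base_composition is hypothetical, and it assumes an unambiguous sequence (no N or other IUPAC ambiguity codes). It validates the string and counts each base, also totalling purines and pyrimidines as classified in the nucleic acid components above.

```python
from collections import Counter

DNA_BASES = "ACGT"
PURINES = {"A", "G"}      # adenine, guanine
PYRIMIDINES = {"C", "T"}  # cytosine, thymine (uracil takes thymine's place in RNA)


def base_composition(seq: str) -> dict:
    """Count each base in a DNA sequence and total the purines and pyrimidines.

    Assumes only the letters A, C, G, T (case-insensitive); anything else
    raises ValueError.
    """
    seq = seq.upper()
    invalid = set(seq) - set(DNA_BASES)
    if invalid:
        raise ValueError(f"not a DNA base: {sorted(invalid)}")
    counts = Counter(seq)
    return {
        "counts": {base: counts[base] for base in DNA_BASES},
        "purines": sum(counts[b] for b in PURINES),
        "pyrimidines": sum(counts[b] for b in PYRIMIDINES),
    }
```

For example, base_composition("CGCGAATT") reports two of each base, hence four purines and four pyrimidines.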

DNA

Base pairing
Each type of base on one strand forms a bond with just one type of base on the other strand. Here, purines form hydrogen bonds with pyrimidines: A bonds only with T, and C bonds only with G.

DNA sequence
– 5’CpGpCpApApTpT
  3’TpTpApApCpGpC
– CGCGAATT
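
Because each base has exactly one partner, either strand determines the other. The minimal Python sketch below applies the A-T / C-G pairing rule stated above; the function names are illustrative and do not come from a particular library. It also reverses the complement, because the two strands run in opposite directions, so the partner strand is conventionally written 5' to 3' by reading it backwards.

```python
# Base-pairing rule: A pairs with T, C pairs with G
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Partner base for every position, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5' to 3' (the two strands are antiparallel)."""
    return complement(strand)[::-1]
```

For instance, reverse_complement("CGCGAATT") gives "AATTCGCG", while the 12-base sequence CGCGAATTCGCG is its own reverse complement, so it pairs with an identical copy of itself.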

http://www.ucl.ac.uk/~sjjgsca/NucleotidePairing.jpg

http://www.coe.drexel.edu/ret/personalsites/2005/dayal/curriculum1_files/image001.jpg

Double helix

Hydrogen bond

A hydrogen bond is an attraction between an electronegative atom and a hydrogen atom that is covalently bonded to another electronegative atom. This type of force always involves a hydrogen atom, hence the name. Most hydrogen bonds are much weaker than covalent bonds; only the strongest ones, at around 155 kJ/mol, come close to the energy of weak covalent bonds.

Biological functions
– DNA/RNA base pairing
– protein secondary/tertiary structure formation
– some properties of water
– antibody-antigen (and other protein-protein) binding

http://upload.wikimedia.org/wikipedia/commons/4/43/Liquid_water_hydrogen_bond.png

Hydrogen bonds result from electronegativity

http://courses.biology.utah.edu/horvath/biol.3525/1_DNA/Fig2/marty_1.jpg

Grooves

DNA structure

http://www.youtube.com/watch?v=qy8dk5iS1f0&NR=1

Any Questions?

About DNA

http://fig.cox.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg

Central dogma

Central dogma

The process by which information is extracted from the nucleotide sequence of a gene and then used to make a protein is essentially the same for all living things on Earth, and it is described by the grandly named central dogma of molecular biology. Information in cells passes from DNA to RNA to proteins.

http://upload.wikimedia.org/wikipedia/commons/3/3a/Crick's_1958_central_dogma.svg

RNA

Information stored in DNA is used to make a more transient, single-stranded polynucleotide called RNA (ribonucleic acid). RNA is very similar to DNA but differs in a few important structural details:
– in the cell, RNA is usually single stranded, while DNA is usually double stranded
– RNA nucleotides contain ribose, while DNA nucleotides contain deoxyribose (a form of ribose that lacks one oxygen atom)
– in RNA, the base uracil substitutes for thymine, which is present in DNA

http://www.dadamo.com/wiki/dna-rna.png

Central dogma

Transcription
Transcription is the synthesis of RNA under the direction of DNA. Both nucleic acid sequences use the same language, and the information is simply transcribed, or copied. The DNA sequence is copied by RNA polymerase to produce a complementary RNA strand, called messenger RNA (mRNA).
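
In string terms, transcription applies the base-pairing rule with uracil in place of thymine. The minimal Python sketch below makes two assumptions that go beyond this slide: the DNA template strand is supplied 5' to 3', and RNA polymerase reads it 3' to 5' while building the RNA 5' to 3'. Under those assumptions the mRNA is the reverse complement of the template with U for T, which equals the other (coding) strand with T replaced by U. The function names are hypothetical.

```python
# RNA base laid down opposite each DNA template base (U replaces T in RNA)
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5'->3') synthesized from a DNA template given 5'->3'.

    The polymerase moves along the template from its 3' end, so we walk the
    template backwards and emit the complementary RNA base at each step.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_strand.upper()))


def coding_strand_to_mrna(coding_strand: str) -> str:
    """Shortcut: the mRNA matches the coding (non-template) strand, with U for T."""
    return coding_strand.upper().replace("T", "U")
```

For example, the template GGCCAT transcribes to AUGGCC, which is also what coding_strand_to_mrna("ATGGCC") returns for the matching coding strand.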

DNA transcription

http://www.youtube.com/watch?v=vJSmZ3DsntU

Transcription detail

http://www-class.unl.edu/biochem/gp2/m_biology/animation/m_animations/gene2.swf

RNA

Various types
mRNA
– messenger RNA (mRNA) is the RNA that carries information from DNA to the ribosome
– the coding sequence of the mRNA determines the amino acid sequence in the protein that is produced
Non-coding RNA

Various RNA types

Non-coding RNA
Many RNAs do not code for protein. These non-coding RNAs (ncRNAs) are encoded by their own genes (RNA genes) or within the introns of protein-coding genes. The most common ncRNAs are transfer RNA (tRNA) and ribosomal RNA (rRNA). Other ncRNAs, such as microRNA (miRNA), are involved in post-transcriptional gene regulation.

http://eurheartj.oxfordjournals.org/content/vol0/issue2010/images/large/ehp57301.jpeg

Central dogma

Translation
Translation is the second stage of protein biosynthesis. It occurs in the cytoplasm, where the ribosomes are located. In translation, mRNA is decoded to produce a specific polypeptide according to the rules specified by the genetic code.
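
The genetic code maps each three-base codon of the mRNA to one amino acid or to a stop signal. The minimal Python sketch below writes the standard genetic code compactly as a 64-letter string, one letter per codon in U, C, A, G order, and translates a single reading frame. It assumes the input already begins at the start codon and simply halts at the first stop codon; how a ribosome finds the start codon is not modelled. Names such as CODON_TABLE and translate are illustrative.

```python
from itertools import product

# Standard genetic code, one letter per codon; codons enumerated in U, C, A, G
# order with the third base varying fastest ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA coding sequence into one-letter amino acid codes.

    Assumes the sequence starts at the start codon and is read in that frame;
    translation halts at the first stop codon, and trailing bases that do not
    form a full codon are ignored.
    """
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is released
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, translate("AUGGCCUAA") returns "MA": methionine, alanine, then the UAA stop codon.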

From RNA to protein synthesis

http://www.youtube.com/watch?v=NJxobgkPEAo

Protein translation

http://www.youtube.com/watch?v=nl8pSlonmA0

http://biology.kenyon.edu/courses/biol114/Chap05/code.gif

Any Questions?

About central dogma

Protein

Protein

Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Proteins can also work together to achieve a particular function, and they often associate to form stable complexes.

Protein

Amino acid
In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups. In biochemistry, the term refers to alpha-amino acids with the general formula H2NCHRCOOH, where R is an organic substituent.

http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/AminoAcidball.svg/702px-AminoAcidball.svg.png

Amino acid

Various side chains
The various alpha-amino acids differ in which side chain (R group) is attached to their alpha carbon. Side chains vary in size from just a hydrogen atom in glycine, through a methyl group in alanine, to a large heterocyclic group in tryptophan.

http://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Aa.svg/2000px-Aa.svg.png

http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-7.JPG

http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-9.JPG

http://www.russell.embl-heidelberg.de/aas/other_images/lb3.gif

Amino acid

The building blocks of proteins
Amino acids combine in a condensation reaction, and the resulting amino acid residues are held together by peptide bonds. Proteins are defined by their unique sequence of residues (primary structure). Just as letters form a variety of words, amino acids form a vast variety of sequences, and hence of proteins.
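
Two small pieces of arithmetic make these points concrete. Each condensation reaction joins two residues and releases one water molecule, so a linear chain of n residues has n - 1 peptide bonds. And, assuming the 20 standard amino acids of the genetic code, there are 20^n possible sequences of length n, which is why the space of possible proteins is so vast. A minimal Python sketch (function names are hypothetical):

```python
# Assumption: the 20 standard amino acids of the genetic code
# (rarer ones such as selenocysteine and pyrrolysine are ignored).
STANDARD_AMINO_ACIDS = 20


def peptide_bonds(n_residues: int) -> int:
    """Peptide bonds in a linear chain of n residues; each one is formed by a
    condensation reaction that also releases one water molecule."""
    return max(n_residues - 1, 0)


def sequence_space(length: int, alphabet_size: int = STANDARD_AMINO_ACIDS) -> int:
    """Number of distinct residue sequences of a given length (exact integer)."""
    return alphabet_size ** length
```

A 100-residue chain therefore has 99 peptide bonds, and sequence_space(100) is a 131-digit number, about 1.27 × 10^130.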

http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Peptidformationball.svg/2000px-Peptidformationball.svg.png

http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-11.JPG

http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-13.JPG

Protein

From amino acids to proteins
Amino acids form short polymer chains called peptides, or longer chains called either polypeptides or proteins. The process of forming these chains from an mRNA template, following the genetic code, is known as translation, which is part of protein biosynthesis.

Protein structure hierarchy

http://cropandsoil.oregonstate.edu/classes/css430/lecture%209-07/figure-09-03.JPG

http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-4.JPG

http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-8.JPG

http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-9.JPG

Protein structure hierarchy

Secondary structure
In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids. It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure.
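
In sequence-based work such as secondary structure prediction, a protein's secondary structure is often written as one label per residue. The Python sketch below assumes a simplified three-state alphabet (H for helix, E for strand, C for coil or other), a common convention rather than something defined on this slide, and groups consecutive identical labels into the local segments the definition refers to. The function name segments is hypothetical.

```python
from itertools import groupby

# Assumed 3-state labels: H = helix, E = strand (beta sheet), C = coil/other
STATES = {"H": "helix", "E": "strand", "C": "coil"}


def segments(labels: str) -> list[tuple[str, int, int]]:
    """Split a per-residue secondary structure string into (state, start, end) runs.

    Positions are 0-based and `end` is exclusive, like Python slices.
    """
    runs = []
    start = 0
    for state, group in groupby(labels):
        length = len(list(group))
        if state not in STATES:
            raise ValueError(f"unknown secondary structure label: {state!r}")
        runs.append((state, start, start + length))
        start += length
    return runs
```

For example, the label string CCHHHHCCEEEC splits into a coil, a four-residue helix, another coil, a three-residue strand, and a final coil residue.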

http://juang.bst.ntu.edu.tw/BC2008/images/Protein(2)%202007/P2-3.JPG

Protein structure hierarchy

Tertiary structure
The three-dimensional structure of a protein or any other macromolecule, as defined by its atomic coordinates. It describes the spatial relations among the protein’s secondary structures. Tertiary structure is considered to be largely determined by the protein’s primary sequence.
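
Because tertiary structure is defined by atomic coordinates, spatial relations can be computed directly from them. One common representation, assumed here rather than taken from the slide, is a residue contact map: two residues are treated as in contact when their alpha-carbon atoms lie within a distance cutoff (around 8 Å is a frequent choice, but it is only a parameter). A minimal Python sketch with a hypothetical function name:

```python
import math

Coordinate = tuple[float, float, float]  # (x, y, z) in angstroms


def contact_map(ca_coords: list[Coordinate], cutoff: float = 8.0) -> list[list[bool]]:
    """Boolean matrix: entry [i][j] is True when residues i and j (i != j) have
    their alpha-carbon atoms within `cutoff` angstroms of each other."""
    n = len(ca_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True  # contacts are symmetric
    return contacts
```

Residues far apart in the primary sequence that still show up as contacts are exactly the long-range spatial relations that tertiary structure adds on top of secondary structure.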

Protein tertiary structure

Experimental techniques
The majority of protein structures have been solved with X-ray crystallography. The second most common method is NMR (nuclear magnetic resonance) spectroscopy:
– lower resolution
– limited to small proteins
– provides time-dependent information in solution

http://campusapps.fullerton.edu/news/arts/2003/photos/protein-art.jpg

Protein structure hierarchy

Quaternary structure
Many proteins are actually assemblies of more than one polypeptide chain, which in the context of the larger assembly are known as protein subunits. In addition to the tertiary structure of the subunits, multiple-subunit proteins possess a quaternary structure, which is the arrangement into which the subunits assemble.

http://courses.cm.utexas.edu/jrobertus/ch339k/overheads-1/ch6_quat-struct1.jpg

Protein sub-structure

Protein sub-structure

Domain
A domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain.
– about 25–500 amino acids (aa) long
– often forms a functional unit

http://upload.wikimedia.org/wikipedia/commons/6/67/1pkn.png

http://upload.wikimedia.org/wikipedia/commons/7/79/Zinc_finger_DNA_complex.png

Zinc fingers are small protein structural motifs that can coordinate zinc ions to help stabilize their folds

Protein sub-structure

Motif
A sequence motif is a nucleotide or amino acid sequence pattern that is widespread and often has biological significance. For proteins, a sequence motif is distinguished from a structural motif, which is formed by the three-dimensional arrangement of amino acids that may not be adjacent in the sequence.
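
Sequence motifs are often written as patterns with fixed positions, allowed alternatives, and wildcards, which map naturally onto regular expressions. The Python sketch below uses the N-glycosylation sequon (N, then any residue except P, then S or T) purely as an illustrative pattern; the function name find_motif is hypothetical. Wrapping the pattern in a zero-width lookahead makes the scan restart at every position, so overlapping matches are all reported.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match of a regex motif,
    including matches that overlap each other."""
    # The lookahead consumes no characters, so a match can begin at any position.
    finder = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in finder.finditer(sequence.upper())]


# Illustrative motif: N-glycosylation sequon, N + any residue except P + S or T
SEQUON = r"N[^P][ST]"
```

For example, find_motif("MNGTNPSNNST", SEQUON) reports NGT at position 1, NNS at 7, and the overlapping NST at 8, while NPS at position 4 is rejected because of the proline.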

Protein sub-structure

Structural motif
A three-dimensional structural element or fold that also appears in a variety of other molecules. In the context of proteins, the term is sometimes used interchangeably with “structural domain”, although a domain need not be a motif and, if it does contain motifs, need not be made up of only one.

http://www.biomedcentral.com/content/figures/1471-2164-8-60-8.jpg

http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-3.JPG

Molecular biology

Reference
Website of Prof. Juang (莊榮輝), National Taiwan University
– http://juang.bst.ntu.edu.tw/BC2008/index.htm
Molecular biology website, National Chiao Tung University
– http://www.life.nctu.edu.tw/~mb/c40101.htm

Any Questions?

About molecular biology