The University of Kansas
EECS 800 Research Seminar: Mining Biological Data
Instructor: Luke Huan
Fall, 2006
Mining Biological Data, KU EECS 800, Luke Huan, Fall ’06
9/25/2006, Protein Structures
Introduction
Protein
  A sequence from 20 amino acids
  Adopts a stable 3D structure that can be measured experimentally
  Lys Lys Gly Gly Leu Val Ala His
[Figure: one protein shown in four representations: ribbon, space filling, cartoon, and surface. Atom legend: oxygen, nitrogen, carbon, sulfur.]
Exponential Growth of Protein Structures
[Figure: Growth of Known Structures in Protein Data Bank, 1988 to 2005. x-axis: year; y-axis: number of structures (tick shown: 35,000). Two series: the total number of known protein structures, and the newly characterized proteins in that year.]
Protein Structure Space
http://www.nigms.nih.gov/psi/
Structure Space is Described Hierarchically
From SCOP: Structural Classification of Proteins (http://scop.berkeley.edu/)
Class
  Fold
    Superfamily
      Family
        Protein domains
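The hierarchy above can be sketched in code. SCOP labels each domain with a concise classification string (sccs) such as a.1.1.2, which reads as class, fold, superfamily, family. The following is a minimal sketch, not part of the original slides; the function names and the example identifiers are illustrative assumptions.

```python
from typing import NamedTuple


class ScopLevels(NamedTuple):
    """Cumulative identifiers for the four named SCOP levels of one domain."""
    scop_class: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs: str) -> ScopLevels:
    """Split an sccs string such as 'a.1.1.2' into its hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    cls, fold, superfamily, _family = parts
    return ScopLevels(
        scop_class=cls,
        fold=f"{cls}.{fold}",
        superfamily=f"{cls}.{fold}.{superfamily}",
        family=sccs,
    )


def deepest_shared_level(a: str, b: str) -> str:
    """Name the deepest level two domains share, or 'none' if the classes differ."""
    levels_a, levels_b = parse_sccs(a), parse_sccs(b)
    shared = "none"
    for name in ScopLevels._fields:
        if getattr(levels_a, name) != getattr(levels_b, name):
            break
        shared = name
    return shared


if __name__ == "__main__":
    # Illustrative identifiers only (hypothetical domains).
    print(parse_sccs("a.1.1.2"))
    print(deepest_shared_level("a.1.1.2", "a.1.1.3"))
```

Because each level's identifier includes its parents, two domains that share a family necessarily share the superfamily, fold, and class as well.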
SCOP Statistics
Class                                Number of folds   Number of superfamilies   Number of families
All alpha proteins                               218                       376                  608
All beta proteins                                144                       290                  560
Alpha and beta proteins (a/b)                    136                       222                  629
Alpha and beta proteins (a+b)                    279                       409                  717
Multi-domain proteins                             46                        46                   61
Membrane and cell surface proteins                47                        88                   99
Small proteins                                    75                       108                  171
Total                                            945                      1539                 2845
25973 PDB entries (July 2005). 70859 domains.
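As a quick consistency check, the per-class counts in the table can be summed and compared with the reported totals. This is a small sketch using only the numbers from the slide; the variable names are assumptions.

```python
# (folds, superfamilies, families) per SCOP class, copied from the table.
scop_counts = {
    "All alpha proteins": (218, 376, 608),
    "All beta proteins": (144, 290, 560),
    "Alpha and beta proteins (a/b)": (136, 222, 629),
    "Alpha and beta proteins (a+b)": (279, 409, 717),
    "Multi-domain proteins": (46, 46, 61),
    "Membrane and cell surface proteins": (47, 88, 99),
    "Small proteins": (75, 108, 171),
}

# Totals as reported in the last row of the table.
reported_total = (945, 1539, 2845)

# Sum each column across all classes.
computed_total = tuple(sum(column) for column in zip(*scop_counts.values()))

if __name__ == "__main__":
    print("computed:", computed_total)
    print("matches reported totals:", computed_total == reported_total)
```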
Amino Acids: Building Blocks of Proteins
20 Naturally-occurring Amino Acids
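For reference, the 20 standard amino acids can be listed by their conventional three-letter and one-letter codes. The sketch below is not from the original slides; it converts the example sequence from the Introduction slide (Lys Lys Gly Gly Leu Val Ala His) to one-letter form, and the function name is an assumption.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in sequence.split())


if __name__ == "__main__":
    print(to_one_letter("Lys Lys Gly Gly Leu Val Ala His"))
```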
Protein Secondary Structure
α Helix
Protein Secondary Structure
β strands
Top Level of Structure Space: Structure Classes
There are four major classes:
  α proteins
  β proteins
  α + β (anti-parallel β strands)
  α / β (parallel β strands)
Protein Folds
A protein fold is the way secondary structures are organized in a 3D structure.
Popular Folds
The eight most frequent SCOP folds
Superfamily and Family
Proteins within the same superfamily and family tend to have similar sequences and similar functions.
The Nature of Protein Structure Data
The ball-stick model is an element-based structure representation
  A structure is decomposed into a set of amino acids
  Protein geometry, topology, and attributes are defined with respect to the amino acid set
    Geometry is the coordinates of the amino acids
    Topology is the physico-chemical interactions of the residues
    Attributes are the physico-chemical properties of the residues
    …
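The three views above (geometry, topology, attributes) can be illustrated with a small data structure. This is a hedged sketch, not the representation used in the course: it assumes one coordinate per residue, uses a distance cutoff as a simple stand-in for physico-chemical interactions, and the coordinates, the 8.0 Å cutoff, and the `hydrophobic` flag are all made-up illustrative choices.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import List, Tuple


@dataclass(frozen=True)
class Residue:
    index: int                          # position in the sequence
    name: str                           # three-letter amino acid code
    coord: Tuple[float, float, float]   # geometry: one point per residue (assumed)
    hydrophobic: bool                   # attribute: one example property (assumed)


def contact_edges(residues: List[Residue], cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Topology as a graph: connect residue pairs closer than `cutoff`.

    The distance cutoff is an assumption standing in for real
    physico-chemical interactions between residues.
    """
    return [
        (a.index, b.index)
        for a, b in combinations(residues, 2)
        if dist(a.coord, b.coord) <= cutoff
    ]


# Toy structure with invented coordinates (not measured data).
toy_structure = [
    Residue(0, "Lys", (0.0, 0.0, 0.0), hydrophobic=False),
    Residue(1, "Leu", (3.8, 0.0, 0.0), hydrophobic=True),
    Residue(2, "Val", (7.6, 0.0, 0.0), hydrophobic=True),
    Residue(3, "Ala", (20.0, 0.0, 0.0), hydrophobic=True),
]

if __name__ == "__main__":
    print(contact_edges(toy_structure))
```

Keeping geometry and attributes on the residue and deriving topology as edges between residues mirrors the slide's point that all three are defined with respect to the amino acid set.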
Grand Challenges: Proteomics
[Figure: molecular interaction network leading to apoptosis and cell proliferation. Node labels: IL-3, IL-3R, IGF1, IGF1R, IRS1, RAS, PI3-K, AKT/PKB, BAD, Bcl-XL, FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, mitogen, Cyclin D1, Cyclin E, Cdk2, Cdk4, Cdc25A, pRb, E2F, P53, P21, P16, P27, P107, C-Myc, Max, Mad, Bin-1.]
Part of the biological system in a cell at the molecular level
Source: http://www.ircs.upenn.edu/modeling2001/,