Structure Modeling and Bioimage Informatics
Unit 26
BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
Abstracts – approximate guidelines
Motivation: Why do we care? (importance, difficulty, impact)
Problem statement: What problem are you trying to solve? What is the scope of your work?
Approach: How did you go about solving or making progress on the problem? What was the extent of your work?
Results: What's the answer?
Abstracts
Limits: one paragraph, ~150-200 words, one double-spaced page. More to include:
Numbers, if possible: how many genes, SNPs, sequence identity; xx percent faster, cheaper, smaller, better.
Conclusions: What are the implications? Have you found a path to change the world, was it a nice hack, or a road sign indicating that this path is a waste of time (all is useful!)? Can you generalize?
How will projects be graded?
Originality, structure, and scope
No copy/paste from the web, but it’s OK to reference the source: publications & websites
Proteins play key roles in a living system
Three examples of protein functions
Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes.
Example: Alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
Transport: Some proteins transport various substances, such as oxygen, ions, and so on.
Example: Haemoglobin carries oxygen.
Information transfer: For example, hormones.
Example: Insulin controls the amount of sugar in the blood.
Amino acid: Basic unit of protein
[Figure: an amino acid, with an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R) attached to the central carbon (C)]
Different side chains, R, determine the properties of the 20 amino acids.
The DSSP code
"Dictionary of Protein Secondary Structure"
G = 3-turn helix (3₁₀ helix). Min length 3 residues.
H = 4-turn helix (alpha helix). Min length 4 residues.
I = 5-turn helix (pi helix). Min length 5 residues.
T = hydrogen bonded turn (3, 4 or 5 turn)
E = extended strand in parallel and/or anti-parallel beta sheet conformation. Min length 2 residues.
B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
S = bend (the only non-hydrogen-bond based assignment)
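The codes and minimum lengths above can be read off a per-residue assignment string. The following is a minimal sketch, not the DSSP program itself: it assumes the assignment is already available as a string with one code per residue, and it uses "-" for unassigned residues, which is a common convention rather than something stated on the slide. The names `dssp_segments`, `short_segments`, and the example string are hypothetical.

```python
from itertools import groupby

# One-letter DSSP codes from the slide, with the minimum segment length
# where the slide states one (None = no minimum stated).
DSSP_CODES = {
    "G": ("3-10 helix", 3),
    "H": ("alpha helix", 4),
    "I": ("pi helix", 5),
    "T": ("hydrogen-bonded turn", None),
    "E": ("extended strand (beta sheet)", 2),
    "B": ("isolated beta-bridge", None),
    "S": ("bend", None),
}


def dssp_segments(ss):
    """Split a per-residue DSSP string into (code, start, length) runs."""
    segments = []
    pos = 0
    for code, run in groupby(ss):
        length = len(list(run))
        segments.append((code, pos, length))
        pos += length
    return segments


def short_segments(ss):
    """Return runs that are shorter than the minimum listed for their code."""
    flagged = []
    for code, start, length in dssp_segments(ss):
        _, min_len = DSSP_CODES.get(code, ("unassigned", None))
        if min_len is not None and length < min_len:
            flagged.append((code, start, length))
    return flagged


# Hypothetical assignment string, "-" = no assignment (assumed convention).
example = "--HHHHHHTT-EEEE-SS-GG-"
print(dssp_segments(example))
print(short_segments(example))
```

In this example the two-residue "GG" run is flagged because the slide lists a minimum of 3 residues for a 3₁₀ helix.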
Protein structure
Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)
20 amino acids
Glycine (G)
Glutamic acid (E)
Aspartic acid (D)
Methionine (M)
Threonine (T)
Serine (S)
Glutamine (Q)
Asparagine (N)
Tryptophan (W)
Phenylalanine (F)
Cysteine (C)
Proline (P)
Leucine (L)
Isoleucine (I)
Valine (V)
Alanine (A)
Histidine (H)
Lysine (K)
Tyrosine (Y)
Arginine (R)
Color key used in the figure: Yellow: Hydrophobic, Green: Hydrophilic, Red: Acidic, Blue: Basic
Proteins are linear polymers of amino acids
[Figure: two amino acids (side chains R1 and R2) condense to form a peptide bond, releasing H2O; a chain of residues with side chains R1, R2, R3 is joined by peptide bonds]
The amino acid sequence is called the primary structure (residues shown in the figure: A, A, F, N, G, G, S, T, S, D, K).
A carboxylic acid group condenses with an amino group with the release of a water molecule.
Amino acid sequence is encoded by DNA base sequence in a gene
・CGCGAATTCGCG・
・GCGCTTAAGCGC・
DNA molecule = DNA base sequence
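The two strands shown are related by base pairing (A with T, C with G). As a small sketch of that relationship, the code below computes the complement of the upper strand and compares it with the lower one; the function names are illustrative, and only the two sequences come from the slide.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


top = "CGCGAATTCGCG"      # upper strand from the slide
bottom = "GCGCTTAAGCGC"   # lower strand from the slide

print(complement(top) == bottom)        # the strands pair base by base
print(reverse_complement(top) == top)   # this fragment is palindromic
```

Both comparisons hold for this fragment: the lower strand is the complement of the upper one, and the sequence reads the same on either strand when read in opposite directions.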
Amino acid sequence is encoded by DNA base sequence in a gene

First                    Second letter                     Third
letter   T          C          A          G          letter
T        TTT Phe    TCT Ser    TAT Tyr    TGT Cys    T
         TTC Phe    TCC Ser    TAC Tyr    TGC Cys    C
         TTA Leu    TCA Ser    TAA Stop   TGA Stop   A
         TTG Leu    TCG Ser    TAG Stop   TGG Trp    G
C        CTT Leu    CCT Pro    CAT His    CGT Arg    T
         CTC Leu    CCC Pro    CAC His    CGC Arg    C
         CTA Leu    CCA Pro    CAA Gln    CGA Arg    A
         CTG Leu    CCG Pro    CAG Gln    CGG Arg    G
A        ATT Ile    ACT Thr    AAT Asn    AGT Ser    T
         ATC Ile    ACC Thr    AAC Asn    AGC Ser    C
         ATA Ile    ACA Thr    AAA Lys    AGA Arg    A
         ATG Met    ACG Thr    AAG Lys    AGG Arg    G
G        GTT Val    GCT Ala    GAT Asp    GGT Gly    T
         GTC Val    GCC Ala    GAC Asp    GGC Gly    C
         GTA Val    GCA Ala    GAA Glu    GGA Gly    A
         GTG Val    GCG Ala    GAG Glu    GGG Gly    G
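The codon table can be turned into a lookup that translates a DNA sequence three bases at a time. This is a minimal sketch under stated assumptions: it uses the standard genetic code in the same T, C, A, G ordering as the table, reads the coding strand from its first base (the reading frame is an assumption, since the slides do not say where a gene starts), and stops at the first stop codon. The function name `translate` is illustrative.

```python
from itertools import product

# Same ordering as the table: first, second, and third letter each run T, C, A, G.
BASES = "TCAG"
# One-letter amino acid codes in that codon order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# The 12-base fragment shown earlier, read in frame from its first base (assumed frame).
print(translate("CGCGAATTCGCG"))   # CGC GAA TTC GCG -> Arg Glu Phe Ala
```

Read this way, the fragment gives Arg-Glu-Phe-Ala ("REFA"); a different reading frame would give a different peptide.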
Gene is protein’s blueprint, genome is life’s blueprint
[Figure: the genome (DNA) contains many genes, and each gene encodes a protein. Example of proteins working together: the glycolysis network]
Each protein has a unique structure
Amino acid sequence
NLKTEWPELVGKSVEEAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAE
VPRVG
Folding!
Basic structural units of proteins: Secondary structure
[Figure: α-helix and β-sheet]
Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.
Three-dimensional structure of proteins
Tertiary structure
Quaternary structure
Close relationship between protein structure and its function
[Figure: example of an enzyme reaction. The enzyme matches the shape of substrate A (not B), binds to A, and digests A. Other examples: hormone receptor, antibody]
More Links
BLOCKS: http://blocks.fhcrc.org/
DAS: www.sbc.su.se/~miklos/DAS
EUCLID: www.pdg.cnb.uam.es/EUCLID/Full_Paper/homepage.html
Eva: Cubic.bioc.columbia.edu/eva
Jpred: www.compbio.dundee.ac.uk/~www-jpred/
LOC3D: cubic.bioc.columbia.edu/db/LOC3D
Pfam: http://www.sanger.ac.uk/Software/Pfam/
PredictProtein: www.predictprotein.org
ProfTMB: http://www.predictprotein.org/cgi-bin/var/bigelow/proftmb/query
PROSITE: http://expasy.org/prosite/
ProtFun: http://www.cbs.dtu.dk/services/ProtFun/
PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
PSORT: http://psort.nibb.ac.jp/
SAM-T99: discontinued
SOSUI: http://bp.nuap.nagoya-u.ac.jp/sosui/sosui_submit.html
TargetP: http://www.cbs.dtu.dk/services/TargetP/
Databases
PDB: www.rcsb.org/
MSD: http://www.ebi.ac.uk/msd/
MMDB: http://www.ncbi.nlm.nih.gov/Structure/MMDB
PDBSum: www.ebi.ac.uk/pdbsum/
TargetDB: targetdb.pdb.org/
PDBsum
Provides an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.
http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/GetPage.pl
More links
AbCheck - Antibody Sequence Test: http://www.bioinf.org.uk/abs/seqtest.html
Atlas of protein Side chain interactions: http://www.biochem.ucl.ac.uk/bsm/sidechains/index.html#
The beta-turn prediction server: http://www.biochem.ucl.ac.uk/bsm/btpred/index.html
CATH – protein structure classification: http://www.cathdb.info/latest/index.html
Protein Ligand Interactions: http://www.biochem.ucl.ac.uk/bsm/proLig/
DB Browser, including protein sequence/structure DBs: http://www.bioinf.man.ac.uk/dbbrowser/
Dictionary of Homologous superfamilies: http://www.biochem.ucl.ac.uk/bsm/dhs/
PROCAT – a DB of 3D enzyme active site templates: http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
DOMPLOT – annotation by ligands: http://www.biochem.ucl.ac.uk/bsm/domplot/
Enzymes Structure database: http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html
Gene3D: http://gene3d.biochem.ucl.ac.uk/Gene3D/
The Scorecons Server (scores residue conservation in a multiple sequence alignment): http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl
3D enzyme active site templates
PROCAT: http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
PROCAT has now been superseded by the Catalytic Site Atlas: http://www.ebi.ac.uk/thornton-srv/databases/CSA/
More Links
Protein Nucleic Acid interaction Server: http://www.biochem.ucl.ac.uk/bsm/DNA/server/
Protein DNA interaction, tax: http://www.biochem.ucl.ac.uk/bsm/prot_dna/prot_dna.html
SAS (Sequences Annotated by Structure): http://www.ebi.ac.uk/thornton-srv/databases/sas/
NACCESS – calculates residue accessibilities: http://www.bioinf.manchester.ac.uk/naccess/
The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file: http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html
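Tools that work from a PDB file start by reading atom coordinates out of its fixed-column ATOM and HETATM records. The sketch below shows how such a record can be sliced by column, following the column layout of the PDB file format. It says nothing about how SURFNET or NACCESS read their input; the `Atom` type, the function name, and the sample record are made up for illustration.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one ATOM/HETATM record using the fixed PDB columns (1-based in comments)."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return Atom(
        serial=int(line[6:11]),        # cols 7-11  atom serial number
        name=line[12:16].strip(),      # cols 13-16 atom name
        res_name=line[17:20].strip(),  # cols 18-20 residue name
        chain=line[21],                # col 22     chain identifier
        res_seq=int(line[22:26]),      # cols 23-26 residue sequence number
        x=float(line[30:38]),          # cols 31-38 x coordinate (angstroms)
        y=float(line[38:46]),          # cols 39-46 y coordinate
        z=float(line[46:54]),          # cols 47-54 z coordinate
    )


# Made-up sample record, assembled field by field so the columns are explicit.
SAMPLE_LINE = "".join([
    "ATOM  ",    # cols 1-6   record name
    "    1",     # cols 7-11  serial
    " ",         # col 12
    " N  ",      # cols 13-16 atom name
    " ",         # col 17     alternate location
    "MET",       # cols 18-20 residue name
    " ",         # col 21
    "A",         # col 22     chain
    "   1",      # cols 23-26 residue number
    "    ",      # cols 27-30 insertion code and padding
    "  27.340",  # cols 31-38 x
    "  24.430",  # cols 39-46 y
    "   2.614",  # cols 47-54 z
    "  1.00",    # cols 55-60 occupancy
    "  9.67",    # cols 61-66 temperature factor
    " " * 10,    # cols 67-76
    " N",        # cols 77-78 element
])

print(parse_atom_line(SAMPLE_LINE))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files can touch, for example a four-character atom name or a large negative coordinate.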
Prediction
Homology modeling: >30% sequence identity to a known structure
Threading – picks up where homology leaves off
Ab initio structure prediction
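The 30% figure refers to sequence identity between the target and a protein of known structure. As a rough sketch of how that number might be computed and used, the code below takes two already-aligned sequences, counts identical residues over the columns where neither has a gap (one of several conventions for percent identity, chosen here as an assumption), and applies the rule of thumb above. The function names and the `fold_recognized` flag are hypothetical, and the decision rule is a simplification rather than a fixed procedure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def suggest_method(identity, fold_recognized=True):
    """Rule of thumb from the slide: homology modeling above 30% identity."""
    if identity > 30.0:
        return "homology modeling"
    if fold_recognized:
        return "threading"   # picks up where homology leaves off
    return "ab initio"


# Hypothetical pairwise alignment of a target and a template.
identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, suggest_method(identity))
```

Here four of the five gap-free columns match, so the identity is 80% and the sketch suggests homology modeling.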
Validation
DSSP
PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
VADAR
Verify3D: http://nihserver.mbi.ucla.edu/Verify_3D/
Visualization
Cn3D
UCSF Chimera (MidasPlus)
Rasmol
ProteinExplorer
Bioimaging
NIH sites for image processing software: http://www.cc.nih.gov/cip/visualization/vis_packages.html
NIH IMAGE: http://rsb.info.nih.gov/nih-image/
Spider & Web: http://www.wadsworth.org/spider_doc/spider/docs/spider.html
EMAN: http://blake.bcm.tmc.edu/eman/eman1/
DICOM
The Digital Imaging and Communications in Medicine standard
For all medical imaging modalities, such as CT scans, MRIs, and ultrasound.
All image files which are compliant with Part 10 of the DICOM standard (available in DocSharing) are DICOM format files.
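A Part 10 file can be recognized by its header: a 128-byte preamble followed by the four characters "DICM". The sketch below checks only that signature. It does not parse the data elements that follow or validate the rest of the file, and the function names are illustrative.

```python
PREAMBLE_LENGTH = 128   # bytes before the prefix in a Part 10 file
DICOM_PREFIX = b"DICM"  # 4-byte prefix that follows the preamble


def is_dicom_part10(data: bytes) -> bool:
    """True if the bytes start with a 128-byte preamble followed by b'DICM'."""
    start = PREAMBLE_LENGTH
    end = PREAMBLE_LENGTH + len(DICOM_PREFIX)
    return len(data) >= end and data[start:end] == DICOM_PREFIX


def file_is_dicom_part10(path) -> bool:
    """Read just enough of a file to check the Part 10 signature."""
    with open(path, "rb") as handle:
        return is_dicom_part10(handle.read(PREAMBLE_LENGTH + len(DICOM_PREFIX)))


# Synthetic header for illustration: an all-zero preamble plus the prefix.
header = bytes(PREAMBLE_LENGTH) + DICOM_PREFIX
print(is_dicom_part10(header))
```

For real work, a dedicated DICOM library would read the pixel data and metadata; this check only helps to tell Part 10 files apart from other image formats.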
Disease models
Humans: Mutant gene → Mutant or missing protein → Mutant phenotype (disease)
Animal models: Mutant gene → Mutant or missing protein → Mutant phenotype (disease model)