Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)


Page 1: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Applied Bioinformatics

Dr. Jens Allmer

Week 1 (Introduction)

Page 2: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Your Instructor

• Education
  – BSc: University of Münster, 1996
  – MSc: University of Münster, 2002
  – PhD: University of Münster, 2006

• Worked at
  – Izmir Institute of Technology (since 2008)
  – Izmir University of Economics, Turkey (Feb 2007 – Aug 2008)
  – University of Münster, Germany (Jan 2006 – Feb 2007)
  – University of Pennsylvania, USA (Jan 2004 – Dec 2005)
  – University of Jena, Germany (Nov 2002 – Dec 2003)

Page 3: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Areas of Interest

• Bioinformatics
  – Sequences
  – Alignments

• Mass Spectrometry
  – De novo sequencing
  – Pattern matching

• Annotation
  – Integration
  – Automatic assessments

• General Automation and Productivity

Page 4: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Course Rules

• Attendance
  – Is essential and will be monitored strictly
  – if (absence > 12h) then NA;

• Make-up Work
  – None

Page 5: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Course Rules

• Lecture starts on time
  – If late, enter QUIETLY
  – If more than 5 min late, DO NOT ENTER; wait for the break

• Breaks are 10 min max
  – If late after a break, enter QUIETLY
  – If more than 5 min late, DO NOT ENTER; wait for the next break

• Early leave
  – Announce it before class and leave only if granted

Page 6: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Course Rules

• Project
  – Parts to be performed are published on the website and/or as slides
  – Deadline: 6pm on the day before the next class (you may submit early, of course)
  – No extension
  – No make-up
  – No extra work

• Must be submitted electronically to: [email protected]
  – Must be named ????_first_last.eee or it will not be accepted
  – Formats include: doc, ppt, odx, txt, html, ...
  – Not allowed: formats that I cannot edit, such as pdf, and similar formats that are not widespread
  – Must be significantly different from your classmates' submissions
  – Otherwise everyone involved will receive zero for that assignment

Page 7: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Grading

• All information available on class website

• Grading individualized
  – Quizzes 15%
  – Mind Maps 10%
  – Midterm 1 25%
  – Midterm 2 25%
  – Project 25%

Page 8: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Project

• Group Formation 0% (08.10. 18:00)
  – Group Size: 4
• First Draft 25% (22.10. 18:00)
• Results 15% (19.11. 18:00)
• Second Draft 20% (03.12. 18:00)
• Presentation 10% (25.12. 18:00)
• Final Version 25% (31.12. 18:00)

Page 9: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Grading

• I am responsible for evaluating you
  – I am not responsible for passing everyone or giving great grades

• Make it easy for me
  1. Show up and participate
  2. Do homework and pre-course preparations
  3. Midterm and Final will be easy for you if you adhere to 1. and 2.

Page 10: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Course Structure

– Start
– 10 min quiz
– 35 min lecture
– 5 min mind mapping
– 10 min break
– 50 min practice
– 10 min break
– 40-50 min lecture
– 10 min break
– 30 min practice

Page 11: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Textbooks

Primary audience: Junior bio majors

Course home page: http://www.biolnk.com/habf

ISBN: 978-605-133-297-0

http://www.idefix.com/kitap/biyoenformatik-1-dizi-kiyaslamalari-jens-allmer/tanim.asp?sid=GUFFOI44R7FJ9CIR6STU

Page 12: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Textbooks

Everything you currently need to know about Applied Bioinformatics in regard to practical problems you will encounter during everyday research.

Page 13: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Bioinformatics

[Diagram: Bioinformatics at the intersection of Mathematics/Statistics, Computer Science/Informatics, Biology/Molecular biology, Medicine, Chemistry, and Physics]

Page 14: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Bioinformatics is Multidisciplinary

[Diagram: fields connected to bioinformatics]

• Computer Science
• Math
• Statistics
• Structural Biology
• Phylogenetics
• Drug Design
• Genomics
• Molecular Life Sciences

Page 15: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

The Pyramid of Life (2000)

[Figure: pyramid with BIOINFORMATICS written along its side]

• Metabolomics: 1,400 chemicals
• Proteomics: 3,000 enzymes
• Genomics: 30,000 genes

Page 16: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

The Pyramid of Life

[Figure: pyramid]

• 100,000 proteins
• 30,000 genes
• 1,400 chemicals
• Protein interactions?

Page 17: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Bioinformatics (or Computational Biology)

• Not just the study of DNA or protein sequence data

• Inclusive definition – concerns the storage, display, reduction, management, analysis, extraction, simulation, modeling, fitting or prediction of biological, medical or pharmaceutical data

Page 18: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Basis of molecular life sciences

• Hierarchy of relationships (some exceptions):

Genome
  Gene 1 → Protein 1 → Function 1
  Gene 2 → Protein 2 → Function 2
  Gene 3 → Protein 3 → Function 3
  Gene X → Protein X → Function X

Page 19: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

How can one use bioinformatics to link diseases to genes?

• Positional cloning of genes
  1. Find genetic markers associated with disease
  2. Sequence DNA next to the markers
  3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
  4. Find abnormalities
  5. Predict gene function from sequence information

Disease → Map → Gene → Function
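Steps 3 and 4 of the positional cloning list can be illustrated with a toy comparison. This is only a minimal sketch: it assumes the two sequences are already aligned and of equal length, and the function name `find_differences` and the example sequences are hypothetical, not taken from the lecture.

```python
def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref_base, smp_base)
        for i, (ref_base, smp_base) in enumerate(zip(reference, sample))
        if ref_base != smp_base
    ]


# Hypothetical toy data: one substitution at 0-based position 3.
reference = "ACGTACGT"
sample = "ACGAACGT"
print(find_differences(reference, sample))
```

Real data would first need an alignment step, because insertions and deletions shift positions.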

Page 20: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Bioinformatics in the old days

• Close to Molecular Biology:
  – (Statistical) analysis of protein and nucleotide structure
  – Protein folding problem
  – Protein-protein and protein-nucleotide interaction

• Many essential methods were created early on
  – Protein sequence analysis (pairwise and multiple alignment)
  – Protein structure prediction (secondary, tertiary structure)

• Evolution was studied and methods created
  – Phylogenetic reconstruction (clustering, e.g., Neighbor Joining (NJ) method)
  – Nowadays also part of data mining

Page 21: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

But then the big bang….

Page 22: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

The Human Genome - 26 June 2000

Dr. Craig Venter

Celera Genomics

-- Shotgun method

Francis Collins (USA)/Sir John Sulston (UK)

Human Genome Project

Page 23: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Human DNA

• There are at least 3bn (3 × 10^9) nucleotides in the nucleus of almost all of the trillions (3.2 × 10^12) of cells of a human body (an exception is, for example, red blood cells, which have no nucleus and therefore no DNA): a total of ~10^22 nucleotides!

• Many DNA regions code for proteins and are called genes (as a base rule, 1 gene codes for 1 protein, but the reality is a lot more complicated)
  – Name examples

• Human DNA may contain ~27,000 expressed genes
  – Problems?

• Deoxyribonucleic acid (DNA) comprises 4 different types of nucleotides: adenine (A), thymine (T), cytosine (C) and guanine (G). These nucleotides are sometimes also called bases
  – Ambiguities?
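The total on this slide can be checked with a short calculation. This is a sketch using the slide's own round figures; the function and variable names are made up for illustration.

```python
def total_nucleotides(nucleotides_per_nucleus, nucleated_cells):
    """Multiply nucleotides per nucleus by the number of nucleated cells."""
    return nucleotides_per_nucleus * nucleated_cells


nucleotides_per_nucleus = 3 * 10**9  # "at least 3bn" from the slide
nucleated_cells = 32 * 10**11        # 3.2 x 10^12 cells from the slide

total = total_nucleotides(nucleotides_per_nucleus, nucleated_cells)
print(f"{total:.1e}")  # same order of magnitude as the slide's ~10^22
```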

Page 24: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Y-Chromosome

• 50% of the sequence consists of NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

• Not very meaningful
  – Explanation .... Same as in X chromosome
  – What about the N’s in chr 1?
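The share of undetermined positions (N) in a sequence can be measured with a few lines. This is a minimal sketch on a made-up string; the function name `n_fraction` is hypothetical, and a real chromosome would be read from a FASTA file rather than typed inline.

```python
def n_fraction(sequence):
    """Return the fraction of positions in the sequence that are 'N'."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


# Hypothetical toy sequence: 4 of 8 positions are undetermined.
print(n_fraction("ACGTNNNN"))
```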

Page 25: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Human DNA (Cont.)

• All people are different
• but the DNA of different people varies by only 0.2% or less
• So, only up to 2 letters in 1000 are expected to be different.
• Evidence from current genomics studies (Single Nucleotide Polymorphisms, or SNPs) implies that on average only 1 letter out of 1400 is different between individuals.
• Over the whole genome, this means that 2 to 3 million letters would differ between individuals.
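The arithmetic behind these bullets can be reproduced as follows. This is a sketch that assumes a genome of 3 × 10^9 letters, the round figure used earlier in the lecture; the function name is made up for illustration.

```python
GENOME_SIZE = 3_000_000_000  # assumed round figure of 3 x 10^9 letters


def expected_differences(genome_size, one_in):
    """Expected number of differing letters if 1 letter in `one_in` differs."""
    return genome_size / one_in


# Upper bound from the slide: 0.2% means 2 in 1000, i.e. 1 in 500.
print(expected_differences(GENOME_SIZE, 500))

# SNP-based estimate from the slide: 1 letter in 1400.
print(round(expected_differences(GENOME_SIZE, 1400)))
```

The SNP-based estimate lands just above 2 million, in line with the "2 to 3 million" stated on the slide.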

Page 26: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Modern bioinformatics is closely associated with genomics

• The aim is to solve the genomics information problem

• Ultimately, this should lead to a biological understanding of how all parts fit together (DNA, RNA, proteins, metabolites) and how they interact (gene regulation, gene expression, protein interaction, metabolic pathways, protein signaling, etc.)

Page 27: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Functional Genomics: From gene to function

[Figure: Genome, Expressome, Proteome, Metabolome, Interactome?; inset label: TERTIARY STRUCTURE (fold)]

Page 28: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Unknown Function

How much of the genome is defined?

Page 29: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

What is bioinformatics?

• E.g. Process the spots on a microarray, determine which genes are differentially expressed, link spots to sequence via a database, analyze the sequence using predictive tools, link the genes to related genes to form a network

[Diagram: Bioinformatics at the overlap of Comp sci, Bio, Math, Stats, Physics, Chem, and English]

• Machine learning
• Database systems
• Data mining
• Image processing
• Modeling
• Graph theory
• Statistical analysis
• Sequence
• Structure
• Interactions
• Regulation
• Genomes
• Evolution

Page 30: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

What is a bioinformatician?

• Somebody who knows everything

Page 31: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

What is a bioinformatician?

• A facilitator
  – Typically has a background in biology or CS, but is comfortable with concepts from other disciplines
  – Brings together ideas (or researchers) from different domains to solve a biological problem

• Conceptualize the problem
  – Use language appropriate to the domain

• Identify potential solutions
  – Understanding of different fields helps to identify possible approaches at a broad level

• Guide the development process
  – Create in-house or find potential collaborators to work on approaches in-depth

• Integrate results into overall solution
  – Software/method, results of biological analysis

Page 32: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

How is Bioinformatics Used?

Experimental proof is still the “Gold Standard”.

Bioinformatics isn’t going to replace lab work anytime soon

Bioinformatics is used to help “focus” the scientist on the bench top experiments

Page 33: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Bioinformatics

• Is application of computational tools in Biology Bioinformatics?

• Not really!

• In this course we will however only go into algorithmic details rarely (like today ;)

Page 34: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)
Page 35: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Mind Mapping

• Have you ever studied a subject or brainstormed an idea, only to find yourself with pages of information, but no clear view of how pieces fit together?

Mind mapping
– Helps you learn more effectively
– Improves memorization
– Enhances creativity
– Speeds up analyses
– Gives structure to complex ideas
– Records information for future use

Source: http://www.mindtools.com/pages/article/newISS_01.htm

Page 36: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

An Example Mind Map for MicroRNAs

Page 37: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

How to Mind Map

1. Identify the central topic and write it in the center

2. Write major parts of the topic on lines in all directions

3. Repeat 2. with ever finer level of detail until satisfied

Source: http://www.mindtools.com/pages/article/newISS_01.htm

Page 38: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Note Taking with Mind Maps

• Capture ideas organized into topics
  – What if the central topic which I chose is not the central topic?
  – Make a new mind map which captures the topic correctly

• Use Cases
  – Note taking in class
  – Recapitulation after lecture
  – Analysis of a new topic
  – Structuring of any intended writing

• When
  – During acquisition of new knowledge (faster than writing)
  – For review 5 min, 1 h, 6 h, 1 day, 7 days, 1 month after note taking

Page 39: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Mind Mapping Tips

1. Use single words or very short phrases

2. Write clearly and legibly

3. Use color!

4. Separate ideas (color, lines, shading)

5. Draw symbols and images

6. Draw links among elements

Page 40: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

A More Elaborate Mind Map

Source: http://www.mindtools.com/pages/article/newISS_01.htm

Page 41: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

At the Heart of Bioinformatics

Genomic

>scaffold_1152
GGTGCGGCCGTCCTCCAGCTGCTTGCCGGCGAAGATCAGGCGCTGCTGGTCCGGGGGGATGCCTGCATCCGGTGAGGAAACGCTCGTGTCAGACAAAGTGGGTGGGCGCAGGAAGCAGCAATCAACACAGCCCAGTGCAGCTGCAAAGCGCCCGCCTTACCACTGACCCGCCTGGCCACCCACCCCTACCCCCCGTAAGGAAAGAGCCCCGACTCACCCTCCTTGTCCTGAATCTTGGCCTTCACGTTCTCAATGGTGTCCGAAGACTCCACCTCGAGCGTGATGGTCTTGCCCGTCAGGGTCTTGACGAAGATCTGCATGCCACCGCGCAGGCGCAGCACCAGGTGCAG

Translated

>RF1_scaffold_1152
GAAVLQLLAGEDQALLVRGDACIR$GNARVRQSGWAQEAAINTAQCSCKAPALPLTRLATHPYPP$GKSPDSPSLS$ILARDVAHDFAKSSPR$YAPLIPQNLRC$SIEMKQPASLLSPIGEGACASHLQCLEKCLLP$GAIVYMIS$GSGRR$TSWVGIGGCNDGTEKRSEVDSRRGGKGNIHD
>RF2_scaffold_1152
VRPSSSCLPAKIRRCWSGGMPASGEETLVS AATAAKPQTWSPTAWEFKVGGRRKQQSTQPSAAAKRPPYH$PAWPPTPTPRKERAPTHPPCPESW SRSQWCPKTPPRA$WSCPSGS$RRSACHRAGAAPGAGSTPSGCCSQPGCGRPPAACRRRSGAAGPGGCLCVGGGGEGACASHLQCLEGE
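The step from the genomic record to the translated reading frames (RF1, RF2) can be sketched as below. This is an illustrative sketch, not the tool used to produce the slide: it assumes the standard genetic code, writes stop codons as `$` because the translated records above appear to do so, and uses the hypothetical name `translate`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, stop_symbol="$"):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Codons containing unknown letters (e.g. N) are written as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(stop_symbol if amino_acid == "*" else amino_acid)
    return "".join(protein)


# First bases of scaffold_1152 from the slide.
start = "GGTGCGGCCGTCCTCCAGCTGC"
print(translate(start, frame=0))  # begins like RF1 on the slide
print(translate(start, frame=1))  # begins like RF2 on the slide
```

A full six-frame translation would additionally translate the reverse complement in three frames, which is omitted here.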

Page 42: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Your Task

You may only compare 1 character at a time

You may create helpful structures

You should find the location of the pattern in the Sequence with a minimal number of comparisons

Try it for yourself

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG

Pattern: TGATGT


Page 43: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 1

Page 44: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 2

Page 45: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 3

Page 46: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 4

Page 47: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 6

Page 48: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 7-16

Page 49: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Brute Force Approach

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 17-22
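The brute force walk-through on the preceding slides can be sketched as follows. This is a minimal sketch rather than the lecturer's code: the function name `brute_force_search` is made up, positions are 0-based, and the comparison tally produced here need not coincide exactly with the slide animation, which may count steps differently.

```python
def brute_force_search(sequence, pattern):
    """Slide the pattern one position at a time, comparing left to right.

    Returns (position, comparisons); position is -1 if the pattern is absent.
    """
    comparisons = 0
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1  # one character comparison at a time
            if sequence[start + j] != pattern[j]:
                break
            j += 1
        if j == m:  # every pattern character matched
            return start, comparisons
    return -1, comparisons


SEQUENCE = "ACGGTAGTATGTGATGTATGATCGCGAAAGAGG"
PATTERN = "TGATGT"
position, comparisons = brute_force_search(SEQUENCE, PATTERN)
print(position, comparisons)
```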

Page 50: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Boyer-Moore Algorithm

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 1

• Preprocessing
• Good suffix matrix (m+1)
• Bad character matrix (m+1)

Page 51: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Boyer-Moore Algorithm

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 2

Page 52: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Boyer-Moore Algorithm

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 3-7

Page 53: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Boyer-Moore Algorithm

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 8

Page 54: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Boyer-Moore Algorithm

Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT

Comparisons: 9-15
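The idea behind the Boyer-Moore slides (compare from the right end of the pattern and skip ahead using a preprocessed table) can be sketched in simplified form. Note the simplification: this sketch uses only the bad character rule and omits the good suffix table mentioned on the slides, so its individual shifts may differ from the animation. The function name is hypothetical and positions are 0-based.

```python
def boyer_moore_bad_char(sequence, pattern):
    """Simplified Boyer-Moore search using only the bad character rule.

    Returns (position, comparisons); position is -1 if the pattern is absent.
    """
    n, m = len(sequence), len(pattern)
    if m == 0:
        return 0, 0
    # Preprocessing: rightmost index of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    comparisons = 0
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0:  # compare right to left
            comparisons += 1
            if sequence[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:  # every pattern character matched
            return start, comparisons
        # Align the mismatched text character with its rightmost occurrence
        # in the pattern, moving forward by at least one position.
        start += max(1, j - last.get(sequence[start + j], -1))
    return -1, comparisons


SEQUENCE = "ACGGTAGTATGTGATGTATGATCGCGAAAGAGG"
PATTERN = "TGATGT"
position, comparisons = boyer_moore_bad_char(SEQUENCE, PATTERN)
print(position, comparisons)
```

On this example the sketch needs fewer comparisons than a left-to-right scan that shifts by one position each time, which is the point the slides make about "helpful structures".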

Page 55: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Questions

Page 56: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Define Algorithm

Page 57: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)
Page 58: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Website

• http://mbg305.allmer.de

• Slides
• Homework
• Additional materials and challenges

• Grades

Page 59: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Website

• To see your grades you need to log in
• Some material may require login as well

• Currently
  – UserID = StudentID
  – Password = StudentID

• Change now
  – UserID = working email address
  – Password = whatever you will remember

Page 60: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Login to mbg305.allmer.de

• We will now assist you to log in and to add your email address and change your password.

Page 61: Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)

Assignments

– Research about Mind Maps
  • E.g.: http://en.wikipedia.org/wiki/Mind-map
  • IYTE library

– Make sure to read the lecture notes for next week (Available online on Wednesday)

– Read Chapters 1 and 2 from our textbook