Lecture 1 - web.njit.edu · Lecture 1 BNFO 136 Usman Roshan. Course overview •Pre-req: BNFO 135...


Lecture 1

BNFO 136
Usman Roshan


Course overview

• Pre-req: BNFO 135 or approval of instructor
• Python programming language
  – Some Unix basics
  – Input/output, lists, dictionaries, loops, if-else, counting
• Sequence analysis
  – Comparison of protein and DNA sequences
  – Scoring a sequence alignment
  – Picking the best alignment from a set
  – Computing a sequence alignment
  – Finding the most similar sequence to a query
• Phylogeny reconstruction
  – UPGMA and neighbor joining algorithms


Overview (contd)

• Grade: 15% programming assignments, two 25% midterms, and 35% final exam
• Recommended texts:
  – Introduction to Bioinformatics by Arthur M. Lesk
  – Python Scripting for Computational Science by Hans Petter Langtangen


Nothing in biology makes sense, except in the light of evolution

[Figure: an evolutionary tree drawn against a time axis running from -3 mil yrs to today. The ancestral sequence AAGACTT sits at the root, and the intermediate sequences AAGGCTT, T_GACTT, _GGGCTT, TAGACCTT, and A_CACTT appear at internal nodes.]

Present-day sequences at the leaves:

• ACCTT (Cat)
• ACACTTC (Lion)
• TAGCCCTTA (Monkey)
• TAGGCCTT (Human)
• GGCTT (Mouse)

The same sequences shown aligned, with "_" marking a gap:

• TAGGCCTT (Human)
• TAGCCCTTA (Monkey)
• A_C_CTT (Cat)
• A_CACTTC (Lion)
• _G_GCTT (Mouse)


Representing DNA in a format manipulatable by computers

• DNA is a double-helix molecule made up of four nucleotides, named for their bases:
  – Adenine (A)
  – Cytosine (C)
  – Thymine (T)
  – Guanine (G)
• Since A (adenine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine), knowing only one side of the ladder is enough.
• We represent DNA as a sequence of letters where each letter is A, C, G, or T.
• For example, we would represent the helix shown on the slide as CAGT.
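The pairing rule above is why one strand is enough: the other side of the ladder can be computed from it. A minimal sketch in Python follows; the names `PAIR` and `complement` are illustrative and not taken from the slides.

```python
# Base pairing as stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-paired partner of each letter, in the same order."""
    return "".join(PAIR[base] for base in strand)

print(complement("CAGT"))  # GTCA
```

Applying `complement` twice gives back the original strand, which is a quick way to check the pairing table.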


Transcription and translation


Amino acids

Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins.

For example, FLLVALCCRFGH (this is how we could store it in a file).

This sequence of amino acids folds to form a 3-D structure.


Protein structure

• Primary structure: sequence of amino acids.
• Secondary structure: parts of the chain organize themselves into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment.
• Tertiary structure: 3-D structure of the entire chain.
• Quaternary structure: complex of several chains.


Getting started

• FASTA format:

>human
ACAGTAT
>mouse
ACGTA
>cat
AGGTGAAA
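As a rough illustration of how such a file can be loaded, the sketch below parses the three records from the slide into a dictionary that maps each name to its sequence. The function name `read_fasta` is hypothetical, and the sketch assumes the simplest form of the format: a ">" line holding the name, followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence name -> sequence."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]          # header line: start a new record
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line  # sequence line: append to current record
    return sequences

example = """>human
ACAGTAT
>mouse
ACGTA
>cat
AGGTGAAA
"""

seqs = read_fasta(example.splitlines())
print(seqs)  # {'human': 'ACAGTAT', 'mouse': 'ACGTA', 'cat': 'AGGTGAAA'}
```

To read from disk instead, an open file can be passed directly, for example `read_fasta(open("seqs.fasta"))`, where the file name is a placeholder.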


Python basics

• Basic types:
  – Scalar: a number or a string
  – Lists: collection of objects
  – Dictionaries: collection of (key, value) pairs
• Reading sequences from a file into a data structure

• Basic if-else conditionals and for loops
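A small sketch that touches each item on this slide: scalars, a list, a dictionary, a for loop, and an if-else. The variable names and the helper `count_bases` are made up for illustration; the sequences are the ones from the FASTA example.

```python
name = "human"   # scalar: a string
length = 7       # scalar: a number

sequences = ["ACAGTAT", "ACGTA", "AGGTGAAA"]                         # list
by_name = {"human": "ACAGTAT", "mouse": "ACGTA", "cat": "AGGTGAAA"}  # dictionary

def count_bases(seq):
    """Count how often each letter occurs, using a for loop and if-else."""
    counts = {}
    for base in seq:
        if base in counts:
            counts[base] += 1
        else:
            counts[base] = 1
    return counts

print(count_bases(by_name[name]))  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```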


Some simple programs

• How many sequences in a FASTA file?
• What is the length and name of the longest and shortest sequence?
• What is the average sequence length?
• Verify that input contains DNA sequences.
• Compute the reverse complement of a DNA sequence.
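One possible way to answer these questions is sketched below. It assumes the sequences have already been read into a dictionary of name to sequence; here the dictionary is written out by hand using the FASTA example. The helper names `is_dna` and `reverse_complement` are illustrative.

```python
seqs = {"human": "ACAGTAT", "mouse": "ACGTA", "cat": "AGGTGAAA"}

def is_dna(seq):
    """True if seq is non-empty and uses only the letters A, C, G, T."""
    return len(seq) > 0 and set(seq.upper()) <= set("ACGT")

def reverse_complement(seq):
    """Complement each base, reading the strand in reverse order."""
    pair = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pair[base] for base in reversed(seq.upper()))

num = len(seqs)                                      # how many sequences
longest = max(seqs, key=lambda n: len(seqs[n]))      # name of longest
shortest = min(seqs, key=lambda n: len(seqs[n]))     # name of shortest
average = sum(len(s) for s in seqs.values()) / num   # average length

print(num)                                  # 3
print(longest, len(seqs[longest]))          # cat 8
print(shortest, len(seqs[shortest]))        # mouse 5
print(all(is_dna(s) for s in seqs.values()))  # True
print(reverse_complement(seqs["human"]))    # ATACTGT
```

Note that `is_dna` only checks the alphabet, so a protein that happened to contain only the letters A, C, G, and T would also pass.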


Two dimensional lists

• Represented as a list of lists
• For example, the matrix

23  4  1
12  5 12
 1  6 20

would be stored as [ [23, 4, 1], [12, 5, 12], [1, 6, 20] ]
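A short sketch of working with this matrix in Python. Indexing is `matrix[row][column]`, with both indices starting at 0; the variable names are illustrative.

```python
matrix = [[23, 4, 1], [12, 5, 12], [1, 6, 20]]

# Row 1, column 2 (counting from 0) is the last entry of the middle row.
print(matrix[1][2])  # 12

# Visit every entry with nested for loops.
total = 0
for row in matrix:
    for value in row:
        total += value
print(total)  # 84

# A column has to be collected one row at a time.
first_column = [row[0] for row in matrix]
print(first_column)  # [23, 12, 1]
```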