Computational Biology Dr. Jens Allmer Lecture Slides Week 5.

Posted on 29-Dec-2015


Computational Biology

Dr. Jens Allmer

Lecture Slides Week 5

MakeDB

• Example
  – makeblastdb -in seq.fasta -dbtype prot -out seqBl -title seqBlastDB

• More information?
  – Go to the doc folder of BLAST; the documentation is there
  – http://www.ncbi.nlm.nih.gov/books/NBK1763/

BLAST

• Now that we have an indexed database, try to run BLAST

• Read the documentation and try to solve the simplest case
  – You will need the indexed database and a FASTA file as the query
  – You could create queries from the database sequences and change them slightly

• Good luck
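A minimal Python sketch of the simplest case, under stated assumptions: the database was built from a protein FASTA file with the makeblastdb example (-dbtype prot -out seqBl), and the BLAST+ program blastp is on the PATH. It reads a sequence from the database FASTA, substitutes a few residues, and builds the blastp argument list. The helper names (parse_fasta, mutate, to_fasta, blastp_command) and the output file name hits.txt are hypothetical.

```python
import random

# The 20 standard amino acids; assumes a protein database (-dbtype prot)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mutate(seq, n_changes, seed=0):
    """Replace n_changes distinct positions with a different amino acid."""
    rng = random.Random(seed)
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), min(n_changes, len(residues))):
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[pos]])
    return "".join(residues)


def to_fasta(header, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence every `width` residues."""
    lines = [">" + header] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def blastp_command(query_path, db="seqBl", out_path="hits.txt"):
    """Argument list for a protein BLAST search against the makeblastdb output."""
    return ["blastp", "-query", query_path, "-db", db, "-out", out_path]
```

To try it, write to_fasta(header + " mutated", mutate(seq, 3)) to query.fasta and pass blastp_command("query.fasta") to subprocess.run(..., check=True). With only a few substitutions, the original database entry should still come out as the top hit, which makes this a quick sanity check.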

OMSSA

• Unzip the folder and check its contents
  – Alternatively, download from NCBI

• MS/MS spectra (.mgf file)
• Database file (FASTA)
• makeblastdb.exe
• omssacl.exe
• usermods.xml
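Before a search it helps to look at what the MGF file actually contains. Here is a minimal Python sketch of an MGF reader. It assumes the common layout: each spectrum sits between BEGIN IONS and END IONS, header lines are KEY=value (TITLE, PEPMASS, CHARGE, ...), and peak lines hold an m/z value optionally followed by an intensity. The function name parse_mgf is hypothetical, and the sketch is meant for inspection, not as a replacement for the parsers built into OMSSA or X!Tandem.

```python
def parse_mgf(text):
    """Parse MGF text into a list of spectra.

    Each spectrum is a dict with 'params' (KEY=value header lines such as
    TITLE, PEPMASS, CHARGE) and 'peaks', a list of (m/z, intensity) floats.
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:  # header line, e.g. PEPMASS=500.25
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:  # peak line: m/z [intensity]
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra
```

For example, len(parse_mgf(open("spectra.mgf").read())) gives the number of spectra that will be searched.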

OMSSA

Before running OMSSA, the database file must be converted into a BLAST-formatted database.

So let's run makeblastdb.exe to create the indexed database.

OMSSA

Here two different settings are used:
• The first uses a product ion tolerance of 0.05
• The second uses the default product ion tolerance
For variable modifications (-mv), check usermods.xml.

X!Tandem

• Unzip the folder and check its contents:
  – MGF-formatted spectra file
  – Database file (FASTA)
  – tandem-win32-10-12-01-1 folder
  – The .xml configuration files used (default_input.xml, input.xml and taxonomy.xml)

• To get the same output as the one given in the zip folder:
  – Replace the configuration files in the «tandem-win\bin» folder with the ones in the «used» folder.
  – Also copy the database file to the «fasta» folder and the .mgf file to the «bin» folder in «tandem-win».

X!Tandem Console Application

MBG404 Overview

(Diagram: Data; Generation, Processing, Storage, Mining, Pipelining)

X!Tandem Default Input

Parameters such as mass tolerances, enzyme type and the charge states to consider in the search can be reset in default_input.xml.
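Because default_input.xml is plain XML made of <note type="input" label="...">value</note> entries, a parameter can also be changed from a script. A minimal sketch with Python's xml.etree.ElementTree; the function names are hypothetical, and the label used in the usage note is an example from a typical default_input.xml, so check the exact labels in your own copy. Note that ElementTree drops XML comments and the XML declaration when the document is written back.

```python
import xml.etree.ElementTree as ET


def _find_input_note(root, label):
    """Return the <note type="input"> element with the given label, or None."""
    for note in root.iter("note"):
        if note.get("type") == "input" and note.get("label") == label:
            return note
    return None


def set_parameter(xml_text, label, value):
    """Return xml_text with the labelled input parameter set to value."""
    root = ET.fromstring(xml_text)
    note = _find_input_note(root, label)
    if note is None:
        raise KeyError(f"no input parameter labelled {label!r}")
    note.text = str(value)
    return ET.tostring(root, encoding="unicode")


def get_parameter(xml_text, label):
    """Return the current value of the labelled input parameter, or None."""
    note = _find_input_note(ET.fromstring(xml_text), label)
    return None if note is None else note.text
```

For example, set_parameter(text, "spectrum, fragment monoisotopic mass error", 0.05) returns the edited document as a string, which can then be saved over a copy of default_input.xml.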

X!Tandem Input.xml

In the input.xml file, you should specify the paths of:
• taxonomy.xml
• default_input.xml
• the spectra file
• the output file
NOTE: Here input.xml and all the files above are in the same folder (directory).
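A sketch that writes such an input.xml, assuming the usual X!Tandem label strings for these paths ("list path, default parameters", "list path, taxonomy information", "spectrum, path", "output, path"). It also sets "protein, taxon", which has to match a taxon label in taxonomy.xml; that entry is an addition not listed on the slide. The function name and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_input_xml(spectra, output, taxon,
                   defaults="default_input.xml", taxonomy="taxonomy.xml"):
    """Build an X!Tandem input.xml document as a string."""
    root = ET.Element("bioml")
    params = [
        ("list path, default parameters", defaults),
        ("list path, taxonomy information", taxonomy),
        ("protein, taxon", taxon),          # must match a <taxon label> in taxonomy.xml
        ("spectrum, path", spectra),
        ("output, path", output),
    ]
    for label, value in params:
        note = ET.SubElement(root, "note", type="input", label=label)
        note.text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Since all files sit in the same folder here, plain file names are enough, e.g. make_input_xml("spectra.mgf", "output.xml", "mytaxon").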

X!Tandem Taxonomy

In the taxonomy file, you should specify the «database file path». In this example, the database file is in the «fasta» folder inside the «Xtandem\tandem-win32-10-12-01-1» folder.
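A matching sketch for taxonomy.xml, assuming the usual layout in which a <taxon> element maps a label to one or more <file format="peptide" URL="..."/> entries. The function name, the taxon label and the FASTA path are hypothetical; the label must equal the value of the "protein, taxon" parameter in input.xml.

```python
import xml.etree.ElementTree as ET


def make_taxonomy_xml(taxon, fasta_paths):
    """Build a taxonomy.xml string mapping one taxon label to database files."""
    root = ET.Element("bioml", label="x! taxon-to-file matching list")
    node = ET.SubElement(root, "taxon", label=taxon)
    for path in fasta_paths:
        # one <file> entry per database file searched for this taxon
        ET.SubElement(node, "file", format="peptide", URL=path)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

For the folder layout on the slide, the path would point into the «fasta» folder, e.g. make_taxonomy_xml("mytaxon", ["fasta/mydb.fasta"]).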

X!Tandem Output

Console Applications

Why

HTML

• What you need to know about HyperText Markup Language (HTML)

• How to reach it
  – Right-click the document in your browser
  – Make sure you do not click on an image, link or some other non-HTML element
  – Choose View Source or View Page Source.

• What’s in the source

• Sometimes things are not visible/ accessible on the web page but can be retrieved from the source

HTML Structure

<HTML>
  <HEAD>
    <TITLE>Page title seen in the title bar</TITLE>
    <!-- Some other links and scripts can be here -->
  </HEAD>
  <BODY>
    Text and other visible elements go here
  </BODY>
</HTML>

HTML Input

<FORM action="destination" method="POST/GET">
  <INPUT type="TYPES" name="" id="" value="" />
  <TEXTAREA name="" id="">value</TEXTAREA>
  <SELECT name="" id="">
    <OPTION value="">display</OPTION>
  </SELECT>
</FORM>

TYPES: { text, password, checkbox, radio, submit, reset, file, hidden, image, button }

Why?

• Why do you need this information?

• Some information may be inaccessible on the website
• In the HTML code it will be accessible

• Sometimes you may be interested in all settings for the programs that you used online

• Often these settings are in hidden input fields (you need to check the source then)
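Checking the source by eye works for one page; for many pages, the hidden fields can be collected with a short script. A minimal sketch using Python's standard html.parser module. It expects the page source as a string (for example saved via View Page Source, or downloaded with urllib.request). The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <INPUT TYPE=...> matches too;
        # self-closing <input ... /> tags are routed here by the default handle_startendtag
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden":
            name = attributes.get("name") or attributes.get("id") or ""
            self.hidden[name] = attributes.get("value") or ""


def hidden_fields(html_text):
    """Return {name: value} for every hidden input field in html_text."""
    parser = HiddenInputCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hidden
```

Running hidden_fields on a saved results page lists the settings that were sent along with the visible form fields.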

NCBI Blast

• Contains many hidden variables; here are some:

Theory I

MBG404 Overview

(Diagram: Data; Generation, Processing, Storage, Mining, Pipelining)

Database Management Systems

Database Management Systems

Company/Organization           | Database Size (GB) | DBMS         | System Arch. | DBMS Vendor | System Vendor | Storage Vendor
France Telecom                 | 29,232             | Oracle       | SMP          | Oracle      | HP            | HP
AT&T                           | 26,269             | Daytona      | SMP          | AT&T        | Sun           | Sun
SBC                            | 24,805             | Teradata     | MPP          | Teradata    | NCR           | LSI
Anonymous                      | 16,191             | DB2 for Unix | MPP/Cluster  | IBM         | IBM           | IBM
Amazon.com                     | 13,001             | Oracle       | SMP          | Oracle      | HP            | HP
Kmart                          | 12,592             | Teradata     | MPP          | Teradata    | NCR           | LSI
Claria Corporation             | 12,100             | Oracle       | SMP          | Oracle      | Sun           | Hitachi
Health Insurance Review Agency | 11,942             | Sybase IQ    | Cluster      | Sybase      | HP            | Hitachi
FedEx Services                 | 9,981              | Teradata     | MPP          | Teradata    | NCR           | EMC
Vodafone D2 GmbH               | 9,108              | Teradata     | MPP          | Teradata    | NCR           | LSI

Database Management Systems

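(Diagram: levels of abstraction. Users work with View 1, View 2 and View 3, which are defined on the Conceptual Schema; the Conceptual Schema is mapped to the Physical Schema of the DB.)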

Database Management Systems

A Relation is a Table

Beers
name       | manf
Winterbrew | Pete’s
Bud Lite   | Anheuser-Busch

• Attributes (column headers): name, manf
• Tuples (rows)
• Instance: the data the table contains
• Domain: all possible values of an attribute

Schemas

• Relation schema = relation name and attribute list.
  – Optionally: types of attributes.
  – Example: Beers(name, manf) or Beers(name: string, manf: string)
• Database = collection of relations.
• Database schema = set of all relation schemas in the database.
• Instance of a relation = a table in a database with data
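The same concepts expressed in SQL, as a sketch using Python's built-in sqlite3 module (a choice made for illustration; the practice part uses MS Access). The CREATE TABLE statement is the relation schema, the inserted rows form the instance, and TEXT stands in for the string type. Making name the primary key is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema Beers(name: string, manf: string)
conn.execute("CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT)")

# Instance: the tuples currently stored in the relation
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Winterbrew", "Pete's"), ("Bud Lite", "Anheuser-Busch")])

# PRAGMA table_info returns one row per attribute; column 1 is the attribute name
attributes = [row[1] for row in conn.execute("PRAGMA table_info(Beers)")]
instance = conn.execute("SELECT name, manf FROM Beers ORDER BY name").fetchall()
```

Here attributes holds the column headers and instance holds the rows, matching the Beers table shown earlier.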

Anomalies

• The goal of relational schema design is to avoid anomalies and redundancy.
  – Update anomaly: one occurrence of a fact is changed, but not all occurrences.
  – Deletion anomaly: a valid fact is lost when a tuple is deleted.

Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name    | addr       | beersLiked | manf   | favBeer
Janeway | Voyager    | Bud        | A.B.   | WickedAle
Janeway | ???        | WickedAle  | Pete’s | ???
Spock   | Enterprise | Bud        | ???    | Bud

The data is redundant, because each of the ???’s can easily be figured out.

This Bad Design Also Exhibits Anomalies

name    | addr       | beersLiked | manf   | favBeer
Janeway | Voyager    | Bud        | A.B.   | WickedAle
Janeway | Voyager    | WickedAle  | Pete’s | WickedAle
Spock   | Enterprise | Bud        | A.B.   | Bud

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
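One common way to remove this redundancy is to split the relation into Drinkers(name, addr, favBeer), Beers(name, manf) and Likes(drinker, beer). The sketch below (Python with sqlite3, chosen for illustration) loads the example data, shows that the two anomalies above no longer occur, and rebuilds the original wide table with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drinkers (name TEXT PRIMARY KEY, addr TEXT, favBeer TEXT);
CREATE TABLE Beers    (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Likes    (drinker TEXT REFERENCES Drinkers(name),
                       beer    TEXT REFERENCES Beers(name),
                       PRIMARY KEY (drinker, beer));
""")
conn.executemany("INSERT INTO Drinkers VALUES (?, ?, ?)",
                 [("Janeway", "Voyager", "WickedAle"), ("Spock", "Enterprise", "Bud")])
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Bud", "A.B."), ("WickedAle", "Pete's")])
conn.executemany("INSERT INTO Likes VALUES (?, ?)",
                 [("Janeway", "Bud"), ("Janeway", "WickedAle"), ("Spock", "Bud")])

# No update anomaly: Janeway's address is stored once, so one UPDATE changes it everywhere
conn.execute("UPDATE Drinkers SET addr = 'Intrepid' WHERE name = 'Janeway'")

# No deletion anomaly: once nobody likes Bud, Beers still records who makes it
conn.execute("DELETE FROM Likes WHERE beer = 'Bud'")

# The wide relation can still be produced on demand with a join
rebuilt = conn.execute("""
SELECT d.name, d.addr, l.beer, b.manf, d.favBeer
FROM Drinkers d
JOIN Likes l ON l.drinker = d.name
JOIN Beers b ON b.name = l.beer
ORDER BY d.name, l.beer
""").fetchall()
```

Each fact (an address, a manufacturer, a preference) now lives in exactly one tuple, which is what prevents the anomalies.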

1st Normal Form

All attributes need to be atomic

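2nd Normal Form

Must be in 1st NF; every attribute that is not part of the key must depend on the whole key, not on just part of it

3rd Normal Form

Must be in 2nd NF; attributes that are not part of a key must depend directly on a key, not transitively through another non-key attribute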
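Both definitions rely on functional dependencies: X -> Y holds if any two tuples that agree on X also agree on Y. A small Python check over the Drinkers example, assuming the key is (name, beersLiked). Here name -> addr holds even though name is only part of the key. That is a partial dependency, so the relation is not in 2NF. A single instance can only disprove a dependency; whether it really holds is a statement about all valid data. The function name fd_holds is hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # the first tuple with this lhs fixes the rhs; any later mismatch breaks the FD
        if seen.setdefault(key, value) != value:
            return False
    return True


DRINKERS = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud", "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle", "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud", "manf": "A.B.", "favBeer": "Bud"},
]

KEY = ["name", "beersLiked"]
# Partial dependencies: non-key attributes determined by a proper subset of KEY
partial = [(part, attr)
           for part in (["name"], ["beersLiked"])
           for attr in ("addr", "manf", "favBeer")
           if fd_holds(DRINKERS, part, [attr])]
```

In this instance partial contains name -> addr, name -> favBeer and beersLiked -> manf, which matches the split into Drinkers, Beers and Likes.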

One-One Relationships

• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.

• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
  – A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).

Many-One Relationships

• Some binary relationships are many-one from one entity set to another.

• Each entity of the first set is connected to at most one entity of the second set.

• But an entity of the second set can be connected to zero, one, or many entities of the first set.

Many-Many Relationships

• Focus: binary relationships, such as Sells between Bars and Beers.

• In a many-many relationship, an entity of either set can be connected to many entities of the other set.
  – E.g., a bar sells many beers; a beer is sold by many bars.
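How the three kinds of relationship typically end up as tables, sketched in SQL through Python's sqlite3 (an illustration, not part of the slides). A many-one relationship becomes a foreign key on the "many" side. A one-one relationship such as Best-seller is constrained on both ends. A many-many relationship such as Sells gets its own table keyed on both participants. The bar names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Many-one: each beer has at most one manufacturer; a manufacturer makes many beers
CREATE TABLE Manfs (name TEXT PRIMARY KEY);
CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT REFERENCES Manfs(name));

-- One-one (Best-seller): PRIMARY KEY on manf allows one best-seller per manufacturer,
-- UNIQUE on beer stops a beer from being the best-seller of two manufacturers
CREATE TABLE BestSeller (manf TEXT PRIMARY KEY REFERENCES Manfs(name),
                         beer TEXT UNIQUE REFERENCES Beers(name));

-- Many-many (Sells): a separate table whose key combines both participants
CREATE TABLE Bars  (name TEXT PRIMARY KEY);
CREATE TABLE Sells (bar  TEXT REFERENCES Bars(name),
                    beer TEXT REFERENCES Beers(name),
                    PRIMARY KEY (bar, beer));
""")

conn.executemany("INSERT INTO Manfs VALUES (?)", [("A.B.",), ("Pete's",)])
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Bud", "A.B."), ("Bud Lite", "A.B."), ("WickedAle", "Pete's")])
conn.execute("INSERT INTO BestSeller VALUES ('A.B.', 'Bud')")
conn.executemany("INSERT INTO Bars VALUES (?)", [("Joe's",), ("Sue's",)])  # hypothetical bars
conn.executemany("INSERT INTO Sells VALUES (?, ?)",
                 [("Joe's", "Bud"), ("Joe's", "WickedAle"), ("Sue's", "Bud")])

# A bar sells many beers ...
beers_at_joes = [r[0] for r in conn.execute(
    "SELECT beer FROM Sells WHERE bar = ? ORDER BY beer", ("Joe's",))]
# ... and a beer is sold by many bars
bars_selling_bud = [r[0] for r in conn.execute(
    "SELECT bar FROM Sells WHERE beer = ? ORDER BY bar", ("Bud",))]
```

A second best-seller for A.B., or Bud as the best-seller of a second manufacturer, is rejected with sqlite3.IntegrityError.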

End Theory I

• 5 min mind mapping
• 10 min break

Practice I

MS Access

• Create new tables:
  – Plant
  – Features
  – FeatureTypes

Create a Table


Edit a Table

Create the Three Tables

• Plant
• Features
• FeatureTypes

Add Attributes

• Plant
  – ID
  – Gender
  – Species
  – Strain
  – Clone

Add Attributes

• Features
  – ID
  – FeatureType
  – Value

Add Attributes

• FeatureTypes
  – ID
  – Type
  – Unit
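For comparison with the Access design view, the three tables written as SQL, using Python's sqlite3 as a stand-in for Access. The column types and the link from Features.FeatureType to FeatureTypes.ID are assumptions. The slides do not show how Features connects to Plant, so that link is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
CREATE TABLE Plant (
    ID      INTEGER PRIMARY KEY,
    Gender  TEXT,
    Species TEXT,
    Strain  TEXT,
    Clone   TEXT
);
CREATE TABLE FeatureTypes (
    ID   INTEGER PRIMARY KEY,
    Type TEXT,   -- name of the measured feature
    Unit TEXT    -- unit the Value column is expressed in
);
CREATE TABLE Features (
    ID          INTEGER PRIMARY KEY,
    FeatureType INTEGER REFERENCES FeatureTypes(ID),  -- assumed link
    Value       REAL
);
""")
```

Storing Type and Unit once in FeatureTypes, instead of repeating them in every Features row, follows the same reasoning as the normal forms above.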

Table Space

Notice

More Editing


Notice

Fill with Data

• Import the data in the plants.csv file

Select the appropriate table

Some adjustments are needed here

The columns need to be named appropriately
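The import wizard can also be reproduced in a script. A minimal sketch with Python's csv and sqlite3 modules. It assumes plants.csv has one plant per line with the columns in the order ID, Gender, Species, Strain, Clone; this layout is hypothetical, since the file itself is not shown. create_plant_table and import_plants are illustrative names.

```python
import csv
import sqlite3

# Hypothetical column order; the layout of the real plants.csv is not shown in the slides
PLANT_COLUMNS = ["ID", "Gender", "Species", "Strain", "Clone"]


def create_plant_table(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS Plant "
                 "(ID INTEGER PRIMARY KEY, Gender TEXT, Species TEXT, Strain TEXT, Clone TEXT)")


def import_plants(conn, lines, has_header=False):
    """Insert CSV rows (any iterable of text lines, e.g. an open file) into Plant."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column-name row
    rows = [row for row in reader if row]  # ignore blank lines
    placeholders = ", ".join("?" for _ in PLANT_COLUMNS)
    sql = f"INSERT INTO Plant ({', '.join(PLANT_COLUMNS)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)
```

With a real file this would be import_plants(conn, open("plants.csv", newline=""), has_header=True) if the first line holds column names. Naming the columns explicitly in the INSERT plays the same role as naming them in the import wizard.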

Insert Data

• Import the Feature table
• Import the features .txt file

Real Data

• Download GO Terms:
  – http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz

• Decompress the .gz file and change the file extension to .xml so that Access can import it
• Import the file into Access
  – May take a short while
  – Errors will occur (we ignore them for now)

• Have a look at the tables
• Analyze the relationships (were they imported?)
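The downloaded file is gzip-compressed, so renaming it alone does not give Access readable XML. A minimal sketch with Python's gzip and shutil modules that decompresses the archive and gives the result an .xml extension in one step; the function names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def xml_name(gz_path):
    """Target name, e.g. go_daily-termdb.obo-xml.gz -> go_daily-termdb.xml."""
    # first with_suffix drops ".gz", second replaces ".obo-xml" with ".xml"
    return Path(gz_path).with_suffix("").with_suffix(".xml")


def decompress_for_access(gz_path):
    """Decompress gz_path next to itself with an .xml extension; return the new path."""
    target = xml_name(gz_path)
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the data, so the whole file never sits in memory
    return target
```

Calling decompress_for_access("go_daily-termdb.obo-xml.gz") in the download folder produces go_daily-termdb.xml, which is the file to import.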

End