
27 February 2011 zeljko.jericevic@riteh.hr

Bioinformatics Centers and Databases

Željko Jeričević, Ph.D.

www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html

zeljko.jericevic@riteh.hr

What Is Bioinformatics?

“Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. …”

from the NCBI web site

What Is Bioinformatics? (in translation)

“Bioinformatics is a multidisciplinary science in which biology, computer science, and information technology are united into a single discipline. The ultimate goal is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.”

(translation of the definition from the NCBI web site)

Overview

1. Why databases, and what are they

2. What matters for students

3. Navigating the avalanche of medical and biological information on the internet

4. On the availability and processing of information

Why Biological Databases?

• Advances in molecular biology and genomic technologies have led to explosive growth in biological data

• This flood of biological data requires computer databases and methods for storing, retrieving, organizing, processing, and visualizing the data

What Is a Biological Database?

“A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence.”

from the NCBI web site

What Is a Biological Database?

“For researchers to benefit from the data stored in a database, two additional requirements must be met:

• easy access to the information

• a method for extracting only that information needed to answer a specific biological question …”

from the NCBI web site
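The "single file containing many records" idea, together with the second requirement (extracting only the needed information), can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (ID, ORGANISM, MOLECULE, SEQUENCE), the `//` record separator, and the sequences themselves are invented for the example and do not reproduce any real database format.

```python
# Hypothetical flat-file "database": many records, each with the same fields.
# The layout and all values below are made up for illustration.
FLAT_FILE = """\
ID: SEQ001
ORGANISM: Escherichia coli
MOLECULE: DNA
SEQUENCE: ATGGCGTTAA
//
ID: SEQ002
ORGANISM: Homo sapiens
MOLECULE: mRNA
SEQUENCE: AUGGCCUAA
//
ID: SEQ003
ORGANISM: Escherichia coli
MOLECULE: DNA
SEQUENCE: ATGAAATAG
//
"""


def parse_records(text):
    """Split the file on '//' lines and turn each record into a dict."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            # End of one record: store it and start a new one.
            if current:
                records.append(current)
            current = {}
        elif line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def query(records, **criteria):
    """Return only the records whose fields match every given criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]


RECORDS = parse_records(FLAT_FILE)
ECOLI_IDS = [r["ID"] for r in query(RECORDS, ORGANISM="Escherichia coli")]
print(ECOLI_IDS)
```

Scanning every record works for a small file; real sequence databases add indexes so that a query does not have to read the whole file.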

1. What Are Databases?

An organized collection of related data, stored on a computer, that can be searched

• Flat-file databases

• Relational databases

• Object-oriented databases

Relational Databases

• Commercial DBMSs: $$$

Oracle, DB2, SQL Server, FileMaker

• Free DBMSs: PiO

PostgreSQL, MySQL

Relational Databases

• Edgar F. Codd

• Tables

• Redundancy

• Keys

• Codd's 12 rules

Library Example

#    Title                          User            Phone     Author
13   Organic Chemistry              Jeričević, Ž.   651-156   Carey
12   Benzodiazepini                 Jeričević, Ž.   651-192   Jakovljević, …
11   Načela dizajniranja lijekova   Jeričević, Ž.   651-192   Mintas, …

The loan table above is not a relational database (redundancy and conflicting information).

Library Example

Books
B#   A#         Title
11   31,32,33   Načela dizajniranja lijekova
12   34,35      Benzodiazepini
13   39         Organic Chemistry

Users
K#    User                Phone
123   Jeričević, Željko   651-156

Authors
A#   Author
31   Mintas, Mladen
32   Raić-Malić, Silvana
33   Raos, Nenad
34   Jakovljević, Miro
35   Lacković, Zdravko
39   Carey, Francis A.

Loans (intersection table)
P#   B#   K#
21   11   123
22   12   123
23   13   123

Intersection table => the tables shown can be elements of a relational database.
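As a rough sketch, the library example can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names (`author`, `book`, `book_author`, `borrower`, `loan`) are chosen here for illustration and are not from the slides. One deliberate departure from the slide: the multi-valued A# column of the book table is split into a separate `book_author` link table, which is a common relational practice but an assumption of this sketch.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (b_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Link table replacing the multi-valued A# column (assumption of this sketch).
CREATE TABLE book_author (
    b_id INTEGER REFERENCES book(b_id),
    a_id INTEGER REFERENCES author(a_id),
    PRIMARY KEY (b_id, a_id)
);
CREATE TABLE borrower (k_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT);
-- Intersection table: which borrower has which book.
CREATE TABLE loan (
    p_id INTEGER PRIMARY KEY,
    b_id INTEGER REFERENCES book(b_id),
    k_id INTEGER REFERENCES borrower(k_id)
);
""")

conn.executemany("INSERT INTO author VALUES (?, ?)", [
    (31, "Mintas, Mladen"), (32, "Raić-Malić, Silvana"), (33, "Raos, Nenad"),
    (34, "Jakovljević, Miro"), (35, "Lacković, Zdravko"), (39, "Carey, Francis A."),
])
conn.executemany("INSERT INTO book VALUES (?, ?)", [
    (11, "Načela dizajniranja lijekova"), (12, "Benzodiazepini"), (13, "Organic Chemistry"),
])
conn.executemany("INSERT INTO book_author VALUES (?, ?)", [
    (11, 31), (11, 32), (11, 33), (12, 34), (12, 35), (13, 39),
])
# The phone number is stored exactly once, so it cannot conflict with itself.
conn.execute("INSERT INTO borrower VALUES (?, ?, ?)", (123, "Jeričević, Željko", "651-156"))
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    (21, 11, 123), (22, 12, 123), (23, 13, 123),
])

# Rebuild the original "loan list" view by joining the tables on their keys.
LOANS = conn.execute("""
    SELECT loan.p_id, book.title, borrower.name, borrower.phone
    FROM loan
    JOIN book ON book.b_id = loan.b_id
    JOIN borrower ON borrower.k_id = loan.k_id
    ORDER BY loan.p_id
""").fetchall()


def authors_of(b_id):
    """Return the author names of one book, ordered by author key."""
    rows = conn.execute("""
        SELECT author.name
        FROM book_author
        JOIN author ON author.a_id = book_author.a_id
        WHERE book_author.b_id = ?
        ORDER BY author.a_id
    """, (b_id,)).fetchall()
    return [name for (name,) in rows]


for row in LOANS:
    print(row)
print(authors_of(12))
```

Because each fact lives in one place and rows refer to each other only through keys, the joined view can no longer show two different phone numbers for the same borrower.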

Overview

1. Why databases, and what are they

2. Navigating the avalanche of medical and biological information on the internet

3. On the availability and processing of information

2. Navigation: Starting Points

• Nucleic Acids Research Database and Web Server Issues

• NCBI: Bookshelf, PubMed, PubChem, …

• EBI

• KEGG

NAR Database Issue

This year's issue: Volume 39, Database Issue, January 1, 2011

http://nar.oxfordjournals.org/

NAR DB Issue 2011 editorial

“The current 18th Database Issue of Nucleic Acids Research features descriptions of 96 new and 83 updated online databases covering various areas of molecular biology. It includes two editorials, one that discusses COMBREX, a new exciting project aimed at figuring out the functions of the ‘conserved hypothetical’ proteins, and one concerning BioDBcore, a proposed description of the ‘minimal information about a biological database’. … The Nucleic Acids Research online Database Collection, available at: http://www.oxfordjournals.org/nar/database/a/, now lists 1330 carefully selected molecular biology databases.”

From the Nucleic Acids Research web site, http://nar.oxfordjournals.org/

NAR Web Server Issue

Latest issue:

Nucleic Acids Research, 2010, Vol. 38, suppl 2, July 1, 2010

http://nar.oxfordjournals.org/

NAR WS Issue 2010 editorial

“This year’s special emphasis is on next-generation sequencing data analysis, molecular network and pathway analysis, and biological text mining. A total of 15 papers deal with these topics. A large number of papers cover two common tasks, alignment for DNA, RNA or proteins (10 papers), and various forms of gene set enrichment analysis (9 papers). A large number of papers cover DNA and RNA sequence and structure analysis (23 papers) and protein analysis, primarily of protein structure (35 papers).”

From the editorial

NCBI

http://www.ncbi.nlm.nih.gov/


EBI

http://www.ebi.ac.uk

EBI services

KEGG

http://www.genome.jp/kegg/

About NCBI

A good informational page if you are not yet familiar with NCBI

Why NCBI Entrez?

Current for-profit systems of publishing and access to information are inadequate for the modern pace at which biological data are produced and for translation (bench to bedside) in medicine.

NIH/NLM developed an Open Access system for disseminating information over the internet.

NCBI Bookshelf


NCBI Pocket Bookshelf

Handheld computer versions ready for downloading:

• Blood Groups and Red Cell Antigens
• Clinical Methods
• Genes and Disease
• Health Services/Technology Assessment Text (HSTAT)
• Inflammatory Atherosclerosis: Characteristics of the Injurious Agent
• Medical Microbiology

AHRQ is the Agency for Healthcare Research and Quality

NCBI PubMed

PubMed: citations and abstracts for more than 11 million publications
PubMed Central: full text for ~2 million publications

Open Access

Programming needed for basic literature searching:

Logical operators AND, OR, NOT

AND | 0  1        OR | 0  1
 0  | 0  0         0 | 0  1
 1  | 0  1         1 | 1  1

0 = FALSE, 1 = TRUE

NOT 1 = 0, NOT 0 = 1
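The truth tables can be reproduced, and applied to a toy literature search, with a short Python sketch. The three "papers" and their index terms are invented for illustration; a real search engine works on an index rather than scanning every record, but the logic of AND, OR, and NOT is the same.

```python
# Truth tables for AND, OR, and NOT over 0 (FALSE) and 1 (TRUE).
AND_TABLE = {(a, b): int(bool(a) and bool(b)) for a in (0, 1) for b in (0, 1)}
OR_TABLE = {(a, b): int(bool(a) or bool(b)) for a in (0, 1) for b in (0, 1)}
NOT_TABLE = {a: int(not a) for a in (0, 1)}

# Hypothetical mini-collection: each paper is represented by its index terms.
PAPERS = {
    "paper A": {"bird", "flu"},
    "paper B": {"bird", "migration"},
    "paper C": {"flu", "vaccine"},
}


def search_and(term1, term2):
    """Papers indexed with both terms."""
    return sorted(t for t, terms in PAPERS.items() if term1 in terms and term2 in terms)


def search_or(term1, term2):
    """Papers indexed with at least one of the terms."""
    return sorted(t for t, terms in PAPERS.items() if term1 in terms or term2 in terms)


def search_not(term1, term2):
    """Papers indexed with the first term but not the second."""
    return sorted(t for t, terms in PAPERS.items() if term1 in terms and term2 not in terms)


print(search_and("bird", "flu"))
print(search_or("bird", "flu"))
print(search_not("bird", "flu"))
```

AND narrows a search, OR widens it, and NOT excludes records, which is why synonyms are usually combined with OR and separate concepts with AND.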

MeSH (Medical Subject Headings)

MeSH, maintained by the U.S. National Library of Medicine, is an auxiliary database containing the controlled vocabulary of terms used to index publications in MEDLINE/PubMed. MeSH terminology makes it possible to find information sources that use different names for the same concept.

MeSH (Medical Subject Headings)

Example: avian flu → bird AND flu

• Influenza in Birds
• Avian Flu
• Flu, Avian
• Avian Influenza
• Fowl Plague
• Plague, Fowl
• Influenza, Avian
• Avian Influenzas
• Influenzas, Avian

Drugs: oseltamivir (Tamiflu) and zanamivir (Relenza)
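Without a controlled vocabulary, a searcher would have to list every synonym by hand. The sketch below builds such a query string from the entry terms on the slide, joining synonyms with OR and combining the disease concept with the drug names using AND. The helper name `or_query` is hypothetical; the quoting and uppercase Boolean operators follow ordinary PubMed search syntax.

```python
# Synonyms for the same concept, as listed on the slide.
SYNONYMS = [
    "Influenza in Birds", "Avian Flu", "Flu, Avian", "Avian Influenza",
    "Fowl Plague", "Plague, Fowl", "Influenza, Avian",
    "Avian Influenzas", "Influenzas, Avian",
]
DRUGS = ["oseltamivir", "zanamivir"]


def or_query(terms):
    """Join terms into one parenthesised group: ("a" OR "b" OR ...)."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


# Any synonym for the disease ...
QUERY = or_query(SYNONYMS)
# ... combined with either of the two drugs.
DRUG_QUERY = QUERY + " AND " + or_query(DRUGS)

print(DRUG_QUERY)
```

Indexing with MeSH does this grouping once, at indexing time, so a search on the single heading retrieves publications regardless of which of these names their authors used.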

NCBI PubMed documentation

Automated literature searching using the Entrez programming modules

The Entrez programming modules are Perl scripts

E-Utilities
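The E-Utilities are reached through ordinary URLs, so they can be called from any language, not only Perl. As a minimal sketch, the Python code below builds an ESearch request for PubMed and, in a separate function that needs network access, fetches the matching PubMed IDs. The function names `esearch_url` and `search_pubmed` are invented for this example; the endpoint and the `db`, `term`, `retmax`, and `retmode` parameters are those of the public ESearch service, whose usage guidelines should be consulted before running automated queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL for the given query term (no network access)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def search_pubmed(term, retmax=20):
    """Run the search and return the list of PubMed IDs (requires network)."""
    with urlopen(esearch_url(term, retmax=retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


# Only the URL is built here; call search_pubmed("bird AND flu") to query NCBI.
URL = esearch_url("bird AND flu", retmax=5)
print(URL)
```

The returned IDs can then be passed to other E-Utilities (for example EFetch) to retrieve the corresponding citations and abstracts.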

NCBI PubChem

Data on more than 10 million chemicals

PubChem search: tamiflu

tamiflu: 78000

tamiflu: 65028

tamiflu: 449381 (related)

Structures similar to 449381

Overview

1. Why databases, and what are they

2. Navigating the avalanche of medical and biological information on the internet

3. On the availability and processing of information

4. On the Availability and Processing of Information

• Can information be property?

• What can we learn from history (a comparison of chemistry and biology)?

• Can one do business at a profit without ownership rights over information?

The Federal Research Public Access Act of 2006 (Cornyn-Lieberman)

>>the policy would require that agencies with research budgets of more than $100 million enact policy to ensure that articles generated through research funded by that agency are made available online within 6 months of publication.<<
(from article by Robin Peek, Newsbreaks, October 31, 2006)

>>“Public access to research expands shared knowledge across scientific fields and is the best path for accelerating multi-disciplinary breakthroughs in research,” said Richard J. Roberts, a Nobel Prize laureate and research director at New England Biolabs. “As a scientist and a taxpayer, I support this bill because it lifts barriers that hinder, delay, or block the spread of scientific knowledge supported by federal tax dollars.”<<
(from article by Robin Peek, Newsbreaks, October 31, 2006)

>>In an article in The Washington Post, Patricia S. Schroeder, president and chief executive of the Association of American Publishers, promised a fight (not surprisingly). “It is frustrating that we can’t seem to get across to people how expensive it is to do the peer review, edit these articles, and put them into a form everyone can understand” Schroeder said.<<
(from article by Robin Peek, Newsbreaks, October 31, 2006)

Open Access

Welcome to the Directory of Open Access Journals.

This service covers free, full text, quality controlled scientific and scholarly journals. We aim to cover all subjects and languages. There are now 3881 journals in the directory. Currently 5999 journals are searchable at article level. As of today 259860 articles are included in the DOAJ service.

http://www.doaj.org/


Ultimate Open Source & Open Access Application: Wikipedia

“Since Wikipedia was launched online in 2001 as "the free encyclopedia that anyone can edit," it has blossomed to more than a billion words spread over 10 million articles in 250 languages, including 2.5 million articles in English, according to Wikipedia cofounder Wales”

http://www.wikipedia.org/

Open Source

• OpenCola

• Medicine: TDI (Tropical Diseases Initiative)

• MIT OpenCourseWare (MIT OCW) 2007

• Connexions Project @ Rice University

• Project Gutenberg

• OpenDocument Format (ODF)

• OpenOffice

• Linux, a free operating system

Parents and Offspring

Richard M. Stallman, Linus B. Torvalds

Computing Skills Needed for Bioinformatics

“Developing Bioinformatics Computer Skills”, C. Gibas & P. Jambeck, O’Reilly, Sebastopol, CA, 2001, 427 pp.

I Introduction
II The Bioinformatics Workstation
    3. Setting Up Your Workstation
    4. Files and Directories in Unix
    5. Working on a Unix System
III Tools for Bioinformatics
IV Databases and Visualization

Open Source

Eric S. Raymond

“The Cathedral and the Bazaar”

“The Art of Unix Programming”

Larry Wall

Perl (Practical Extraction and Report Language)

Elements of the Free Flow of Information

The free flow of biological information is largely thanks to people who have (almost) nothing to do with biology.

The importance of computer science for the development of modern biology is almost impossible to overstate.

Comparing Cheminformatics and Bioinformatics

• For historical and economic reasons, access to chemical information differs substantially from access to biological information

• Scope and focus differ

• The problems chemists and biologists work on are different

• Methods and tools are not the same, although there is overlap

Scope and Focus of Information

• Chemical information
    • Problems at the molecular and atomic level
    • Mechanism involves understanding all atoms in the molecule and their electronic structure

• Biological information
    • Problems at the molecular, cellular, and higher levels
    • Mechanism involves understanding the role of all molecules and possibly their regulation, as well as the types of cells and tissues

Chemical and Biological Information

• Access to chemical information has a longer tradition
    - Mostly commercialized
    - Chemical Abstracts has a long tradition (since 1907)
    - Methods for processing chemical information were developed in the pre-computer era and gradually programmed

• Access to biological information is more advanced
    - Mostly free
    - Information is distributed via the WWW
    - Methods for processing large amounts of data are relatively new (HGP), programmed quickly and on a large scale

Example of a Chemistry Journal and a Biology Journal

• Journal of Chemical Information and Modeling
    • 1961 Journal of Chemical Documentation
    • 1975 Journal of Chemical Information and Computer Sciences
    • 2005 Journal of Chemical Information and Modeling

• Bioinformatics
    • 1985 Computer Applications in the Biosciences (CABIOS)
    • 1999 Bioinformatics

Closing Remarks

The Future of Bioinformatics?

Building the information infrastructure

Open Access, Open Source

UNIX operating system

Exercises

• Visit NAR web site http://nar.oxfordjournals.org/

• Visit the web sites for NCBI, EBI & KEGG

• Visit the About NCBI page at http://www.ncbi.nlm.nih.gov/About/

• Visit NCBI Bookshelf

• Visit PubChem

• Visit PubMed

• Visit MIT OCW site at http://ocw.mit.edu/index.htm