Relational Databases: Basic Concepts BCHB524 2015 Lecture 21 By Edwards & Li Slides: .
Relational Databases: Basic Concepts
BCHB524 2015
Lecture 21
By Edwards & Li
Slides: https://goo.gl/rl1wFL
Outline
• What is a (relational) database?
• When are relational databases used?
• Commonly used database management systems
• Using existing databases
• Creating and populating new databases
• Python and relational databases
• Exercises
(Relational) Databases
• Databases store information
• Bioinformatics has lots of file-based information:
  • FASTA sequence databases
  • GenBank format sequences
  • Store sequence, annotation, references
  • Good as archive or comprehensive reference
  • Poor for retrieving a few items
• Relational databases also store information
  • Good for a few items at a time
  • Flexible on which items
Relational Databases
• Store information in a table
• Rows represent items
• Columns represent items' properties or attributes

| Name | Continent | Region | Surface Area | Population | GNP |
|---|---|---|---|---|---|
| Brazil | South America | South America | 8547403 | 170115000 | 776739 |
| Indonesia | Asia | Southeast Asia | 1904569 | 212107000 | 84982 |
| India | Asia | Southern and Central Asia | 3287263 | 1013662000 | 447114 |
| China | Asia | Eastern Asia | 9572900 | 1277558000 | 982268 |
| Pakistan | Asia | Southern and Central Asia | 796095 | 156483000 | 61289 |
| United States | North America | North America | 9363520 | 278357000 | 8510700 |
Relational Databases
• Tables can be millions of rows
• Can access a few rows fast
  • Countries with more than 100,000,000 in population?
  • Countries on the “Asia” continent?
  • Countries that start with “U”?
  • Countries with GNP = 776739?
When are Relational Databases Used?
• LARGE datasets
  • Does data fit in memory?
• Store data first ... ask questions later
• Lookup or sort by many keys
  • For single key, simple data structures often work
• Store results of expensive compute or data-cleanup
  • Compute once and return results many times
• "Random" or unknown access patterns
• Specialized data-structures not appropriate
  • Use string/sequence indexes for sequence data
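As a rough illustration of the "single key" point above, a plain Python dictionary handles lookup by one key well, but any other question means scanning every item. The population figures come from the Country table shown earlier; the variable names are only for this sketch.

```python
# Population by country name: a dictionary is enough when there is one lookup key.
population = {
    "Brazil": 170115000,
    "Indonesia": 212107000,
    "India": 1013662000,
    "China": 1277558000,
    "Pakistan": 156483000,
    "United States": 278357000,
}

# Fast: look up by the key the dictionary was built on.
brazil_pop = population["Brazil"]

# Any other question (here, by population) has to examine every item.
over_200m = sorted(name for name, pop in population.items() if pop > 200000000)

print(brazil_pop)
print(over_200m)
```

With millions of rows and several such questions, this full scan is the cost a relational database can avoid.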
What is RDBMS?
• RDBMS stands for Relational Database Management System.
• RDBMS is the basis for SQL, and for all modern database systems.
• The data in RDBMS is stored in database objects called tables.
• A table is a collection of related data entries and it consists of columns and rows.
Common DBMS
• Oracle: commercial, market leader, widely used in businesses
• MySQL: free, open-source, widely used in bioinformatics, suitable for large scale deployment
• SQLite: free, open-source, minimal installation requirements, no user accounts, suitable for small scale deployment
http://db-engines.com/en/ranking
What is SQL?
• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard
• Despite the standard, different database systems implement different versions of the SQL language.
http://www.w3schools.com/sql/sql_intro.asp
What Can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views
http://www.w3schools.com/sql/sql_intro.asp
Let's look at some examples
• We'll use a third-party program to "look at" SQLite databases: SQLiteStudio (Linux), SQLiteSpy (Windows), …
• Download examples World.db3 and taxa.db3 from the course data folder
• Use SQLiteStudio to look at examples World.db3, taxa.db3
Using existing databases
• Use the "select" SQL command to find relevant rows

    select * from Country where Population > 100000000;
    select * from Country where Continent = 'Asia';
    select * from Country where Name like 'U%';
    select * from Country where GNP = 776739;

• Each command ends in a semicolon ";".
• "where" specifies the condition/constraint/rule.
• "*" asks for all attributes from the relevant rows.
• Let's experiment with the world and taxa databases.
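The four queries above can be tried without the course files. The sketch below builds a small in-memory Country table from the six rows shown on the earlier slide; the column names SurfaceArea and Region, and the helper `names`, are assumptions made for this example, and the real World.db3 has more rows and columns.

```python
import sqlite3

# In-memory stand-in for World.db3 (assumed schema, six rows from the slide).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Country (
        Name TEXT, Continent TEXT, Region TEXT,
        SurfaceArea INTEGER, Population INTEGER, GNP INTEGER
    )
""")
conn.executemany("INSERT INTO Country VALUES (?, ?, ?, ?, ?, ?)", [
    ("Brazil", "South America", "South America", 8547403, 170115000, 776739),
    ("Indonesia", "Asia", "Southeast Asia", 1904569, 212107000, 84982),
    ("India", "Asia", "Southern and Central Asia", 3287263, 1013662000, 447114),
    ("China", "Asia", "Eastern Asia", 9572900, 1277558000, 982268),
    ("Pakistan", "Asia", "Southern and Central Asia", 796095, 156483000, 61289),
    ("United States", "North America", "North America", 9363520, 278357000, 8510700),
])

def names(sql):
    """Run a select and return the Name (first column) of each row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

populous = names("select * from Country where Population > 100000000;")
asian = names("select * from Country where Continent = 'Asia';")
starts_with_u = names("select * from Country where Name like 'U%';")
by_gnp = names("select * from Country where GNP = 776739;")

print(asian)
print(starts_with_u)
print(by_gnp)
```

All six countries in this small sample exceed 100,000,000 in population, so the first query returns every row here.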
Using existing databases
• Select can combine (“join”) multiple tables
• Use the where condition to match rows from each table and “link” corresponding rows…

    select * from taxonomy, name
    where taxonomy.rank = 'species'
      and name.name_class = 'misspelling'
      and name.tax_id = taxonomy.tax_id;
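A minimal sketch of the join above, run against two tiny in-memory tables. The table layout follows the lecture's taxonomy and name tables, but the rows (in particular the misspellings) are made up for illustration. The second query uses the explicit `join ... on` syntax, which expresses the same link between the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT,
                           rank TEXT, parent_id INT);
    CREATE TABLE name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT,
                       name_class TEXT);
    -- Illustrative rows only; the misspellings are invented for this example.
    INSERT INTO taxonomy VALUES (9605, 'Homo', 'genus', 207598);
    INSERT INTO taxonomy VALUES (9606, 'Homo sapiens', 'species', 9605);
    INSERT INTO name (tax_id, name, name_class) VALUES (9606, 'Homo sapiens', 'scientific name');
    INSERT INTO name (tax_id, name, name_class) VALUES (9606, 'Homo sapien', 'misspelling');
    INSERT INTO name (tax_id, name, name_class) VALUES (9605, 'Homo', 'scientific name');
    INSERT INTO name (tax_id, name, name_class) VALUES (9605, 'Hommo', 'misspelling');
""")

# Join written with a where condition, as on the slide.
where_join = conn.execute("""
    select taxonomy.scientific_name, name.name
    from taxonomy, name
    where taxonomy.rank = 'species'
      and name.name_class = 'misspelling'
      and name.tax_id = taxonomy.tax_id;
""").fetchall()

# The same join written with explicit join ... on syntax.
explicit_join = conn.execute("""
    select taxonomy.scientific_name, name.name
    from taxonomy join name on name.tax_id = taxonomy.tax_id
    where taxonomy.rank = 'species'
      and name.name_class = 'misspelling';
""").fetchall()

print(where_join)
```

Only the species row with a misspelled name survives: the genus is filtered out by rank, and the scientific names are filtered out by name_class.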
Using existing databases
Select can sort and/or return top 10
    select * from taxonomy limit 10;
    select * from taxonomy order by scientific_name;
    select * from taxonomy order by tax_id desc limit 10;
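The sketch below shows `order by` and `limit` on a five-row in-memory table with a reduced, assumed set of columns. Note that `limit` without `order by` returns rows in no guaranteed order, so "top N" queries normally combine the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)", [
    (9606, "Homo sapiens"),
    (10090, "Mus musculus"),
    (7227, "Drosophila melanogaster"),
    (6239, "Caenorhabditis elegans"),
    (7955, "Danio rerio"),
])

# First two names in alphabetical order.
first_by_name = [row[0] for row in conn.execute(
    "select scientific_name from taxonomy order by scientific_name limit 2;")]

# Two largest tax_ids: sort descending, then keep the top of the list.
largest_ids = [row[0] for row in conn.execute(
    "select tax_id from taxonomy order by tax_id desc limit 2;")]

print(first_by_name)
print(largest_ids)
```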
Using existing databases
• Select can count and do string matching.
• "like" uses special symbols:
  • % matches zero or more symbols
  • _ matches exactly one symbol
• Some RDBMS support regular expressions (MySQL, for example).

    select count(*) from taxonomy where scientific_name like 'D%';
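To see how the two wildcards behave, the following sketch counts matches for a few patterns against a small in-memory table. The helper `count_like` and the reduced table are assumptions for this example. One SQLite-specific detail: its `like` is case-insensitive for ASCII letters by default, which other database systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)", [
    (6239, "Caenorhabditis elegans"),
    (7227, "Drosophila melanogaster"),
    (7955, "Danio rerio"),
    (9606, "Homo sapiens"),
    (10090, "Mus musculus"),
])

def count_like(pattern):
    """Count scientific names matching a like pattern."""
    sql = "select count(*) from taxonomy where scientific_name like ?;"
    return conn.execute(sql, [pattern]).fetchone()[0]

starts_with_d = count_like("D%")          # % matches any run of symbols
one_wildcard = count_like("_anio rerio")  # _ stands for exactly one symbol
too_short = count_like("D_")              # only two-symbol names could match
lowercase = count_like("d%")              # SQLite ignores ASCII case in like

print(starts_with_d, one_wildcard, too_short, lowercase)
```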
Creating databases
Use the "create" SQL command to create tables

    CREATE TABLE taxonomy (
      tax_id INTEGER PRIMARY KEY,
      scientific_name TEXT,
      rank TEXT,
      parent_id INT
    );

    CREATE TABLE name (
      id INTEGER PRIMARY KEY,
      tax_id INT,
      name TEXT,
      name_class TEXT
    );
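One way to check that the tables were created as intended is to run the statements from Python and ask SQLite what it now contains. This sketch uses an in-memory database; `sqlite_master` and `PRAGMA table_info` are SQLite-specific, so other database systems will need a different approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxonomy (
      tax_id INTEGER PRIMARY KEY,
      scientific_name TEXT,
      rank TEXT,
      parent_id INT
    );
    CREATE TABLE name (
      id INTEGER PRIMARY KEY,
      tax_id INT,
      name TEXT,
      name_class TEXT
    );
""")

# SQLite keeps its own catalog of tables in sqlite_master.
tables = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'table' order by name;")]

# PRAGMA table_info lists one row per column: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(taxonomy);")]

print(tables)
print(columns)
```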
Populating databases
• Use the "insert" SQL command to add rows to tables
• Usually, the special id column is initialized automatically

    INSERT INTO name (tax_id,name,name_class) VALUES (9606,'H. sapiens','synonym');
    SELECT * from name where tax_id = 9606;
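The automatic id can be observed from Python. In this sketch, on an empty in-memory table, SQLite assigns ids 1 and 2 to the two inserted rows, and the cursor's `lastrowid` reports each one. The second row is an extra illustrative entry, not from the slides. With a database file rather than `:memory:`, `conn.commit()` is what makes the inserts permanent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)
""")
c = conn.cursor()

# No id is supplied; SQLite fills in the INTEGER PRIMARY KEY column itself.
c.execute("INSERT INTO name (tax_id,name,name_class) VALUES (9606,'H. sapiens','synonym');")
first_id = c.lastrowid

# Second, illustrative row, inserted with placeholders instead of literals.
c.execute("INSERT INTO name (tax_id,name,name_class) VALUES (?,?,?);",
          (9606, "human", "genbank common name"))
second_id = c.lastrowid

conn.commit()

rows = c.execute("SELECT * from name where tax_id = 9606 order by id;").fetchall()
print(rows)
```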
Python and Relational Databases
• Issue select statements from Python and iterate through the results
• Sometimes it is easiest to make Python do some of the work!

    import sqlite3

    # Open the database file and get a cursor for issuing commands
    conn = sqlite3.connect('taxa.db3')
    c = conn.cursor()
    c.execute("""
        select * from name
        where name like 'D%'
        limit 10;
    """)
    # The cursor yields one tuple per matching row
    for row in c:
        print(row)
Python and Relational Databases
Use parameter substitution for run-time values
    import sys
    import sqlite3

    # Taxonomy id supplied on the command line
    tid = int(sys.argv[1])

    conn = sqlite3.connect('taxa.db3')
    # One value per "?" placeholder, in order
    params = [tid, 'scientific name']
    c = conn.cursor()
    c.execute("""
        select * from name
        where tax_id = ? and name_class = ?;
    """, params)
    for row in c:
        print(row)
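A short sketch of why placeholders are preferred over pasting values into the SQL text: a run-time value containing an apostrophe works with "?" but breaks a query assembled by string formatting. The table row here is illustrative only, and the variable names are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)")
# Illustrative row; the point is the apostrophe in the name.
conn.execute("INSERT INTO name (tax_id,name,name_class) VALUES (?,?,?)",
             (9798, "Przewalski's horse", "common name"))

user_value = "Przewalski's horse"

# Parameter substitution: the value is passed separately from the SQL text.
safe_rows = conn.execute(
    "select tax_id from name where name = ?;", [user_value]).fetchall()

# String formatting: the apostrophe ends the SQL string early and the
# statement no longer parses.
try:
    conn.execute("select tax_id from name where name = '%s';" % user_value)
    formatted_ok = True
except sqlite3.OperationalError:
    formatted_ok = False

print(safe_rows, formatted_ok)
```

The same mechanism that causes the syntax error here is what makes string-built queries open to SQL injection when the value comes from an untrusted user.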
Next-time: Object-relational mappers
Set up Python to treat tables as classes, rows as objects

    # Set up data-model
    from model import *

    hs = Taxonomy.get(9606)
    for n in hs.names:
        print(n.name, "|", n.nameClass)

    condition = Name.q.name.startswith('Da')
    for n in Name.select(condition):
        print(n.name, "|", n.nameClass)
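The `model` module and the object-relational mapper behind it are left for next time and are not reproduced here. As a much smaller step in the same direction, and explicitly not an ORM, the standard library's `sqlite3.Row` row factory lets columns be read by name instead of by position. The sketch below assumes the lecture's name table layout on an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows come back as sqlite3.Row objects instead of plain tuples.
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)")
conn.execute("INSERT INTO name (tax_id,name,name_class) VALUES (9606,'H. sapiens','synonym')")

row = conn.execute("select * from name where tax_id = ?;", [9606]).fetchone()

# Columns are addressed by name rather than by index.
label = row["name"] + " | " + row["name_class"]
column_names = row.keys()

print(label)
print(column_names)
```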
Lab exercises
• Read through an online course in SQL: sqlcourse.com, sql-tutorial.net, ...
• Write a Python program to look up the scientific name for a user-supplied organism name.