Working with databases in Perl
by ldami
11.04.23 - Page 1
DépartementOffice
Working with databases in Perl
Tutorial for FPW::2011, Paris
Overview
• intended audience: beginners
  – in Perl
  – in databases
• main topics
  – Relational databases
  – Perl DBI basics
  – Advanced Perl DBI
  – Object-Relational Mappings
• disclaimer
  – I have not personally used everything mentioned in this tutorial
Relational databases
RDBMS = Relational Database Management System
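Relational model: a table is a set of rows + columns
• filter: keep only some rows
• projection: keep only some columns
• join on c3: combine rows of two tables that have the same c3 value

table1             table2
c1  c2   c3        c3  c4
1   foo  1         1   xx
2   foo  2         2   yy
3   bar  1

table1 joined with table2 on c3
c1  c2   c3  c4
1   foo  1   xx
2   foo  2   yy
3   bar  1   xx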
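The operations in the figure can be tried directly through DBI. A minimal sketch, assuming DBI and DBD::SQLite are installed (SQLite is used here only because an in-memory database makes the demo self-contained); the tables and rows are exactly those of the figure:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBI and DBD::SQLite are installed; ':memory:' is a throwaway database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# the two tables of the figure
$dbh->do('CREATE TABLE table1 (c1 INTEGER, c2 TEXT, c3 INTEGER)');
$dbh->do('CREATE TABLE table2 (c3 INTEGER, c4 TEXT)');
$dbh->do('INSERT INTO table1 VALUES (?, ?, ?)', undef, @$_)
  for [1, 'foo', 1], [2, 'foo', 2], [3, 'bar', 1];
$dbh->do('INSERT INTO table2 VALUES (?, ?)', undef, @$_)
  for [1, 'xx'], [2, 'yy'];

# filter (WHERE keeps some rows) + projection (the column list keeps some columns)
my $filtered = $dbh->selectall_arrayref(
  q{SELECT c1, c2 FROM table1 WHERE c2 = 'foo' ORDER BY c1});

# join on c3
my $joined = $dbh->selectall_arrayref(q{
  SELECT t1.c1, t1.c2, t1.c3, t2.c4
    FROM table1 t1 INNER JOIN table2 t2 ON t1.c3 = t2.c3
   ORDER BY t1.c1
});
print join(' ', @$_), "\n" for @$joined;   # same three rows as the joined table
```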
Maybe you don't want an RDBMS
• Other solutions for persistence in Perl:
  – BerkeleyDB: persistent hashes / arrays
  – Judy: persistent dynamic arrays / hashes
  – Redis: persistent arrays / hashes / sets / sorted sets
  – CouchDB: OO/hierarchical database
  – MongoDB: document-oriented database
  – KiokuDB: persistent objects, front-end to BerkeleyDB / CouchDB / etc.
  – Plain Old File (using for example File::Tabular)
  – KinoSearch: bunch of fields with fulltext indexing
  – LDAP: directory
  – Net::Riak: buckets and keys
• See http://en.wikipedia.org/wiki/NoSQL
Features of RDBMS
• Relational
• Indexing
• Concurrency
• Distributed
• Transactions (commit / rollback)
• Authorization
• Triggers and stored procedures
• Internationalization
• Fulltext
• …
Choosing an RDBMS
• Sometimes there is no choice (it is imposed by the context)!
• Criteria
  – cost, proprietary / open source
  – volume
  – features
  – resources (CPU, RAM, etc.)
  – ease of installation / deployment / maintenance
  – stored procedures
• Common choices (open source)
  – SQLite (file-based)
  – MySQL
  – Postgres
• Postgres can run server-side procedures written in Perl (PL/Perl)!
Talking to an RDBMS
• SQL: Structured Query Language, an ISO standard. Except that
  – the standard is hard to find (not publicly available)
  – vendors rarely implement the full standard
  – most vendors have non-standard extensions
  – it's not only about queries
    • DML: Data Manipulation Language
    • DDL: Data Definition Language
Writing SQL
• "SQL is too low-level, I don't ever want to see it."
• "SQL is the most important part of my application, I won't let anybody write it for me."
Data Definition Language (DDL)
CREATE TABLE author (
  author_id   INTEGER PRIMARY KEY,
  author_name VARCHAR(20),
  e_mail      VARCHAR(20),
  …
);
CREATE / ALTER / DROP / RENAME   DATABASE / INDEX / VIEW / TRIGGER
Data Manipulation Language (DML)
SELECT author_name, distribution_name
FROM author INNER JOIN distribution ON author.author_id = distribution.author_id
WHERE distribution_name like 'DBD::%';
INSERT INTO author ( author_id, author_name, e_mail ) VALUES ( 123, 'JFOOBAR', '[email protected]' );
UPDATE author SET e_mail = '[email protected]'
WHERE author_id = 3456;
DELETE FROM author WHERE author_id = 3456;
Best practice: placeholders
SELECT author_name, distribution_name
FROM author INNER JOIN distribution ON author.author_id = distribution.author_id
WHERE distribution_name like ?;
INSERT INTO author ( author_id, author_name, e_mail ) VALUES ( ?, ?, ? );
UPDATE author SET e_mail = ?
WHERE author_id = ?;
DELETE FROM author WHERE author_id = ?;
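• no type distinction (int/string) needed in the SQL
• statements can be prepared once and cached
• avoid SQL injection problems: with string interpolation,
  SELECT * FROM foo WHERE val = $x;
  becomes dangerous when
  $x eq '123; DROP TABLE foo'
• some drivers also accept other placeholder syntaxes (e.g. $1, $2)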
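The injection example above can be checked with a placeholder: the hostile string is passed as data and never parsed as SQL. A minimal sketch, assuming DBI and DBD::SQLite are installed; the table foo and the value of $x come from the slide, the second row is invented:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBI and DBD::SQLite are installed; in-memory database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE foo (val TEXT)');

# the hostile value from the slide
my $x = '123; DROP TABLE foo';

# with a placeholder, $x is sent as a value, not as SQL text
my $insert = $dbh->prepare('INSERT INTO foo (val) VALUES (?)');
$insert->execute($x);
$insert->execute('456');          # the same prepared statement is reused

my ($count) = $dbh->selectrow_array(
  'SELECT COUNT(*) FROM foo WHERE val = ?', undef, $x);
print "rows matching: $count\n";  # the table still exists

# if a literal really must be interpolated, quote() escapes it;
# placeholders remain the preferred solution
my $quoted = $dbh->quote($x);
```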
Perl DBI Basics
Architecture
layers, from the application down to the data:
• Perl program
• Object-Relational Mapper
• DBI
• DBD driver
• Database
TIOOWTDI: There is only one way to do it
TAMMMWTDI: There are many, many, many ways to do it
TIMTOWTDI: There is more than one way to do it
DBD Drivers
– Databases
  • Adabas DB2 DBMaker Empress Illustra Informix Ingres InterBase MaxDB Mimer Oracle Ovrimos PO Pg PrimeBase QBase Redbase SQLAnywhere SQLite Solid Sqlflex Sybase Unify mSQL monetdb mysql
– Other kinds of data stores
  • CSV DBM Excel File iPod LDAP
– Proxy, relay, etc.
  • ADO Gofer JDBC Multi Multiplex ODBC Proxy SQLRelay
– Fake, test
  • NullP Mock RAM Sponge
When SomeExoticDB has no driver
• Quotes from DBI::DBD:
  "The first rule for creating a new database driver for the Perl DBI is very simple: DON'T!"
  "The second rule for creating a new database driver for the Perl DBI is also very simple: Don't -- get someone else to do it for you!"
• Nevertheless, DBI::DBD contains good advice and examples
• Other solution: forward to other drivers
  – ODBC (even on Unix)
  – JDBC
  – SQLRelay
DBI API
• handles
  – the whole package (DBI)
  – driver handle ($drh)
  – database handle ($dbh)
  – statement handle ($sth)
• interacting with handles
  – object-oriented
    • ->connect(…), ->prepare(…), ->execute(…), …
  – tied hash
    • ->{AutoCommit}, ->{NAME_lc}, ->{CursorName}, …
Connecting
my $dbh = DBI->connect($connection_string);
my $dbh = DBI->connect($connection_string, $user, $password, { %attributes } );
my $dbh = DBI->connect_cached( @args );
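For a concrete picture of the connection string and of connect_cached, here is a minimal sketch assuming DBD::SQLite is installed. The Postgres DSN in the comment uses a hypothetical database name and host:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed. A connection string (DSN) is
# "dbi:" + driver name + ":" + driver-specific part; for Postgres it would look
# like 'dbi:Pg:dbname=mydb;host=localhost' (hypothetical database and host).
my $dsn = 'dbi:SQLite:dbname=:memory:';
my ($user, $password) = ('', '');   # SQLite ignores credentials
my %attributes = (RaiseError => 1, PrintError => 0, AutoCommit => 1);

# with RaiseError set, a failed connect dies instead of returning undef
my $dbh = DBI->connect($dsn, $user, $password, \%attributes);

# connect_cached returns the same live handle for identical arguments
my $cached1 = DBI->connect_cached($dsn, $user, $password, \%attributes);
my $cached2 = DBI->connect_cached($dsn, $user, $password, \%attributes);
print $cached1 == $cached2 ? "same handle\n" : "different handles\n";
```

connect_cached is mostly useful in long-running processes (web applications, daemons) that would otherwise reconnect on every request.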
Some dbh attributes
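• AutoCommit
  – if true, every statement is immediately committed; to group statements in a transaction, call
    $dbh->begin_work(); … # inserts, updates, deletes
    $dbh->commit();
  – if false, a transaction is always open and must be ended with $dbh->commit() or $dbh->rollback()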
• RaiseError
  – like autodie for standard Perl functions: errors raise exceptions
• see also
  – PrintError
  – HandleError
  – ShowErrorStatement
• and also
  – LongReadLen
  – LongTruncOk
  – RowCacheSize
  – …
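Hash API: attributes can be set dynamically
[local] $dbh->{$attr_name} = $val;
(with the optional local, the previous value is restored at the end of the enclosing block)
• peek at $dbh internals (in the Perl debugger)
  DB<1> x $dbh
  {}
  DB<2> x tied %$dbh
  {…}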
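The attributes above can be observed at run time. A minimal sketch, assuming DBD::SQLite is installed; no_such_table and t are invented names used only to trigger an error and to host a transaction:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; in-memory database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

# RaiseError: a failing statement dies, so eval {} catches it
my $ok = eval { $dbh->do('SELECT * FROM no_such_table'); 1 };
print "caught: $@" unless $ok;

# attributes behave like hash entries, so `local` restores them on scope exit
my $result;
{
  local $dbh->{RaiseError} = 0;           # errors now just return false
  $result = $dbh->do('SELECT * FROM no_such_table');
}
my $raise_after = $dbh->{RaiseError};     # true again here

# AutoCommit on: begin_work switches it off until the next commit/rollback
$dbh->do('CREATE TABLE t (n INTEGER)');
$dbh->begin_work;
my $ac_inside = $dbh->{AutoCommit};       # false inside the transaction
$dbh->do('INSERT INTO t VALUES (1)');
$dbh->commit;
my $ac_after = $dbh->{AutoCommit};        # true again after commit
```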
Data retrieval
my $sth = $dbh->prepare($sql);
$sth->execute( @bind_values );
my @columns = @{$sth->{NAME}};
while (my $row_aref = $sth->fetch) { … }
# or, for statements that return no rows:
$dbh->do($sql);
• see also: prepare_cached
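Putting prepare, execute, NAME, fetch and do together in one runnable sketch, assuming DBD::SQLite is installed; the author names are invented sample data:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; author names are invented sample data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT)');

my $insert = $dbh->prepare('INSERT INTO author (author_id, author_name) VALUES (?, ?)');
$insert->execute(@$_) for [1, 'JFOOBAR'], [2, 'JDOE'], [3, 'ABC'];

my $sth = $dbh->prepare(
  'SELECT author_id, author_name FROM author WHERE author_id <= ? ORDER BY author_id');
$sth->execute(2);                      # bind values fill the placeholders in order
my @columns = @{ $sth->{NAME} };      # column names, available after execute
my @names;
while (my $row_aref = $sth->fetch) {  # fetch is an alias for fetchrow_arrayref
  push @names, $row_aref->[1];
}
print "@columns: @names\n";            # author_id author_name: JFOOBAR JDOE

# do() suits statements without a result set; it returns the affected row count
my $deleted = $dbh->do('DELETE FROM author WHERE author_id = ?', undef, 3);
```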
Other ways of fetching
• single row
  – fetchrow_array
  – fetchrow_arrayref (a.k.a. fetch)
  – fetchrow_hashref
• lists of rows (with optional slicing)
  – fetchall_arrayref
  – fetchall_hashref
• prepare, execute and fetch in one call
  – selectall_arrayref
  – selectall_hashref
• vertical slice
  – selectcol_arrayref
• little DBI support for cursors
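The one-call variants return different shapes of data. A minimal sketch, assuming DBD::SQLite is installed and using invented rows; selectrow_array is not on the slide but is the single-row counterpart of the select* family:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; sample data is invented for the demo.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT)');
$dbh->do('INSERT INTO author VALUES (?, ?)', undef, @$_)
  for [1, 'JFOOBAR'], [2, 'JDOE'], [3, 'ABC'];

# single row, as a list
my ($name) = $dbh->selectrow_array(
  'SELECT author_name FROM author WHERE author_id = ?', undef, 2);

# all rows as an array of arrays
my $rows = $dbh->selectall_arrayref('SELECT * FROM author ORDER BY author_id');

# all rows as an array of hashes (the Slice attribute selects the row shape)
my $hashes = $dbh->selectall_arrayref(
  'SELECT * FROM author ORDER BY author_id', { Slice => {} });

# all rows in a hash, keyed by the given column
my $by_id = $dbh->selectall_hashref('SELECT * FROM author', 'author_id');

# vertical slice: a single column as a flat list
my $names = $dbh->selectcol_arrayref(
  'SELECT author_name FROM author ORDER BY author_name');
```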
Advanced Perl DBI
Transactions
$dbh->{RaiseError} = 1; # errors will raise exceptions
my $ok = eval {
  $dbh->begin_work(); # will turn off AutoCommit
  … # inserts, updates, deletes
  $dbh->commit();
  1;                  # reached only if nothing died
};
if (!$ok) {
  my $err = $@;
  eval { $dbh->rollback() };
  my $rollback_result = $@ || "SUCCESS";
  die "FAILED TRANSACTION: $err; ROLLBACK: $rollback_result";
}
• encapsulated in DBIx::Transaction or ORMs, e.g. in DBIx::Class:
  $schema->txn_do( sub {…} );
• nested transactions: must keep track of transaction depth
• savepoint / release: only in DBIx::Class
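The begin_work / commit / rollback pattern can be checked end to end: a failure halfway through a transaction must leave no trace. A minimal sketch, assuming DBD::SQLite is installed; run_in_transaction is a hypothetical helper written for this example, not a DBI method, and the account table is invented:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the account table is invented.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });
$dbh->do('CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)');
$dbh->do('INSERT INTO account VALUES (?, ?)', undef, @$_) for [1, 100], [2, 0];

# hypothetical helper: run $code inside a transaction, roll back on any error
sub run_in_transaction {
  my ($dbh, $code) = @_;
  $dbh->begin_work;                         # turns AutoCommit off
  my $ok = eval { $code->(); $dbh->commit; 1 };
  return 1 if $ok;
  my $err = $@;
  eval { $dbh->rollback };                  # AutoCommit is back on afterwards
  warn "transaction failed: $err";
  return 0;
}

my $update = 'UPDATE account SET balance = balance + ? WHERE id = ?';

# succeeds: both updates are committed together
my $ok1 = run_in_transaction($dbh, sub {
  $dbh->do($update, undef, -30, 1);
  $dbh->do($update, undef,  30, 2);
});

# fails halfway (unknown table): the first update is rolled back
my $ok2 = run_in_transaction($dbh, sub {
  $dbh->do($update, undef, -50, 1);
  $dbh->do('UPDATE no_such_table SET x = 1');
});

my $balances = $dbh->selectcol_arrayref('SELECT balance FROM account ORDER BY id');
print "@$balances\n";   # 70 30
```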
Efficiency
my $sth = $dbh->prepare(<<'SQL');
SELECT author_id, author_name, e_mail FROM author
SQL
my ($id, $name, $e_mail);
$sth->execute;
$sth->bind_columns(\ ($id, $name, $e_mail));
while ($sth->fetch) {
  print "author $id is $name at $e_mail\n";
}
• avoids the cost of allocating / deallocating Perl variables for each row
• caveat: fetch returns the same array reference for every row, so don't store that reference and use it after another fetch
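Here is the bind_columns loop made runnable, together with the safe way to keep rows across fetches (copying them). A minimal sketch, assuming DBD::SQLite is installed; the rows and example.com addresses are invented:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; rows are invented for the demo.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER, author_name TEXT, e_mail TEXT)');
$dbh->do('INSERT INTO author VALUES (?, ?, ?)', undef, @$_)
  for [1, 'JFOOBAR', 'jfoobar@example.com'], [2, 'JDOE', 'jdoe@example.com'];

my $sth = $dbh->prepare(
  'SELECT author_id, author_name, e_mail FROM author ORDER BY author_id');
$sth->execute;
my ($id, $name, $e_mail);
$sth->bind_columns(\ ($id, $name, $e_mail));  # \(...) yields one ref per variable

my @lines;
while ($sth->fetch) {                           # each fetch overwrites the bound variables
  push @lines, "author $id is $name at $e_mail";
}

# to keep rows beyond the next fetch, copy them ([@$row]) instead of
# storing the array reference returned by fetch
$sth->execute;
my @kept;
while (my $row = $sth->fetch) {
  push @kept, [@$row];
}
```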
Metadata
• data_sources
  my @sources = DBI->data_sources($driver);
• table_info
  my $sth = $dbh->table_info(@search_criteria);
  while (my $row = $sth->fetchrow_hashref) {
    print "$row->{TABLE_NAME} : $row->{TABLE_TYPE}\n";
  }
• others
  – column_info()
  – primary_key_info()
  – foreign_key_info()
• many drivers only have partial implementations
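As an example of a driver that does implement table_info, here is a minimal sketch with DBD::SQLite (assumed installed). The arguments are catalog, schema, table name pattern and type; undef means "no restriction" and '%' is the SQL wildcard. The exact set of rows (for instance whether SQLite's own system tables are listed) depends on the driver:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; table and view names are invented.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT)');
$dbh->do('CREATE VIEW author_names AS SELECT author_name FROM author');

# catalog and schema left undef; '%' matches any table name
my %type_of;
my $sth = $dbh->table_info(undef, undef, '%', undef);
while (my $row = $sth->fetchrow_hashref) {
  $type_of{ $row->{TABLE_NAME} } = $row->{TABLE_TYPE};
  print "$row->{TABLE_NAME} : $row->{TABLE_TYPE}\n";
}
```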
Lost connection
• manual recovery
  if ($dbh->errstr =~ /broken connection/i) { … }
  (the exact error message depends on the driver)
• DBIx::RetryOverDisconnects
  – intercepts requests (prepare, execute, …)
  – filters errors
  – attempts to reconnect and restart the transaction
• some ORMs have their own layer for recovering connections
• some drivers have their own mechanism
  $dbh->{mysql_auto_reconnect} = 1;
Datatypes
• NULL ↔ undef
• INTEGER, VARCHAR, DATE ↔ Perl scalar
  – usually DWIM works
  – if needed, the type can be specified explicitly (type constants come from use DBI qw(:sql_types)):
$sth->bind_param($param_num, $value, SQL_DATETIME);
• BLOB ↔ Perl scalar
• ARRAY (Postgres) ↔ arrayref
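The NULL ↔ undef mapping and explicit typing with bind_param can be seen in a few lines. This also shows a classic pitfall: binding undef in "col = ?" compares with NULL, which is never true. A minimal sketch, assuming DBD::SQLite is installed; the rows are invented:

```perl
use strict;
use warnings;
use DBI qw(:sql_types);   # exports SQL_INTEGER, SQL_VARCHAR, SQL_DATETIME, ...

# Assumption: DBD::SQLite is installed; table and values are invented.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER, e_mail TEXT)');

# undef on the way in becomes NULL in the database ...
$dbh->do('INSERT INTO author VALUES (?, ?)', undef, 1, undef);

# ... and NULL on the way out becomes undef
my ($e_mail) = $dbh->selectrow_array(
  'SELECT e_mail FROM author WHERE author_id = 1');

# pitfall: "= ?" with undef never matches; use IS NULL instead
my ($eq_count) = $dbh->selectrow_array(
  'SELECT COUNT(*) FROM author WHERE e_mail = ?', undef, undef);
my ($is_null) = $dbh->selectrow_array(
  'SELECT COUNT(*) FROM author WHERE e_mail IS NULL');

# explicit typing: the third argument of bind_param is an SQL type constant
my $sth = $dbh->prepare('INSERT INTO author VALUES (?, ?)');
$sth->bind_param(1, '2', SQL_INTEGER);
$sth->bind_param(2, 'jdoe@example.com', SQL_VARCHAR);
$sth->execute;
my ($id2) = $dbh->selectrow_array(
  'SELECT author_id FROM author WHERE e_mail = ?', undef, 'jdoe@example.com');
```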
Large objects
• usually: just scalars in memory
• when reading: control the maximum BLOB size
  $dbh->{LongReadLen} = $max_bytes;
  $dbh->{LongTruncOk} = 1;
• when writing: can inform the driver
  $sth->bind_param($ix, $blob, SQL_BLOB);
• driver-specific stream API, e.g.:
  – Pg: pg_lo_open, pg_lo_write, pg_lo_lseek
  – Oracle: ora_lob_read(…), ora_lob_write(…), ora_lob_append(…)
Tracing / profiling
• $dbh->trace($trace_setting, $trace_where)
  – 0 - Trace disabled.
  – 1 - Trace top-level DBI method calls returning with results or errors.
  – 2 - As above, adding tracing of top-level method entry with parameters.
  – 3 - As above, adding some high-level information from the driver and some internal information from the DBI.
• $dbh->{Profile} = 2; # profile at the statement level
  – many powerful options
  – see L<DBI::Profile>
Stored procedures
my $sth = $dbh->prepare($db_specific_sql);
# prepare params to be passed to the called procedure
$sth->bind_param(1, $val1);
$sth->bind_param(2, $val2);
# prepare memory locations to receive the results
# (the third argument is the maximum length reserved for each output value)
$sth->bind_param_inout(3, \$result1, $max_len);
$sth->bind_param_inout(4, \$result2, $max_len);
# execute the whole thing
$sth->execute;
Object-Relational Mapping (ORM)
ORM Principle
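[Diagram: rows r1, r2, … of table1 (columns c1, c2, c3) in the RDBMS are mapped to objects r1 : class1, r2 : class1 in the application; class1 has attributes c1: String, c2: String and c3: class2, where c3 points to table2 (columns c3, c4).]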
ORM: what for?
[catalyst list] On Thu, 2006-06-08, Steve wrote:
Not intending to start any sort of rancorous discussion, but I was wondering whether someone could illuminate me a little?
I'm comfortable with SQL, and with DBI. I write basic SQL that runs just fine on all databases, or more complex SQL when I want to target a single database (ususally postgresql).
What value does an ORM add for a user like me?
ORM useful for …
• dynamic SQL
  – navigation between tables
  – generate complex SQL queries from Perl data structures
  – better than phrasebook or string concatenation
• automatic data conversions (inflation / deflation)
• expansion of tree data structures coded in the relational model
• transaction encapsulation
• data validation
• computed fields
• caching
• schema deployment
• …
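To show what "generate SQL queries from Perl data structures" means, here is a toy sketch in plain Perl. where_clause is a hypothetical helper written only for this illustration; real ORMs rely on much richer generators (DBIx::Class, for instance, builds on SQL::Abstract). It maps a scalar to "= ?", an arrayref to "IN (…)" and undef to "IS NULL", and returns bind values for the placeholders:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only. Column names are interpolated
# into the SQL, so they must come from trusted code, never from user input.
sub where_clause {
  my ($criteria) = @_;
  my (@conditions, @bind);
  for my $column (sort keys %$criteria) {    # sorted for a stable SQL string
    my $value = $criteria->{$column};
    if (!defined $value) {
      push @conditions, "$column IS NULL";   # undef maps to NULL
    }
    elsif (ref $value eq 'ARRAY') {
      push @conditions, "$column IN (" . join(', ', ('?') x @$value) . ")";
      push @bind, @$value;
    }
    else {
      push @conditions, "$column = ?";
      push @bind, $value;
    }
  }
  my $sql = @conditions ? 'WHERE ' . join(' AND ', @conditions) : '';
  return ($sql, @bind);
}

my ($sql, @bind) = where_clause({
  author_id         => [3456, 123],
  e_mail            => undef,
  distribution_name => 'DBD::SQLite',
});
print "$sql\n";   # WHERE author_id IN (?, ?) AND distribution_name = ? AND e_mail IS NULL
print "@bind\n";  # 3456 123 DBD::SQLite
```

The result can be fed straight to DBI, e.g. $dbh->selectall_arrayref("SELECT * FROM distribution $sql", undef, @bind).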
See Also : http://lists.scsys.co.uk/pipermail/catalyst/2006-June/008059.html
Impedance mismatch
• SELECT c1, c2 FROM table1
  – missing c3, so cannot navigate to class2
  – is it a valid instance of class1?
• SELECT * FROM table1 LEFT JOIN table2 ON …
  – what to do with the c4 column?
  – is it a valid instance of class1?
• SELECT c1, c2, length(c2) AS l_c2 FROM table1
  – no predeclared method in class1 for accessing l_c2
[Diagram: table1 (c1, c2, c3) and table2 (c3, c4) in the RDBMS; object r1 : class1 (c1: String, c2: String, c3: class2) in RAM.]
ORM Landscape
• Leader
  – DBIx::Class (a.k.a. DBIC)
• Also discussed here
  – DBIx::DataModel
• Many others
  – Rose::DB, Jifty::DBI, Fey::ORM, ORM, DBIx::ORM::Declarative, Tangram, Coat::Persistent, DBR, DBIx::Sunny, DBIx::Skinny, DBI::Easy, …
Model (UML)
[UML diagram: Artist (1) to CD (*), CD (1) to Track (*): an artist has many CDs, a CD has many tracks.]
DBIx::Class Schema
package MyDatabase::Main;
use base qw/DBIx::Class::Schema/;
__PACKAGE__->load_namespaces;

package MyDatabase::Main::Result::Artist;
use base qw/DBIx::Class/;
__PACKAGE__->load_components(qw/PK::Auto Core/);
__PACKAGE__->table('artist');
__PACKAGE__->add_columns(qw/ artistid name /);
__PACKAGE__->set_primary_key('artistid');
__PACKAGE__->has_many('cds' => 'MyDatabase::Main::Result::Cd');

package ... ...
DBIx::Class usage
my $schema = MyDatabase::Main->connect('dbi:SQLite:db/example.db');

my @artists = (['Michael Jackson'], ['Eminem']);
$schema->populate('Artist', [
  [qw/name/],
  @artists,
]);

my $rs = $schema->resultset('Track')->search(
  { 'cd.title' => $cdtitle },
  { join => [qw/ cd /] },
);
while (my $track = $rs->next) {
  print $track->title . "\n";
}
DBIx::DataModel Schema
package MyDatabase;
use DBIx::DataModel;

DBIx::DataModel->Schema(__PACKAGE__)
  ->Table(qw/Artist artist artistid/)
  ->Table(qw/CD     cd     cdid    /)
  ->Table(qw/Track  track  trackid /)
  ->Association([qw/Artist artist 1 /], [qw/CD    cds    0..* /])
  ->Composition([qw/CD     cd     1 /], [qw/Track tracks 1..* /]);
DBIx::DataModel usage
my $dbh = DBI->connect('dbi:SQLite:db/example.db');
MyDatabase->dbh($dbh);

my @artists = (['Michael Jackson'], ['Eminem']);
MyDatabase::Artist->insert(['name'], @artists);

my $statement = MyDatabase->join(qw/CD tracks/)->select(
  -columns  => [qw/track.title|trtitle …/],
  -where    => { 'cd.title' => $cdtitle },
  -resultAs => 'statement', # default: arrayref of rows
);
while (my $track = $statement->next) {
  print "$track->{trtitle}\n";
}
Conclusion
Further info
• Database textbooks
• DBI manual (L<DBI>, L<DBI::FAQ>, L<DBI::Profile>)
• Book: "Programming the Perl DBI"
• Vendors' manuals
• ORMs
  – DBIx::Class::Manual
  – DBIx::DataModel
Mastering databases requires a lot of reading!
Bonus slides
Names for primary / foreign keys
• primary: unique; foreign: same name
  author.author_id, distribution.author_id
  – the RDBMS knows how to perform joins ("NATURAL JOIN")
• primary: constant; foreign: unique, based on table + column name
  author.id, distribution.author_id
  – the ORM knows how to perform joins (RoR ActiveRecord)
  – SELECT * FROM table1, table2 … which id?
• primary: constant; foreign: just the table name
  author.id, distribution.author
  – $a_distrib->author(): foreign key or related record?
Columns used for joins should always be indexed.
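With the first naming convention, the database can infer the join itself. A minimal sketch, assuming DBD::SQLite is installed; the distribution names and rows are invented, and the index on the foreign key column follows the advice above:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; both tables use the same column name
# author_id, and all rows are invented.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT)');
$dbh->do('CREATE TABLE distribution (distrib_id INTEGER PRIMARY KEY,
            distrib_name TEXT, author_id INTEGER REFERENCES author)');

# the foreign key column used for joins gets its own index
$dbh->do('CREATE INDEX distribution_author_id ON distribution (author_id)');

$dbh->do('INSERT INTO author VALUES (?, ?)', undef, @$_)
  for [1, 'JFOOBAR'], [2, 'JDOE'];
$dbh->do('INSERT INTO distribution VALUES (?, ?, ?)', undef, @$_)
  for [10, 'DBD-Foo', 1], [11, 'DBD-Bar', 1], [12, 'Acme-Baz', 2];

# NATURAL JOIN matches every pair of same-named columns (here only author_id)
my $rows = $dbh->selectall_arrayref(q{
  SELECT author_name, distrib_name
    FROM author NATURAL JOIN distribution
   ORDER BY distrib_id
});
print "$_->[0]: $_->[1]\n" for @$rows;
```

NATURAL JOIN breaks as soon as two tables share an unrelated column name (e.g. a generic id or name), which is why the other conventions rely on the ORM to know the join columns.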
Locks and isolation levels
• Locks on rows
  – shared
    • other clients can also get a shared lock
    • requests for an exclusive lock must wait
  – exclusive
    • all other requests for locks must wait
• Intention locks (on whole tables)
  – intent shared
  – intent exclusive
• Isolation levels
  – read-uncommitted
  – read-committed
  – repeatable-read
  – serializable
SELECT … FOR READ ONLY
SELECT … FOR UPDATE
SELECT … LOCK IN SHARE MODE
LOCK TABLE(S) … READ/WRITE
SET TRANSACTION ISOLATION LEVEL …
Cursors
my $sql = "SELECT * FROM SomeTable FOR UPDATE";
my $sth1 = $dbh->prepare($sql);
$sth1->execute();
my $curr = "WHERE CURRENT OF $sth1->{CursorName}";
while (my $row = $sth1->fetch) {
  if (…) {
    $dbh->do("DELETE FROM SomeTable $curr");   # $curr already holds WHERE
  }
  else {
    my $sth2 = $dbh->prepare("UPDATE SomeTable SET col = ? $curr");
    $sth2->execute($new_val);
    …
  }
}
Modeling (UML)
[UML diagram: Author (1) to Distribution (*); Distribution (1) "► contains" Module (*); Distribution (*) "► depends on" Module (*).]
Terminology
[Same diagram, annotated with UML terminology: class (Author, Distribution, Module), association, composition, multiplicity (1, *), association name (► contains, ► depends on).]
Implementation
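[Diagram: implementation as tables]
Author (author_id, author_name, e_mail)
Distribution (distrib_id, distrib_name, d_release, author_id)
Module (module_id, module_name, distrib_id)
Dependency (distrib_id, module_id): link table for the n-to-n association
Multiplicities: Author 1 to * Distribution; Distribution 1 to * Module; Distribution 1 to * Dependency * to 1 Module.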
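The implementation above translates directly into DDL, and the link table is then traversed with two joins. A minimal sketch, assuming DBD::SQLite is installed; column names follow the diagram, while types, constraints and all rows (distributions, modules, dates) are invented:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed. Column names follow the diagram;
# types, constraints and sample rows are invented for the sketch.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do($_) for
  'CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT, e_mail TEXT)',
  'CREATE TABLE distribution (distrib_id INTEGER PRIMARY KEY, distrib_name TEXT,
     d_release TEXT, author_id INTEGER REFERENCES author)',
  'CREATE TABLE module (module_id INTEGER PRIMARY KEY, module_name TEXT,
     distrib_id INTEGER REFERENCES distribution)',
  # link table: one row per (distribution, module) pair, hence n-to-n
  'CREATE TABLE dependency (distrib_id INTEGER REFERENCES distribution,
     module_id INTEGER REFERENCES module, PRIMARY KEY (distrib_id, module_id))';

$dbh->do('INSERT INTO author VALUES (1, ?, NULL)', undef, 'JFOOBAR');
my $ins_distrib = $dbh->prepare('INSERT INTO distribution VALUES (?, ?, ?, 1)');
$ins_distrib->execute(@$_) for [10, 'Foo-Bar', '2011-04-01'], [11, 'Foo-Baz', '2011-04-02'];
my $ins_module = $dbh->prepare('INSERT INTO module VALUES (?, ?, ?)');
$ins_module->execute(@$_)
  for [100, 'Foo::Bar', 10], [101, 'Foo::Bar::Util', 10], [102, 'Foo::Baz', 11];
my $ins_dep = $dbh->prepare('INSERT INTO dependency VALUES (?, ?)');
$ins_dep->execute(@$_) for [11, 100], [11, 101];

# modules a given distribution depends on
my $deps = $dbh->selectcol_arrayref(q{
  SELECT m.module_name
    FROM distribution d
    JOIN dependency dep ON dep.distrib_id = d.distrib_id
    JOIN module m       ON m.module_id    = dep.module_id
   WHERE d.distrib_name = ?
   ORDER BY m.module_name
}, undef, 'Foo-Baz');

# the other direction: distributions depending on a given module
my $users = $dbh->selectcol_arrayref(q{
  SELECT d.distrib_name
    FROM module m
    JOIN dependency dep ON dep.module_id = m.module_id
    JOIN distribution d ON d.distrib_id  = dep.distrib_id
   WHERE m.module_name = ?
}, undef, 'Foo::Bar');
print "@$deps\n@$users\n";
```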