PostgreSQL: Beyond "Standard" Relational Model

Igor A. Gaponenko

Lawrence Berkeley National Laboratory (IAGaponenko@lbl.gov)

November 14, 2003 Igor A. Gaponenko: PostgreSQL: Beyond "Standard" Relational Model

Motivation…

Anti-goals:
  Not to cover the SQL92 or SQL99 standards.
  Not to compare MySQL and PostgreSQL directly, or any other (object-)relational databases such as Oracle 9i.

Goals:
  Cognitive: learning advanced features of the object-relational model and of one particular implementation of it (what lies beyond the original relational model and primitive data types).
  Practical: looking for an adequate persistence technology to re-implement the Condition/DB and other non-Event Store databases of the BaBar Experiment. Note that the requirements for the migration are beyond the scope of the talk; they are "implied" by the problem domain.

Object-Relational Model Foundation (1)

Why?
  An attempt to benefit from both "real world" modeling and the performance superiority of ODBMS over RDBMS.
  An attempt to address shortcomings of SQL and of "object oriented" database systems.
  A way of rethinking Codd's original (relational) model.

The Third Object-Relational Database Manifesto
  A proposal for the future direction of data and database management systems.
  Provides a foundation for integrating relational and object technologies.
  Published:
    C.J. Date and H. Darwen. "A Foundation for Object Relational Database Systems: The Third Manifesto". Addison-Wesley, 1998.
    C.J. Date and H. Darwen. "Foundation for Future Database Systems: The Third Manifesto (2nd Edition)". Addison-Wesley, 2000.

Standards:
  SQL92
  SQL99 (the object-relational one)

Object-Relational Model Foundation (2)

The relation remains THE cornerstone concept; in an ORDBMS, however, it is extended with:
  Structured types for attributes (in addition to atomic types): structures, sets, arrays, bags, etc.
  Methods: special operations to be defined and applied to values of user-defined types.
  Identifiers for tuples (similar to "object identifiers" in ODBMS). They are generally invisible to users.
  References (to tuples).
  Nested relations.
  Inheritance ("inclusive polymorphism") as a way to extend relations.

SQL99 defines both single and multiple inheritance

PostgreSQL: History, Versions and Platforms

Derives from UC Berkeley's "Ingres" and "Postgres95" academic database projects.
Is freely available with the source.
Is distributed under the BSD license.

Current stable version: 7.3.2 (beginning of 2003).
Current development version: 7.4 (to be completed by the end of 2003).
Next stable version: 7.5 or 8.0 (2004).
Is constantly under improvement (there are real people behind it!).
Commercial flavors are also available.

All UNIX-es are supported (including MacOS X).
There are "native" ports of some earlier (7.2.x) versions to MS Windows NT/2000/XP.
Cygwin is also an option for newer versions.
Full support for MS Windows will be added as of 7.5 or 8.0.

PostgreSQL vs MySQL: Which one is better (“better” for what)?

See a very interesting discussion on this subject at:

“A Response to the Featurewise Comparison of MySQL and PostgreSQL” by Peter Eisentraut, PostgreSQL Global Development Group

http://developer.postgresql.org/~petere/comparison.html

Conclusion (no surprise!): PostgreSQL is better :-)

Now let's take a tour of the advanced features of PostgreSQL…

Client-Server Architecture

Current state of affairs:
  There is a (post-)"master" process launching a number of "work" processes that do the actual work on behalf of clients. The number is controlled through a configuration file.
  It fits well into SMP architecture by relying on the automatic load balancing done by the operating system.
  It does not benefit from multi-threading. Is this really needed?
  There is no support for cluster-based installation (running "work" processes on different hosts).

Ongoing developments:
  "Replicated" DBMS. There are a few projects; see GBorg at WWW.PostgreSQL.org for details.

Limitations of PostgreSQL

Theoretical limits:
  Maximum size for a database:            unlimited (4 TB databases exist)
  Maximum size for a table:               16 TB on all operating systems
  Maximum size for a row:                 1.6 TB
  Maximum size for a field:               1 GB
  Maximum number of rows in a table:      unlimited
  Maximum number of columns in a table:   250 - 1600 (depending on column types)
  Maximum number of indexes on a table:   unlimited

System configuration limits:
  They are imposed by the available disk space and memory/swap space. This is also related to the performance of the database.
  The maximum table size and maximum number of columns can be increased if the default block size is increased to 32k.

2 GB file limit issue (not an issue):
  The maximum table size of 16 TB does not require large file support from the operating system. Large tables are stored as multiple 1 GB files, so file system size limits are not important.
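The table-size figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the 8 kB default block size and the idea that the table limit is a fixed number of blocks (and therefore scales with the block size) are assumptions made for the example, and the helper names are hypothetical.

```python
# Back-of-the-envelope check of the table-size figures quoted above.
KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4

MAX_TABLE_SIZE = 16 * TB      # limit quoted on the slide
SEGMENT_SIZE = 1 * GB         # large tables are stored as multiple 1 GB files
DEFAULT_BLOCK_SIZE = 8 * KB   # assumption: default block size of 8 kB


def segment_files(table_bytes, segment_bytes=SEGMENT_SIZE):
    """Number of segment files needed to hold a table (ceiling division)."""
    return -(-table_bytes // segment_bytes)


def scaled_limit(limit_bytes, old_block, new_block):
    """Assume the limit is a fixed block count, so it scales with block size."""
    return limit_bytes // old_block * new_block


print(segment_files(MAX_TABLE_SIZE))                                    # 16384
print(scaled_limit(MAX_TABLE_SIZE, DEFAULT_BLOCK_SIZE, 32 * KB) // TB)  # 64
```

Under these assumptions a maximum-size table is spread over 16384 files of 1 GB each, none of which comes near a 2 GB file limit.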

Concurrency Control: MVCC Transaction Model

"...Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data consistency by using a multi-version model (Multi-version Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session..."

"...The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading..."

"...Table- and row-level locking facilities are also available in PostgreSQL for applications that cannot adapt easily to MVCC behavior. However, proper use of MVCC will generally provide better performance than locks..."

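The snapshot idea in the MVCC quotes can be illustrated with a toy multi-version store. This is a minimal sketch of the general technique, not of PostgreSQL's internal implementation; the class `ToyMVCCStore` and its methods are hypothetical names invented for the example, and every write is treated as an immediately committed transaction.

```python
class ToyMVCCStore:
    """Toy multi-version store: a write appends a new version, nothing is overwritten."""

    def __init__(self):
        self._versions = {}    # key -> list of (commit_number, value), oldest first
        self._last_commit = 0

    def write(self, key, value):
        """Commit a new version of `key` and return its commit number."""
        self._last_commit += 1
        self._versions.setdefault(key, []).append((self._last_commit, value))
        return self._last_commit

    def snapshot(self):
        """A snapshot is simply the commit number current when it was taken."""
        return self._last_commit

    def read(self, key, snapshot):
        """Return the newest version of `key` committed at or before `snapshot`."""
        visible = [v for (c, v) in self._versions.get(key, []) if c <= snapshot]
        return visible[-1] if visible else None


store = ToyMVCCStore()
store.write("run", 1)
snap = store.snapshot()                      # the reader takes its snapshot here
store.write("run", 2)                        # a later writer is not blocked by the reader
print(store.read("run", snap))               # 1: the reader still sees its snapshot
print(store.read("run", store.snapshot()))   # 2: a fresh snapshot sees the update
```

Because old versions are kept rather than overwritten, the reader never has to wait for the writer and the writer never has to wait for the reader.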
Concurrency Control: SQL Transaction Isolation Levels (1)

Three known phenomena:
  "dirty read": A transaction reads data written by a concurrent uncommitted transaction.
  "nonrepeatable read": A transaction re-reads data it has previously read and finds that the data has been modified by another transaction (that committed since the initial read).
  "phantom read": A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently committed transaction.

Phenomena possible at each isolation level (x = possible):

  Level             | Dirty read | Nonrepeatable read | Phantom read
  ------------------+------------+--------------------+-------------
  Read uncommitted  |     x      |         x          |      x
  Read committed    |            |         x          |      x
  Repeatable read   |            |                    |      x
  Serializable      |            |                    |
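The relationship between isolation levels and phenomena can also be written down as a small lookup structure. The following is a sketch of the standard's table only; the function name `weakest_level_preventing` is hypothetical and the code says nothing about which levels a given PostgreSQL version actually implements.

```python
# Phenomena that remain possible at each SQL isolation level,
# listed from the weakest level to the strongest.
ALLOWED_PHENOMENA = {
    "read uncommitted": {"dirty read", "nonrepeatable read", "phantom read"},
    "read committed":   {"nonrepeatable read", "phantom read"},
    "repeatable read":  {"phantom read"},
    "serializable":     set(),
}


def weakest_level_preventing(*phenomena):
    """Return the weakest level at which none of the given phenomena can occur."""
    unwanted = set(phenomena)
    known = ALLOWED_PHENOMENA["read uncommitted"]
    if not unwanted <= known:
        raise ValueError(f"unknown phenomenon: {sorted(unwanted - known)}")
    for level, allowed in ALLOWED_PHENOMENA.items():
        if not (unwanted & allowed):
            return level


print(weakest_level_preventing("dirty read"))           # read committed
print(weakest_level_preventing("nonrepeatable read"))   # repeatable read
print(weakest_level_preventing("phantom read"))         # serializable
```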

Concurrency Control: SQL Transaction Isolation Levels (2)

START TRANSACTION
    [ ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE } ]
    [ READ WRITE | READ ONLY ]
ROLLBACK
COMMIT
SELECT FOR UPDATE

PostgreSQL provides:
  READ COMMITTED
  SERIALIZABLE

Implicit locking (tables)
Explicit locking (rows)

A problem of "deadlocks"

Tables with OID-s

A non-standard option (in neither SQL92 nor SQL99)!

OID := unsigned 4-byte integer; it can't be used to address rows in large databases or even in large tables.
The use of OID-s as primary keys is discouraged, except in system tables.
OID-s are used internally by PostgreSQL as primary keys for various system tables.
User-defined tables (tuples) may have OID-s explicitly visible to clients:

CREATE TABLE <name> ... [ WITHOUT OIDS ] ...

SELECT oid, name FROM sample;
  oid   |      name
--------+----------------
 123456 | Igor Gaponenko
(1 row)
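Why a 4-byte unsigned identifier cannot address rows uniquely in a large installation follows from simple counting. The sketch below models only the size of the value space and a plain wrap-around; the function `next_oid` is a hypothetical helper, and how the real counter behaves when it wraps is not modeled.

```python
OID_BITS = 32
OID_SPACE = 2 ** OID_BITS   # number of distinct 4-byte unsigned values


def next_oid(current):
    """Advance a 4-byte unsigned counter, wrapping around at 2**32."""
    return (current + 1) % OID_SPACE


print(OID_SPACE)                  # 4294967296
print(next_oid(123456))           # 123457
print(next_oid(OID_SPACE - 1))    # 0: the counter wraps, so values start to repeat
```

Once more than about 4.3 billion identifiers have been handed out, values necessarily repeat, which is why OID-s are a poor choice of primary key for user tables.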

Arrays

Any atomic type can be used here. Arrays can be multidimensional:

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);

Using a predefined size (it is not actually enforced):

CREATE TABLE tictactoe (
    squares integer[3][3]
);

Special syntax for inserting multiple values:

INSERT INTO sal_emp
    VALUES ('Carol',
            '{20000, 25000, 25000, 25000}',
            '{{"talk", "consult"}, {"meeting"}}');
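The curly-brace array literals used for the sal_emp table can be produced mechanically from nested lists. This is a minimal sketch that handles only integers and strings; `to_array_literal` is a hypothetical helper written for illustration, not part of any PostgreSQL client library, and real applications would normally let their database driver do this conversion.

```python
def to_array_literal(value):
    """Render a (possibly nested) Python list as '{...}' array-literal text."""
    if isinstance(value, list):
        return "{" + ",".join(to_array_literal(v) for v in value) + "}"
    if isinstance(value, str):
        # Double-quote strings and escape embedded backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return str(value)


print(to_array_literal([20000, 25000, 25000, 25000]))
# {20000,25000,25000,25000}
print(to_array_literal([["talk", "consult"], ["meeting"]]))
# {{"talk","consult"},{"meeting"}}
```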

Creating new types (1)

CREATE TYPE typename (
    INPUT = input_function,
    OUTPUT = output_function,
    INTERNALLENGTH = { internallength | VARIABLE }
    [ , EXTERNALLENGTH = { externallength | VARIABLE } ]
    [ , DEFAULT = "default" ]
    [ , ELEMENT = element ]
    [ , DELIMITER = delimiter ]
    [ , SEND = send_function ]
    [ , RECEIVE = receive_function ]
    [ , PASSEDBYVALUE ]
    [ , ALIGNMENT = alignment ]
    [ , STORAGE = storage ]
)

Creating new types (2)

CREATE FUNCTION zero_out(opaque) RETURNS opaque
    AS '/usr/local/pgsql/lib/zero.so' LANGUAGE 'C';

CREATE FUNCTION zero_in(opaque) RETURNS zero
    AS '/usr/local/pgsql/lib/zero.so' LANGUAGE 'C';

CREATE TYPE zero (
    internallength = 16,
    input = zero_in,
    output = zero_out
);

Creating new types (3)

Where to see real-life examples:
  Go to the "PostGIS" Web site, http://postgis.refractions.net/, as an example of extending PostgreSQL with 3-D geographic objects.
  Geometric objects in PostgreSQL are another example.

Defining operators and functions…

To be finished…

Triggers…

To be finished…

Cursors...

Stored procedures…

PL/pgSQL
PL/Perl
PL/Python
C

Rules

Creating user-defined rules is a PostgreSQL extension that allows changing the semantics of SELECT, INSERT, UPDATE, or DELETE commands.

It's a way of doing something extra in addition to the original command, or even of substituting the command with another command.

CREATE RULE "_RETURN" AS
    ON SELECT TO t1 DO INSTEAD
        SELECT * FROM t2;

SELECT * FROM t1;

CREATE RULE notify_me AS
    ON UPDATE TO mytable DO NOTIFY mytable;

UPDATE mytable SET name = 'foo' WHERE id = 42;

Indexes

Available indexes:
  B-Tree (Lehman-Yao high-concurrency algorithm)
  R-Tree (standard R-trees using Guttman's quadratic split algorithm, for spatial information; deprecated in favor of GiST)
  GiST
  Hash

Multicolumn indexes are also allowed (up to 32 columns).
The query optimizer will use the appropriate ones when performing queries.

CREATE TABLE test (
    id integer,
    ...
);

CREATE INDEX test_id_index ON test (id);

CREATE INDEX test_id_hash ON test USING HASH (id);

Database Management : Requirements

General problems to be solved in the context of HEP databases:
  Managing database installation(s) at a site.
  Providing database integrity: backup, restore, contents management.
  Controlling access to the database: authentication, authorization, ACL-s, etc.
  Distributing data between database installations around a collaboration: "master" and "mirrors".
  Sharing (exchanging) data between database installations: "user1" and "user2".

Database Management : Contents Management

PostgreSQL has:

A certain degree of control over data clustering:
  A server can serve multiple "database clusters" (maps to 'DATABASE' in SQL).
  Each "database cluster" is self-sufficient (schema + tables).
  Clusters may be spread across different file systems.

A "garbage collection" mechanism: the VACUUM command (not SQL standard). It provides:
  Removal of any leftover data from rollbacks and other processes that can leave temporary data behind (garbage collection).
  Analysis of activity in the database to assist PostgreSQL in designing efficient queries.
  It is meant for space/performance optimization.
  It does not interfere with normal database operations. It will slow them down, though!
  It is supposed to be run in periods of "natural" inactivity.

"Schema documentation": the COMMENT command (not SQL standard).
  It complements naming conventions for database schema components.

Database Management : Copy/Backup/Restore

Storing the contents of tables in files in binary/text format and loading them back into tables:

COPY [ BINARY ] table [ WITH OIDS ]
    FROM { 'filename' | stdin }
    ...

COPY [ BINARY ] table [ WITH OIDS ]
    TO { 'filename' | stdout }
    ...
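To make the text variant of COPY more concrete, the sketch below formats one row in the style of COPY's default text layout: fields separated by tabs, NULL written as \N, and backslash escapes for characters that would otherwise break the layout. The function `copy_text_line` is a hypothetical helper for illustration and covers only these basic cases.

```python
def copy_text_line(row, delimiter="\t", null="\\N"):
    """Format one row in the style of COPY's default text layout."""
    def field(value):
        if value is None:
            return null
        text = str(value)
        # Escape the backslash first, then characters that would break the layout.
        text = text.replace("\\", "\\\\")
        text = text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
        return text
    return delimiter.join(field(v) for v in row)


print(repr(copy_text_line(("Carol", None, "a\tb"))))
# 'Carol\t\\N\ta\\tb'
```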

"Hot" backup/restore operations without interrupting the users of a database:
  'pg_dump' creates a set of SQL commands to back up/restore a whole database.
  The actual backup/restore procedures: 'psql' for plain text dumps, 'pg_restore' for other (compressed, binary) dumps.
  "Hot" restore mode is supported for data backups only.
  It is available, even with active users in the middle of transactions, thanks to the 'multi-version' transaction control (MVCC) system.

Database Management : Authentication, etc.

Encryption of the client-server protocol:
  Built-in SSL (compilation option)
  SSH/OpenSSH tunneling (requires S-Shell access to the database server)
  Stunnel (no S-Shell access is required)

"Host-based authentication"
  Special files in each "database cluster":
    pg_hba.conf
    pg_ident.conf

From the PostgreSQL documentation:
  "...Put simply, the pg_hba.conf file allows you to determine who is allowed to connect to which databases from what machines, and to what degree they must prove their authenticity to gain access..."

host  all        127.0.0.1       255.255.255.255  trust
host  template1  192.167.123.15  255.255.255.255  reject
host  gapon      192.167.123.14  255.255.255.255  crypt
host  template1  192.167.123.13  255.255.255.255  ident sales
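How such records are evaluated can be sketched in a few lines: the records are scanned in file order and the first one whose database and address/mask match the connection decides the authentication method. The code below is an illustrative model of that first-match rule using the four sample records; `HBA_RECORDS` and `auth_method` are hypothetical names, and the real file format has more record types and options than shown here.

```python
import ipaddress

# The four sample records above, in file order: (database, address, mask, method).
HBA_RECORDS = [
    ("all",       "127.0.0.1",      "255.255.255.255", "trust"),
    ("template1", "192.167.123.15", "255.255.255.255", "reject"),
    ("gapon",     "192.167.123.14", "255.255.255.255", "crypt"),
    ("template1", "192.167.123.13", "255.255.255.255", "ident sales"),
]


def auth_method(database, client_ip, records=HBA_RECORDS):
    """Return the method of the first record matching the connection, else None."""
    client = ipaddress.ip_address(client_ip)
    for db, address, mask, method in records:
        network = ipaddress.ip_network(f"{address}/{mask}", strict=False)
        if db in ("all", database) and client in network:
            return method
    return None   # no matching record: the connection would be refused


print(auth_method("gapon", "127.0.0.1"))           # trust
print(auth_method("template1", "192.167.123.15"))  # reject
print(auth_method("gapon", "192.167.123.15"))      # None
```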

Database Management : Access Control Lists (1)

From the PostgreSQL documentation:

"...users and groups can allow for fine-grained, versatile access control to your database objects..."

"...PostgreSQL stores both user and group data within its own system catalogs. These are different from the users and groups defined within the operating system on which the software is installed. Any connection to PostgreSQL must be made with a specific user, and any user may belong to one or more defined groups..."

"...Users control the allocation of rights and track who is allowed to perform actions on the system (and which actions they may perform). Groups exist as a means to simplify the allocation of these rights. Both users and groups exist as global database objects, which means they are not tied to any particular database..."

Database Management : Access Control Lists (2)

Users and passwords are stored in a special system table:

CREATE USER ... WITH PASSWORD '...'
ALTER USER ... WITH PASSWORD '...'
SELECT * FROM pg_shadow;

Users are owners of the database objects they create. They can also grant/revoke privileges to/from other users or groups:

GRANT privilege [, ...] ON object [, ...]
    TO { PUBLIC | username | GROUP groupname }

REVOKE privilege [, ...] ON object [, ...]
    FROM { PUBLIC | username | GROUP groupname }
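The effect of granting to PUBLIC, to a user, or to a group can be modeled with a toy access-control list for a single object. This is a sketch of the idea only and does not reflect how PostgreSQL stores privileges in its catalogs; the class `ToyACL`, the user names, and the group membership used below are all hypothetical.

```python
class ToyACL:
    """Toy model of GRANT/REVOKE on one object; not PostgreSQL's catalog layout."""

    def __init__(self, groups=None):
        self.groups = groups or {}   # group name -> set of user names
        self.grants = set()          # (privilege, grantee) pairs

    def grant(self, privilege, grantee):
        self.grants.add((privilege, grantee))

    def revoke(self, privilege, grantee):
        self.grants.discard((privilege, grantee))

    def allowed(self, user, privilege):
        """A user holds a privilege directly, via PUBLIC, or via a group."""
        if (privilege, "PUBLIC") in self.grants or (privilege, user) in self.grants:
            return True
        return any((privilege, "GROUP " + g) in self.grants
                   for g, members in self.groups.items() if user in members)


acl = ToyACL(groups={"sales": {"barbara"}})   # assumed membership, for illustration
acl.grant("SELECT", "GROUP sales")
print(acl.allowed("barbara", "SELECT"))   # True: granted through the group
print(acl.allowed("igor", "SELECT"))      # False: not a member, no direct grant
acl.revoke("SELECT", "GROUP sales")
print(acl.allowed("barbara", "SELECT"))   # False: the group grant is gone
```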

Database Management : Access Control Lists (3)

Using views to grant access to subsets of tables:

CREATE VIEW stock_view
    AS SELECT isbn, retail, stock FROM stock;

GRANT SELECT ON stock_view TO GROUP sales;

CREATE USER barbara;

GRANT SELECT ON stock_view TO barbara;

SELECT * FROM stock_view;