INTRODUCTION TO SQL

• SQL stands for « Structured Query Language »
• A programming language for databases, closer to natural English than the others (based on « sentences » instead of « procedures »)
• Its aim is to ease the querying of data by humans and the programming of interfaces
• Powerful functions for text recognition
• Powerful extensions for GIS (PostGIS, Oracle)
• Standardized and recognized by most recent relational databases

BUT
1) ...minor differences of syntax between vendors, and vendor-specific enhanced functions, prevent easy interoperability between products
2) SQL databases often imply that the development of the interfaces is distinct from the development of the core of the database

Interoperability problem between vendors

– possible solutions
• use an intermediate layer between the database and the interface
– ODBC/JDBC (connectors used by other software on Windows/Java)
– use ORM (Object Relational Mapper) software that allows the programmer to use the same syntax when developing interfaces, e.g. Doctrine (Open-Source)

SQL and NoSQL

• SQL is well suited to normalized databases where the control of data integrity is important (scientific value)
• ...but it is not scalable: a huge amount (> 300 000) of data lowers performance
• for the last 4 to 5 years, with the explosion of the Internet, there has been a trend towards NoSQL databases: faster databases that can handle huge amounts of data rapidly, e.g. Solr (to index Word and PDF documents), MongoDB, Cassandra, etc.
• NoSQL offers speed, fast replication between locations and a flexible structure, but no control on integrity. It doesn’t replace SQL but complements it

SQL: control of the integrity and of the completeness of data is more important than speed + good interaction with GIS
NoSQL: high availability of data on the Internet, but no schema to validate integrity and no GIS plug-in yet

Problem: a scientific information network requires both quality control and high availability

SQL: 4 parts

• Data Query Language (DQL)
– search and display data matching specific criteria
• Data Manipulation Language (DML)
– modify data (insert, update, delete)
– lock (atomicity of data: two users cannot modify the same data in parallel)
– use transactions (rollback to the previous state of the database if a modification fails)
• Data Definition Language (DDL)
– create the schema of the database (the normalised structure, the indexes): you can define yourself how to check the integrity of the database
• Data Control Language (DCL)
– create authorizations and access rules for users
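The first three parts can be tried out in a few lines. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a full database server, and the table content (including the latitude values) is invented. DCL is left out because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including the integrity rules (NOT NULL, CHECK).
conn.execute("""
    CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT NOT NULL,
        latitude_decimals REAL CHECK (latitude_decimals BETWEEN -90 AND 90)
    )
""")

# DML: insert a row; "with conn" commits if the block succeeds.
with conn:
    conn.execute(
        "INSERT INTO localities (locality, latitude_decimals) VALUES ('Tienen', 50.81)"
    )

# Transaction: the second insert breaks the CHECK rule (illustrative bad value),
# so the whole block is rolled back, including the valid first insert.
try:
    with conn:
        conn.execute(
            "INSERT INTO localities (locality, latitude_decimals) VALUES ('Bunsbeek', 50.84)"
        )
        conn.execute(
            "INSERT INTO localities (locality, latitude_decimals) VALUES ('Nowhere', 123.0)"
        )
except sqlite3.IntegrityError:
    pass

# DQL: only the row committed before the failed transaction is left.
rows = conn.execute("SELECT locality FROM localities ORDER BY locality").fetchall()
print(rows)
```

The rollback is what keeps the database in its previous consistent state: 'Bunsbeek' was valid on its own, but it disappears together with the failed modification.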

Vocabulary

[Figure: a table annotated with the terms Table, Field, Field name, Field type and Record (or tuple)]

Recommendations

To ease the manipulation with SQL when creating a database:

– avoid uppercase letters in field names
– avoid accented characters in field names (but you must keep them in the content, of course!)
– replace white spaces with underscores
– avoid at any price other non-alphabetical or non-numerical characters
– avoid giving the same name to two fields in different tables (not always possible...)
– table names in plural
– field names in singular
– use descriptive field names (e.g. not ‘dc’ but ‘date_collected’)

Querying

Pattern: SELECT <comma-separated list of fields> FROM <name of Table> ;
e.g.
SELECT Locality FROM localities;
SELECT Locality, Country FROM localities;
SELECT * FROM Localities;

« * » => all fields (wildcard)

Querying II

Pattern: SELECT <comma-separated list of fields> FROM <name of Table> WHERE [condition] ;
e.g.
SELECT pk_locality, latitude_decimals, longitude_decimals FROM localities WHERE Locality = 'Tienen';

SELECT * FROM localities
WHERE latitude_decimals > 50.80
AND latitude_decimals < 50.85;

Querying III (boolean)

Compare the results:

SELECT * FROM localities
WHERE latitude_decimals > 50.80
AND latitude_decimals < 50.85;

SELECT * FROM localities
WHERE latitude_decimals > 50.80
OR latitude_decimals < 50.85;

Querying IV (boolean)

Compare the results:

SELECT * FROM localities
WHERE locality = 'Tienen'
AND locality = 'Bunsbeek';

SELECT * FROM localities
WHERE locality = 'Tienen'
OR locality = 'Bunsbeek';
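The two comparisons above can be checked on a small table. This is a sketch only: it runs the queries through Python's sqlite3 module rather than PostgreSQL, and the three rows and their latitudes are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE localities (locality TEXT, latitude_decimals REAL)")
# Illustrative rows, not real coordinates.
conn.executemany(
    "INSERT INTO localities VALUES (?, ?)",
    [("Tienen", 50.81), ("Bunsbeek", 50.84), ("Hensberg", 50.95)],
)


def names(condition):
    """Return the localities matching a WHERE condition, sorted by name."""
    sql = f"SELECT locality FROM localities WHERE {condition} ORDER BY locality"
    return [row[0] for row in conn.execute(sql)]


# AND: both conditions must hold, so only the latitudes inside the interval remain.
range_and = names("latitude_decimals > 50.80 AND latitude_decimals < 50.85")
# OR: one condition is enough, and every latitude satisfies at least one of them.
range_or = names("latitude_decimals > 50.80 OR latitude_decimals < 50.85")

# AND on the same field: a locality cannot have two names at once, so nothing matches.
name_and = names("locality = 'Tienen' AND locality = 'Bunsbeek'")
# OR on the same field: rows matching either name are returned.
name_or = names("locality = 'Tienen' OR locality = 'Bunsbeek'")

print(range_and, range_or, name_and, name_or)
```

In everyday language one says "Tienen and Bunsbeek", but the query that returns both localities is the one written with OR.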

Querying V

e.g.
SELECT * FROM localities
WHERE locality <> 'Hensberg';

SELECT * FROM localities
WHERE locality IS NULL;
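One point worth testing with these two conditions is that empty (NULL) values are not returned by a `<>` comparison. The sketch below assumes a toy table with one missing locality name and uses Python's sqlite3 module as a stand-in for PostgreSQL; the helper `keys` is hypothetical and only shortens the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT)")
# Row 3 has no locality name (NULL).
conn.executemany(
    "INSERT INTO localities VALUES (?, ?)",
    [(1, "Tienen"), (2, "Hensberg"), (3, None)],
)


def keys(condition):
    """Return the primary keys of the rows matching a WHERE condition."""
    sql = f"SELECT pk_locality FROM localities WHERE {condition} ORDER BY pk_locality"
    return [row[0] for row in conn.execute(sql)]


# '<>' keeps the rows that are different from 'Hensberg', but skips NULL:
# a comparison with a missing value is neither true nor false.
not_hensberg = keys("locality <> 'Hensberg'")
# Missing values must be requested explicitly with IS NULL ...
missing = keys("locality IS NULL")
# ... because '= NULL' never matches anything.
equals_null = keys("locality = NULL")
# Both conditions combined return every row except 'Hensberg'.
not_hensberg_or_missing = keys("locality <> 'Hensberg' OR locality IS NULL")

print(not_hensberg, missing, equals_null, not_hensberg_or_missing)
```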

Joining I

SELECT *
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
[+ WHERE CONDITION] ;

Joining II

• Exercise
– Find the collectors of ‘Agrostis’

SELECT collector_name, genus
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
WHERE genus = 'Agrostis';

Joining III

• Exercise
– Find the scientific names having been collected in Tienen

SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen';

Joining III (ordering)

• Exercise
– Find the scientific names having been collected in Tienen

SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen'
ORDER BY scientific_name;

Joining IV

• Exercise
– Find the collectors of ‘Balsaminaceae’

SELECT collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN families ON scientific_names.fk_family = families.pk_family
WHERE family = 'Balsaminaceae';
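To see how the chain of joins walks from specimens to families, the last exercise can be run on a toy database. This is a sketch under assumptions: the three table layouts are guessed from the field names used in the queries, the collectors and rows are invented, and Python's sqlite3 module stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE families (
        pk_family INTEGER PRIMARY KEY,
        family TEXT
    );
    CREATE TABLE scientific_names (
        pk_scientific_name INTEGER PRIMARY KEY,
        scientific_name TEXT,
        fk_family INTEGER REFERENCES families (pk_family)
    );
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        collector_name TEXT,
        fk_scientific_name INTEGER REFERENCES scientific_names (pk_scientific_name)
    );

    -- Invented sample data.
    INSERT INTO families VALUES (1, 'Balsaminaceae'), (2, 'Poaceae');
    INSERT INTO scientific_names VALUES
        (1, 'Impatiens glandulifera', 1),
        (2, 'Agrostis capillaris', 2);
    INSERT INTO specimens VALUES
        (1, 'Collector A', 1),
        (2, 'Collector B', 2),
        (3, 'Collector C', 1);
""")

# specimens -> scientific_names -> families: each JOIN follows one foreign key.
collectors = [
    row[0]
    for row in conn.execute("""
        SELECT collector_name
        FROM specimens
        JOIN scientific_names
            ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN families
            ON scientific_names.fk_family = families.pk_family
        WHERE family = 'Balsaminaceae'
        ORDER BY collector_name
    """)
]
print(collectors)
```

The family name is never stored in the specimens table; it is reached only through the two foreign keys, which is what the normalised structure is meant to achieve.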

Views

‘Save’ complex queries and make them permanent in the database (useful for programming filters)

CREATE VIEW v_specimen_names_localities AS
SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality;
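Once a view exists it can be queried like a table. The sketch below is a variation on the view above, not the original definition: it also exposes the `locality` field so that the view can be filtered, the sample rows are invented, and Python's sqlite3 module is used instead of PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT);
    CREATE TABLE scientific_names (
        pk_scientific_name INTEGER PRIMARY KEY,
        scientific_name TEXT
    );
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        fk_scientific_name INTEGER,
        fk_locality INTEGER
    );

    -- Invented sample data.
    INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
    INSERT INTO scientific_names VALUES
        (1, 'Agrostis capillaris'),
        (2, 'Bellis perennis');
    INSERT INTO specimens VALUES (1, 1, 1), (2, 2, 2), (3, 2, 1);

    -- Assumption: 'locality' is added to the view so that it can be used as a filter.
    CREATE VIEW v_specimen_names_localities AS
    SELECT scientific_name, locality
    FROM specimens
    JOIN scientific_names
        ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
    JOIN localities
        ON specimens.fk_locality = localities.pk_locality;
""")

# The joins are hidden inside the view; the query itself stays short.
in_tienen = [
    row[0]
    for row in conn.execute("""
        SELECT scientific_name
        FROM v_specimen_names_localities
        WHERE locality = 'Tienen'
        ORDER BY scientific_name
    """)
]
print(in_tienen)
```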

Search on Text Patterns (I)

a) match one position: '_'
=> ‘_’ means any single character, present exactly once

b) match several positions: '%'
=> ‘%’ means any sequence of characters, including none at all

Note: a white space counts as one character

Search on Text Patterns (II)

• SQL syntax
SELECT ... WHERE field LIKE 'pattern';

• PostgreSQL syntax (SIMILAR TO, defined in the SQL:1999 standard, also accepts bracket expressions)
SELECT ... WHERE field SIMILAR TO 'pattern';

Search on Text Patterns (III)

Example: find the scientific names having ‘e’ as second letter of the genus:

SELECT scientific_name FROM scientific_names WHERE genus SIMILAR TO '_e%';

Search on Text Patterns (IV)

Example:
Pattern: '_e%'
Response:
‘Aegopodium’
‘Aethusa’
‘Bellis’
‘Betula’
...
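The '_e%' example can be reproduced with the standard LIKE operator. This is a sketch only: it runs in SQLite through Python's sqlite3 module, the list of genera is a small invented sample, and one difference has to be kept in mind: LIKE ignores the case of ASCII letters in SQLite by default, whereas it is case-sensitive in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scientific_names (genus TEXT)")
conn.executemany(
    "INSERT INTO scientific_names VALUES (?)",
    [("Aegopodium",), ("Aethusa",), ("Agrostis",), ("Bellis",), ("Betula",), ("Impatiens",)],
)

# '_' = exactly one character, 'e' = the second letter, '%' = whatever follows.
second_letter_e = [
    row[0]
    for row in conn.execute(
        "SELECT genus FROM scientific_names WHERE genus LIKE '_e%' ORDER BY genus"
    )
]
print(second_letter_e)

# A white space counts as one character for '_' (result 1 means "matches").
space_is_one_char = conn.execute("SELECT 'a b' LIKE 'a_b'").fetchone()[0]
# '%' can also stand for no character at all.
percent_matches_empty = conn.execute("SELECT 'ab' LIKE 'a%b'").fetchone()[0]
print(space_is_one_char, percent_matches_empty)
```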

Search on text pattern (VI)

• Interval of characters
• Use brackets

[a-z]: any lowercase letter
[A-Z]: any uppercase letter
[0-9]: any digit
[aA]: ‘a’ or ‘A’

Search on text pattern (VII)

• Useful to control nomenclature!
• Exercise: search the species containing uppercase characters:

SELECT *
FROM scientific_names
WHERE species SIMILAR TO '%[A-Z]%';

Search on text pattern (VIII)

• Useful to control nomenclature
• Exercise: search the genus containing uppercase letters after the first letter:

SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '_%[A-Z]%';

Search on text pattern (IX)

• Useful to control nomenclature
• Exercise: search the genus containing more than one word:

SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '%[a-z]% %[a-z]%';
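The nomenclature checks with brackets can be tried without a PostgreSQL server. The sketch below is an approximation: SQLite has no SIMILAR TO, so the patterns are translated to its GLOB operator, which also understands brackets but writes '*' instead of '%' and '?' instead of '_'. The genus values, including the two faulty ones, are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scientific_names (genus TEXT)")
# 'AGrostis' and 'Bellis perennis' are deliberately wrong genus values.
conn.executemany(
    "INSERT INTO scientific_names VALUES (?)",
    [("Agrostis",), ("AGrostis",), ("Bellis perennis",), ("Betula",)],
)


def genera(glob_pattern):
    """Return the genus values matching a GLOB pattern, sorted by name."""
    sql = "SELECT genus FROM scientific_names WHERE genus GLOB ? ORDER BY genus"
    return [row[0] for row in conn.execute(sql, (glob_pattern,))]


# SIMILAR TO '_%[A-Z]%'  ->  GLOB '?*[A-Z]*': an uppercase letter after the first one.
uppercase_inside = genera("?*[A-Z]*")
# SIMILAR TO '%[a-z]% %[a-z]%'  ->  GLOB '*[a-z]* *[a-z]*': more than one word.
several_words = genera("*[a-z]* *[a-z]*")

print(uppercase_inside, several_words)
```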

Search on text pattern (X)

• PostgreSQL also supports an even more powerful mechanism called « regular expressions »
– standard syntax shared by several programming languages
– allows matching complex patterns
– can perform replacements and extractions
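Because the syntax is shared between languages, the three uses listed above (matching, replacement, extraction) can be illustrated outside the database. The sketch below uses Python's re module, not PostgreSQL's own regular expression operators and functions, and the list of names is invented.

```python
import re

# Invented sample: the second name has a lowercase genus and a double space.
names = ["Agrostis capillaris L.", "bellis  perennis", "Betula pendula Roth"]

# Matching: a capitalised genus, one space, then a lowercase species name.
binomial = re.compile(r"^[A-Z][a-z]+ [a-z]+")
well_formed = [name for name in names if binomial.match(name)]

# Replacement: collapse runs of two or more spaces into a single space.
cleaned = [re.sub(r" {2,}", " ", name) for name in names]

# Extraction: keep only the first word (the genus) of each name.
first_words = [re.match(r"^(\S+)", name).group(1) for name in names]

print(well_formed)
print(cleaned)
print(first_words)
```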

<optional, if somebody asks how to group information in one row>

Group the specimens collected in Tienen per collector:

SELECT array_to_string(array_agg(scientific_name), ','), collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen'
GROUP BY collector_name
ORDER BY collector_name;

Group the localities per collector:

SELECT array_to_string(array_agg(locality), ','), collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
GROUP BY collector_name
ORDER BY collector_name;
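The grouping of several values in one row can be sketched as follows. It is an approximation of the query above: `array_agg` and `array_to_string` are PostgreSQL functions, so the example uses SQLite's `group_concat` through Python's sqlite3 module instead, leaves out the join on scientific_names (not needed for this result), and works on invented rows. The order of the values inside each group is not guaranteed, which is why they are sorted in Python before being displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT);
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        collector_name TEXT,
        fk_locality INTEGER
    );

    -- Invented sample data.
    INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
    INSERT INTO specimens VALUES
        (1, 'Collector A', 1),
        (2, 'Collector A', 2),
        (3, 'Collector B', 1);
""")

# One row per collector; group_concat glues the localities of each group together.
query = """
    SELECT group_concat(locality, ','), collector_name
    FROM specimens
    JOIN localities ON specimens.fk_locality = localities.pk_locality
    GROUP BY collector_name
    ORDER BY collector_name
"""
grouped = {
    collector: sorted(joined.split(","))
    for joined, collector in conn.execute(query)
}
print(grouped)
```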