Storing data in databases · 2016-10-26 · Storing data in databases The webinar will begin at 3pm...


Storing data in databases

The webinar will begin at 3pm

• You now have a menu in the top right corner of your screen.

• The red button with a white arrow allows you to expand and contract the webinar menu, in which you can write questions/comments.

• We won’t have time to answer questions while we are presenting, but will answer them at the end.

• You will be on mute throughout – we can’t hear you.


Storing data in databases

Webinar

25 October 2016

Peter Smyth

UK Data Service


Can you hear us?

• If not:

o Check your volume, and that your speaker/headset is plugged in.

o Your invitation also included a phone number; you can call that to listen in.

o UK +44 (0) 330 221 9914

o US +1 (914) 614-3429

• We are recording this webinar, so you can always listen to it later.


Overview of this webinar

• Definition of a database

• Why Excel isn’t always good enough

• Different Database types and availability

• Relational Databases

o A bit of history

o Data organisation

o Limitations

o Query examples

• Document Databases

o MongoDB

o Query examples

• Graph Database demo


Definition of Database

“A structured set of data held in a computer, especially one that is accessible in various ways.” (Oxford University Press)

• Structured = Ordered? Or Arranged?

o Nothing about the details of the structuring

• Accessible = Searchable, able to query the contents to see what is there


Not a database! - Why not?


What about Excel?

• Worksheets are tabular in nature - very structured

• You can join sheets together using the VLOOKUP function

• There is a set of Database type functions (DSUM, DCOUNT etc.)

• You can write queries to filter the rows


Excel Restrictions

• Sheets have a limit of about 1 million rows (2^20 = 1,048,576)

• VLOOKUP can only return a single column

• The database functions can only return a single value

• Setting up queries is quite complex


Why use a desktop database?

• Size of data

• Convenience of a desktop system

• Flexibility in collecting and persisting data

• Flexibility in querying and analysis


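Growing and shrinking data

[Figure: two data sources, Tweets and Smart meter data, placed along a size scale marked 1Kb, 1Mb, 1Gb and 10+ Gb. Tweet labels: Sent Tweet, Data from Tweet, All tweets from user, All tweets from User & Friends. Smart meter labels: All Smart meter data, Smart meter by day, Smart meter by Month, By Month and Geography. The scale is divided into Desktop Application and Big Data Environment.]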


Growing and shrinking data

[Figure: the same Tweets and Smart meter data labels as on the previous slide, now on a scale marked 1Kb, 1Gb and 25 Gb. A Desktop Database region (marked 5GB and 25+ GB) is added between Desktop Application and Big Data Environment.]


Types of Databases

There are many different types of Databases. For the end user there are probably four main types.

• Relational Databases

o (MySQL, MS SQL, SQLite, Postgres, …)

• Document databases

o (MongoDB, CouchDB, …)

• Graph databases

o (Neo4j, Titan, …)

• Wide column stores

o (Cassandra, HBase, …)


Types of Databases

• Relational Databases predominate – by a long way

o Data held in tables with defined relationships between the tables

• Document databases and wide column databases use storage architectures designed to overcome some of the scalability problems of relational databases. Since Big Data sources have become available, these are gaining in popularity

• Graph Databases are designed to optimise a specific type of querying of data, where you are more interested in the relationships between different items than in the actual attributes of the items. They are often used with Social networks
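To illustrate the kind of relationship-centred question a graph database is built for, here is a minimal sketch in plain Python. It does not use Neo4j or any graph database API; the `follows` data and the `friends_of_friends` helper are hypothetical and exist only to show that the query is about connections rather than attributes.

```python
# Hypothetical social network: each person maps to the set of people they follow.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": set(),
}


def friends_of_friends(graph, person):
    """Return people two steps away from `person`, excluding direct friends."""
    direct = graph.get(person, set())
    second = set()
    for friend in direct:
        # Follow each direct relationship one step further.
        second |= graph.get(friend, set())
    return second - direct - {person}


result = friends_of_friends(follows, "alice")
print(sorted(result))
```

In a relational database the same question would typically need the relationship table joined to itself, once per step; a graph database stores the relationships directly so that this kind of traversal is the natural operation.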


Types of Databases

• The link below provides a table of the different Database systems available and their relative use. Both Commercial and Free database systems are included.

• http://db-engines.com/en/ranking


Types of Databases (Table)

Freely available options


The Relational Model

• Why do we have it?

• What is it good for?

• What are the pros and cons?

• What do we mean by relational?


The Relational Model - History

• The term "relational database" was first used by E. F. Codd in 1970 in the paper "A Relational Model of Data for Large Shared Data Banks"

• Although not necessarily the primary driver, it should be noted that at the time computer storage was very expensive

• The Relational model can be very efficient when storing data. Typically data items are stored only once


The Relational Model - History

Storage prices fell from about $193K per GB in 1980 to about $0.03 per GB in 2014

http://www.mkomo.com/cost-per-gigabyte-update


The Relational Model – How it works

• If I wanted to record the details of a house and the people who lived there, I could create a table like this:

HouseHold_All: HouseHold_Id, Address, PostCode, Person_id, FirstName, LastName, DOB, Sex, Age, No_of_Rooms, No_of_Occupants, Type, Construction

• I would need a single record for each person at that address


The Relational Model – How it works

And populate it with data, like this

These records all relate to the same household, but the data about the house itself is repeated for each person in the house

HouseHold_Id | Address | PostCode | Person_id | FirstName | LastName | DOB | Sex | Age | No_of_Rooms | No_of_Occupants | Type | Construction
1 | Some street, Some Town | AA1 2BB | 1 | Alfie | Smith | 17/09/1963 | M | 60 | 8 | 5 | Semi | Brick
1 | Some street, Some Town | AA1 2BB | 2 | Jane | Smith | 05/02/1970 | F | 60 | 8 | 5 | Semi | Brick
1 | Some street, Some Town | AA1 2BB | 3 | John | Smith | 03/01/2001 | M | 60 | 8 | 5 | Semi | Brick
1 | Some street, Some Town | AA1 2BB | 4 | Jack | Smith | 10/10/2005 | M | 60 | 8 | 5 | Semi | Brick
1 | Some street, Some Town | AA1 2BB | 5 | Jenny | Smith | 07/05/2009 | F | 60 | 8 | 5 | Semi | Brick
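The repetition can be made concrete with a short sketch. This assumes Python's built-in sqlite3 module and an in-memory database; the column types are guesses, since the slide gives only column names and sample values.

```python
import sqlite3

# Sample values taken from the slide; the column types below are assumptions.
house = (1, "Some street, Some Town", "AA1 2BB")
house_details = (60, 8, 5, "Semi", "Brick")
people = [
    (1, "Alfie", "Smith", "17/09/1963", "M"),
    (2, "Jane", "Smith", "05/02/1970", "F"),
    (3, "John", "Smith", "03/01/2001", "M"),
    (4, "Jack", "Smith", "10/10/2005", "M"),
    (5, "Jenny", "Smith", "07/05/2009", "F"),
]
# One wide row per person: the house values are copied into every row.
rows = [house + person + house_details for person in people]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE HouseHold_All (
        HouseHold_Id INTEGER, Address TEXT, PostCode TEXT,
        Person_id INTEGER, FirstName TEXT, LastName TEXT, DOB TEXT, Sex TEXT,
        Age INTEGER, No_of_Rooms INTEGER, No_of_Occupants INTEGER,
        Type TEXT, Construction TEXT
    )
""")
conn.executemany(
    "INSERT INTO HouseHold_All VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)", rows
)

total_rows = conn.execute("SELECT COUNT(*) FROM HouseHold_All").fetchone()[0]
distinct_houses = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT HouseHold_Id, Address, PostCode, Age, No_of_Rooms,
                        No_of_Occupants, Type, Construction
        FROM HouseHold_All
    )
""").fetchone()[0]
print(total_rows, "rows describe", distinct_houses, "distinct house")
```

Every row carries its own copy of the house details, so a change to the house (a new postcode, say) would have to be applied to all of them.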


The Relational Model – How it works

• It makes more sense to use multiple tables and split the data between them

• This eliminates the need to duplicate data

• The arrows represent relationships between the tables.

• If I only wanted details about a person, I wouldn’t need to refer to the other tables


The Relational Model – How it works

• All of the Occupant information is kept in a single table.

• Details of the Property are only recorded once in the three smaller tables
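A minimal sketch of the split, again assuming Python's sqlite3 module. The slide's table diagram is not reproduced in this transcript, so the two tables used here (`Property` and `Occupant`) and their columns are a simplified assumption rather than the presenter's actual design, which uses more tables for the property details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Property (
        HouseHold_Id INTEGER PRIMARY KEY,
        Address TEXT, PostCode TEXT, No_of_Rooms INTEGER,
        Type TEXT, Construction TEXT
    );
    CREATE TABLE Occupant (
        Person_id INTEGER PRIMARY KEY,
        HouseHold_Id INTEGER REFERENCES Property(HouseHold_Id),
        FirstName TEXT, LastName TEXT, DOB TEXT, Sex TEXT
    );
""")

# The property is stored exactly once.
conn.execute(
    "INSERT INTO Property VALUES "
    "(1, 'Some street, Some Town', 'AA1 2BB', 8, 'Semi', 'Brick')"
)
# Each occupant row points at the property through HouseHold_Id.
conn.executemany(
    "INSERT INTO Occupant VALUES (?, 1, ?, ?, ?, ?)",
    [
        (1, "Alfie", "Smith", "17/09/1963", "M"),
        (2, "Jane", "Smith", "05/02/1970", "F"),
        (3, "John", "Smith", "03/01/2001", "M"),
        (4, "Jack", "Smith", "10/10/2005", "M"),
        (5, "Jenny", "Smith", "07/05/2009", "F"),
    ],
)

# Details about people only: no need to touch the Property table.
females = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName FROM Occupant WHERE Sex = 'F' ORDER BY Person_id"
    )
]

# People together with their address: follow the relationship with a JOIN.
joined = conn.execute("""
    SELECT o.FirstName, p.Address, p.PostCode
    FROM Occupant AS o
    JOIN Property AS p ON p.HouseHold_Id = o.HouseHold_Id
    ORDER BY o.Person_id
""").fetchall()

print(females)
print(joined[0])
```

The JOIN rebuilds the wide view on demand, so nothing is lost by storing the property details only once.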


The Relational Model - Advantages

• Data is only stored once (across multiple tables if necessary)

• Efficient for well known and structured data

• Well defined and understood query language (SQL)

o Variants available for all relational databases

• Schema on Write allows comprehensive data checking before loading – making for cleaner data
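As a sketch of what Schema on Write checking can look like in practice, the example below declares constraints when the table is created and lets the database reject rows that break them. It assumes Python's sqlite3 module; the table, its constraints and the candidate rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rules are part of the schema, so they are enforced at load time.
conn.execute("""
    CREATE TABLE Occupant (
        Person_id INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL,
        Sex TEXT NOT NULL CHECK (Sex IN ('M', 'F'))
    )
""")

candidates = [
    (1, "Alfie", "M"),  # valid
    (2, None, "F"),     # missing name: violates NOT NULL
    (3, "John", "X"),   # unexpected code: violates CHECK
]

loaded, rejected = [], []
for row in candidates:
    try:
        conn.execute("INSERT INTO Occupant VALUES (?, ?, ?)", row)
        loaded.append(row)
    except sqlite3.IntegrityError as err:
        # The database refuses the row; keep the reason for later inspection.
        rejected.append((row, str(err)))

print(len(loaded), "loaded;", len(rejected), "rejected")
```

Only the valid row reaches the table; the other two are caught before they can pollute the data.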


The Relational Model - Disadvantages

• The need for multiple tables increases loading times

• Uses vertical scaling

o Not really relevant for desktop databases

• Schema on write cannot deal with unstructured data efficiently, if at all


Document Databases

• Why do we have it?

• What is it good for?

• What are the pros and cons?

• What is meant by a document?


Document Database

• A ‘document’ does not mean a PDF or Word document

• A document is semi-structured data

• It is ‘structured’ in that every data item in the document has a name associated with it

• It is ‘semi-’ in that different documents in the same collection of documents don’t have to have the same set of names


JSON Example – semi-structured data

• The most popular format for Semi-structured data is JSON.

• Most data that can be downloaded from a Web based API will be in JSON format (or at least offer JSON as a choice of format)


JSON Example – semi-structured data

The following is a simple example of JSON formatted data

{
    "Name" : "Manchester",
    "PostCode" : "M13 9PL",
    "Established" : 1824
}

It is split over several lines just to aid reading. Everything between the ‘{’ and ‘}’ represents a single record, or document
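As a small illustration, the record above can be read with Python's standard json module. JSON itself requires double-quoted names and strings, so the text is written that way here; the variable names are only for this sketch.

```python
import json

# The record from the slide, written as valid JSON text.
text = '{ "Name" : "Manchester", "PostCode" : "M13 9PL", "Established" : 1824 }'

# json.loads turns the text into a Python dictionary (one document).
document = json.loads(text)

print(document["Name"])
print(sorted(document))  # the names of the data items in the document
print(type(document["Established"]).__name__)
```

Each data item is reached through its name, which is the "structured" part of semi-structured data.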


Document Databases

• The semi-structured nature means that it is difficult to store the data in tables

o Not all fields need to be in each document

o Fields don’t need to be in the same order

{ "id" : 1234, "Name" : "Peter", "Tel" : "012345678" }

{ "Name" : "John", "id" : 3523, "Email" : ["[email protected]", "[email protected]"], "Mob" : "012345678" }

• Even more difficult to create a schema for the data in advance

• Instead, data is stored ‘as-is’ and a schema is ‘created’ when the data is read – Schema on read
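Schema on read can be sketched in plain Python: the documents are kept exactly as they arrived, and the code that reads them decides which fields it wants and what to do when one is missing. The `read_contact` helper and the e-mail addresses below are hypothetical placeholders, not part of the original slides.

```python
# Two documents with different fields in a different order, stored as-is.
documents = [
    {"id": 1234, "Name": "Peter", "Tel": "012345678"},
    {
        "Name": "John",
        "id": 3523,
        "Email": ["first@example.org", "second@example.org"],  # placeholders
        "Mob": "012345678",
    },
]


def read_contact(doc):
    """Apply a schema at read time: pick the wanted fields, with defaults."""
    return {
        "id": doc["id"],
        "name": doc["Name"],
        # Accept either a landline or a mobile number, whichever is present.
        "phone": doc.get("Tel") or doc.get("Mob"),
        # A missing Email field becomes an empty list rather than an error.
        "emails": doc.get("Email", []),
    }


contacts = [read_contact(doc) for doc in documents]
for contact in contacts:
    print(contact)
```

The price of this flexibility is that every reader has to handle missing or differently named fields, which is part of why querying semi-structured data is more complex.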


Document Databases - NoSQL

• Non-Relational databases like MongoDB typically do not use SQL to query the data.

• When you install MongoDB you are provided with a simple Shell interface from which you can query the database.

• Use of the Shell to query requires a knowledge of JavaScript.

• As an alternative, both Python and R have packages which interface to MongoDB to allow querying of the database using native Python or R like constructs

• The semi-structured nature of the data adds to the complexity of querying
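Document databases are usually queried by example: you supply a small document describing the fields and values you want, and get back the documents that match. The sketch below imitates that idea in plain Python so that it runs without a database server; `matches` and `find` are hypothetical helpers, not the MongoDB API. Python packages for MongoDB such as pymongo accept a filter document of a similar shape in `collection.find()`, but nothing here depends on them.

```python
# A small in-memory "collection" of documents with differing fields.
people = [
    {"id": 1234, "Name": "Peter", "Tel": "012345678"},
    {"id": 3523, "Name": "John", "Mob": "012345678"},
    {"id": 4410, "Name": "John"},  # hypothetical extra document
]


def matches(doc, query):
    """True if every field in the query is present in doc with an equal value."""
    return all(
        field in doc and doc[field] == value for field, value in query.items()
    )


def find(collection, query):
    """Return the documents that match the example query document."""
    return [doc for doc in collection if matches(doc, query)]


johns = find(people, {"Name": "John"})
with_tel = find(people, {"Tel": "012345678"})
everyone = find(people, {})  # an empty query matches every document

print([doc["id"] for doc in johns])
print([doc["id"] for doc in with_tel])
print(len(everyone))
```

Documents that lack a queried field simply do not match, so the query has to be written with the varying structure of the documents in mind.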


A Graph Database – Neo4j

• The default installation of Neo4j provides a simple default ‘Movies’ database.

• It also comes with tutorials to help get you started


Summary

• The size of your data may be enough to make you decide on using a desktop database

• But it may not be the only consideration

o How are you collecting the data over time?

o What is the structure of the data?

o How do you intend to use the data?

o Can you clean and structure the data as you collect it?

o Do you need to keep all of the raw data just in case?


Questions

Peter Smyth

[email protected]

ukdataservice.ac.uk/help/

Subscribe to the UK Data Service news list at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKDATASERVICE

Follow us on Twitter https://twitter.com/UKDataService

or Facebook https://www.facebook.com/UKDataService
