Analytics: SQL or NoSQL? Richard Taylor Chair Business Intelligence SIG.


The NoSQL Movement

Meetup June 11, 2009 in San Francisco; the name "NoSQL" was proposed by Eric Evans

2004 BigTable (Google)

2007 Dynamo (Amazon)

2008 Cassandra (Facebook)

Hadoop/HBase (Yahoo)

Project Voldemort (LinkedIn)

NoSQL Conferences

Relational Database/SQL

Database Timeline

1969 CODASYL: network database, schema, DDL/DML
1970 Codd: Relational Model
1974 SEQUEL
1979 Oracle
1980 Gray: Transaction
1981 Bernstein and Goodman: Multi-version Concurrency Control
1989 SQL-89
1992 SQL-92
1995 Bernstein et al: Critique of ANSI SQL Isolation Levels
1999 SQL:1999: Object Relational
2003 SQL:2003: Analytics extensions

Relational Model

• Table: a set of n-tuples (rows), each with the same columns
• Normalized data: "atomic" values, multi-column keys
• Operations on tables: select, project, join
• Relationships on keys: primary key, foreign key
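To make select, project and join concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `customer` and `orders` tables, their columns and rows are hypothetical; the point is only to show a primary key, a foreign key, and one query per operation.

```python
import sqlite3

# In-memory database; the schema and data are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0);
""")

# Select: keep only the tuples (rows) that satisfy a predicate.
selected = conn.execute(
    "SELECT * FROM orders WHERE amount > 20 ORDER BY order_id").fetchall()

# Project: keep only some of the columns.
projected = conn.execute(
    "SELECT DISTINCT city FROM customer ORDER BY city").fetchall()

# Join: combine tuples from two tables on the primary key / foreign key relationship.
joined = conn.execute("""
    SELECT c.name, o.amount
    FROM orders AS o JOIN customer AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(selected)   # [(10, 1, 25.0), (11, 2, 40.0)]
print(projected)  # [('London',), ('New York',)]
print(joined)     # [('Ada', 25.0), ('Grace', 40.0), ('Ada', 15.0)]
```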

SQL
Designed for transaction processing

Good
• Easily handles simple cases
• Everyone has a query language

Bad
• Data access language (not Turing complete)
• Declarative language (4GL)
• Impedance mismatch with procedural languages
• Complicated cases get repetitive
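The impedance mismatch is easiest to see in code. A minimal sketch, assuming a hypothetical `Customer` class and hypothetical `customer`/`orders` tables: the query returns flat tuples, and the program has to rebuild nested objects from them by hand.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Object in the host language: it owns a nested list of order amounts,
    # which a flat relational row cannot hold directly.
    customer_id: int
    name: str
    orders: list[float] = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
                     amount REAL NOT NULL);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0);
""")

# SQL hands back flat tuples, one per (customer, order) pair; a customer
# without orders appears once, with NULL (None) in the order columns.
rows = conn.execute("""
    SELECT c.customer_id, c.name, o.amount
    FROM customer AS c LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Hand-written mapping code rebuilds the nested objects from the flat rows.
customers: dict[int, Customer] = {}
for customer_id, name, amount in rows:
    if customer_id not in customers:
        customers[customer_id] = Customer(customer_id, name)
    if amount is not None:
        customers[customer_id].orders.append(amount)

print(customers[1])  # Customer(customer_id=1, name='Ada', orders=[25.0, 15.0])
```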

Normalization

• Refine design of structured data
– "Atomic": no repeating groups
– Each data item depends on the key (and nothing else)
• Avoid modification anomalies
– Ensure every data item is stored only once
• Avoid bias toward any particular pattern of querying
– Allow data to be accessed from every angle
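A small sketch of the modification anomaly that normalization avoids, again with Python's `sqlite3`. The tables and data are hypothetical: in the unnormalized table the customer's city is repeated on every order row, so an update that touches only one row leaves two different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the customer's city is repeated on every order row.
CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, amount REAL);
INSERT INTO orders_flat VALUES (10, 'Ada', 'London', 25.0), (12, 'Ada', 'London', 15.0);

-- Normalized: city depends only on the customer key, so it is stored once.
CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer TEXT NOT NULL REFERENCES customer(name),
                     amount REAL);
INSERT INTO customer VALUES ('Ada', 'London');
INSERT INTO orders VALUES (10, 'Ada', 25.0), (12, 'Ada', 15.0);
""")

# Update anomaly: changing the city on only one flat row makes the data contradictory.
conn.execute("UPDATE orders_flat SET city = 'Cambridge' WHERE order_id = 10")
flat_cities = conn.execute(
    "SELECT DISTINCT city FROM orders_flat WHERE customer = 'Ada' ORDER BY city").fetchall()

# In the normalized design the same fact is changed in exactly one place.
conn.execute("UPDATE customer SET city = 'Cambridge' WHERE name = 'Ada'")
norm_cities = conn.execute("""
    SELECT DISTINCT c.city
    FROM orders AS o JOIN customer AS c ON o.customer = c.name
    WHERE c.name = 'Ada'
""").fetchall()

print(flat_cities)  # [('Cambridge',), ('London',)]
print(norm_cities)  # [('Cambridge',)]
```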

Denormalization

Star Schema Example

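(Figure: star schema with a central fact table joined to Date, Product and Promotion dimension tables)

Fact table: Date_key, Store_key, Promotion_key, Product_key, Receipt_number, Quantity, Revenue, Unit_price

Date dimension: Date_key, Day_in_week, Day_in_month, Day_in_year, Day_name, Week_in_month, Week_in_year, Month_nbr, Month_name, Quarter, Holiday, Holiday_desc

Product dimension (columns not shown)

Promotion dimension (columns not shown)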
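A sketch of how such a star schema is typically queried, using Python's `sqlite3`. It keeps only a subset of the slide's column names; `Product_name`, the date-key encoding and all rows are hypothetical, and the Store and Promotion dimension tables are omitted for brevity. The query joins the fact table to its dimensions, groups by dimension attributes and sums the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Date dimension: a subset of the columns shown on the slide.
CREATE TABLE date_dim (
    Date_key   INTEGER PRIMARY KEY,
    Day_name   TEXT,
    Month_name TEXT,
    Quarter    INTEGER
);
-- Product dimension: its columns are not shown; Product_name is hypothetical.
CREATE TABLE product_dim (Product_key INTEGER PRIMARY KEY, Product_name TEXT);
-- Fact table: one row per receipt line, keyed by the dimensions.
CREATE TABLE fact_sales (
    Date_key       INTEGER REFERENCES date_dim(Date_key),
    Store_key      INTEGER,
    Promotion_key  INTEGER,
    Product_key    INTEGER REFERENCES product_dim(Product_key),
    Receipt_number INTEGER,
    Quantity       INTEGER,
    Unit_price     REAL,
    Revenue        REAL
);
INSERT INTO date_dim VALUES (20240101, 'Monday', 'January', 1),
                            (20240102, 'Tuesday', 'January', 1);
INSERT INTO product_dim VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO fact_sales VALUES
    (20240101, 1, 0, 1, 501, 2, 1.50, 3.00),
    (20240101, 1, 0, 2, 502, 1, 2.50, 2.50),
    (20240102, 1, 0, 1, 503, 4, 1.50, 6.00);
""")

# Star join: constrain and group by dimension attributes, aggregate the facts.
rows = conn.execute("""
    SELECT d.Day_name, p.Product_name, SUM(f.Quantity), SUM(f.Revenue)
    FROM fact_sales AS f
    JOIN date_dim    AS d ON f.Date_key = d.Date_key
    JOIN product_dim AS p ON f.Product_key = p.Product_key
    GROUP BY d.Day_name, p.Product_name
    ORDER BY d.Day_name, p.Product_name
""").fetchall()

print(rows)  # [('Monday', 'Coffee', 1, 2.5), ('Monday', 'Tea', 2, 3.0), ('Tuesday', 'Tea', 4, 6.0)]
```

Because every descriptive attribute lives in a dimension table, new questions (by quarter, by month, by promotion) change only the GROUP BY, not the fact table.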

Database Summary

• Costs
– Fixed schema
– Normalization
– Transform data on load
– Cost of scaling
– Problems with large objects
– Complicated software

• Benefits
– Mature technology
– Precise querying
– Star Schema: historic data

Tuple Store/NoSQL

Tuple Storage Systems

• Google Database System
– Chubby: lock/metadata manager
– Google File System: distributed file system
– Bigtable: tuple storage on GFS
– Map Reduce: data processing on tuples

• Other tuple stores
– Voldemort (based on the Amazon Dynamo design)
– Cassandra
– HBase
– Hypertable

Tuple Store Model

• One table, operated on as a map from key to value
• A set of (Key, Value) tuples
– Structured key: Key, Column, Timestamp
– Unstructured value
• Operations: select, project, Map Reduce
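To illustrate the model, here is a minimal in-memory sketch in Python of a store keyed by (row key, column, timestamp) with opaque byte values. The class and method names are hypothetical and are not the API of any real tuple store; the reversed-domain row keys and `anchor:` columns merely echo the web-table example in the Bigtable paper. `scan` plays the role of select (row-key prefix) and project (column filter).

```python
from typing import Dict, Iterator, Optional, Set, Tuple

# Structured key: (row key, column, timestamp). The value is opaque bytes.
Key = Tuple[str, str, int]


class TupleStore:
    """Minimal in-memory sketch of a (key, value) tuple store.

    Hypothetical API for illustration only; not the interface of
    Bigtable, HBase, Cassandra or any other real system.
    """

    def __init__(self) -> None:
        self._data: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        # A new timestamp adds a version; older versions are kept.
        self._data[(row, column, ts)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of one cell, or None if absent."""
        versions = [(ts, value) for (r, c, ts), value in self._data.items()
                    if r == row and c == column]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def scan(self, prefix: str, columns: Optional[Set[str]] = None
             ) -> Iterator[Tuple[Key, bytes]]:
        """Select tuples whose row key starts with `prefix`, optionally
        projecting onto a set of columns. Results come back in key order."""
        for key in sorted(self._data):
            row, column, _ts = key
            if row.startswith(prefix) and (columns is None or column in columns):
                yield key, self._data[key]


store = TupleStore()
store.put("com.example/index", "contents", 1, b"<html>v1</html>")
store.put("com.example/index", "contents", 2, b"<html>v2</html>")
store.put("com.example/index", "anchor:org.other", 1, b"Example")
store.put("org.other/home", "contents", 1, b"<html>other</html>")

latest = store.get("com.example/index", "contents")
contents_only = [key for key, _ in store.scan("com.example", {"contents"})]
print(latest)         # b'<html>v2</html>'
print(contents_only)  # [('com.example/index', 'contents', 1), ('com.example/index', 'contents', 2)]
```

There is no schema to consult: any row can gain any column at any time, which is what makes the model cheap to scale and, as the summary notes, gives no guidance to the reader of the data.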

Map Reduce

• Define two functions
– Map
  • Input: tuple
  • Output: list of tuples
– Reduce
  • Input: key, list of values
  • Output: list or tuple
• Specify a cluster
• Specify input and output tuple stores
• Framework does the rest

Map:     { Map(k1, v1) } -> { list(k2, v2) }
Shuffle: { list(k2, v2) } -> { (k2, list(v2)) }
Reduce:  { Reduce(k2, list(v2)) } -> { list(v3) } -> { (k2, v3) }
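The three steps can be sketched as a single-process Python function. This only illustrates the data flow, under the assumption that everything fits in memory; a real framework spreads the map and reduce calls over a cluster and performs the shuffle over the network. `map_reduce` and its parameter names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")
V3 = TypeVar("V3")


def map_reduce(
    inputs: Iterable[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reduce_fn: Callable[[K2, List[V2]], Iterable[V3]],
) -> List[Tuple[K2, V3]]:
    """Single-process sketch of the Map Reduce data flow (no cluster)."""
    # Map: { Map(k1, v1) } -> { list(k2, v2) }
    intermediate: List[Tuple[K2, V2]] = []
    for k1, v1 in inputs:
        intermediate.extend(map_fn(k1, v1))

    # Shuffle: { list(k2, v2) } -> { (k2, list(v2)) }
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)

    # Reduce: { Reduce(k2, list(v2)) } -> { list(v3) } -> { (k2, v3) }
    output: List[Tuple[K2, V3]] = []
    for k2, values in groups.items():
        for v3 in reduce_fn(k2, values):
            output.append((k2, v3))
    return output


# Usage: the classic word count.
docs = [("doc1", "to be or not to be"), ("doc2", "to do")]
counts = map_reduce(
    docs,
    map_fn=lambda _doc_id, text: [(word, 1) for word in text.split()],
    reduce_fn=lambda _word, ones: [sum(ones)],
)
print(sorted(counts))  # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```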

Map Reduce Example

For each web page, count the number of pages that reference that page.

• Input tuple store is the WWW: { (URL, web page) }
• Map function: for each anchor on the web page, emit (anchorURL, 1)
• Reduce function: emit (anchorURL, sum(list))
• Output tuple store is { (URL, count) }
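A runnable sketch of the link-count job in Python, assuming the WWW is modelled as a small dict from absolute URL to HTML (relative links are not resolved). Anchors are extracted with the standard `html.parser` module, and the shuffle is done by sorting on the key and grouping adjacent equal keys, which is one common way to implement it. The helper names and sample pages are hypothetical.

```python
from html.parser import HTMLParser
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple


class AnchorParser(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def map_page(url: str, html: str) -> List[Tuple[str, int]]:
    # Map: for each anchor on the web page, emit (anchorURL, 1).
    parser = AnchorParser()
    parser.feed(html)
    parser.close()
    return [(href, 1) for href in parser.hrefs]


def reduce_counts(anchor_url: str, ones: List[int]) -> Tuple[str, int]:
    # Reduce: emit (anchorURL, sum(list)).
    return anchor_url, sum(ones)


def count_inbound_links(www: Dict[str, str]) -> Dict[str, int]:
    intermediate = [pair for url, html in www.items() for pair in map_page(url, html)]
    # Shuffle: sort by key so equal keys are adjacent, then group them.
    intermediate.sort(key=itemgetter(0))
    return dict(
        reduce_counts(key, [count for _, count in group])
        for key, group in groupby(intermediate, key=itemgetter(0))
    )


www = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://c.example/">c</a>',
    "http://c.example/": "<p>no links</p>",
}
print(count_inbound_links(www))  # {'http://b.example/': 1, 'http://c.example/': 2}
```

As on the slide, every anchor counts, so a page that links to the same URL twice contributes 2, whereas the SQL version of this example counts DISTINCT pages. Emitting each URL once per page, for example by iterating over `dict.fromkeys(parser.hrefs)`, would make the two agree.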

Example in SQL

For each web page, count the number of pages that reference that page.

CREATE TABLE links (
  page     VARCHAR(2048) NOT NULL,  -- URL of the referencing page
  ref_page VARCHAR(2048) NOT NULL,  -- URL of the referenced page
  PRIMARY KEY (page, ref_page)
);

SELECT ref_page, COUNT(DISTINCT page) AS ref_count
FROM links
GROUP BY ref_page;
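For comparison, the same question can be run end to end with Python's `sqlite3`. Note where the work happens: here the links must already have been extracted from the pages and loaded into the `links` table (the "transform data on load" cost), while the Map Reduce job reads the raw pages. The column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE links (
        page     TEXT NOT NULL,  -- URL of the page containing the anchor
        ref_page TEXT NOT NULL,  -- URL the anchor points to
        PRIMARY KEY (page, ref_page)
    )
""")
# Hypothetical links, already extracted from the pages during load.
conn.executemany("INSERT INTO links VALUES (?, ?)", [
    ("http://a.example/", "http://b.example/"),
    ("http://a.example/", "http://c.example/"),
    ("http://b.example/", "http://c.example/"),
])

counts = conn.execute("""
    SELECT ref_page, COUNT(DISTINCT page)
    FROM links
    GROUP BY ref_page
    ORDER BY ref_page
""").fetchall()
print(counts)  # [('http://b.example/', 1), ('http://c.example/', 2)]
```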

Tuple Store Summary

• Semi-structured data
– No need to normalize data
• Simple implementations
– Cheap, fast, scalable
• Map Reduce processing
– Simple programming (for geeks)
• Issues
– No guidance from schema
– No model for historic data

Hadoop wins Sort Benchmark

Synthesis

Summary

• SQL
– Structured data
– Precise
– Historic data
– Needs transformation
– Scalability issues

• NoSQL
– Cheap
– Scalable
– Handles large data

Enterprise Model

(Figure: enterprise data areas Money, Content and Analytics, shown against Relational, NoSQL and "?", with Metadata as an open question)

Issues:
- Data volume
- Query requirements

Analytics Architecture

(Figure: Tuple Store with Map Reduce processing at TB+/day, producing reports and feeding an RDB data warehouse (++/day) that serves cubes and reports)
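A toy sketch of the flow in this architecture, assuming hypothetical page-view events as the raw tuples. A `Counter` stands in for the Map Reduce stage (map each event to ((day, url), 1), reduce by summing), the much smaller daily summary is loaded into a relational warehouse table, and reports are ordinary SQL over it. Table and column names are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw tuples in the tuple store: (timestamp, url) page-view events.
raw_events = [
    ("2009-06-11T09:00", "/home"),
    ("2009-06-11T09:05", "/products"),
    ("2009-06-11T10:30", "/home"),
    ("2009-06-12T08:15", "/home"),
]

# Map Reduce stage, collapsed into a Counter: map each event to
# ((day, url), 1) and reduce by summing, shrinking the data volume.
daily = Counter((ts[:10], url) for ts, url in raw_events)

# Load the much smaller, structured summary into the relational data warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("""
    CREATE TABLE daily_page_views (
        day TEXT, url TEXT, views INTEGER,
        PRIMARY KEY (day, url)
    )
""")
dw.executemany("INSERT INTO daily_page_views VALUES (?, ?, ?)",
               [(day, url, n) for (day, url), n in daily.items()])

# Reports and cubes are then precise SQL queries over the summary.
report = dw.execute("""
    SELECT url, SUM(views) FROM daily_page_views GROUP BY url ORDER BY url
""").fetchall()
print(report)  # [('/home', 3), ('/products', 1)]
```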

Summary

It is all about structured data.

How much do we want?

How much can we afford?
