Infovision Anand S _ no sql workshop


Page 1: Infovision Anand S _ no sql workshop

NoSQL

Non Relational Databases

Data Scientist, Gramener.com

Page 2: Infovision Anand S _ no sql workshop

Non Relational Databases

Professor   Hired on      Course
Anand       10 Jan 2012   Maths
Bala        15 Jan 2012   Physics
Chandra     20 Jan 2012   Chemistry
Dileep      25 Jan 2012   ???

INSERT ANOMALY

UPDATE ANOMALY

Professor   Hired on      Course
Anand       10 Jan 2012   Maths
Bala        15 Jan 2012   Physics
Chandra     20 Jan 2012   Chemistry
Dileep      25 Jan 2012   Biology

DELETE ANOMALY

Name      Address                  Phone
Anand     10 Mount Rd, Chennai     98765 43210
Bala      15, Janpath, New Delhi   90123 45678
Chandra   20, Marine Dr, Mumbai    91234 56780
Chandra   20, Marine Dr, Mumbai    91234 56781
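
The three anomalies above can be reproduced in a few lines. What follows is a minimal sketch using Python's built-in sqlite3 module. The table names (professor_course, contact) and the pairing of each anomaly with a particular table are assumptions made for illustration, since the slide only shows the tables and the labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flat table mirroring the slide: professor and course share a row.
conn.execute("CREATE TABLE professor_course (professor TEXT, hired_on TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO professor_course VALUES (?, ?, ?)",
    [
        ("Anand", "10 Jan 2012", "Maths"),
        ("Bala", "15 Jan 2012", "Physics"),
        ("Chandra", "20 Jan 2012", "Chemistry"),
    ],
)

# Insert anomaly: Dileep is hired but teaches nothing yet,
# so the course column can only hold a placeholder (NULL).
conn.execute("INSERT INTO professor_course VALUES ('Dileep', '25 Jan 2012', NULL)")
missing = conn.execute(
    "SELECT professor FROM professor_course WHERE course IS NULL"
).fetchall()

# Delete anomaly: removing the Chemistry course also erases
# the only record of when Chandra was hired.
conn.execute("DELETE FROM professor_course WHERE course = 'Chemistry'")
remaining = [
    row[0]
    for row in conn.execute("SELECT professor FROM professor_course ORDER BY professor")
]

# Update anomaly: the same person is stored twice, and an update
# that touches only one copy leaves two conflicting phone numbers.
conn.execute("CREATE TABLE contact (name TEXT, address TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [("Chandra", "20, Marine Dr, Mumbai", "91234 56780")] * 2,
)
conn.execute("UPDATE contact SET phone = '91234 56781' WHERE rowid = 2")
phones = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT phone FROM contact WHERE name = 'Chandra'")
)

print(missing, remaining, phones)
```

Splitting professors, courses and contacts into separate tables is the usual relational remedy; the sketch only shows why the flat layout is fragile.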

Page 3: Infovision Anand S _ no sql workshop

WHY NOW?

DATA VOLUME IS GROWING

[Chart, 2006 to 2010: 161, 253, 397, 623, 988]

SEMI-STRUCTURED DATA

DISTRIBUTED ARCHITECTURE

DATA IS INCREASINGLY NETWORKED

Page 4: Infovision Anand S _ no sql workshop

A POLL

How many programmers?

… who’ve programmed NoSQL DBs

How many non-IT folks?

Page 5: Infovision Anand S _ no sql workshop

data is stored in

TABLES

Key-value stores

Document databases

Graph databases

Page 6: Infovision Anand S _ no sql workshop

Brewer’s CAP Theorem

C: Consistency

A: Availability

P: Partition Tolerance

Pick Two

Page 7: Infovision Anand S _ no sql workshop

Key-value stores

Document databases

Graph databases

Columnar databases

Page 8: Infovision Anand S _ no sql workshop

KEY VALUE STORES: Redis, Cassandra, Memcache, Voldemort, Dynamo, Tokyo Cabinet

DOCUMENT DATABASES: CouchDB, MongoDB, SimpleDB, Riak, Terrastore, Lotus Domino

COLUMNAR DATABASES: Cassandra, BigTable, Hypertable, HBase, Vertica, InfiniDB

GRAPH DATABASES: Neo4j, FlockDB, GraphDB, OrientDB, InfiniteGraph, AllegroGraph
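
To make the four categories concrete, here is a rough sketch of how one professor record from the earlier slide might be shaped under each model. It uses plain Python structures only. Every key name, layout and helper (such as courses_taught) is invented for illustration and does not reflect the storage format of any product listed above.

```python
# One record, shaped four ways. All names and layouts here are illustrative.

# Key-value store: an opaque value looked up by a single key.
key_value = {"professor:anand": "10 Jan 2012|Maths"}

# Document database: a self-describing record that can nest lists and objects.
document = {"_id": "anand", "hired_on": "10 Jan 2012", "courses": ["Maths"]}

# Columnar database: values grouped by column rather than by row.
columnar = {
    "professor": ["Anand", "Bala"],
    "course": ["Maths", "Physics"],
}

# Graph database: nodes plus labelled edges between them.
nodes = {"anand": {"type": "professor"}, "maths": {"type": "course"}}
edges = [("anand", "TEACHES", "maths")]


def courses_taught(professor_id):
    """Follow TEACHES edges out of one professor node."""
    return [dst for src, label, dst in edges
            if src == professor_id and label == "TEACHES"]


print(courses_taught("anand"))
```

The trade-off the sketch hints at: the key-value form is the simplest to fetch but the application must parse the value, while the document and graph forms keep more structure visible to the database.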

Page 9: Infovision Anand S _ no sql workshop


Page 10: Infovision Anand S _ no sql workshop
Page 11: Infovision Anand S _ no sql workshop
Page 12: Infovision Anand S _ no sql workshop

The first time round, the mistakes were around scalability. I used a SQL “ORDER BY RAND()” statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn’t matter since the button would only be clicked occasionally.

Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages into the system. This caused multiple site slowdowns and crashes.
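
For readers who want to see why that statement is costly, here is a small sketch using Python's built-in sqlite3 module. The page table and its columns are hypothetical, and SQLite spells the function RANDOM() where MySQL uses RAND(). The query plan it prints typically shows a scan of the whole table followed by a sort, which is work proportional to the number of rows for every single click.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of pages awaiting review.
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, reviewed INTEGER)")
conn.executemany("INSERT INTO page VALUES (?, 0)", [(i,) for i in range(1, 1001)])

# Pick one random unreviewed page the expensive way.
query = "SELECT id FROM page WHERE reviewed = 0 ORDER BY RANDOM() LIMIT 1"
next_page = conn.execute(query).fetchone()[0]

# Ask SQLite how it runs the query: every candidate row is read and
# given a random sort key before LIMIT 1 discards all but one.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(next_page)
print(plan)
```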

The second time round I turned to my new favourite in-memory data structure server, Redis, and its SRANDMEMBER command (a feature I requested a while ago with this exact kind of project in mind). The system maintains a Redis set of all IDs that need to be reviewed for an assignment to be complete, and a separate set of IDs of all pages that have been reviewed. It then uses Redis set difference (the SDIFFSTORE command) to create a set of unreviewed pages for the current assignment, and then SRANDMEMBER to pick one of those pages.
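
The set logic described here can be sketched without a Redis server. In the example below, plain Python sets stand in for Redis sets, and the two helper functions only mimic what SDIFFSTORE and SRANDMEMBER do. The key names and page IDs are invented for illustration.

```python
import random

# Plain Python sets stand in for Redis sets; key names are hypothetical.
store = {
    "assignment:1:pages": {101, 102, 103, 104, 105},  # must be reviewed
    "pages:reviewed": {102, 104},                      # already reviewed
}


def sdiffstore(dest, first, *others):
    """Mimic SDIFFSTORE: save (first minus all others) at dest, return its size."""
    store[dest] = store[first].difference(*(store[key] for key in others))
    return len(store[dest])


def srandmember(key):
    """Mimic SRANDMEMBER: return one random member, or None for an empty set."""
    members = list(store.get(key, ()))
    return random.choice(members) if members else None


# Unreviewed = pages in the assignment minus pages already reviewed.
count = sdiffstore("assignment:1:unreviewed", "assignment:1:pages", "pages:reviewed")
next_page = srandmember("assignment:1:unreviewed")

print(count, next_page)
```

Unlike the ORDER BY approach, no sorting is involved: the difference is computed on sets, and picking a random member does not depend on ordering the candidates.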

Page 13: Infovision Anand S _ no sql workshop

CouchDB

Page 14: Infovision Anand S _ no sql workshop

[email protected]

gramener.com

s-anand.net

@sanand0 on Twitter

+91 9741 552 552

Page 15: Infovision Anand S _ no sql workshop

EXERCISE: DESIGN THE SSLC MARKS DATABASE

Each student has an ID. There are 11 languages and 92 non-language subjects in total.

Students usually write 3 language and 3 non-language exams.

For example,

• (English, Hindi, Sanskrit), (Maths, Physics, Chemistry)

• (Kannada, Urdu, Marathi), (Commerce, Accountancy, Economics)

You need to record their marks in all 6 subjects, and the total.

Page 16: Infovision Anand S _ no sql workshop

EXERCISE: DESIGN THE SSLC MARKS DATABASE

Common queries:

Who scored the highest in Maths?

Which subject had the highest fail %?

How many failed in 1 subject?

Some scenarios:

Access from multiple locations

Real-time updates to marks

Guarantee of correctness
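
One possible starting point for the exercise, offered as a sketch rather than a model answer: store one document per student, keyed by subject, and answer the common queries by scanning the documents. The sample marks are invented, the pass mark of 35 is an assumption (the slides do not state one), and "failed in 1 subject" is read as "exactly one".

```python
from collections import Counter

PASS_MARK = 35  # assumption: the slides do not state a pass mark

# Invented sample records, one document per student.
students = [
    {"id": 1, "marks": {"English": 70, "Hindi": 55, "Sanskrit": 80,
                        "Maths": 90, "Physics": 30, "Chemistry": 60}},
    {"id": 2, "marks": {"Kannada": 65, "Urdu": 50, "Marathi": 45,
                        "Commerce": 20, "Accountancy": 40, "Economics": 75}},
    {"id": 3, "marks": {"English": 60, "Hindi": 40, "Sanskrit": 50,
                        "Maths": 95, "Physics": 85, "Chemistry": 53}},
]

# Record the total alongside the six subject marks.
for student in students:
    student["total"] = sum(student["marks"].values())

# Who scored the highest in Maths?
took_maths = [s for s in students if "Maths" in s["marks"]]
top_maths = max(took_maths, key=lambda s: s["marks"]["Maths"])["id"]

# Which subject had the highest fail %?
appeared, failed = Counter(), Counter()
for student in students:
    for subject, mark in student["marks"].items():
        appeared[subject] += 1
        failed[subject] += mark < PASS_MARK  # True counts as 1
fail_pct = {subject: 100 * failed[subject] / appeared[subject] for subject in appeared}
worst_subject = max(fail_pct, key=fail_pct.get)

# How many failed in exactly 1 subject?
failed_one = sum(
    1 for s in students
    if sum(mark < PASS_MARK for mark in s["marks"].values()) == 1
)

print(top_maths, worst_subject, failed_one)
```

A document per student avoids the 103 mostly empty columns a single flat table would need, but every query above scans all students. Whether that is acceptable depends on the scenarios listed, particularly the guarantee of correctness for the stored total.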