Not only SQL - Database Choices
Transcript of Not only SQL - Database Choices
![Page 1: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/1.jpg)
Database Choices
Lynn Langit
Jan 2014 – Startup Code Camp in the OC
![Page 2: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/2.jpg)
Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server 2012 Series
– 2 books on SQL Server BI
– Cloudera trainer (certified)
• Former MSFT FTE
– 4 years
![Page 3: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/3.jpg)
Databases: Now a Menu of Choices
![Page 4: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/4.jpg)
Data Pipeline
Clean Existing
Acquire New
Process All
Store Some
Query & Mine
![Page 5: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/5.jpg)
Is Big Data = NoSQL and just Hadoop?
HUGE Hype factor since 2011
Apache Hadoop
• a software framework that supports data-intensive distributed applications
• under a free license, enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
![Page 6: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/6.jpg)
Hadoop in the Enterprise
![Page 7: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/7.jpg)
How you ‘get’ Hadoop
Open source
• roll your own
Commercial distribution
• Cloudera
• MapR
• Hortonworks
• More…
Rent it via the cloud
• AWS
• HDInsight
![Page 8: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/8.jpg)
Demo – AWS MapReduce
![Page 9: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/9.jpg)
Working with Hadoop
![Page 10: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/10.jpg)
About Hadoop MapReduce
Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
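A minimal sketch of the map, shuffle, and reduce phases that the MapReduce diagram refers to, written as plain single-process Python rather than Hadoop itself. The word-count job and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in order over a list of text records."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data big hype", "data pipeline"]))
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; this sketch only shows the data flow.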
![Page 11: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/11.jpg)
The Hadoop on-premises market leader is Cloudera
![Page 12: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/12.jpg)
Example Comparison: RDBMS vs. Hadoop
| | Traditional RDBMS | Hadoop / MapReduce |
| --- | --- | --- |
| Data Size | Gigabytes (Terabytes) | Petabytes and greater |
| Access | Interactive and Batch | Batch – NOT Interactive |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| Query Response Time | Can be near immediate | Has latency (due to batch processing) |
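To make the "Integrity: High (ACID)" entry concrete, here is a hedged sketch of an atomic transfer using Python's built-in `sqlite3` module as a stand-in RDBMS. The `accounts` table and the helper names (`make_demo_db`, `transfer`, `balances`) are assumptions made for illustration only.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside one transaction."""
    # `with conn` commits on success and rolls back if an exception escapes,
    # so either both updates are applied or neither is.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both rows exactly as they were, which is the guarantee the table contrasts with the weaker integrity of batch-oriented Hadoop storage.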
![Page 13: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/13.jpg)
“Small” BigData vs. “Big” BigData
On Premises: Hadoop, NoSQL, RDBMS
In the Cloud: Hadoop, NoSQL, RDBMS
![Page 14: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/14.jpg)
But wait…
is there a relational database
that scales
that is cheap
that runs in the cloud?
![Page 15: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/15.jpg)
DEMO - AWS Redshift
• About $1k per Terabyte per year - relational
![Page 16: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/16.jpg)
Cloud-hosted NoSQL up to 50x CHEAPER
![Page 17: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/17.jpg)
So many NoSQL options
• More than just the Elephant in the room
• 150+ types of NoSQL databases
![Page 18: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/18.jpg)
Flavors of NoSQL
• Key/Value (Volatile)
• Key/Value (Persistent)
• Wide-Column
• Document
• Graph
![Page 19: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/19.jpg)
Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS DynamoDB
– Riak
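A minimal sketch of the "just keys and values, no schema, persistent or volatile" idea. This is a toy class, not the API of DynamoDB or Riak; the class name `KeyValueStore` and the JSON-file persistence are assumptions chosen to keep the example self-contained.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Schema-free key/value store; persistent only if given a file path.

    Keys must be strings and values JSON-serializable (a limit of this sketch).
    """

    def __init__(self, path=None):
        self._path = path
        self._data = {}
        if path and os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def put(self, key, value):
        self._data[key] = value
        self._flush()

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)
        self._flush()

    def _flush(self):
        # Volatile stores (no path) keep everything in memory only.
        if self._path:
            with open(self._path, "w", encoding="utf-8") as f:
                json.dump(self._data, f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kv.json")
    KeyValueStore(path).put("session:42", {"user": "ada", "cart": [1, 2]})
    # A new instance reads the value back from disk.
    print(KeyValueStore(path).get("session:42"))
```

The values can have any shape from one key to the next, which is the trade-off: very simple reads and writes by key, but no queries across values.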
![Page 20: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/20.jpg)
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
![Page 21: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/21.jpg)
File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
![Page 22: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/22.jpg)
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 (archiving into AWS Glacier)
![Page 23: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/23.jpg)
Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models
![Page 24: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/24.jpg)
Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
  • HBase
  • Cassandra
  • xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
  • SQL Server 2012 – Columnstore index
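The sparse versus dense distinction above can be sketched with plain Python data structures. The sample data and the helper names (`read_cell`, `column_sum`) are hypothetical; real engines add compression, indexing, and on-disk layout that this sketch ignores.

```python
# Column-family (sparse, non-relational): each row key maps only to the
# columns that row actually has, so rows may differ in shape.
column_family = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "last_login": "2014-01-10"},
}

# Column-store (dense, relational): every column holds one value per row,
# stored together, so scanning a single column reads little data.
column_store = {
    "name": ["Ada", "Lin", "Sam"],
    "orders": [3, 5, 2],
}


def read_cell(family, row_key, column, default=None):
    """Look up one cell; absent columns are simply missing, not NULL-padded."""
    return family.get(row_key, {}).get(column, default)


def column_sum(store, column):
    """Aggregate one column without reading any of the others."""
    return sum(store[column])


if __name__ == "__main__":
    print(read_cell(column_family, "user:2", "email"))
    print(column_sum(column_store, "orders"))
```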
![Page 25: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/25.jpg)
DEMO – Google Cloud Datastore
![Page 26: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/26.jpg)
DEMO – SQL Server ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
![Page 27: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/27.jpg)
Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/semi-structured data
– Encodings include BSON, JSON, XML…
• binary forms – PDF, Microsoft Office documents (Word, Excel…)
• Examples:
– MongoDB
– Couchbase
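A hedged sketch of what "a collection of JSON documents with semi-structured data" means in practice. It uses only the standard library and is not MongoDB's API; the `matches` and `find` helpers and the sample `products` collection are assumptions, loosely modeled on the query-by-example style that document stores offer.

```python
import json


def matches(document, query):
    """True if every field in `query` equals the same field in `document`.

    Dotted names such as "details.size" reach into nested sub-documents.
    """
    for field, expected in query.items():
        value = document
        for part in field.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if value != expected:
            return False
    return True


def find(collection, query):
    """Return all documents in the collection that satisfy the query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
products = [
    json.loads('{"sku": "A1", "type": "book", "details": {"pages": 320}}'),
    json.loads('{"sku": "B2", "type": "shirt", "details": {"size": "M"}}'),
    json.loads('{"sku": "C3", "type": "book"}'),
]

if __name__ == "__main__":
    print(find(products, {"type": "book"}))
    print(find(products, {"details.size": "M"}))
```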
![Page 28: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/28.jpg)
Demo - MongoDB
![Page 29: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/29.jpg)
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4J
– Google Freebase
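Finding connections between objects is a traversal rather than a chain of self-joins. The sketch below shows that idea with an adjacency list and breadth-first search in plain Python; it is not Neo4J's API or query language, and the `FOLLOWS` data and `shortest_path` helper are hypothetical.

```python
from collections import deque

# Adjacency list: each node maps to the set of nodes it points to.
FOLLOWS = {
    "ann": {"bob", "cat"},
    "bob": {"dan"},
    "cat": {"dan", "eve"},
    "dan": {"eve"},
    "eve": set(),
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting only makes the visiting order deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(shortest_path(FOLLOWS, "ann", "eve"))
```

In a relational schema the same question needs one self-join per hop, with the number of hops unknown in advance, which is why the slide lists recursive self-joins as a signal to consider a graph database.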
![Page 30: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/30.jpg)
DEMO – Neo4J
![Page 31: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/31.jpg)
“Small” BigData vs. “Big” BigData
Hadoop
Key/Value or Column
Document or Graph
RDBMS
On Premises or In the Cloud
![Page 32: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/32.jpg)
Cloud-hosted RDBMS
• AWS RDS – SQL Server, MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
– Use existing tooling
• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality
• Microsoft – SQL Azure
– Highest cost
![Page 33: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/33.jpg)
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
![Page 34: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/34.jpg)
Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
![Page 35: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/35.jpg)
NoSQL Applied
Use cases:
• Social Games
• Product Catalogs
• Social aggregators
• Log Files
• Line-of-Business
Database types:
• Columnstore – HBase
• Key/Value – DynamoDB
• Document – MongoDB
• Graph – Neo4j
• RDBMS – SQL Server
![Page 36: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/36.jpg)
Cloud Offerings – RDBMS AND NoSQL

| | AWS | Google | Microsoft |
| --- | --- | --- | --- |
| RDBMS | RDS – all major | MySQL | SQL Azure |
| NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs |
| NoSQL Key-Value | DynamoDB | Cloud Datastore | Azure Tables |
| Streaming ML or (Mahout) | Custom EC2 | Prospective Search & Prediction API | StreamInsight |
| NoSQL Document or Graph | MongoDB on EC2 | Freebase | MongoDB on Windows Azure |
| NoSQL – Column, Hadoop (HBase) | Elastic MapReduce using S3 & EC2 | none | HDInsight |
| Dremel/Warehousing | RedShift | BigQuery | none |
![Page 37: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/37.jpg)
But wait… how do I query NoSQL data?
![Page 38: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/38.jpg)
Always MapReduce?
![Page 39: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/39.jpg)
Can Excel help?
• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion
![Page 40: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/40.jpg)
Demo - Hadoop Connector to Excel
![Page 41: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/41.jpg)
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
![Page 42: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/42.jpg)
Collecting for “BigData”
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hilary Mason’s list
![Page 43: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/43.jpg)
NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets
![Page 44: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/44.jpg)
www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe Teach a Kid (Ages 10+)
• recipes)
![Page 45: Not only SQL - Database Choices](https://reader033.fdocuments.net/reader033/viewer/2022061223/54c650654a79591a6d8b461b/html5/thumbnails/45.jpg)
Keep Learning
• Twitter: @LynnLangit
• YouTube: http://www.youtube.com/user/SoCalDevGal
• Hire me
– To help build your BI/Big Data solution
– To teach your team next gen BI
– To learn more about using NoSQL solutions