How to Select an Analytic DBMS
![Page 1: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/1.jpg)
How to Select an Analytic DBMS
Overview, checklists, and tips
by Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact @monash.com
http://www.monash.com
http://www.DBMS2.com
![Page 2: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/2.jpg)
Curt Monash
Analyst since 1981, own firm since 1987
  Covered DBMS since the pre-relational days
  Also analytics, search, etc.
Publicly available research
  Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk)
  Feed at www.monash.com/blogs.html
  White papers and more at www.monash.com
User and vendor consulting
![Page 3: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/3.jpg)
Our agenda
Why are there such things as specialized analytic DBMS?
What are the major analytic DBMS product alternatives?
What are the most relevant differentiations among analytic DBMS users?
What’s the best process for selecting an analytic DBMS?
![Page 4: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/4.jpg)
Why are there specialized analytic DBMS?
General-purpose database managers are optimized for updating short rows …
… not for analytic query performance
10-100X price/performance differences are not uncommon
At issue is the interplay between storage, processors, and RAM
![Page 5: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/5.jpg)
Moore’s Law, Kryder’s Law, and a huge exception
Growth factors:
  Transistors/chip: >100,000 since 1971
  Disk density: >100,000,000 since 1956
  Disk speed: 12.5 since 1956
The disk speed barrier dominates everything!
[Chart: compound annual growth rate, on an axis from 0% to 45%, for transistors/chip since 1971, disk density since 1956, and disk speed since 1956]
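The compound annual growth rates plotted on this slide follow from the growth factors by simple arithmetic. The sketch below is illustrative only: the end year of 2009 is an assumption (the slide does not state one), the ">" factors are treated as exact values, and `cagr` is a helper name introduced here.

```python
def cagr(growth_factor: float, start_year: int, end_year: int) -> float:
    """Compound annual growth rate implied by a total growth factor."""
    years = end_year - start_year
    return growth_factor ** (1.0 / years) - 1.0


END_YEAR = 2009  # assumption: the slide gives no end year

# (total growth factor, start year), as listed on the slide
factors = {
    "Transistors/chip": (100_000, 1971),
    "Disk density": (100_000_000, 1956),
    "Disk speed": (12.5, 1956),
}

rates = {
    name: cagr(factor, start, END_YEAR)
    for name, (factor, start) in factors.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} per year")
```

Under that assumption the rates come out at roughly 35%, 42%, and 5% per year, which is the gap the slide calls the disk speed barrier.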
![Page 6: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/6.jpg)
Software strategies to optimize analytic I/O
Minimize data returned
  Classic query optimization
Minimize index accesses
  Page size
Precalculate results
  Materialized views
  OLAP cubes
Return data sequentially
Store data in columns
Stash data in RAM
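As one illustration of precalculating results, the sketch below builds a summary table once and answers later queries from it. It uses Python's built-in `sqlite3` module purely as a stand-in; SQLite has no materialized views, so a plain table created from a query plays that role here. The table and column names (`sales`, `sales_by_region`) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5)],
)

# Precalculate once: this summary table stands in for a materialized view.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
    "FROM sales GROUP BY region"
)

# Later queries read the small summary instead of rescanning the detail rows.
summary = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(summary)
```

The trade-off is that the summary has to be refreshed whenever the detail data changes, which is part of the administrative cost of this strategy.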
![Page 7: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/7.jpg)
Hardware strategies to optimize analytic I/O
Lots of RAM
Parallel disk access!!!
Lots of networking
Tuned MPP (Massively Parallel Processing) is ideal.
“Recommended configurations” are a mixed bag.
![Page 8: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/8.jpg)
Specialty hardware strategies
Custom or unusual chips (rare)
Custom or unusual interconnects
Fixed configurations of common parts
  Appliances or recommended configurations
And there’s also SaaS.
![Page 9: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/9.jpg)
18 contenders (and there are more)
Aster Data
Dataupia
Exasol
Greenplum
HP Neoview
IBM DB2 BCUs
Infobright/MySQL
Kickfire/MySQL
Kognitio
Microsoft Madison
Netezza
Oracle Exadata
Oracle w/o Exadata
ParAccel
SQL Server w/o Madison
Sybase IQ
Teradata
Vertica
![Page 10: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/10.jpg)
General areas of feature differentiation
Most influenced by architecture
  Query performance
  Update/load performance
  Alternate datatypes
Most influenced by product maturity
  Compatibilities
  Advanced analytics
  Manageability and availability
  Encryption and security
![Page 11: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/11.jpg)
Major analytic DBMS product groupings
Architecture is a good first categorization
Traditional OLTP
Row-based MPP
Columnar
(Not covered tonight) MOLAP/array-based
![Page 12: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/12.jpg)
Traditional OLTP examples
Oracle (especially pre-Exadata)
IBM DB2 (especially mainframe)
Microsoft SQL Server (pre-Madison)
![Page 13: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/13.jpg)
Analytic optimizations for OLTP DBMS
Performance
  Two major kinds of precalculation
    Star indexes
    Materialized views
  Other specialized indexes
  Query optimization tools
Other OLAP extensions
  SQL 2003
  Other embedded analytics
![Page 14: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/14.jpg)
Drawbacks
Complexity and people cost
Hardware cost
Software cost
Absolute performance
![Page 15: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/15.jpg)
Legitimate use scenarios
When TCO isn’t an issue
  Undemanding performance (and therefore administration too)
When specialized features matter
  OLTP-like
  Integrated MOLAP
  Edge-case analytics
Rigid enterprise standards
Small enterprise/true single-instance
![Page 16: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/16.jpg)
Row-based MPP examples
Teradata
DB2 (open systems version)
Netezza
Oracle Exadata (sort of)
DATAllegro/Microsoft Madison
Greenplum
Aster Data
Kognitio
HP Neoview
![Page 17: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/17.jpg)
Typical design choices in row-based MPP
“Random” (hashed or round-robin) data distribution among nodes
Large block sizes
  Suitable for scans rather than random accesses
Limited indexing alternatives
  Or little optimization for using the full boat
Carefully balanced hardware
High-end networking
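The two "random" distribution schemes named on this slide can be sketched in a few lines. This is a toy model, not any vendor's implementation: the node count, the `customer-N` keys, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import zlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def hash_node(key: str, num_nodes: int = NUM_NODES) -> int:
    """Hash distribution: the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def round_robin_nodes(num_rows: int, num_nodes: int = NUM_NODES) -> list:
    """Round-robin distribution: rows are dealt to nodes in turn."""
    return [i % num_nodes for i in range(num_rows)]


keys = [f"customer-{i}" for i in range(1000)]
hashed = Counter(hash_node(k) for k in keys)
dealt = Counter(round_robin_nodes(len(keys)))

print("hashed:     ", sorted(hashed.items()))
print("round-robin:", sorted(dealt.items()))
```

Round-robin gives a perfectly even spread but says nothing about where a given key lives. Hashing is only approximately even, but rows that share a key end up on the same node, which can matter for joins and aggregations on that key.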
![Page 18: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/18.jpg)
Tradeoffs among row MPP alternatives
Enterprise standards
Vendor size
Hardware lock-in
Total system price
Features
![Page 19: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/19.jpg)
Columnar DBMS examples
Sybase IQ
Vertica
InfoBright
SAND
ParAccel
Kickfire
Exasol
MonetDB
SAP BI Accelerator (sort of)
![Page 20: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/20.jpg)
Columnar pros and cons
Bulk retrieval is faster
Pinpoint I/O is slower
Compression is easier
Memory-centric processing is easier
MPP is not as crucial
Being columnar reduces I/O
So does (better) compression
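A back-of-the-envelope sketch of why columnar storage reduces I/O and why compression is easier. The table shape (1,000,000 rows, 50 fixed-width 8-byte columns, a query touching 3 of them) is hypothetical, and the model ignores block sizes, indexes, and caching; run-length encoding is just one simple compression scheme chosen for illustration.

```python
from itertools import groupby


def bytes_scanned_row_store(num_rows, num_columns, bytes_per_value):
    """A row store reads whole rows, whatever columns the query needs."""
    return num_rows * num_columns * bytes_per_value


def bytes_scanned_column_store(num_rows, columns_needed, bytes_per_value):
    """A column store reads only the columns the query touches."""
    return num_rows * columns_needed * bytes_per_value


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical table: 1,000,000 rows, 50 columns of 8 bytes; query touches 3.
row_bytes = bytes_scanned_row_store(1_000_000, 50, 8)
col_bytes = bytes_scanned_column_store(1_000_000, 3, 8)
print(f"row store:    {row_bytes:,} bytes")
print(f"column store: {col_bytes:,} bytes")

# A sorted, low-cardinality column compresses to a handful of runs.
column = ["east"] * 4 + ["north"] * 2 + ["west"] * 3
encoded = run_length_encode(column)
print(encoded)
```

The same model shows the downside listed above: fetching one complete row from a column store means touching every column's storage, which is why pinpoint I/O is slower.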
![Page 21: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/21.jpg)
Segmentation made (too) simple
One database to rule them all
One analytic database to rule them all
Frontline analytic database
Very, very big analytic database
Big analytic database handled very cost-effectively
![Page 22: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/22.jpg)
Basics of systematic segmentation
Use cases
Metrics
Platform preferences
There isn’t just one checklist.
![Page 23: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/23.jpg)
Use cases – a first cut
Light reporting
Diverse EDW
Big Data
Operational analytics
![Page 24: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/24.jpg)
Metrics – a first cut
Total raw/user data
  Below 1-2 TB, references abound
  10 TB is another major breakpoint
Total concurrent users
  5, 15, 50, or 500?
Data freshness
  Hours
  Minutes
  Seconds
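One way to make these first-cut metrics concrete is to record them per workload and bucket them at the breakpoints the slide names. The sketch below is only that: the `Workload` class, the band labels, and the choice of 2 TB as the lower cut-off are assumptions introduced for illustration.

```python
from dataclasses import dataclass

USER_TIERS = (5, 15, 50, 500)  # the concurrency levels named on the slide


@dataclass
class Workload:
    raw_data_tb: float       # total raw/user data
    concurrent_users: int    # total concurrent users
    freshness_seconds: int   # maximum acceptable data age


def size_band(w: Workload) -> str:
    """Bucket data volume at the 2 TB (assumed) and 10 TB breakpoints."""
    if w.raw_data_tb <= 2:
        return "up to 2 TB"
    if w.raw_data_tb <= 10:
        return "2-10 TB"
    return "over 10 TB"


def user_tier(w: Workload) -> int:
    """Smallest named tier that covers the workload's concurrency."""
    for tier in USER_TIERS:
        if w.concurrent_users <= tier:
            return tier
    return USER_TIERS[-1]


def freshness_band(w: Workload) -> str:
    """Bucket the freshness requirement into hours, minutes, or seconds."""
    if w.freshness_seconds >= 3600:
        return "hours"
    if w.freshness_seconds >= 60:
        return "minutes"
    return "seconds"


example = Workload(raw_data_tb=4, concurrent_users=30, freshness_seconds=300)
print(size_band(example), user_tier(example), freshness_band(example))
```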
![Page 25: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/25.jpg)
Basic platform issues
Enterprise standards
Appliance-friendliness
Need for MPP?
Cloud/SaaS
![Page 26: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/26.jpg)
The selection process in a nutshell
Figure out what you’re trying to buy
Make a shortlist
Do free POCs*
Evaluate and decide
*The only part that’s even slightly specific to the analytic DBMS category
![Page 27: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/27.jpg)
Figure out what you’re trying to buy
Inventory your use cases
  Current
  Known future
  Wish-list/dream-list future
Set constraints
  People and platforms
  Money
Establish target SLAs
  Must-haves
  Nice-to-haves
![Page 28: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/28.jpg)
Use-case checklist -- generalities
Database growth
  As time goes by …
  More detail
  New data sources
Users (human)
Users/usage (automated)
Freshness (data and query results)
![Page 29: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/29.jpg)
Use-case checklist – traditional BI
Reports
  Today
  Future
Dashboards and alerts
  Today
  Future
  Latency
Ad-hoc
  Users
  Now that we have great response time …
![Page 30: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/30.jpg)
Use-case checklist – predictive analytics
How much do you think it would improve results to
  Run more models?
  Model on more data?
  Add more variables?
  Increase model complexity?
Which of those can the DBMS help with anyway?
What about scoring?
  Real-time
  Other latency issues
![Page 31: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/31.jpg)
SLA realism
What kind of turnaround truly matters?
  Customer or customer-facing users
  Executive users
  Analyst users
How bad is downtime?
  Customer or customer-facing users
  Executive users
  Analyst users
![Page 32: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/32.jpg)
Short list constraints
Cash cost
  But purchases are heavily negotiated
Deployment effort
  Appliances can be good
Platform politics
  You might as well consider incumbent(s)
  Appliances can be frowned on
![Page 33: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/33.jpg)
Filling out the shortlist
Who matches your requirements in theory?
What kinds of evidence do you require?
  References?
    How many?
    How relevant?
  A careful POC?
  Analyst recommendations?
  General “buzz”?
![Page 34: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/34.jpg)
A checklist for shortlists
What’s your tolerance for specialized hardware?
What’s your tolerance for set-up effort?
What’s your tolerance for ongoing administration?
What are your insert and update requirements?
At what volumes will you run fairly simple queries?
What are your complex queries like?
For which third-party tools do you need support?
and, most important,
Are you madly in love with your current DBMS?
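Several of the checklist questions above work as hard filters, so a shortlist can be sketched as a simple screening pass. Everything in this example is hypothetical: the candidate names, the 1-to-5 effort scale, the tool names, and the `shortlist` function are illustrative and do not describe any real product.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    needs_special_hardware: bool
    setup_effort: int   # 1 (low) to 5 (high), illustrative scale
    admin_effort: int   # 1 (low) to 5 (high), illustrative scale
    supported_tools: set = field(default_factory=set)


@dataclass
class Requirements:
    tolerate_special_hardware: bool
    max_setup_effort: int
    max_admin_effort: int
    required_tools: set = field(default_factory=set)


def shortlist(candidates, req):
    """Return the names of candidates that pass every hard constraint."""
    keep = []
    for c in candidates:
        if c.needs_special_hardware and not req.tolerate_special_hardware:
            continue
        if c.setup_effort > req.max_setup_effort:
            continue
        if c.admin_effort > req.max_admin_effort:
            continue
        if not req.required_tools <= c.supported_tools:
            continue  # a required third-party tool is unsupported
        keep.append(c.name)
    return keep


candidates = [
    Candidate("Product A", True, 1, 1, {"ToolX", "ToolY"}),
    Candidate("Product B", False, 2, 2, {"ToolX", "ToolY"}),
    Candidate("Product C", False, 4, 2, {"ToolX"}),
]
req = Requirements(
    tolerate_special_hardware=False,
    max_setup_effort=3,
    max_admin_effort=3,
    required_tools={"ToolX"},
)
print(shortlist(candidates, req))
```

Questions about query volumes and complexity are harder to reduce to a yes/no filter, which is where references and POCs come in.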
![Page 35: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/35.jpg)
Proof-of-Concept basics
The better you match your use cases, the more reliable the POC is
Most of the effort is in the set-up
  You might as well do POCs for several vendors – at (almost) the same time!
Where is the POC being held?
![Page 36: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/36.jpg)
The three big POC challenges
Getting data
  Real?
    Politics
    Privacy
  Synthetic?
  Hybrid?
Picking queries
  And more?
Realistic simulation(s)
  Workload
  Platform
  Talent
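Once data and queries are chosen, the measuring itself can be very simple. The harness below is a minimal sketch that times a named set of queries several times each and reports the median. It runs against an in-memory SQLite database only so that the example is self-contained; the `events` table, the two queries, and the `time_queries` helper are stand-ins, and a real POC would run your own queries against each candidate system.

```python
import sqlite3
import statistics
import time


def time_queries(conn, queries, runs=5):
    """Run each named query `runs` times; return median seconds per query."""
    results = {}
    for name, sql in queries.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch all rows so the work is done
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


# Stand-in data and queries; a real POC would use your own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)
queries = {
    "simple_count": "SELECT COUNT(*) FROM events",
    "group_by_user": "SELECT user_id, SUM(amount) FROM events GROUP BY user_id",
}

medians = time_queries(conn, queries)
for name, seconds in medians.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Single-query medians say little about concurrency or mixed workloads, so this covers only the "picking queries" part of the challenge, not realistic workload simulation.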
![Page 37: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/37.jpg)
POC tips
Don’t underestimate requirements
Don’t overestimate requirements
Get SOME data ASAP
Don’t leave the vendor in control
Test what you’ll actually be buying
Use the baseball bat
![Page 38: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/38.jpg)
Evaluate and decide
It all comes down to
  Cost
  Speed
  Risk
and in some cases
  Time to value
  Upside
![Page 39: How to Select an Analytic DBMS](https://reader033.fdocuments.net/reader033/viewer/2022061115/545c96f9b1af9f3c0a8b47d4/html5/thumbnails/39.jpg)
Further information
Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2
contact @monash.com
http://www.monash.com
http://www.DBMS2.com