Hauenstein Real Life Performance Database
-
D1 Solutions AG, a Netcetera Company
Real Life Performance of In-Memory Database Systems for BI
10th European TDWI Conference
Munich, June 2010
-
Authors:
Dr. Andreas Hauenstein, Dr. Simon Hefti, Dr. Andrej Vckovski
-
In-Memory Database Systems
Buzzwords: Column-Orientation, In-Memory, Shared Nothing
Meaning: Looks like Oracle/DB2/SQL Server from the outside, just much faster
We are talking about relational systems, queryable in SQL
We are not talking about client-side caching (as done by MicroStrategy or QlikView)
There is a new generation of DB systems, for example MonetDB, Exasol, Greenplum, LucidDB
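Column orientation is the first buzzword above. A minimal Python sketch of why it helps analytic queries, using invented sample rows: a row layout visits every full tuple even when a query sums one attribute, while a column layout keeps each attribute in its own array, so a SUM reads only that array. Real column stores add compression and vectorized execution on top; this only illustrates the layout idea.

```python
from dataclasses import dataclass

# Invented fact rows: (org_id, product_id, amount).
ROWS = [
    (1, 10, 100.0),
    (2, 11, 250.0),
    (1, 12, 75.0),
    (3, 10, 40.0),
]

def sum_amount_row_store(rows):
    """Row layout: each full tuple is visited to extract one field."""
    total = 0.0
    for _org_id, _product_id, amount in rows:
        total += amount
    return total

@dataclass
class ColumnStore:
    """Column layout: one list per attribute, all of equal length."""
    org_id: list
    product_id: list
    amount: list

    @classmethod
    def from_rows(cls, rows):
        org_ids, product_ids, amounts = zip(*rows)
        return cls(list(org_ids), list(product_ids), list(amounts))

def sum_amount_column_store(store):
    """Only the 'amount' column is read; the other columns are not touched."""
    return sum(store.amount)

store = ColumnStore.from_rows(ROWS)
print(sum_amount_row_store(ROWS), sum_amount_column_store(store))  # 465.0 465.0
```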
-
Business Intelligence Data Warehouse
We are not looking at transactional systems
The database of an online shop, or any database driving a website, is transactional
Typically BI applications are driven by a non-transactional data store that is bulk loaded in intervals by an ETL process. This is called a data warehouse.
Next generation DB systems also exist for transactional systems. An example is Oracle TimesTen. This is a different subject.
General-Purpose DB Systems (e.g. Oracle, SQL Server)
DB Systems Specialized for Analytics (e.g. Teradata)
DB Systems Specialized for Transactions (e.g. TimesTen)
-
Business Intelligence Generated SQL
Tools with a GUI that generate SQL statements
Examples: Business Objects, OBIEE, MicroStrategy, Cognos
No SQL tuning possible
Poorly written SQL
Non-technical users
Frequently changing queries
Lots of averages and sums, groupings, consolidation
-
Real Life Problem (1)
Consolidation of numbers along a hierarchy
Use a Parent-Child Table with a bridge table to do this in a relational DB
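A minimal Python sketch of this technique on a small invented hierarchy: the parent-child table stores one parent pointer per node, the bridge table pairs every node with itself and each of its ancestors, and consolidating leaf-level facts then becomes one join plus a GROUP BY on the ancestor, with no recursion at query time. The names PARENT, FACTS, build_bridge and consolidate are hypothetical.

```python
from collections import defaultdict

# Parent-child table: node -> parent (None for the root). Invented example.
PARENT = {
    "Group": None,
    "Europe": "Group",
    "Zurich": "Europe",
    "Munich": "Europe",
    "Boston": "Group",
}

# Fact table: amounts booked on leaf nodes (invented values).
FACTS = [("Zurich", 10), ("Munich", 5), ("Zurich", 2), ("Boston", 7)]

def build_bridge(parent):
    """Return (ancestor, descendant) pairs, including each node paired with itself."""
    bridge = []
    for node in parent:
        ancestor = node
        while ancestor is not None:
            bridge.append((ancestor, node))
            ancestor = parent[ancestor]
    return bridge

def consolidate(facts, bridge):
    """Equivalent of: SELECT b.ancestor, SUM(f.amount)
    FROM facts f JOIN bridge b ON f.node = b.descendant
    GROUP BY b.ancestor"""
    ancestors_of = defaultdict(list)
    for ancestor, descendant in bridge:
        ancestors_of[descendant].append(ancestor)
    totals = defaultdict(int)
    for node, amount in facts:
        for ancestor in ancestors_of[node]:
            totals[ancestor] += amount
    return dict(totals)

totals = consolidate(FACTS, build_bridge(PARENT))
print(totals["Group"], totals["Europe"], totals["Zurich"])  # 24 17 12
```

The bridge table trades storage for query simplicity: every level of the hierarchy is answered by the same flat join, which is exactly the kind of SQL a BI tool can generate.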
-
Real Life Problem (2)
Every company has this sort of problem
The most important people (CEO) experience the worst performance
OLAP tools exist because this sort of query is traditionally slow on relational systems
At one customer, 6 GB of data resulted in a 20-minute wait for the CEO
Even pre-calculating all reports overnight became difficult
-
The Data Model
Levels: 12
Leaves: 4096
Nodes: 8191
[Diagram labels: Bridge Table; 400 K rows; 300 K rows; 500 K rows]
-
Size of the Data
Table              Rows        Blocks
DIM_PRODUCT        344380      11875
DIM_TIME           501         11
DIM_TRANS          3001        77
DIM_UNIT           81          5
DIM_BUSINESSTYPE   181         10
DIM_CLIENT         453392      29819
DIM_ORG_FLAT       53248       118
T_FACTS            16019518    723739
DIM_MEASURE        81          6
DIM_ACCOUNTING     532067      9780
DIM_ORG            8916        123
Total              17415366    775561
Quite small data volume
Bad performance on several platforms
Realistic scenario
775561 blocks * 8192 bytes ≈ 6 GB
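The size estimate is block count times block size. A short Python check of that arithmetic, using the 775561 blocks and the 8192-byte block size from the slide, shows that "6 GB" is about 6.35 GB in decimal units or 5.92 GiB in binary units. The helper name size_in_bytes is hypothetical.

```python
BLOCK_SIZE_BYTES = 8192  # block size used in the slide's calculation
TOTAL_BLOCKS = 775_561   # total block count of all tables

def size_in_bytes(blocks, block_size=BLOCK_SIZE_BYTES):
    """Storage occupied by a number of fixed-size blocks."""
    return blocks * block_size

total = size_in_bytes(TOTAL_BLOCKS)
print(total)                    # 6353395712
print(round(total / 10**9, 2))  # 6.35 (decimal GB)
print(round(total / 2**30, 2))  # 5.92 (binary GiB)
```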
-
Data Generation
One function call creates the complete dimension table dim_org
It generates the id column, the parent pointer and the bridge table dim_org_flat
Generated from a helper table with just integers and random numbers
Similar function to generate fact table
Started out as PL/SQL, now a Perl script that works with any DB
It is easy to model any scenario with this tool
create_dim(p_bf => 2, p_depth => 12, p_name => 'org', p_cols => 'org01,org02,org03,org04,org05,org06,org07,org08,org09,org10', p_types => 't10,t10,t10,t10,t10,t10,t10,t10,t10,t10');
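The generator itself is not shown. A minimal Python sketch of what a call like create_dim(p_bf => 2, p_depth => 12, ...) has to produce, under stated assumptions: a balanced tree in which every inner node has bf children and the leaves sit at level depth, sequential ids, and a bridge table pairing every leaf with itself and each of its ancestors. The descriptive columns (p_cols, p_types) and the random-number helper table are left out; create_dim here is a hypothetical Python stand-in, not the authors' Perl script. With bf=2 and depth=12 it yields 8191 nodes and 4096 leaves, as in the data model, and 53248 bridge rows, the row count listed for DIM_ORG_FLAT.

```python
def create_dim(bf, depth):
    """Hypothetical analogue of create_dim: balanced tree, root at level 0.

    Returns (nodes, flat):
      nodes: list of (id, parent_id, level); parent_id is None for the root
      flat:  bridge rows (leaf_id, ancestor_id) for every leaf and each of
             its ancestors, the leaf itself included
    """
    nodes = [(1, None, 0)]
    frontier = [1]
    next_id = 2
    for level in range(1, depth + 1):
        new_frontier = []
        for parent_id in frontier:
            for _ in range(bf):
                nodes.append((next_id, parent_id, level))
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier

    # After the loop, frontier holds exactly the leaves.
    parent_of = {node_id: parent_id for node_id, parent_id, _ in nodes}
    flat = []
    for leaf_id in frontier:
        ancestor = leaf_id
        while ancestor is not None:
            flat.append((leaf_id, ancestor))
            ancestor = parent_of[ancestor]
    return nodes, flat

nodes, flat = create_dim(bf=2, depth=12)
leaves = [n for n in nodes if n[2] == 12]
print(len(nodes), len(leaves), len(flat))  # 8191 4096 53248
```

Changing bf and depth is enough to model flatter or deeper organizations, which is the point of generating the data instead of using a fixed sample.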
-
The Test Query
Generated by a BI tool
-
Initial Tests on Oracle and SQL Server
All results are of the same order of magnitude
Adding RAM does not help a traditional DB
PCs are better than you think
Query time by number of aggregated fact rows (3500 / 1 Mio / 16 Mio):
Home PC: Oracle 10g, Windows 2003 Server, Dell Dimension E521, 4 GB RAM: 159 sec / 205 sec / 1023 sec
Home PC: MS SQL Server 2005, Windows 2003 Server, Dell Dimension E521, 4 GB RAM: 293 sec / 699 sec / 741 sec
Expensive production server: Oracle 10g, AIX, IBM 9117-570, 8 GB RAM, 4 CPUs at 1.9 GHz: 167 sec / 168 sec / 1200 sec
Linux with little RAM: Oracle 10g, Red Hat Linux, HP ProLiant DL380 server, 0.5 GB RAM, Intel Xeon 3.2 GHz: 386 sec / 413 sec / 1432 sec
-
A New Generation DB System
In-memory DB: a factor of 30-50 faster
That's the speed of sound relative to a bicycle
With generic Intel hardware
Worth looking at several of these new systems
The systems from the previous slide, plus an in-memory DB (query time for 3500 / 1 Mio / 16 Mio aggregated fact rows):
In-memory DB: Exasol, Exacluster (Linux microkernel), Exasol test system with 2 quad-core Intel CPUs, 32 GB RAM, 2 nodes: 0 sec / 2 sec / 22 sec
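A short Python check of the "factor 30-50" claim, dividing each traditional system's time by Exasol's 22 sec. It assumes, as the ascending times within each row suggest, that the largest times (1023, 741, 1200, 1432 and 22 sec) belong to the 16 Mio aggregated fact rows case; the dictionary labels are descriptive names chosen here, not identifiers from the source.

```python
# Query times in seconds for 16 Mio aggregated fact rows (assumed column).
TIMES_16_MIO = {
    "Oracle 10g, Home PC": 1023,
    "MS SQL Server 2005, Home PC": 741,
    "Oracle 10g, IBM AIX server": 1200,
    "Oracle 10g, Linux, 0.5 GB RAM": 1432,
}
EXASOL_16_MIO = 22

def speedups(times, baseline):
    """Ratio of each system's time to the in-memory system's time."""
    return {name: round(t / baseline, 1) for name, t in times.items()}

for name, factor in speedups(TIMES_16_MIO, EXASOL_16_MIO).items():
    print(f"{name}: {factor}x")
```

With these numbers the factors range from about 34 to 65, so the slide's 30-50 is a conservative summary for the slower servers.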
-
[Chart: query times in seconds (axis 0 to 1600); bar labels as extracted: DD, SQL, DD, CRA, HP, IBM, Exa]
-
The Contenders
Oracle 11g
MySQL
MonetDB
LucidDB
Greenplum (their own hardware)
Exasol (their own hardware)
-
The Test Server
Intel Dual Xeon E5205
16 GB RAM
2 x 250 GB SATA disks
64-bit Debian Linux
-
Interesting DB Systems That Were Not Tested
Teradata
Oracle Exadata
Netezza
Vertica
Infobright
Kognitio
The field is very active and new products and approaches keep entering the market.
-
MonetDB
Origin: Result of research at CWI in the Netherlands
Open Source: Yes
Free of Charge: Yes
Remarks:
o Recent publicity through a paper in Communications of the ACM: "Breaking the Memory Wall in MonetDB"
o Constantly changing as research progresses
o Easy to get into direct contact with the developers
Quote from the website: "MonetDB is a open-source database system for high-performance Applications in data mining, OLAP, GIS, XMLQuery, text and multimedia retrieval."
-
LucidDB
Origin: Formerly part of LucidEra in San Mateo, California
Open Source: Yes
Free of Charge: Yes
Remarks:
o Emphasizes ease of configuration and maintenance
o Mostly written in Java
Quote from the website: "LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning."
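Bitmap indexing, one of the techniques named in the quote, fits in a few lines of Python: one bitmap per distinct value of a low-cardinality column, with bit i set when row i has that value, so a predicate such as region = 'EU' AND channel = 'web' becomes a single bitwise AND. This illustrates the general idea only, not LucidDB's implementation; the columns and data are invented, and Python integers stand in for compressed bitmaps.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an int bitmap: bit i set if row i has that value."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)

def row_ids(bitmap):
    """Decode a bitmap back into the list of matching row positions."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids

# Invented low-cardinality columns of a small fact table.
region = ["EU", "US", "EU", "EU", "US"]
channel = ["web", "web", "shop", "web", "shop"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'EU' AND channel = 'web'  ->  AND of two bitmaps
match = region_idx["EU"] & channel_idx["web"]
print(row_ids(match))  # [0, 3]
```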
-
Greenplum
Origin: Located in San Mateo, California. Postgres-based.
Open Source: Based on Open Source Technology
Free of Charge: No
Remarks:
o Based on a similar hardware architecture as Exasol
o Highly configurable and tunable, lots of features
o Column store is an option, default is row store
Quote from the website: "Greenplum Database utilizes a shared-nothing MPP (massively parallel processing) architecture that has been designed from the ground up for BI and analytical processing using commodity hardware. In this architecture, data is automatically partitioned across multiple 'segment' servers, and each 'segment' owns and manages a distinct portion of the overall data. All communication is via a network interconnect -- there is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture)."
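The shared-nothing idea in the quote, where each segment owns a distinct portion of the data, can be sketched in Python as hash partitioning on a distribution key, a local aggregate on each segment, and a final merge of the partial results. Here the segments are plain lists in one process; in a real system they are separate servers exchanging partial results over the interconnect. The function names and the use of CRC32 as the partitioning hash are illustrative assumptions, not Greenplum's internals.

```python
import zlib
from collections import Counter

def segment_for(key, n_segments):
    """Deterministic hash partitioning on the distribution key."""
    return zlib.crc32(str(key).encode("utf-8")) % n_segments

def distribute(rows, n_segments):
    """Split (key, amount) rows so that each segment owns a disjoint subset."""
    segments = [[] for _ in range(n_segments)]
    for key, amount in rows:
        segments[segment_for(key, n_segments)].append((key, amount))
    return segments

def local_aggregate(segment_rows):
    """Runs independently on each segment: SUM(amount) GROUP BY key."""
    partial = Counter()
    for key, amount in segment_rows:
        partial[key] += amount
    return partial

def parallel_sum_by_key(rows, n_segments=4):
    segments = distribute(rows, n_segments)
    partials = [local_aggregate(s) for s in segments]  # would run in parallel
    result = Counter()
    for partial in partials:  # merge step combines the partial results
        result.update(partial)
    return dict(result)

rows = [("client_1", 10), ("client_2", 5), ("client_1", 7), ("client_3", 1)]
print(parallel_sum_by_key(rows))
```

Because all rows with the same key land on the same segment, each segment can finish its GROUP BY without talking to the others; only the small partial results travel over the network, which is why adding nodes scales well for this kind of query.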
-
Exasol
Origin: Developed from scratch in Nürnberg, Germany
Open Source: No
Free of Charge: No
Remarks:
o Based on a similar hardware architecture as Greenplum
o Pure column store DB
o Emphasizes ease of administration
o No need to create indexes or gather statistics
o Imitates some Oracle-isms for compatibility
Quote from the website: "The database has been specially developed for analysis and is being used successfully for data warehousing, Web analytics, data mining applications and more. In contrast with universal databases, this specialization means that the data to be analyzed can be made available to analysis tools virtually in real time."
-
Typical Shared-Nothing Node
Combine many of these, connected by Gigabit Ethernet
-
Results With 16 Mio Rows in the Fact Table
Oracle on a new 64-bit box is 4 times faster than on an average 32-bit box
Both Oracle and LucidDB were twice as fast after dropping all indexes on the fact table (those are the times in the chart)
We did not manage to tune MySQL to acceptable performance
For a free system, LucidDB has good performance and little hassle
MonetDB needed a fix in the optimizer before it could cope with the query
Next-generation in-memory DBs are at least one order of magnitude faster
[Chart: query time in seconds with 16 Mio rows in the fact table; bar values in chart order: Oracle 226, MySQL 2280, LucidDB 460, MonetDB 31, Greenplum 13, Exasol 10]
-
Performance Scaling
Both systems (Exasol and Greenplum) scale linearly
It is possible to query at least ten times the data volume efficiently
The vendors claim unlimited linear scaling by adding commodity hardware
[Chart: query time in seconds versus fact table size (16, 160 and 320) for Exasol (public demo system), Exasol (untuned, comparable hardware), Exasol (local dimensions, comparable hardware) and Greenplum]
-
Conclusion
Big Lessons
Database technology is in upheaval at the moment
By adopting the new technologies, you can revolutionize the way you access your data
Prices will fall rapidly. This is like the PC revolution.
Small Lessons
If you run Oracle on a 32-bit system, move to a 64-bit architecture. It will give you a factor of 4 without any pain
If your table scans are slow, drop all indexes on the fact table
If you move to a new technology, you can get a factor of 30 to 50
The commercial systems are worth their money. Their SQL is more compatible, and they are more stable