© 2016 MariaDB Corporation Ab
MariaDB MaxScale 2.0 and ColumnStore 1.0
for the Boston MySQL Meetup Group
Jon Day, Solution Architect - MariaDB
Tonight’s Topics:
MariaDB MaxScale 2.0 (currently in Beta)
MariaDB ColumnStore (currently in Alpha)
Company Overview
MariaDB Corporation
● Founded by Original MySQL team
Michael “Monty” Widenius and David Axmark
● Venture Capital - Intel Ventures
● Driving Innovation & Committed to Open Source
● Red Hat & Major Linux Distributions Standardized on MariaDB
● Increasing OEM/MariaDB Embedded Software Solutions
MariaDB Highlights
The MySQL/MariaDB Timeline
[Timeline figure. Years shown: 1995, 2007, 2009, 2013, 2014, 2015, 2016. Events shown: Sun Buys MySQL AB; Oracle Buys Sun; MariaDB 10.0 fork of MySQL 5.5; SkySQL and MariaDB Merger; MariaDB default in Red Hat & other Linux Distributions; MaxScale 1.0 GA Release; MariaDB ColumnStore]
Product Overview
MariaDB MaxScale 2.0 (Beta)
Application-to-Database
Insulates client applications from the complexities of the backend database cluster
Database-to-Database
Simplifies interoperability across databases
MariaDB MaxScale
Secure Your Data
Scale for Growth
Manageability
Ensure Availability
MariaDB MaxScale concept
▪ An Intelligent Data Gateway (IDG)
▪ Decouple applications from database deployment environment
▪ Improve availability without adding application complexity
▪ Improve data security
▪ Handle scale-out issues
▪ Add flexibility without burdening every application
▪ Enable data replication from OLTP databases to external data stores
▪ Improve database scalability
▪ Remote data disaster recovery
▪ Real-time data streaming to OLAP/DW and Big Data stores
▪ Copy data to other applications, QA databases
What is MariaDB MaxScale?
▪ A flexible data gateway for scalability, high availability, security,
interoperability and migration beyond MySQL and MariaDB
▪ Highly configurable gateway platform
▪ Database Aware
▪ Pluggable Architecture
MariaDB MaxScale – Database aware
▪ Understands the database environment
▪ Is aware of the state of the database components
▪ Understands the data that flows through it
▪ Routes requests based on a combination of
▪ Defined algorithms
▪ Component state
▪ Request contents
▪ Session state
MariaDB MaxScale - Core
▪ Provides core services for
▪ Configuration
▪ Networking
▪ Scheduling
▪ Query classification
▪ Logging
▪ Buffer management
▪ Plugin loading
▪ Request flow
▪ Designed to make plugins easy to write
MariaDB MaxScale – Pluggable architecture
▪ Generic Core
▪ Flexible, easy to write plugins for
▪ Protocol support
▪ Database monitoring
▪ Query Transformation and Logging
▪ Load balancing and Routing
▪ Authentication
MariaDB MaxScale – Flow of requests
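[Diagram: a request flows from the Client Application through a Protocol plugin, two Filter plugins, and a Router plugin to a backend Protocol plugin; a Monitor plugin is shown alongside the Router]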
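The request flow on this slide can be pictured as a chain of filters followed by a router. The following is only a minimal Python sketch of that idea, not MaxScale code: the filter functions, the router, and the name `server1` are all hypothetical.

```python
from typing import Callable, List, Tuple

# A filter takes a query and returns a (possibly rewritten) query.
Filter = Callable[[str], str]
# A router takes a query and returns the name of the backend to use.
Router = Callable[[str], str]


def strip_filter(query: str) -> str:
    """Hypothetical filter: trim surrounding whitespace."""
    return query.strip()


def tag_filter(query: str) -> str:
    """Hypothetical filter: prepend a comment so the query can be traced."""
    return "/* via-gateway */ " + query


def first_server_router(query: str) -> str:
    """Hypothetical router: always picks the same backend."""
    return "server1"


def handle_request(query: str, filters: List[Filter], router: Router) -> Tuple[str, str]:
    """Run the query through each filter in order, then ask the router for a backend."""
    for apply_filter in filters:
        query = apply_filter(query)
    return router(query), query


target, final_query = handle_request(
    "  SELECT 1  ", [strip_filter, tag_filter], first_server_router
)
```

Because each stage only sees a query and returns a query (or a backend name), stages can be added or reordered without touching the others, which is the point of the pluggable design.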
MariaDB MaxScale Use Cases
MariaDB MaxScale – Classic Load Balancing
▪ Connection based routing
▪ Low overhead
▪ Balances a set of connections over a set of servers
▪ Uses monitoring feedback to identify master and slaves
▪ Connection weighting if configured
▪ Load balances queries in round robin across configured servers
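As an illustration of connection weighting combined with round robin, here is a small Python sketch. It is an assumption-laden toy, not MaxScale's actual algorithm: the server names and weights are hypothetical, and a real router would also consult monitor state.

```python
import itertools
from typing import Dict, Iterator, List


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield server names in a repeating cycle; each server appears `weight` times per cycle."""
    expanded: List[str] = []
    for server, weight in weights.items():
        expanded.extend([server] * weight)
    return itertools.cycle(expanded)


# Hypothetical servers: slave1 should receive twice as many connections as slave2.
picker = weighted_round_robin({"slave1": 2, "slave2": 1})
assignments = [next(picker) for _ in range(6)]
```

Each new client connection would take the next name from `picker`; once assigned, all statements on that connection go to the same server, which is what keeps the overhead low.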
MariaDB MaxScale – Read Load distribution
▪ Can be used for master-slave replication or master-master
replication environments
▪ Two approaches possible
▪ Either connection routing, with separate connections for reads
▪ Or statement routing, which classifies each statement as a read, a write, or a session modification
▪ Monitor identifies the master and slave nodes
● Simplifies applications
● Cluster configuration and node failures are transparent to applications
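The statement-routing approach can be sketched in Python as follows. This is a deliberately naive illustration that looks only at the first keyword; the keyword lists and the routing policy are assumptions, and a real classifier has to parse the statement (for example, `SELECT ... FOR UPDATE` is not a plain read).

```python
from typing import List

READ_KEYWORDS = ("SELECT", "SHOW")      # assumed read-only statements
SESSION_KEYWORDS = ("SET", "USE")       # assumed session modifications


def classify(statement: str) -> str:
    """Return 'read', 'write', or 'session' based on the first keyword only."""
    words = statement.split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword in SESSION_KEYWORDS:
        return "session"
    if keyword in READ_KEYWORDS:
        return "read"
    return "write"  # anything unrecognized is treated as a write, the safe default


def route(statement: str, master: str, slaves: List[str]) -> List[str]:
    """Pick target servers: reads go to a slave, writes to the master, session changes to all."""
    kind = classify(statement)
    if kind == "read" and slaves:
        return [slaves[0]]
    if kind == "session":
        return [master] + slaves
    return [master]
```

Sending session modifications to every server keeps the session state consistent, so a later read on any slave sees the same settings as the master.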
MariaDB MaxScale – Schema Sharding
▪ Multi-tenant database hosting
▪ Each tenant with its own schema
▪ Multiple schemas per shard
▪ Schema Sharding Router
▪ All applications connect to single MaxScale
▪ MaxScale routes each query from the client to the matching shard server
▪ No impact on existing client applications when a new client or shard server is added
[Diagram: MaxScale with the Sharding Router in front of Shard 1, Shard 2, Shard 3, Shard 4, and Shard 5]
● Scale the database environment as the user base and data volume grow
● Without impacting the existing user base
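A schema-to-shard lookup of the kind described above might look like the Python sketch below. The tenant names, shard names, and the regular expression are hypothetical, and a real router would discover the mapping from the servers rather than hard-code it.

```python
import re
from typing import Optional

# Hypothetical mapping: several tenant schemas can live on the same shard.
SCHEMA_TO_SHARD = {
    "tenant_a": "shard1",
    "tenant_b": "shard1",
    "tenant_c": "shard2",
}

# Matches "FROM schema.table" and captures the schema name (simplified on purpose).
_QUALIFIED_TABLE = re.compile(r"\bFROM\s+(\w+)\.\w+", re.IGNORECASE)


def shard_for_query(query: str, current_schema: Optional[str] = None) -> Optional[str]:
    """Return the shard holding the schema the query refers to, or None if unknown."""
    match = _QUALIFIED_TABLE.search(query)
    schema = match.group(1) if match else current_schema
    return SCHEMA_TO_SHARD.get(schema)
```

Adding a tenant or a shard only changes the mapping; clients keep connecting to the same single endpoint.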
MaxScale Database Firewall Filter
SELECT * FROM CUSTOMER WHERE id = 5;
SELECT * FROM CUSTOMERS;
Query failed: 1141
Error: Required WHERE/HAVING clause is missing
● rule safe_select deny no_where_clause on_queries select
● rule safe_customer_select deny regex '.*from.*customers.*'
MariaDB MaxScale – Database firewall
▪ Block queries that match a set of rules
▪ Block queries matching rules for specified users
▪ Multiple ordered rules
▪ Match on and block queries with certain patterns
▪ Date or time
▪ WHERE clause
▪ Wildcard or regular expression
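To make the two example rules from the slide concrete, here is a rough Python imitation of what they check. It is not the firewall filter's implementation: the function names are hypothetical, the matching is much cruder than a real SQL parser, and the second error message is invented for illustration.

```python
import re
from typing import Optional


def lacks_where_clause(query: str) -> bool:
    """Mimics 'deny no_where_clause on_queries select': a SELECT without WHERE/HAVING."""
    lowered = query.lower().lstrip()
    return lowered.startswith("select") and not re.search(r"\b(where|having)\b", lowered)


def matches_customers_pattern(query: str) -> bool:
    """Mimics "deny regex '.*from.*customers.*'" (case-insensitive here by assumption)."""
    return re.search(r".*from.*customers.*", query, re.IGNORECASE) is not None


def check(query: str) -> Optional[str]:
    """Apply the rules in order; return an error message if blocked, else None."""
    if lacks_where_clause(query):
        return "Required WHERE/HAVING clause is missing"
    if matches_customers_pattern(query):
        return "Query matches a denied pattern"  # hypothetical message
    return None
```

With these rules, the slide's first query (`SELECT * FROM CUSTOMER WHERE id = 5;`) passes, while `SELECT * FROM CUSTOMERS;` is rejected by the first rule before the second is even consulted, which is why rule order matters.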
MaxScale 2.0 New Capabilities
Data Streaming
[Diagram: the Master sends binary log events to MaxScale, which delivers Binlog, Avro, or JSON events to Slaves and to a Data Warehouse]
Provide real-time transactional data to a data lake environment for machine learning or real-time analytics.
▪ Capture change data in the binary log events and replicate the events
○ from MariaDB to a Kafka producer in real time
○ from master to slave, to offload the replication load from the master
MaxScale 2.0 New Capabilities
Better Security
● Transport layer security with end-to-end SSL through MaxScale
● MaxAdmin security improvements to enable configurable
prevention of remote access
● Connection rate limitation feature to protect against DDoS attacks
MaxScale 2.0 New Capabilities
High Availability
● Minimize downtime with read mode for
MariaDB/MySQL master-slave clusters
High Availability
Ensure High Availability with no single point of failure
Ensure database uptime
▪ Automatic failover
▪ No impact on read transactions when the master fails
Minimize database downtime
▪ Database upgrade without impacting user experience
[Diagram: numbered failover sequence (steps 1 to 4). Labels: Master; master_down event; Failover Script; Slaves; STOP SLAVE; Promote as master; CHANGE MASTER to new_master; START SLAVE; binlog cache]
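The statements named in the failover diagram can be arranged into a per-server plan. The Python sketch below is only one plausible reading of that diagram: the function name and host names are hypothetical, nothing is executed against a database, and a production script would also need to choose the most up-to-date slave and set replication coordinates.

```python
from typing import Dict, List


def failover_plan(slaves: List[str], new_master: str) -> Dict[str, List[str]]:
    """Build the SQL each slave would run after a master_down event (nothing is executed)."""
    if new_master not in slaves:
        raise ValueError("new_master must be one of the current slaves")
    plan: Dict[str, List[str]] = {}
    for slave in slaves:
        statements = ["STOP SLAVE"]
        if slave != new_master:
            # Remaining slaves are re-pointed at the promoted server.
            statements.append(f"CHANGE MASTER TO MASTER_HOST='{new_master}'")
            statements.append("START SLAVE")
        plan[slave] = statements
    return plan


plan = failover_plan(["db2", "db3"], new_master="db2")
```

Here `db2` only stops replicating, since it becomes the master, while `db3` stops, re-points to `db2`, and starts replicating again.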
Product Overview
MariaDB ColumnStore (Alpha)
MariaDB ColumnStore
SQL
Price to Performance at Scale
Data Analytics using SQL or SPARK
Unified Simplicity
Massively parallel, distributed data engine for powerful analytics on big data
• Scaling to petabytes of data
• Read performance scales linearly with data growth
• Built-in high availability at access and data layers
• Exceptional performance
• Transactional and Analytics processing under the same roof
• Encryption for data in motion, role based access and audit features
• Simplified installation, management, maintenance, and scaling
• Open-source GPL2
• Same interface as MariaDB
• Attaches to wide range of BI tools
• Real-time response to analytics queries and high speed data loading
Brief History
● Created by Calpont originally as InfiniDB
○ no longer in business
● Released under the GPL license several years ago
● Calpont was a MariaDB partner; we brought its developers, managers, and support staff on board
● Older MySQL 5.1 front end
MariaDB ColumnStore Architecture
▪ User Module : Processes SQL Requests
▪ Performance Module : Multi Threaded Distributed Processing Engine
[Diagram: Clients open User Connections to the User Modules (MariaDB SQL Front End, Distributed Query Engine), which distribute work to Performance Module 1 through Performance Module N (Columnar Distributed Data Storage) on Local Disks, SAN, EBS, GlusterFS, or HDFS]
MariaDB ColumnStore 1.0
Performance
• Columnar storage, multi-threaded and massively parallel distributed execution engine
High Availability
• Built-in redundancy and high availability
Scale
• Linear scalability
Analytics
• In-database analytics with complex and cross-engine JOINs
• Windowing functions and UDFs
• Out-of-the-box BI tools connectivity
• Analytics integration with R
Ease of Use
• ANSI SQL compatible
• ACID compliant
• No indexes, no materialized views
• No manual partitioning
Data Ingestion
• High-speed parallel data load and extract
• Create Table as Select, Like: locally, cross-database joins, or over ODBC
Security
• SSL support, Audit Plugin, Authentication Plugin, Role Based Access
Deployment Options
• On premise, AWS
• Supports local disks, SAN, HDFS, GlusterFS
Client Access
• ODBC/JDBC
• MariaDB/MySQL Connectors
• BI tools
Row-Oriented vs Column-Oriented
Row-oriented: rows stored sequentially in a file

Key  Fname     Lname  State  Zip    Phone           Age  Sales
1    Bugs      Bunny  NJ     11217  (123) 938-3235  34   100
2    Yosemite  Sam    CT     95389  (234) 375-6572  52   500
3    Daffy     Duck   IA     10013  (345) 227-1810  35   200
4    Elmer     Fudd   CT     04578  (456) 882-7323  43   10
5    Witch     Hazel  CT     01970  (567) 744-0991  57   250

Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

Key:   1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NJ, CT, IA, CT, CT
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (123) 938-3235, (234) 375-6572, (345) 227-1810, (456) 882-7323, (567) 744-0991
Age:   34, 52, 35, 43, 57
Sales: 100, 500, 200, 10, 250
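The difference between the two layouts can be shown with the slide's own sample data. This Python sketch uses in-memory lists in place of files, which is an assumption made purely for illustration.

```python
# Row-oriented: each record is stored together.
COLUMN_NAMES = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sales")
rows = [
    (1, "Bugs", "Bunny", "NJ", "11217", "(123) 938-3235", 34, 100),
    (2, "Yosemite", "Sam", "CT", "95389", "(234) 375-6572", 52, 500),
    (3, "Daffy", "Duck", "IA", "10013", "(345) 227-1810", 35, 200),
    (4, "Elmer", "Fudd", "CT", "04578", "(456) 882-7323", 43, 10),
    (5, "Witch", "Hazel", "CT", "01970", "(567) 744-0991", 57, 250),
]

# Column-oriented: one list per column; index i in every list belongs to the same row.
columns = {
    name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)
}

# An aggregate over one column touches only that column's data.
total_sales = sum(columns["Sales"])

# A filtered aggregate touches two columns, matched up by offset.
ct_sales = sum(
    sales for state, sales in zip(columns["State"], columns["Sales"]) if state == "CT"
)
```

Computing `total_sales` reads 5 values in the columnar layout, whereas a row store would read all 40 fields to answer the same question; that gap is what makes columnar storage attractive for analytics.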
Data Storage
● Vertical Partitioning by Column
○ Each column in its own column file
○ Only do I/O for columns requested
● Horizontal Partitioning by range of rows
○ Logical grouping of 8 million rows of each column file
○ In-memory mapping of extent to physical layer
[Diagram: Logical Layer: Table, Column 1 ... Column N, Extent 1 ... Extent N (8 MB to 64 MB, 8 million rows each). Physical Layer: Server, DB Root, Segment File 1 ... Segment File N (extents), Blocks (8 KB)]
Data Storage - Extents and PMs
[Diagram: Extents 1 to 8 spread across PM 1 and PM 2, and the same eight extents spread across PM 1, PM 2, PM 3, and PM 4]
● Extent Map
○ In-memory metadata holding an extent’s min and max value for a column, the extent’s physical block offset, and the PM on which the extent resides
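One use of per-extent min and max values is to skip extents that cannot contain matching rows. The slide does not spell this out, so treat the Python sketch below as an assumption about how such metadata can be used; the class, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtentEntry:
    """Hypothetical extent map entry for one column."""
    extent_id: int
    pm: str          # PM node on which the extent resides
    min_value: int   # smallest column value stored in the extent
    max_value: int   # largest column value stored in the extent


def extents_to_scan(extent_map: List[ExtentEntry], low: int, high: int) -> List[ExtentEntry]:
    """Keep only extents whose [min, max] range overlaps the query range [low, high]."""
    return [e for e in extent_map if e.max_value >= low and e.min_value <= high]


extent_map = [
    ExtentEntry(1, "PM1", 1, 100),
    ExtentEntry(2, "PM2", 101, 200),
    ExtentEntry(3, "PM1", 201, 300),
    ExtentEntry(4, "PM2", 301, 400),
]

# A predicate such as "WHERE col BETWEEN 150 AND 250" only needs extents 2 and 3.
hits = extents_to_scan(extent_map, 150, 250)
```

Since the map lives in memory, this pruning costs no disk I/O, and the `pm` field tells the engine which node should do the scan.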
Data Storage - Local Disks
● Each PM node stores data on its local disk
● No PM node can access the data on another PM node
● Shared Nothing
● No data redundancy
Data Storage - SAN
● Each PM node is attached to a set of volumes on SAN -
called DBRoots
● Upon failure of PM node, another PM attaches to the
failed PM’s DBRoots
● Shared nothing during running state
● No data redundancy
Data Storage - GlusterFS
● Distributed file system
● Software-based storage system
○ GlusterFS runs on every PM node
○ Creates a distributed file system from each PM node’s local disks and the network interface across PM nodes
● Data redundancy across multiple nodes
● Automatic data failover
● Data availability during failover and failback
Data Storage - EBS
● Dynamic scaling to handle variable workloads
● Data layer high availability with Elastic Block Store
(EBS)
Data Ingestion
● Bulk data load
○ cpimport: CSV and binary
○ LOAD DATA INFILE: CSV
● Apache Sqoop integration
○ Integration with cpimport and the SQL interface
● Future release
○ Data streaming from a MariaDB/MySQL database to a MariaDB ColumnStore cluster
■ via Kafka
■ Avro data record
Data Ingestion - cpimport
● Fastest way to load data
○ Load data from CSV file
○ Load data from Standard Input
○ Load data from Binary Source file
● Multiple tables can be loaded in parallel by launching multiple jobs
● Read queries continue without being blocked
● Successful cpimport is auto-committed
● In case of errors, entire load is rolled back
Data Ingestion - LOAD DATA INFILE
● Traditional way of importing data into any MariaDB storage engine table
● Up to 2 times slower than cpimport for large imports
● The operation can be rolled back whether it succeeds or fails
MariaDB ColumnStore on Hadoop
• Native Sqoop integration
• Runs on existing Apache Hadoop hardware
• SQL access to Apache Hadoop data
• libhdfs integration
[Diagram: MapReduce, Pig/Hive, HBase, and MariaDB ColumnStore on top of the Hadoop Distributed File System. Labels: Batch Processing; High Performance Analytics]
MariaDB ColumnStore on AWS
• Automated cluster installation on AWS
• Dynamic scaling to handle variable workloads
• Data layer high availability with Elastic Block Store (EBS)
Use Case: Scaling Big Data Analytics
● An organization is generating a large amount of operational data
● Multiple terabytes of historical data
● With growth in business and in operational data
○ Analytics query performance degrades
○ Impractical to do analytics
● Put past data into MariaDB ColumnStore
● As data grows
● Perform analytics without performance degradation
● Linear Scalability with data growth
[Chart: Rows/DataSize Scope. Row counts: 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000; 100,000,000,000. Data sizes: 10-100 GB; 100-1000 GB; 1-10 TB; 10-100 TB; ... PB. MariaDB Enterprise: OLTP. MariaDB ColumnStore: OLAP]
[Diagram: Business Challenge and MariaDB ColumnStore OLAP Solution, steps 1 to 3. Labels: MariaDB ColumnStore 1.0; Add new node(s)]
● Harvest new value from large historical datasets by deriving new insights
● Support growth in your business while continuing to deliver high service levels for data analytics
Sizing
● Minimum spec
○ UM: 2 GHz, 4 cores, 32 GB RAM
○ PM: 2 GHz, 4 cores, 16 GB RAM
● Typical server spec
○ UM: 8 cores, 64 GB to 256 GB RAM
○ PM: 8 cores, 64 GB RAM
● Data storage
○ External data volumes
■ Maximum 2 data volumes per IO channel per PM node server
■ Up to 2 TB on disk per data volume ≈ max 4 TB per PM node
○ Local disk
■ Up to 2 TB on disk per PM node server
Sizing - Example
● Initial DB: 60 TB uncompressed data = 6 TB compressed data at 10x compression
● 2 UMs: 8 cores, 128 GB to 256 GB RAM (based on workload)
● PM: 8 cores, 64 GB RAM
● 6 TB compressed = 3 data volumes (at 2 TB per volume)
○ With 1 data volume per PM node: 3 PMs
● Data growth: 2 TB per month; data retention: 2 years
○ Plan for 2 TB × 24 = 48 TB additional
○ 48 TB = 4.8 TB compressed ≈ 3 data volumes (at 2 TB per volume)
■ With 1 data volume per PM node: 3 additional PMs
● Total: 6 PMs, 2 UMs
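The arithmetic in this sizing example can be reproduced with a few lines of Python. The function name and its default parameters are hypothetical; they simply encode the slide's assumptions of 10x compression, 2 TB per data volume, and one data volume per PM node.

```python
import math


def pm_nodes_needed(
    uncompressed_tb: float,
    compression_ratio: float = 10.0,   # slide assumes 10x compression
    volume_tb: float = 2.0,            # up to 2 TB per data volume
    volumes_per_pm: int = 1,           # example uses 1 data volume per PM node
) -> int:
    """Return the number of PM nodes needed to hold the given uncompressed data size."""
    compressed_tb = uncompressed_tb / compression_ratio
    volumes = math.ceil(compressed_tb / volume_tb)   # partial volumes round up
    return math.ceil(volumes / volumes_per_pm)


initial_pms = pm_nodes_needed(60)        # initial 60 TB database
growth_pms = pm_nodes_needed(2 * 24)     # 2 TB per month retained for 24 months
total_pms = initial_pms + growth_pms
```

The growth figure rounds 4.8 TB of compressed data up to three 2 TB volumes, which is why the example arrives at 3 additional PMs rather than 2.4.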
MariaDB Solution for Big Data Analytics
High performance data management solution for big data analytics
[Diagram in three stages]
Data Collection: Social Media; Transactional, Operational; Sensors; Biometrics; Mobile; ETL Tools
Data Processing: MariaDB MaxScale; MariaDB ColumnStore (UM, UM; PM, PM, PM, PM; Node 1, Node 2, Node 3 ... Node N); Connectors, SPARK Integration, etc.
Analytics Insight:
Descriptive Analytics: What is happening?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What is likely to happen?
Prescriptive Analytics: What should I do about it?