Download - for the Boston MySQL ColumnStore 1.0 Meetup Group MariaDB …files.meetup.com/212864/Boston MySQL Meetup 9-12-2016.pdf · 2016-09-13 · Real-time data streaming to OLAP/DW and Big

© 2016 MariaDB Corporation Ab

MariaDB MaxScale 2.0 and ColumnStore 1.0

for the Boston MySQL Meetup Group

Jon Day, Solution Architect - MariaDB

Tonight’s Topics:

● MariaDB MaxScale 2.0 (currently in Beta)

● MariaDB ColumnStore (currently in Alpha)

Company Overview

MariaDB Corporation

● Founded by Original MySQL team

Michael “Monty” Widenius and David Axmark

● Venture Capital - Intel Ventures

● Driving Innovation & Committed to Open Source

● Red Hat & Major Linux Distributions Standardized on MariaDB

● Increasing OEM/MariaDB Embedded Software Solutions

MariaDB Highlights

The MySQL/MariaDB Timeline

[Timeline figure. Year markers: 1995, 2007, 2009, 2013, 2014, 2015, 2016. Events shown: Sun buys MySQL AB; Oracle buys Sun; SkySQL and MariaDB merger; MariaDB 10.0 fork of MySQL 5.5; MariaDB default in Red Hat & other Linux distributions; MaxScale 1.0 GA release; MariaDB ColumnStore.]

Product Overview

MariaDB MaxScale 2.0 (Beta)

MariaDB MaxScale

Application-to-Database

Insulates client applications from the complexities of the backend database cluster

Database-to-Database

Simplifies interoperability across databases

● Secure Your Data

● Scale for Growth

● Manageability

● Ensure Availability

MariaDB MaxScale concept

▪ An Intelligent Data Gateway (IDG)

▪ Decouple applications from database deployment environment

▪ Improve availability without adding application complexity

▪ Improve data security

▪ Handle scale-out issues

▪ Add flexibility without burdening every application

▪ Enable data replication from OLTP databases to external data stores

▪ Improve database scalability

▪ Remote data disaster recovery

▪ Real-time data streaming to OLAP/DW and Big Data stores

▪ Copy data to other applications, QA databases

What is MariaDB MaxScale?

▪ A flexible data gateway for scalability, high availability, security, interoperability and migration beyond MySQL and MariaDB

▪ Highly configurable gateway platform

▪ Database Aware

▪ Pluggable Architecture

MariaDB MaxScale – Database aware

▪ Understands the database environment

▪ Is aware of the state of the database components

▪ Understands the data that flows through it

▪ Routes requests based on a combination of

▪ Defined algorithms

▪ Component state

▪ Request contents

▪ Session state

MariaDB MaxScale - Core

▪ Provides core services for

▪ Configuration

▪ Networking

▪ Scheduling

▪ Query classification

▪ Logging

▪ Buffer management

▪ Plugin loading

▪ Request flow

▪ Designed to make plugins easy to write

MariaDB MaxScale – Pluggable architecture

▪ Generic Core

▪ Flexible, easy to write plugins for

▪ Protocol support

▪ Database monitoring

▪ Query Transformation and Logging

▪ Load balancing and Routing

▪ Authentication

MariaDB MaxScale – Flow of requests

[Diagram labels: Client Application, Protocol, Filter, Filter, Router, Protocol, Monitor.]
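The request flow named on this slide (protocol, then a chain of filters, then a router) can be sketched in a few lines of Python. This is only an illustration of the idea: the `Service` class, the two example filters and the constant router are hypothetical names chosen for the sketch and are not MaxScale plugin APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A filter takes a query string and returns a (possibly rewritten) query,
# or None to reject it. These names are illustrative, not MaxScale APIs.
Filter = Callable[[str], Optional[str]]


@dataclass
class Service:
    filters: List[Filter]
    router: Callable[[str], str]  # maps a query to a server name
    log: List[str] = field(default_factory=list)

    def handle(self, query: str) -> Optional[str]:
        """Run the query through each filter in order, then route it."""
        for f in self.filters:
            query = f(query)
            if query is None:
                return None  # a filter rejected the request
        server = self.router(query)
        self.log.append(f"{server}: {query}")
        return server


def strip_trailing_semicolon(q: str) -> Optional[str]:
    # Example of a transforming filter.
    return q.rstrip().rstrip(";")


def reject_drop(q: str) -> Optional[str]:
    # Example of a blocking filter.
    return None if q.lstrip().upper().startswith("DROP") else q


service = Service(
    filters=[strip_trailing_semicolon, reject_drop],
    router=lambda q: "server1",  # hypothetical router: always the same server
)
```

Calling `service.handle("SELECT 1;")` passes the rewritten query to the router, while `service.handle("DROP TABLE t")` stops at the second filter and never reaches a server.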

MariaDB MaxScale Use Cases

MariaDB MaxScale – Classic Load Balancing

▪ Connection based routing

▪ Low overhead

▪ Balances a set of connections over a set of servers

▪ Uses monitoring feedback to identify master and slaves

▪ Connection weighting if configured

▪ Load balances queries in round robin across configured servers
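A rough sketch of the round-robin and weighting ideas above, in Python. The server names, the `status` dictionary standing in for monitor feedback, and the weight values are all assumptions made for illustration; the real router's weighting algorithm may differ.

```python
import itertools


def eligible(servers, status, role="slave"):
    """Keep only servers whose monitored role matches (monitor feedback)."""
    return [(name, weight) for name, weight in servers if status.get(name) == role]


def weighted_round_robin(servers):
    """Cycle over server names; a server with weight w appears w times per cycle."""
    expanded = [name for name, weight in servers for _ in range(weight)]
    return itertools.cycle(expanded)


# Hypothetical cluster: (name, weight) pairs plus the roles a monitor reported.
servers = [("db1", 1), ("db2", 2), ("db3", 1)]
status = {"db1": "master", "db2": "slave", "db3": "slave"}

picker = weighted_round_robin(eligible(servers, status))
assigned = [next(picker) for _ in range(6)]  # one pick per new connection
```

With these weights, `db2` receives two of every three new connections and the master receives none.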

MariaDB MaxScale – Read Load distribution

▪ Can be used for master-slave replication or master-master replication environments

▪ Two approaches possible

▪ Either connection routing, with separate connections for reads

▪ Or statement routing, which classifies each statement as a read, a write or a session modification

▪ Monitor identifies the master and slave nodes

● Simplifies applications

● Cluster configuration and node failures are transparent to applications
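The statement-routing approach can be illustrated with a small Python sketch. The keyword-based `classify` function is a deliberate simplification (a real classifier parses the statement, and something like `SELECT ... FOR UPDATE` would need to go to the master); the class and server names are hypothetical.

```python
import itertools

READ, WRITE, SESSION = "read", "write", "session"


def classify(statement: str) -> str:
    """Very rough classification by leading keyword (illustrative only)."""
    words = statement.strip().split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword in ("SET", "USE"):
        return SESSION
    if keyword in ("SELECT", "SHOW"):
        return READ
    return WRITE


class ReadWriteSplitter:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves)

    def targets(self, statement):
        """Return the list of servers that should receive the statement."""
        kind = classify(statement)
        if kind == SESSION:
            # Session changes go everywhere so all connections stay consistent.
            return [self.master] + self.slaves
        if kind == READ and self.slaves:
            return [next(self._next_slave)]
        return [self.master]


splitter = ReadWriteSplitter("master", ["slave1", "slave2"])
```

Reads rotate over the slaves, writes go to the master, and session modifications are sent to every server.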

MariaDB MaxScale – Schema Sharding

▪ Multi-tenant database hosting

▪ Each tenant with its own schema

▪ Multiple schemas per shard

▪ Schema Sharding Router

▪ All applications connect to single MaxScale

▪ MaxScale routes to the shard server based on the query from the client

▪ No impact on existing client applications when a new client or shard server is added
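As a minimal sketch of schema-based sharding, assuming each tenant schema lives on exactly one shard: the router keeps a schema-to-shard map and picks the shard from the schema named in the query. The `SchemaRouter` class, the regular expression and the tenant and shard names are hypothetical and only illustrate the lookup.

```python
import re


class SchemaRouter:
    def __init__(self, schema_to_shard):
        self.schema_to_shard = dict(schema_to_shard)

    def add_schema(self, schema, shard):
        """Register a new tenant schema; existing mappings are untouched."""
        self.schema_to_shard[schema] = shard

    def route(self, query, current_schema=None):
        """Pick the shard for a schema-qualified table, else the session's schema."""
        match = re.search(r"\b(?:FROM|INTO|UPDATE)\s+`?(\w+)`?\.", query, re.IGNORECASE)
        schema = match.group(1) if match else current_schema
        if schema not in self.schema_to_shard:
            raise KeyError(f"no shard known for schema {schema!r}")
        return self.schema_to_shard[schema]


router = SchemaRouter({
    "tenant_a": "shard1",
    "tenant_b": "shard1",  # multiple schemas can share a shard
    "tenant_c": "shard2",
})
```

Adding a tenant is a single `router.add_schema("tenant_d", "shard3")` call; clients that use the existing schemas keep connecting to the same single endpoint.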

MaxScale

Shard 1 Shard 2 Shard 3 Shard 4 Shard 5

Sharding Router

● Scale the database environment as the user base and data volume grow

● Without impacting existing user base

MaxScale

Database Firewall Filter

SELECT * FROM CUSTOMER WHERE id = 5;

SELECT * FROM CUSTOMERS;

Query failed: 1141

Error: Required WHERE/HAVING clause is missing

● rule safe_select deny no_where_clause on_queries select

● rule safe_customer_select deny regex '.*from.*customers.*'

MariaDB MaxScale – Database firewall

▪ Block queries that match a set of rules

▪ Block queries matching rules for specified users

▪ Multiple ordered rules

▪ Match on and block queries with certain patterns

▪ Date or time

▪ WHERE clause

▪ Wildcard or regular expression
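The two rules shown on the previous slide (`safe_select` and `safe_customer_select`) can be imitated in Python to show how ordered rule matching behaves. This is a sketch, not the filter's implementation: the keyword-based WHERE/HAVING check is naive, and case-insensitive regex matching is an assumption.

```python
import re


def is_select(query):
    return query.lstrip().upper().startswith("SELECT")


def no_where_clause(query):
    """True when the query has no WHERE or HAVING keyword (naive check)."""
    return re.search(r"\b(WHERE|HAVING)\b", query, re.IGNORECASE) is None


# Ordered rules: the first rule that matches blocks the query.
RULES = [
    ("safe_select", lambda q: is_select(q) and no_where_clause(q)),
    ("safe_customer_select",
     lambda q: re.match(r".*from.*customers.*", q, re.IGNORECASE) is not None),
]


def blocked_by(query):
    """Return the name of the first matching rule, or None if the query passes."""
    for name, matches in RULES:
        if matches(query):
            return name
    return None
```

With these rules, `SELECT * FROM CUSTOMER WHERE id = 5;` passes, while `SELECT * FROM CUSTOMERS;` is stopped by `safe_select` because it has no WHERE clause.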

MaxScale 2.0 New Capabilities

Data Streaming

Provide real-time transactional data to a data lake environment for machine learning or real-time analytics.

▪ Capture change data in the binary log events and replicate the events

○ from MariaDB to a Kafka producer in real time

○ from master to slave, to offload the replication load from the master

[Diagram: Master, MaxScale, Slaves, Data Warehouse. Stream labels: binary log events; Avro or JSON events; Binlog, Avro, JSON.]
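To make the "JSON events" idea concrete, here is a small Python sketch that turns captured row changes into one JSON line per change. The field names (`sequence`, `event_type`, `table`, `row`) and the sample changes are hypothetical; they are not the actual format that MaxScale emits.

```python
import json


def to_json_event(sequence, event_type, table, row):
    """Serialize one row change as a single JSON line."""
    record = {
        "sequence": sequence,      # hypothetical ordering field
        "event_type": event_type,  # e.g. "insert", "update", "delete"
        "table": table,
        "row": row,
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical change data, as it might be decoded from binary log events.
changes = [
    ("insert", "shop.orders", {"id": 1, "total": 9.5}),
    ("delete", "shop.orders", {"id": 1, "total": 9.5}),
]

lines = [
    to_json_event(i, kind, table, row)
    for i, (kind, table, row) in enumerate(changes, start=1)
]
```

Each line is self-describing, so a downstream consumer such as a Kafka producer can forward it without knowing the table schema in advance.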


Better Security

● Transport layer security with end-to-end SSL through MaxScale

● MaxAdmin security improvements to enable configurable prevention of remote access

● Connection rate limitation feature to protect against DDoS attacks


High Availability

● Minimize downtime with read mode for MariaDB/MySQL master-slave clusters

High Availability

Ensure High Availability with no single point of failure

Ensure database uptime

▪ Automatic failover

▪ No impact on read transaction when master fails

Minimize database downtime

▪ Database upgrade without impacting user experience

[Diagram: failover sequence, with steps numbered 1 to 4. Labels: Master; master_down event; Failover Script; Slaves; binlog cache. Commands shown: STOP SLAVE; Promote as master; CHANGE MASTER to new_master; START SLAVE.]
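The failover script in the diagram can be approximated by a function that builds the list of statements to run on each server once a master_down event arrives. This is a sketch under assumptions: the explicit STOP SLAVE on the remaining slaves before CHANGE MASTER, the default port, and the omission of replication coordinates are choices made here, and the host names are hypothetical.

```python
def failover_plan(new_master, other_slaves, port=3306):
    """Return (server, statement) pairs for promoting new_master."""
    # The promoted slave stops replicating and becomes the new master.
    plan = [(new_master, "STOP SLAVE")]
    for slave in other_slaves:
        # Assumption: each remaining slave is stopped, repointed, restarted.
        plan.append((slave, "STOP SLAVE"))
        plan.append(
            (slave, f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_PORT={port}")
        )
        plan.append((slave, "START SLAVE"))
    return plan


plan = failover_plan("db2", ["db3", "db4"])
```

Returning a plan rather than executing it keeps the sketch free of any database connection; a real script would send each statement to the named server in order.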

Business Source License 1.0

https://mariadb.com/products/mariadb-enterprise

Product Overview

MariaDB ColumnStore (Alpha)

MariaDB ColumnStore

SQL

Price to Performance at Scale

Data Analytics using SQL or SPARK

Unified Simplicity

Massively parallel, distributed data engine for powerful analytics on big data

• Scaling to petabytes of data

• Read performance scales linearly with data growth

• Built-in high availability at access and data layers

• Exceptional performance

• Transactional and Analytics processing under the same roof

• Encryption for data in motion, role based access and audit features

• Simplified installation, management, maintenance, and scaling

• Open-source GPL2

• Same interface as MariaDB

• Attaches to wide range of BI tools

• Real-time response to analytics queries and high speed data loading

Brief History

● Originally created by Calpont as InfiniDB

○ Calpont is no longer in business

● Released under the GPL several years ago

● Calpont was a MariaDB partner; we brought developers, managers and support staff onboard

● Older MySQL 5.1 front end

MariaDB ColumnStore Architecture

▪ User Module : Processes SQL Requests

▪ Performance Module : Multi Threaded Distributed Processing Engine

[Diagram: Clients and User Connections; User Modules (MariaDB SQL Front End, Distributed Query Engine); Performance Module 1, Performance Module 2, Performance Module 3 ... Performance Module N (Columnar Distributed Data Storage); storage on Local Disks, SAN, EBS, GlusterFS, HDFS.]

MariaDB ColumnStore 1.0

Performance

• Columnar storage, multi-threaded and massively parallel distributed execution engine

High Availability

• Built-in redundancy and high availability

Scale

• Linear scalability

Analytics

• In-database analytics with complex and cross-engine JOINs

• Windowing functions and UDFs

• Out-of-the-box BI tools connectivity

• Analytics integration with R

Ease of Use

• ANSI SQL compatible

• ACID compliant

• No indexes, no materialized views

• No manual partitioning

Data Ingestion

• High-speed parallel data load and extract

• Create Table as Select, Like -- locally, cross database joins, or over ODBC

Security

• SSL support, Audit Plugin, Authentication Plugin, Role Based Access

Deployment Options

• On premise, AWS

• Supports local disks, SAN, HDFS, GlusterFS

Client Access

• ODBC/JDBC

• MariaDB/MySQL Connectors

• BI tools

Row-Oriented vs Column-Oriented

Row-oriented: rows stored sequentially in a file

Key  Fname     Lname  State  Zip    Phone           Age  Sales
1    Bugs      Bunny  NJ     11217  (123) 938-3235  34   100
2    Yosemite  Sam    CT     95389  (234) 375-6572  52   500
3    Daffy     Duck   IA     10013  (345) 227-1810  35   200
4    Elmer     Fudd   CT     04578  (456) 882-7323  43   10
5    Witch     Hazel  CT     01970  (567) 744-0991  57   250

Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

Key: 1, 2, 3, 4, 5

Fname: Bugs, Yosemite, Daffy, Elmer, Witch

Lname: Bunny, Sam, Duck, Fudd, Hazel

State: NJ, CT, IA, CT, CT

Zip: 11217, 95389, 10013, 04578, 01970

Phone: (123) 938-3235, (234) 375-6572, (345) 227-1810, (456) 882-7323, (567) 744-0991

Age: 34, 52, 35, 43, 57

Sales: 100, 500, 200, 10, 250
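The difference between the two layouts can be shown with the slide's own sample data. In this Python sketch, lists stand in for files; the variable names are illustrative only.

```python
NAMES = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sales")

# Row-oriented layout: one tuple per row, stored one after another.
rows = [
    (1, "Bugs", "Bunny", "NJ", "11217", "(123) 938-3235", 34, 100),
    (2, "Yosemite", "Sam", "CT", "95389", "(234) 375-6572", 52, 500),
    (3, "Daffy", "Duck", "IA", "10013", "(345) 227-1810", 35, 200),
    (4, "Elmer", "Fudd", "CT", "04578", "(456) 882-7323", 43, 10),
    (5, "Witch", "Hazel", "CT", "01970", "(567) 744-0991", 57, 250),
]

# Column-oriented layout: one list per column; position i in every list
# belongs to the same row.
columns = {name: [row[i] for row in rows] for i, name in enumerate(NAMES)}

# Summing Sales from the row layout touches every full row...
total_from_rows = sum(row[NAMES.index("Sales")] for row in rows)

# ...while the column layout reads only the Sales column.
total_from_columns = sum(columns["Sales"])


def row_at(offset):
    """Rebuild a row by reading the same offset in every column."""
    return tuple(columns[name][offset] for name in NAMES)
```

Both totals agree, and `row_at(2)` reassembles the Daffy Duck row from the third entry of each column.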

Data Storage

● Vertical Partitioning by Column

○ Each column in its own column file

○ Only do I/O for columns requested

● Horizontal Partitioning by range of rows

○ Logical grouping of 8 million rows of each column file

○ In-memory mapping of extent to physical layer

[Diagram: Logical Layer: Table; Column1 ... ColumnN; Extent 1 ... Extent N (8MB to 64MB, 8 million rows each). Physical Layer: Server; DB Root; SegmentFile1 ... SegmentFileN (Extent); Blocks (8KB).]

Data Storage - Extents and PMs

[Diagram: Extents 1 to 8 spread across PM 1 and PM 2, and the same eight extents spread across PM 1, PM 2, PM 3 and PM 4.]

● Extent Map

○ In-memory metadata holding an extent’s min and max values for a column, the extent’s physical block offset, and the PM on which the extent resides
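One use that per-extent min and max values lend themselves to is skipping extents that cannot contain matching rows for a range filter. The Python sketch below illustrates that idea; the `ExtentInfo` fields mirror the metadata listed above, while the concrete values and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtentInfo:
    extent_id: int
    pm: str            # PM on which the extent resides
    block_offset: int  # physical block offset (illustrative value)
    min_value: int     # smallest column value stored in the extent
    max_value: int     # largest column value stored in the extent


def extents_to_scan(extent_map, low, high):
    """Keep only extents whose [min, max] range overlaps [low, high]."""
    return [e for e in extent_map if e.max_value >= low and e.min_value <= high]


# Hypothetical extent map for one column, 8 million values per extent.
extent_map = [
    ExtentInfo(1, "PM1", 0, 1, 8_000_000),
    ExtentInfo(2, "PM2", 0, 8_000_001, 16_000_000),
    ExtentInfo(3, "PM1", 1024, 16_000_001, 24_000_000),
    ExtentInfo(4, "PM2", 1024, 24_000_001, 32_000_000),
]

# A filter such as "col BETWEEN 10000000 AND 17000000".
hits = extents_to_scan(extent_map, 10_000_000, 17_000_000)
```

Only extents 2 and 3 overlap the requested range, so in this sketch the other two are never read, and the `pm` field says which node would do the work.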

Data Storage - Local Disks

● Each PM node stores data on its local disk

● No PM node can access the data on another PM node

● Shared Nothing

● No data redundancy

Data Storage - SAN

● Each PM node is attached to a set of volumes on the SAN, called DBRoots

● Upon failure of a PM node, another PM attaches to the failed PM’s DBRoots

● Shared nothing during running state

● No data redundancy

Data Storage - GlusterFS

● Distributed file system

● Software-based storage system

○ GlusterFS runs on every PM node

○ Creates a distributed file system with each PM node’s local disks and network interface across PM nodes

● Data redundancy across multiple nodes

● Automatic data failover

● Data availability during failover and failback

Data Storage - EBS

● Dynamic scaling to handle variable workloads

● Data layer high availability with Elastic Block Store (EBS)

Data Ingestion

● Bulk data load

○ cpimport: CSV and binary

○ LOAD DATA INFILE: CSV

● Apache Sqoop integration

○ Integration with cpimport and SQL interface

● Future release

○ Data streaming from MariaDB/MySQL database to MariaDB ColumnStore cluster

■ via Kafka

■ Avro data record

Data Ingestion - cpimport

● Fastest way to load data

○ Load data from CSV file

○ Load data from Standard Input

○ Load data from Binary Source file

● Multiple tables can be loaded in parallel by launching multiple jobs

● Read queries continue without being blocked

● Successful cpimport is auto-committed

● In case of errors, entire load is rolled back

Data Ingestion - LOAD DATA INFILE

● Traditional way of importing data into any MariaDB storage engine table

● Up to 2 times slower than cpimport for large size imports

● Whether it succeeds or fails, the operation can be rolled back

MariaDB ColumnStore on Hadoop

• Native Sqoop integration

• Runs on existing Apache Hadoop hardware

• SQL access to Apache Hadoop data

• libhdfs integration

[Diagram: MapReduce, HBase, Pig/Hive and MariaDB ColumnStore on top of the Hadoop Distributed File System. Labels: Batch Processing; High Performance analytics.]

MariaDB ColumnStore on AWS

• Automated cluster installation on AWS

• Dynamic scaling to handle variable workloads

• Data layer high availability with Elastic Block Store (EBS)

Use Case: Scaling Big Data Analytics

● An organization is generating a large amount of operational data

● Multiple terabytes of historical data

● With growth in business and in operational data

○ Analytics query performance degrades

○ Impractical to do analytics

● Put past data into MariaDB ColumnStore

● As data grows

● Perform analytics without performance degradation

● Linear Scalability with data growth

[Chart: Rows/Data Size Scope. Row counts: 1, 100, 10,000, 1,000,000, 100,000,000, 10,000,000,000, 100,000,000,000. Data sizes: 10-100GB, 100-1000GB, 1-10TB, 10-100TB ... PB. Ranges labelled MariaDB Enterprise OLTP and MariaDB ColumnStore OLAP.]

[Diagram: Business Challenge and MariaDB ColumnStore OLAP Solution, steps 1 to 3. Labels: MariaDB ColumnStore 1.0; Add new node(s).]

● Harvest new value from large historical datasets by deriving new insights

● Support growth in your business while continuing to deliver high service levels for data analytics

Sizing

● Minimum spec

○ UM: 2 GHz, 4 core, 32G RAM

○ PM: 2 GHz, 4 core, 16G RAM

● Typical server spec

○ UM: 8 core, 64G to 256G RAM

○ PM: 8 core, 64G RAM

● Data storage

○ External data volumes

■ Maximum 2 data volumes per IO channel per PM node server

■ Up to 2TB on the disk per data volume ≈ max 4TB per PM node

○ Local disk

■ Up to 2TB on the disk per PM node server

Sizing - Example

● Initial DB: 60TB uncompressed data = 6TB compressed data at 10x compression

● 2 UMs: 8 core, 128G to 256G RAM (based on workload)

● PM: 8 core, 64G RAM

● 6TB compressed = 3 data volumes (at 2TB per volume)

○ With 1 data volume per PM node: 3 PMs

● Data growth: 2TB per month; data retention: 2 years

○ Plan for 2TB x 24 = 48TB additional

○ 48TB = 4.8TB compressed ≈ 3 data volumes (at 2TB per volume)

■ With 1 data volume per PM node: 3 additional PMs

● Total: 6 PMs, 2 UMs
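The sizing arithmetic above can be reproduced with a short Python helper. The function name and parameter defaults are illustrative; the 10x compression ratio, 2TB volume size and one volume per PM node come from the example itself.

```python
import math


def pm_nodes_needed(uncompressed_tb, compression_ratio=10, volume_tb=2, volumes_per_pm=1):
    """Estimate PM nodes from uncompressed data size, rounding volumes up."""
    compressed_tb = uncompressed_tb / compression_ratio
    volumes = math.ceil(compressed_tb / volume_tb)
    return math.ceil(volumes / volumes_per_pm)


# Initial load: 60TB uncompressed -> 6TB compressed -> 3 volumes -> 3 PMs.
initial_pms = pm_nodes_needed(60)

# Growth: 2TB per month, retained for 2 years -> 48TB uncompressed
# -> 4.8TB compressed -> 3 volumes (rounded up) -> 3 more PMs.
growth_tb = 2 * 24
additional_pms = pm_nodes_needed(growth_tb)

total_pms = initial_pms + additional_pms
```

Rounding up at the volume level is what turns 4.8TB of compressed growth into three 2TB volumes rather than 2.4.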

MariaDB Solution for Big Data Analytics

[Diagram: Data Collection (Social Media; Transactional, Operational; Sensors; Biometrics; Mobile; ETL Tools), Data Processing (MariaDB MaxScale; MariaDB ColumnStore with UM and PM nodes, Node 1, Node 2, Node 3 ... Node N; Connectors, SPARK Integration etc.), Analytics Insight.]

Analytics Insight

● Descriptive Analytics: What is happening?

● Diagnostic Analytics: Why did it happen?

● Predictive Analytics: What is likely to happen?

● Prescriptive Analytics: What should I do about it?

MariaDB ColumnStore: High performance data management solution for big data analytics

Thank you!

Jon Day

Solution Architect

MariaDB Corporation
