SQL for Everything at CWT2014
Masahiro Nakagawa
Nov 6, 2014
Cloudera World Tokyo
SQL for Everything
Presto: Distributed SQL Query Engine
Who are you?
> Masahiro Nakagawa
  > github/twitter: @repeatedly
  > Ingress: Blue
> Treasure Data, Inc.
  > Senior Software Engineer
  > Fluentd / td-agent developer
> I love OSS :)
  > D language - Phobos committer
  > Fluentd - Main maintainer
  > MessagePack / RPC - D and Python (only RPC)
  > The organizer of Presto Source Code Reading
  > etc…
SQL on Hadoop?
SQL Players on Hadoop
> Batch (Latency: minutes - hours)
  > Hive
  > Spark SQL
> Short Batch / Low latency (Latency: seconds - minutes): Red Ocean
  > Presto
  > Impala
  > Drill
  > HAWQ
  > Actian
  > etc…
> Stream (Latency: immediate): Blue Ocean?
  > Norikra
  > StreamSQL
(In the original slide, a distinct color indicates a commercial product.)
Presto
http://prestodb.io/
Presto overview
> Open sourced by Facebook
  > https://github.com/facebook/presto
    • GitHub is the primary repository
  > written in Java
  > latest version is 0.81
> Built-in useful features
  > Connectors
  > Machine Learning
  > Window function
  > Approximate query
  > etc…
What’s Presto?
A distributed SQL query engine for interactive data analysis against GBs to PBs of data.
What problems does it solve?
> We couldn’t visualize data in HDFS directly using dashboards or BI tools
  > because Hive is too slow (not interactive)
  > or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results in an interactive DB for quick response (PostgreSQL, Redshift, etc.)
  > An interactive DB costs more and is less scalable
> Some data are not stored in HDFS
  > We need to copy the data into HDFS to analyze it
[Figure: data analysis platform, built up over three slides]
> Slide 1 labels: HDFS, Hive, Daily/Hourly Batch, PostgreSQL etc., Dashboard, Presto
> Slide 2 labels: HDFS, Hive, Daily/Hourly Batch, Presto, Interactive query, Dashboard
> Slide 3 labels: Presto, HDFS, Hive, Cassandra, MySQL, Commercial DBs, Daily/Hourly Batch, Interactive query, Dashboard
  > SQL on any data sets
  > Commercial BI Tools
    ✓ IBM Cognos
    ✓ Tableau
    ✓ ...
  > Data analysis platform
Presto’s deployment
> Facebook
  > Multiple geographical regions
  > scaled to 1,000 nodes
  > actively used by 1,000+ employees
  > processing 1PB/day
> Netflix, Dropbox, Treasure Data, Airbnb, Qubole, LINE, GREE, Scaleout, etc.
> Presto as a Service
  > Treasure Data, Qubole
PostgreSQL gateway for Presto
> A PostgreSQL protocol gateway that lets clients connect through PostgreSQL’s stable ODBC / JDBC drivers
> Developed by Sadayuki Furuhashi
https://github.com/treasure-data/prestogres
Distributed architecture
[Figure: Client, Coordinator, Connector Plugin, three Workers, Storage / Metadata, Discovery Service]
What are Connectors?
> Access to storage and metadata
  > provide table schema to coordinators
  > provide table rows to workers
> Connectors are pluggable to Presto
  > written in Java
> Implementations:
  > Hive (CDH, HDP, Community), Cassandra, MySQL, JDBC, Kafka, etc…
  > Or your own connector
    • Treasure Data has its own connector
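The two connector responsibilities listed above (schema for the coordinator, rows for the workers) can be pictured with a small sketch. This is not Presto's Java connector interface: the `Connector` protocol, `InMemoryConnector`, and both helper functions are hypothetical names invented here for illustration only.

```python
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Tuple
Schema = List[Tuple[str, str]]  # (column name, type name)


class Connector(Protocol):
    """Hypothetical connector shape: metadata access plus data access."""

    def table_schema(self, table: str) -> Schema: ...

    def table_rows(self, table: str) -> Iterator[Row]: ...


class InMemoryConnector:
    """Toy connector backed by a dict; stands in for Hive, Cassandra, etc."""

    def __init__(self, tables: Dict[str, Tuple[Schema, List[Row]]]) -> None:
        self._tables = tables

    def table_schema(self, table: str) -> Schema:
        return self._tables[table][0]

    def table_rows(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table][1])


def coordinator_columns(connector: Connector, table: str) -> List[str]:
    """Coordinator side: only the schema is needed to plan a query."""
    return [name for name, _type in connector.table_schema(table)]


def worker_scan(connector: Connector, table: str) -> List[Row]:
    """Worker side: read the actual rows."""
    return list(connector.table_rows(table))
```

Because both helpers depend only on the protocol, a second connector with the same two methods could be swapped in without touching them, which is the sense in which connectors are pluggable.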
[Figure: Multiple connectors in a query. Client, Coordinator, three Workers, and a Discovery Service (find servers in a cluster); a Hive Connector for HDFS / Metastore, a Cassandra Connector for Cassandra, and other connectors for other data sources]
Distributed architecture
> 3 types of servers:
  > Coordinator, worker, discovery service
> Get data/metadata through connector plugins.
  > Presto is NOT a database
  > Presto provides SQL to existing data stores
> Client protocol is HTTP + JSON
  > Language bindings: Ruby, Python, PHP, Java (JDBC), R, Node.js...
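To make "HTTP + JSON" a little more concrete, here is a minimal sketch of the client side. The slides do not give protocol details, so the `/v1/statement` path, the `X-Presto-*` header names, and the `nextUri` / `data` response fields are assumptions to check against the documentation for your Presto version. `fetch` is a hypothetical callable standing in for an HTTP GET that returns parsed JSON, so the sketch runs without a server.

```python
from typing import Callable, Iterator, Optional

# Assumed endpoint path; not stated in the slides.
STATEMENT_PATH = "/v1/statement"


def build_request(server: str, sql: str, user: str, catalog: str, schema: str) -> dict:
    """Describe the initial POST that submits a query (nothing is sent here)."""
    return {
        "method": "POST",
        "url": server.rstrip("/") + STATEMENT_PATH,
        "headers": {
            # Assumed header names.
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
        "body": sql.encode("utf-8"),
    }


def iter_rows(first_page: dict, fetch: Callable[[str], dict]) -> Iterator[list]:
    """Yield rows from the first JSON page and from every page linked by nextUri."""
    page: Optional[dict] = first_page
    while page is not None:
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        page = fetch(next_uri) if next_uri else None
```

In a test, `fetch` can be a dictionary lookup over canned pages; in real use it would wrap an HTTP client of your choice.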
Presto’s execution model
> Presto is NOT MapReduce
  > Uses its own execution engine
> Presto’s query plan is based on a DAG
  > more like Apache Tez / Spark or traditional MPP databases
  > Impala and Drill use a similar model
Query Planner
SQL: SELECT name, count(*) AS c FROM impressions GROUP BY name
+
Table schema: impressions (name varchar, time bigint)
Logical query plan: Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c)
Distributed query plan: Table scan → Partial aggr → Sink → Exchange → Final aggr → Sink → Exchange → Output
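The partial and final aggregation steps in this plan can be imitated in a few lines. This is a toy model of the idea, not Presto's implementation: the function names, the use of `zlib.crc32` to route groups in the exchange, and the two-way split of the final stage are all assumptions made for the example.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (name, time), matching the impressions schema


def partial_aggr(rows: Iterable[Row]) -> Counter:
    """Count rows per name within one split (table scan + partial aggregation)."""
    return Counter(name for name, _time in rows)


def exchange(partials: Iterable[Counter], n_final: int) -> List[List[Tuple[str, int]]]:
    """Route each (name, count) pair so that one name always reaches the same bucket."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(n_final)]
    for partial in partials:
        for name, count in partial.items():
            buckets[zlib.crc32(name.encode("utf-8")) % n_final].append((name, count))
    return buckets


def final_aggr(bucket: Iterable[Tuple[str, int]]) -> Counter:
    """Sum the partial counts that arrived for each name."""
    totals: Counter = Counter()
    for name, count in bucket:
        totals[name] += count
    return totals


def run_query(splits: List[List[Row]], n_final: int = 2) -> List[Tuple[str, int]]:
    """SELECT name, count(*) AS c FROM impressions GROUP BY name, over several splits."""
    partials = [partial_aggr(split) for split in splits]
    finals = [final_aggr(bucket) for bucket in exchange(partials, n_final)]
    return sorted((name, c) for totals in finals for name, c in totals.items())
```

Splitting the aggregation this way means only one small (name, count) pair per group leaves each split, instead of every scanned row.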
Query Planner - Stages
[Figure: the distributed query plan split into stages]
> Stage-2: Table scan → Partial aggr → Sink
> Stage-1: Exchange → Final aggr → Sink
> Stage-0: Exchange → Output
> Annotations: inter-worker data transfer (between stages), pipelined aggregation
Execution Planner
[Figure: the query plan + node list (✓ 2 workers) becomes tasks on Worker 1 and Worker 2: Table scan → Partial aggr → Sink, Exchange → Final aggr → Sink, and Exchange → Output]
> All stages are pipelined
  ✓ No wait time
  ✓ No fault-tolerance
MapReduce vs. Presto
[Figure: MapReduce (map and reduce phases with a disk between them) next to Presto (tasks passing data directly to each other)]
> MapReduce
  > Write data to disk
  > Wait between stages
> Presto
  > memory-to-memory data transfer
    ✓ No disk IO
    ✓ Data chunk must fit in memory
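The difference between waiting between stages and pipelining them can be shown with Python generators. This is only an analogy under simplifying assumptions: a `list(...)` call stands in for writing a stage's full output to disk, and the stage names and the log are invented for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def scan(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """First stage: emit rows one at a time."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def transform(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Second stage: consume rows as they arrive."""
    for row in rows:
        log.append(f"transform {row}")
        yield row * 2


def run_staged(rows: Iterable[int]) -> Tuple[List[int], List[str]]:
    """MapReduce-like: finish and materialize stage 1 before stage 2 starts."""
    log: List[str] = []
    stage1 = list(scan(rows, log))  # stand-in for writing to disk and waiting
    return list(transform(stage1, log)), log


def run_pipelined(rows: Iterable[int]) -> Tuple[List[int], List[str]]:
    """Presto-like: stage 2 pulls each row from stage 1 as soon as it is produced."""
    log: List[str] = []
    return list(transform(scan(rows, log), log)), log
```

Both functions return the same result; only the order of the log entries differs. The staged run logs every `scan` before any `transform`, while the pipelined run alternates between them, which also means an intermediate result is never stored in full.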
Demo
Presto Meetup
The first half of 2015
Check: treasuredata.com
Cloud service for the entire data pipeline, including Presto