Introduction to Presto at Treasure Data


Introduction to Presto: Making SQL Scalable

Taro L. Saito leo@treasure-data.com

Treasure Data, Inc.

How do we make SQL scalable?

• Problem
  • Count the access logs of each web page:
    SELECT page, count(*) FROM weblog GROUP BY page
• A challenge
  • How do you process millions of records in a second?
  • Making SQL scalable enough to handle large data sets


Hive

• Translates SQL into MapReduce (Hadoop) programs
• MapReduce: does the same job by using many machines (distributed processing instead of a single-CPU job)

[Diagram: input data in HDFS is split, processed by map tasks, merged, reduced, and the result is written back to HDFS]

SQL to MapReduce

• Mapping SQL stages onto a MapReduce program (see the annotated query below)
  • SELECT page, count(*) FROM weblog GROUP BY page

[Diagram: TableScan(weblog) → GroupBy(hash(page)) → count(weblog of a page) → result, executed as split → map → merge → reduce over HDFS]
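A rough sketch of how each clause maps onto the MapReduce phases; the annotations are interpretive, not actual Hive output:

SELECT page, count(*)   -- reduce: sum up the partial counts for each page
FROM weblog             -- map: scan the HDFS splits of the weblog table
GROUP BY page           -- shuffle: repartition the rows by hash(page)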

HDFS is the bottleneck

• HDFS (Hadoop Distributed File System)
  • Used for storing intermediate results
  • Provides fault tolerance, but is slow

[Diagram: the same MapReduce pipeline, with intermediate results written to and read from HDFS between stages]

Presto

• Distributed query engine developed by Facebook
• Uses HTTP for data transfer
• No intermediate storage like HDFS
• No fault tolerance (but the failure rate is less than 0.2%)
• Pipelines data transfer and data processing

[Diagram: the same query (TableScan → GroupBy → count → result) executed as a pipeline, without intermediate HDFS writes]

Architecture Comparison

• Performance: Hive is slow; Presto is fast; Spark is fast; BigQuery is ultra fast (using many disks)
• Intermediate storage: Hive uses HDFS; Presto uses none; Spark uses memory/disk; BigQuery uses Colossus (?)
• Data transfer: Hive, Presto, and Spark use HTTP; BigQuery: ?
• Query execution: Hive is stage-wise MapReduce; Presto runs all stages at once (pipelining); Spark is stage-wise; BigQuery: ?
• Fault tolerance: Hive has it; Presto has none (but TD will retry the query from scratch); Spark has it, but limited; BigQuery: ?
• Multiple job support: Hive is good (can handle many jobs); Presto is limited (~5 concurrent queries per account in TD); Spark requires another resource manager (e.g. YARN, Mesos); BigQuery is limited (query queue)

Presto Usage Stats

• More than 99.8% of queries finish without any error
• About 90% of queries finish within 1 minute
• Treasure Data Presto stats:
  • Processing more than 100,000 queries / day
  • Processing 15 trillion records / day
• Facebook's stats:
  • 30,000~100,000 queries / day
  • 1 trillion records / day
• Treasure Data is the No. 1 Presto user in the world

Presto can process more than 1M rows/sec.

Presto Overview

• A distributed SQL engine developed by Facebook
  • For interactive analysis on peta-scale datasets
  • As a replacement for Hive
• Nov. 2013: open sourced on GitHub
  • Facebook now has 12 engineers working on Presto
• Code
  • In-memory query engine, written in Java
  • Based on ANSI SQL syntax
  • Isolates the query execution layer from the storage access layer
  • Connectors provide the data access methods
    • Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / MongoDB / System / TPCH connectors
    • td-presto is our connector for accessing PlazmaDB (a columnar MessagePack database)

Architectural overview

[Architecture diagram (with the Hive connector): https://prestodb.io/overview.html]

Presto Users

• Facebook
• Dropbox
• Airbnb

Interactive Analysis with TD Presto + Jupyter


• https://github.com/treasure-data/td-jupyter-notebooks/blob/master/imported/pandas-td-tutorial.ipynb

Presto Internal

Query Execution

[Diagram: a query is decomposed into stages: Stage 2 runs TableScan (FROM), Stage 1 runs Aggregation (GROUP BY), and Stage 0 produces the Output; each stage runs as tasks (Task 2.0, 2.1, 2.2, Task 1.0, 1.1, 1.2, Task 0.0) on workers (@worker#0, @worker#2, @worker#3), and each task processes one or more splits]

Logical Query Plan

select c.nationkey, count(1)
from orders o join customer c on o.custkey = c.custkey
where o.orderpriority = '1-URGENT'
group by c.nationkey

Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
  - _col1 := count
Exchange[GATHER] => nationkey:bigint, count:bigint
Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
  - count := "count"("count_15")
Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
  - count_15 := "count"("expr")
Project => [nationkey:bigint, expr:bigint]
  - expr := 1
InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
Project => [custkey:bigint]
Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
  - custkey := tpch:custkey:1
  - orderpriority := tpch:orderpriority:5
Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
  - custkey_0 := tpch:custkey:0
  - nationkey := tpch:nationkey:3

Logical Plan Optimization and Stage Assignment

The aggregation is split into Aggregate(PARTIAL) below the repartitioning exchange and Aggregate(FINAL) above it, and the optimized plan is then divided into stages:

• Stage 3: TableScan of customer, replicated to the join via Exchange[REPLICATE]
• Stage 2: TableScan of orders, Filter on orderpriority, InnerJoin, and Aggregate(PARTIAL)
• Stage 1: Aggregate(FINAL) after Exchange[REPARTITION] on nationkey
• Stage 0: Exchange[GATHER] and Output of the query results (JSON)
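To inspect plans like the one above, Presto can print them itself: EXPLAIN shows the logical plan, and EXPLAIN (TYPE DISTRIBUTED) also shows how it is split into stages (plan fragments). A minimal sketch using the same query:

-- Logical plan
EXPLAIN
SELECT c.nationkey, count(1)
FROM orders o JOIN customer c ON o.custkey = c.custkey
WHERE o.orderpriority = '1-URGENT'
GROUP BY c.nationkey;

-- Distributed plan, showing the stage (fragment) boundaries
EXPLAIN (TYPE DISTRIBUTED)
SELECT c.nationkey, count(1)
FROM orders o JOIN customer c ON o.custkey = c.custkey
WHERE o.orderpriority = '1-URGENT'
GROUP BY c.nationkey;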

TD Storage Architecture

• Real-Time Storage: incoming logs arrive as many small log files
• A log merge job (Hadoop MapReduce) compacts them into Archive Storage as 1-hour partitions, using time column-based partitioning (e.g. 2015-09-29 01:00:00, 02:00:00, 03:00:00)
• Hive and Presto run as distributed SQL query engines on top of this storage

Utilizing Time Index

• Data is stored as 1-hour partitions, partitioned by the time column
• TD_TIME_RANGE(time, '2015-09-29 02:00:00', '2015-09-29 03:00:00')
  • Hive/Presto read only the matching 1-hour partitions (partial scan)
• TD_TIME_RANGE(non_time_column, '2015-09-29 02:00:00', '2015-09-29 03:00:00')
  • Cannot use the time index, so the whole data set is scanned (full scan)
• See the query sketch below
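A minimal sketch, assuming a hypothetical weblog table with the standard time column (non_time_column stands for any other column):

-- Partial scan: the time index prunes the scan to the matching 1-hour partitions.
SELECT page, count(*)
FROM weblog
WHERE TD_TIME_RANGE(time, '2015-09-29 02:00:00', '2015-09-29 03:00:00')
GROUP BY page

-- Full scan: TD_TIME_RANGE on a non-time column cannot use the time index.
SELECT page, count(*)
FROM weblog
WHERE TD_TIME_RANGE(non_time_column, '2015-09-29 02:00:00', '2015-09-29 03:00:00')
GROUP BY page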

Queries with huge results

• SELECT col1, col2, col3, … FROM …
  • Reading the query results in JSON is a single-threaded task: slow
• INSERT INTO (table) SELECT col1, col2, … (or CREATE TABLE AS)
  • Directly creates 1-hour partitions (msgpack.gz on Amazon S3) from the query results
  • Runs in parallel: fast (see the sketch below)
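A minimal sketch of the two patterns, using a hypothetical weblog_summary result table:

-- Slow: a wide SELECT is returned as JSON through the single-threaded output stage.
SELECT col1, col2, col3 FROM weblog

-- Fast: materialize the result as a table, so 1-hour partitions are written to S3 in parallel.
CREATE TABLE weblog_summary AS
SELECT col1, col2, col3 FROM weblog

-- Or append to an existing table.
INSERT INTO weblog_summary
SELECT col1, col2, col3 FROM weblog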

Memory Consuming Operators

• DISTINCT col1, col2, … (duplicate elimination)
  • Needs to store the whole data set in a single node
• COUNT(DISTINCT col1), etc.
  • Use approx_distinct(col1) instead
• ORDER BY col1, col2, …
  • A single-node task (in Presto)
• UNION
  • Performs duplicate elimination (single node)
  • Use UNION ALL instead (see the sketch below)
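A minimal sketch of the two rewrites, assuming hypothetical weblog tables with a user_id column:

-- Exact distinct count: all values are collected on a single node.
SELECT COUNT(DISTINCT user_id) FROM weblog

-- Approximate distinct count: runs in parallel with bounded memory.
SELECT approx_distinct(user_id) FROM weblog

-- UNION deduplicates on a single node; UNION ALL skips the duplicate elimination.
SELECT user_id FROM weblog_2014
UNION ALL
SELECT user_id FROM weblog_2015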

Finding bottlenecks

• Table scan range
  • Check the TD_TIME_RANGE condition
• DISTINCT
  • Duplicate elimination of all selected columns (single node): slow and memory consuming
• Huge result output
  • The output stage (Stage 0) becomes the bottleneck
  • Use DROP TABLE IF EXISTS …, then CREATE TABLE AS SELECT … (see the sketch below)
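A minimal sketch of that pattern, reusing the hypothetical weblog_summary table from above:

-- Recreate the result table instead of downloading a huge result set.
DROP TABLE IF EXISTS weblog_summary;

CREATE TABLE weblog_summary AS
SELECT page, count(*) AS cnt
FROM weblog
WHERE TD_TIME_RANGE(time, '2015-09-29 02:00:00', '2015-09-29 03:00:00')
GROUP BY page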

Resources

• Presto Query FAQs
  • https://docs.treasuredata.com/articles/presto-query-faq
• Presto Documentation
  • https://prestodb.io/docs