Even Faster: When Presto Meets Parquet @ Uber
![Page 1: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/1.jpg)
Zhenxiao Luo
Software Engineer @ Uber
Even Faster:
When Presto Meets Parquet
@ Uber
![Page 2: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/2.jpg)
Agenda
● Mission
● Uber Business Highlights
● Analytics Infrastructure @ Uber
● Presto: Interactive SQL engine for Big Data
● Parquet: Columnar Storage for Big Data
● Parquet Optimizations for Presto
● Ongoing Work
![Page 3: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/3.jpg)
Uber Mission
Transportation as reliable as running water, everywhere, for everyone
![Page 4: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/4.jpg)
Uber Stats
● 6 Continents
● 73 Countries
● 450 Cities
● 12,000 Employees
● 10+ Million Avg. Trips/Day
● 40+ Million MAU Riders
● 1.5+ Million MAU Drivers
![Page 5: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/5.jpg)
Analytics Infrastructure @ Uber
(Architecture diagram. Labels recovered from the figure, in order of appearance:)
● Kafka, Schemaless, MySQL, Postgres, Vertica, Streamio
● Raw Data, Raw Tables, Sqoop, Reports
● Hadoop, Hive, Presto, Spark
● Notebook, Ad Hoc Queries, Real Time Applications, Machine Learning Jobs, Business Intelligence Jobs
● Cluster Management, All-Active, Observability, Security
● Vertica, Samza, Pinot, Flink, MemSQL
● Modeled Tables, Streaming, Warehouse, Real-time
![Page 6: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/6.jpg)
Parquet @ Uber
Raw Tables
● No preprocessing
● Highly nested
● ~30 minutes ingestion latency
● Huge tables
Modeled Tables
● Preprocessing via Hive ETL
● Flattened
● ~12 hours ingestion latency
![Page 7: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/7.jpg)
Scale of Presto @ Uber
● 2 clusters
○ Application cluster
■ Hundreds of machines
■ 100K queries per day
■ P90: 30s
○ Ad hoc cluster
■ Hundreds of machines
■ 20K queries per day
■ P90: 60s
● Access to both raw and model tables
○ 5 petabytes of data
● Total 120K+ queries per day
![Page 8: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/8.jpg)
Applications of Presto @ Uber
● Marketplace pricing
○ Real-time driver incentives
● Communication platform
○ Driver quality and action platform
○ Rider/driver cohorting
○ Ops, comms, & marketing
● Growth marketing
○ BI dashboard for growth marketing
● Data science
○ Exploratory analytics using notebooks
● Data quality
○ Freshness and quality check
● Ad hoc queries
![Page 9: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/9.jpg)
What is Presto: Interactive SQL Engine for Big Data
Interactive query speeds
Horizontally scalable
ANSI SQL
Battle-tested by Facebook, Uber, & Netflix
Completely open source
Access to petabytes of data in the Hadoop data lake
![Page 10: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/10.jpg)
How Presto Works
![Page 11: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/11.jpg)
Why Presto is Fast
● Data in memory during execution
● Pipelining and streaming
● Columnar storage & execution
● Bytecode generation
○ Inline virtual function calls
○ Inline constants
○ Rewrite inner loops
○ Rewrite type-specific branches
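The bytecode generation bullets can be illustrated with a rough analogy. Presto generates JVM bytecode; the sketch below instead generates Python source, purely to show the idea of specializing a filter for one query by inlining its constant and operator. The function names (`generic_filter`, `compile_filter`) and the row layout are assumptions made for this example, not Presto APIs.

```python
def generic_filter(rows, column, op, constant):
    """Interpreted path: looks up the operator and calls it for every row."""
    ops = {"==": lambda a, b: a == b, "<": lambda a, b: a < b}
    compare = ops[op]
    return [r for r in rows if compare(r[column], constant)]


def compile_filter(column, op, constant):
    """Generated path: builds a function with the column, operator, and
    constant written directly into its body, so the inner loop has no
    operator lookup and no extra call per row."""
    if op not in ("==", "<"):
        raise ValueError(f"unsupported operator: {op}")
    source = (
        "def specialized(rows):\n"
        f"    return [r for r in rows if r[{column!r}] {op} {constant!r}]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["specialized"]


rows = [{"city_id": 12}, {"city_id": 7}, {"city_id": 12}]
city_is_12 = compile_filter("city_id", "==", 12)
print(city_is_12(rows) == generic_filter(rows, "city_id", "==", 12))
```

Both paths return the same rows; the generated one simply does less work per row, which is the effect the "inline constants" and "rewrite inner loops" bullets describe.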
![Page 12: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/12.jpg)
Resource Management
● Presto has its own resource manager
○ Not on YARN
○ Not on Mesos
● CPU Management
○ Priority queues
○ Short-running queries get higher priority
● Memory Management
○ Max memory per query per node
○ If query exceeds max memory limit, query fails
○ No OutOfMemory in Presto process
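The memory management bullets describe a policy: each query has a per-node budget, and exceeding it fails that query instead of exhausting the process. A minimal sketch of that policy follows, assuming a hypothetical `QueryMemoryPool` class; it does not reflect Presto's actual classes or configuration names.

```python
class QueryMemoryLimitExceeded(Exception):
    """Raised when one query asks for more memory than its per-node budget."""


class QueryMemoryPool:
    """Tracks bytes reserved by a single query on a single node (hypothetical)."""

    def __init__(self, max_bytes_per_node):
        self.max_bytes = max_bytes_per_node
        self.reserved = 0

    def reserve(self, nbytes):
        # Check the budget before allocating, so the failure is a clean
        # query error rather than an out-of-memory crash of the process.
        if self.reserved + nbytes > self.max_bytes:
            raise QueryMemoryLimitExceeded(
                f"query needs {self.reserved + nbytes} bytes, "
                f"limit is {self.max_bytes}"
            )
        self.reserved += nbytes

    def free(self, nbytes):
        self.reserved = max(0, self.reserved - nbytes)


def run_query(pool, allocation_sizes):
    """Return 'FINISHED' if every reservation fits the budget, else 'FAILED'."""
    try:
        for size in allocation_sizes:
            pool.reserve(size)
        return "FINISHED"
    except QueryMemoryLimitExceeded:
        return "FAILED"
    finally:
        # Release whatever the query held, whether it finished or failed.
        pool.free(pool.reserved)


print(run_query(QueryMemoryPool(100), [40, 40]))
print(run_query(QueryMemoryPool(100), [40, 40, 40]))
```

The point of the design is that only the offending query fails; other queries on the same node keep running.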
![Page 13: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/13.jpg)
Limitations
● No fault tolerance
● Joins do not fit in memory
○ Query fails
○ No OutOfMemory in Presto process
○ Try it on Hive
● Coordinator is a single point of failure
![Page 14: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/14.jpg)
Presto Connectors
![Page 15: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/15.jpg)
Parquet: Columnar Storage for Big Data
![Page 16: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/16.jpg)
Parquet Optimizations for Presto
Example Query:
SELECT base.driver_uuid
FROM hdrone.mezzanine_trips
WHERE datestr = '2017-03-02' AND base.city_id in (12)
Data:
● Up to 15 levels of Nesting
● Up to 80 fields inside each Struct
● Fields are added/deleted/updated inside Struct
![Page 17: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/17.jpg)
Old Parquet Reader
![Page 18: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/18.jpg)
Nested Column Pruning
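The slide content is an image, so the following is only a sketch of the general idea of nested column pruning. It assumes a toy file layout in which each leaf of the nested schema is stored as its own column chunk; the field names other than `base.driver_uuid` and `base.city_id` are invented for illustration.

```python
# Toy "file": one stored column chunk per leaf field of the nested schema.
FILE = {
    "base.driver_uuid": ["d1", "d2", "d3"],
    "base.city_id": [12, 7, 12],
    "base.fare.amount": [9.5, 3.0, 7.25],   # hypothetical field
    "base.vehicle.make": ["a", "b", "c"],   # hypothetical field
}


def read_columns(file, wanted_paths, io_log):
    """Read only the requested leaf columns; io_log records chunks touched."""
    result = {}
    for path in wanted_paths:
        io_log.append(path)
        result[path] = file[path]
    return result


def read_whole_struct(file, struct_name, io_log):
    """Unpruned read: fetch every leaf under the struct, used or not."""
    prefix = struct_name + "."
    leaves = [p for p in file if p.startswith(prefix)]
    return read_columns(file, leaves, io_log)


full_log, pruned_log = [], []
read_whole_struct(FILE, "base", full_log)
read_columns(FILE, ["base.driver_uuid", "base.city_id"], pruned_log)
print(len(full_log), len(pruned_log))
```

With structs of up to 80 fields and 15 levels of nesting, reading only the two leaves the example query references touches far fewer chunks than reading all of `base`.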
![Page 19: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/19.jpg)
Columnar Reads
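As a hedged illustration of what a columnar read saves, the sketch below contrasts two ways of producing per-column blocks for an engine: assembling full rows and transposing them back, versus handing over stored columns directly. Both function names are assumptions for this example and the data is made up.

```python
def rows_then_transpose(column_chunks):
    """Row-oriented path: build every record, then split into columns again."""
    names = list(column_chunks)
    # Step 1: assemble full records, one at a time.
    rows = [dict(zip(names, values)) for values in zip(*column_chunks.values())]
    # Step 2: convert the records back into per-column blocks for the engine.
    return {name: [row[name] for row in rows] for name in names}


def direct_columnar(column_chunks):
    """Columnar path: each stored column becomes a block with no row assembly."""
    return {name: list(values) for name, values in column_chunks.items()}


chunks = {"driver_uuid": ["d1", "d2"], "city_id": [12, 7]}
print(rows_then_transpose(chunks) == direct_columnar(chunks))
```

The two paths produce identical blocks; the columnar path skips the intermediate row objects entirely.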
![Page 20: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/20.jpg)
Predicate Pushdown
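Parquet stores min/max statistics per row group, which lets a reader skip row groups that cannot satisfy a predicate. The sketch below shows that technique for the `base.city_id in (12)` filter from the example query; the `RowGroup` class and its field names are assumptions for this example, not a real reader API.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group: min/max statistics plus the stored values."""
    min_city_id: int
    max_city_id: int
    city_ids: list


def may_contain(group, wanted):
    """True if any wanted value falls inside the group's [min, max] range."""
    return any(group.min_city_id <= v <= group.max_city_id for v in wanted)


def scan(row_groups, wanted):
    """Return matching values and how many row groups were actually read."""
    matches, groups_read = [], 0
    for group in row_groups:
        if not may_contain(group, wanted):
            continue  # skipped using statistics alone, no data decoded
        groups_read += 1
        matches.extend(v for v in group.city_ids if v in wanted)
    return matches, groups_read


groups = [
    RowGroup(1, 5, [1, 3, 5]),
    RowGroup(10, 14, [10, 12, 14]),
    RowGroup(20, 30, [20, 30]),
]
print(scan(groups, {12}))
```

Statistics can only rule a row group out; a group whose range covers the value still has to be read and filtered.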
![Page 21: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/21.jpg)
Dictionary Pushdown
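Min/max statistics cannot exclude a value that lies inside the range but never occurs. For dictionary-encoded columns, a reader can check the dictionary itself, which lists every distinct value in the chunk. A minimal sketch of that check follows, assuming a toy representation of a chunk as a dictionary list plus index list; the function names are hypothetical.

```python
def dictionary_may_match(dictionary, wanted):
    """True if at least one wanted value appears in the chunk's dictionary."""
    return not wanted.isdisjoint(dictionary)


def decode(dictionary, indices):
    """Expand dictionary-encoded indices back into values."""
    return [dictionary[i] for i in indices]


def scan_chunk(dictionary, indices, wanted):
    """Return (matching values, whether the data pages were decoded)."""
    if not dictionary_may_match(dictionary, wanted):
        return [], False  # skipped after reading only the dictionary
    values = decode(dictionary, indices)
    return [v for v in values if v in wanted], True


# City 12 lies between 10 and 14, so min/max alone could not skip this chunk,
# but the dictionary shows that 12 never occurs in it.
print(scan_chunk([10, 14], [0, 1, 1, 0], {12}))
print(scan_chunk([10, 12], [0, 1, 1], {12}))
```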
![Page 22: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/22.jpg)
Lazy Reads
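One way to picture lazy reads: evaluate the predicate column first, and load the projected column only if some row matched. The sketch below applies this to the example query, where `base.city_id` is the filter column and `base.driver_uuid` is the projected column. `LazyColumn` and `select_driver_uuids` are hypothetical names for this example.

```python
class LazyColumn:
    """Defers loading a column until its values are first requested."""

    def __init__(self, loader):
        self._loader = loader
        self._values = None
        self.loaded = False

    def values(self):
        if not self.loaded:
            self._values = self._loader()  # the expensive read happens here
            self.loaded = True
        return self._values


def select_driver_uuids(city_ids, driver_uuid_column, wanted_city_ids):
    """Filter on city_id, touching driver_uuid only when a row matched."""
    hits = [i for i, city in enumerate(city_ids) if city in wanted_city_ids]
    if not hits:
        return []  # the projected column is never loaded
    uuids = driver_uuid_column.values()
    return [uuids[i] for i in hits]


skipped = LazyColumn(lambda: ["d1", "d2"])
print(select_driver_uuids([7, 8], skipped, {12}), skipped.loaded)

needed = LazyColumn(lambda: ["d1", "d2"])
print(select_driver_uuids([12, 7], needed, {12}), needed.loaded)
```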
![Page 23: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/23.jpg)
Benchmarking Results
![Page 24: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/24.jpg)
Ongoing Work
● Multi-tenancy support
● High availability for coordinator
● Geospatial optimization
● Authentication & authorization
![Page 25: Even Faster: When Presto Meets Parquet @ Uber · Uber Stats 6 Continents 73 Countries 450 Cities 12,000 Employees 10+ Million Avg. Trips/Day 40+ Million MAU Riders 1.5+ Million MAU](https://reader033.fdocuments.net/reader033/viewer/2022060405/5f0f480b7e708231d4436210/html5/thumbnails/25.jpg)
Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the
use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise
exempt from disclosure under applicable law. All recipients of this document are notified that the information contained
herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any
way disclose this document or any of the enclosed information to any person other than employees of addressee to the
extent necessary for consultations with authorized personnel of Uber.
We are Hiring: https://www.uber.com/careers/list/27366/
Send resumes to: [email protected] or [email protected]
Interested in learning more about Uber Eng? Eng.uber.com
Follow us on Twitter: @UberEng