Presto - Swiss army SQL knife on Hadoop
http://allegro.tech/
@AllegroTechBlog
Marek Gawiński, Dariusz Eliasz
Apache: Big Data North America 2017
Agenda
● About Allegro
● Technical history of Allegro platform
● Architecture of our data computation stack
● Leaking component in desired architecture
● What is Presto?
● Story of some Presto implementation
● Swiss knife - 3 dimensions
● Lessons learned
● Q&A
About us
https://github.com/allegro
Technical history of Allegro platform
Monolith Oriented Architecture
Service Oriented Architecture
Relational DB + Exadata
Distributed DB Engines + Message Bus + Hadoop
BI Team + DWH Team
Everybody works with data
Architecture of our data computation stack
Leaking component in desired architecture
What if not Exadata?
Wish list:
- Low entry threshold tool:
  - ANSI SQL
  - JDBC interface (DataGrip etc.)
  - Legible documentation
  - Predefined UDFs
  - Users need no special knowledge of query construction
- Project with solid community
- Works with secured cluster
What is Presto?
Source: https://prestodb.io/overview.html
“Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”
“Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.”
Who’s using Presto?
Source: https://prestodb.io/overview.html
“Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.
Leading internet companies including Netflix, Airbnb and Dropbox are using Presto.”
Presto Architecture
Source: https://prestodb.io/overview.html
Presto Connectors
● Accumulo
● Black Hole
● Cassandra
● Hive
● Hive Security
● Memory
● JMX
● Kafka
● Local File
● MongoDB
● MySQL
● PostgreSQL
● Redis
● SQL Server
● System
● TPCH
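Presto addresses every table as catalog.schema.table, where each catalog is backed by one configured connector, so a single query can join data held in different systems. The following is a minimal sketch in Python that only builds such a query as text and does not connect to Presto. The catalog names `hive` and `mysql`, as well as the schema, table and column names, are hypothetical and depend entirely on how a given cluster is configured.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a double-quoted catalog.schema.table identifier.

    Embedded double quotes are escaped by doubling them.
    """
    parts = (catalog, schema, table)
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def federated_join_sql() -> str:
    """Return SQL text joining a Hive table with a MySQL table.

    All catalog, schema, table and column names are made up for illustration.
    """
    orders = qualified_name("hive", "web", "orders")          # hypothetical
    customers = qualified_name("mysql", "crm", "customers")   # hypothetical
    return (
        "SELECT c.name, sum(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting text could be submitted through any Presto client, for example over the JDBC interface mentioned in the wish list.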
Presto Data Formats
● Text
● Avro
● SequenceFile
● RCFile
● ORC
● Parquet
Our trip to production
Secured Coordinator
LDAP + HTTPS
Swiss army SQL knife - 3 dimensions
Business dimension
Flexibility of solution
Ask fast - fail fast
Lessons learned
● Engage users right from the start of the project
● Be in touch with developers and community
● Pay attention to data formats (prices as string)
● Monitor all metrics
● Test every format used - Avro, Parquet problems
● Read the source code to learn more deeply how things work
● Test tuning options
● Teradata vs Facebook version
● Lack of CBO (cost-based optimizer, coming soon) - don't trust benchmarks
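The slides do not say what exactly went wrong with prices stored as strings. One common failure mode, shown here as an assumption rather than as the authors' actual case, is that text values compare character by character, so ordering and aggregates such as the maximum come out wrong until the column is converted to a numeric type. A minimal Python sketch with made-up prices:

```python
from decimal import Decimal

# Hypothetical price values as they might arrive from a text-typed column.
PRICES_AS_TEXT = ["9.99", "10.50", "100.00"]


def max_price_text(prices):
    """Maximum under string comparison (lexicographic, character by character)."""
    return max(prices)


def max_price_decimal(prices):
    """Maximum after converting each value to an exact decimal number."""
    return max(Decimal(p) for p in prices)


if __name__ == "__main__":
    print(max_price_text(PRICES_AS_TEXT))     # "9.99", because "9" sorts after "1"
    print(max_price_decimal(PRICES_AS_TEXT))  # 100.00
```

In SQL the analogous step is an explicit cast of the text column to a decimal type before comparing or aggregating.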
Q&A
Contact:[email protected]@allegrogroup.com
Allegro Tech blog - http://allegro.tech
Meetup - https://www.meetup.com/allegrotech
Facebook Allegro Tech - https://www.facebook.com/allegro.tech
Twitter Allegro Tech - @AllegroTechBlog