Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data...

8

Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) Presented by Ankur Dave CS294-110, Fall 2015

Upload
trinhdiep
Category

Documents
view
239
download
1

Embed Size (px):

Transcript of Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data...

Page 1: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Spark SQL: Relational Data Processing in Spark

(SIGMOD 2015)

Presented by Ankur DaveCS294-110, Fall 2015

Page 2: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Problem

Imperative Declarative

⇒

Semi-Structured Data & Advanced Analytics

⇒

No support in existing systems

?

Page 3: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Spark SQL

1. DataFrame API

2. Catalyst Optimizer

Page 4: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

DataFrames• Language-integrated declarative API

• Schema inference for semi-structured data• Allows direct access to JVM objects• Unlike other declarative systems

• Supports complex types like vectors• Easy to add new types

⇒

Page 5: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Catalyst Optimizer• Rules written using Scala pattern matching

• Allows writing complex rules• Example: join elimination

• Exploits structure of data source• Example: predicate pushdown for Parquet• Easy to add new data sources (e.g., Succinct)

• Code generation for expression evaluation

• Cost-based join algorithm selection (shuffle vs. BitTorrent broadcast)

Page 6: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Evaluation• Gains over Spark and Shark due to code generation• Optimizer less mature than Impala’s

• More important: ease of use and extensibility

Page 7: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Why Spark SQL?Easy to

Program?Advanced Analytics?

No

No

Yes

Yes

Yes

?

Yes

?

Page 8: Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) · Spark SQL: Relational Data Processing in Spark ... SQL AST Physical Planning Code Generation ... cassandra and more

Lasting Impact?

• Extensible optimizer

• Language integration

PEMETAAN – RELATIONAL - SQL

PEMETAAN – RELATIONAL - SQL

Spark SQL: Relational Data Processing in Sparkweb.eecs.umich.edu/~mosharaf/Readings/SparkSQL.pdf · Spark SQL: Relational Data Processing in Spark Michael Armbrusty, Reynold S. Xiny,

Spark SQL: Relational Data Processing in Sparkweb.eecs.umich.edu/~mosharaf/Readings/SparkSQL.pdf · Spark SQL: Relational Data Processing in Spark Michael Armbrusty, Reynold S. Xiny,

Spark SQL: Relational Data Processing in Spark - Spark SQL...Spark SQL was released in May 2014, and is now one of the most actively developed components in Spark. As of this writing,

Spark SQL: Relational Data Processing in Spark - Spark SQL...Spark SQL was released in May 2014, and is now one of the most actively developed components in Spark. As of this writing,

Relational Databases and SQL

Relational Databases and SQL

Object-Relational SQL

Object-Relational SQL

Spark streaming , Spark SQL

Spark streaming , Spark SQL

pages Research Report - dominoweb.draco.res.ibm.com · Spark SQL [12] is a major component in Spark, and it bridges relational processing and procedural processing with Spark APIs.

pages Research Report - dominoweb.draco.res.ibm.com · Spark SQL [12] is a major component in Spark, and it bridges relational processing and procedural processing with Spark APIs.

Spark SQL | Apache Spark

Spark SQL | Apache Spark

20140908 spark sql & catalyst

20140908 spark sql & catalyst

Spark & Spark SQL

Spark & Spark SQL

z/OS Platform for Apache Spark Install and Usage (Session ... · Spark SQL • APIs for working with structured and semi-structured data • Provide for relational queries expressed

z/OS Platform for Apache Spark Install and Usage (Session ... · Spark SQL • APIs for working with structured and semi-structured data • Provide for relational queries expressed

Spark SQL: Relational Data Processing in Spark - MIT · PDF fileSpark SQL: Relational Data Processing in Spark ... Spark SQL is a new module in Apache Spark that integrates rela- ...

Spark SQL: Relational Data Processing in Spark - MIT · PDF fileSpark SQL: Relational Data Processing in Spark ... Spark SQL is a new module in Apache Spark that integrates rela- ...

Spark Concepts - Spark SQL, Graphx, Streaming

Spark Concepts - Spark SQL, Graphx, Streaming

Spark SQL: Relational Data Processing in Sparkweb.eecs.umich.edu/.../eecs582/armbrust15sparksql.pdfchine learning types, and query federation to external databases) tailored for the

Spark SQL: Relational Data Processing in Sparkweb.eecs.umich.edu/.../eecs582/armbrust15sparksql.pdfchine learning types, and query federation to external databases) tailored for the

Relational Model to SQL

Relational Model to SQL

Spark SQL: Relational Data Processing in Spark Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer.

Spark SQL: Relational Data Processing in Spark Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer.

Spark vs. PL/SQL

Spark vs. PL/SQL

CS 338The Relational Model2-1 The Relational Model Lecture Topics Overview of SQL Underlying relational model Relational database structure SQL DDL and.

CS 338The Relational Model2-1 The Relational Model Lecture Topics Overview of SQL Underlying relational model Relational database structure SQL DDL and.

Spark SQL: Relational Data Processing in Sparkweiwa/teaching/Fall16-COMP6611B/... · 2016-09-06 · Spark SQL: Relational Data Processing in Spark Michael Armbrusty, Reynold S. Xiny,

Spark SQL: Relational Data Processing in Sparkweiwa/teaching/Fall16-COMP6611B/... · 2016-09-06 · Spark SQL: Relational Data Processing in Spark Michael Armbrusty, Reynold S. Xiny,

Databases 2010 The Relational Model and SQLmis/dDB/sql-2010-1.pdf · Databases 2010 The Relational Model and SQL ... The Relational Model and SQL. 14. Advantages of The Relational

Databases 2010 The Relational Model and SQLmis/dDB/sql-2010-1.pdf · Databases 2010 The Relational Model and SQL ... The Relational Model and SQL. 14. Advantages of The Relational

Languages

Pages

Legal

Copyright © 2022 FDOCUMENTS