Impala: A Modern, Open-Source SQL Engine for Hadoop

Transcript of Impala: A Modern, Open-Source SQL Engine for Hadoop

Page 23: Impala: A Modern, Open-Source SQL Engine for Hadoop

[Architecture diagram. Labels: SQL App, ODBC, SQL request; three nodes, each with a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase; Hive Metastore, HDFS NN, Statestore & Catalog.]

Page 24: Impala: A Modern, Open-Source SQL Engine for Hadoop

[Architecture diagram. Labels: SQL App, ODBC; three nodes, each with a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase; Hive Metastore, HDFS NN, Statestore & Catalog.]

Planner turns request into collections of plan fragments

Coordinator initiates execution on remote nodes
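
The division of labor on this slide (the planner produces plan fragments, the coordinator starts them on remote nodes) can be sketched in a few lines of C++. This is a toy model and not Impala's actual classes: `PlanFragment`, `PlanQuery`, and `Dispatch` are hypothetical names, and the planning rule used here (one scan fragment per data node plus one merge fragment on the coordinator) is an assumed simplification.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical unit of work; not Impala's real plan representation.
struct PlanFragment {
  std::string op;    // e.g. "SCAN" or "MERGE"
  std::string node;  // host that should execute the fragment
};

// Toy planner: one SCAN fragment per node holding data, plus one MERGE
// fragment on the coordinator that combines the partial results.
std::vector<PlanFragment> PlanQuery(const std::vector<std::string>& data_nodes,
                                    const std::string& coordinator) {
  assert(!coordinator.empty());
  std::vector<PlanFragment> fragments;
  for (const std::string& node : data_nodes) {
    fragments.push_back({"SCAN", node});
  }
  fragments.push_back({"MERGE", coordinator});
  return fragments;
}

// Toy coordinator: groups fragments by target node, standing in for the
// remote calls that would start execution on each node.
std::map<std::string, std::vector<std::string>> Dispatch(
    const std::vector<PlanFragment>& fragments) {
  std::map<std::string, std::vector<std::string>> per_node;
  for (const PlanFragment& f : fragments) {
    per_node[f.node].push_back(f.op);
  }
  return per_node;
}
```

Calling `Dispatch(PlanQuery({"node1", "node2", "node3"}, "node1"))` in this sketch yields three entries, with `node1` receiving both a scan and the merge fragment.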

Page 25: Impala: A Modern, Open-Source SQL Engine for Hadoop

[Architecture diagram. Labels: SQL App, ODBC, query results; three nodes, each with a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase; Hive Metastore, HDFS NN, Statestore & Catalog.]

Intermediate results are streamed between nodes

Operation permitted, query results are streamed back to client
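
Streaming here means a producer hands over results in pieces as they become ready instead of buffering the complete result first. A minimal sketch of that idea, assuming a hypothetical `RowBatch` type and a `send` callback that stands in for the network transfer to the next node or to the client (neither is Impala's real interface):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using RowBatch = std::vector<int>;  // hypothetical: a batch of single-column rows

// Toy executor: cuts its input into batches and hands each one to `send`
// as soon as it is full, instead of buffering the whole result first.
void StreamBatches(const std::vector<int>& rows, std::size_t batch_size,
                   const std::function<void(const RowBatch&)>& send) {
  assert(batch_size > 0);
  RowBatch batch;
  for (int row : rows) {
    batch.push_back(row);
    if (batch.size() == batch_size) {
      send(batch);
      batch.clear();
    }
  }
  if (!batch.empty()) send(batch);  // flush the final partial batch
}
```

The receiver can start consuming (for example, adding to a running sum) after the first batch arrives, so memory use on the producer side is bounded by one batch.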

Page 29: Impala: A Modern, Open-Source SQL Engine for Hadoop

// Generic version: loops over the slots and switches on the type for every row.
void MaterializeTuple(char* tuple) {
  for (int i = 0; i < num_slots_; ++i) {
    char* slot = tuple + offsets_[i];
    switch (types_[i]) {
      case BOOLEAN:
        *slot = ParseBoolean();
        break;
      case INT:
        *slot = ParseInt();
        break;
      case FLOAT: …
      case STRING: …
      // etc.
    }
  }
}

// Specialized version for one layout: loop unrolled, offsets and types fixed.
void MaterializeTuple(char* tuple) {
  // i = 0
  *(tuple + 0) = ParseInt();
  // i = 1
  *(tuple + 4) = ParseBoolean();
  // i = 2
  *(tuple + 5) = ParseInt();
}

Hot code path, called per row
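
The slide's two versions of `MaterializeTuple` are fragments, so here is a compilable sketch of the same contrast. It assumes the layout shown on the slide (INT at offset 0, BOOLEAN at offset 4, INT at offset 5). The parsers take a string argument and the functions are renamed `MaterializeInterpreted` and `MaterializeSpecialized`; both are illustrative choices, not the deck's code. The specialized version is written by hand here rather than generated at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

enum SlotType { BOOLEAN, INT };

// Hypothetical field parsers standing in for ParseBoolean()/ParseInt().
inline char ParseBoolean(const std::string& s) { return s == "true" ? 1 : 0; }
inline int32_t ParseInt(const std::string& s) {
  return static_cast<int32_t>(std::strtol(s.c_str(), nullptr, 10));
}

// Interpreted version: consults the type and offset tables for every slot.
void MaterializeInterpreted(const std::vector<std::string>& fields,
                            const std::vector<SlotType>& types,
                            const std::vector<int>& offsets, char* tuple) {
  assert(fields.size() == types.size() && types.size() == offsets.size());
  for (std::size_t i = 0; i < types.size(); ++i) {
    char* slot = tuple + offsets[i];
    switch (types[i]) {
      case BOOLEAN:
        *slot = ParseBoolean(fields[i]);
        break;
      case INT: {
        int32_t v = ParseInt(fields[i]);
        std::memcpy(slot, &v, sizeof(v));  // memcpy: the slot may be unaligned
        break;
      }
    }
  }
}

// Specialized version for the fixed layout (INT, BOOLEAN, INT at 0, 4, 5):
// no loop, no switch, no table lookups.
void MaterializeSpecialized(const std::vector<std::string>& fields, char* tuple) {
  assert(fields.size() == 3);
  int32_t a = ParseInt(fields[0]);
  int32_t c = ParseInt(fields[2]);
  std::memcpy(tuple + 0, &a, sizeof(a));
  *(tuple + 4) = ParseBoolean(fields[1]);
  std::memcpy(tuple + 5, &c, sizeof(c));
}
```

Both functions fill a 9-byte tuple identically for the same input; the second simply does less work per row, which matters on a path called once per row.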

Page 33: Impala: A Modern, Open-Source SQL Engine for Hadoop

[Diagram. Labels: Impala Daemon; three Query Fragment boxes; IO Manager; five Disk boxes; Thread0 through Thread4.]

Page 35: Impala: A Modern, Open-Source SQL Engine for Hadoop

container format for all popular serialization formats: Avro, Thrift, Protocol Buffers

Page 36: Impala: A Modern, Open-Source SQL Engine for Hadoop

From Twitter’s “Dremel Made Simple” blog

The most efficient IO is one that never happens at all
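
In a columnar layout, one way IO "never happens" is column projection: a query that needs one column reads only that column's values. A minimal in-memory sketch, assuming a hypothetical `ColumnTable` type whose `values_read` counter stands in for bytes read from disk (this is not Parquet's or Impala's API):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory column store: each column is stored contiguously.
struct ColumnTable {
  std::map<std::string, std::vector<long>> columns;
  std::size_t values_read = 0;  // counts values touched, a stand-in for IO
};

// Sums one column; values in the other columns are never touched.
long SumColumn(ColumnTable& table, const std::string& name) {
  auto it = table.columns.find(name);
  assert(it != table.columns.end());
  long sum = 0;
  for (long v : it->second) {
    sum += v;
    ++table.values_read;
  }
  return sum;
}
```

With three columns of three values each, summing one column touches three values rather than nine; a row-oriented layout would have to read every field of every row to do the same.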

Page 48: Impala: A Modern, Open-Source SQL Engine for Hadoop

OVER PARTITION, RANK, LEAD, LAG, NTILE, ...

• VARCHAR, CHAR

Page 49: Impala: A Modern, Open-Source SQL Engine for Hadoop

ROLLUP, CUBE, GROUPING SET

SET MINUS INTERSECT

Page 51: Impala: A Modern, Open-Source SQL Engine for Hadoop

SELECT question FROM audience WHERE has_question = true;