Apache Phoenix: OLTP in Hadoop, by James Taylor - Salesforce.com
![Page 1: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/1.jpg)
V5
OLTP for Hadoop
@ApachePhoenix
http://phoenix.apache.org/
James Taylor (@JamesPlusPlus)
![Page 2: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/2.jpg)
About James
• Architect at Salesforce.com
  – Part of the Big Data group
  – Lead of Apache Phoenix
  – PMC member of Apache Calcite
![Page 3: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/3.jpg)
What is Apache Phoenix?
• A relational database layer for Apache HBase
![Page 4: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/4.jpg)
What is Apache HBase?
• High performance horizontally scalable byte store
• Suitable as store of record for mission critical data
![Page 5: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/5.jpg)
What is Apache Phoenix?
![Page 6: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/6.jpg)
What is Apache Phoenix?
• A relational database layer for Apache HBase
  – OLTP/Query engine
    • Transforms SQL into native HBase API calls
    • Pushes work to cluster for parallel execution
    • Supports ACID transactions
  – Metadata repository
    • Typed access to data stored in HBase tables
    • Supports multi-tenancy modeled as SQL views
  – JDBC driver
• A top-level Apache Software Foundation project
  – Originally developed at Salesforce
  – A growing community with momentum
![Page 7: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/7.jpg)
Where Does Phoenix Fit In?
[Figure: Hadoop ecosystem stack, with these components]
• Sqoop: RDB Data Collector
• Flume: Log Data Collector
• Zookeeper: Coordination
• YARN: Cluster Resource Manager / MapReduce
• HDFS 2.0: Hadoop Distributed File System
• GraphX: Graph analysis framework
• Phoenix: Query execution engine
• HBase: Distributed Database
• The Java Virtual Machine, Hadoop Common, JNI
• Spark: Iterative In-Memory Computation
• MLLib: Data mining
• Pig: Data Manipulation
• Hive: Structured Query
• Phoenix: JDBC client
![Page 8: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/8.jpg)
Why is Phoenix fast?
• Pushes down computation to region servers
  – Start/stop key range(s)
  – Time range min/max
  – Predicates
  – Aggregation
  – Sort
  – Limit
  – TopN
• Parallelizes query from client
  – Intra-region through statistics collection
• Supports secondary indexes
  – Global & co-located
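As a rough illustration of the first push-down item, a predicate on the leading part of the row key can be turned into a start/stop key range, so a scan seeks to the matching rows instead of testing every row. The sketch below uses a sorted Python list as a stand-in for an HBase table; `prefix_to_key_range` and `range_scan` are hypothetical names for this example, not Phoenix or HBase APIs.

```python
from bisect import bisect_left

def prefix_to_key_range(prefix: bytes):
    """Return (start_key, stop_key) covering every row key that begins with prefix.

    The stop key is exclusive. It is built by dropping trailing 0xFF bytes and
    incrementing the last remaining byte; None means "scan to the end".
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        return prefix, None
    stop[-1] += 1
    return prefix, bytes(stop)

def range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) by seeking with binary search."""
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Hypothetical row keys of the form <tenant>#<id>.
rows = sorted([b"acme#001", b"acme#002", b"beta#001", b"zeta#009"])
start, stop = prefix_to_key_range(b"acme#")
matching = range_scan(rows, start, stop)
```

Here the predicate "row key starts with `acme#`" becomes the range `[b"acme#", b"acme$")`, and only the two `acme` rows are read.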
![Page 9: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/9.jpg)
Why is Phoenix fast?
[Figure: a client runs parallel scans against a partitioned table whose regions R1, R2, …, Rn are spread across servers. Intra-region guideposts divide regions into scan ranges. Labels on the scans: Filter, Aggregate, Sort, Limit.]
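The idea of guideposts and parallel scans can be sketched in a few lines: split a key range at the guideposts, scan each chunk concurrently with the filter and a partial aggregate applied per chunk, then merge the partial results on the client. This is a simplified model in plain Python; the function names, the in-memory `rows` list, and the thread pool are assumptions for illustration and do not reflect how Phoenix is implemented.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def split_by_guideposts(start, stop, guideposts):
    """Cut the key range [start, stop) into chunks at each guidepost inside it."""
    inner = [g for g in sorted(guideposts) if start < g < stop]
    bounds = [start, *inner, stop]
    return list(zip(bounds[:-1], bounds[1:]))

def scan_chunk(rows, keys, chunk, predicate):
    """Stand-in for a server-side scan: filter one chunk and pre-aggregate it."""
    lo, hi = chunk
    i, j = bisect_left(keys, lo), bisect_left(keys, hi)
    values = [v for _, v in rows[i:j] if predicate(v)]
    return len(values), sum(values)

def parallel_aggregate(rows, start, stop, guideposts, predicate):
    """Run one scan per chunk in parallel and merge the partial (count, sum) pairs."""
    keys = [k for k, _ in rows]
    chunks = split_by_guideposts(start, stop, guideposts)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: scan_chunk(rows, keys, c, predicate), chunks))
    return sum(p[0] for p in partials), sum(p[1] for p in partials)

# Rows sorted by key: keys 1..100, values 10..1000.
rows = [(k, k * 10) for k in range(1, 101)]
count, total = parallel_aggregate(
    rows, 1, 101, guideposts=[25, 50, 75], predicate=lambda v: v % 20 == 0
)
```

Only the small `(count, sum)` pairs travel back to the caller, which is the point of pushing the filter and aggregate down to where the data lives.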
![Page 10: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/10.jpg)
Skip Scans
• Push key ranges through filter
• Use SEEK_NEXT_HINT to skip data
[Figure: region R1 with key ranges marked Include or Skip.]
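A skip scan can be pictured as a scan that, whenever it leaves one wanted key range, seeks directly to the start of the next one rather than reading the rows in between (in HBase the corresponding filter return code is spelled `SEEK_NEXT_USING_HINT`). The sketch below models this over a sorted Python list; `skip_scan` is a hypothetical helper, and binary search stands in for the seek.

```python
from bisect import bisect_left

def skip_scan(sorted_keys, ranges):
    """Return keys that fall in any [start, stop) range, seeking past the gaps.

    ranges must be sorted and non-overlapping. Also returns the number of
    seeks performed, one per range.
    """
    included, seeks = [], 0
    pos = 0
    for start, stop in ranges:
        # Seek: jump to the first key >= start instead of reading the gap.
        pos = bisect_left(sorted_keys, start, pos)
        seeks += 1
        # Include keys until the current range is exhausted.
        while pos < len(sorted_keys) and sorted_keys[pos] < stop:
            included.append(sorted_keys[pos])
            pos += 1
    return included, seeks

keys = list(range(1000))
hits, seeks = skip_scan(keys, [(10, 13), (500, 502), (990, 991)])
```

With three ranges over 1000 keys, the scan touches six matching keys and performs three seeks; the rows between the ranges are never read.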
![Page 11: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/11.jpg)
Filters
• Pushes down WHERE clause to server
• Filters entire HFile based on time-range
[Figure: region R1 holds HFiles covering t1 - 10, t11 - 20, t21 - 30, and t31 - 40; only the t31 - 40 HFile is scanned.]
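The time-range filter amounts to an interval-overlap test on per-file metadata: a file whose timestamps cannot overlap the query's time range is skipped without being opened. A minimal sketch, assuming each file records its minimum and maximum timestamp and treating both bounds as inclusive (the `HFileMeta` class and file names are invented for this example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HFileMeta:
    name: str
    min_ts: int
    max_ts: int

def files_to_scan(hfiles, query_min_ts, query_max_ts):
    """Keep only files whose [min_ts, max_ts] overlaps the query's time range."""
    return [
        f for f in hfiles
        if f.max_ts >= query_min_ts and f.min_ts <= query_max_ts
    ]

# Same layout as the slide: four files covering t1-10, t11-20, t21-30, t31-40.
hfiles = [
    HFileMeta("hfile-a", 1, 10),
    HFileMeta("hfile-b", 11, 20),
    HFileMeta("hfile-c", 21, 30),
    HFileMeta("hfile-d", 31, 40),
]
selected = files_to_scan(hfiles, 35, 38)
```

A query for timestamps 35 to 38 selects only the last file, matching the figure.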
![Page 12: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/12.jpg)
Transactions
• Snapshot isolation model
  – Using Tephra (http://tephra.io/)
  – Supports SERIALIZABLE isolation level
  – Allows reading your own uncommitted data
• Optional
  – Enabled on a table by table basis
  – No performance penalty when not used
• Available in 4.7.0 release (being voted on now)
![Page 13: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/13.jpg)
Optimistic Concurrency Control
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback is higher
• Good if conflicts are rare: short transactions, disjoint partitioning of work
• Conflict detection not always necessary: write-once/append-only data
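To make the trade-off concrete, here is a toy model of optimistic concurrency control: writes are buffered without taking locks, and conflicts are checked only at commit time by asking whether any key in the write set was committed by someone else after this transaction began. `OptimisticStore` and `ConflictError` are names made up for this sketch; it is not Tephra's implementation.

```python
class ConflictError(Exception):
    """Raised when a commit would overwrite a newer committed write."""

class OptimisticStore:
    def __init__(self):
        self.data = {}         # key -> committed value
        self.last_commit = {}  # key -> sequence number of the last committed write
        self.clock = 0         # increases by one on every successful commit

    def begin(self):
        # Remember the commit sequence at start; no locks are taken.
        return {"start": self.clock, "writes": {}}

    def write(self, tx, key, value):
        tx["writes"][key] = value  # buffered until commit

    def commit(self, tx):
        # Conflict detection: did anyone commit one of our keys after we began?
        for key in tx["writes"]:
            if self.last_commit.get(key, 0) > tx["start"]:
                raise ConflictError(key)
        self.clock += 1
        for key, value in tx["writes"].items():
            self.data[key] = value
            self.last_commit[key] = self.clock

store = OptimisticStore()
t1, t2, t3 = store.begin(), store.begin(), store.begin()
store.write(t1, "row1", "from t1")
store.write(t2, "row1", "from t2")  # same row as t1
store.write(t3, "row2", "from t3")  # disjoint from t1 and t2
store.commit(t1)
store.commit(t3)                    # disjoint write set, so no conflict
try:
    store.commit(t2)
    t2_committed = True
except ConflictError:
    t2_committed = False            # t2 must roll back and retry
```

The disjoint transaction commits without any coordination cost, while the overlapping one pays for detection and rollback, which is why the approach suits workloads where conflicts are rare.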
![Page 14: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/14.jpg)
Tephra Architecture
[Figure: Clients 1 to N work with the active Tx Manager; a standby Tx Manager is also shown, along with ZooKeeper. HBase is shown with Master 1, Master 2, and region servers RS 1 to RS 4.]
![Page 15: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/15.jpg)
Transaction Lifecycle
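[Figure: transaction lifecycle between the Client and the Tx Manager. States: none, in progress, complete, invalid. Client steps: start tx, do work (write to HBase), try commit, roll back in HBase, try abort. Tx Manager steps: start, check conflicts, commit (succeeded or failed), abort, time out, invalidate.]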
![Page 16: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/16.jpg)
Tephra Architecture
• TransactionAware client
  – Coordinates transaction lifecycle with manager
  – Communicates directly with HBase for reads and writes
• Transaction Manager
  – Assigns transaction IDs
  – Maintains state on in-progress, committed and invalid transactions
• Transaction Processor coprocessor
  – Applies server-side filtering for reads
  – Cleans up data from failed transactions and versions that are no longer visible
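The server-side read filtering can be approximated as a visibility check on cell versions, where each version is the ID of the transaction that wrote it. The sketch below is a simplified model under stated assumptions: a reader carries its own transaction ID, a read pointer (the newest transaction ID it may see), and a set of excluded IDs covering in-progress and invalid transactions. The `Snapshot` class and function names are hypothetical, not Tephra's API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    tx_id: int         # this transaction's own write version
    read_pointer: int  # newest transaction ID this reader may see
    excluded: frozenset = field(default_factory=frozenset)  # in-progress or invalid IDs

def is_visible(version, snap):
    """Decide whether a cell written by transaction `version` is visible."""
    if version == snap.tx_id:
        return True  # a transaction can read its own uncommitted writes
    return version <= snap.read_pointer and version not in snap.excluded

def read_latest(cell_versions, snap):
    """cell_versions maps transaction ID -> value; return the newest visible value."""
    visible = [v for v in cell_versions if is_visible(v, snap)]
    return cell_versions[max(visible)] if visible else None

cell = {
    10: "committed early",
    12: "from failed tx",
    14: "still in progress",
    15: "my own write",
}
writer = Snapshot(tx_id=15, read_pointer=14, excluded=frozenset({12, 14}))
other = Snapshot(tx_id=16, read_pointer=14, excluded=frozenset({12, 14, 15}))
```

The writer sees its own uncommitted value, while the other reader falls back to the last committed version, because the failed and in-progress writes are filtered out.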
![Page 17: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/17.jpg)
Demo
• Debit/credit example
• Multiple clients updating account balance through stream of debits/credits
• Without transactions, balance will not be correct
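The failure mode behind the demo is the classic lost update. The following deterministic sketch (not the demo's actual code, which is not included in this transcript) replays one bad interleaving: every client reads the balance before any client writes, so each write overwrites the previous one. The second function models a transactional client that detects the stale read at commit time and retries.

```python
def apply_without_transactions(balance, deltas):
    """All clients read the same starting balance, then each writes back its own result."""
    reads = [balance for _ in deltas]   # every client reads before anyone writes
    for read, delta in zip(reads, deltas):
        balance = read + delta          # each write overwrites the previous one
    return balance

def apply_with_retry(balance, deltas):
    """Each client commits only if the balance is unchanged since its read, else retries."""
    version = 0
    # Each pending entry is (value read, version read, delta to apply).
    pending = [(balance, version, d) for d in deltas]
    while pending:
        read, read_version, delta = pending.pop(0)
        if read_version != version:
            # Conflict detected at commit: re-read the current balance and retry.
            pending.append((balance, version, delta))
            continue
        balance, version = read + delta, version + 1
    return balance

deltas = [+50, -20, +5]
lost = apply_without_transactions(100, deltas)
correct = apply_with_retry(100, deltas)
```

Starting from 100, the unprotected run ends at 105 because only the last write survives, while the retrying run ends at 135, the sum of all debits and credits.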
![Page 18: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/18.jpg)
What’s Next?
You are here
![Page 19: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/19.jpg)
Introducing Apache Calcite
• Query parser, compiler, and planner framework
  – SQL-92 compliant
• Pluggable cost-based optimizer framework
  – Sane way to model push down through rules
• Interop with other Calcite adaptors
  – Already used by Drill, Hive, Kylin, Samza
  – Supports any JDBC source (RDBMS - remember them?)
  – One cost-model to rule them all
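Modeling push down through rules can be pictured as pattern-matching rewrites on a plan tree. The toy below, in Python rather than Calcite's Java API, defines one rule that folds a filter sitting directly on a scan into the scan itself. The node classes, the rule, and the greedy `optimize` driver are all invented for this sketch; Calcite's real planner is cost-based and considerably richer.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Scan:
    table: str
    pushed_filter: Optional[str] = None

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object

def push_filter_into_scan(node):
    """Rule: Filter(Scan) becomes a Scan carrying the filter. Returns None on no match."""
    if (isinstance(node, Filter)
            and isinstance(node.child, Scan)
            and node.child.pushed_filter is None):
        return Scan(node.child.table, pushed_filter=node.condition)
    return None

def optimize(node, rules):
    """Rewrite the child first, then apply the first matching rule at this node."""
    if hasattr(node, "child"):
        node = replace(node, child=optimize(node.child, rules))
    for rule in rules:
        rewritten = rule(node)
        if rewritten is not None:
            return optimize(rewritten, rules)
    return node

plan = Project(("NAME",), Filter("AGE > 30", Scan("PEOPLE")))
optimized = optimize(plan, [push_filter_into_scan])
```

After optimization the separate filter node is gone and the scan carries the predicate, which is the shape a storage-aware engine can turn into a server-side filter.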
![Page 20: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/20.jpg)
How does Phoenix plug in?
[Figure: query pipeline]
• JDBC Client
• Calcite Parser & Validator (SQL + Phoenix specific grammar)
• Calcite Query Optimizer (Built-in rules + Phoenix specific rules)
• Phoenix Query Plan Generator
• Phoenix Runtime
![Page 21: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/21.jpg)
What’s Next?
[Figure: a query is handled by several Drillbits, each paired with Calcite, over these sources: Phoenix/HBase, HDFS, RDBMS, Samza/Streams.]
![Page 22: Apache Phoenix: OLTP in Hadoop, by James Taylor - Saleforce.com](https://reader035.fdocuments.net/reader035/viewer/2022062503/5882b63d1a28abd75a8b73cd/html5/thumbnails/22.jpg)
Thank you! Questions?
* who uses Phoenix