JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why...
Transcript of JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why...
JaguarDB
New NoSQL Database
Jonathan Yue, Ph.D.Founder of DataJaguar
Presentation at NoSQL MeetupFeb 28, 2017, Penisula, Silicon Valley
Overview
Data is distributed
▪All Masters▪No Slaves
Overview
SQL-like query statements
jaguar> create table comp ( key: city char(8), cn char(8), value: emp int, ct char(2) );jaguar> insert into comp values ( ‘SF’, ‘CSCO’, 10000, ‘HP’ );Jaguar> select city, cn from comp where city=‘SF’;Jaguar> select city, count(1) as cnt from comp group by city order by cnt;Jaguar> update comp set emp=20000 where city=‘SF’ and cn=‘CSCO’;Jaguar> delete from comp where city=‘SF’ and cn=‘test111’;
Overview
Programming Languages
Jaguar Server: C++
Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, GoJDBC, Spark
Why JaguarDB
JaguarDB uses a new data structure
1970s B+Tree 1990s LSM Tree 2016 SEA Array
B+Tree Index
1000 2000
1000500100 200018001200 500038002500
120011101090DATA DATA DATA
180013001210DATA DATA DATA
200018701805DATA DATA DATA
240020500
2005DATA DATA DATA
1005030DATA DATA DATA
500200150DATA DATA DATA
1000800530DATA DATA DATA
108010501025DATA DATA DATA
250024802405DATA DATA DATA
380027002600DATA DATA DATA
500045084010DATA DATA DATA
848078025903DATA DATA DATA
B+Tree 1970
MySQL, Oracle, PostgreSQL, MongoDB
• Many node splits
• Leaf nodes are linked list
• Index scan requires disk seeks
LSM Tree
WriteMemTable
201 DATA405 DATA843 DATA
SSTable SSTable SSTable SSTable
SSTable201 DATA280 DATA405 DATA
SSTable350 DATA404 DATA780 DATA
Read
Flush
Compact Compact
Read
LSM Tree 1990
Cassandra, HBase, BigTable
• MemTable and SSTable
• Table compaction
• Requires partition key
What Is SEA
Sorted Elastic Array (SEA)
▪ Data sorted in simple flat array with open spaces
▪ Sparse array (not so sparse) ~30% sparse
Regular Array Vs. SEA
Array is good for data locality, read-ahead, caching
1001DATA
1505DATA
1246DATA
1599DATA
2013DATA
2150DATA
2208DATA
2442DATA
3028DATA
3860DATA
4201DATA
4577DATA
7638DATA
1026DATA
1180DATA
Conventional Sorted Array Insert O(N2)
1001DATA
1505DATA
1246DATA
1599DATA
2013DATA
2150DATA
2208DATA
2442DATA
3028DATA
3860DATA
4201DATA
4577DATA
7638DATA
1026DATA
1180DATA
Sorted Elastic Array (SEA) Insert O(NlogN)
1001DATA
Sorted Elastic Array (SEA) in JaguarDB
1505DATA
1246DATA
1599DATA
2013DATA
2150DATA
2208DATA
2442DATA
3028DATA
3860DATA
4201DATA
4577DATA
7638DATA
1026DATA
1180DATA
BlockBlock Block Block Block
Block Index
SEA
SEA
SEA RESIZE
DispersionData Elements
D a t a E l e m e n t s
Properties of SEA
• Fast sequential index scan• Less data fragmentation • Good data locality• Read-ahead efficiency• Works well for both HDD and SSD
1001DATA
1505DATA
1246DATA
1599DATA
2013DATA
2150DATA
2208DATA
2442DATA
3028DATA
3860DATA
4201DATA
4577DATA
7638DATA
1026DATA
1180DATASEA
Architecture
• Server scale out
• Client scale out
Architecture
Hash Code
Key-Value
Choosing a server to save data
File Structure
JDB Family
Data records are saved in a family of JDB files
Seconday Indexes
Table
Index
Index
Index
Index
One Table -- Multiple Indexes
• Each attribute can have a index
• Significant attributes in index
• Direct search on indexes
Big Data Ecosystem
Integration with other components
JDBC
Spark
SparkR
Java
Ruby
PHP
Python
Easy Installation
Two steps on each host
1. Execute install.sh
2. Edit conf/host.conf
SQL Like API
Supports shell, API of Java, C++, Python, PHP, …
create table member (key: uid char(32), value: phone
bigint, addr char(128), zip int );
insert into member values (‘jdoe8888’, 4088882228, ‘123
Main Street, SF, CA92333’, 92333 );
select * from member where uid=‘jdoe8888’;
JagguarDB supports 90+ SQL-like commands and functions
Peformance
Write
▪ 100 million records
▪ 1000 bytes in each record
▪ 3 servers
▪ Used 100 Minutes
▪1.8T HDD, 72GB RAM, 16 Core, 1Gbps, Linux
Work In Progress
▪ Fault Tolerance (one or two data replicates)
▪ Scaling (adding new nodes need no data migration)
Application
IoT data, machine data, sensor data, process data
-- speed
-- location
-- IP address
-- temperature
-- acceleration
-- blood pressure
-- heart beat
-- stock price
-- moisture …
Thank You
www.datajaguar.com
Jonathan Yue
www.linkedin.com/in/jonyue