JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why...

24
JaguarDB New NoSQL Database Jonathan Yue, Ph.D. Founder of DataJaguar Presentation at NoSQL Meetup Feb 28, 2017, Penisula, Silicon Valley

Transcript of JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why...

Page 1: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

JaguarDB

New NoSQL Database

Jonathan Yue, Ph.D.Founder of DataJaguar

Presentation at NoSQL MeetupFeb 28, 2017, Penisula, Silicon Valley

Page 2: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Overview

Data is distributed

▪All Masters▪No Slaves

Page 3: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Overview

SQL-like query statements

jaguar> create table comp ( key: city char(8), cn char(8), value: emp int, ct char(2) );jaguar> insert into comp values ( ‘SF’, ‘CSCO’, 10000, ‘HP’ );Jaguar> select city, cn from comp where city=‘SF’;Jaguar> select city, count(1) as cnt from comp group by city order by cnt;Jaguar> update comp set emp=20000 where city=‘SF’ and cn=‘CSCO’;Jaguar> delete from comp where city=‘SF’ and cn=‘test111’;

Page 4: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Overview

Programming Languages

Jaguar Server: C++

Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, GoJDBC, Spark

Page 5: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Why JaguarDB

JaguarDB uses a new data structure

1970s B+Tree 1990s LSM Tree 2016 SEA Array

Page 6: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

B+Tree Index

1000 2000

1000500100 200018001200 500038002500

120011101090DATA DATA DATA

180013001210DATA DATA DATA

200018701805DATA DATA DATA

240020500

2005DATA DATA DATA

1005030DATA DATA DATA

500200150DATA DATA DATA

1000800530DATA DATA DATA

108010501025DATA DATA DATA

250024802405DATA DATA DATA

380027002600DATA DATA DATA

500045084010DATA DATA DATA

848078025903DATA DATA DATA

Page 7: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

B+Tree 1970

MySQL, Oracle, PostgreSQL, MongoDB

• Many node splits

• Leaf nodes are linked list

• Index scan requires disk seeks

Page 8: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

LSM Tree

WriteMemTable

201 DATA405 DATA843 DATA

SSTable SSTable SSTable SSTable

SSTable201 DATA280 DATA405 DATA

SSTable350 DATA404 DATA780 DATA

Read

Flush

Compact Compact

Read

Page 9: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

LSM Tree 1990

Cassandra, HBase, BigTable

• MemTable and SSTable

• Table compaction

• Requires partition key

Page 10: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

What Is SEA

Sorted Elastic Array (SEA)

▪ Data sorted in simple flat array with open spaces

▪ Sparse array (not so sparse) ~30% sparse

Page 11: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Regular Array Vs. SEA

Array is good for data locality, read-ahead, caching

1001DATA

1505DATA

1246DATA

1599DATA

2013DATA

2150DATA

2208DATA

2442DATA

3028DATA

3860DATA

4201DATA

4577DATA

7638DATA

1026DATA

1180DATA

Conventional Sorted Array Insert O(N2)

1001DATA

1505DATA

1246DATA

1599DATA

2013DATA

2150DATA

2208DATA

2442DATA

3028DATA

3860DATA

4201DATA

4577DATA

7638DATA

1026DATA

1180DATA

Sorted Elastic Array (SEA) Insert O(NlogN)

Page 12: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

1001DATA

Sorted Elastic Array (SEA) in JaguarDB

1505DATA

1246DATA

1599DATA

2013DATA

2150DATA

2208DATA

2442DATA

3028DATA

3860DATA

4201DATA

4577DATA

7638DATA

1026DATA

1180DATA

BlockBlock Block Block Block

Block Index

SEA

SEA

SEA RESIZE

DispersionData Elements

D a t a E l e m e n t s

Page 13: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Properties of SEA

• Fast sequential index scan• Less data fragmentation • Good data locality• Read-ahead efficiency• Works well for both HDD and SSD

1001DATA

1505DATA

1246DATA

1599DATA

2013DATA

2150DATA

2208DATA

2442DATA

3028DATA

3860DATA

4201DATA

4577DATA

7638DATA

1026DATA

1180DATASEA

Page 14: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Architecture

• Server scale out

• Client scale out

Page 15: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Architecture

Hash Code

Key-Value

Choosing a server to save data

Page 16: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

File Structure

JDB Family

Data records are saved in a family of JDB files

Page 17: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Seconday Indexes

Table

Index

Index

Index

Index

One Table -- Multiple Indexes

• Each attribute can have a index

• Significant attributes in index

• Direct search on indexes

Page 18: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Big Data Ecosystem

Integration with other components

JDBC

Spark

SparkR

Java

Ruby

PHP

Python

Page 19: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Easy Installation

Two steps on each host

1. Execute install.sh

2. Edit conf/host.conf

Page 20: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

SQL Like API

Supports shell, API of Java, C++, Python, PHP, …

create table member (key: uid char(32), value: phone

bigint, addr char(128), zip int );

insert into member values (‘jdoe8888’, 4088882228, ‘123

Main Street, SF, CA92333’, 92333 );

select * from member where uid=‘jdoe8888’;

JagguarDB supports 90+ SQL-like commands and functions

Page 21: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Peformance

Write

▪ 100 million records

▪ 1000 bytes in each record

▪ 3 servers

▪ Used 100 Minutes

▪1.8T HDD, 72GB RAM, 16 Core, 1Gbps, Linux

Page 22: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Work In Progress

▪ Fault Tolerance (one or two data replicates)

▪ Scaling (adding new nodes need no data migration)

Page 23: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Application

IoT data, machine data, sensor data, process data

-- speed

-- location

-- IP address

-- temperature

-- acceleration

-- blood pressure

-- heart beat

-- stock price

-- moisture …

Page 24: JaguarDB · Jaguar Client: Java, Python, C++, Php, Scala, Ruby, NodeJS, Go JDBC, Spark. Why JaguarDB JaguarDB uses a new data structure 1970s B+Tree 1990s LSM Tree 2016 SEA Array.

Thank You

www.datajaguar.com

Jonathan Yue

www.linkedin.com/in/jonyue