Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra


Transcript of Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra

Page 1: Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra

World’s Best Data Modeling Tool for Apache Cassandra

Artem Chebotko, Andrey Kashlev

© 2015. All Rights Reserved.

Page 2

1. Cassandra Data Modeling Methodology
2. The KDM Tool
3. Live Demo: IoT
4. Live Demo: Media Cataloguing
5. Future Work

Page 3

Data Modeling Process

• Data requirements
• Application requirements
• Schema design
• Optimization

Page 4

Cassandra Data Modeling Methodology

[Diagram: the Conceptual Data Model and the Application Workflow feed a "Mapping" step that produces the Logical Data Model; an "Optimization" step then produces the Physical Data Model.]

Page 5

Methodology Models

Model                         Representation
Conceptual Data Model         ERD
Application Workflow Model    Graph
Logical Data Model            Chebotko Diagram
Physical Data Model           Chebotko Diagram, CQL

Page 6

Methodology Protocols

• Conceptual-to-logical mapping
  – Mapping rules
  – Mapping patterns
• Physical optimizations
  – Partition size analysis
  – Duplication factor analysis
  – Keys, aggregation, transactions, …
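
Partition size analysis can be illustrated with a back-of-the-envelope estimate. The Python sketch below is an illustration only, not part of the slides or of KDM. It assumes the commonly used estimate N_v = N_r × (N_c − N_pk − N_s) + N_s for the number of values in one partition (N_r rows, N_c columns, N_pk primary key columns, N_s static columns). The function name and the ingest rate are hypothetical; the column counts follow the sensor_data table used in the example later in the deck.

```python
def partition_values(rows, columns, primary_key_columns, static_columns=0):
    """Estimate the number of values (cells) stored in one partition.

    Implements N_v = N_r * (N_c - N_pk - N_s) + N_s.
    """
    if min(rows, columns, primary_key_columns, static_columns) < 0:
        raise ValueError("counts must be non-negative")
    regular = columns - primary_key_columns - static_columns
    if regular < 0:
        raise ValueError("key and static columns exceed the column count")
    return rows * regular + static_columns


# sensor_data has 5 columns; 4 of them (location, parameter, timestamp, id)
# form the primary key, leaving `value` as the only regular column.
# ASSUMPTION: one measurement per minute for a year lands in one partition.
rows_per_partition = 60 * 24 * 365
estimate = partition_values(rows_per_partition, columns=5, primary_key_columns=4)
print(estimate)
```

If the estimate grows too large for the expected workload, one typical response is to add another column to the partition key so that each partition holds fewer rows.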

Page 7

Example

SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ? AND timestamp > ?
ORDER BY timestamp DESC

[ER diagram: entity types Sensor and Measurement connected by the one-to-many (1:n) relationship type "records"; the attributes shown are id, location, timestamp, parameter and value.]

Pages 8-12

Example

SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ?
AND timestamp > ?
ORDER BY timestamp DESC

[ER diagram, as on page 7: entity types Sensor and Measurement connected by the one-to-many (1:n) relationship type "records"; the attributes shown are id, location, timestamp, parameter and value.]

These five slides build the sensor_data table one mapping step per page:

1. Mapping Entity and Relationship Types
2. Mapping Equality Search Attributes
3. Mapping Inequality Search Attributes
4. Mapping Ordering Attributes
5. Mapping Key Attributes

Resulting table (K = partition key column; C↑ / C↓ = clustering column in ascending / descending order):

sensor_data
location     K
parameter    K
timestamp    C↓
id           C↑
value

One intermediate table on the slides lists timestamp as C↑; the final table uses C↓, matching ORDER BY timestamp DESC.
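
The five mapping steps can be sketched in code. The following Python is a minimal illustration under stated assumptions, not KDM's implementation: `AccessPattern` and `derive_table` are hypothetical names, the column types are placeholders (the slides do not give them), and `id` is assumed to be the key attribute added in the last step, as the final table suggests.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """One application query, reduced to what the mapping steps need."""
    equality: list    # attributes compared with "=" in WHERE
    inequality: list  # attributes compared with ">", "<", ... in WHERE
    ordering: dict    # attribute -> "ASC" or "DESC", from ORDER BY
    selected: list    # attributes in the SELECT list


def derive_table(name, pattern, key_attributes, types):
    """Derive a CREATE TABLE statement by applying the steps in order."""
    if not pattern.equality:
        raise ValueError("at least one equality attribute is needed")
    # Step 2: equality search attributes become partition key columns (K).
    partition_key = list(pattern.equality)
    # Step 3: inequality search attributes become clustering columns,
    # ascending by default (C↑).
    clustering = {attr: "ASC" for attr in pattern.inequality}
    # Step 4: ordering attributes fix the clustering order (C↑ or C↓).
    for attr, direction in pattern.ordering.items():
        clustering[attr] = direction
    # Step 5: key attributes not yet in the primary key are appended as
    # clustering columns so that rows stay unique.
    for attr in key_attributes:
        if attr not in partition_key and attr not in clustering:
            clustering[attr] = "ASC"
    # Step 1 contributes the table itself and its remaining columns.
    key_columns = partition_key + list(clustering)
    regular = [a for a in pattern.selected if a not in key_columns]

    lines = [f"  {col} {types[col]}," for col in key_columns + regular]
    primary_key = ", ".join([f"({', '.join(partition_key)})"] + list(clustering))
    ddl = (
        f"CREATE TABLE {name} (\n"
        + "\n".join(lines)
        + f"\n  PRIMARY KEY ({primary_key})\n)"
    )
    if clustering:
        order = ", ".join(f"{c} {d}" for c, d in clustering.items())
        ddl += f" WITH CLUSTERING ORDER BY ({order})"
    return ddl + ";"


pattern = AccessPattern(
    equality=["location", "parameter"],
    inequality=["timestamp"],
    ordering={"timestamp": "DESC"},
    selected=["timestamp", "value"],
)
# ASSUMPTION: these CQL types are placeholders chosen for the example.
types = {"location": "text", "parameter": "text", "timestamp": "timestamp",
         "id": "uuid", "value": "float"}
cql = derive_table("sensor_data", pattern, key_attributes=["id"], types=types)
print(cql)
```

The printed statement has the primary key ((location, parameter), timestamp, id) with timestamp clustered in descending order, which corresponds to the K, C↓ and C↑ markers in the sensor_data table.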

Page 13

Methodology Pros and Cons

Pros: correctness, completeness
Cons: complexity, time investment

Page 14

Human Errors Happen …

Page 15

Automation

Complexity
Time investment
Human error

Page 17

The KDM Tool

• Streamlines the methodology
• Guides the user
• Automates data modeling tasks:
  – Conceptual-to-logical mapping
  – Physical optimization
  – CQL generation

Pages 18-26

KDM Automation Workflow

Step         Task                             Performed by
Step 1       Design Conceptual Data Model     Solution architect
Step 2       Specify Access Patterns          Solution architect
Automated    Generate Logical Data Models     KDM
Step 3       Select Logical Data Model        Solution architect
Automated    Generate Physical Data Model     KDM
Step 4       Configure Physical Data Model    Solution architect
Automated    Generate Physical Schema         KDM
Step 5       Download CQL Script              Solution architect

Pages 27-30

[Agenda slides for "3 Live Demo: IoT" and "4 Live Demo: Media Cataloguing", each followed by a slide with no extractable text.]

Page 31

Summary

• KDM:
  – automates most complex tasks
  – eliminates human error
  – simplifies data modeling
  – guides the user
  – is a general-purpose tool

Page 32

How Can KDM Help You?

• build new data models
• verify existing data models
• teach/learn data modeling

Pages 34-38

Future Work

• Materialized views
• User Defined Types
• Analysis and physical optimization
• Support for application workflow design
• Support for Chebotko Diagrams

Page 39

Sign up for KDM – it’s FREE!

• KDM: kdm.dataview.org
• Methodology: academy.datastax.com
• Planet Cassandra blog posts:
  – KDM: An Automated Data Modeling Tool for Apache Cassandra, Pt. 1, Pt. 2
• Artem Chebotko, Andrey Kashlev, Shiyong Lu, “A Big Data Modeling Methodology for Apache Cassandra”, IEEE International Congress on Big Data, 2015.

Page 40

Acknowledgements

• Andrey Kashlev would like to thank:
  – Dr. Shiyong Lu
  – Anthony Piazza
• Artem Chebotko would like to thank:
  – Anthony Piazza
  – Patrick McFadin
  – Jonathan Ellis
  – Tim Berglund

Page 41

Thank you