Kylin Engineering Principles

Engineering Principles of Kylin, October 2014, Jiang Xu

Transcript of Kylin Engineering Principles

Page 1: Kylin Engineering Principles

Page 2: Kylin Engineering Principles

Done is better than perfect!

Page 3: Kylin Engineering Principles

How to get product ideas?

Page 4: Kylin Engineering Principles

Get ideas from real product problems

• We got these ideas from the ??? project & MicroStrategy’s limitations:

– Although data is being onboarded to Hadoop, accessing it is a big issue. Hive is too slow!

– Although MicroStrategy is fast, it can’t handle 2+ billion records

– Although there are lots of SQL-on-Hadoop solutions, they can’t guarantee low latency for big queries

• Lessons learned

– Try to get ideas from customers’ pain points

– Always get ideas from real product problems

Page 5: Kylin Engineering Principles

Thinking as product instead of project

• We set out to build a generic product or platform

– Standard: ANSI SQL

– Full Stack: ODBC/JDBC for BI tools integration

– …

• Lessons learned:

– When you have an idea, think of it as a product or platform

– A product is more generic and is easier to adopt in the long term

Page 6: Kylin Engineering Principles

Control scope to build the best solution

• Due to time and resource limitations, we must control the scope of the product

– Focus on MOLAP instead of HOLAP

– Focus on Tableau instead of MicroStrategy

– Don’t support real-time

– Don’t support full SQL

• Try to build the best solution for a “small problem”

Page 7: Kylin Engineering Principles

Reference industry solutions & academic papers

• Study industry analysis reports

– Gartner

– Forrester

– …

• Study existing solutions

– Google BigQuery

– Google Dremel & PowerDrill

– SQL-on-Hadoop (Hive, Presto, Phoenix, Druid…)

• Study academic papers

– Data Mining: Concepts and Techniques, 3rd edition

– Lots of papers on data cubes, OLAP…

Page 8: Kylin Engineering Principles

How to set up a team?

Page 9: Kylin Engineering Principles

Find the right people

• Due to the complexity of this product, we put a lot of effort into setting up the team

– Smart

– Diligent

– Solid CS background

– Matching the team’s chemistry

• Use your connections to find good candidates

– We found a very good team member through a friend

• Give tough interviews to find good candidates

– We gave 2+ hour 1:1 interviews to find good members, mostly on coding, algorithms and problem solving

Page 10: Kylin Engineering Principles

Assign the right tasks to the right people

• Assign components based on each team member’s capability and interests

• All members have to do the dirty work

• All members have the opportunity to take on challenging tasks

• People have to prove themselves before taking on more challenging tasks

Page 11: Kylin Engineering Principles

Lead by example

• Leader Knows Details, Leader Writes Code

• If you want team members to follow an engineering principle, the leader must follow it first.

– For example, with test-driven development, the leader must write test cases first.

• The leader should take the tasks nobody wants

– Support

– Testing

– Customer onboarding

– …

Page 12: Kylin Engineering Principles

How to design a product?

Page 13: Kylin Engineering Principles

Done is better than perfect

• It’s easy to design a “perfect” system. But it’s hard to design a feasible system!

• Due to resource limitations, we must make sure the team can actually deliver the design.

• Don’t do everything average. Try to do one thing best!

Page 14: Kylin Engineering Principles

KISS – Keep it simple, stupid

• Designing a simple system is much more challenging than designing a complex one.

– Give simple solutions to complex problems

– Build a system that is easy to maintain and extend over time

• For example, Kylin has a very simple deployment architecture: just a web server alongside Hadoop

Page 16: Kylin Engineering Principles

Don’t reinvent the wheel

• Reuse existing open source products

– Calcite

– Hive

– MapReduce

– HBase

– …

• Reference existing solutions

– Bias error in HyperLogLog

• Google HyperLogLog++

• Facebook Presto: magic parameter
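The slide only names the bias problem and where the fixes came from. To make the problem concrete, here is a minimal Python sketch of a plain HyperLogLog counter (the class and method names are hypothetical, not Kylin's code). It shows the raw estimator and the small-range correction from the original HyperLogLog paper, which falls back to linear counting while many registers are still empty; it does not reproduce HyperLogLog++'s empirical bias tables or Presto's parameter choice.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator with 2**p registers (sketch)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item) -> None:
        # Take 64 bits of a SHA-1 digest as a well-mixed hash of the item.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        w = 64 - self.p
        idx = h >> w                        # top p bits choose the register
        rest = h & ((1 << w) - 1)           # remaining w bits
        rank = w - rest.bit_length() + 1    # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # The raw estimate is strongly biased for small cardinalities, so the
        # original paper switches to linear counting in that range.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # an estimate near 100000, not the exact count
```

With p = 12 there are 4096 registers, and the standard error of the estimate is roughly 1.04 / √4096 ≈ 1.6%; raising p trades memory for accuracy.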

Page 17: Kylin Engineering Principles

80-20 Rule

• Put 80% of the effort into the 20% most important features

• What should be done?

– ODBC driver

– Analytic SQL: grouping, aggregation, filter, join, projection, sub-query…

– …

• What shouldn’t be done?

– BI tools

– Full ANSI SQL

– …

Page 18: Kylin Engineering Principles

Explain your design in simple words

• If you can’t explain it to your peers in simple words, something must be wrong.

• Challenge each other!

• Good design is evolving!

Page 19: Kylin Engineering Principles

Build a workable prototype

• Paperwork can’t verify your design

• Only a workable prototype can validate your design.

• We spent 1 month building a workable prototype

– SQL was parsed with a hand-written ANTLR grammar

– The cube was built by simple MapReduce scripts
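To illustrate what “building a cube” means, here is a minimal in-memory Python sketch, assuming a toy fact table where each row is a dict of dimension values plus one numeric measure. It precomputes SUM(measure) for every cuboid (every subset of the dimensions), with a “map” loop that emits one key per cuboid and a “reduce” loop that sums them. The function name `build_cube` and the sample columns are hypothetical; this is not the prototype's actual scripts.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Precompute SUM(measure) for every cuboid (every subset of dims).

    Returns {cuboid: {dimension_values: total}}, where cuboid is a tuple of
    dimension names and dimension_values is the matching tuple of values.
    """
    # "Map": each row emits one ((cuboid, key), value) pair per dimension subset.
    emitted = []
    for row in rows:
        for k in range(len(dims) + 1):
            for cuboid in combinations(dims, k):
                key = tuple(row[d] for d in cuboid)
                emitted.append(((cuboid, key), row[measure]))

    # "Reduce": sum the measure for identical (cuboid, key) pairs.
    cube = defaultdict(lambda: defaultdict(int))
    for (cuboid, key), value in emitted:
        cube[cuboid][key] += value
    return {cuboid: dict(groups) for cuboid, groups in cube.items()}


rows = [
    {"site": "US", "category": "books", "gmv": 10},
    {"site": "US", "category": "toys", "gmv": 5},
    {"site": "DE", "category": "books", "gmv": 7},
]
cube = build_cube(rows, ["site", "category"], "gmv")
print(cube[("site",)])  # {('US',): 15, ('DE',): 7}
print(cube[()])         # {(): 22}
```

A query such as `SELECT site, SUM(gmv) … GROUP BY site` then becomes a lookup in one cuboid instead of a scan. With n dimensions there are 2^n cuboids, which is why the number of dimensions in a cube has to be chosen carefully.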

Page 20: Kylin Engineering Principles

How to develop a product?

Page 21: Kylin Engineering Principles

Automated Testing

• Automated integration tests >> automated unit tests

– No mocks!

– Test on a live system

– Each case covers one use case

• 1+ automated test for each feature & 1+ automated test for each bug fix

• Reusing a gold-standard test sample simplifies building test cases

• Automate everything

– Compare SQL results between H2 and Hadoop
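The H2 comparison can be sketched as follows, assuming both engines' results have already been fetched as lists of row tuples. The functions `normalize` and `assert_same_result` and the 6-digit rounding are hypothetical choices, not Kylin's test harness. Rows are compared as multisets unless the query has an ORDER BY, and floats are rounded so that sums computed in a different order don't produce false failures.

```python
from collections import Counter


def normalize(row, digits=6):
    """Round floats so SUM/AVG computed in a different order still compare equal."""
    return tuple(round(v, digits) if isinstance(v, float) else v for v in row)


def assert_same_result(expected_rows, actual_rows, ordered=False):
    """Compare a reference result (e.g. from H2) with the engine under test."""
    expected = [normalize(r) for r in expected_rows]
    actual = [normalize(r) for r in actual_rows]
    if ordered:
        # Queries with ORDER BY must match row by row.
        if expected != actual:
            raise AssertionError(f"ordered mismatch: {expected} != {actual}")
    elif Counter(expected) != Counter(actual):
        # Otherwise compare as multisets: row order is not guaranteed.
        missing = Counter(expected) - Counter(actual)
        extra = Counter(actual) - Counter(expected)
        raise AssertionError(f"missing rows {dict(missing)}, extra rows {dict(extra)}")


h2_rows = [("US", 15.0), ("DE", 7.0)]
hadoop_rows = [("DE", 7.000000001), ("US", 15.0)]
assert_same_result(h2_rows, hadoop_rows)  # passes: same rows, different order
```

Because one reference engine checks every query, adding a regression test for a bug fix can be as small as adding the failing SQL to the shared test set.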

Page 22: Kylin Engineering Principles

Code Review - Simple is Beautiful

• Code should be clear to read and easy to change

• If I have trouble understanding your code, FIX it!

– One class has more than one responsibility

– Code looks complex

– Hard to enhance

– Duplicate logic

– Package organization looks messy

– …

Page 23: Kylin Engineering Principles

Code Review – Buddy Programming

• Can code review find bugs? – NO!!!

• How can we find bugs?

– Test as a customer, with vertical use cases

– You write the first version, I write the second version

– Each component has 2+ owners

Page 24: Kylin Engineering Principles

Continuous Code Refactoring

• If other people have trouble understanding your code, REFACTOR it!

• A comprehensive automated test suite makes refactoring much easier

Page 25: Kylin Engineering Principles

DevOps – Develop for Operations

• Log every important piece of information

• Export every important metric

• Easy to troubleshoot

• Easy to monitor

• One-liner installation

Page 26: Kylin Engineering Principles

Performance Tuning - Question Everything

• System Level

– CPU, Memory

• JVM Level

– GC: code generated by Calcite used up the permanent generation and triggered full GCs

– Use a Java profiler to question every hotspot

– Remove hotspots one by one

• Hadoop

– Data Skew

– MapReduce Job Tuning

– …

• Algorithm

– HyperLogLog

Page 27: Kylin Engineering Principles

Open Source Adoption

• Open source software is budget-free, but isn’t bug-free

• We fixed lots of bugs in:

– Calcite

– Trev4j

– HyperLogLog

– …

Page 28: Kylin Engineering Principles

How to onboard a product?

Page 29: Kylin Engineering Principles

Customer is 1st priority

• Work closely with customers

– Help customers design cubes

– Refine requirements to reduce complexity

• because we make the impossible possible

• Fix bugs quickly

– Fixing product bugs is more important than feature development

• Continuous Improvement

– Prioritize customer requirements

– Give a workable solution quickly, then improve it later.

• Specific requirements vs. generic requirements

– Do your best to give a generic solution for a specific requirement

– Say NO to very specific solutions

Page 30: Kylin Engineering Principles

2+ Cases for Product Verification

• To develop a good product, we need at least 2 use cases to verify and finalize our design.

• Find different use cases to verify the product

– Transaction Data

– Behavior Data

Page 31: Kylin Engineering Principles

Usability is Key

• Usability is key for customer onboarding

• An easy-to-use GUI that hides the complex concepts

• …

Page 32: Kylin Engineering Principles

Q & A
