Stream processing and Norikra

@tagomoris, Norikra meetup, 2014/07/09

Transcript of Stream processing and Norikra

Page 1: Stream processing and Norikra

@tagomoris

Norikra meetup 2014/07/09

Stream processing and Norikra

Wednesday, July 9, 2014

Page 2: Stream processing and Norikra

TAGOMORI Satoshi (@tagomoris)

LINE Corporation

Analytics Platform Team

Page 3: Stream processing and Norikra

THE ONE THING YOU MUST LEARN TODAY IS

Page 4: Stream processing and Norikra

Norikra

Page 5: Stream processing and Norikra

Norikra IS NOT Norika

Page 6: Stream processing and Norikra

STREAM PROCESSING

Page 7: Stream processing and Norikra

Processing models

Batch processing:

RDBMS, Hadoop(Hive), BigQuery/RedShift

Stream processing:

Storm, Spark streaming, Norikra

Page 8: Stream processing and Norikra

Batch processing

RDBMS, Hadoop/Hive, ... (transactions are out of scope for this topic)

Target window: hours - weeks (or more)

Total throughput: HIGHEST

Query latency: LARGEST (20 sec - mins - hours)

Page 9: Stream processing and Norikra

Stream processing

Storm, Esper, Norikra, Fluentd, ....

Kafka(?), Spark streaming(?)

Target window: seconds - hours

Total throughput: Normal

Query latency: SMALLEST (milliseconds)

Queries must be written BEFORE the data arrives

Once registered, a query runs forever

Page 10: Stream processing and Norikra

Data flow and latency

[Figure comparing Batch and Stream; labels: data window, query execution, incremental query execution]

Page 11: Stream processing and Norikra

Query for stored data

[Figure: a table filled with rows of v1,v2,v3,v4,v5,v6]

At first, all data MUST be stored.

Page 12: Stream processing and Norikra

Query for stored data

[Figure: a table filled with rows of v1,v2,v3,v4,v5,v6]

SELECT v1, v2, COUNT(*) FROM table
WHERE v3='x' GROUP BY v1, v2

Page 13: Stream processing and Norikra

Query for stored data

[Figure: a table filled with rows of v1,v2,v3,v4,v5,v6]

SELECT v1, v2, COUNT(*) FROM table
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table
WHERE v1 AND v2 GROUP BY v4

Page 14: Stream processing and Norikra

Query for stored data

[Figure: a table filled with rows of v1,v2,v3,v4,v5,v6]

SELECT v1, v2, COUNT(*) FROM table
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table
WHERE v1 AND v2 GROUP BY v4

"All data" includes "data that is never used".

Page 15: Stream processing and Norikra

Query for stream data

[Figure: events of v1,v2,v3,v4,v5,v6 flow through a stream toward two registered queries]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4

Page 16: Stream processing and Norikra

Query for stream data

[Figure: events of v1,v2,v3,v4,v5,v6 flow through a stream; one query is fed v1,v2,v3 and the other v1,v2,v4]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4

Page 17: Stream processing and Norikra

Query for stream data

[Figure: events of v1,v2,v3,v4,v5,v6 flow through a stream; one query is fed v1,v2,v3 and the other v1,v2,v4]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4

Page 18: Stream processing and Norikra

Query for stream data

[Figure: events of v1,v2,v3,v4,v5,v6 flow through a stream; one query is fed v1,v2,v3 and the other v1,v2,v4]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

SELECT v4, COUNT(*) FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4

All data will be discarded just after it is inserted.

(Bye-bye storage system maintenance!)

Page 19: Stream processing and Norikra

Incremental calculation

[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

internal data (memory)

v1     v2     COUNT
TRUE   TRUE   0
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2

Page 20: Stream processing and Norikra

Incremental calculation

[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

internal data (memory)

v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2

Page 21: Stream processing and Norikra

Incremental calculation

[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

internal data (memory)

v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   34
FALSE  FALSE  2

Page 22: Stream processing and Norikra

Incremental calculation

[Figure: events of v1,v2,v3,v4,v5,v6 arrive from the stream]

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

internal data (memory)

v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  2
FALSE  TRUE   37
FALSE  FALSE  3

Memory can store the internal data.
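
The incremental calculation on these slides can be sketched in a few lines of Python. This is only an illustration of the idea, not how Norikra or Esper implement it: the function names `make_incremental_count`, `update`, and `snapshot` are hypothetical, and the window (`.win:xxx`) is ignored so that counts simply grow.

```python
from collections import Counter


def make_incremental_count():
    """Return (update, snapshot) for a running GROUP BY v1, v2 count."""
    counts = Counter()  # plays the role of "internal data (memory)"

    def update(event):
        # WHERE v3='x': events that do not match are ignored.
        if event.get("v3") != "x":
            return
        # GROUP BY v1, v2: bump one cell; the event itself is not kept.
        counts[(event["v1"], event["v2"])] += 1

    def snapshot():
        return dict(counts)

    return update, snapshot


update, snapshot = make_incremental_count()
stream = [
    {"v1": True, "v2": True, "v3": "x", "v4": 1, "v5": 2, "v6": 3},
    {"v1": False, "v2": True, "v3": "x", "v4": 4, "v5": 5, "v6": 6},
    {"v1": False, "v2": True, "v3": "y", "v4": 7, "v5": 8, "v6": 9},
]
for event in stream:
    update(event)
print(snapshot())
```

Only the small table of counters stays in memory; each event can be dropped as soon as `update` returns.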

Page 23: Stream processing and Norikra

Data window

Target time (or size) range of queries

Batch (or short-batch)

FROM-TO: WHERE dt >= '2014-07-07 00:00:00'
AND dt <= '2014-07-08 23:59:59'

Stream

"Calculate this query every 3 minutes"

Extended SQL required

SELECT v1, v2, COUNT(*) FROM table.win:xxx
WHERE v3='x' GROUP BY v1, v2

Page 24: Stream processing and Norikra

Stream processing with SQL

Esper: Java library to process streams

With schema

Page 25: Stream processing and Norikra

Stream processing with SQL

Esper: Java library to process streams

Esper EPL

SELECT param1, param2
FROM tbl
WHERE age > 30

Page 26: Stream processing and Norikra

Stream processing with SQL

Esper: Java library to process streams

Esper EPL

SELECT param, COUNT(*) AS c
FROM tbl
WHERE age > 30
GROUP BY param

Page 27: Stream processing and Norikra

Stream processing with SQL

Esper: Java library to process streams

Esper EPL

SELECT param, COUNT(*) AS c
FROM tbl.win:time_batch(1 hour)
WHERE age > 30
GROUP BY param
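
To make the effect of a batch window concrete, here is a rough Python sketch of what the last query computes. It is an approximation under stated assumptions, not Esper's implementation: the function name `time_batch_counts` is hypothetical, events carry their own timestamps in seconds, batches are closed when a later event arrives rather than by a timer, and the final partial batch is flushed at the end of input.

```python
from collections import Counter


def time_batch_counts(events, batch_seconds, min_age=30):
    """Yield (batch_start, {param: count}) once per batch.

    events: iterable of (timestamp_seconds, event_dict), sorted by timestamp.
    """
    batch_start = None
    counts = Counter()
    for ts, event in events:
        if batch_start is None:
            batch_start = ts
        # Close every batch that ended before this event arrived.
        while ts >= batch_start + batch_seconds:
            yield batch_start, dict(counts)
            counts = Counter()
            batch_start += batch_seconds
        if event.get("age", 0) > min_age:  # WHERE age > 30
            counts[event["param"]] += 1    # GROUP BY param
    if batch_start is not None:
        # Flush the last (possibly partial) batch; an assumption of this sketch.
        yield batch_start, dict(counts)


events = [
    (0, {"param": "a", "age": 31}),
    (10, {"param": "a", "age": 40}),
    (20, {"param": "b", "age": 20}),
    (3600, {"param": "b", "age": 35}),
]
results = list(time_batch_counts(events, batch_seconds=3600))
print(results)
```

Each batch starts from an empty counter, so results from one hour never leak into the next.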

Page 28: Stream processing and Norikra

Page 29: Stream processing and Norikra

Norikra:

Schema-less Stream Processing with SQL

Server software, runs on JVM

Open source software (GPLv2)

http://norikra.github.io/

https://github.com/norikra/norikra

Page 30: Stream processing and Norikra

Norikra:

Schema-less event stream:

Add/Remove data fields whenever you want

SQL:

No more restarts to add/remove queries

w/ JOINs, w/ SubQueries

w/ UDF (in Java/Ruby from rubygem)

Truly Complex events:

Nested Hash/Array, accessible directly from SQL

HTTP RPC w/ JSON or MessagePack (fluentd plugin available!)

Page 31: Stream processing and Norikra

How to set up Norikra:

Install JRuby

download jruby.tar.gz, extract it and export $PATH

'rbenv install jruby-1.7.xx' & 'rbenv shell jruby-..'

Install Norikra

'gem install norikra'

Execute Norikra server

'norikra start'

Page 32: Stream processing and Norikra

Norikra Interface:

Command line: norikra-client

norikra-client target open ...

norikra-client query add ...

tail -f ... | norikra-client event send ...

WebUI

show status

show/add/remove queries

HTTP API

JSON, MessagePack

Page 33: Stream processing and Norikra

Norikra Queries: (1)

SELECT name, age
FROM events

target: events

Page 34: Stream processing and Norikra

Norikra Queries: (1)

SELECT name, age
FROM events

Input event:

{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Tsukuba"}

Output:

{"name":"tagomoris","age":34}

Page 35: Stream processing and Norikra

Norikra Queries: (1)

SELECT name, age
FROM events

Input event:

{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Tsukuba"}

Output:

nothing

Page 36: Stream processing and Norikra

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="Tsukuba"

Input event:

{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Tsukuba"}

Output:

{"name":"tagomoris","age":34}

Page 37: Stream processing and Norikra

Norikra Queries: (2)

SELECT name, age
FROM events
WHERE current="Tsukuba"

Input event:

{"name":"kawashima", "age":99, "address":"Tsukuba", "corp":"Univ", "current":"Dream"}

Output:

nothing
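
The behaviour shown in queries (1) and (2) can be mimicked with a small Python sketch. It is a simplification for illustration only: `run_select` is a hypothetical helper, not part of Norikra, and it supports only equality conditions in WHERE.

```python
def run_select(event, fields, where=None):
    """Apply SELECT <fields> FROM events [WHERE <where>] to one event.

    where: optional dict of {field: required_value} equality conditions.
    Returns the output dict, or None when the query emits nothing.
    """
    needed = set(fields) | set(where or {})
    # An event lacking any field the query refers to produces no output.
    if not needed.issubset(event):
        return None
    # WHERE: every (field, value) pair must match.
    if where and any(event[k] != v for k, v in where.items()):
        return None
    return {f: event[f] for f in fields}


full = {"name": "tagomoris", "age": 34, "address": "Tokyo",
        "corp": "LINE", "current": "Tsukuba"}
no_age = {"name": "tagomoris", "address": "Tokyo",
          "corp": "LINE", "current": "Tsukuba"}
other = {"name": "kawashima", "age": 99, "address": "Tsukuba",
         "corp": "Univ", "current": "Dream"}

print(run_select(full, ["name", "age"]))
print(run_select(no_age, ["name", "age"]))
print(run_select(full, ["name", "age"], {"current": "Tsukuba"}))
print(run_select(other, ["name", "age"], {"current": "Tsukuba"}))
```

The four calls correspond to the four slides above: two produce `{"name": "tagomoris", "age": 34}` and two produce nothing.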

Page 38: Stream processing and Norikra

Norikra Queries: (3)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Page 39: Stream processing and Norikra

Norikra Queries: (3)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Input event:

{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Tsukuba"}

Output (every 5 mins):

{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...

Page 40: Stream processing and Norikra

Norikra Queries: (4)

Input event:

{"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Tsukuba"}

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Output (every 5 mins):

{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...

SELECT max(age) as max
FROM events.win:time_batch(5 mins)

Output (every 5 mins):

{"max":51}

Page 41: Stream processing and Norikra

Norikra Queries: (5)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Tsukuba", "speaker":true, "attend":[true,true,false, ...]}

Page 42: Stream processing and Norikra

Norikra Queries: (5)

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Tsukuba", "speaker":true, "attend":[true,true,false, ...]}

Page 43: Stream processing and Norikra

Norikra Queries: (5)

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Tsukuba" AND attend.$0 AND attend.$1
GROUP BY user.age

{"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
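
The nested access used in query (5), `user.age` and `attend.$0`, can be pictured with a short Python sketch. The `lookup` helper is hypothetical and only illustrates how such paths map onto a nested event; the `attend` array is cut to its three visible elements for the example.

```python
def lookup(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0'."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            value = value[int(part[1:])]  # '$0' -> list index 0
        else:
            value = value[part]           # plain name -> dict key
    return value


event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],
}

# WHERE current="Tsukuba" AND attend.$0 AND attend.$1
matches = (
    lookup(event, "current") == "Tsukuba"
    and lookup(event, "attend.$0")
    and lookup(event, "attend.$1")
)
print(lookup(event, "user.age"), matches)
```

With `current` set to "Kyoto", the WHERE condition fails, so this event would not be counted even though both `attend` flags are true.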

Page 44: Stream processing and Norikra

Use cases in the real world

Enjoy the following sessions!

Page 45: Stream processing and Norikra

More queries, more simplicity and less latency.

Thanks!