Log everything! @DC13

Posted on 06-May-2015


Description

Big commercial websites breathe data: they create a lot of it very fast, but also need feedback based on that very same data to become better and better. In this talk we show our ideas, the drawbacks, and the solutions for building your own big data infrastructure. We further explore the possibilities of accessing and harnessing the data using map/reduce and near real-time approaches, in order to prepare you for the most challenging part of it all: gaining relevant knowledge you did not have before. This talk was held at the Developer Conference 2013 (http://www.developer-conference.eu/session_post/log-everything/)

Transcript of Log everything! @DC13

Log Everything! @DC13

Stefan & Mike

Mike Lohmann Co-Founder / Software Engineer

mike.lohmann@deck36.de

Dr. Stefan Schadwinkel Co-Founder / Analytics Engineer

stefan.schadwinkel@deck36.de

ABOUT DECK36 Who We Are

–  DECK36 is a young spin-off from ICANS

–  Small team of 7 engineers

–  Longstanding expertise in designing, implementing and operating complex web systems

–  Developing our own data-intelligence-focused tools and web services

–  Offering our expert knowledge in Automation & Operations, Architecture & Engineering, Analytics & Data Logistics

WHAT WE WILL TALK ABOUT Topics

–  Log everything! – The Data Pipeline.

–  Tackling the Leviathan – Realtime Stream Processing with Storm.

–  JS Client DataCollector: Live Demo

–  Storm Processing with PHP: Live Demo

Log everything! The Data Pipeline

THE DATA PIPELINE Requirements

Background: Building and operating multiple education communities

Baseline: PokerStrategy.com KPIs

–  6M registered users, 700k posts/month, 2.8M page impressions/day, 7.6M requests/day

New products → new business models → new questions

–  Extendable, generic solution

–  Storage and accessibility more important than specific, optimized applications

[Pipeline diagram: Producer → Transport → Storage → Analytics / Realtime Stream Processing]

THE DATA PIPELINE Logging Pipeline

Producer

–  Monolog Plugin, JS Client (see the producer sketch after this list)

Transport

–  Flume 0.9.4 (did not work out) → RabbitMQ, Erlang consumer

–  Evaluated Apache Kafka

Storage

–  Hadoop HDFS (our very own) → Amazon S3


Analytics

-  Hadoop MapReduce → Amazon EMR, Python, R

-  Exports to Excel (CSV), QlikView → Amazon Redshift

Realtime Stream Processing

-  Twitter Storm
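For illustration, a Monolog producer that ships every log record to RabbitMQ could look roughly like this. A minimal sketch using Monolog's AmqpHandler and the PHP AMQP extension; the connection settings and the 'logs' exchange name are our own, not the exact DECK36 plugin:

<?php
// Minimal producer sketch: publish every Monolog record (as JSON) to a
// RabbitMQ topic exchange. Host and exchange name are illustrative.
use Monolog\Logger;
use Monolog\Handler\AmqpHandler;

$connection = new AMQPConnection(array('host' => 'localhost'));
$connection->connect();

$channel  = new AMQPChannel($connection);
$exchange = new AMQPExchange($channel);
$exchange->setName('logs');
$exchange->setType(AMQP_EX_TYPE_TOPIC);
$exchange->declareExchange();

$logger = new Logger('website');
$logger->pushHandler(new AmqpHandler($exchange, 'logs'));

// From here on, every log call is published to RabbitMQ.
$logger->info('user.registered', array('userId' => 12345));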


THE DATA PIPELINE Unified Message Format

-  Fixed, guaranteed envelope (illustrative example below)

-  Processing driven by message content

-  A single message compresses (LZOP) to about 70% of its original size (1184 B → 817 B)

-  Message bulks compress to about 12-14% of original size (at 42k & 325k messages)

[Diagram: Unified Message Format]
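For illustration, such an envelope-plus-body message could be built like this in PHP (all field names are our own guesses, not the actual UMF definition):

<?php
// Hypothetical UMF-style message: a fixed envelope around a free-form body.
// The event type and timestamp later drive routing and Hive partitioning.
$message = array(
    'version' => '1.0',
    'host'    => 'www.example.com',       // producing host (illustrative)
    'type'    => 'icans.content',         // event type
    'created' => '2012-10-01T12:34:56Z',  // source of the time partition
    'body'    => array('userId' => 12345, 'action' => 'click'),
);

echo json_encode($message);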

THE DATA PIPELINE Compaction

The RabbitMQ consumer (Erlang) stores the data to the cloud, which leaves us with

-  A relatively large number of files

-  Mixed messages

We want

-  A few files

-  Messages grouped by "Event Type" and "Time Partition"

-  Data transformation

s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo

-  The path layout follows Hive partitioning

-  Event type and time partition are determined by the message content (see the sketch below)
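In essence, compaction maps each message's content to a partitioned output location. A PHP sketch of that mapping, reusing the hypothetical envelope fields from above (the real job implements this as a Cascading output tap):

<?php
// Derive the Hive-partitioned S3 path from the message content.
// 'type' and 'created' are the assumed envelope fields from above.
function partitionPath(array $message, $bucket, $website)
{
    $ts = strtotime($message['created']);

    return sprintf(
        's3://%s/icanslog/%s/%s/year=%s/month=%s/day=%s/',
        $bucket,
        $website,
        $message['type'],
        date('Y', $ts),
        date('m', $ts),
        date('d', $ts)
    );
}

// e.g. s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/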

THE DATA PIPELINE Compaction Using Cascalog

-  Based on Clojure (LISP) and Cascading

-  Provides a Datalog-like query language

-  Don't speak LISP? → JCascalog

Very handy features (unavailable in Hive or Pig)

-  Cascading output taps can be parameterized by data records

-  Trap location for corrupted records (the job finishes for all correct messages)

-  Runs within the JVM → large available codebase; arbitrary processing is simple

Cascalog Query Syntax

Cascalog is Clojure, Clojure is Lisp

(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))

Reading the query:

-  ?<- : the query operator

-  (stdout) : a Cascading output tap

-  [?person] : the columns of the dataset generated by the query

-  (age ?person ?age) is a "generator", (< ?age 30) a "predicate"

-  Use as many generators and predicates as you want; both can be any Clojure function

-  Clojure can call anything that is available within a JVM


Run the Cascalog processing on Amazon EMR:

./elastic-mapreduce [standard parameters omitted] \
  --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
  --main-class icans.cascalogjobs.processing.compaction \
  --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"

THE DATA PIPELINE Data Queries with Hive

Hive is table-based and provides an SQL-like syntax

-  Assumes one storage location (directory) per table

-  Simple to use if you know SQL

-  Widely used; rapid development for "simple" queries

Hive @ Amazon

-  Table locations can be S3

-  "Cluster on demand" → requires rebuilding the Hive metadata:

-  CREATE TABLE for source and target S3 locations

-  Import table metadata (auto-discovery for partitions)

-  INSERT OVERWRITE to query the source table(s) and store to the target S3 location (see the HiveQL sketch below)
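A minimal sketch of these steps in HiveQL (table names, columns and the result query are made up; the actual statements were shown on the following slides):

-- Source table over the compacted log data on S3 (schema is illustrative).
CREATE EXTERNAL TABLE icans_content (message STRING)
PARTITIONED BY (year STRING, month STRING, day STRING)
LOCATION 's3://[BUCKET]/icanslog/[WEBSITE]/icans.content/';

-- Auto-discover the year/month/day partitions (Amazon EMR Hive extension).
ALTER TABLE icans_content RECOVER PARTITIONS;

-- Target table at another S3 location.
CREATE EXTERNAL TABLE daily_counts (day_key STRING, events BIGINT)
LOCATION 's3://[BUCKET]/results/daily_counts/';

-- Query the source table and store the result to the target S3 location.
INSERT OVERWRITE TABLE daily_counts
SELECT concat(year, '-', month, '-', day), count(*)
FROM icans_content
GROUP BY year, month, day;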

[Slides "Hive @ Amazon (1)" and "Hive @ Amazon (2)" showed the corresponding Hive statements]

We can now simply copy the data from S3 and import it into any local analytical tool, e.g. Excel, Redshift, QlikView, R, etc.

Further Reading

-  More details in the Log Everything! ebook

-  Available at Amazon and DeveloperPress

THE DATA PIPELINE Still: It’s Batch Processing

-  While quite efficient in flight, the logistics of getting the job started are significant.

-  Only cost-efficient for long-distance travel.

THE DATA PIPELINE Instant Insight through Stream Processing

-  Often, only updates for the recent day, week, or month are necessary

-  Time is of the essence when direct feedback or user interaction is desired

More Wind In The Sails With Storm

-  Distributed realtime processing framework

-  Battle-proven by Twitter

-  All *BINGO-Abilities fulfilled!

-  Hadoop = data batch processing; Storm = realtime data processing

-  More (and maybe new) *BINGO: DRPC, ETL, RTET, Spouts, Bolts, Tuple, Topology

-  Easy to use (Really!)

REALTIME STREAM PROCESSING Instant Insight through Stream Processing

Realtime Stream Processing Infrastructure with Storm

[Architecture diagram: apps & servers and the NodeJS client act as producers; transport goes through a queue into the Storm cluster (Nimbus master, Zookeeper, supervisors running workers); realtime data stream analytics results flow to storage and analytics backends such as S3, DB, Zabbix and Graylog]

REALTIME STREAM PROCESSING JS Client Features

-  Event system

-  Master/slave tabs

-  Local queuing of data

-  Ability to use Node modules

-  Easy to extend

-  Complete development suite

-  Delivers bundles with or without vendor libraries

Realtime Stream Processing - Loading the JS Client

<script .. src="https://cdn.tradimo.com/js/starlog-client.min.js?5193e1ba0325c756b78d87384d2f80e9"></script>

[Sequence diagram: browser ↔ NodeJS backend]

1. The browser requests https://../starlog-client.min.js

2. The backend creates a signed cookie and responds with Set-Cookie: UUID plus starlog-client.min.js

3. The client opens /socket.io/1/websockets with Upgrade: websocket and Cookie: UUID

4. The backend checks the cookie and answers HTTP 101 – Switching Protocols (Connection: Upgrade, Upgrade: websocket)

5. Over the established connection, collected data is sent in UMF to the backend queue; after some backend magic, results (e.g. counts) are pushed back to the client in UMF

Realtime Stream Processing - JS Client in action

UseCase: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge

1. The ClickEvent collector registers an onclick event handler

2. Clicked-Data is written to localstorage

3. SocketConnect observes localstorage

4. Clicked-Data is sent to NodeJS as a Clicked-Data-UMF message

Realtime Stream Processing - JS Client in action

function ClickFetcher() {
    this.collectData = function (callback) {
        var clicked = 1;
        logger.debug('ClickFetcher - collectData called!');
        // Count every click and store it under a key derived from the current URL.
        window.onclick = function () {
            var collectedData = {
                key: window.location.host.toString() + window.location.pathname.toString(),
                value: {
                    payload: clicked,
                    timestamp: +new Date()
                }
            };
            localstorage.set(collectedData, function (storageResult) {
                logger.debug("err = " + storageResult.hasError());
                logger.debug("storageResult = " + storageResult);
            }, false, true, true);
            clicked++;
        };
    };
}

// Register the collector with the client's event system.
var clickFetcher = new ClickFetcher();
starlogclient.on(starlogclient.COLLECTINGDATA, clickFetcher.collectData);

Client Live Demo

https://localhost:3001/test/1-page-stub.html

REALTIME STREAM PROCESSING Producer Libraries

-  LoggingComponent: provides interfaces, filters and handlers

-  LoggingBundle: glues it all together for Symfony2

-  Drupal Logging Module: uses the LoggingComponent

-  JS Frontend Client: LogClient framework for browsers

https://github.com/ICANS/IcansLoggingComponent

https://github.com/ICANS/IcansLoggingBundle

https://github.com/ICANS/drupal-logging-module

https://github.com/DECK36/starlog-js-frontend-client

Realtime Stream Processing - PHP & Storm

UseCase: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge

Using PHP for that (core logic sketched below)! https://github.com/Lazyshot/storm-php/blob/master/lib/storm.php

[Flow: Clicked-Data-UMF → queue → Storm topology → event: "Star Trek Commander" badge]
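The badge logic inside such a bolt boils down to a counter. A minimal sketch (class, method and event names are our own; reading tuples from and emitting to Storm is left to the storm-php library):

<?php
// Minimal sketch of the bolt's core logic: count clicks per domain and
// award a badge on every 10th click. Names are illustrative.
class BadgeCounter
{
    private $counts = array();

    // Returns a badge event for every 10th click on a domain, null otherwise.
    public function handleClick($domain)
    {
        if (!isset($this->counts[$domain])) {
            $this->counts[$domain] = 0;
        }
        $this->counts[$domain]++;

        if ($this->counts[$domain] % 10 === 0) {
            return array(
                'event'  => 'badge.awarded',
                'badge'  => 'Star Trek Commander',
                'domain' => $domain,
            );
        }

        return null;
    }
}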

Storm & PHP Live Demo

REALTIME STREAM PROCESSING Get Inspired!

Powered by Storm: https://github.com/nathanmarz/storm/wiki/Powered-By

-  50+ companies (Twitter, Yahoo, Groupon, Ooyala, Baidu, Wayfair, …)

-  Ads & real-time bidding, Data-centric (Economic, Environmental, Health), User interactions

Language-agnostic backend systems (Operate Storm, Develop in PHP)

Streaming "counts": Sentiment Analysis, Frequent Items, Multi-armed Bandits, …

DRPC: Custom user feeds, complex queries (e.g. tracing graph links)

Realtime, distributed ETL

-  Buffering / Retries

-  Integrate Data: Third-party API, Machine Learning

-  Store to DBs, search engines, etc.

Questions?

Thanks a lot!

You can find us:

github.com/DECK36

info@deck36.de

deck36.de