Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Platform Implemented in...


Presenter: Ülker Ciftci, Senior Expert Architect at Turkcell

In this session, hear how a leading telecom operator integrates complementary real-time big data processing technologies, namely Apache Kafka, Apache Storm and DataStax Cassandra, to build a distributed, fast, fault-tolerant and highly scalable mobile marketing platform. As mobile app marketers, telecom operators can target customers better and personalize offers by collecting customer behavior data and segmenting customers by behavior. There are currently 50 mobile applications in Turkcell's Mobile App Store, and they receive almost 100 million hits per day. As more mobile applications and more users join Turkcell Curio, the data set generated by customer behavior grows each day. The main challenge facing mobile marketers is real-time big data processing, which requires low latency, high availability and high scalability. A second requirement, processing each user action with exactly-once semantics for the sake of reliability, makes the challenge even bigger. Turkcell Curio, named after the Mars rover Curiosity, was developed within Turkcell to solve these challenges. Curio is now in production, giving Turkcell's mobile marketers valuable real-time statistics and reports, and even the chance to interact with online customers via another platform, Turkcell's Push Notifications Platform.

Transcript of Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Platform Implemented in...

Page 1: Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Platform Implemented in Apache Kafka, Storm, and Cassandra
Page 2:

CURIO:

A Mobile Marketing Platform

Ülker ÖZGEN ÇİFTÇİ

TURKCELL

Page 3:

About Turkcell

9 COUNTRIES, 71.3 MILLION MOBILE SUBSCRIBERS

EUROPE’S SECOND LARGEST OPERATOR

400 DEVELOPERS

Page 4:

Content

• Mobile marketing platform: Curio

• Curio’s architecture (Storm + Kafka + Cassandra)

• Use cases about Cassandra

Page 5:

About Curio

• Mobile marketing platform

• Now serving 80+ mobile applications in production

• Nearly 100 million transactions/day

• Real time interaction with users (via Push Notifications)

Page 6:

Example Analytics Data

Page 7:
Page 8:

Apache Kafka

• Distributed publish-subscribe messaging system

• Open sourced by LinkedIn

Page 9:

Apache Kafka – Features

• Fast

• Scalable

• Durable

• Distributed by Design

Page 10:

Apache Storm

• Distributed fault-tolerant realtime computation system

• Open sourced by Twitter

• Written in Clojure and Java

Page 11:

Apache Storm – Features

• Runs "Topologies"

• Clustered Structure

• Master node Nimbus

• Worker node Supervisor

• State is kept in ZooKeeper

Page 12:

Apache Storm – Features II

• Integrates with any queueing and database system

• Kestrel, RabbitMQ / AMQP, Kafka, JMS, ...

• Simply connect with your database

• Simple API / Trident API

Page 13:

Apache Storm – Features III

• Scalable

• Benchmarks clocked Storm at over 1,000,000 tuples per second per node

• Fault-tolerant

• Guarantees your data will be processed (exactly-once processing is guaranteed by the Trident API)
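One common way to get exactly-once state updates on top of replayed batches is to store a batch's transaction id next to the state and skip any batch whose id has already been applied. The following is a minimal Python sketch of that idea only, not Trident's actual API; `TransactionalCounter` and `apply_batch` are hypothetical names, and the sketch assumes batches are applied in increasing transaction-id order.

```python
class TransactionalCounter:
    """Counter that applies each batch at most once, keyed by transaction id."""

    def __init__(self):
        self.count = 0
        self.last_txid = None  # id of the last batch folded into `count`

    def apply_batch(self, txid, tuples):
        # A replayed batch carries the same txid, so it is skipped.
        # Assumption: batches are applied in strictly increasing txid order.
        if self.last_txid is not None and txid <= self.last_txid:
            return False
        self.count += len(tuples)
        self.last_txid = txid
        return True


counter = TransactionalCounter()
counter.apply_batch(1, ["visit", "visit"])
counter.apply_batch(2, ["visit"])
counter.apply_batch(2, ["visit"])  # replay after a failure: ignored
print(counter.count)  # 3
```

Because the transaction id and the count are updated together, a replay after a failure cannot double-count a user action.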

Page 14:

Curio Topologies

• Visit Topology (heavy reads & writes to Cassandra)

• 24 parallel and partitioned tasks processing raw data

• 12 parallel consolidating tasks processing the pre-processed data

• Push Topology

• 5 parallel tasks for sending push notifications
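"Partitioned" tasks usually means that every tuple with the same key is routed to the same task (Storm calls this a fields grouping). A small sketch of that routing, assuming the partition key is the visitor code, which the slides do not state; `task_for` is a hypothetical helper.

```python
import zlib

VISIT_TASKS = 24  # number of raw-data tasks, from the slide


def task_for(key: str, num_tasks: int = VISIT_TASKS) -> int:
    """Map a partition key to a task index with a stable hash."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


# Every tuple for the same (hypothetical) visitor lands on the same task.
assert task_for("visitor-42") == task_for("visitor-42")
print(task_for("visitor-42"))
```

Routing by key keeps all of one visitor's session, screen, and event tuples on a single task, so per-visitor state needs no cross-task coordination.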

Page 15:

Topology To Cassandra

• All stored in Cassandra

• Mobile application launching/closing (creating and ending session)

• Page navigations (creating and ending screen hits)

• Event triggers (creating events)

• Counted values (relevant summary tables)

Page 16:

Curio Use Cases

• Use Case – I

• Calculating online user counts in real time

• Use Case – II

• Calculating active user counts in real time

Page 17:

Use Case I:

Counting Online Users

Page 18:

Requirement

• Counting online users for each mobile application

• A user is online if, within the session timeout duration, the user:

• Opens a session

• Navigates through screens

• Triggers events

Page 19:

First Implementation

• Store online requests into a single table

• The table's default compaction strategy is SizeTieredCompactionStrategy

Page 20:

First Implementation (cont.)

• Insert a row with a TTL for each request that counts as "online"

• Delete the row on session-end requests

• Use a count query when online counts are requested
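The behavior of this first design can be modeled in a few lines. The sketch below is an in-memory stand-in for the table, not Cassandra code; `OnlineTable`, its method names, and the 30-minute timeout are assumptions made for illustration.

```python
class OnlineTable:
    """In-memory stand-in for a table whose rows expire after a TTL."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.expires_at = {}  # (app, visitor_code) -> expiry time in seconds

    def mark_online(self, app, visitor_code, now):
        # Insert (or overwrite) with a fresh TTL on every "online" request.
        self.expires_at[(app, visitor_code)] = now + self.session_timeout

    def end_session(self, app, visitor_code):
        # Explicit delete on a session-end request.
        self.expires_at.pop((app, visitor_code), None)

    def count_online(self, app, now):
        # Equivalent of the count query: rows for the app that have not expired.
        return sum(
            1
            for (row_app, _), expiry in self.expires_at.items()
            if row_app == app and expiry > now
        )


table = OnlineTable(session_timeout=30 * 60)  # 30 minutes is an assumed value
table.mark_online("app1", "u1", now=0)
table.mark_online("app1", "u2", now=600)
table.end_session("app1", "u1")
print(table.count_online("app1", now=900))   # 1 (u2 only)
print(table.count_online("app1", now=3000))  # 0 (u2 expired at 2400)
```

Note that every request in this design is a write with a TTL or a delete, and every report is a count over the whole application's rows.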

Page 21:

First Problem

• Storm performs inserts, updates and deletes on the "visit_online" table

• The performance of these queries became 100 times worse than before

• The cause was identified as "SizeTieredCompactionStrategy"

Page 22:

Solution I

• Use "LeveledCompactionStrategy"

• Storm query times returned to their normal value of 50 ms/tuple
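In CQL the compaction strategy is a table property, so the switch is a single ALTER TABLE statement. A small sketch that builds it; the helper name is hypothetical, and the keyspace is not given on the slides.

```python
def leveled_compaction_cql(table: str) -> str:
    """Build the CQL that switches a table to LeveledCompactionStrategy."""
    return (
        f"ALTER TABLE {table} "
        "WITH compaction = {'class': 'LeveledCompactionStrategy'};"
    )


# "visit_online" is the table named on the slides; its keyspace is not given.
print(leveled_compaction_cql("visit_online"))
```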

Page 23:

Second Problem

• Count queries started timing out under heavy traffic

• e.g. applications with 500,000 transactions/day

Page 24:

The Solution

• Redesign the online table:

• Add a new column, valid_through, ordered DESC, to identify "online" status

• TTL duration changed to 3 days (session_timeout is no longer used as the TTL)

Page 25:

The Solution (cont)

• Do not perform manual deletions

• Insert single row for each transaction (session, screen, event)

• No more count query! The calculation is performed in Java by selecting fewer records with:
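The query itself is not preserved in this transcript. A minimal sketch of the idea, under stated assumptions: each transaction inserts one row whose valid_through is the request time plus the session timeout, rows are kept ordered by valid_through, and the application reads only rows with valid_through later than "now" and de-duplicates visitors itself. The sketch is in Python rather than Java, and `OnlinePartition` and its methods are hypothetical.

```python
import bisect


class OnlinePartition:
    """One application's rows, kept ordered by valid_through."""

    def __init__(self):
        # Parallel lists sorted ascending by valid_through.
        # (The slides order the column DESC; ascending is used here for bisect.)
        self.valid_through = []
        self.visitors = []

    def insert(self, visitor_code, now, session_timeout):
        # One row per transaction; nothing is ever deleted by hand.
        expiry = now + session_timeout
        i = bisect.bisect_right(self.valid_through, expiry)
        self.valid_through.insert(i, expiry)
        self.visitors.insert(i, visitor_code)

    def online_count(self, now):
        # Range scan: only rows with valid_through > now are read.
        start = bisect.bisect_right(self.valid_through, now)
        # De-duplicate in application code instead of a server-side count.
        return len(set(self.visitors[start:]))


p = OnlinePartition()
p.insert("u1", now=0, session_timeout=1800)    # valid through 1800
p.insert("u1", now=600, session_timeout=1800)  # valid through 2400
p.insert("u2", now=900, session_timeout=1800)  # valid through 2700
print(p.online_count(now=1000))  # 2 (u1 and u2)
print(p.online_count(now=2500))  # 1 (u2 only)
```

Because the rows are ordered by valid_through, the read touches only the still-valid slice of the partition, and stale rows are left for the 3-day TTL to remove.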

Page 26:

Use Case II:

Counting Active Users

Page 27:

Requirement

• Users have unique "visitor_code" for each application

• The most requested report is "active user count", so it must perform well

• Active users = new users + returning users

• New users can be summed up daily

• Returning users need calculation with time interval

Page 28:

Solution

• Create a new table for storing session intervals

• Use a column, previous_timestamp, to store the previous session time

• This column is ordered DESC

Page 29:

Solution

• Calculate returning users by:

• Running a query for each date in the selected date interval

• Select records whose previous login time is earlier than the start time of the query
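The calculation can be sketched as follows. This is an in-memory Python model, not the production queries: the record layout (visitor code plus previous_timestamp, with None marking a visitor's first ever session) and the function names are assumptions for illustration.

```python
from datetime import date, datetime, timedelta


def days_in(start, end):
    """Yield each date in the inclusive interval [start, end]."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def returning_users(sessions_by_day, start, end):
    """Visitors active in [start, end] whose previous session predates start."""
    interval_start = datetime.combine(start, datetime.min.time())
    returning = set()
    for day in days_in(start, end):  # one "query" per date
        for visitor_code, previous_timestamp in sessions_by_day.get(day, []):
            # Only a visitor's first session in the interval has a previous
            # session before the interval start, so each visitor matches once.
            if previous_timestamp is not None and previous_timestamp < interval_start:
                returning.add(visitor_code)
    return len(returning)


def new_users(sessions_by_day, start, end):
    """Sum the daily counts of first-ever sessions (previous_timestamp is None)."""
    return sum(
        1
        for day in days_in(start, end)
        for _, previous_timestamp in sessions_by_day.get(day, [])
        if previous_timestamp is None
    )


sessions_by_day = {
    date(2014, 9, 1): [("u1", None)],
    date(2014, 9, 10): [("u1", datetime(2014, 9, 1, 8, 0)), ("u2", None)],
    date(2014, 9, 11): [
        ("u1", datetime(2014, 9, 10, 9, 0)),
        ("u3", datetime(2014, 8, 20, 12, 0)),
    ],
}
start, end = date(2014, 9, 10), date(2014, 9, 11)
active = new_users(sessions_by_day, start, end) + returning_users(sessions_by_day, start, end)
print(active)  # 3 (new: u2; returning: u1 and u3)
```

Filtering on previous_timestamp counts a returning visitor once per interval even when that visitor has many sessions inside it, which is what lets the per-day results be combined.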

Page 30:

View from GUI

Page 31:

Thank You

@ulkeroz