Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Spark Use Case at Telefónica CyberSecurity (CBS) Antonio Alcocer [email protected] Oscar Mendez [email protected] @omendezsoto 1 #CassandraSummit 2014


Transcript of Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Page 1: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Spark Use Case at Telefónica CyberSecurity (CBS)

Antonio Alcocer [email protected]

Oscar Mendez [email protected] @omendezsoto

#CassandraSummit 2014

Page 2: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

STRATIO: Who are we?

• Stratio is a Big Data Company

• Founded in 2013

• Commercially launched in 2014

• 50+ employees in Madrid

• Office in San Francisco

• Certified Spark distribution

Page 3: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer


Page 4: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

General info

o 1924-2014: 317+ million customers and 130,000+ employees

o 2nd European operator by revenue

o 4th global integrated operator by accesses

o 9th telco in the global ranking by market capitalization

o 2nd global operator by investment in R&D

Page 5: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Present in 24 countries


Page 6: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Their main brands


Page 7: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Other brands


Page 8: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Telefónica Global Solutions

Global Security Services

A global infrastructure to safeguard your business_

Page 9: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Managed Security


Page 10: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

CyberSecurity??


Page 11: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why????


A picture is worth a thousand words - but a film clip, a million!

Page 12: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why????


A picture is worth a thousand words - but a film clip, a million!

Page 13: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why????


A picture is worth a thousand words - but a film clip, a million!

Page 14: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Don’t worry…


Page 15: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

What is Cybersecurity?

“Cybersecurity is the collection of tools, policies… capabilities to protect the cyber environment and organization and user’s assets. Cybersecurity strives to prevent unauthorized access to, or manipulation of, the integrity, confidentiality, or availability of information, as well as unauthorized exfiltration of information.”

What does it mean for us?

No rules, just guidelines.

Page 16: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Examples of threats

Cassandra OpsCenter

World map

WordPress

Page 17: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

C* OpsCenter + Shodan


Page 18: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

C* OpsCenter + Shodan


Page 19: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Other threats

Page 20: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

CyberSecurity in numbers

Page 21: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Numbers

Threats

• DDoS (23%)

• SQLi (19%)

• Defacement (14%)

• Account Hijacking (9%)

• Unknown (18%)

Page 22: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Looking for unknown threats


Page 23: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

What did Telefónica need?

Page 24: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Joining efforts


Page 25: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Joining efforts

Page 26: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Required skills


Page 27: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Using

in

Page 28: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Use Case Architecture

We have three phases:

• Ingestion: based on Apache Kafka

• Data fusion: based on Apache Storm

• Batch & Analytics: based on Cassandra and Spark
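The three phases can be pictured as a chain of small steps. The sketch below is a hypothetical, in-memory stand-in, not the deployed system: a deque plays the role of the Kafka topic, a generator the role of the Storm topology, and a dict the role of the Cassandra table that a Spark job would later scan. All function names and the event layout are assumptions made for illustration.

```python
from collections import deque, defaultdict

def ingest(raw_events, topic):
    """Phase 1 (Kafka stand-in): append raw events to a queue."""
    for event in raw_events:
        topic.append(event)

def fuse(topic):
    """Phase 2 (Storm stand-in): drain the queue and normalize each event."""
    while topic:
        source, key, payload = topic.popleft()
        yield {"source": source, "key": key.strip().lower(), "payload": payload}

def store(records, table):
    """Phase 3a (Cassandra stand-in): group records by their key."""
    for record in records:
        table[record["key"]].append(record)

def count_by_source(table):
    """Phase 3b (Spark stand-in): a full scan that aggregates across all keys."""
    counts = defaultdict(int)
    for rows in table.values():
        for row in rows:
            counts[row["source"]] += 1
    return dict(counts)

def run_pipeline(raw_events):
    """Run the three phases in order and return the table plus one aggregate."""
    topic, table = deque(), defaultdict(list)
    ingest(raw_events, topic)
    store(fuse(topic), table)
    return table, count_by_source(table)
```

Calling `run_pipeline` with a list of `(source, key, payload)` tuples returns the keyed table and a per-source count, which mirrors the split between keyed storage and scan-style analytics.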

Page 29: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Data Acquisition

• The data come from several sources:

  • DNS traffic

  • IP

  • Social media

  • Underground sources

  • Government sources

  • …

• Several source consumers pull the information and push it into a Kafka cluster.

• The sources are heterogeneous and deliver data at varying speeds.

[Diagram "Data sources": eight "Sources" boxes, KAFKA, API]
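A rough sketch of how per-source consumers with different speeds could feed one topic is shown here. It assumes a simulated clock ("ticks") and uses a plain list in place of the Kafka cluster; the class name, the envelope fields and the source labels are hypothetical, and no real Kafka client is involved.

```python
import json

class SourceConsumer:
    """Pulls from one source at its own pace and wraps results in a common envelope."""

    def __init__(self, name, pull, every_n_ticks):
        self.name = name                    # illustrative label, e.g. "dns"
        self.pull = pull                    # callable returning a list of raw items
        self.every_n_ticks = every_n_ticks  # simulated polling period

    def poll(self, tick):
        if tick % self.every_n_ticks != 0:
            return []                       # this source is not due yet
        return [
            json.dumps({"source": self.name, "tick": tick, "raw": item})
            for item in self.pull()
        ]

def run_ticks(consumers, topic, ticks):
    """Push every due consumer's messages into `topic`, a list standing in for Kafka."""
    for tick in range(ticks):
        for consumer in consumers:
            topic.extend(consumer.poll(tick))
    return topic
```

Wrapping every item in the same envelope is one way to cope with heterogeneous sources: downstream code only has to understand the envelope, not each source's native format.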

Page 30: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Data fusion

• We use Storm to process and normalize the information.

• The system must fire alerts to the analysts.

• This use case required a Big Data component capable of processing the data and extracting information from it in real time.

• Warnings and alerts are time-sensitive: dealing efficiently with security attacks depends on them arriving promptly.
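The normalize-then-alert idea can be sketched without Storm. The example below is an assumption-laden illustration: the `Event` shape, the "netflow" source label, the raw field names (`qname`, `src_ip`, `ts`) and the watchlist check are all hypothetical, chosen only to show records being mapped to one schema and checked as they arrive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str
    entity_type: str   # "ip" or "domain"
    entity: str
    timestamp: int     # seconds since epoch

def normalize(source, raw):
    """Map a source-specific dict onto the common Event shape."""
    if source == "dns":
        return Event(source, "domain", raw["qname"].rstrip(".").lower(), raw["ts"])
    if source == "netflow":
        return Event(source, "ip", raw["src_ip"], raw["ts"])
    raise ValueError(f"no normalizer for source {source!r}")

def process(stream, watchlist, alert):
    """Normalize each record as it arrives and alert at once on watchlisted entities."""
    for source, raw in stream:
        event = normalize(source, raw)
        if event.entity in watchlist:
            alert(event)   # fired per record, before the stream is fully consumed
        yield event
```

Because `process` is a generator, an alert is raised as soon as the matching record is read rather than after a batch completes, which is the property the slide asks for.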

Page 31: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Batch

• The data are saved in Cassandra.

• We use Cassandra directly for the easy queries.

• We use Spark to extract the information that Cassandra cannot serve directly.

[Diagram: "Data process", with three "INTEGRATION" labels]
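The difference between an "easy" query and one that needs a processing engine can be illustrated with plain Python. This is only a sketch under assumptions: `TABLE` is a made-up dataset, a dict stands in for a table partitioned by key, and the map/reduce function mimics the kind of full scan a Spark job would run.

```python
from collections import Counter
from functools import reduce

# Hypothetical table: partition key -> rows, as a key-value store would hold it.
TABLE = {
    "10.0.0.1": [{"domain": "a.example"}, {"domain": "b.example"}],
    "10.0.0.2": [{"domain": "a.example"}],
    "10.0.0.3": [{"domain": "a.example"}, {"domain": "c.example"}],
}

def rows_for_ip(table, ip):
    """Easy query: a single-partition lookup by key."""
    return table.get(ip, [])

def domains_by_ip_count(table):
    """Harder query: needs every partition, so it is expressed as map + reduce."""
    per_partition = (
        Counter({row["domain"] for row in rows})   # map: distinct domains per IP
        for rows in table.values()
    )
    return reduce(lambda a, b: a + b, per_partition, Counter())  # reduce: merge counts
```

`rows_for_ip` touches one key, whereas `domains_by_ip_count` has to visit all of them; the second kind of question is the one that is awkward to answer with key-based lookups alone.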

Page 32: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why did we use C*?

Because we needed its features:

• P2P architecture

• Read/write performance

• Fault tolerance

• Easy to deploy

• CQL

Page 33: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why did we use C*?

• We also needed a data model:

• Storm normalizes the data by source.

• The primary key is the source key (e.g. the IP) plus a timestamp, which is used to split the clustering key.

• Every data row has view tables holding the relationships between entities: IP, DNS, domain…

Table name: IP
Columns: IP | timestamp | timesplit | … | Domain | …
Primary key: ((IP, timestamp), timesplit)

Table name: IP_Domain
Columns: Domain | timestamp | timesplit | IP1 | … | IPn
Primary key: ((Domain, timestamp), timesplit)
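One possible reading of this model is sketched below with nested dicts. It is an interpretation, not the authors' schema: it assumes `timestamp` is a coarse time bucket in the partition key and `timesplit` is the finer clustering value, and the bucket width, class name and method names are invented for the example.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # assumed bucket width; the slides do not state one

class Store:
    def __init__(self):
        # partition key -> {clustering key -> row}
        self.ip = defaultdict(dict)          # main table "IP"
        self.ip_domain = defaultdict(dict)   # view table "IP_Domain"

    def write(self, ip, domain, epoch_seconds):
        timestamp = epoch_seconds - epoch_seconds % BUCKET_SECONDS  # partition bucket
        timesplit = epoch_seconds                                   # clustering column
        self.ip[(ip, timestamp)][timesplit] = {"domain": domain}
        # Denormalized write: the view is keyed by domain and collects the IPs seen.
        ips = self.ip_domain[(domain, timestamp)].setdefault(timesplit, set())
        ips.add(ip)

    def ips_for_domain(self, domain, timestamp):
        """Read the view: all IPs seen for a domain within one bucket."""
        rows = self.ip_domain.get((domain, timestamp), {})
        return sorted(set().union(*rows.values())) if rows else []
```

Writing each observation to both the main table and the view trades extra writes for reads that never need a join, which is the usual motivation for view tables in a key-partitioned store.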

Page 34: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why did we use C*?

IP main table
Table name: IP
Columns: IP | timestamp | timesplit | … | Domain | …
Primary key: ((IP, timestamp), timesplit)

IP view for domain
Table name: IP_Domain
Columns: Domain | timestamp | timesplit | IP1 | … | IPn
Primary key: ((Domain, timestamp), timesplit)

Domain main table
Table name: domain
Columns: Domain | timestamp | timesplit | … | IP | …
Primary key: ((domain, timestamp), timesplit)

Domain view for IP
Table name: Domain_IP
Columns: IP | timestamp | timesplit | domain1 | … | domainn
Primary key: ((IP, timestamp), timesplit)

Page 35: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

What have we learned?

Page 36: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

THE BEST OF BOTH WORLDS COMBINED

“Two plus two is four? Sometimes… Sometimes it is five.”

G. Orwell

Combination wins

Page 37: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

RISK

Combination = add more and more products to the Stack

Page 38: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Complexity

Hybrid Hadoop + Spark platforms

Hybrid = complexity

Page 39: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Why Spark

1. One stack to rule them all

RDD-Based Matrices

Interactive

Batch processing

Stream processing

Batch, Interactive [SQL], Streaming, Machine Learning

• Learn just one system

• Develop within one framework

• Deploy/Manage just one system

Source: Databricks co-founder & CTO Matei Zaharia

Page 40: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Be rational not only emotional

Page 41: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

The only Pure Spark processing

No Hadoop elements

+10-year-old constraints

Page 42: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Lean simplicity

Pure Spark Platform

Former Hadoop or Hybrid Hadoop-Spark Platforms

Lean = Easier deployment, management, and use of the system

Page 43: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

STRATIO ADMIN

STRATIO DATAVIS

STRATIO INGESTION

STRATIO CROSSDATA (SPARK)

CASSANDRA | MONGODB | ELASTICSEARCH | HDFS

STRATIO STREAMING (SPARK STREAMING, SIDDHI)

Delivering a real project for a big company, as opposed to a PoC, is very demanding

SPARK CERTIFIED

Page 44: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Multiple Combination

https://github.com/Stratio/stratio-meta

API

Elastic S

Page 45: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Full text search + queries

[Diagram: five C* nodes, each with its own Lucene index]

SELECT * FROM logs WHERE description MATCH '*Exception';
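The per-node index idea can be sketched in a few lines. This is a simplified illustration and not the Lucene or Stratio implementation: each hypothetical `Node` keeps a local inverted index of whitespace-separated terms, wildcard matching uses Python's `fnmatch`, and `search` scatters the pattern to all nodes and merges the answers.

```python
import fnmatch
from collections import defaultdict

class Node:
    """One storage node holding its own rows plus a local inverted index."""

    def __init__(self):
        self.rows = {}                       # row id -> description
        self.index = defaultdict(set)        # term -> row ids

    def insert(self, row_id, description):
        self.rows[row_id] = description
        for term in description.split():
            self.index[term].add(row_id)

    def match(self, pattern):
        hits = set()
        for term, row_ids in self.index.items():
            if fnmatch.fnmatchcase(term, pattern):   # wildcard test on indexed terms
                hits |= row_ids
        return {row_id: self.rows[row_id] for row_id in hits}

def search(nodes, pattern):
    """Scatter the query to every node and merge the local results."""
    merged = {}
    for node in nodes:
        merged.update(node.match(pattern))
    return merged
```

Keeping the index next to the data means each node answers only for its own rows, so the coordinator's job reduces to merging partial results.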

Page 46: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Stratio Streaming

• Start using Spark Streaming for some Complex Event Processing operations.

https://github.com/Stratio/stratio-streaming
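As an illustration of what a Complex Event Processing rule looks like, the sketch below fires when enough events for the same key arrive within a time window. It is a generic sliding-window example, not the Stratio Streaming, Spark Streaming or Siddhi API; the class name and parameters are hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque

class ThresholdRule:
    """Fires when `threshold` events for the same key fall within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.recent = defaultdict(deque)   # key -> timestamps still inside the window

    def on_event(self, key, timestamp):
        times = self.recent[key]
        times.append(timestamp)
        while times and timestamp - times[0] >= self.window:
            times.popleft()                # evict events that left the window
        return len(times) >= self.threshold
```

A rule such as `ThresholdRule(threshold=3, window=10)` would flag a key that produces three events in under ten seconds while ignoring the same three events spread over a longer period.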

Page 47: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

DATA JOURNEY THROUGH TIME

PAST

PRESENT

FUTURE

Stored data

Real Time Data

Streaming

ML Algorithms

Ephemeral Tables

Stored Tables

SQL combination: Done

SQL combination: In progress

Quantum Tables

Page 48: Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Thanks in advance
