Flutura's presentation @ Big Data Conclave
Agenda
• 3 Industries , 5 real life Flutura user stories
• 7 Key “Gotchas” & Big Data Best Practices
Case Study-1 : Reducing Network threats by Detecting Patterns in
perimeter device logs
What is the Biz problem being solved ?
Network threats are growing ...
• 2 types of threats – Internal ( Social Unrest & Watch List ) & External ( Hackers )
External hackers Internal Activists
Who is experiencing the pain ? Telecom Security Operations centre
Lots of Telecom Machine data left untapped !
This is typically flushed but has gold in it
Why is it important to solve this problem?
• Reduces network disruption from hackers
• Minimize social disruption and unrest
Traditional RDBMS architectures can't handle high velocity machine data !
SOCs can't see threat patterns … running BLIND
• Being Blind = Risk
• Cannot be blind to patterns anymore
• The capability to “see” patterns previously not seen
• Network activity and behaviour – Firewalls , routers
• Saves lives, provides social stability – WL Chatter !
Capability to remove “data blind folds” to “SEE” behavioural patterns key to
security
MACHINE DATA
KEY TO UNCOVERING
SECURITY PATTERNS !
What are some “behavioural signatures” ?
1. Sudden increase in YouTube uploads @ night
2. Viral rate of propagation of MMS videos
So what does the data look like ? National content filtering log – 1 billion events/day !
1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37
Decoding 7 components of the Netsweeper log entry
1. EPOCH time stamp
2. URL requested
3. Source IP
4. Client subnet
5. Client group name
6. Denied flag ( 0 allowed, 1 denied )
7. URL category ( description tbd )
50 categories in the system
Education, Pornography, Phishing, Criminal Skills etc
"23" - related to "Pornography" ; "45" - related to "GENERAL"
Decoding National content filtering logs
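The seven fields of the sample Netsweeper entry can be split out with a few lines of Python. This is a minimal sketch, assuming the log is whitespace-delimited and that URLs contain no spaces; the `FilterEvent` class and `parse_filter_log` function are illustrative names, not part of any Netsweeper tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FilterEvent:
    timestamp: datetime   # field 1: epoch time stamp
    url: str              # field 2: URL requested
    source_ip: str        # field 3: source IP
    client_subnet: str    # field 4: client subnet
    client_group: str     # field 5: client group name
    denied: bool          # field 6: 0 allowed, 1 denied
    category_id: int      # field 7: URL category code


def parse_filter_log(line: str) -> FilterEvent:
    """Parse one log line (assumed layout: 7 whitespace-separated fields)."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    epoch, url, source_ip, subnet, group, flag, category = parts
    return FilterEvent(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        url=url,
        source_ip=source_ip,
        client_subnet=subnet,
        client_group=group,
        denied=(flag == "1"),
        category_id=int(category),
    )


sample = (
    "1329031890 http://photogallery.indiatimes.com/photo/4686985.cms "
    "94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37"
)
event = parse_filter_log(sample)
```

At 1 billion events/day a parser like this would run inside a distributed ingestion job rather than a single process; the sketch only shows the field mapping.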
Expand to ingest variety of watched events
File Delete Events
User Login Failure Events
Root access Failures
2 Sigma events
Table Drop Events
Table Delete Events
Column Drop Events
Critical Proc recompilation
OS logs Database logs
Critical tsn value changes
Master data changes
App login failures
Login at unusual time windows
Application logs
Search for specific keywords
2 Sigma event for URLs
Decomposition tree - failed requests
Login Failure
Web server logs
Dropped call frequency
Watch List inbound/outbound
Cut calls - poor connection
Call Failure event frequency
Timeout event frequency
Swarm event detected
Dropped IP calls frequency
Failed IP call frequency
CDR logs IPR logs
SMS Capacity events
Unusual sms traffic events
User defined router events
Compliance related router event
Router logs
Odd hour Unsuccessful logins
X happens Y times in Z time
User defined firewall events
Compliance oriented firewall e
Firewall logs
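A rule of the form "X happens Y times in Z time" can be expressed as a sliding-window counter. The sketch below is one possible shape for such a rule, assuming events arrive in timestamp order; the `FrequencyRule` class, the `login_failure` event name and the thresholds are hypothetical.

```python
from collections import defaultdict, deque


class FrequencyRule:
    """Fires when `event_type` occurs `threshold` times within `window_seconds`."""

    def __init__(self, event_type: str, threshold: int, window_seconds: float):
        self.event_type = event_type
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # key (e.g. user or device) -> timestamps

    def observe(self, key: str, event_type: str, ts: float) -> bool:
        """Record one event; return True if the rule fires for this key."""
        if event_type != self.event_type:
            return False
        window = self._recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the time window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


# Hypothetical rule: 3 login failures within 60 seconds for the same user.
rule = FrequencyRule("login_failure", threshold=3, window_seconds=60)
results = [
    rule.observe("user_a", "login_failure", ts)
    for ts in (0, 10, 20, 200)
]
```

Here the third failure triggers the rule, while the failure at t=200 does not because the earlier ones have aged out of the window.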
• Frequency of login failures high in certain pockets
• Recency of late night events noticed in certain pockets
• Certain corridors experiencing high dropped calls
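One simple way to surface pockets with unusually high activity is a "2 sigma" check: flag any pocket whose count lies more than two standard deviations above the mean. This is a minimal sketch using the sample standard deviation; the pocket names and counts are made-up illustration data, and a production system would likely compare against a historical baseline instead.

```python
from statistics import mean, stdev


def two_sigma_outliers(counts: dict, sigmas: float = 2.0) -> list:
    """Return keys whose count exceeds mean + sigmas * sample standard deviation."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []
    return sorted(key for key, value in counts.items() if value > mu + sigmas * sd)


# Illustrative login-failure counts per pocket (hypothetical data).
login_failures = {
    "pocket_a": 10, "pocket_b": 12, "pocket_c": 11, "pocket_d": 9,
    "pocket_e": 10, "pocket_f": 11, "pocket_g": 60,
}
flagged = two_sigma_outliers(login_failures)
```

With only a handful of pockets a single extreme value inflates the standard deviation, so the threshold is less sensitive than it looks; more data points make the check sharper.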
Converting raw data to Actionable Intelligence
• INTEGRATED EVENT 360 REPOSITORY
• SENSE & RESPOND LAYER
• LOG FILE INGESTION
• MACHINE LEARNING ALGORITHMS ON GRANULAR LOG EVENT DATA
• INFER INTENT FROM PATTERNS AND CREATE EVENT PROFILES
• LOAD RISK / BEHAVIOR PROFILE TO RULES ENGINE DB
• INTERCEPT OR OFFLINE REVIEW OF EVENTS
• CONSOLIDATE & REVIEW EVENT INTERCEPTS TO ASSESS EVENT RULE EFFECTIVENESS
• MEASURE PATTERN RULE EFFECTIVENESS - TRUE POSITIVE / FALSE POSITIVES
• CASE MANAGEMENT WORKFLOW
• TELECOM SWITCHES, OTHER DEVICES
  • CDR LOG FILES
  • IP LOG FILES
  • MISC LOG FILES
Holistic Value Chain
BIG DATA REPOSITORY
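Measuring pattern rule effectiveness by true and false positives can be as simple as computing, per rule, the share of intercepts that reviewers confirmed. The sketch below assumes each reviewed intercept is recorded as a `(rule_id, confirmed)` pair; the rule names are hypothetical.

```python
from collections import Counter


def rule_precision(reviews):
    """Return {rule_id: true positives / all intercepts} from reviewed intercepts."""
    true_positives = Counter()
    total = Counter()
    for rule_id, confirmed in reviews:
        total[rule_id] += 1
        if confirmed:
            true_positives[rule_id] += 1
    return {rule_id: true_positives[rule_id] / total[rule_id] for rule_id in total}


# Hypothetical case-management outcomes: (rule that fired, confirmed by analyst?)
reviews = [
    ("late_night_upload", True),
    ("late_night_upload", False),
    ("late_night_upload", True),
    ("login_burst", False),
]
precision = rule_precision(reviews)
```

Rules with persistently low precision are candidates for retuning or retirement, which is the feedback loop the value chain describes.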
Case Study-2 : Decoding travellers' intent
What's the problem we are trying to solve ?
• Travellers are “signalling” to us thru the behaviour they exhibit
• OTA is unable to sense and respond to these varied behaviours
Why is it important to solve this problem ?
• Impacts look-to-book ratio
• Increase revenue from cross sell
Srikanth intends to travel from San Fran to NYC
Srikanth searches !
Srikanth's First Moment of Truth !
Srikanth sees the options rendered !
Is Srikanth Price Sensitive or Time conscious traveller?
87 % 13%
Does Srikanth have a bias towards any
airline ?
Those small clicks reveal a lot !
So who is Srikanth? Do we 'know' him ?
What's his behavioural DNA ? Key vectors ?
• Early bird ( days = 21 )
• Price insensitive ( click % = 89 % )
• Prefers American Airlines
• Most valuable customer ( Decile-1 )
• Intra visit interval = 17 days
• Visit dispersion = 12 % International
• Churn propensity = 0
• Bargain hunter = No ( 3 % coupon )
• Roadie = Yes ( 28000 miles per qtr )
• Sentiment index = 73 %
How do we respond in real time to Srikanth's experience and the behavioural patterns we’ve seen ?
• If Srikanth is a high value customer
• If he does not book within 8 min window
• In real time route to high performing agent
• Short circuit the queue
• Extra 10 % discount since he is vulnerable
• If search response time velocity is trending downward
• Signal to beef up infrastructure
• Optimise code base
• Property recommendations
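The first response rule above (high value customer, no booking within the 8 minute window) could be sketched as a small rule function. This is an illustration only: the `Session` fields and action names are hypothetical, and "high value" is assumed to mean Decile-1.

```python
from dataclasses import dataclass


@dataclass
class Session:
    value_decile: int            # 1 = most valuable customers (assumption)
    minutes_since_search: float  # time elapsed since the search was made
    booked: bool                 # has the traveller completed a booking?


def next_actions(session: Session) -> list:
    """Return the real-time responses for a session, or an empty list."""
    actions = []
    high_value = session.value_decile == 1
    window_missed = not session.booked and session.minutes_since_search > 8
    if high_value and window_missed:
        actions += [
            "route_to_high_performing_agent",
            "short_circuit_queue",
            "offer_extra_10_percent_discount",
        ]
    return actions


at_risk = next_actions(Session(value_decile=1, minutes_since_search=9.0, booked=False))
still_browsing = next_actions(Session(value_decile=1, minutes_since_search=5.0, booked=False))
```

In practice the thresholds would live in a rules engine so that business users can tune them without a code change.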
Case Study-3 : Watched List
What is the problem being solved?
• Internal watch lists
• Can we get e signals in their behavior ? Call patterns ?
SMS patterns ?
YouTube upload patterns ?
Watched countries ?
Intrawatch list chatter ?
Late night communication behavior ?
• Watch list activity intelligence takes 6 weeks
• Bring it down to < 1 day
• Enhance it to make it real time
Why is it important to solve this problem ?
• Threat signals are there in telecom and communication logs
• Saves lives !
• Ensures national security !
Under the hood
• Remote Authentication Dial-In User Service (RADIUS) provides authentication, authorization and accounting for network access.
• When a user wants to get access to the Internet, he first has to give his user credentials (in most cases username and password) to a local RADIUS client.
Deconstructing RADIUS Logs
The IP address of the NAS ( Network Access server ) that is sending the request
The framed address to be configured for the user
3 time stamps
User Identity
Radius logs Netsweeper logs
Subscriber database
Rich Security intelligence !
Triangulate from 3 event data pools
Access/Device
Framed IP address
Customer ethnicity
URL accessed
Date/time
Day
Week
Client IP address
Customer type
Customer browse location
Post paid Subscriber Database
1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37
Status
Enterprise
Residential
Asian
European
Dubai
Smart Phone
Desktop
iPad
Others
URL Type
Gaming sites
News sites
Others
?
? Yes
No
Business rule to derive access device to be elicited from
SME
Location mapping business logic to be elicited from SME
Social Networking
Blogs
P2P sites
VPN/VOIP
NAS Port Id
Username Nas port id RADIUS Logs
Correlating fragmented telecom log files - Info model
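Triangulating the three event data pools amounts to a join: a web event's IP address is matched to the RADIUS session that held that framed IP at that time, which yields a username, which in turn looks up the subscriber record. The sketch below assumes this join logic; all field names (`framed_ip`, `start`, `stop`, `customer_type`) and the sample records are hypothetical, and real RADIUS accounting logs would need their own parser.

```python
def attribute_events(web_events, radius_sessions, subscribers):
    """Enrich each web event with the user and subscriber profile behind it."""
    enriched = []
    for event in web_events:
        for session in radius_sessions:
            same_ip = session["framed_ip"] == event["source_ip"]
            in_session = session["start"] <= event["ts"] <= session["stop"]
            if same_ip and in_session:
                profile = subscribers.get(session["username"], {})
                enriched.append({**event, "username": session["username"], **profile})
                break  # at most one session owns an IP at a given time
    return enriched


# Hypothetical sample data; the same IP is reassigned between two users.
radius_sessions = [
    {"username": "user_a", "framed_ip": "192.0.2.10", "start": 1000, "stop": 2000},
    {"username": "user_b", "framed_ip": "192.0.2.10", "start": 2001, "stop": 3000},
]
subscribers = {
    "user_a": {"customer_type": "Residential"},
    "user_b": {"customer_type": "Enterprise"},
}
web_events = [
    {"ts": 1500, "source_ip": "192.0.2.10", "url": "http://example.com/news"},
    {"ts": 2500, "source_ip": "192.0.2.10", "url": "http://example.com/blog"},
    {"ts": 5000, "source_ip": "192.0.2.10", "url": "http://example.com/late"},
]
enriched = attribute_events(web_events, radius_sessions, subscribers)
```

The time-window condition matters because dynamically assigned IP addresses are reused; matching on IP alone would attribute traffic to the wrong subscriber.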
Calls to watched countries
Intra Watch list Chatter velocity is high
Call patterns reveal malicious intent
Entity on watch list
NOT on watched list but high level of
interactions
Are people ‘n’ degrees away from watched list performing 2 sigma activity across multiple Call dimensions – sms, voice, conference and other behavioral activity ?
CDR fields : From BTN, To TN, Date/Time, Duration, Call type, Approximate tower location which carried the call
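Finding people 'n' degrees away from the watch list is a breadth-first search over a call graph built from CDR pairs. This is a minimal sketch, assuming each call is reduced to a `(from_number, to_number)` pair and treated as an undirected link; the identifiers in the sample are placeholders.

```python
from collections import defaultdict, deque


def degrees_from_watch_list(calls, watch_list, max_degree):
    """Return {number: degree} for every number within max_degree hops of the watch list."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)

    degree = {number: 0 for number in watch_list}
    queue = deque(watch_list)
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the requested depth
        for neighbour in graph[node]:
            if neighbour not in degree:
                degree[neighbour] = degree[node] + 1
                queue.append(neighbour)
    return degree


# Placeholder call pairs: W1 is on the watch list.
calls = [("W1", "A"), ("A", "B"), ("B", "C"), ("D", "E")]
degrees = degrees_from_watch_list(calls, watch_list={"W1"}, max_degree=2)
```

The resulting set of nearby numbers is the population on which 2 sigma checks across SMS, voice and conference activity would then be run.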
Watch List Recommender Data Product Modeling Unique behavioural signature
Discarded Telecom data--> Actionable Security patterns
Case Study-4 : Mobile forensics
Mobile funnel data
Analyzing Mobile Sub Channel Behavioural shift to Drive revenues for a leading online travel company
What's the problem being solved ?
• More applications becoming mobile
• There is a dip in transaction completion rate
• Friction points and hot spots exist
• No way to “see” these hot spots and patterns
• Spot friction points
• Mobile funnel drops
• Payment gateway drops
• Airline connector drops
Funnel Analysis
Churn Scoring Model
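Funnel analysis boils down to comparing how many sessions reach each step with how many reached the previous one. The sketch below computes the drop-off rate per step, assuming stage counts are already aggregated; the stage names and numbers are hypothetical.

```python
def funnel_dropoff(stage_counts):
    """Given ordered (stage, sessions) pairs, return (transition, drop rate) per step."""
    report = []
    for (prev_stage, prev_n), (stage, n) in zip(stage_counts, stage_counts[1:]):
        drop = 0.0 if prev_n == 0 else 1 - n / prev_n
        report.append((f"{prev_stage} -> {stage}", round(drop, 3)))
    return report


# Hypothetical mobile booking funnel.
stage_counts = [
    ("search", 1000),
    ("select_flight", 400),
    ("payment", 200),
    ("confirmation", 150),
]
report = funnel_dropoff(stage_counts)
worst_step = max(report, key=lambda row: row[1])
```

Breaking the same report down by payment gateway or airline connector would show which component is responsible for a given friction point.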
Case Study-5 : Money transmission
Minimizing fund leakages to watched entities
Money transmission event stream Threat matrix Graph Analysis
Money transmission behavioral modeling
Modeling money transmission behavior
Graph analysis to monitor money transmission patterns
• Each account can be modelled as a node in a graph
• Behaviour across nodes can be analyzed
• Proxy behaviours can be easily discerned
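With accounts as nodes and transfers as directed edges, one simple reading of "proxy behaviour" is an account that both receives funds and forwards funds to a watched entity. The sketch below implements that heuristic; the definition of a proxy, the account names and the amounts are all assumptions for illustration.

```python
from collections import defaultdict


def find_proxies(transfers, watched):
    """Return {account: amount sent to watched entities} for pass-through accounts."""
    received = defaultdict(float)
    sent_to_watched = defaultdict(float)
    for sender, receiver, amount in transfers:
        received[receiver] += amount
        if receiver in watched:
            sent_to_watched[sender] += amount
    return {
        account: amount
        for account, amount in sent_to_watched.items()
        if account not in watched and received[account] > 0
    }


# Hypothetical transfers as (sender, receiver, amount); W is a watched entity.
transfers = [
    ("A", "P", 100.0),
    ("P", "W", 90.0),
    ("B", "W", 20.0),
    ("C", "D", 50.0),
]
proxies = find_proxies(transfers, watched={"W"})
```

Account B sends to the watched entity directly and is therefore a direct sender rather than a proxy; a graph database would make deeper multi-hop versions of this query easier to express.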
7 Key “gotchas” ( best practices)
Lesson-1 : Think “Polyglot persistence”
Asset
Sensor
Parameters
Asset tags Sensor tags
Events
Column family ( HBase/Cassandra ) : Heavy duty write workloads
Document db ( MongoDB ) : Photos, Videos, text
Graph db ( Neo4j ) : Inter relationships
RDBMS ( Oracle ) : Low velocity self service
Logical Business Model
“Different strokes for different folks”
Lesson-2 : Think “pattern extraction”
1. Collaborative filtering
2. Text Mining
3. Scoring Models ( Logistic etc )
Embedding one ML process can help SPOT patterns not previously seen
Lesson-3 : Think “Baby steps”
• 60-90 day Hadoop Sandbox
• Build quick wins to build momentum
• Pick a few low-hanging use cases to demonstrate impact
No Big Bang !
Lesson-4 : Think “Data Products”
• Data Product = “Action an end user takes”
• EXAMPLE
• Watch List recommender vs tons of “feel good” graphs
• Next best action vs lots of dials, graphs
• Focus on Outcomes more than Analysis
Lesson-5 : Think “MVP-Minimum Viable Product”
• Minimalist ... Key is to start simple
• Only core features ... No bells and whistles
• Get feedback from early adopters and enrich features
How can Big Data co-exist with existing DW solutions ?
Big Data Existing DW
Existing DW
OSS BSS CRM
ETL
Existing BI tools
RADIUS logs, IP traffic logs, Comments
File copy / Bulk load / Agent based
Operational App Integration
Lesson-6 : Gracefully Co-exist
Lesson-7 : Think “Biz backward … NOT Tech forward”
1. What is the business problem you are solving ? Tightly framed ?
2. Why is it important to solve this problem ?
3. What happens if we don't solve this problem ?
4. Is status quo an option ?
5. Is the business pain acknowledged ?
6. How would the end user “feel” when the product is deployed ?
7. Are budgets allocated ?
8. What is the actual use case to solve the pain ?
Connect with business @ a deeper level !
1. Think “Polyglot Persistence”
2. Think “Pattern Extraction”
3. Think “Crawl-Walk-Run”
4. Think “Data Products”
5. Think “MVP”
6. Think “Co-existence”
7. Think “Business Impact/Outcomes”
To summarize !
Taming and channelising the data beast is going to be a crucial capability for survival !