Three Big Data Case Studies
Uploaded by: atidan-technologies-pvt-ltd
Category: Technology
Great use cases of Big Data
Big Data Exploration
• Find, visualize, and understand all big data to improve decision making
Enhanced 360° View of the Customer
• Extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension
• Lower risk, detect fraud, and monitor cyber security in real time
Data Warehouse Augmentation
• Integrate big data and data warehouse capabilities to increase operational efficiency
Operations Analysis
• Analyze a variety of machine data for improved business results
Why Big Data
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
[Diagram: data sources (File Systems, Relational Data, Content Mgmt, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources), Data Views, and Applications/Users]
Atidan Approach
• Implement a Hadoop-centric reference architecture
• Move enterprise batch processing to Hadoop
• Make Hadoop the single point of truth
• Massively reduce ETL by transforming within Hadoop
• Move results and aggregates back to legacy systems for consumption
• Retain, within Hadoop, source files at the finest granularity for re-use
Big Data Architecture: High Level
Top Criteria
• Allow users to use familiar consumption interfaces (web, mobile)
• Enable businesses to unlock previously unusable data
[Diagram labels: Unlock Big Data; Simplify Your Warehouse; Preprocess Raw Data; Ingest]
Atidan Case Study: Usage Analysis using Hadoop
Business Need
• A large conglomerate had to analyze the last 10 years of usage of its web applications from the IIS logs
• The IIS logs were stored in multiple files (e.g. daily logs)
• The data contained free text, was unstructured, and included irrelevant data
• The exact analysis criteria, parameters, and desired outcome were not known in advance
Solution
• A traditional RDBMS could not handle the problem because of the type and volume of the data and the uncertainty around the ultimate analysis criteria
• Atidan delivered a Hadoop-based solution that transformed the raw data into reports
• The solution was tolerant of data inconsistencies
• Hadoop provided elasticity for incremental data addition
• Scalability in the range of petabytes
• Depending on data size and complexity, processing can be scaled from one node to 100 nodes
• The schema-less architecture made it possible to change the data model and analytics dynamically, even at a late stage in the project
• The organization gained completely new and unexpected insights into employee, customer, and vendor/partner behavior
• Correlations between employees' usage patterns and attrition, as well as productivity, were established
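The deck does not show the job itself. As a rough illustration of the idea (map raw log lines to key/value pairs, skip lines that do not parse, then reduce by key), here is a minimal single-process Python sketch. The sample log, its column set, and the function names `map_records` and `reduce_counts` are assumptions made for illustration; this is not Atidan's implementation and it does not use Hadoop APIs.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical sample in IIS W3C extended format. Real logs declare their own
# column order in the "#Fields:" directive, so the columns here are assumed.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem cs-username sc-status
2001-01-05 10:00:01 GET /app/home.aspx alice 200
2001-01-05 10:00:07 GET /app/missing.aspx bob 404
this line is free text and not a valid record
2001-02-11 09:15:30 POST /app/save.aspx alice 201
2001-02-11 09:15:31 GET /app/home.aspx - 200
"""

Key = Tuple[str, str]


def map_records(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((dimension, value), 1) pairs and skip unparseable lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the records that follow
            continue
        if not line or line.startswith("#"):
            continue  # blank lines and other directives
        parts = line.split()
        if not fields or len(parts) != len(fields):
            continue  # tolerate inconsistent or irrelevant lines
        record = dict(zip(fields, parts))
        yield ("status", record["sc-status"]), 1
        yield ("month", record["date"][:7]), 1
        if record["cs-username"] != "-":  # "-" marks an anonymous request
            yield ("user", record["cs-username"]), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Reduce step: sum the emitted values for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


counts = reduce_counts(map_records(SAMPLE_LOG.splitlines()))
for key in sorted(counts):
    print(key, counts[key])
```

Because no schema is fixed up front, adding a new dimension later (for example per-URL counts) only means emitting one more key in the map step, which is the kind of late change the schema-less approach is meant to allow.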
[Chart: Request Types, by HTTP response status (Accepted…, Bad Request…, Created (201), Forbidden…, Not…, Not Found…, OK (200), Unauthorise…)]
[Chart: Monthly Requests, January 2001 to November 2002]
[Chart: Users]
Big Query - Hive
• The size of the data being collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
• MapReduce is low level and complex to write
• Hive provides a high-level, SQL-like query language
• This allows for ad-hoc analysis
• The business need not know in advance which patterns to look for
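To give a feel for what ad-hoc, SQL-style analysis looks like compared with hand-written MapReduce, here is a small sketch that runs two aggregate queries over a toy request table. It uses Python's built-in `sqlite3` module purely as a stand-in for Hive, and the table name, columns, and rows are invented for illustration; actual HiveQL syntax and functions may differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, username TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("2001-01-05", "alice", 200),
        ("2001-01-05", "bob", 404),
        ("2001-02-11", "alice", 201),
        ("2001-02-11", "alice", 200),
    ],
)

# Ad-hoc question 1: how many requests per month?
monthly = conn.execute(
    """
    SELECT substr(day, 1, 7) AS month, COUNT(*) AS n
    FROM requests
    GROUP BY substr(day, 1, 7)
    ORDER BY month
    """
).fetchall()

# Ad-hoc question 2, thought of later: which users hit client or server errors?
errors_by_user = conn.execute(
    """
    SELECT username, COUNT(*) AS n
    FROM requests
    WHERE status >= 400
    GROUP BY username
    ORDER BY username
    """
).fetchall()

print(monthly)
print(errors_by_user)
conn.close()
```

The second question needs no new job or data model, only a new query, which is the point the slide makes about not having to know the patterns in advance.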
Atidan Case Study: Customer data collection (KYC) using Hadoop
Business Need
• A financial institution had to periodically collect customer data
• Customers are very reluctant to provide updated data
• This customer data has to be cross-checked against the billions of transactions the institution receives per day
• The institution wanted to collate data available in the public domain from known social media sites
• The data contained free text, was unstructured, and included irrelevant data
Solution
• A graph database is constructed over the extracted social data to analyze transactions
• Atidan delivered a Hadoop-based solution that transformed the raw data into a graph database
• Aggregate customer information from existing sources, social media, and government sources
• Analyze transactions to find hidden patterns
• Enable link analysis and risk monitoring
• Facilitate decision making (new products) and customer discovery
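The deck does not name the graph database or its query language. As a minimal sketch of what link analysis over such a graph means, the Python below merges transaction edges and social edges into one undirected graph and finds every party within a given number of hops of a customer. The customer IDs, amounts, and the helper names `build_graph` and `linked_parties` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Sequence, Set

# Hypothetical inputs: (payer, payee, amount) transactions and social links.
transactions = [("C1", "C2", 500.0), ("C2", "C3", 120.0), ("C4", "C5", 75.0)]
social_links = [("C3", "C6")]


def build_graph(*edge_lists: Iterable[Sequence]) -> Dict[str, Set[str]]:
    """Merge several edge lists into one undirected adjacency map."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for edges in edge_lists:
        for edge in edges:
            a, b = edge[0], edge[1]  # extra fields such as amount are ignored
            graph[a].add(b)
            graph[b].add(a)
    return graph


def linked_parties(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: parties reachable from start, with hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


graph = build_graph(transactions, social_links)
print(linked_parties(graph, "C1", max_hops=2))
print(linked_parties(graph, "C1", max_hops=3))
```

In this toy data, C6 is linked to C1 only through a social connection of C3, so it appears at three hops but not at two. Surfacing that kind of indirect relationship is what link analysis is for; a production system would run such traversals inside the graph database rather than in application code.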
[Diagram: Customer Information sources (Web, Social, Channel Partners, Utility Providers, Aadhaar/UIDAI); Big Data Processing; Graph Database; analyses: Customer Clustering, Income/Expense changes, Corporate structure changes, AML, Peer group analysis, Pattern Analysis]
Advantages of Hadoop (KYC) Solution to Banks
• Lowers the cost of follow-up with users
• Reduces losses by highlighting risky users early
• Graph-database-based AML
• Insights into new products, new customers, new loans to existing customers, and new investment opportunities for customers
• Reduces operational errors
• Traceability of data source
[Diagram labels: AML (Graph Queries, Due Diligence); Risk (Credit Scoring, Mitigation); Analysis (Peer groups, New Prospects); Insights (New Products, New Customers)]
Atidan Case Study: Email scanning and categorization using MongoDB
Business Need
• Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each individual user's page for frontend access
• The existing process had significant performance, reliability, and scalability issues. Users would also receive a lot of spam
Solution
Atidan proposed a MongoDB-Drupal based solution with the following approach:
• A scheduler was created to pull only the headers from the common webmail account shared by all users
• The headers were stored in an intermediate catalog in MongoDB
• The data was transformed based on the recipient address and user preferences, and spam was removed. The email body was fetched only for the filtered records and saved into the final catalog in MongoDB
• Emails from the final catalog were pushed to the frontend platform (Drupal)
Key Takeaways
• Leverage the power of MongoDB in processing "Big Data" of millions of daily emails. It is much faster, easy to scale, and very flexible
• The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
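The key design choice in the approach above is to fetch message bodies only for headers that survive filtering. The following Python sketch shows that two-stage flow, with plain lists and dicts standing in for the MongoDB catalogs. The mailbox contents, the preference format, the spam rule, and the names `build_final_catalog` and `looks_like_spam` are assumptions for illustration; the deck's actual implementation used Node.js and MongoDB and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Header:
    msg_id: str
    recipient: str
    sender: str
    subject: str


# Hypothetical shared mailbox: msg_id -> (header, body).
MAILBOX = {
    "m1": (Header("m1", "alice@example.com", "boss@example.com", "Quarterly report"), "Report attached."),
    "m2": (Header("m2", "alice@example.com", "promo@example.com", "Big sale"), "Buy now."),
    "m3": (Header("m3", "bob@example.com", "x@example.net", "WIN A PRIZE today"), "Click here."),
    "m4": (Header("m4", "bob@example.com", "alice@example.com", "Lunch?"), "Noon works."),
}

# Hypothetical per-user preferences: senders each user has blocked.
PREFERENCES = {
    "alice@example.com": {"blocked_senders": {"promo@example.com"}},
    "bob@example.com": {"blocked_senders": set()},
}


def looks_like_spam(header: Header) -> bool:
    """Toy spam rule; a real filter would be far more involved."""
    return "win a prize" in header.subject.lower()


def build_final_catalog(
    intermediate: List[Header],
    preferences: Dict[str, dict],
    fetch_body: Callable[[str], str],
) -> Dict[str, List[dict]]:
    """Stage 2: filter headers, then fetch bodies only for the survivors."""
    final: Dict[str, List[dict]] = {}
    for header in intermediate:
        prefs = preferences.get(header.recipient)
        if prefs is None:
            continue  # unknown recipient
        if header.sender in prefs["blocked_senders"] or looks_like_spam(header):
            continue  # dropped before any body is downloaded
        document = {
            "msg_id": header.msg_id,
            "subject": header.subject,
            "body": fetch_body(header.msg_id),
        }
        final.setdefault(header.recipient, []).append(document)
    return final


fetched_ids: List[str] = []


def fetch_body(msg_id: str) -> str:
    fetched_ids.append(msg_id)  # record which bodies were actually fetched
    return MAILBOX[msg_id][1]


# Stage 1: pull headers only into the intermediate catalog.
intermediate_catalog = [header for header, _body in MAILBOX.values()]
final_catalog = build_final_catalog(intermediate_catalog, PREFERENCES, fetch_body)
print(final_catalog)
print(fetched_ids)
```

In this toy run only two of the four bodies are fetched. At millions of messages per day, skipping the body download for filtered mail is a plausible source of the performance gain the takeaways describe.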
Technologies used
• Node.js (data transformation)
• MongoDB (database)
• Schema-less
• RESTful service to access data from the browser
• Drupal (frontend)
• The basic unit of data storage and transfer was the JSON object
• Storage and querying: NoSQL, simple, schema-less database
• Advantages: highly scalable, very flexible, simple
• Connectivity: Node.js (server-side JavaScript)