Log Data Analysis Platform
LOG DATA ANALYSIS PLATFORM
May, 2015
Agenda
1) User-Group Introduction
2) Problem Statement
3) Log Data Analysis System Overview
4) Task Analysis
5) Solution Architecture
6) Trade-off Analysis
7) Automation
8) Performance Testing
9) Outcome & Plans
PROBLEM STATEMENT
Demo Lab: Why Did We Start This Project?
1) Grow In-House Experience
2) Create a Reference Solution without NDA Limitations
3) Get a Playground for Tests
4) Provide a Demo Environment for Customers (Using Their Data)
5) Decrease Time to Market (by Introducing Automation)
LOG DATA ANALYSIS PLATFORM: OVERVIEW
Log Data Analysis Platform Details
Key Facts:
• ~270-300 web servers
• Log types: HTTPD access logs, error logs, application server (servlet) logs, OS service logs
• ~500K events per minute
• 150 GB of data per day
Technologies:
• Flume
• Hadoop/HDFS, MapReduce
• Hive, Impala
• Oozie
• Elasticsearch, Kibana 3
• Tableau analytics platform
• Puppet + Vagrant
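The key facts imply an average ingest rate and an average event size. A back-of-envelope check, assuming decimal units (1 GB = 10^9 bytes) and load spread evenly over 24 hours, neither of which the slide states:

```python
# Back-of-envelope check of the "Key Facts" numbers.
# Assumptions (not stated on the slide): decimal units (1 GB = 10**9 bytes)
# and load spread evenly across the whole day.
EVENTS_PER_MINUTE = 500_000
BYTES_PER_DAY = 150 * 10**9
WEB_SERVERS = (270, 300)

events_per_day = EVENTS_PER_MINUTE * 60 * 24
events_per_second = EVENTS_PER_MINUTE / 60
avg_event_bytes = BYTES_PER_DAY / events_per_day
ingest_mb_per_s = BYTES_PER_DAY / (24 * 3600) / 10**6
# Average load per web server at both ends of the 270-300 range.
per_server_eps = [events_per_second / n for n in WEB_SERVERS]

print(f"events/day:      {events_per_day:,}")
print(f"events/second:   {events_per_second:,.0f}")
print(f"avg event size:  {avg_event_bytes:.0f} bytes")
print(f"avg ingest rate: {ingest_mb_per_s:.2f} MB/s")
print(f"per server:      {per_server_eps[1]:.0f}-{per_server_eps[0]:.0f} events/s")
```

Under those assumptions this works out to about 8,333 events/s overall, roughly 208 bytes per event, an average of about 1.74 MB/s, and 28-31 events/s per web server. Real traffic is rarely uniform, so peak rates will be higher.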
Log Data Examples
Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 305416 260688  29160 2356920    2    2     4     1    0    0  6  1 92  2  0
iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db)  07/09/2011
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.68    0.00    0.52    2.03    0.00   91.76
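The access log sample is in Apache's Common Log Format (`%h %l %u %t "%r" %>s %b`). A minimal parser for the two sample formats might look like the following. The regular expressions and function names are illustrative, not the platform's actual ETL code, and they assume well-formed lines with no escaped quotes inside the request field.

```python
import re
from datetime import datetime

# Apache Common Log Format: %h %l %u %t "%r" %>s %b
# Assumes the request field contains no escaped double quotes.
ACCESS_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Error-log layout as in the sample: [date] [level] [client ip] message
# (the [client ...] part is treated as optional).
ERROR_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)'
)


def parse_access(line):
    """Parse one access-log line into typed fields, or return None."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" = no body
    return rec


def parse_error(line):
    """Parse one error-log line into typed fields, or return None."""
    m = ERROR_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%a %b %d %H:%M:%S %Y")
    return rec


access = parse_access('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
                      '"GET /apache_pb.gif HTTP/1.0" 200 2326')
error = parse_error('[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] '
                    'File does not exist: /home/httpd/twiki/view/Main/WebHome')
print(access["user"], access["request"], access["status"], access["size"])
print(error["level"], error["client"], error["message"])
```

Returning `None` for non-matching lines lets a batch job count and set aside malformed records instead of failing the whole run.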
TASK ANALYSIS
Architecture Drivers: Use Cases
Architecture Drivers: Quality Attributes (1/3)
Architecture Drivers: Quality Attributes (2/3)
Architecture Drivers: Quality Attributes (3/3)
Architecture Drivers: Limitations
Demo Lab: Marketecture
SOLUTION ARCHITECTURE
Solution Architecture
[Figure: solution architecture diagram with a Batch Layer, Serving Layer, and Speed Layer. Components: Apache HTTP Servers, Data Stream, Raw Data Storage, Pre-computing Batch Views (static and ad-hoc), Real-Time Processing and Aggregations, Real-Time Views, Dashboard/Search, Corporate BI Tool. Legend: layer boundary, data flow (with direction indicated), query flow.]
• Avro as the raw data storage file format
• Parquet as the batch views file format
• Star schema as the batch views data model
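A star schema keeps numeric measures in a central fact table and descriptive attributes in surrounding dimension tables. The sketch below uses SQLite only so that it runs self-contained; on the platform the batch views were stored as Parquet and were presumably queried with Hive or Impala. Every table and column name here is hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_host   (host_id INTEGER PRIMARY KEY, hostname TEXT);
CREATE TABLE dim_status (status_id INTEGER PRIMARY KEY,
                         code INTEGER, status_class TEXT);
CREATE TABLE fact_request (
    host_id    INTEGER REFERENCES dim_host(host_id),
    status_id  INTEGER REFERENCES dim_status(status_id),
    event_date TEXT,
    bytes      INTEGER
);
""")
conn.executemany("INSERT INTO dim_host VALUES (?, ?)",
                 [(1, "web-01"), (2, "web-02")])
conn.executemany("INSERT INTO dim_status VALUES (?, ?, ?)",
                 [(1, 200, "2xx"), (2, 404, "4xx")])
conn.executemany("INSERT INTO fact_request VALUES (?, ?, ?, ?)", [
    (1, 1, "2015-05-01", 2326),
    (1, 2, "2015-05-01", 0),
    (2, 1, "2015-05-01", 512),
    (2, 1, "2015-05-02", 1024),
])

# Typical batch-view query: join facts to dimensions, then aggregate.
rows = conn.execute("""
    SELECT h.hostname, s.status_class, COUNT(*) AS hits, SUM(f.bytes) AS bytes
    FROM fact_request f
    JOIN dim_host h   ON f.host_id = h.host_id
    JOIN dim_status s ON f.status_id = s.status_id
    GROUP BY h.hostname, s.status_class
    ORDER BY h.hostname, s.status_class
""").fetchall()
for row in rows:
    print(row)
```

Keeping descriptive attributes in the dimension tables keeps the large fact table narrow; with a columnar format such as Parquet, a query also reads only the columns it references.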
Architecture: Flume Topology
Batch ETL
TRADE-OFF ANALYSIS
Distribution Selection
Hive Stinger vs Impala
[Charts: compression ratio; access speed]
AUTOMATION
Automation (saves time and money)
• Local development (80%): development and debugging
• Cloud development (20%): F&P testing, demo
vagrant up
Automation Process

| Phase | Tool | Notes |
|---|---|---|
| VM Provisioning | Vagrant | Supports VirtualBox, VMware ESX, Amazon AWS |
| VM Bootstrapping | Puppet | Installs Cloudera Manager, Cloudera's Distribution including Apache Hadoop (CDH), Elasticsearch + Kibana, Flume, MicroStrategy, LogGenerator; creates the cluster using the Cloudera Manager API |
| Configure ETL and BI | Puppet | Configures Flume, Oozie, Elasticsearch, Impala, Hive, MicroStrategy dashboards |
| Integration Tests | Puppet | Generates workload and ensures data flows through; checks logs for errors; calculates timing/throughput |

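The Integration Tests phase confirms that generated data arrives, scans logs for errors, and computes timing/throughput. A minimal Python sketch of just those checks follows. The function name `check_run`, its parameters, and the rule that any ERROR or FATAL log line fails the run are hypothetical; this is not how the Puppet-driven phase was actually implemented.

```python
import re

# Hypothetical rule: any ERROR or FATAL line in the collected logs fails the run.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def check_run(sent, received, start_s, end_s, log_lines, min_delivery=1.0):
    """Summarize one test run: throughput, delivery ratio, errors, pass/fail."""
    elapsed = end_s - start_s
    throughput = received / elapsed if elapsed > 0 else 0.0
    delivery_ratio = received / sent if sent else 0.0
    errors = [line for line in log_lines if ERROR_PATTERN.search(line)]
    return {
        "throughput_eps": throughput,
        "delivery_ratio": delivery_ratio,
        "error_lines": errors,
        "passed": delivery_ratio >= min_delivery and not errors,
    }


result = check_run(
    sent=60_000, received=60_000, start_s=0.0, end_s=12.0,
    log_lines=["INFO sink started", "INFO batch committed"],
)
print(result["throughput_eps"], result["passed"])
```

Requiring a delivery ratio of 1.0 by default treats any lost event as a failure; a looser threshold would only make sense if the pipeline were allowed to drop data.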
PERFORMANCE TESTING
Log Generator
One thread can generate:
• 4,200 events/second (file source)
• 5,500 events/second (TCP source)
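With these per-thread rates, the production load from the Key Facts slide (~500K events/min, about 8,333 events/s) needs only a couple of generator threads, if throughput scales linearly with thread count. That is an assumption the slide does not confirm. The sketch below also shows one hypothetical way to produce a Common Log Format line; it is not the actual LogGenerator code.

```python
import math
import random
from datetime import datetime, timezone

# Per-thread generation rates from the slide (events/second).
RATE_FILE_SOURCE = 4200
RATE_TCP_SOURCE = 5500


def threads_needed(events_per_minute, per_thread_rate):
    """Generator threads for a target load, ASSUMING linear scaling per thread."""
    return math.ceil(events_per_minute / 60 / per_thread_rate)


def synthetic_access_line(rng, when):
    """Build one random Common Log Format line (hypothetical generator logic)."""
    host = f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
    path = rng.choice(["/", "/index.html", "/apache_pb.gif", "/missing.html"])
    status = 404 if path == "/missing.html" else 200
    size = rng.randint(200, 5000)
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{host} - - [{stamp}] "GET {path} HTTP/1.1" {status} {size}'


target = 500_000  # events/minute, the production figure from "Key Facts"
print("file source threads:", threads_needed(target, RATE_FILE_SOURCE))
print("TCP source threads: ", threads_needed(target, RATE_TCP_SOURCE))
print(synthetic_access_line(random.Random(42),
                            datetime(2015, 5, 1, 12, 0, tzinfo=timezone.utc)))
```

Passing in a seeded `random.Random` and a fixed timestamp keeps generated workloads reproducible between test runs.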
Accurate Sizing
[Figure: sizing at 20k/min, 50k/min, 100k/min, and 200k/min load levels]
Calculator!
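A sizing calculator of this kind can start from the per-event size implied by the Key Facts slide (150 GB/day at ~500K events/min, about 208 bytes per event if decimal GB is assumed). This is a minimal sketch: the retention period and compression ratio are illustrative parameters, not values from the deck, and only the replication factor of 3 reflects an actual default (HDFS `dfs.replication`).

```python
# Hypothetical sizing calculator. BYTES_PER_EVENT is derived from the
# "Key Facts" slide (decimal GB assumed); retention and compression are
# illustrative parameters; 3 is the HDFS default replication factor.
BYTES_PER_EVENT = 150e9 / (500_000 * 60 * 24)  # roughly 208 bytes
HDFS_REPLICATION = 3


def daily_raw_gb(events_per_minute):
    """Raw log volume per day, in decimal GB."""
    return events_per_minute * 60 * 24 * BYTES_PER_EVENT / 1e9


def cluster_storage_tb(events_per_minute, retention_days=30, compression_ratio=1.0):
    """HDFS capacity needed for the retained data, in decimal TB."""
    stored_gb = daily_raw_gb(events_per_minute) * retention_days / compression_ratio
    return stored_gb * HDFS_REPLICATION / 1000


for rate in (20_000, 50_000, 100_000, 200_000):
    print(f"{rate // 1000}k/min: {daily_raw_gb(rate):.1f} GB/day raw, "
          f"{cluster_storage_tb(rate):.2f} TB for 30 days with 3 replicas")
```

Because volume scales linearly with event rate, the four load levels map to about 6, 15, 30, and 60 GB/day of raw data. Measured compression ratios for Avro and Parquet would be the natural next input.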
OUTCOME & PLANS
Outcome
1) Demo lab, playground, and testing platform (ready in 1 hour)
2) Sizing Calculator
3) Helped win 3 new customers (one is really, really huge)
4) Strategic Partnership with Cloudera
5) Tons of experience and fun
Plans
1) Add support for other Hadoop distributions (Hortonworks, MapR)
2) Make the project open source
Thank You!
SoftServe US Office
One Congress Plaza,
111 Congress Avenue, Suite 2700, Austin, TX 78701
Tel: 512.516.8880
Contact: Valentyn Kropov
Tel: 866.687.3588 x4341