DataFlow, Streaming Analytics and Cyber Security
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Payment Tracking
Due Diligence
Social Mapping
Product Design
M&A
Call Analysis
Machine Data
Defect Detecting
Factory Yields
Customer Support
Basket Analysis
Segments
Customer Retention
Sentiment Analysis
Optimize Inventories
Supply Chain
Cross-Sell
Vendor Scorecards
Ad Placement
Cyber Security
Disaster Mitigation
Investment Planning
Risk Modeling
Proactive Repair
Inventory Predictions
Next Product Recs
OPEX Reduction
Historical Records
Mainframe Offloads
Device Data Ingest
Rapid Reporting
Digital Protection
Data as a Service
Fraud Prevention
Public Data Capture
INNOVATE
RENOVATE
EXPLORE OPTIMIZE TRANSFORM
ACTIVE ARCHIVE
ETL ONBOARD
DATA ENRICHMENT
DATA DISCOVERY
SINGLE VIEW
PREDICTIVE ANALYTICS
Key Components of Streaming Analytics in Enterprise Environments
Flow Management: Easy, Secure, Reliable Way to Get the Data You Need
Because there's no data science without the data
Stream Processing: Immediate and Continuous Insights
Because acting on perishable insights in real time maximizes value
Enterprise Services: Provisioning, Management, Monitoring, Security, Audit, Compliance, Governance
Because it all has to work together in an enterprise environment
Hortonworks DataFlow Manages Data in Motion
Sources, Regional Infrastructure, Core Infrastructure
At the sources: Constrained, High-latency, Localized context
At the core: Hybrid (cloud / on-premises), Low-latency, Global context
Problems Today: Timely Access to Data and Decisions
http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
“HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.”
Royal Mail
Simplistic View of Dataflows: Easy, Definitive
Acquire Data
Store Data
Data Flow
Process and Analyze Data
Realistic View of Dataflows: Complex, Convoluted
[Diagram: four Acquire Data nodes and five Store Data nodes, interconnected by Data Flow, together with one Process and Analyze Data node]
HDF Makes Big Data Ingest Easy
Without HDF: complicated, messy, and takes weeks to months to move the right data into Hadoop
With HDF: streamlined, efficient, easy
Target platform: Hortonworks Data Platform (HDP), powered by Apache Hadoop
Connecting Data Between Ecosystems Without Coding: 170+ Processors
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
HL7
FTP
UDP
XML
SFTP
HTTP
Syslog
HTML
Image
AMQP
MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
Schlumberger Dataflow
Slideshare: http://www.slideshare.net/HadoopSummit/from-zero-to-data-flow-in-hours-with-apache-nifi-64032731
LinkedIn: www.linkedin.com/hp/update/6171029401820479488
Twitter: https://twitter.com/aldrinpiri/status/747865822228422656
Prescient Data Ingest
http://hortonworks.com/blog/prescient-transforms-48000-data-sources-real-time-apache-nifi/
Open Energi: DataFlow
As a result of its investment in Hortonworks DataFlow, Open Energi is already:
Reducing costs thanks to 10-15% less data being transmitted across a mobile network
Creating a fully transparent trail for data provenance that Open Energi can share with customers
Enabling line of business teams to contribute to building dataflow rules and processes
Standardizing the output of data across various end point devices
Open Energi: hortonworks.com/blog/data-fuel-open-energi-virtual-power-station-hortonworks-dataflow/
DataFlow Management to Increase Efficiency of Cybersecurity Solutions
Log Analytics Systems Today
Network Device Logs
Log Analytics Platform
Not all data can be captured
Not all captured data is valuable
Transport all data
Efficiently Expand Log Ingestion from the Edge
Network Device Logs
HDF (multiple edge instances)
HDP
Log Analytics Platform
Expand collection to new sources of machine data
Edge analytics to transform, enrich and prioritize content-based routing
Capture and transport only valuable data
Cost Effectively Expand Storage Options of Log Data
Network Device Logs
HDP
Log Analytics Platform
HDF
Cost effectively expand collection and grow timescale of logs collected
Optimize Log Analytics Platforms with Content Based Routing
Send data to alternative systems based on value, content, priority
Intelligent, content-based routing, transformation and enrichment
Edge analytics for cost-effective and efficient movement of machine data
Transformation, Routing, Enrichment
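The transformation, enrichment and content-based routing steps named on this slide can be illustrated with a small sketch. It is plain Python, not NiFi or HDF code. The LogEvent type, the severity thresholds, the "denied" keyword rule and the destination names (log_analytics, archive, drop) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogEvent:
    source: str
    severity: int          # syslog-style scale: 0 = emergency ... 7 = debug
    message: str
    attributes: dict = field(default_factory=dict)


def transform(event):
    """Transformation: normalize the message text."""
    event.message = event.message.strip().lower()
    return event


def enrich(event, asset_owners):
    """Enrichment: attach an owner looked up from the source name (hypothetical lookup table)."""
    event.attributes["owner"] = asset_owners.get(event.source, "unknown")
    return event


def route(event):
    """Routing: pick a destination from the event's content and priority."""
    if event.severity <= 3 or "denied" in event.message:
        return "log_analytics"   # high-value events go to the log analytics platform
    if event.severity <= 6:
        return "archive"         # lower-value events go to cheaper storage
    return "drop"                # debug noise is not transported at all


def process(events, asset_owners):
    """Run transform, enrich and route on each event and group events by destination."""
    routed = {"log_analytics": [], "archive": [], "drop": []}
    for event in events:
        event = enrich(transform(event), asset_owners)
        routed[route(event)].append(event)
    return routed
```

In a real flow the routing rules would be configured on processors rather than hard-coded. The point of the sketch is only that the destination is decided per event, from its content and priority.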
Apache Metron: Incubating Project
Telemetry Data Sources
– Performance Network Ingest Probes
– Security Endpoint Devices (FireEye, Palo Alto, BlueCoat, etc.)
– Network Data (PCAP, NetFlow, Bro, etc.)
– IDS (Suricata, Snort, etc.)
– Threat Intelligence Feeds (Soltra, OpenTAXII, third-party feeds)
– Other Machine Generated Logs (AD, App / Web Server, firewall, VPN, etc.)
Telemetry Data Collectors
Telemetry Ingest Buffer
Real-time Enrich / Threat Intel Streams
Cyber Security Stream Processing Pipeline (Real-time Processing Cyber Security Engine) modules
– Telemetry Parsers
– Enrichment
– Threat Intel
– Alert Triage
– Indexers and Writers
Data Services and Integration Layer
– Search and Dashboarding Portal
– Security Data Vault
– Community Analytical Models
– Provisioning, Management and Monitoring
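The stage order of the stream processing pipeline above (parse, enrich, threat intel, alert triage) can be sketched in a few lines. This is not Metron code and does not use any Metron API. The JSON input format, the field names (src_ip, dst_ip), the lookup tables and the scoring weights are assumptions chosen only to make the flow concrete.

```python
import json


def parse(raw_line):
    """Telemetry parser: turn a raw JSON log line into a dict."""
    return json.loads(raw_line)


def enrich(event, geo_by_ip):
    """Enrichment: add a geo label for the source address (hypothetical lookup table)."""
    event["geo"] = geo_by_ip.get(event.get("src_ip"), "unknown")
    return event


def tag_threat_intel(event, bad_ips):
    """Threat intel: flag events whose destination is on a known-bad list."""
    event["is_threat"] = event.get("dst_ip") in bad_ips
    return event


def triage(event):
    """Alert triage: compute a simple score so the most urgent events sort first."""
    score = 0
    if event["is_threat"]:
        score += 10
    if event["geo"] == "unknown":
        score += 1
    event["triage_score"] = score
    return event


def run_pipeline(raw_lines, geo_by_ip, bad_ips):
    """Apply the four stages in order and return events sorted by triage score, highest first."""
    events = []
    for line in raw_lines:
        event = parse(line)
        event = enrich(event, geo_by_ip)
        event = tag_threat_intel(event, bad_ips)
        events.append(triage(event))
    return sorted(events, key=lambda e: e["triage_score"], reverse=True)
```

The indexing and writing stage is left out. In this sketch it would simply consume the list returned by run_pipeline.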
Apache Metron at Capital One
Using HDF to ingest log data into their cyber security data lake. Key features: prioritization and provenance. Then created Apache Metron.
https://youtu.be/Nffx8SKn7l4?t=1h37m50s
Apache Metron Resources
http://hortonworks.com/apache/metron/
https://metron.incubator.apache.org/
http://www.meetup.com/futureofdata-siliconvalley/events/233096822/
New Features of HDF 2.0
Enterprise productivity via streamlined operations
– Ambari Integration of Apache NiFi, Kafka, Storm
– Apache Ranger authorization
– Multi‐Tenancy of dataflows
30% more processors in Apache NiFi 1.0
Edge intelligence with Apache MiNiFi
Increased security options with Apache Kafka 0.10
6‐16x streaming analytics performance with Apache Storm 1.0
Multi‐tenant Authorization
NO Read Permission
HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
HL7
FTP
UDP
XML
SFTP
HTTP
Syslog
HTML
Image
AMQP
MQTT
Fetch
Edge Intelligence with Apache MiNiFi
Key Features
Guaranteed delivery
Data buffering
‒ Backpressure
‒ Pressure release
Prioritized queuing
Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
Data provenance
Recovery / recording a rolling log of fine-grained history
Designed for extension
Different from Apache NiFi
‒ Design and Deploy
‒ Warm re-deploys
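Two of the features listed above, backpressure and prioritized queuing, can be illustrated together with a bounded priority queue. The sketch below is plain Python and is not how NiFi or MiNiFi implement their connection queues. The class name, the threshold parameter and the convention that a lower number means higher priority are assumptions for illustration. Pressure release (aging off old data) is not modeled.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Queue that refuses new items once a threshold is reached and releases the most urgent item first."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO order within one priority

    def backpressure_engaged(self):
        """True when the queue is full and upstream producers should pause."""
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Enqueue an item; return False without enqueuing when backpressure is engaged."""
        if self.backpressure_engaged():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A producer that gets False from offer is expected to hold the data and retry later, which is what lets a constrained edge device buffer without losing data when the downstream link is slow.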
NiFi vs. MiNiFi Java Processor, Smaller Footprint ~40 MB
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components
New Stream Processing Features HDF 2.0
New Storm Connectors
Storm‐Kafka Spout using new client APIs
Storm Distributed Log Search
Storm Dynamic Worker Profiling
Kafka Grafana Integration
Storm Grafana Integration
Improved Nimbus HA
Storm Automatic Back Pressure
Storm Distributed cache
Storm Windowing and State Management
Storm Performance improvements
Improved Kafka SASL
Storm Topology Event inspector
Storm Resource Aware Scheduling
Storm Dynamic Log Levels
Pacemaker Storm Daemon
Kafka Rack Awareness
Developer Productivity, Enterprise Readiness, Operational Simplicity
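As a rough illustration of what the windowing feature in the list above provides, the sketch below counts events per fixed, non-overlapping (tumbling) time window. Storm's own windowing API is Java and is not shown here. The function name, the (timestamp, key) input shape and the window length are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Group (timestamp_seconds, key) pairs into fixed windows and count keys per window.

    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Integer division assigns every event to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)
```

A security use of this pattern would be counting failed logins per source in each window and alerting when a count crosses a threshold. A streaming engine does the same grouping incrementally as tuples arrive instead of over a finished list.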