Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive
Date posted: 21-Oct-2014
Category: Data & Analytics
Synapse: The Hive Big Data Platform
Mohan Reddy, Chief Architect, The Hive
Vision
The Hive Portfolio: Online, Enterprise, Internet of Things
The Hive Big Data Stack: Applications, Synapse Big Data, Data Infrastructure
Knowledge, Action
• Accelerate product development & go‐to‐market of The Hive portfolio companies
• Plug the latest open source innovations in data science & infrastructure
• Engage & contribute back to relevant open source communities
• Share insights & experiences with The Hive Think Tank
Goals of Synapse
Synapse for IoT Applications
Smart Home, Smart Building, Smart Factory
Synapse Data Infrastructure
Data‐driven Control, Deep Learning, Security, Business Apps
The Hive IoT Portfolio
Synapse IoT Compute Models
• Fast‐changing open source technologies adding complexity to application design
• Realtime stream analytics for operations that can respond to patterns in live data streams
• Rethinking trade‐offs between scale‐up & scale‐out architectures, especially for realtime use‐cases
• Faster machine learning through smarter partitioning of data & parallelism in model building
• Data management, lineage and curation add significant overheads to product development
Trends driving Synapse Design
Synapse Infrastructure Services
Visualization
Service APIs
Machine Learning
Provisioning & Deployment
Stream Processing
Batch Processing
Storage
Data Ingestion & Lineage
Synapse Service Abstractions
Visualization: Taswira
Service APIs: Alchemy
Machine Learning: Akili
Provisioning & Deployment: Chombo
Stream Processing: Tempus
Batch Processing: Huduma
Storage: Duka
Data Ingestion & Lineage: Ukoo
Lambda Architecture
Extendable Service Implementations by Present/Future Open Source Projects
Visualization
Service APIs
Machine Learning
Provisioning & Deployment
Stream Processing
Batch Processing
Storage
Data Ingestion & Lineage: Morphlines, Kite, Falcon
• A framework to build, reuse, link, manage and run data and job pipelines
• A pipeline is a collection of procedural steps, interactions, inputs and outputs: the steps needed to describe a big data business process
• Datasets come from different sources via industry‐standard and proprietary adapters (Apache Flume, MQTT, iBeacon, etc.)
• Based on Apache Falcon, Kite SDK and Morphlines
Ukoo ‐ Data Ingestion, Lineage and Management
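The idea of a pipeline as an ordered, reusable collection of steps with recorded lineage can be sketched as follows. This is a minimal illustration only: the `Pipeline` class, the step functions and the sensor records are hypothetical and are not the Ukoo, Falcon, Kite or Morphlines APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


@dataclass
class Pipeline:
    """A named, ordered collection of steps (hypothetical; not the Ukoo API)."""
    name: str
    steps: List[Step] = field(default_factory=list)
    lineage: List[str] = field(default_factory=list)

    def add(self, step: Step) -> "Pipeline":
        # Returning self lets steps be chained and reused across pipelines.
        self.steps.append(step)
        return self

    def run(self, records: Iterable[Record]) -> List[Record]:
        # Run each step in order and note what it produced, as a crude lineage log.
        self.lineage.clear()
        for step in self.steps:
            records = list(step(records))
            self.lineage.append(f"{step.__name__}: {len(records)} records")
        return list(records)


def drop_incomplete(records):
    """Keep only readings that carry a temperature value."""
    return (r for r in records if r.get("temp_c") is not None)


def to_fahrenheit(records):
    """Add a derived field without mutating the input records."""
    return ({**r, "temp_f": r["temp_c"] * 9 / 5 + 32} for r in records)


# Hypothetical sensor readings, standing in for data arriving through an adapter.
readings = [
    {"sensor": "a", "temp_c": 20.0},
    {"sensor": "b", "temp_c": None},
    {"sensor": "c", "temp_c": 25.0},
]
pipeline = Pipeline("thermostat-ingest").add(drop_incomplete).add(to_fahrenheit)
result = pipeline.run(readings)
```

Here `pipeline.lineage` ends up holding one entry per step, which is the simplest possible stand-in for the lineage tracking the slide describes.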
• An extensible framework to process realtime data and an API to compute realtime rankings and aggregations
• Works with Spark Streaming and Storm
• Realtime classification
Tempus ‐ Realtime Processor
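Realtime ranking over a limited window of data can be illustrated with a small in-memory sketch. It assumes events arrive in timestamp order; the `WindowedRanker` class and the device events are hypothetical and do not reflect the Tempus API or how Spark Streaming or Storm implement windows.

```python
from collections import Counter, deque


class WindowedRanker:
    """Counts events per key over the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()   # (timestamp, key) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp: float, key: str) -> None:
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window and undo their counts.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, k: int):
        # Ranking is simply the k most frequent keys inside the current window.
        return self.counts.most_common(k)


ranker = WindowedRanker(window_s=60)
for ts, device in [(0, "door"), (10, "lamp"), (20, "lamp"), (65, "lamp"), (70, "door")]:
    ranker.add(ts, device)
top = ranker.top(2)
```

Only the events still inside the window contribute to the ranking, so memory use is bounded by the event rate times the window length.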
Tempus Speed Layer
• Stream Processing
• Continuous Computation
• Transactional
• Stores limited window of data
• Complexity Isolated in this layer only
• Fault tolerant by autocorrection in the next batch run
• Compensates for batch latency
• Data adapters and pipelines for different sources
• DSL‐based jobs using Scalding
• Data connectors to storage layer supporting HBase, Cassandra and Redis
• Input to machine learning models
Huduma ‐ Batch Processor
• Framework and Infrastructure to run machine learning models
• Embedded models with code generation in R, JavaScript and Java
• Online Classification Service
• Large scale collaborative filtering based recommendation engine
• Uses MLlib, GraphLab and 0xdata
• Based on SMAC/Auto‐WEKA/GhostFace model selection
Akili ‐ Machine Learning as a Service
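As a toy illustration of collaborative filtering, the sketch below recommends items by counting how often they co-occur with items a user already has. It is an assumption-laden, single-machine example with hypothetical function names and data; it is not how Akili, MLlib, GraphLab or 0xdata implement large scale recommendation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(baskets):
    """Count how often each pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, baskets, k=2):
    """Score items the user lacks by co-occurrence with the items the user has."""
    counts = cooccurrence(baskets)
    seen = set(baskets[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by name, so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical per-user device lists.
baskets = {
    "u1": ["lamp", "plug"],
    "u2": ["lamp", "plug", "lock"],
    "u3": ["lamp", "lock"],
    "u4": ["plug", "camera"],
}
suggestions = recommend("u1", baskets)
```

For user `u1`, "lock" scores highest because it co-occurs with both items that user already has.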
Akili – Schematic Description
• Real‐time and batch views of data
• REST Interface
• Scalable and Highly Available
• Generic Service which interfaces with Data Storage and other realtime and batch processes
Alchemy ‐ Service Layer
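In a Lambda Architecture, a serving layer typically answers queries by combining a precomputed batch view with a realtime view covering data that arrived since the last batch run. The sketch below shows that merge for simple counts. The function name and the example views are hypothetical, and it assumes the realtime view holds only increments not yet absorbed by the batch view; it is not the Alchemy API.

```python
def merged_view(batch_view: dict, realtime_view: dict) -> dict:
    """Combine batch counts with counts seen since the last batch run."""
    merged = dict(batch_view)  # copy so the batch view stays untouched
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


# Hypothetical views: event counts per device.
batch_view = {"door": 120, "lamp": 340}   # produced by the batch layer
realtime_view = {"lamp": 3, "fan": 1}     # produced by the speed layer
answer = merged_view(batch_view, realtime_view)
```

When the next batch run completes, the batch view absorbs the recent data and the realtime view can be reset, which is how errors in the speed layer get corrected over time.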
• Scalable interactive visualization
• Uses D3, Aperture and Gephi
• Works with Tableau
Taswira ‐ Visualization framework
• Deployment of the components as a lightweight, portable, self‐sufficient container that will run virtually anywhere
• Docker based containers
Chombo ‐ Deployment Provisioning