© 2016 Silver Spring Networks. All rights reserved. 1
Silver Spring Networks
Greg Brosman
Product Manager
SilverLink Data Platform
Silver Spring Networks
• Silver Spring Networks helps global utilities and cities connect, optimize, and manage smart energy and smart city infrastructure
• Over 22 million connected devices
• 200B records read per year
• 2 million remote operations per year
Integrate Renewables
Engage Customers
Improve Operational Efficiency
Improve Reliability
Manage Peak
Automate Measurement
Improve Energy Efficiency
Reduce Truck Rolls for Device Maintenance
More Devices, More Data
• How can we do more with our network?
- We deployed a network to support meter reading. It works great, but we’re ready for the next thing to leverage these investments
• How do we manage these new devices and make all this data accessible and secure?
- There are lots of opportunities to enhance our service by making use of advanced analytics, but we can’t get the data to the right people
• How can we reduce the cost, time, and pain of integrating with 3rd party apps?
- The ecosystem of 3rd party apps is growing, but we need a scalable way to connect apps with data
Managing the volume, variety, and velocity of data
SilverLink Data Platform
• Automatically ingest smart grid data
• Enrich data with valuable context
• Enable real-time and batch applications
• Archive raw and enriched data
• Connect apps through standard APIs
• Explore data through BI tool integrations
Seamlessly connecting apps with sensor data
[Architecture diagram: SilverLink Data Platform, top to bottom]
Applications: Silver Spring Networks Apps, 3rd Party Apps, In-House Apps
Security & API Management
Storage & Batch, Real-Time
Data Ingestion
Data Sources: Devices, Silver Spring Networks Data, Utility Data, 3rd Party Data
Starfish
• A Worldwide Wireless IPv6 Network Service for the IoT. Starfish enables cities, utilities, enterprises, and developers to connect and manage a new generation of intelligent devices
• Focus areas include water, energy, food, traffic, transportation and safety
• 2016 Global IoT Hackathon Series: an opportunity to develop and test innovations and collaborate with leading IoT technologists
Building a new ecosystem of IoT services
IoT Big Data Ingestion & Processing in Hadoop
Darin Nee
Silver Spring Networks
• Context & scope of our use case
• Tour a DataTorrent app we built
• Some technical hurdles & solutions we came up with
• Q & A
Agenda
• Sensor reads
• Meter register reads & interval data
• Threshold events, traps
• Device metadata
Kinds of Data
• NICs collect data from meters
• Head-end software polls NICs
• Some data sent asynchronously to head end
• Agents send data to SilverLink
• Data processing using DataTorrent + more
• Data consumed via APIs and SQL
Data Flow
• Encryption of data at rest & in transit
• Ranger & Knox
• Custom requirements to satisfy local laws
• Auditing
• No data leakage across tenants
• Not enough to be secure – need to prove it
Security
• Shared resources to cut costs
• Customers with millions of devices, and pilots with a handful of them
• Centralized management of software & operations
• Challenge in selling shared anything to our customers
Multi-Tenancy
• 23 million network endpoints in service today
• Up to 96 intervals a day
• Each interval has 4 channels
• So, approximately 8 billion intervals per day
• Keep this data forever
• Also, 100 million events a day
• And, sensors that can collect data every 10s
• 19.4 GB per million meters per day
• ½ TB per day
Scalability
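The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses Python purely for illustration. It assumes that every one of the 23 million endpoints reports the full 96 intervals on all 4 channels, and that the 19.4 GB figure scales linearly with meter count; the slide states neither assumption.

```python
# Figures taken from the slide.
ENDPOINTS = 23_000_000                 # network endpoints in service
INTERVALS_PER_DAY = 96                 # "up to" 96, so this is an upper bound
CHANNELS_PER_INTERVAL = 4
GB_PER_MILLION_METERS_PER_DAY = 19.4


def interval_values_per_day(endpoints=ENDPOINTS):
    """Upper bound on interval values produced per day."""
    return endpoints * INTERVALS_PER_DAY * CHANNELS_PER_INTERVAL


def storage_gb_per_day(endpoints=ENDPOINTS):
    """Daily volume in GB, assuming the per-million-meter figure scales linearly."""
    return endpoints / 1_000_000 * GB_PER_MILLION_METERS_PER_DAY


print(f"{interval_values_per_day():,} interval values per day (upper bound)")
print(f"{storage_gb_per_day():.1f} GB per day")
```

Under those assumptions the upper bound is about 8.8 billion interval values and roughly 446 GB per day, which is consistent with the slide's rounder "approximately 8 billion" and "½ TB per day".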
• Clustering
• Automated Fail-overs
• Rolling upgrades
High Availability & Disaster Recovery
• HDFS
• Kafka
• DataTorrent
• Elasticsearch
• OpenTSDB & HBase
• Oozie
• Hive
• Mule
• Apigee
• Tableau
Tech Architecture
• Management UI Console
• Malhar Library + Java
• Support
• Rapid Development
• Stats, Operability, Auto-Scaling
Why DT?
• Resilient operators (availability)
• Easily partition operators (scalability)
• Any Java programmer can build a simple app
• Facilitate management hand-off to operations
• Easy to detect failures with UI and stats
Strengths
• No “back pressure”
• If a container crashes with OOM, it is restored to the same OOM state
• No good way to stop an app and save context
• Can be difficult to navigate logs
Our focus areas for improvement
Example DT App: AMM Export Ingestion
Example App: AMM Export Ingestion
• Scans last 2 days’ HDFS directories
• Emits filenames
• Too fast!
Input Operator
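As a rough illustration of the scanning step, the sketch below lists the directories for the last two days and hands out filenames in capped batches. It is a Python stand-in, not DataTorrent code. The `/data/amm/export/YYYY/MM/DD` layout, the reading of "last 2 days" as today plus yesterday, and the `max_per_window` cap are all assumptions made for the example.

```python
from datetime import date, timedelta


def directories_to_scan(today, base="/data/amm/export", days=2):
    """Return one directory per day, newest first (path layout is hypothetical)."""
    return [
        f"{base}/{(today - timedelta(days=offset)):%Y/%m/%d}"
        for offset in range(days)
    ]


def next_batch(pending, max_per_window):
    """Split off at most max_per_window filenames so downstream is not flooded."""
    return pending[:max_per_window], pending[max_per_window:]


dirs = directories_to_scan(date(2016, 3, 1))
batch, remaining = next_batch(["a.xml", "b.xml", "c.xml"], max_per_window=2)
```

Capping the batch is one simple way to address the "too fast" problem on the slide: listing files is far cheaper than parsing them, so an uncapped input operator can flood the rest of the DAG.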
Example App: AMM Export Ingestion
• Parses different types
• Emits Avro tuples
• XML parsing can be slow
• File & tuple sizes vary a lot
AMM File Reader
Example App: AMM Export Ingestion
• Adds metadata to every tuple
• External dependency on Elasticsearch
• Uses a thread pool since one YARN container is too big for a single client
Enricher
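The thread-pool idea can be sketched as follows. This is an illustrative Python stand-in: `lookup_metadata` replaces the real Elasticsearch query with a dictionary lookup, and the field names and pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the Elasticsearch index of device metadata (hypothetical data).
DEVICE_METADATA = {"meter-1": {"tenant": "utility-a", "timezone": "US/Pacific"}}


def lookup_metadata(device_id):
    """Placeholder for a blocking Elasticsearch query."""
    return DEVICE_METADATA.get(device_id, {})


def enrich(record):
    """Return a copy of the record with device metadata merged in."""
    return {**record, **lookup_metadata(record["device_id"])}


def enrich_batch(records, workers=8):
    """Enrich records concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, records))


enriched = enrich_batch([{"device_id": "meter-1", "value": 1.5}])
```

Because each lookup spends most of its time waiting on the network, several lookups in flight at once let one container keep its CPU and memory busy instead of idling behind a single client.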
Example App: AMM Export Ingestion
• Normalizes tuples across schema versions
• Outputs many tuples from one
Avro Converter
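A minimal sketch of the two behaviours on this slide, written in Python with plain dictionaries standing in for Avro records. The schema versions and every field name (`meter`, `vals`, `device_id`, `channels`, and so on) are invented for illustration and are not the actual AMM export schema.

```python
def normalize(record):
    """Yield one normalized tuple per channel value (field names are hypothetical)."""
    version = record.get("schema_version", 1)
    if version == 1:
        device, ts, values = record["meter"], record["ts"], record["vals"]
    elif version == 2:
        device, ts, values = record["device_id"], record["timestamp"], record["channels"]
    else:
        raise ValueError(f"unsupported schema version: {version}")
    # One input tuple fans out into several output tuples.
    for channel, value in enumerate(values):
        yield {"device_id": device, "timestamp": ts, "channel": channel, "value": value}


old_style = {"meter": "m1", "ts": 100, "vals": [1.0, 2.0]}
new_style = {"schema_version": 2, "device_id": "m1", "timestamp": 100, "channels": [1.0, 2.0]}
same_output = list(normalize(old_style)) == list(normalize(new_style))
```

Downstream operators then only ever see the normalized shape, whichever export version the input file used.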
Example App: AMM Export Ingestion
• Writes Avro tuples to HDFS files
• Names output files by date, input file, part, etc.
• HDFS can be slow – another external dependency
• Container death causes rewriting of tuples
Enriched Persister
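One way to read the naming scheme is that an output path is derived only from the date, the input file, and a part number. The Python sketch below shows such a function. The exact pattern is an assumption, as is the suggestion that a deterministic name lets a replay after container death land on the same path instead of creating a new file.

```python
import posixpath


def output_path(base, day, input_file, part):
    """Build an output path from date, input file name, and part (pattern is hypothetical)."""
    stem = posixpath.basename(input_file)
    return posixpath.join(base, day, f"{stem}.part-{part:05d}.avro")


path = output_path("/enriched", "2016-03-01", "/raw/export_17.xml", 3)
```

With this kind of naming, the same input and part always map to the same output file, so rewritten tuples after a restart are easier to locate and reconcile.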
Example App: AMM Export Ingestion
• Embedded instance of OpenTSDB
• External dependency on HBase
• Slow during metric creation and HBase region splits
TSDB Writer
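For readers unfamiliar with OpenTSDB, a data point is a metric name, a timestamp, a value, and at least one tag. The Python sketch below formats one in OpenTSDB's text `put` line format, only to show the shape of the data. The app on this slide embeds OpenTSDB rather than sending text lines, and the metric and tag names here are hypothetical.

```python
def put_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB text-protocol `put` line."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    tag_text = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_text}"


line = put_line("meter.interval.kwh", 1456790400, 1.25, {"device": "m1", "channel": "0"})
```

Each new metric or tag name needs an identifier assigned before points can be written, which is a plausible reason the slide reports slowness during metric creation.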
AMM Export Ingestion
Continuing to extend the DAG with new operators
• The classic YARN application solution is to spin up more containers
• Not so simple due to external dependencies, and
• Highly variable loads
- Tuple mix
- Tuple size
- Kind of tuple
• Buffering tuples in the DAG
• Static partitioning means the DAG has to be slow
• Throughput: how many tuples an operator can emit per window
• We need dynamic throughput management
Scalability & Throughput
Throughput Management
We use a Stats Listener to “auto-tune” the throughput rate
Throughput Management
• Any pair of logical operators
• Adjusts upstream operator throughput every N windows
• Scales it by a factor based on downstream operator backlog threshold levels
• A lagging correction since it is based on operator stats from prior windows
• Observed overall processing rate across DAG oscillates
• Control theory says this is not going to work since it will never converge to a reasonable value
First implementation
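The oscillation is easy to reproduce in a toy model. The Python simulation below is not the actual Stats Listener code. The thresholds (100 and 1000), the scale factors (0.5 and 2.0), the adjustment period, and the downstream rate are all made-up values chosen to illustrate a threshold rule driven by stale backlog readings.

```python
def scale_factor(backlog, low=100, high=1000):
    """Threshold rule: speed up when backlog is low, slow down when it is high."""
    if backlog > high:
        return 0.5
    if backlog < low:
        return 2.0
    return 1.0


def simulate(steps=40, downstream_rate=500, throughput=100, adjust_every=2):
    """Return the upstream throughput chosen in each window."""
    backlog, observed, history = 0, 0, []
    for window in range(steps):
        if window % adjust_every == 0:
            # The listener only sees a backlog reported for an earlier window.
            throughput = max(1, int(throughput * scale_factor(observed)))
        observed = backlog
        backlog = max(0, backlog + throughput - downstream_rate)
        history.append(throughput)
    return history


history = simulate()
print(history)
```

In this model the upstream throughput climbs to 800 tuples per window, overshoots, drops to 100, and then climbs again, never settling near the downstream rate of 500. That is the lagging-correction behaviour the slide describes.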
Throughput Management
• Compute a backlog
• Try to maintain a target backlog that is a multiple of the downstream operator processing rate
• Problem: starvation
- Stats not reported when throughput set to zero
- Solution 1: small, positive min throughput
- Solution 2: fractional/probabilistic emit
Second implementation
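A minimal Python sketch of the target-backlog idea and the two starvation fixes follows. The formula for the next throughput (refill to the target after the downstream consumes one window's worth) is one plausible reading of the slide, not the authors' confirmed method. The `multiple` and `min_throughput` defaults are assumptions.

```python
import random


def next_throughput(backlog, downstream_rate, multiple=2.0, min_throughput=1.0):
    """Tuples to emit next window so the backlog moves toward its target."""
    target_backlog = multiple * downstream_rate
    wanted = target_backlog - backlog + downstream_rate
    # Solution 1: never drop to zero, so the operator keeps reporting stats.
    return max(min_throughput, wanted)


def tuples_to_emit(throughput, rng=random):
    """Solution 2: emit the fractional part of the rate probabilistically."""
    whole = int(throughput)
    fraction = throughput - whole
    return whole + (1 if rng.random() < fraction else 0)


rate = next_throughput(backlog=1000, downstream_rate=500)
count = tuples_to_emit(0.25, random.Random(0))
```

With a rate of 0.25, `tuples_to_emit` returns 1 in roughly a quarter of windows and 0 otherwise, so a very slow rate still averages out correctly over many windows.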
Throughput Management
• Operators don’t run out of memory and crash
• Overall throughput across the DAG is much higher
• Can adapt to a wide mix of loads
• General enough that we are using it in all our apps
• We ingested 4 multi-month pilot datasets successfully
• Reduced the time it takes to ingest 1 day’s worth of data from 1½ hrs to 15 min
• Hands off, automated tuning
Successes
Throughput Management
• Throughput management is based on tuple count and not all tuples are the same
• Garbage Collection causes uneven performance
• Slow to converge
• Hard to test and debug
Remaining problems
• Persist processed state for files & Kafka messages
- Save Kafka offsets in ZooKeeper
- Rename input files to .processed
• Checkpoint Listener
- Wait to persist state until tuple fully transits DAG
- Prevent loss of data
• However, some tuples get processed twice
• Suspend script
- Use REST API to set a flag on Input Operator
- Wait until no more activity
Stopping DAGs
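The "wait to persist state" step can be sketched as a small tracker that remembers which files were emitted in which window and renames them only once that window is reported as committed. This is an illustrative Python model. The class, its method names, and the window numbering are hypothetical, loosely modelled on a checkpoint-listener style callback.

```python
import os
import tempfile


class ProcessedFileTracker:
    """Defers the .processed rename until a window is committed (hypothetical class)."""

    def __init__(self):
        self.pending = {}  # window id -> files fully emitted in that window

    def file_emitted(self, window_id, path):
        self.pending.setdefault(window_id, []).append(path)

    def committed(self, window_id):
        """Rename every file emitted in a window at or before window_id."""
        renamed = []
        for wid in sorted(w for w in self.pending if w <= window_id):
            for path in self.pending.pop(wid):
                os.rename(path, path + ".processed")
                renamed.append(path + ".processed")
        return renamed


workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "export_1.xml")
open(sample, "w").close()

tracker = ProcessedFileTracker()
tracker.file_emitted(window_id=5, path=sample)
done = tracker.committed(window_id=7)  # renames export_1.xml to export_1.xml.processed
```

If the app dies after the tuples have been processed downstream but before the rename happens, the file is read again on restart. That is how this scheme avoids data loss at the cost of some tuples being processed twice, as the slide notes.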
• Hadoop 2.3.0
• DataTorrent 3.1.1
Versions