Data at Scale - Michael Peacock, Cloud Connect 2012
Data at Scale
Data problems and solutions with the connected world
Michael Peacock
Web Systems Developer
Telemetry Team
Smith Electric Vehicles
Lead Developer
Occasional conference speaker
Technical Author
• World's largest manufacturer of all-electric commercial vehicles
• Founded in 1920
• US facility opened 2009
• US buyout in 2011
Commercial electric vehicles?
Electric Vehicles
• 16,500 – 26,000 lbs gross vehicle weight
• Commercial Electric Delivery Trucks
• 7,121 – 16,663 lbs payload
• 50 – 240km
• Top Speed 80km/h
Electric Vehicles
• New, continually evolving technology
• Viability evidence required
• Government research
EV Data
• Performance analysis and metrics
• Proving the technology: Government research
• Evaluating driver training conversions
• Diagnostics, Service and Warranty Issues
• Continuous Improvement
Current Status
• ~500 telemetry enabled vehicles
• Telemetry is now fitted as standard in our vehicles
• Our MySQL solution processes:
– 1.5 billion inserts per day
– Constant minimum of 4000 inserts per second
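As a rough sanity check on those two figures (a back-of-envelope sketch, not a calculation from the talk), 1.5 billion inserts per day implies an average rate well above the stated 4000 per second floor:

```python
# Back-of-envelope: average insert rate implied by 1.5 billion inserts per day.
INSERTS_PER_DAY = 1_500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

average_per_second = INSERTS_PER_DAY / SECONDS_PER_DAY
floor_per_day = 4000 * SECONDS_PER_DAY  # volume the stated minimum alone would give

print(round(average_per_second))  # 17361 inserts per second on average
print(floor_per_day)              # 345600000 inserts per day at the floor rate
```

So the floor accounts for under a quarter of the daily volume; the rest arrives in bursts above it.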
CANBus: 101
CANBus and Telemetry
• Sample the buses: once per second
• Only sample buses with useful performance and diagnostic information on them
Vehicle Data
• Drive train information:
– Motor speed
– Pedal positions
– Temperatures
– Fault Codes
• Battery information:
– Current, Voltage & Power
– Capacity
– Temperatures
Connected World: The Problem
• Connected infrastructure
– EV Charging stations
– Utilities
• Home based telemetry
– Smart Meters
– Smart Homes
Our problem
• Hundreds of connected devices, each with numerous sensors giving us 2,500 pieces of data per second per vehicle
• Broadcast time we can’t plan for
• Vehicles rolling off the production line
• New requirements for more data
How it started
Issue 1: Availability
Issue 2: Capacity
Sometimes data is too much to cope with
www.flickr.com/photos/eveofdiscovery/3149008295
Issue 2: Capacity
Option: Cloud Infrastructure
• Cloud based infrastructure gives:
– More capacity
– More failover
– Higher availability
Cloud Infrastructure: Problem
• Huge volumes of data inserts into a MySQL solution: sub-optimal on virtualised environments
• Existing enterprise hardware investment
• Security and legal issues for us storing the data off-site
Cloud Infrastructure: Enabler
www.flickr.com/photos/gadl/89650415/inphotostream
AMQP
Advanced Message Queuing Protocol
Queuing
• Downtime
• Capacity
• Maintenance Windows
What if...
• Queuing allows us to cope with:
– Downtime of our own systems
– Capacity problems
• Queuing doesn’t allow us to cope with:
– An outage of the queuing infrastructure
Buffer
www.flickr.com/photos/brapps/403257780
Cloud based infrastructure
• Use a Message Queue to ensure data is only processed when you have the resources to process it
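The buffering idea can be sketched without a real broker. The class below is a toy in-memory stand-in for a message queue (the names `BufferedPipeline`, `publish` and `drain` are invented for illustration, not part of any AMQP library): producers keep publishing while the processor is down, and the backlog is worked through once it returns.

```python
from collections import deque


class BufferedPipeline:
    """Toy stand-in for a message queue sitting in front of a processor."""

    def __init__(self):
        self.queue = deque()      # messages waiting to be processed
        self.processed = []       # messages the processor has handled
        self.processor_up = True  # set to False to simulate downtime

    def publish(self, message):
        # Producers always succeed: the queue absorbs the message.
        self.queue.append(message)

    def drain(self, max_messages=None):
        # Consume in arrival order, but only while the processor is available.
        handled = 0
        while self.queue and self.processor_up:
            if max_messages is not None and handled >= max_messages:
                break
            self.processed.append(self.queue.popleft())
            handled += 1
        return handled


pipeline = BufferedPipeline()
pipeline.processor_up = False            # maintenance window starts
for second in range(5):
    pipeline.publish({"vehicle": "V1", "t": second})
pipeline.drain()                         # nothing processed, nothing lost
backlog = len(pipeline.queue)

pipeline.processor_up = True             # maintenance window ends
pipeline.drain()                         # backlog is worked through in order
```

The `max_messages` argument is there to show the capacity case: the consumer can take only as much per pass as it has resources for.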
SAN
• Backbone to most cloud-based systems
• Powers our MySQL solution
• Supports:
– Huge volumes of data
– Lots of processing
– Fast connection to your servers
– Backups and snapshots
SAN Tips
• When dealing with data on a huge scale, every aspect of your application and infrastructure needs to be optimised. This includes your SAN, which is commonly overlooked.
• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html
New Architecture
Speed: Stream to Batch
• Streams of continuously flowing data can be difficult to process
• Turn the stream into small, quick batches
• MySQL: LOAD DATA INFILE
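A minimal sketch of the stream-to-batch step, assuming readings arrive as `(recorded_at, signal_id, value)` tuples. The table name, column names, file path and batch size are all hypothetical; only the `LOAD DATA INFILE` statement itself comes from the slide.

```python
import csv
import io
from itertools import islice


def batches(stream, size):
    """Yield lists of up to `size` readings from a continuous iterator."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_csv(batch):
    """Serialise one batch as CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(batch)
    return buffer.getvalue()


def load_statement(path, table):
    """Build the bulk-load statement for one batch file (hypothetical names)."""
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
        "(recorded_at, signal_id, value)"
    )


readings = ((1331000000 + i, 7, 20.5 + i) for i in range(5))
chunks = list(batches(readings, size=2))
print(len(chunks))  # 3 batches: 2 + 2 + 1 readings
print(to_csv(chunks[0]))
print(load_statement("/tmp/batch_0001.csv", "readings"))
```

Each batch becomes one file and one bulk load, rather than one INSERT per reading.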
Shard 1: Hardware
• As the amount of data increased, we hit a huge performance problem. This was solved by sharding at a hardware level.
• Each data collection device was given its own database, which could be on any number of separate machines, with a single database acting as a registry
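The registry lookup might look something like the sketch below. It keeps the registry in memory purely for illustration; the host names and the `device_<id>` database naming scheme are assumptions, not details from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardLocation:
    host: str       # machine holding the device's database
    database: str   # database dedicated to that device


class Registry:
    """In-memory stand-in for the single registry database."""

    def __init__(self):
        self._locations = {}

    def register(self, device_id, host):
        # One database per data collection device; naming scheme is hypothetical.
        location = ShardLocation(host=host, database=f"device_{device_id}")
        self._locations[device_id] = location
        return location

    def locate(self, device_id):
        # Every read or write first asks the registry where the device lives.
        try:
            return self._locations[device_id]
        except KeyError:
            raise LookupError(f"device {device_id} is not registered") from None


registry = Registry()
registry.register(101, host="db-a.internal")
registry.register(102, host="db-b.internal")
print(registry.locate(102))
```

Because the application only ever asks the registry, a device's database can be moved to another machine by updating one registry row.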
Rationalisation & Extrapolation
• Remember the CANBus
– Always telling us information, which we sample every second?
– Do we always need that?
• Extrapolate and assume
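The slides do not say how the rationalisation was done. One possible reading, sketched here as an assumption, is to store a sample only when its value changes and, on read, assume the signal held its last stored value in between:

```python
def compress(samples):
    """Keep a (second, value) sample only when the value changes."""
    kept = []
    previous = object()  # sentinel that never equals a real value
    for second, value in samples:
        if value != previous:
            kept.append((second, value))
            previous = value
    return kept


def value_at(kept, second):
    """Assume the signal held its last stored value until the next change."""
    current = None
    for stored_second, value in kept:
        if stored_second > second:
            break
        current = value
    return current


raw = [(0, 21), (1, 21), (2, 21), (3, 22), (4, 22), (5, 21)]
stored = compress(raw)
print(stored)               # [(0, 21), (3, 22), (5, 21)]
print(value_at(stored, 4))  # 22
```

Here six one-second samples shrink to three stored rows, and every original value can still be reconstructed.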
Getting information from data
• Vehicle performance information involves:
– Looking at 20 – 30 data points for each second of a vehicle’s operation in a day
– Analysing the data
– Performing calculations, which vary depending on certain data points
• Getting this data was slow
– How far did Customer A’s fleet travel last week?
Regular processing
• Instead of processing data on demand, process it regularly
• Nightly scheduled task to evaluate performance information
Regular Processing: Problems
You need to pull the data out faster than ever before!
Shard 2: Tables
• All our data has a timestamp associated with it
• Looking up data for a particular day was slow. Very slow.
• We sharded the data again, this time with a table per week within each vehicle’s own database
Sharding: Fallbacks and logic
• What about data before you implemented sharding?
• Which table do I need to look at?
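Both questions can be answered by one small routing function. The sketch below assumes ISO week numbering, a `readings_<year>_w<week>` naming scheme, a single legacy table for pre-sharding data, and an arbitrary cutover date; all four are illustrative assumptions.

```python
from datetime import date

# Hypothetical cutover: data recorded before this date lives in one legacy table.
SHARDING_START = date(2011, 6, 6)
LEGACY_TABLE = "readings_legacy"


def table_for(day):
    """Return the table that holds readings for the given calendar day."""
    if day < SHARDING_START:
        return LEGACY_TABLE  # fallback for pre-sharding data
    iso_year, iso_week, _ = day.isocalendar()
    return f"readings_{iso_year}_w{iso_week:02d}"


def tables_between(start, end):
    """List each table a date-range query has to touch, oldest first."""
    tables = []
    for ordinal in range(start.toordinal(), end.toordinal() + 1):
        name = table_for(date.fromordinal(ordinal))
        if name not in tables:
            tables.append(name)
    return tables


print(table_for(date(2012, 2, 14)))  # readings_2012_w07
print(tables_between(date(2012, 1, 1), date(2012, 1, 9)))
# ['readings_2011_w52', 'readings_2012_w01', 'readings_2012_w02']
```

Note that 1 January 2012 falls in ISO week 52 of 2011, which is why the ISO year, not the calendar year, goes into the table name.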
Aggregation
• With data segregated on a per vehicle and per week basis, lookups were much faster
• Performance calculations could be scheduled nightly, with a single record recorded for each vehicle for each day in a central database
• Allows for easy aggregation:
– How far did my fleet travel last week?
– How much energy did they use last month?
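With one summary record per vehicle per day, a fleet-level question becomes a simple sum over a few rows. The sketch below uses made-up field names and figures to show the shape of that aggregation:

```python
from collections import defaultdict
from datetime import date

# One summary row per vehicle per day, as the nightly job might write it.
# Field names and figures are made up for illustration.
daily_summaries = [
    {"customer": "A", "vehicle": "V1", "day": date(2012, 2, 6), "km": 82.0, "kwh": 61.5},
    {"customer": "A", "vehicle": "V1", "day": date(2012, 2, 7), "km": 75.5, "kwh": 58.0},
    {"customer": "A", "vehicle": "V2", "day": date(2012, 2, 6), "km": 40.0, "kwh": 33.5},
    {"customer": "B", "vehicle": "V9", "day": date(2012, 2, 6), "km": 120.0, "kwh": 90.0},
]


def fleet_totals(rows, start, end):
    """Sum distance and energy per customer over an inclusive date range."""
    totals = defaultdict(lambda: {"km": 0.0, "kwh": 0.0})
    for row in rows:
        if start <= row["day"] <= end:
            totals[row["customer"]]["km"] += row["km"]
            totals[row["customer"]]["kwh"] += row["kwh"]
    return dict(totals)


week = fleet_totals(daily_summaries, date(2012, 2, 6), date(2012, 2, 12))
print(week["A"])  # {'km': 197.5, 'kwh': 153.0}
```

The per-second data never has to be touched to answer the question.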
Backups and Archives
• SAN backups and snapshots
• With date based sharding:
– Dump a table
– Copy it elsewhere
– Drop it / Flush it (if archiving)
Outsource to the cloud
• Why waste resources doing things that cloud based services do better (where legal, security and privacy reasons allow)?
• Maps
• Email delivery
• Even phone integration
Data Type Optimization
• When prototyping a system and designing a database schema, it’s easy to be sloppy with your data types and fields
• DON’T BE
• Use as little storage space as you can
– Ensure each data type is as small as it can be
– Use only the fields you need
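To put a rough number on it, the sketch below uses the storage sizes of MySQL's integer types and the 1.5 billion rows per day quoted earlier. Which columns could actually be narrowed is not stated in the talk, so the INT to SMALLINT example is hypothetical.

```python
# Storage per value for MySQL integer types, in bytes.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

ROWS_PER_DAY = 1_500_000_000  # insert volume quoted earlier in the talk


def daily_bytes(column_type):
    """Raw column storage per day, ignoring indexes and row overhead."""
    return INTEGER_BYTES[column_type] * ROWS_PER_DAY


saving = daily_bytes("INT") - daily_bytes("SMALLINT")
print(saving / 10**9)  # 3.0 (GB per day saved on a single column)
```

At this volume, two bytes per row on one column is gigabytes per day before indexes are counted.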
Sharding: An excuse
• Sharding was a large project for us, and involved extensive re-architecting of the system.
• We had to make changes to every query we have in our code
• Gave us an excuse to:
– Optimise the queries
– Optimise the indexes
Query Optimization
• Run every query through EXPLAIN EXTENDED
• Check it hits the indexes
• Remove functions like CURDATE from queries, to ensure query cache is hit
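The CURDATE point can be illustrated by computing the date in the application and sending it as a literal, so the query text is identical for every request made that day. The table and column names below are hypothetical, and this only matters on MySQL versions that still ship the query cache.

```python
from datetime import date


def distance_query(today):
    """Build the query text with the date as a literal instead of CURDATE()."""
    if not isinstance(today, date):
        raise TypeError("today must be a datetime.date")
    # strftime with a fixed format yields YYYY-MM-DD, so the literal is safe to embed.
    return (
        "SELECT SUM(km) FROM daily_summary "
        f"WHERE day = '{today.strftime('%Y-%m-%d')}'"
    )


sql = distance_query(date(2012, 2, 14))
print(sql)  # SELECT SUM(km) FROM daily_summary WHERE day = '2012-02-14'
```

For values that come from users rather than from the application's own clock, parameterised queries remain the safer choice.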
Index Optimization
• Keep it small
• From our legacy days of one database on one server, we had a column that told us which vehicle the data related to
– This was still there...as part of an index...despite the fact the application hadn’t required it for months
Live data: dashboard
Live data: Maps
Live data
• Original database design dictated:
– Each type of data point required a separate query, sub-query or join to obtain
• Collection device and processing service dictated:
– GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction
Dashboards: Caching
• Don’t query if you don’t have to
• Cache what you can; access direct
• With message queuing it’s possible to route messages to two or more places: one to be processed and another to display the latest information directly
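AMQP brokers offer fanout exchanges for this kind of routing. The sketch below imitates one in memory (the `FanoutExchange` class and queue names are invented for illustration): every message is copied to a processing queue and a live queue, and the dashboard reads the latest value from a small cache without querying MySQL.

```python
from collections import deque


class FanoutExchange:
    """Toy exchange that copies each message to every bound queue."""

    def __init__(self):
        self.queues = {}

    def bind(self, name):
        self.queues[name] = deque()
        return self.queues[name]

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)


exchange = FanoutExchange()
processing = exchange.bind("processing")  # would be drained by the MySQL loader
live = exchange.bind("live")              # drained by the dashboard

for t, speed in enumerate([0, 12, 30]):
    exchange.publish({"vehicle": "V1", "t": t, "speed_kmh": speed})

latest = {}  # newest reading per vehicle, kept outside the database
while live:
    message = live.popleft()
    latest[message["vehicle"]] = message

print(latest["V1"]["speed_kmh"])  # 30
```

The processing queue still holds all three messages, so the live view costs the database nothing.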
Exporting data: Group
• Where possible group exports and reports together by the same shard/table/index
Code considerations
• Race conditions
• Number of concurrent requests – group them
Application Quality
• When dealing with lots of data, quickly, you need to ensure:
– You process it correctly
– You can act fast if there is a bug
– You can act fast when refactoring
Deployment
• When dealing with a stream of data, rolling out new code can mean pausing the processing work that is done
• Put deployment measures in place to make a deployment switch over instantaneous
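The talk does not say which deployment measures were used. One common way to make a switch-over effectively instantaneous on POSIX systems, shown here as an assumption, is to keep each release in its own directory and atomically repoint a `current` symlink:

```python
import os
import tempfile


def activate(release_dir, current_link):
    """Point `current_link` at `release_dir` in one atomic rename (POSIX)."""
    temporary_link = current_link + ".new"
    if os.path.lexists(temporary_link):
        os.remove(temporary_link)
    os.symlink(release_dir, temporary_link)
    os.replace(temporary_link, current_link)  # rename over the old link


# Demonstration in a throwaway directory; paths are illustrative only.
root = tempfile.mkdtemp()
releases = [os.path.join(root, name) for name in ("release_1", "release_2")]
for path in releases:
    os.mkdir(path)
current = os.path.join(root, "current")

activate(releases[0], current)  # first deployment
activate(releases[1], current)  # switch to the new code
print(os.readlink(current) == releases[1])
```

Workers that resolve `current` when they start pick up the new release on their next run, so processing is not paused while files are copied.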
Technical Tips
• Measure your application’s performance, data throughput and so on
– A data at scale problem itself
• Use as much RAM on your servers as is safe to do so
– We give MySQL 80% of the RAM on each DB server, each of which has 100 – 140GB
What do we have now?
• Now we have a fast, stable, reliable system
• Pulling in millions of messages from a queue per day
• Decoding those messages into 1.5 billion data points per day
• Inserting 1.5 billion data points into MySQL per day
• Performance data generated, and grant authority reports exported daily
• More sleep on a night than we used to
Questions