Powering a Virtual Power Station with Big Data
Michael Bironneau
April 2016
[Figure: bar chart of installed capacity (GW) and generation (GW) by source: CCGT, Coal, Nuclear, Wind, Interconnectors, OCGT, Pumped Storage, Solar, Oil, Biomass, Hydro]
[Figure: total power (MW)]
Average upwards flex – 120%
Average downwards flex – 35%
Open Energi in the coming year:
• 25-40k messages processed per second
• Total size of data 500TB-800TB
Perspective: here’s what “big data” means to Boeing [1]:
• ~64k messages per second from each aircraft
• Total size of data over 100 petabytes
[1]: http://bit.ly/18kQlMn
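To put the two sets of figures side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 PB = 1000 TB) and treats Boeing's "over 100 petabytes" as exactly 100 PB, so the ratios are upper estimates; the constant and function names are made up for illustration.

```python
# Rough scale comparison using only the figures quoted on the slides.
TB_PER_PB = 1000  # assumption: decimal units

OPEN_ENERGI_TB = (500, 800)            # projected total data size, low/high
OPEN_ENERGI_MSGS = (25_000, 40_000)    # projected messages per second, low/high
BOEING_PB = 100                        # "over 100 petabytes", taken as a floor
BOEING_MSGS_PER_AIRCRAFT = 64_000      # "~64k messages per second" per aircraft


def ratio_range(values, reference):
    """Return each value as a fraction of the reference quantity."""
    return tuple(v / reference for v in values)


size_ratio = ratio_range(OPEN_ENERGI_TB, BOEING_PB * TB_PER_PB)
msg_ratio = ratio_range(OPEN_ENERGI_MSGS, BOEING_MSGS_PER_AIRCRAFT)

print("Data size relative to Boeing: {:.1%} to {:.1%}".format(*size_ratio))
print("Message rate relative to one aircraft: {:.1%} to {:.1%}".format(*msg_ratio))
```

On these assumptions the projected data volume is under 1% of Boeing's, and the whole message stream is smaller than that of a single aircraft.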
[Figure: size of data (PB), Open Energi vs Boeing]
Our data is not huge at the moment…
…but after domestic demand-side response (or something else on that scale)
[Figure: size of data (PB), Open Energi vs Boeing]
Why Hortonworks Data Platform
• Can scale quickly to respond to market demands
• Interoperability with existing code
• Fantastic data integration
• Knowledgeable technical support
• Security and data governance
Batch | Our HDP setup
Flume
Asset Data
National Electricity Data
Market data
Other “live” timeseries data
Hive Streaming
Hive
Other applications
Real-time | (Work ongoing)
Asset Data
ML models
HDFS, cache, Elasticsearch…
Update ML models
Correlate events
Enrich
Apache Hive | Example
CREATE EXTERNAL TABLE semi_structured_stuff (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'semi/structured',
              'es.index.auto.create' = 'false');

SELECT something
FROM semi_structured_stuff
JOIN metadata m ON …
LEFT JOIN timeseries t ON …
Index semi-structured data (Elasticsearch)
Use Hive to integrate this with timeseries data and other metadata
Farm out complex analytics to Python
SELECT transform(something) USING 'insane_maths.py' AS (result)
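For context, Hive's TRANSFORM clause streams each input row to the script's standard input as a tab-separated line and reads result rows back from standard output. The slides do not show the contents of insane_maths.py, so the following is only a minimal sketch of what such a script could look like; the signed-log calculation is a placeholder assumption, not the analytics Open Energi actually runs.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for insane_maths.py, for use with Hive TRANSFORM."""
import math
import sys

HIVE_NULL = "\\N"  # how Hive writes NULL in its text row format


def transform_line(line):
    """Turn one tab-separated input row into one output value (as text)."""
    fields = line.rstrip("\n").split("\t")
    raw = fields[0]  # the single column passed in, i.e. `something`
    if raw == HIVE_NULL:
        return HIVE_NULL
    try:
        value = float(raw)
    except ValueError:
        return HIVE_NULL  # unparseable input becomes NULL rather than a crash
    # Placeholder maths (assumption): signed log scaling of the input value.
    return repr(math.copysign(math.log1p(abs(value)), value))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream rows from stdin to stdout, one output row per input row."""
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Only start streaming when input is piped in, as it is when Hive runs the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

In practice the script usually has to be shipped to the cluster first (for example with Hive's ADD FILE command) and may need to be invoked through the interpreter, e.g. USING 'python insane_maths.py'. Existing Python code can then be reused largely unchanged, since the only contract is lines in and lines out.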
Benefits
• Reduced storage cost compared to SAN + SQL Server
• Better utilisation of infrastructure thanks to YARN
• Pain-free integration of multiple data sources with external tables in Hive
• Scale up/down on demand
• Re-use existing Python code = low development overhead
Dynamic Demand
Simulations
Insights via web
Machine learning
Statistical analysis
Event correlation
Expert system
Real-time aggregation
Real-time web feed
Thanks for listening. Any questions?