Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered...
![Page 1: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/1.jpg)
Using Azure Data Services for Modern Data Applications
Lara Rubbelke | Principal SDE | Microsoft
Allan Mitchell | Cloud Architect | elastabytes
![Page 2: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/2.jpg)
Outcomes
We want you to leave here understanding:
• key concepts for modern database architectures
• database / datastore types
• reasons to go explore
![Page 3: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/3.jpg)
Agenda
• Context of Lambda Architecture
• Ingestion
• Hot Path
  • Processing
• Cold Path
  • Processing
• Staging
• Enrichment and Serving
![Page 4: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/4.jpg)
4 Questions
![Page 5: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/5.jpg)
![Page 6: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/6.jpg)
![Page 7: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/7.jpg)
?
![Page 8: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/8.jpg)
What Will Happen?
![Page 9: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/9.jpg)
Lambda Architecture
![Page 10: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/10.jpg)
The Old and the New Data Processing
![Page 11: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/11.jpg)
Lambda Architecture – High Level View
[Diagram labels: All data, Precompute views, Batch views (partial aggregates); Real-time data, Process stream, Increment views, Real-time views]
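The high-level view pairs precomputed batch views with incrementally updated real-time views. A minimal Python sketch of that idea follows; the function names, the event shape `(key, timestamp)` and the count aggregate are all assumptions made for illustration, not part of the deck.

```python
from collections import Counter

def precompute_batch_view(all_events):
    """Batch layer: recompute counts per key from the full event log."""
    return Counter(key for key, _ts in all_events)

def increment_realtime_view(view, event):
    """Speed layer: fold one newly arrived event into the real-time view."""
    key, _ts = event
    view[key] += 1
    return view

def query(batch_view, realtime_view, key):
    """Serving: merge the batch view and the real-time view at read time."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

# Events already covered by the last batch run (hypothetical data).
historical = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3)]
batch_view = precompute_batch_view(historical)

# Events that arrived after the batch run.
realtime_view = Counter()
for event in [("sensor-a", 4), ("sensor-c", 5)]:
    increment_realtime_view(realtime_view, event)

print(query(batch_view, realtime_view, "sensor-a"))  # 3
```

The real-time view only has to cover the gap since the last batch run, so it can be discarded and rebuilt whenever a new batch view is published.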
![Page 12: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/12.jpg)
Lambda Architecture – Detailed View
Inbound Data
Buffered Ingestion (message bus)
Event Processing Logic
Event Decoration
Spooling/Archiving
Data Movement/Sync
Hot Store
Analytical Store
Serving and Consumption
Curation
Dashboards/Reports
Exploration
Interactive
![Page 13: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/13.jpg)
Lambda Architecture – Detailed View
Inbound Data
Buffered Ingestion (message bus)
Event Processing Logic
Event Decoration
Spooling/Archiving
Hot Store
Analytical Store
Curation
Dashboards/Reports
Exploration
Interactive
Staging, Processing, Ingestion
Data Movement/Sync
Serving and Consumption
![Page 14: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/14.jpg)
Cortana Intelligence Gallery Demo
![Page 15: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/15.jpg)
Modern Data Lifecycle
Ingestion, Processing, Staging, Serving
Enrichment and Curation
![Page 16: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/16.jpg)
Modern Data Lifecycle
Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
Staging: ADLS, Azure Storage, Azure SQL DB
Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
Enrichment and Curation: Azure Data Factory, Azure ML
![Page 17: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/17.jpg)
Modern Data Lifecycle
Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
Processing, Staging, Serving
Enrichment and Curation
![Page 18: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/18.jpg)
Ingestion
Data pushed to a broker for further processing
• Typically indicative of streaming approaches.
Data pushed to storage for further processing
• Best alignment with many existing systems.
• Use scheduled jobs to synchronize or move data.
• Typical “batch” processing.
![Page 19: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/19.jpg)
Ingestion Options
• IoT Hub
  • Bi-directional communication
• Event Hub
  • Device-to-cloud mass ingestion
• Service Bus
  • Complex filters and processing rules
• Kafka
  • Common OSS integration
• RabbitMQ
  • More common OSS integration, run in IaaS
![Page 20: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/20.jpg)
IoT Hub
![Page 21: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/21.jpg)
Azure IoT Suite: IoT Hub
Connect millions of devices to a partitioned application back-end
Devices are not servers
Use IoT Hub to enable secure bi-directional comms
Device-to-cloud and Cloud-to-device
Durable messages (at least once semantics)
Delivery receipts, expired messages
Device communication errors
Individual device identities and credentials
Single device-cloud connection for all communications (C2D, D2C)
Natively supports AMQP, HTTP
Designed for extensibility to custom protocols
Device SDKs available for multiple platforms (e.g. RTOS, Linux, Windows)
Multi-platform Service SDK.
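At-least-once delivery means a consumer can see the same message more than once, so processing should be idempotent. The sketch below shows one common way to handle that, assuming every message carries a unique id; the class, the ids and the in-memory set are hypothetical and do not use any IoT Hub SDK.

```python
class IdempotentHandler:
    """Processes each message id once, even if the broker redelivers it."""

    def __init__(self):
        # Assumption: an in-memory set; a real consumer would need a durable store.
        self.seen_ids = set()
        self.processed = []

    def handle(self, message_id, payload):
        if message_id in self.seen_ids:
            return False  # duplicate delivery, already processed
        self.seen_ids.add(message_id)
        self.processed.append(payload)
        return True

handler = IdempotentHandler()
deliveries = [("m1", 21.5), ("m2", 22.0), ("m1", 21.5)]  # m1 is redelivered
results = [handler.handle(mid, body) for mid, body in deliveries]
print(results)            # [True, True, False]
print(handler.processed)  # [21.5, 22.0]
```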
![Page 22: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/22.jpg)
Setup
• Retention Period
• Event Hub Endpoint
• C2D settings
![Page 23: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/23.jpg)
Device provisioning
• Many systems involved (IoT Hub, device registry, ERPs, …)
• Device identity (composite devices, many concerns)
1. Device provisioned at manufacturing into system
2. Device connects for the first time and gets associated to its regional data center (bootstrapped)
3. As a result of customer interactions the device is activated
4. Devices can be deactivated for security and other reasons
5. A device can also be de-provisioned at end-of-life or decommission.
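The five steps above read naturally as a small state machine. The following is a sketch only: the state names mirror the slide, while the transition table (in particular re-activation after deactivation, and de-provisioning from any live state) is an assumption made for illustration.

```python
from enum import Enum

class DeviceState(Enum):
    PROVISIONED = "provisioned"
    BOOTSTRAPPED = "bootstrapped"
    ACTIVATED = "activated"
    DEACTIVATED = "deactivated"
    DEPROVISIONED = "deprovisioned"

# Allowed transitions. Re-activation and de-provisioning from any live state
# are assumptions of this sketch, not statements from the slide.
TRANSITIONS = {
    DeviceState.PROVISIONED: {DeviceState.BOOTSTRAPPED, DeviceState.DEPROVISIONED},
    DeviceState.BOOTSTRAPPED: {DeviceState.ACTIVATED, DeviceState.DEPROVISIONED},
    DeviceState.ACTIVATED: {DeviceState.DEACTIVATED, DeviceState.DEPROVISIONED},
    DeviceState.DEACTIVATED: {DeviceState.ACTIVATED, DeviceState.DEPROVISIONED},
    DeviceState.DEPROVISIONED: set(),
}

def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

state = DeviceState.PROVISIONED
for step in (DeviceState.BOOTSTRAPPED, DeviceState.ACTIVATED, DeviceState.DEACTIVATED):
    state = transition(state, step)
print(state.value)  # deactivated
```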
![Page 24: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/24.jpg)
IoT Hub
Opening up the channels
![Page 25: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/25.jpg)
Modern Data Lifecycle
Ingestion
Processing: Stream Analytics, Storm, Spark Streaming
Staging, Serving
Enrichment and Curation
![Page 26: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/26.jpg)
Processing
Multiple processing “instances” can work off of the broker.
Must think about the end use:
- Indexing for operational dashboards
- Transformation/Curation for persistence
- Spooling to an alternate store
Multiple processing “instances” can work on top of dis-aggregated storage.
Lots of options abound depending on the need:
- Relational engines
- Big Data solutions / in-mem or not
- Other
![Page 27: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/27.jpg)
Bounded vs. Unbounded Processing
![Page 28: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/28.jpg)
Stream Options
• Azure Stream Analytics
• Spark Streaming
• Storm
![Page 29: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/29.jpg)
Azure Stream Analytics
![Page 30: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/30.jpg)
What is Azure Stream Analytics?
• Cost effective event processing engine
• SQL-like syntax
• Naturally integrated with Azure IoT Hub and Event Hubs
![Page 31: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/31.jpg)
Azure Stream Analytics
End-to-End Architecture Overview
Event Inputs
- Event Hub
- IoT Hub
- Azure Blob
Transform
- Temporal joins
- Filter
- Aggregates
- Projections
- Windows
- Etc.
Enrich
Correlate
Outputs
- SQL Azure
- Azure Blobs
- Event Hub
- ADLS
- Power BI
- …
Azure Storage
• Temporal Semantics
• Guaranteed delivery
• Guaranteed up time
Reference Data
- Azure Blob
- Azure SQL DB
![Page 32: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/32.jpg)
Input sources for a Stream Analytics Job
• Currently supported input Data Streams are Azure Event Hub, Azure IoT Hub and Azure Blob Storage. Multiple input Data Streams are supported.
• Advanced options let you configure how the Job will read data from the input blob (which folders to read from, when a blob is ready to be read, etc.).
• Reference data is usually static or changes very slowly over time.
  • Must be stored in Azure Blob Storage.
  • Cached for performance
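Reference data is typically used to decorate streaming events with slowly changing attributes. The sketch below illustrates that lookup in plain Python, assuming a small in-memory table keyed by a hypothetical `device_id` field; it is not how Stream Analytics implements reference joins.

```python
# Reference data: a small, slowly changing lookup table (hypothetical contents),
# loaded once and kept in memory, which is the point of caching it.
REFERENCE_DATA = {
    "dev-1": {"site": "Plant A", "model": "T100"},
    "dev-2": {"site": "Plant B", "model": "T200"},
}

def enrich(event, reference):
    """Return a copy of the event with any matching reference attributes added."""
    extra = reference.get(event["device_id"], {})
    return {**event, **extra}

stream = [
    {"device_id": "dev-1", "temp": 20.5},
    {"device_id": "dev-9", "temp": 19.0},  # no matching reference row
]
enriched = [enrich(e, REFERENCE_DATA) for e in stream]
print(enriched[0])
```

Events without a matching reference row pass through unchanged here; dropping them instead (an inner join) is an equally valid choice.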
![Page 33: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/33.jpg)
Output for Stream Analytics Jobs
Data stores currently supported as outputs:
• Azure Blob storage: creates log files with temporal query results. Ideal for archiving.
• Azure Table storage: more structured than blob storage, easier to set up than SQL Database, and durable (in contrast to Event Hub).
• SQL Database: stores results in an Azure SQL Database table. Ideal as a source for traditional reporting and analysis.
• Event Hub: sends an event to an event hub. Ideal for generating actionable events such as alerts or notifications.
• Service Bus Queue: sends an event on a queue. Ideal for sending events sequentially.
• Service Bus Topics: sends an event to subscribers. Ideal for sending events to many consumers.
• PowerBI.com: ideal for near-real-time reporting.
• DocumentDB: ideal if you work with JSON and object graphs.
![Page 34: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/34.jpg)
ASA: Three Types of Windows
• Every window operation outputs events at the end of the window
• The output of the window will be a single event based on the aggregate function used. The event will have the time stamp of the window
• All windows have a fixed length
Tumbling window: aggregate per time interval
Hopping window: schedule overlapping windows
Sliding window: windows constantly re-evaluated
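The difference between tumbling and hopping windows is easiest to see by asking which windows a single event falls into. The sketch below does that in plain Python under simplifying assumptions: integer timestamps starting at 0 and half-open `[start, end)` windows. Boundary handling in the actual service may differ, and sliding windows are not covered.

```python
def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)

def hopping_windows(ts, size, hop):
    """Return every [start, end) window of length `size`, started every `hop`, that contains ts."""
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start >= 0 and start + size > ts:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)

print(tumbling_window(7, 5))      # (5, 10)
print(hopping_windows(7, 10, 5))  # [(0, 10), (5, 15)]
```

With a tumbling window each event is counted exactly once; with a hopping window whose hop is smaller than its size, the same event contributes to several overlapping windows.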
![Page 35: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/35.jpg)
Multiple Steps, Multiple Outputs
• A query can have multiple steps to enable pipeline execution
• A step is a sub-query defined using WITH (“common table expression”)
• The only query outside of the WITH keyword is also counted as a step
• Can be used to develop complex queries more elegantly by creating an intermediary named result
• Each step’s output can be sent to multiple output targets using INTO
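-- Step1: tweets per topic and partition in 3-second tumbling windows
WITH Step1 AS (
    SELECT Count(*) AS CountTweets, Topic
    FROM TwitterStream PARTITION BY PartitionId
    GROUP BY TumblingWindow(second, 3), Topic, PartitionId
),
-- Step2: average of the Step1 counts over 3-minute tumbling windows
Step2 AS (
    SELECT Avg(CountTweets)
    FROM Step1
    GROUP BY TumblingWindow(minute, 3)
)
-- Each step can feed one or more outputs
SELECT * INTO Output1 FROM Step1
SELECT * INTO Output2 FROM Step2
SELECT * INTO Output3 FROM Step2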
![Page 36: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/36.jpg)
Azure Stream Analytics
where the smarts happen
![Page 37: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/37.jpg)
Modern Data Lifecycle
Ingestion
Processing: HDInsight, ADLA, Spark
Staging: ADLS, Azure Storage, Azure SQL DB
Serving
Enrichment and Curation
![Page 38: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/38.jpg)
Batch Processing
• Azure Data Lake
• HDInsight
• Spark
![Page 39: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/39.jpg)
Azure Data Lake
• Integrated analytics and storage
• Fully managed
• Easy to use – “dial for scale”
• Proven at scale
• Analyze data of any size, shape or speed
• Open-standards based
[Diagram labels: YARN, HDFS, HDInsight, Analytics Service, Store, Partners, U-SQL; data sources: Clickstream, Sensors, Video, Social, Web, Devices, Relational, Applications]
![Page 40: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/40.jpg)
• Batch processing
• Map…and reduce
• Lots of aggregation
• Multiple schemas on same data
• Fast
• Patterns/What Works
• Anti-Pattern/Danger
Anything that requires:
• Joins
• Complex transactional needs
• Granular security requirements
• Not a relational database replacement
• Not fast
HDInsight
![Page 41: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/41.jpg)
ADL Store Unlimited Scale
• Optimized for analytics and IoT systems.
• Each file in ADL Store is sliced into blocks, distributed across multiple data nodes in the backend storage system.
• With a sufficient number of backend storage data nodes, files of any size can be stored.
• Backend storage runs in the Azure cloud which has virtually unlimited resources.
[Diagram: an Azure Data Lake Store file is sliced into blocks (Block 1, Block 2, …) that are spread across the data nodes of the backend storage]
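The slicing idea can be sketched in a few lines of Python. The block size and the round-robin placement below are assumptions chosen for illustration; the slides do not describe the real placement policy.

```python
def slice_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, node_count):
    """Assign block indexes to nodes round-robin (a placeholder policy, not the real one)."""
    nodes = {n: [] for n in range(node_count)}
    for index, _block in enumerate(blocks):
        nodes[index % node_count].append(index)
    return nodes

blocks = slice_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(blocks, node_count=3)
print([len(b) for b in blocks])  # [4, 4, 2]
print(placement)                 # {0: [0], 1: [1], 2: [2]}
```

Because no single node has to hold the whole file, the maximum file size is bounded by the pool of nodes rather than by any one of them.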
![Page 42: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/42.jpg)
ADL Store: High Availability and Reliability
• Azure maintains 3 replicas of each data object per region across three fault and upgrade domains
• Each create or append operation on a replica is replicated to the other two
• Writes are committed to the application only after all replicas are successfully updated
• Read operations can go against any replica
• Provides ‘read-after-write’ consistency
Data is never lost or unavailable even under failures
[Diagram labels: Replica 1, Replica 2, Replica 3, Fault/upgrade domains, Write, Commit]
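The commit rule (acknowledge a write only after every replica has applied it) is what makes reading from any replica safe. A toy Python sketch follows; the `Replica` class, the rollback on failure and the health flag are illustrative assumptions, not a description of the service's internals.

```python
class Replica:
    """Toy replica holding an append-only log (hypothetical, for illustration)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, record):
        if not self.healthy:
            raise OSError(f"{self.name} unavailable")
        self.log.append(record)

def replicated_append(replicas, record):
    """Acknowledge the write only if every replica applied it."""
    applied = []
    try:
        for replica in replicas:
            replica.append(record)
            applied.append(replica)
    except OSError:
        # Undo partial writes so all replicas stay identical (a simplification).
        for replica in applied:
            replica.log.pop()
        return False
    return True

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
ok = replicated_append(replicas, "block-1")
replicas[2].healthy = False
failed = replicated_append(replicas, "block-2")
print(ok, failed)                 # True False
print([r.log for r in replicas])  # [['block-1'], ['block-1'], ['block-1']]
```

Since an acknowledged record exists on all replicas, a read that follows a successful write sees it regardless of which replica serves the read.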
![Page 43: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/43.jpg)
ADL Store: Enterprise Grade Security
• Auditing, alerting, access control - all from within a single web-based portal
• Azure Active Directory integration for identity and access management
![Page 44: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/44.jpg)
Apache Spark – A Unified Framework
A unified, open source, parallel data processing framework for Big Data analytics
• Spark SQL: interactive queries
• Spark Streaming: stream processing
• Spark MLlib: machine learning
• GraphX: graph computation
• Spark Core Engine
• YARN, Mesos, Standalone Scheduler
Intro to Apache Spark (Brain-Friendly Tutorial): https://www.youtube.com/watch?v=rvDpBTV89AM
![Page 45: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/45.jpg)
What makes Spark fast?
[Diagram labels: Read from HDFS, Write to HDFS, Read from HDFS, Write to HDFS, Read from HDFS]
![Page 46: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/46.jpg)
Spark (Preview) on Azure HDInsight
• Fully Managed Service
  • 100% open source Apache Spark and Hadoop bits
  • Latest releases of Spark
  • Fully supported by Microsoft and Hortonworks
  • 99.9% Azure Cloud SLA
• Coming Soon: Advanced Enterprise Features
  • Integration with Azure Data Lake Store
  • Role based security and audit
  • Encryption at rest and in transit
  • Certifications: PCI in addition to existing ISO 27018, SOC, HIPAA, EU-MC
![Page 47: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/47.jpg)
Optimized for Data Scientist Productivity
• On-demand compute
  • Dynamically scale cluster to 1000s of cores to compress time of the ML job
  • Coming Soon: Auto-scale during job execution or time-based
• Tools for experimentation and development
  • Jupyter Notebooks (Scala, Python, automatic data visualizations)
  • IntelliJ plugin (integrated job submission, remote debugging)
  • ODBC connector for Power BI, Tableau, Excel, etc. on Spark
  • ML algorithms in R parallelized using Spark
  • RStudio
![Page 48: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/48.jpg)
Basic building blocks
Resilient Distributed Datasets (RDDs)
Lowest-level set of objects representing data; can be stored in memory or on disk across a cluster.
DataFrame
Higher-level abstraction API. A distributed collection of rows organized into named columns. RDD with schema and optimizations.
DataSet APIs
Extension of Spark’s DataFrame API that supports static typing and user functions that run directly on existing JVM types (such as user classes). Compile-time type safety with optimizations. “Preview”.
Spark 2.0 will unify these APIs
Structuring Spark: DataFrames, Datasets, and Streaming: https://www.youtube.com/watch?v=i7l3JQRx7Qw&feature=youtu.be
![Page 49: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/49.jpg)
RDDs vs DataFrames vs DataSets
Which one to use?
http://www.slideshare.net/databricks/structuring-spark-dataframes-datasets-and-streaming
![Page 50: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/50.jpg)
Spark Cluster Architecture
Cluster Manager
Worker Node Worker Node Worker Node
HDFS
Driver Program (SparkContext)
![Page 51: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/51.jpg)
RDDs: Transformations and Actions
Obviously does not apply to persistent RDDs.
[Diagram labels: RDD, transformations, RDD, actions, Value]
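Transformations describe a new dataset without computing anything; actions trigger the computation and return a value. The toy class below mimics that split in plain Python so it runs without a cluster. It is a stand-in written for this note, not Spark's API, and `LazyDataset` is a hypothetical name.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records steps lazily, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self._source, self._steps + (("filter", fn),))

    def _iterate(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the recorded steps and return a value.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())

calls = []

def double(x):
    calls.append(x)  # record that work actually happened
    return x * 2

ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(len(calls))    # 0, nothing has run yet
print(ds.collect())  # [6, 8]
print(len(calls))    # 4
```

In this sketch every action re-runs the recorded steps from the source; keeping computed results around between actions is left out.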
![Page 52: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/52.jpg)
Developing Spark Apps with Notebooks
![Page 53: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/53.jpg)
Jupyter
Jupyter interactive web-based Notebook
Jupyter Qt console
Jupyter Terminal console
Notebook viewer (nbviewer)
full list here
![Page 54: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/54.jpg)
Integration with BI Reporting Tools
![Page 55: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/55.jpg)
Modern Data Lifecycle
Ingestion, Processing, Staging
Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
Enrichment and Curation: Azure Data Factory, Azure ML
![Page 56: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/56.jpg)
Dashboards Interactive Exploration
Serving
![Page 57: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/57.jpg)
Serving
Typically serving here is constrained:
- Constrained on particular access patterns
- Constrained on dashboard scenarios
Serving here optimized for reducing “observation latency”
Typically serving here can be unconstrained:
- Still used for dashboards
- Used for data exploration and ML
- Used for interactive BI
- Often used for broad sharing
![Page 58: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/58.jpg)
Azure DW and Analytical Workloads
Store large volumes of data.
Consolidate disparate data into a single location.
Shape, model, transform and aggregate data.
Perform query analysis across large datasets.
Ad-hoc reporting across large data volumes.
All using simple SQL constructs.
![Page 59: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/59.jpg)
ADL and SQLDW
![Page 60: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/60.jpg)
Pattern: Compute consumption
![Page 61: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/61.jpg)
Pattern: SaaS customer isolation
![Page 62: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/62.jpg)
Logical overview
![Page 63: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/63.jpg)
Distributed queries
![Page 64: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/64.jpg)
Azure Data Warehouse
where the smarts happen
![Page 65: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/65.jpg)
Interactive and Exploration
![Page 66: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/66.jpg)
Interactive and Dashboards
![Page 67: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/67.jpg)
Modern Data Lifecycle
Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka
Processing: HDInsight, ADLA, Storm, Spark, Stream Analytics
Staging: ADLS, Azure Storage, Azure SQL DB
Serving: ADLS, Azure DW, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI
Enrichment and Curation: Azure Data Factory, Azure ML
![Page 68: Using Azure Data Services for Modern Data … Architecture –Detailed View Inbound Data Buffered Ingestion (message bus) Event Processing Logic Event Decoration Spooling/Archiving](https://reader034.fdocuments.net/reader034/viewer/2022051601/5ada225b7f8b9a6d318c4ed6/html5/thumbnails/68.jpg)
key concepts for modern database architectures
database / datastore types
reasons to go explore