Let’s Get Started… Stream Analytics, Azure Data Lake, Azure SQL Data Warehouse, Azure Data...
What is Cortana Analytics Suite
Cortana Analytics Suite is designed to deliver analytics as a service. It transforms data into intelligent
action. Data can be collected from apps, sensors, and devices.
Cortana Analytics Suite is designed to take advantage of IoT.
Big Picture
A Suite of Products that allow you to Predict Outcomes, Prescribe Actions and Automate Decisions
Cortana Analytics Suite
Cortana Analytics Suite comprises:
Cortana Personal Assistant,
Power BI,
Azure HDInsight,
Azure Machine Learning,
Azure Stream Analytics,
Azure Data Lake,
Azure SQL Data Warehouse,
Azure Data Factory,
Azure Data Catalog and
Azure Event Hubs.
What … is a cloud-based data integration service. ADF ingests data from various data sources, prepares, validates, transforms, and analyzes the data on a job schedule, and then publishes ready-to-use data for consumption.
Why
• Cloud based managed service
• No hardware & software required
• Pay as you use
• HDInsight compatible
• Less administrative effort
How
1. Reads data from the source data store.
2. Performs serialization/deserialization, compression/decompression, column mapping, and type conversion. It does these operations based on the configurations of the input dataset, output dataset, and Copy Activity.
3. Writes data to the destination data store.
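The copy steps above can be sketched in plain Python. This is only an illustration of the read, decompress/deserialize, map, convert, and write sequence; it does not call Data Factory, and the function name, column names, and in-memory sink are hypothetical.

```python
import csv
import gzip
import io

def copy_activity(source_bytes, column_mapping, type_conversions, sink):
    """Copy gzip-compressed CSV rows into `sink`, renaming and converting columns."""
    # Read from the source, then decompress and deserialize it
    text = gzip.decompress(source_bytes).decode("utf-8")
    written = 0
    for row in csv.DictReader(io.StringIO(text)):
        mapped = {}
        for src, dst in column_mapping.items():
            # Column mapping (src -> dst) and type conversion (default: keep as str)
            convert = type_conversions.get(dst, str)
            mapped[dst] = convert(row[src])
        # Write to the destination store (here, just a list)
        sink.append(mapped)
        written += 1
    return written

# Hypothetical source data and configuration
source = gzip.compress(b"id,amt\n1,2.50\n2,10.00\n")
sink_rows = []
count = copy_activity(
    source,
    column_mapping={"id": "OrderId", "amt": "Amount"},
    type_conversions={"OrderId": int, "Amount": float},
    sink=sink_rows,
)
print(count, sink_rows)
```

The mapping and conversion tables play the role that the dataset and activity configuration play in the real service: the copy logic itself stays generic.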
Components
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to Data and Services
4. Create Datasets: Input and Output
5. Create Pipeline: Define Activities
6. Monitor and Manage: Portal or PowerShell, Alerts and Metrics
Linked services
Linked services define the information needed for Data Factory to connect
to external resources. A linked service represents either:
a. data store
File system
On-premises SQL Server
Azure storage
Azure DocumentDB
Azure Data Lake Store
etc.
b. compute resource
HDInsight (own or on demand)
Azure Machine Learning Endpoint
Azure Batch
Azure SQL Database
Azure Data Lake Analytics
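As a rough illustration, a linked service for a data store is essentially a named connection definition. The sketch below builds one as a Python dictionary and prints it as JSON. The layout (name, properties, type, typeProperties) is recalled from the JSON format Data Factory used for Azure Storage at the time and should be treated as an assumption; the service name is hypothetical, and the account name and key are placeholders.

```python
import json

# Hypothetical linked service definition for an Azure Storage account.
# The field layout is an assumption; check it against the service's schema.
storage_linked_service = {
    "name": "ExampleStorageLinkedService",  # hypothetical name
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            # Placeholders only: never hard-code real keys
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<accountname>;AccountKey=<accountkey>"
            )
        },
    },
}

definition_json = json.dumps(storage_linked_service, indent=2)
print(definition_json)
```

Datasets then refer to a linked service by its name, so the connection details live in one place.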
Data sets
Named references to data
Used for both input and output
Identifies structure
Files, tables, folders, documents
Internal or external
Activities
Define actions to perform on data
Zero or more input data sets
One or more output data sets
Unit of orchestration of a pipeline
Activities for
data movement
data transformation
data analysis
Use WindowStart and WindowEnd system variables to select relevant data using a tumbling window
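A tumbling window is a series of fixed-size, non-overlapping time intervals. The sketch below shows the idea in plain Python: each window gets a start and end value (standing in for WindowStart and WindowEnd), and a query selects only the rows that fall in that window. The table name Events, the column EventTime, and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Yield consecutive, non-overlapping (window_start, window_end) pairs."""
    current = start
    while current + size <= end:
        yield current, current + size
        current += size

def build_query(window_start, window_end):
    """Select the rows of one window, using a half-open interval [start, end)."""
    fmt = "%Y-%m-%d %H:%M"
    return (
        "SELECT * FROM Events WHERE EventTime >= '{}' AND EventTime < '{}'"
        .format(window_start.strftime(fmt), window_end.strftime(fmt))
    )

windows = list(
    tumbling_windows(datetime(2016, 1, 1, 0), datetime(2016, 1, 1, 3), timedelta(hours=1))
)
queries = [build_query(ws, we) for ws, we in windows]
for q in queries:
    print(q)
```

Because each window's end equals the next window's start and the upper bound is exclusive, every row is picked up by exactly one run.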
Pipelines
Logical grouping of activities
Provides a unit of work that performs a task
The active period can be set in the past to backfill data slices
Backfilling can be performed in parallel
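Backfilling works in parallel because the slices of a past active period are independent of each other. A minimal sketch, assuming daily slices and a placeholder process_slice function (both hypothetical, not part of Data Factory):

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

def slices_for_active_period(start, end, interval):
    """Split an active period into consecutive data slices."""
    slices = []
    current = start
    while current < end:
        slices.append((current, min(current + interval, end)))
        current += interval
    return slices

def process_slice(data_slice):
    """Placeholder for running the pipeline's activities on one slice."""
    window_start, _window_end = data_slice
    return window_start.date().isoformat(), "Ready"

# An active period that lies entirely in the past
past_slices = slices_for_active_period(
    datetime(2016, 1, 1), datetime(2016, 1, 5), timedelta(days=1)
)

# Slices do not depend on each other, so they can run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_slice, past_slices))

print(results)
```

The max_workers value caps how many slices run at once, which is the knob to turn when the source or sink cannot handle unlimited concurrency.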
Data movement
Globally available service for data movement
Exactly one input and exactly one output
Support for securely moving between on-premises and the cloud
Automatic type conversions from source to sink data types
File based copy supports binary, text and Avro formats, and allows for conversion between formats
Data Management Gateway supports multiple data sources but only a single Azure Data Factory
Monitoring
Data slices may fail
Drill into errors, diagnose, fix and rerun
Failed data slices can be rerun and all dependencies are managed by Azure Data Factory
Upstream slices that are Ready stay available
Downstream slices that are dependent stay Pending
Enable diagnostics to produce logs (disabled by default)
Add Alerts for Failed or Successful Runs to receive email notification
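The rerun behaviour described above can be modelled with a small status table. This is a simplified sketch, not the service's scheduler: the slice names, the run function, and the simulated failure are hypothetical, and only the Ready, Pending, and Failed states come from the text.

```python
def run(order, depends_on, actions, status):
    """Run slices in order; skip Ready ones, hold dependents of non-Ready slices."""
    for name in order:
        if status.get(name) == "Ready":
            continue  # upstream output stays available, no recomputation
        if all(status.get(dep) == "Ready" for dep in depends_on[name]):
            try:
                actions[name]()
                status[name] = "Ready"
            except Exception:
                status[name] = "Failed"
        else:
            status[name] = "Pending"  # waits until its dependencies are Ready
    return status

calls = {"ingest": 0}
flaky = {"fail": True}  # simulates an error that is later fixed

def ingest():
    calls["ingest"] += 1

def transform():
    if flaky["fail"]:
        raise ValueError("bad input")

def publish():
    pass

order = ["ingest", "transform", "publish"]
depends_on = {"ingest": [], "transform": ["ingest"], "publish": ["transform"]}
actions = {"ingest": ingest, "transform": transform, "publish": publish}

status = {}
first = dict(run(order, depends_on, actions, status))
flaky["fail"] = False  # diagnose and fix the problem
second = dict(run(order, depends_on, actions, status))
print(first)
print(second)
```

On the second run only the failed slice and its dependents execute; the ingest step is not repeated because its slice was already Ready.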