Learn. Connect. Explore. - Microsoft... · • Deploys Redis, Cassandra, MongoDB for NoSQL •...

Architecting Open source solutions on Azure

Nicholas Dritsas

Senior Director, Microsoft Singapore

Agenda

• Developing OSS Apps on Azure

• Customer case with OSS Apps

• Hadoop on Azure

• Customer cases using Hadoop on Azure

Flexible

Open Source & Azure

• Android, iOS & Node.js back-end via Azure Mobile Services

• Java, Ruby SDKs via Linux VM, Engine Yard & Oracle

• Websites for PHP, Node.js, Python & App Gallery

• MySQL via ClearDB, MongoDB via MongoLab, Hadoop

• From Linux VMs via Image Gallery & VMDepot

Configuration

Store type | Example Technologies | What It Provides | Example Use Case
Key/value stores | Redis, Microsoft Azure Tables and Cache | Fast access to large amounts of simply structured data | Online shopping cart
Column family stores | Cassandra, HBase | Fast access to large amounts of more structured data | A table storing web pages
Document databases | MongoDB, CouchDB | Scalable store for JSON documents | Persistent store for Node.js application
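
To make the three store types concrete, here is a minimal sketch using plain Python structures as stand-ins. No database client is involved, and every key, field name, and value is a hypothetical example rather than anything taken from the slides.

```python
# Plain-Python stand-ins for the three NoSQL models; no database client is used.

# Key/value store: an opaque value looked up by a single key (shopping cart).
carts = {}
carts["cart:user42"] = {"sku-1001": 2, "sku-2002": 1}

# Column family store: row key -> column family -> column -> value (web pages).
webtable = {
    "com.example/index": {
        "contents": {"html": "<html>...</html>"},
        "anchor": {"com.example/about": "Home"},
    }
}

# Document database: self-describing JSON-like documents queried by field.
sessions = [
    {"_id": 1, "user": "user42", "prefs": {"lang": "en"}},
    {"_id": 2, "user": "user7", "prefs": {"lang": "fr"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
```

The key/value lookup needs the exact key, the column family lookup walks row key, family, and column, and the document lookup filters on the document's own fields.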

Agenda

• Developing OSS Apps on Azure

• Customer case with OSS Apps

• Hadoop on Azure

• Customer cases using Hadoop on Azure

Migrating an end to end airline online system to Azure

Background

• FlyAir has aggressive growth plans and needs systems that can scale with the growth rates it expects.

• The current systems are based on OSS: CentOS/Ubuntu Linux running PHP and MySQL.

• FlyAir’s system consists of the following 4 main areas:

• B2C, where they host the main web page and consumer interaction for booking or managing flights directly

• B2T, where they support the travel agencies and which generates the majority of the revenue

• B2M, for mobile user support

• B2B, for corporate accounts

Migration process

• We moved all 4 of these systems from on-premises to Azure in a few weeks.

• The system is hosted in the Singapore data center and consists of a number of Large/Extra Large Ubuntu/CentOS VMs that host PHP for the front end and MySQL for the back end.

• HA is achieved using Azure Load Balancer, VM Availability Sets and MySQL replication.

• A site-to-site VPN was established using a Cisco device to support connectivity to on-premises LOB systems plus a ticketing interface to Amadeus (centralized ticketing system).
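
The idea behind load-balanced front-end VMs can be sketched as a pool that skips backends failing a health probe. This is an illustration only: round-robin is used for simplicity and is not a claim about how Azure Load Balancer distributes traffic, and the backend names are hypothetical.

```python
from itertools import cycle


class Pool:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark(self, backend, ok):
        """Record the result of a health probe for one backend."""
        if ok:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


pool = Pool(["web1", "web2", "web3"])  # hypothetical PHP front-end VMs
pool.mark("web2", False)               # simulate a failed health probe
picks = [pool.pick() for _ in range(4)]
```

With `web2` marked down, requests alternate between `web1` and `web3` until the probe succeeds again.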

Infrastructure view of B2C

Current state and futures

• The system has been stable and performing well since November 2013.

• FlyAir plans to add a DR site in the Hong Kong data center and use Traffic Manager and Resource Groups to manage the failover/failback process.

• SCOM and New Relic tools are used to monitor the sites and manage alerts and resource warnings.
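
The planned failover/failback between a primary site and a DR site can be illustrated with a priority-ordered endpoint list: traffic goes to the first endpoint that is currently healthy. This is a minimal sketch of the general technique, not Traffic Manager's implementation; the endpoint names and the `health` mapping are assumptions.

```python
def resolve(endpoints, health):
    """Return the first healthy endpoint in priority order, or None."""
    for name in endpoints:
        if health.get(name, False):
            return name
    return None


# Hypothetical endpoints: primary site first, DR site second.
ENDPOINTS = ["singapore", "hongkong"]

normal = resolve(ENDPOINTS, {"singapore": True, "hongkong": True})
failover = resolve(ENDPOINTS, {"singapore": False, "hongkong": True})
failback = resolve(ENDPOINTS, {"singapore": True, "hongkong": True})
```

Failback needs no extra logic here: once the primary reports healthy again, it wins by priority.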

Agenda

• Developing OSS Apps on Azure

• Customer case with OSS Apps

• Hadoop on Azure

• Customer cases using Hadoop on Azure

Azure HDInsight

HDInsight Supports Hive

• SQL-like queries on Hadoop data in HDInsight

• HDInsight provides easy-to-use graphical query interface for Hive

• HiveQL is a SQL-like language (subset of SQL)

• Hive structures include well-understood database concepts such as tables, rows, columns, partitions

• Compiled into MapReduce jobs that are executed on Hadoop

• Dramatic performance gains with Stinger/Tez

• Stinger is a Microsoft, Hortonworks and OSS driven initiative to bring interactive queries with Hive

• Brings query execution engine technology from Microsoft SQL Server to Hive

• Performance gains up to 100x

Hadoop 2.0
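
To show what "compiled into MapReduce jobs" means for a simple aggregation, the sketch below walks a HiveQL-style `GROUP BY` through map, shuffle, and reduce phases in plain Python. The table name `bookings`, its columns, and the data are hypothetical, and real Hive query plans are considerably more involved.

```python
from collections import defaultdict

# Conceptually equivalent HiveQL (hypothetical table and columns):
#   SELECT route, SUM(fare) FROM bookings GROUP BY route;
bookings = [
    {"route": "SIN-HKG", "fare": 120},
    {"route": "SIN-BKK", "fare": 80},
    {"route": "SIN-HKG", "fare": 200},
]


def map_phase(rows):
    """Emit one (key, value) pair per input row: the GROUP BY key and the fare."""
    for row in rows:
        yield row["route"], row["fare"]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (SUM) to each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(bookings)))
```

The `GROUP BY` column becomes the map output key, and the aggregate function becomes the reducer.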

HDInsight Supports HBase

[Diagram: HBase on an HDInsight cluster. Master components: Name Node, Job Tracker, HMaster, Coordination. Four worker nodes, each running a Data Node, Task Tracker and Region Server.]

• NoSQL database on data in HDInsight
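
One property that shapes how HBase tables are designed is that rows are kept sorted by row key, so a range of keys can be scanned efficiently. The sketch below imitates that behavior in memory with `bisect`; it is not the HBase client API, and the row keys and column names are hypothetical.

```python
import bisect


class SortedTable:
    """In-memory imitation of a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Insert or update one cell, keeping row keys sorted."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = SortedTable()
table.put("device1#0002", "m:battery", 71)
table.put("device2#0001", "m:battery", 55)
table.put("device1#0001", "m:battery", 73)

# '$' sorts immediately after '#', so this range covers every "device1#" key.
device1_rows = table.scan("device1#", "device1$")
```

Putting the device identifier first in the row key keeps each device's rows adjacent, which is what makes the range scan useful.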

HDInsight Supports Mahout

• Machine learning library

HDInsight Supports Storm (Coming Q4, CY2014)

• Stream analytics for Near-Real Time processing

Connect Cloud Hadoop With On-premise

Scenarios For Deploying Hadoop As Hybrid

Agenda

• Developing OSS Apps on Azure

• Customer case with OSS Apps

• Hadoop on Azure

• Customer cases using Hadoop on Azure

Hadoop customer cases

1. Data Broker Company

Company Profile

Who is the customer

• Customer is a Seattle-based cloud software company, focused exclusively on opening access to government data.

• SaaS government public set platform accessible via web, mobile, and RESTful interfaces

Product details

• Open Data Platform

• GovStat insights and analytics

• API Foundry

Business Problem

Project Milestones

• M1: Migration of the Open Data Platform to Azure with 4-6 design validation customers. Scale down and ramp up as needed. Support and escalation path defined for PFE.

• ~150 cores and 1.5 TB of data to be served for this phase

• M2: Support up to 100 customers. DR, monitoring and alerting enhancements, compliance validation against FISMA/FedRAMP. OData integration, Windows 8 .NET application, Windows Phone .NET application, SQL IS integration for willing customers, Windows Azure Marketplace integration and localization.

• M3: IS integration completion post GA, OData enhancements, HDInsight integration, Office 365 integration and PaaS transition study. 10 months after M2.

Catalog

• Published Search API

• DCAT API

• Search over:

• Metadata

• Dataset contents

• Filters based on:

• View/Visualization type

• Category

• Tags

• Geography

• Sorting over catalog

• Dataset view on Catalog
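
The catalog behavior listed above (search over metadata, filters by view type, category and tags, then sorting) can be sketched as a simple filter-and-sort over dataset records. This is an illustrative in-memory model only; the field names and entries are hypothetical and do not reflect the platform's actual schema or Search API.

```python
# Hypothetical catalog entries; field names are illustrative only.
CATALOG = [
    {"name": "Bus Routes", "category": "Transport", "tags": {"bus", "routes"}, "view_type": "map"},
    {"name": "Annual Budget", "category": "Finance", "tags": {"budget"}, "view_type": "chart"},
    {"name": "Road Closures", "category": "Transport", "tags": {"roads"}, "view_type": "calendar"},
]


def search(catalog, text=None, category=None, tag=None, view_type=None, sort_by="name"):
    """Filter datasets by optional criteria, then sort by one metadata field."""
    results = []
    for dataset in catalog:
        if text and text.lower() not in dataset["name"].lower():
            continue
        if category and dataset["category"] != category:
            continue
        if tag and tag not in dataset["tags"]:
            continue
        if view_type and dataset["view_type"] != view_type:
            continue
        results.append(dataset)
    return sorted(results, key=lambda dataset: dataset[sort_by])


transport = [d["name"] for d in search(CATALOG, category="Transport")]
```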

Views

Four basic visualizations

• Tabular

• Maps

• Charts

• Calendars

Operations

• Export (CSV, JSON, XLSX, XML/RDF)

• Group By, Filter, Order By

• SoQL Requests

• Create Derived Views

Dataset Only Operations:

• Upsert, Append, Replace

• CSV upload

Can be embedded using the Data Player
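
A SoQL request is typically expressed as query-string parameters such as `$select`, `$where`, `$group`, `$order` and `$limit` on a dataset endpoint. The sketch below only builds such a URL and sends nothing; the host name and dataset identifier are placeholders, not real endpoints.

```python
from urllib.parse import urlencode


def soql_url(base_url, select=None, where=None, group=None, order=None, limit=None):
    """Build a URL carrying SoQL clauses as query-string parameters."""
    params = {}
    if select:
        params["$select"] = select
    if where:
        params["$where"] = where
    if group:
        params["$group"] = group
    if order:
        params["$order"] = order
    if limit is not None:
        params["$limit"] = str(limit)
    return f"{base_url}?{urlencode(params)}" if params else base_url


# Placeholder endpoint: count rows per category (Group By, Order By).
url = soql_url(
    "https://data.example.gov/resource/abcd-1234.json",
    select="category, count(*)",
    group="category",
    order="category",
    limit=10,
)
```

`urlencode` percent-encodes the clause text, so spaces, commas and parentheses in the SoQL expressions survive the trip through the URL.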

The Solution Architecture

Technology Landscape:

• ~120 cores of Ubuntu VMs in Production. ~50 VMs each in staging and production environment.

• Standard 3-tier web application architecture

• Web tier is a RoR MVC application

• Application tier is Java deployed on Jetty, a servlet container

• REST API access to app layer. JAX-RS with Jersey

• SODA API

• Data tier is primarily PostgreSQL

• NoSQL options for monitoring, central service, rate limiting cache, aggregate cache

• Deploys Redis, Cassandra, MongoDB for NoSQL

• Lucene based Orester service for search

• ZooKeeper and ActiveMQ for coordination service, messaging, inter-process synchronization, discovery of services

• Miscellaneous for GeoServer, Monitoring, Alerting

• Deployment via Chef with the knife-azure plugin

• Pure-FTPd for FTP uploads
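
The "rate limiting cache" role mentioned in the list above is commonly implemented as a per-client counter that resets each time window. Here is a minimal fixed-window sketch with an in-memory dictionary standing in for the cache; it is not the customer's implementation, and the limit, window size and client key are assumptions.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # (client, window index) -> request count. Old windows are never
        # purged here; a real cache would expire them.
        self._counts = {}

    def allow(self, client, now):
        """Count one request at time `now` (seconds) and report if it is allowed."""
        bucket = (client, int(now // self.window))
        count = self._counts.get(bucket, 0) + 1
        self._counts[bucket] = count
        return count <= self.limit


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [
    limiter.allow("api-key-1", now=0),    # first request in window 0
    limiter.allow("api-key-1", now=10),   # second request in window 0
    limiter.allow("api-key-1", now=59),   # third request in window 0, rejected
    limiter.allow("api-key-1", now=60),   # first request in window 1
]
```

Passing `now` explicitly keeps the example deterministic; a shared cache is what lets every application server see the same counters.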

High Level Component Architecture

High Level Role + Dataflow

Hadoop customer cases

2. Phone tracking and service company

Company 2 is providing technology protection services for mobile phones, consumer electronics, and home appliance devices.

• Mobile telemetry scenario (uni-directional); data published from protected mobile devices

• Goal is to predict, detect and potentially mitigate failure conditions

• Business driver is improving customer claim experience; predicting customer escalation during claim (self-service to agent), etc.

• 6k events/second target (36M / day)
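
As a rough sizing check on the throughput figures, the sketch below converts between a per-second rate and a daily volume. A rate of 6,000 events/second sustained for a full day would give far more than 36M events, so one reading (an assumption, not something the slides state) is that 6k/second is a peak target while 36M/day is the expected daily total.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(rate_per_second):
    """Daily volume if the given rate were sustained for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


def average_rate(daily_events):
    """Average events per second implied by a daily volume."""
    return daily_events / SECONDS_PER_DAY


sustained_daily = events_per_day(6_000)      # 518,400,000 events
implied_average = average_rate(36_000_000)   # about 417 events per second
```

Under that reading, the peak rate is roughly 14 times the daily average, which matters when sizing the ingestion tier for bursts rather than for the mean.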

Project Overview

Business use cases

[Diagram: solution architecture. Labels recovered from the figure: Ingestion Svc (Web Role(s)); Event Broker (Kafka); Predictive Maint. Scoring; Customer Sat Scoring; Operational Dashboard; Troubleshooting; Alerting; Blob Spooler; Azure Storage; Cloud ML; Model Publishing; Model (Re)Training (Cloud ML); Orchestration (MDP); Usage Reports & Analytics; Curated Data Sets for Self Service; Insight; Backup & device telemetry; Call-Center and Support-Site logs; CRM Data; On-Premises; Anonymize & Synchronize; Descriptive Analytics; Data Exploration.]
