(ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Post on 20-Mar-2017


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Keith Blizard, Bob Tordella

October 2015

Self-service Cloud Services

How J&J Is Managing AWS at Scale for Enterprise Workloads

ARC305

What to Expect from the Session

- Review Enterprise Challenges & Incorporate Cloud Capabilities

- Provide an Approach for Enabling Enterprise Controls

- Example Architecture & Implementations

- Example Patterns (HPC & Workspaces)

- Lessons Learned

J&J is a Global Health Care Leader

More than 270 Operating Companies in 60 Countries, with 126,000 employees

Selling Products in more than 175 Countries

The world’s sixth-largest consumer health, pharmaceuticals, and biologics company

The world’s largest medical devices and diagnostics business

Big Company, Big Challenges

Thousands of Systems

Complex IT Ops

Limited Financial Impact

Cloud Patterns & Acceleration

Automated IT Cost Transparency

Current State of Enterprise IT

Cloud Strategy Offers Agility

Transformation to a Flexible Hybrid Cloud Strategy

Virtual Private Cloud (VPCx)

Provides a complete infrastructure platform through Amazon Web Services, integrated with J&J processes and policies

AWS Regions: N. America, Europe, AP

On-Premise Cloud (OPCx)

Provides a highly flexible reference architecture (built on the VMware stack) to deliver ‘on-demand’ VMs inside our Enterprise Data Centers or co-location facilities in each region

Data Centers: N. America, Europe, AP

Compliance, Data Protection, Operation Transparency, Speed + Agility

Virtual Private Cloud (VPCx) Vision

Empower the business by providing an integrated, scalable, secure self-service cloud IT platform that enables agility, enforces policy, and accelerates best practices

Enable Agility

• Self Service

• Rapid Provisioning

• Capacity Mgmt.

• Full stack Availability

Ensure Policy

• AD Integration

• J&J AMIs

• Enterprise Logging

• Backup & Retention

• Firewall & Security Rules

Accelerate Best Practice

• Monitoring & Alerts

• VM Scheduling

• Encryption

• Software Config. Mgmt.

Enterprise Control without the Bottleneck

Preventative Controls

Detective Controls

Core principles for security, compliance & management

Enforce Least Privilege Approach

Log Everything

J&J Identity & Group Management

J&J Network Extension

Enforce our Images

Account Isolation (xbot, Big Data Account, Workspaces Account)

Xbot / Management Architecture

Diagram labels: Project Owners, VPCx Administrators, VPCx, Help, Assurance, Monitor, VPCx DB, xbot Admin, AD, Console, Billing, AWS Console, AWS Services, HPC Account

• Centralized Policy Enforcement - xbot

• Each Application Account is completely isolated from the others

• Controls are executed through both Assurance and Enforcement tests run every 10 minutes

• Tickets are created when values drift from the allowable values
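The assurance cycle in the bullets above can be pictured as a comparison between what an account actually contains and the allowable values held for it. The sketch below is only an illustration, not xbot's code: `find_drift`, the setting names, and the shape of the ticket dictionary are all assumptions.

```python
# Hypothetical assurance pass: compare observed settings against the
# allowable values recorded for an account and describe each drift.

def find_drift(observed, allowable):
    """Return one ticket dict per setting whose value is not allowable."""
    tickets = []
    for setting, value in sorted(observed.items()):
        allowed = allowable.get(setting)
        # Settings with no recorded allowable values are not checked.
        if allowed is not None and value not in allowed:
            tickets.append({
                "name": "drift-detected",
                "setting": setting,
                "found": value,
                "allowed": sorted(allowed),
            })
    return tickets


# Example values are invented for illustration.
allowable = {"image_id": {"ami-approved1", "ami-approved2"}, "encrypted": {True}}
observed = {"image_id": "ami-unknown", "encrypted": True}

tickets = find_drift(observed, allowable)
print(tickets)
```

A scheduler would run a pass like this on the 10-minute cadence the slide mentions and forward each returned dictionary to the ticket system.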

Enterprise Control - Queue Management & Automation

Work Queue

Work Items

API Execution @ Each Account: List, Info, Delete, Update, Setup, Admin, Login

Metadata: Project Details, Allowable Cloud Objects, Chargeback, Acceptable Values

Ex: HPC Account

Ticket System
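One way to read the queue slide is that each run fans a single action out into one work item per account, with that account's metadata attached. The following is a minimal sketch under that reading; `build_work_queue`, `drain`, and the metadata fields are hypothetical names, and the handler is a placeholder for the real API execution.

```python
from collections import deque

# Actions named on the slide; what each one does is not shown in the deck.
ACTIONS = ("List", "Info", "Delete", "Update", "Setup", "Admin", "Login")


def build_work_queue(accounts, action):
    """Create one work item per account for a single action."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    return deque({"account": name, "action": action, "metadata": meta}
                 for name, meta in sorted(accounts.items()))


def drain(queue, handler):
    """Run the handler on each work item in order and collect the results."""
    results = []
    while queue:
        item = queue.popleft()
        results.append(handler(item))
    return results


# Invented account metadata, loosely following the fields on the slide.
accounts = {
    "hpc": {"chargeback": "CC-0001", "allowable_objects": ["ec2", "s3"]},
    "bigdata": {"chargeback": "CC-0002", "allowable_objects": ["emr", "s3"]},
}

queue = build_work_queue(accounts, "Info")
summary = drain(queue, lambda item: (item["account"], item["action"]))
print(summary)
```

Keeping the metadata on the work item means the handler needs no extra lookup to know which cloud objects and values are acceptable for that account.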

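Sample Control – Only Allowing Approved Images

import binascii

# Fragments from xbot; project, project_info, region, image_id, key, instances,
# instance_info, allowable_values and create_support_ticket are defined elsewhere.
image_objs = project.get_ec2_images(project_info['Id'], region, image_ids=image_id)

images = []
for img in image_objs:
    # Decode the quoted-printable image record before keeping it.
    unserialized_obj = binascii.a2b_qp(img['image'])
    images.append(unserialized_obj)

for i in instances:
    # Record the tags that identify each instance.
    instance_info[key][i.id]['Name'] = i.tags.get('Name', '')
    instance_info[key][i.id]['Env'] = i.tags.get('Environment', '')
    instance_info[key][i.id]['Hostname'] = i.tags.get('Hostname', '')
    instance_info[key][i.id]['ImageId'] = i.tags.get('ami-id', '')

    # An image outside the allowable values raises a support ticket.
    if instance_info[key][i.id]['ImageId'] not in allowable_values:
        error = {'name': 'instance-value-error', 'value': instance_info[key][i.id]}
        create_support_ticket(error)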

Amazon DynamoDB – Project Metadata

Amazon DynamoDB – Project Level Exceptions
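The two DynamoDB slides suggest that a project's effective allow-list comes from its metadata plus any project-level exceptions. The deck does not show the table schemas, so the record shapes and the names `allowed_images` and `check_instances` below are assumptions made purely for illustration.

```python
# Hypothetical record shapes; the real DynamoDB items are not shown in the deck.

def allowed_images(project_metadata, project_exceptions):
    """Combine the approved image list with any project-level exceptions."""
    base = set(project_metadata.get("approved_images", []))
    extra = set(project_exceptions.get("extra_images", []))
    return base | extra


def check_instances(instances, allowed):
    """Return the IDs of instances whose image is not in the allowed set."""
    return [inst["id"] for inst in instances if inst["image_id"] not in allowed]


metadata = {"project": "hpc", "approved_images": ["ami-1111", "ami-2222"]}
exceptions = {"project": "hpc", "extra_images": ["ami-9999"]}
instances = [
    {"id": "i-a", "image_id": "ami-1111"},
    {"id": "i-b", "image_id": "ami-9999"},
    {"id": "i-c", "image_id": "ami-0000"},
]

violations = check_instances(instances, allowed_images(metadata, exceptions))
print(violations)
```

Keeping exceptions in a separate record lets a project be granted an extra image without editing the shared approved list.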

CLI – Automation – Member Info

User Level Information and access list

CLI – Automation – Project Info

Project Lists including account-code and friendly name

CLI – Automation – Project Info

Project Metadata

Project Level Service Listing

CLI – Automation – Adding Services

Adding New Service for this Project

CLI – Automation – Project Info

New Service Added with corresponding IAM roles, policies

AWS Account & Infrastructure Layer Control

Xbot Account: Xbot Administration

Payer Account (Consolidated Billing)

App AWS Accounts (001, 002, and so on), each with: Core, Project, Services, Users, Alarms, HPC

Scalable to 1000s of accounts

Operating System & Database Layer Control

Xbot Account

App AWS Account (001)

Operating System: EC2

Database: RDS, Amazon Redshift

Managing Amazon Redshift Controls

Encrypt Sensitive Data

KMS

Work Queue

Work Items

Account Metadata (Ex: HPC Account)

Ticket System

Checks 100s of accounts every 10 min for new instance; enforces policy

AD Security Group Sync

xbot

Sample Control ― Managing Redshift

audit policy requires:

# rotate_master_passwords=1hour

# apply_cw_metrics=95%CPUutil>60mins;85%DiskUsed>60mins;HealthStatus<1=10mins

# require_ssl=True

# enable_user_activity_logging=True; bucket_name=RegionalS3LogBucket

# backup_retention_period=35days

# modify_cluster(master_user_password=newpassword)

# publicly_accessible=False

# add_tags='Environment';'Production'

# rotate_user_passwords=90days

# sync_users=(conn.rscluster)

## add users, set groups, revoke public schema

## drop users, move schema ownership
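A few of the audit rules above can be checked directly against a cluster description. The sketch below assumes the description is a dictionary shaped like an entry returned by boto3's Redshift `describe_clusters` call (the deck itself used the older boto library); `audit_cluster` and its defaults are hypothetical, and only three of the listed rules are covered.

```python
# Hypothetical audit of one Redshift cluster description (a plain dict).

def audit_cluster(cluster, retention_days=35, required_tag="Environment"):
    """Return a list of policy findings for one cluster description."""
    findings = []
    # Rule: publicly_accessible=False
    if cluster.get("PubliclyAccessible", False):
        findings.append("publicly_accessible must be False")
    # Rule: backup_retention_period=35days
    if cluster.get("AutomatedSnapshotRetentionPeriod", 0) < retention_days:
        findings.append("backup_retention_period below %d days" % retention_days)
    # Rule: add_tags requires an Environment tag
    tag_keys = {tag["Key"] for tag in cluster.get("Tags", [])}
    if required_tag not in tag_keys:
        findings.append("missing tag: %s" % required_tag)
    return findings


# Invented cluster descriptions for illustration.
non_compliant = {
    "ClusterIdentifier": "demo",
    "PubliclyAccessible": True,
    "AutomatedSnapshotRetentionPeriod": 1,
    "Tags": [],
}
compliant = {
    "ClusterIdentifier": "demo2",
    "PubliclyAccessible": False,
    "AutomatedSnapshotRetentionPeriod": 35,
    "Tags": [{"Key": "Environment", "Value": "Production"}],
}

print(audit_cluster(non_compliant))
print(audit_cluster(compliant))
```

Rules such as `require_ssl` live in the cluster parameter group rather than the cluster description, so they would need a separate lookup that is not sketched here.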

User Federates into Account

User creates Cluster

Cluster Created

Within 10 minutes, xbot takes over Master User

Master User Password is reset by xbot every hour

Master User takes over, abstracts itself by syncing with AD Security Groups tied to that AWS Account

Begins to build a Profile / Group

Grants various permissions to group and associates DBAs

Revokes Access to Public Schema to ensure least privilege

Xbot detects new Cluster; applies CloudWatch Alarms

Xbot enables logging & sets the maximum backup retention

Xbot updates Parameter Group for SSL & User Activity Logging

Xbot resets the parameter group within 10 minutes to enforce policy

Xbot notifies users of the changes to their environment
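The hourly master password reset in the flow above needs a fresh random secret on every run. The deck does not show how xbot generates it, so the following is only a sketch: `new_master_password` is a hypothetical helper, and restricting the alphabet to letters and digits is a conservative assumption meant to stay clear of characters a database service may reject.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_master_password(length=32):
    """Generate a random password containing upper, lower and digit characters."""
    if length < 8:
        raise ValueError("length must be at least 8")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)):
            return candidate


password = new_master_password()
# An hourly job could then pass the value to the cluster, for example through
# the modify_cluster call named in the audit policy; that step is omitted here.
print(len(password))
```

Because the value is regenerated every hour and never handed to a person, nobody retains standing master-user access, which fits the least-privilege principle stated earlier in the deck.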

Enterprise Log Management

Log sources: Elastic Load Balancing, S3, Amazon Redshift, Data Pipeline, EMR, CloudFront, CloudTrail, Config, EC2, RDS

Destination: Regional S3 Logging Bucket

No API Action to send DB User Activity Logs to S3

Queries logs out of DB

Rotates logs every week

Temp Location for Log Movement

Copies to S3 Bucket

AWS Components Orchestrated

Categories: Compute, Storage & Content Delivery, Database, Networking, Management Tools, Enterprise Applications

Components: EC2, Elastic Load Balancing, Auto Scaling, S3, EBS, Amazon Glacier, CloudFront, RDS, Amazon Redshift, DynamoDB, ElastiCache, Amazon Kinesis, Data Pipeline, EMR, VPC, Direct Connect, CloudFormation, CloudWatch, CloudTrail, Config, Trusted Advisor, IAM, Directory Service, SES, SNS, SQS, SWF, CloudSearch, WorkSpaces, WorkDocs

Python (boto)

Common Architecture Pattern for Big Data or HPC

Connected VPC: us-east-1 (10.X.X.X/25)

us-east-1a: 10.X.X.0/27

us-east-1b: 10.X.X.32/27

Amazon S3, Win/Lin EC2, DynamoDB

VPC Peering

Disconnected VPC for EMR: us-east-1 (10.X.X.X/19), IGW

us-east-1a: 10.X.0.X/21

us-east-1b: 10.X.7.X/21

us-east-1c: 10.X.15.X/20

Burst High Performance Computing (HPC) workloads in Private Address Space in same Account

Take advantage of multiple subnets / AZs for Spot Instance Pricing
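The address plan above splits a small /25 into /27 subnets and a larger /19 into two /21 subnets and one /20. Because the deck masks the real octets as "X", the sketch below uses placeholder ranges (`10.0.0.0/25` and `10.1.0.0/19`) purely to show the arithmetic with Python's standard `ipaddress` module.

```python
import ipaddress

# Placeholder ranges: the deck masks the real octets as "X".
connected_vpc = ipaddress.ip_network("10.0.0.0/25")
emr_vpc = ipaddress.ip_network("10.1.0.0/19")

# The connected VPC is carved into /27 subnets, one per Availability Zone used.
connected_subnets = list(connected_vpc.subnets(new_prefix=27))

# The disconnected VPC holds two /21 subnets and one /20 subnet:
# split the /19 in half, then split the first half again.
first_half, second_half = emr_vpc.subnets(new_prefix=20)
emr_subnets = list(first_half.subnets(new_prefix=21)) + [second_half]

for subnet in connected_subnets[:2] + emr_subnets:
    print(subnet, subnet.num_addresses)
```

The much larger private range gives burst HPC and EMR clusters room to scale across three Availability Zones, which is also what makes it possible to shop for Spot capacity in more than one zone.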

Common Use Cases

• Statistical Analysis on large data sets; e.g. Genomic Sequencing

• Transformations of large complex data sets for Advanced Analytics (Sales & Supply Chain)

• Machine Learning engines on unstructured or non-relatable data

Large volumes of Structured & Unstructured Data

Direct Connect

VGW

On-Premise Internal Data Sources

Admins

Workspaces Architecture Patterns

Diagram labels: OIA, J&J DCs, JJNET, MFA, SCCM Site & DP, J&J Resources, J&J Facility, Zero Client, ELB, Workspaces Account, Infra Comp Account, Core Infra Account, Zero Client Account, Teradici Connection Manager

Comments

• Global implementation across NA, EMEA and AP

• Infrastructure components living within AWS for scale, performance and management

• J&J Network extended into AWS

Tradeoffs / Lessons Learned

- A DevOps approach to cloud is strongly recommended. Focus on velocity of new capabilities & operational improvements

- Security Engagement and Partnership is critical

- Identify, Design and remain Diligent with your Cloud Principles

- Early evaluation with CMP – focus has been too much on IaaS & Provisioning only

- Partnership with 3rd Parties is crucial (Log Management, Web Application Firewall, Utilization & Spend)

- Training of Enterprise IT Users is critical

Key Takeaways

- Lean into PaaS services

- Enable agility of the cloud to your end users through self-service

- Automate your enterprise controls

- Unleash power of the cloud for small to large patterns

Thank you!

Contact Details:

Keith Blizard – kblizard@its.jnj.com

Bob Tordella - btordell@its.jnj.com

Remember to complete your evaluations!