AWS Certified Solutions Architect – Professional Master Cheat Sheet
pg. 1
SKILLCERTPRO
Domain 1.0: High Availability and Business Continuity (15% of exam)
1.1 Demonstrate ability to architect the appropriate level of availability based on stakeholder requirements
1.2 Demonstrate ability to implement DR for systems based on RPO and RTO
1.3 Determine appropriate use of multi-Availability Zones vs. multi-Region architectures
1.4 Demonstrate ability to implement self-healing capabilities
Recovery Point Objective (RPO) – The maximum acceptable amount of data loss, measured in time. If your RPO is 1 hour and you have an event at 4pm, you must be able to restore data up to 3pm.
Recovery Time Objective (RTO) – The time it takes after a disruption to restore a business process to its service level. If your RTO is 1 hour and you have an event at 4pm, you need to be back up and running by 5pm.
Benefits of using AWS for DR:
- Minimal hardware required for data replication
- Pay-as-you-use model
- Scalability
- Automated DR deployment (scripts, CFTs, etc.)
DRBD – https://aws.amazon.com/blogs/aws/redundant-disk/
AWS Services for DR (https://aws.amazon.com/disaster-recovery/)
Read this: http://d36cz9buwru1tt.cloudfront.net/AWS_Disaster_Recovery.pdf
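The RPO/RTO arithmetic above can be sketched in a few lines (a toy illustration of the definitions, nothing AWS-specific):

```python
from datetime import datetime, timedelta

def recovery_targets(event_time: datetime, rpo: timedelta, rto: timedelta):
    """Given a disruption time, return the oldest acceptable restore point
    (event - RPO) and the deadline for service restoration (event + RTO)."""
    return event_time - rpo, event_time + rto

restore_point, deadline = recovery_targets(
    datetime(2024, 1, 1, 16, 0),   # event at 4pm
    rpo=timedelta(hours=1),
    rto=timedelta(hours=1),
)
# restore_point is 3pm (worst-case data loss window);
# deadline is 5pm (back in service).
```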
Multiple regions around the globe

Storage
- S3 – 11 9s durability & cross-region replication
  - Mission critical & primary data storage
  - Redundantly stored on multiple devices across multiple facilities within a region
  - Can use cross-region replication to move data from one region to another
- Glacier – 3 hours or longer to recover a file (so not an option if your RTO is 15 min)
- Elastic Block Storage (EBS) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
  - Can create point-in-time snapshots of data volumes
  - Use these snapshots as the starting point of new EBS volumes
  - Protected long-term because they are stored on S3
  - Not automatic by default, must be scripted
- AWS Import/Export Snowball (https://aws.amazon.com/importexport/)
  - Import TO EBS, Glacier & S3
  - Can only export FROM S3
  - If you export from an S3 bucket with versioning turned on, only the latest version is exported
  - Use cases: Cloud Migration, DR, Datacenter Decommission, Content Distribution
- Direct Connect
- AWS Storage Gateways
  - Can be deployed on-prem (ESXi or Hyper-V) or as an EC2 instance
  - Can schedule snapshots
  - Can use with Direct Connect
  - Can implement bandwidth throttling (good for remote sites)
  - If multiple sites, need one gateway in each location
  - Networking ports: 443 externally; 80 (activation only), 3260 (iSCSI), and UDP 53 (DNS) internally
  - Encrypted using SSL in transit and AES-256 at rest
  - Stores data as EBS snapshots in S3
  - Gateway-cached volumes:
    - iSCSI-based block storage
    - Local storage holds frequently accessed data; infrequently accessed data is stored in S3
    - If the link to AWS goes down, you lose access to your data
    - Each volume can go up to 32TB, 32 volumes supported (i.e. 1PB of data can be stored)
    - Need local storage for a cache and an upload buffer
    - Can take point-in-time incremental snapshots of volumes and store them in S3 as EBS snapshots
  - Gateway-stored volumes:
    - iSCSI-based block storage, for when you need your entire data set locally
    - Each volume can go up to 16TB, 32 volumes supported (i.e. 0.5PB of data can be stored)
    - Can take point-in-time incremental snapshots
    - Snapshots provide durable off-site backup to S3 as EBS snapshots
    - Use a snapshot of a gateway-stored volume as the starting point of a new EBS volume, which you can attach to an EC2 instance
  - Gateway VTL/VTS:
    - Get rid of your physical tape library infrastructure
    - Virtual Tape Library -> backed by S3
      - Instant retrieval; 1500 virtual tapes (150TB)
    - Virtual Tape Shelf -> backed by Glacier
      - Up to 24 hours to get a virtual tape back; unlimited tapes
    - Need local storage for a cache and an upload buffer
    - Software supported: NetBackup 7.x, Backup Exec 2012-15, MS System Center 2012 Data Protection Manager, Veeam 7 & 8
    - Dell NetVault 10

Compute
- EC2
  - EC2 VM Import Connector – a virtual appliance that works with vCenter to convert your VMware VMs into EC2 instances in AWS

Networking
- Route53
- ELB
- VPC
- Direct Connect
DBs
- Supported HA options for databases:
  - Oracle – RAC & Data Guard
  - SQL Server – AlwaysOn availability groups & SQL mirroring
  - MySQL – Asynchronous replication
- RDS https://aws.amazon.com/documentation/rds/ and https://aws.amazon.com/rds/faqs/
  - Can copy snapshot data from one region to another
  - Can have a read replica running in another region; available for MySQL, MariaDB, PostgreSQL, and Amazon Aurora
  - Automatic failover in case of:
    - An Availability Zone outage
    - The primary DB instance failing
    - The DB instance's server type being changed
    - The operating system of the DB instance undergoing software patching
    - A manual failover of the DB instance initiated using Reboot with failover
  - RDS Multi-AZ failover (synchronous replication only):
    - MySQL, Oracle & PostgreSQL use synchronous physical replication to keep the standby up to date with the primary
    - SQL Server uses SQL Server native mirroring technology; ALWAYS synchronous
    - High Availability
    - Backups are taken from the secondary (avoids I/O suspension)
    - Restores are taken from the secondary (same reason)
    - Is NOT a scaling solution (use read replicas for scaling)
  - Read Replicas (asynchronous replication only):
    - Read-heavy DB workloads
    - Serve reads while the source DB is unavailable (maintenance, I/O suspension, etc.)
    - Business reporting
    - When creating a new read replica, if Multi-AZ is not enabled the snapshot is of the primary (~1 min I/O suspension); if Multi-AZ is enabled the snapshot is taken from the secondary DB
    - Read replicas themselves cannot be Multi-AZ
    - When created, given a new endpoint DNS address
    - Can be promoted to its own standalone DB
    - Can have up to 5 read replicas
    - MySQL ONLY can have:
      - read replicas in different regions
      - read replicas of read replicas (further increases latency)
    - Cannot snapshot or do automated backups of read replicas
- DynamoDB
  - Offers cross-region replication via an open source tool on GitHub: https://github.com/awslabs/dynamodb-cross-region-library
  - If an application does NOT require Atomicity, Consistency, Isolation, Durability (ACID) compliance, joins & SQL, then consider DynamoDB rather than RDS (more on this in Domain 5)
- Redshift
  - Snapshot the data warehouse to S3 within the same region, or copy it to another region

Orchestration
- CloudFormation
- Elastic Beanstalk
- OpsWorks
DR Scenarios
- Backup & Restore ($)
  - Cheapest & most manual; longest RTO/RPO
  - Select the appropriate tool to back up data to AWS
  - Ensure an appropriate retention policy for the data
  - Ensure security measures are in place (encryption & access policies)
- Pilot Light ($$)
  - Small, most critical core elements of systems run in AWS; when you need recovery, you can quickly provision a full-scale production environment around the critical core
  - 2 options for provisioning from a network perspective:
    - Use pre-allocated IPs (& even MACs with ENIs) & associate them with instances when invoking DR
    - Use an ELB to distribute traffic to multiple instances, then update DNS to point to the AWS EC2 instance or to the LB using a CNAME. Everyone does this option ☺
  1. Set up an EC2 instance to replicate or mirror data
  2. Have all supporting custom software available in AWS
  3. Create & maintain AMIs of key servers where fast recovery is required
  4. Regularly run/test/patch/update these servers
- Warm Standby ($$$)
  - Scaled-down version of a fully functional environment
  - Horizontal scaling is preferred over vertical scaling
  1. Set up an EC2 instance to replicate or mirror data
  2. Create and maintain AMIs
  3. Run the app using a minimal footprint
  4. Patch/update these servers in line with prod
  - To recover – scale up/out your AWS footprint, change DNS (or use Route53 automated health checks) & consider autoscaling
- Multi-Site (active-active) ($$$$)
  - Most expensive and most automated; shortest RPO/RTO
  - Runs on-site AND in AWS as active-active
  - Use Route53 to route traffic to both sites either symmetrically or asymmetrically; change DNS weighting to all-AWS in the event of failure
  - Application logic potentially necessary in the event of site failure
Automated Backups
- Services that have automated backups:
  - RDS
    - Stored on S3
    - MySQL DB engine: only the InnoDB storage engine is supported
    - MariaDB: only the XtraDB storage engine is supported
    - Deleting a DB instance deletes all automated backups (manual snapshots are not deleted)
    - Default retention period is one day (values are 0–35 days)
    - Manual snapshot limit – 50 per region – does not apply to automated backups
    - Restore allows you to change engine edition (e.g. SQL Std to SQL Ent)
  - ElastiCache (only Redis, not Memcached)
  - Redshift
- Services that do NOT have automated backups:
  - EC2
    - Automate using the CLI or Python
    - Stored in S3
    - Snapshots are incremental; you are charged only for the incremental space
    - Each snapshot still contains the base data

Domain 2.0: Costing (5% of exam)
2.1 Demonstrate ability to make architectural decisions that minimize and optimize infrastructure cost
2.2 Apply the appropriate AWS account and billing set-up options based on scenario
2.3 Ability to compare and contrast the cost implications of different architectures
EC2 types https://aws.amazon.com/ec2/
- On-demand https://aws.amazon.com/ec2/pricing/on-demand/
  - Highest hourly rate, no commitments
  - Ideal for auto scaling groups and unpredictable workloads
  - Good for Dev/Test
- Reserved https://aws.amazon.com/ec2/pricing/reserved-instances/
  - Use cases:
    - Steady state, predictable usage
    - Apps that need reserved capacity
    - Upfront payments reduce the hourly rate
  - Standard RI – you are billed whether it's powered on or off
    - Use this when you expect to run the server 24/7
    - 1 & 3 year contracts; 3 years is cheapest
    - There is a marketplace to sell RIs https://aws.amazon.com/ec2/purchasing-options/reserved-instances/marketplace/
    - Payment options:
      - All upfront (up to 68% off for a 3-year term)
      - Partial upfront (up to 60% off for a 3-year term)
      - No upfront (30% off for a 1-year term)
  - Scheduled RI – use when you need a set number of instances for a particular time slice
    - Can only launch instances during that time slice
    - If you launch outside of that window, you are billed on-demand
    - Charges accrue hourly but are billed in monthly increments over the term
    - 1-year term commitment
  - Convertible RI – brand new, not on the test yet; up to 45% off
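Since the exam leans on comparing these payment options, the trade-off can be made concrete with a small effective-rate calculation (all prices below are invented for illustration, NOT real AWS pricing):

```python
HOURS_PER_YEAR = 8760

def effective_hourly(upfront: float, hourly: float, years: int) -> float:
    """Total cost of a reservation spread over its full term, per hour."""
    hours = HOURS_PER_YEAR * years
    return (upfront + hourly * hours) / hours

# Hypothetical prices for one instance type:
on_demand = 0.10                                # $/hour, no commitment
all_up    = effective_hourly(1700.0, 0.0, 3)    # all upfront, 3-year term
partial   = effective_hourly(1000.0, 0.03, 3)   # partial upfront, 3-year term
assert all_up < partial < on_demand             # more upfront -> lower effective rate
```

The larger the upfront payment and the longer the term, the lower the effective hourly rate, which is exactly the ordering of the discount tiers above.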
- Spot https://aws.amazon.com/ec2/spot/pricing/
  - Cheapest; flexible start and end times
  - Grid computing and HPC
  - Bidding-type instance; if you are outbid, those instances drop with little notice
- Dedicated (2 types) https://aws.amazon.com/ec2/dedicated-hosts/faqs/
  - Host (most expensive)
  - Instance (you lose visibility into which host you are running on)

Modifying your RIs http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-modifying.html
- Can switch AZs within the same region
- Can change instance size within the same instance type
- Instance type modifications are supported, but only for Linux – and not RHEL or SUSE
- Cannot change the instance size of Windows RIs
- See the normalization chart for calculating the footprint to modify a non-convertible RI: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-modification-instancemove.html
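The normalization chart boils down to a fixed factor per instance size; a modification is allowed when the total normalized footprint is unchanged. A sketch of that arithmetic (factors per the AWS chart):

```python
# AWS normalization factors per instance size (small = 1 unit)
FACTORS = {"nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
           "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}

def footprint(size: str, count: int) -> float:
    """Total normalized units for a reservation of `count` instances."""
    return FACTORS[size] * count

# A modification is allowed only if the footprint stays the same, e.g.
# 1 x large (4 units) -> 4 x small (4 units):
assert footprint("large", 1) == footprint("small", 4)
# ...but 1 x large -> 3 x small would change the footprint and be rejected:
assert footprint("large", 1) != footprint("small", 3)
```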
RDS RIs https://aws.amazon.com/rds/reserved-instances/
- RI based on 5 criteria:
  - DB Engine
  - DB Instance Class
  - Deployment Type
  - License Model
  - Region
- If any of these 5 items changes, RDS reverts to on-demand billing

How to configure cross-account access:
1. Create any custom policies first
2. Create a role with cross-account access
3. Apply the policy to that role & note down the ARN
4. Grant access to the role
5. Switch to the role
https://aws.amazon.com/blogs/aws/new-cross-account-access-in-the-aws-management-console/
http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html

Why multiple AWS accounts?
- Security
- Billing
- Growth through acquisition
Consolidated billing http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html
- Linked accounts feed into one "paying account" – from 1 to 20 linked accounts (or more with a support ticket)
- The paying account is independent & cannot access resources of linked accounts (and vice versa) by default
- Easy to track charges and allocate costs
- Get volume discounts across all your accounts
- Unused EC2 reserved instances are applied across the group
- CloudTrail is per account and per region, but can be aggregated into one bucket in the paying account

Tagging & Resource Groups
- Great for sorting resources in a complex environment
- Can sort by multiple tag keys (prod, dev, test, app, whatever)
- By default works across all regions, but can be filtered down to individual regions
- By default works with all services, but can also be filtered
- Not all resource types support tagging, but you don't need to know which for the exam

Budgets and CloudWatch Alarms
- Used to track your current costs vs a set "budget" for a billing period
- Updated every 24 hours
- Does not show refunds
- Not automatically created by AWS
- Can be compared against AWS "estimated" costs to see how much budget is left over
- Must create budgets on the payer account (in a consolidated billing scenario)
- Can set alarms when you exceed actual or forecasted budgets, but you will still exceed them – alarms won't stop chargeable services from running

Domain 3.0: Deployment Management (10% of exam)

3.1 Ability to manage the lifecycle of an application on AWS
3.2 Demonstrate ability to implement the right architecture for development, testing, and staging environments
3.3 Position and select most appropriate AWS deployment mechanism based on scenario
CloudFormation https://aws.amazon.com/cloudformation/faqs/
- Services supported (http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-supported-resources.html#d0e194974): API Gateway, Auto Scaling, CloudFront, CloudTrail, CloudWatch, CodeCommit, CodeDeploy, CodePipeline, Data Pipeline, DynamoDB, EC2, ECS, ElastiCache, Elasticsearch, Elastic Beanstalk, Elastic Load Balancer (ELB), Elastic MapReduce (EMR), GameLift, Kinesis, Lambda, IAM (create roles, policies), IoT, OpsWorks, RDS, Redshift, Route53, S3, SimpleDB, SNS, SQS, VPC, WorkSpaces
- Templates & Stacks:
  - Templates:
    - Architectural designs
    - Can create, update & delete templates
    - Written in JSON or YAML
    - You don't need to figure out the order/dependencies for provisioning AWS services – CloudFormation takes care of that for you
- AWS CloudFormation Designer allows you to visualize your templates as diagrams & edit them using a drag & drop interface
- Stacks:
  - Deployed resources based on templates
  - Can create, update & delete stacks using templates
  - Can be deployed using the AWS management console, CLI or APIs
- CloudFormation Template (CFT)
  - The blueprints for the house (in JSON or YAML format); the CloudFormation stack is the actual house ☺
  - Allows you to effectively apply version control to your AWS resources/infra
- Elements of the template:
  - File format & version number (mandatory)
  - List of AWS resources and associated config values (mandatory)
  - Template parameters (optional)
    - Input values supplied at stack creation time
    - Limit of 60
  - Output values (optional)
    - Values returned once a stack has finished building (public IP, ELB address, URL of the completed web app, etc.)
    - Limit of 60
  - List of data tables (optional)
    - Static config values (e.g. AMI names, instance sizes, etc.)
- Intrinsic Function Reference (outputting data) http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html: Fn::Base64, Condition Functions, Fn::FindInMap, Fn::GetAtt (← most likely to be on exam), Fn::GetAZs, Fn::ImportValue, Fn::Join, Fn::Select, Fn::Sub, Ref
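The template elements above can be sketched as a minimal skeleton (the AMI ID, logical names and mapping are made up for illustration; this is not a production template). It is built here as a Python dict so the mandatory and optional sections are explicit, then serialized to the JSON you would hand to CloudFormation:

```python
import json

# Skeleton CFT: format version + Resources are the core; Parameters,
# Mappings (data tables) and Outputs are optional sections.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {                    # input values supplied at stack creation
        "InstanceType": {"Type": "String", "Default": "t2.micro"},
    },
    "Mappings": {                      # static data table, e.g. AMI per region
        "RegionAMI": {"us-east-1": {"ami": "ami-12345678"}},  # hypothetical AMI
    },
    "Resources": {                     # the actual infrastructure
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceType"},
                "ImageId": {"Fn::FindInMap":
                            ["RegionAMI", {"Ref": "AWS::Region"}, "ami"]},
            },
        },
    },
    "Outputs": {                       # values returned after the stack builds
        "PublicIP": {"Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]}},
    },
}

body = json.dumps(template, indent=2)  # what you'd upload to CloudFormation
```

Note how Ref, Fn::FindInMap and Fn::GetAtt from the intrinsic-function list each appear in their natural place: parameters, data tables, and outputs.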
- Supports Chef & Puppet integration:
  - Deploy & configure down to the application layer
- Bootstrap scripts are supported:
  - Install packages, files & services by describing them in the CFT
- Stack creation errors:
  - By default, "automatic rollback on error" is enabled
  - You will be charged for resources that are provisioned, even if there is an error
  - CloudFormation itself is free
- Stacks can wait for applications:
  - Provides a WaitCondition resource that acts as a gate, blocking creation of other resources until a condition is satisfied
- You can specify deletion policies:
  - Can specify that snapshots be created of EBS volumes or RDS DBs prior to deletion
  - Can specify that a resource be preserved and not deleted when its stack is deleted (e.g. an S3 bucket)
- You can update a stack after it's created
- Can be used to create roles in IAM:
  - Then used to grant EC2 instances access to those roles
- Creation & customization of VPCs:
  - Can specify IP address ranges (CIDR blocks, as well as individual IP addresses for specific instances)
  - Can specify pre-existing EIPs
- VPC peering:
  - Can create multiple VPCs inside a single template
  - Can enable VPC peering, but only within the same AWS account
- Route53:
  - CF can create new hosted zones or update existing ones: A records, aliases, CNAMEs, etc.

Elastic Beanstalk https://aws.amazon.com/elasticbeanstalk/faqs/
- Overview:
  - Integrates with VPC & IAM
  - Can provision RDS instances
  - Full control of resources
  - Code is stored in S3
  - Multiple environments are supported (for versioning)
  - Changes from Git repositories are replicated
  - Linux & Windows Server 2012 R2 supported
- Ideal for devs with no AWS experience who need to deploy quickly – they upload their code and Elastic Beanstalk takes care of the rest:
  - Capacity provisioning
  - Load balancing
  - Auto-scaling
  - Application health monitoring
- Supported languages/platforms: Java, .NET, PHP, Node.js, Python, Ruby, Go, Docker (web apps)
- CloudFormation supports Elastic Beanstalk, but Elastic Beanstalk will not provision CFTs
- With it you can:
  - Select the operating system that matches your application requirements (e.g., Amazon Linux or Windows Server 2012 R2)
  - Choose from several available database and storage options
  - Enable login access to Amazon EC2 instances for immediate and direct troubleshooting
  - Quickly improve application reliability by running in more than one Availability Zone
  - Enhance application security by enabling HTTPS on the load balancer
  - Access built-in Amazon CloudWatch monitoring and get notifications on application health and other important events
  - Adjust application server settings (e.g., JVM settings) and pass environment variables
  - Run other application components, such as a memory caching service, side-by-side in Amazon EC2
  - Access log files without logging in to the application servers
- Ways you can provision to Elastic Beanstalk:
  - Upload deployable code
  - Push a Git repository
  - The AWS Toolkit for Visual Studio & Eclipse allows you to deploy straight from the IDE
- Updating – you can push updates from Git and only the deltas are transmitted
- Application files and (optionally) server log files are stored in S3
- Elastic Beanstalk can automatically provision an Amazon RDS DB instance; the connectivity information for the DB instance is exposed to your application via environment variables
- Multiple environments are allowed, to support version control:
  - Designed to support multiple running environments, such as one for integration testing, one for pre-production, and one for production
  - Each environment is independently configured and runs on its own separate AWS resources
  - Elastic Beanstalk also stores and tracks application versions over time, so an existing environment can easily be rolled back to a prior version, or a new environment can be launched using an older version to try to reproduce a customer problem
- FT = multi-AZ but not multi-region
- Supports VPC
- Security:
  - By default, the app is publicly available
  - Can use a VPC to provision a private, isolated section of your app in a virtual network, made private through specific security group rules, NACLs, and custom route tables
- Supports IAM

OpsWorks https://aws.amazon.com/opsworks/faqs/
- A config mgmt solution with automation tools that let you model and control your apps and their supporting infra. AWS OpsWorks makes it easy to manage the complete application lifecycle, including resource provisioning, config mgmt, app deployment, software updates, monitoring, and access control, using Chef.
- What is Chef?
  - An automation platform that turns your infra into code
  - Automates how apps are configured, deployed & managed
  - The Chef server stores your recipes & other config data
  - The Chef client (node) is installed on each server, VM, container or networking device
  - The client periodically polls the Chef server for the latest policy & state of the network; if anything is out of date, the client remediates
- Designed for IT admins and ops-minded devs who want a way to manage apps of nearly any scale and complexity without sacrificing control
- Create a logical architecture, provision resources based on that architecture, deploy your apps and all supporting software and packages in your chosen configuration, then operate and maintain the app through lifecycle stages such as auto-scaling events and software updates
- Turns infra into code – infra becomes versionable, testable, and repeatable
- A GUI to deploy & config your infra quickly
- Consists of 2 elements, stacks & layers:
  - Stack = a group of resources
  - Layer = a layer within the stack (i.e. load balancer layer, application layer, DB layer, etc.)
  - 1 or more layers per stack
  - An instance must be assigned to at least 1 layer
  - Which Chef recipes run is determined by the layer the instance belongs to
  - Preconfigured layers: App, DB, LB, Caching
Domain 4.0: Network Design for a complex large scale deployment (10% of Exam)
4.1 Demonstrate ability to design and implement networking features of AWS
4.2 Demonstrate ability to design and implement connectivity features of AWS
- When you create a NAT instance, don't forget to disable source/destination checks!

VPC Peering (http://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html):
- Connects 2 VPCs within a single region
- Transitive peering is not supported (on purpose)
- Up to 50 peers can be created (soft limit; contact Amazon to go up to 125)
- Allows you to route traffic between the peered VPCs using private IP addresses, as if they were part of the same network
- Can't have matching or overlapping CIDR blocks
- A placement group can span peered VPCs, but you won't get full bandwidth between instances
- Can't reference a security group from the peer VPC as the source/destination for ingress/egress rules – use CIDR blocks instead
- Private DNS values cannot resolve between instances in peered VPCs (use private IP addresses instead)
- How to set up:
  1. Local VPC owner sends a request to the remote VPC owner
  2. Remote VPC owner has to accept
  3. Local VPC adds a route out to its route table
  4. Remote VPC adds a route back to its route table
  5. Security groups & NACLs in both VPCs have to allow the traffic

AWS Direct Connect (https://aws.amazon.com/directconnect/faqs/): https://youtu.be/SMvom9QjkPk
- Can be partitioned into multiple virtual interfaces (VIFs)
  - Use the same connection to access public IP address space (EC2, DynamoDB & S3) via public VIFs, and private resources (internal IP addresses) via private VIFs
- Reduces costs when dealing with large volumes of traffic
- Increases reliability & bandwidth
- Available in 10Gbps, 1Gbps and sub-1Gbps (through Direct Connect Partners)
- Uses 802.1Q Ethernet VLAN trunking
- Is not redundant on its own:
  - You can add redundancy by having 2 connections (2 routers, 2 Direct Connects) or by having a site-to-site VPN in place (using BGP for failover)
- Layer 2 connections are not supported
- When using a VPN to connect to a VPC, you need 2 anchor points:
  - Customer Gateway (CGW): physical or software appliance on your side
  - Virtual Private Gateway (VPG): anchor on the AWS side
- In the US, just 1 Direct Connect is needed to connect to all 4 US regions
- Recommended best practice for an HA solution is to use either 2 DXs, or 1 DX and 1 VPN
- Does not support jumbo frames
- Uses external Border Gateway Protocol (eBGP) for routing

HPC (https://aws.amazon.com/hpc/)
- Batch processing with large, compute-intensive workloads
- Demands high CPU, storage & networking requirements
- Usually requires jumbo frames as well (MTU 9000)
- Enhanced networking (https://aws.amazon.com/ec2/faqs/) is available using SR-IOV on supported instance types: C3, C4, D2, I2, M4, R3
  - Must be HVM, not PV (paravirtual) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Placement Groups (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html)
- Don't span AZs; 1 PG = 1 AZ
- Can span subnets (in the same AZ)
- Only certain supported instance types (C3, C4, D2, I2, M4, R3)
- Existing instances can't be moved into a PG
- Amazon prefers homogeneous instance types, but you can mix:
  - Homogeneous types give a greater likelihood that the launch will succeed
- Best practice is to size the PG for peak load & launch all instances at the same time, as there may not be sufficient capacity in the AZ to add extra instances later on

Elastic Load Balancers (https://aws.amazon.com/elasticloadbalancing/):
- Region-wide load balancer
- Can be used internally or externally
- Can do SSL termination and processing
- Cookie-based sticky sessions (session affinity) http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
  - ELB routes the same client to the same application server
  - AWS best practice is to store session state in a DB instead of using ELB sticky sessions, so that you don't impact the user
- Integrates with auto-scaling
- ELB EC2 health checks (query a page)
- Integrates with CloudWatch:
  - Advanced metrics – load balancing based on CloudWatch metrics (CPU, network usage, disk, etc.)
- Integrates with Route 53 (cloud-based DNS load balancing)
- Supported ports: 25 (SMTP), 80/443, 1024-65535
- Can't assign Elastic IPs to an ELB
- IPv4 & IPv6 supported (but VPCs don't support IPv6 currently)
- Can load balance to the zone apex of a domain name
- Can get a history of ELB API calls by turning on CloudTrail (output to an S3 bucket & inspect the logs in the bucket)
- 1 SSL certificate per ELB, unless you have a wildcard cert

NAT instance vs NAT Gateway
- Evaluate the technical differences between the 2 for your needs; if you don't need a specific feature only supported by an instance, go gateway.
- Instance:
  - Use a script to manage failover between instances
  - Bandwidth depends on the instance type
  - Managed by you
  - Manual port forwarding
  - Use a bastion server to manage
  - Use CloudWatch to see traffic/alarms
- Gateway:
  - Built-in HA; gateways in each AZ are implemented with redundancy
  - Can burst up to 10Gbps
  - Managed by AWS
  - Optimized for handling NAT traffic
  - Port forwarding not supported
  - Bastion server not supported
  - Traffic metrics not supported
- Scaling NATs:
  - Scale UP: increase the instance size, or choose an instance family which supports enhanced networking (C3, C4, D2, I2, M4, R3)
  - Scale OUT: add an additional NAT & subnet, then migrate half of the workloads to the new subnet

Domain 5.0: Data Storage for a complex large scale deployment (15% of exam)
5.1 Demonstrate ability to make architectural trade off decisions involving storage options
5.2 Demonstrate ability to make architectural trade off decisions involving database options
5.3 Demonstrate ability to implement the most appropriate data storage architecture
5.4 Determine use of synchronous versus asynchronous replication
Optimizing S3 – use parallelization for both PUTs and GETs

Optimizing for PUTs: https://aws.amazon.com/blogs/aws/amazon-s3-multipart-upload/
- Parallelizing (multipart upload):
  - Divide your files into small parts & upload those parts simultaneously
  - If 1 part fails, it can be restarted
  - Moves the bottleneck to the network itself, helping to increase aggregate throughput
  - 25-50MB part sizes on high-bandwidth networks
  - 10MB part sizes on mobile networks
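The part-splitting arithmetic behind multipart upload can be sketched as follows (a simplified illustration of part sizing, not the S3 API itself; a real upload would go through the SDK's multipart calls):

```python
def split_parts(total_bytes: int, part_size: int):
    """Byte ranges (start, end-exclusive) for a multipart upload."""
    return [(start, min(start + part_size, total_bytes))
            for start in range(0, total_bytes, part_size)]

MB = 1024 * 1024
parts = split_parts(total_bytes=120 * MB, part_size=25 * MB)  # high-bandwidth sizing
# 120MB at 25MB per part -> 5 parts; each can be uploaded in parallel
# and retried on its own if it fails.
assert len(parts) == 5
assert parts[-1] == (100 * MB, 120 * MB)   # final, smaller part
```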
Optimizing for GETs:
- Use CloudFront:
  - Multiple endpoints globally
  - Low latency, high transfer speeds available
  - Caches objects from S3
  - 2 flavors: RTMP & Web
- Use range-based GETs to get multithreaded performance (http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html):
  - Using the Range HTTP header in a GET request allows you to retrieve a specific range of bytes from an object stored in S3
  - Allows you to send multiple GETs at once
  - Compensates for unreliable network performance
  - Maximizes bandwidth throughput
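Building the Range headers for such a parallel download can be sketched like this (illustrative only; the actual requests would be issued by your HTTP client or the S3 SDK):

```python
def range_headers(object_size: int, chunk: int):
    """One 'bytes=start-end' Range value per chunk (end is inclusive,
    per the HTTP Range header convention)."""
    return [f"bytes={start}-{min(start + chunk, object_size) - 1}"
            for start in range(0, object_size, chunk)]

# A 1000-byte object fetched in 400-byte ranges, each by its own thread:
headers = range_headers(1000, 400)
assert headers == ["bytes=0-399", "bytes=400-799", "bytes=800-999"]
```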
- S3 keys are stored lexicographically (in dictionary order, A-Z):
  - The more random you make your key names within a particular bucket, the better performance you get from S3
  - Really only applies to very large buckets
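One well-known pattern for randomizing key names (an illustration of the idea, not an official AWS requirement) is to prepend a short hash prefix so that sequential names no longer sort adjacently:

```python
import hashlib

def randomized_key(key: str, prefix_len: int = 4) -> str:
    """Prefix a key with a few hex chars of its own MD5 so that
    lexicographically adjacent names stop sorting together."""
    prefix = hashlib.md5(key.encode()).hexdigest()[:prefix_len]
    return f"{prefix}/{key}"

# Sequential log names get scattered hash prefixes:
a = randomized_key("logs/2016-01-01.gz")
b = randomized_key("logs/2016-01-02.gz")
assert a.endswith("logs/2016-01-01.gz") and a[4] == "/"
```

Because the prefix is derived from the key itself, it is deterministic: you can recompute it on read without storing a lookup table.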
- Once you turn on versioning, you can't turn it off – you can only suspend it
- Securing S3:
  - Use bucket policies to restrict deletes
  - You can also use MFA Delete (exactly what it sounds like: you need credentials plus a one-time MFA code to delete anything)
  - Versioning does not protect you against deleting a bucket:
    - Back up your bucket to another, separate S3 bucket owned by a different account
Database Design Patterns (http://media.amazonwebservices.com/AWS_Storage_Options.pdf) ← read the anti-patterns for DBs
- Multi-AZ vs Read Replicas:
  - Multi-AZ: used for DR only, not for scaling; synchronous replication
  - Read Replica: used for scaling out, not DR; asynchronous replication
- RDS use cases:
  - Ideal for existing apps that rely on MySQL, Oracle, SQL Server, PostgreSQL, MariaDB & Aurora
  - Amazon RDS offers full compatibility & direct access to native DB engines; most code, libraries & tools designed to work with these DBs should work unmodified with Amazon RDS
  - Optimal for new apps with structured data that requires more sophisticated querying & joining than can be provided by Amazon's NoSQL offering, DynamoDB
- ACID = RDS:
  - Atomicity – in a transaction with 2 or more discrete pieces of info, all data is committed or none is
  - Consistency – a transaction either creates a new valid state of data, or, if any failure occurs, returns all data to the state before the transaction occurred
  - Isolation – a transaction in process but not committed remains isolated from other transactions
  - Durability – data is available in the correct state even in the event of a failure & system restart
- When NOT to use RDS:
  - Index- & query-focused data – use DynamoDB
  - Numerous Binary Large Objects – BLOBs (audio files, videos, images) – use S3
  - Automated scalability (RDS is good for scaling UP, DynamoDB for scaling OUT) – use DynamoDB
  - Other database platforms (IBM DB2, Informix, Sybase) – use EC2
  - If you need complete, OS-level control of the DB server with full root admin – use EC2
- DynamoDB use cases:
  - Existing or new applications that need a flexible NoSQL DB with low read/write latencies
  - The ability to scale storage & throughput up & down without code changes or downtime
  - Common use cases: mobile apps, gaming, digital ads, live voting, audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for S3 objects, e-comm shopping carts, web session mgmt
  - If you need to automatically scale your DB, think DynamoDB
- Where NOT to use DynamoDB:
  - Apps that need a traditional relational DB
  - Joins and/or complex transactions
  - BLOB data – use S3 (but use DynamoDB to keep track of the metadata)
  - Large data with a low I/O rate – again, use S3
Domain 6.0: Security (20% of exam)
6.1 Design information security management systems and compliance controls
6.2 Design security controls with the AWS shared responsibility model and global infrastructure
6.3 Design identity and access management controls
6.4 Design protection of Data at Rest controls
6.5 Design protection of Data in Flight and Network Perimeter controls
AWS Directory Services (3 flavors) – https://aws.amazon.com/directoryservice/faqs/
- AD Connector:
  - Essentially a custom federation proxy that connects to your existing MS AD structure
  - Once connected, end users use their existing corporate creds to log into AWS applications
  - Existing security policies can be enforced consistently (password expiration/history, account lockouts)
  - Supports MFA; can integrate with existing RADIUS-based MFA infrastructure
  - No information is cached on AWS (unless you have an AD server on AWS, obviously)
  - Availability is tied to your networking – if the connection goes down, you lose AD capabilities
  - Comes in 2 sizes: Small (up to 500 users), Large (up to 5000 users)
  - Manage AWS resources via IAM role-based access on the console
- Simple AD:
  - Used for new AD deployments
  - Managed directory powered by Samba 4, an Active Directory-compatible server
  - MFA not supported
  - Only 2 domain controllers (in 2 different AZs); can't add additional domain controllers
  - No forest/domain trust relationships
  - Can't transfer FSMO roles
  - Can domain-join EC2 instances
  - Provides Kerberos-based SSO
  - No LDAPS support
  - Schema extension not supported
  - Comes in 2 sizes: Small (up to 500 users), Large (up to 5000 users)
- Microsoft AD:
  - AWS managed (nothing to install; AWS handles patching & updates)
  - Powered by Windows Server 2012 R2
  - Supports up to 50k users (200k directory objects)
  - Run directory-aware Windows workloads
  - Create trust relationships between MS AD domains in the AWS cloud & on-prem
  - Deployed across multiple AZs
  - Monitors & automatically detects/replaces failed DCs
  - Data replication & automated daily snapshots are configured out of the box
Security Token Service (STS) – http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html
Grants users limited & temporary access to AWS resources. Users can come from 3 sources:
Federated access (typically Active Directory)
Uses SAML 2.0
User doesn't need to be in IAM; access is granted based on AD creds
SSO lets the user log into the AWS console w/out being assigned IAM creds
Federation w/ mobile apps
Use Facebook/Amazon/Google or other OpenID providers to log in
Cross-account access
Lets users from one AWS account access resources in another
E.g. users from a "Dev" account can access resources in a "Prod" account
Steps to create cross-account access:
In the Dev account, create a user & group
Copy the account ID
Switch to Prod – create a new role "DevAccess"
Assign permissions (policies) to the new role
Copy the role ARN
In the Dev account, create a new policy using the policy generator (service: AWS Security Token Service), using the ARN from the DevAccess role
In the Dev account, attach the new policy to the group
Use the IAM user sign-in link from the Prod account to sign in the Dev user; paste the role-switch URL into the browser to switch over to Prod
Steps for setting up SSO for App X:
Employee logs into the application (enters username/pwd)
App X calls an Identity Broker
The Identity Broker checks with LDAP to confirm that the user account is valid
The ID Broker initiates a call to the AWS Security Token Service (STS). The call must include an IAM policy and a duration (15 min to 36 hours) specifying the permissions to be granted.
STS will (on validation) return 4 values: Access Key, Secret Access Key, the Token, and the duration (15 min min – 12 hr default – 36 hr max)
ID Broker returns the token to the application
App X uses the token to call an AWS resource (say S3)
S3 uses IAM to verify the credentials
IAM verifies the credentials & gives S3 the go-ahead to perform the operation
Terminology:
Federation = joining lists of users from one directory service to another (AD, IAM, Facebook, Google)
Identity Broker = trusted 3rd-party broker that you can use to federate multiple directories
Identity Store = list of users
Identities = users
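The broker flow above can be sketched with STS's GetFederationToken call. This is a minimal sketch, not the exam's reference implementation: it assumes boto3 is available with broker credentials configured, and the helper/policy names are mine.

```python
import json

def federation_token_params(user_name, policy, duration_seconds=3600):
    """Build the parameters an identity broker passes to
    sts.get_federation_token (pure helper, testable offline)."""
    # STS accepts durations between 900 s (15 min) and 129600 s (36 h)
    if not 900 <= duration_seconds <= 129600:
        raise ValueError("duration outside the 15 min - 36 h range")
    return {
        "Name": user_name,
        "Policy": json.dumps(policy),
        "DurationSeconds": duration_seconds,
    }

def broker_get_temp_credentials(user_name, policy, duration_seconds=3600):
    """Call STS once the broker has validated the user against LDAP.
    Requires boto3 and a principal allowed sts:GetFederationToken."""
    import boto3  # imported here so the pure helper works without AWS
    sts = boto3.client("sts")
    resp = sts.get_federation_token(
        **federation_token_params(user_name, policy, duration_seconds))
    creds = resp["Credentials"]
    # The four values STS returns: access key, secret key, token, expiry
    return (creds["AccessKeyId"], creds["SecretAccessKey"],
            creds["SessionToken"], creds["Expiration"])

# Hypothetical scoped-down policy granting read access to one bucket
S3_READ_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Action": ["s3:GetObject"],
                   "Resource": "arn:aws:s3:::example-bucket/*"}],
}
```

The broker would hand the returned token to App X, which then calls S3 directly with the temporary credentials.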
Monitoring your Network
CloudTrail – https://aws.amazon.com/cloudtrail/faqs/
Retrieve a history of API calls and other events for all regions in your account
Enabled on a per-region basis; not enabled by default
Use cases:
Security analysis
Track & monitor changes to AWS resources (great in conjunction with AWS Config)
Compliance aid
Troubleshoot operational issues (answers who, what, when, where)
Recorded info includes:
Identity of the API caller
Time of the API call
Source IP of the API caller
Request parameters
Response elements returned by the AWS service
Delivers logs to an S3 bucket in JSON format
Configured on a per-region basis & can include global services
Logs from different regions can be sent to the same S3 bucket
Captures calls & events made via the AWS mgmt. console, command-line tools, AWS SDKs, or other AWS services (e.g. CloudFormation)
Used for auditing and collecting a history of API calls, NOT a logging service
Can integrate with SNS, CloudWatch & CloudWatch Logs to send notifications when a specific API event occurs
CloudWatch – https://aws.amazon.com/cloudwatch/faqs/
Monitoring service for AWS cloud resources & the applications running on AWS
Gain system-wide visibility into resource utilization
Collect/track/monitor metrics/logs, set alarms, and automatically react to changes in AWS resources
Can monitor:
EC2 instances
DynamoDB tables
RDS DB instances
Custom metrics generated by apps & services
Any log file an application can generate
Operational health
Application performance
By default, CloudWatch Logs stores log data indefinitely; you can change retention for each log group at any time
CloudWatch alarm history is stored for 14 days
CloudTrail logs can be sent to CloudWatch Logs for real-time monitoring; use CloudWatch Logs metric filters to evaluate specific terms, phrases or values in CloudTrail logs
You can monitor events & ship those logs to CloudWatch, a central logging system (Splunk, SumoLogic, etc.), or your own scripted logs stored on S3
Avoid storing logs on non-persistent storage:
Root device volume on an EC2 instance
Ephemeral storage
Best answers are usually S3 or CloudWatch Logs
CloudTrail can be used across multiple accounts into a single S3 bucket (w/ cross-account access)
CloudWatch Logs can be used across multiple accounts (w/ cross-account access)
VPC Flow Logs
Can be turned on at the VPC/subnet/EC2-instance level
Can be filtered for all/accepted/rejected traffic
Can dump logs to CloudWatch for alerting
Cloud Hardware Security Modules (HSMs) – https://aws.amazon.com/cloudhsm/
Physical device that safeguards and manages digital keys; used to protect high-value keys
Onboard secure cryptographic key generation, storage & mgmt.
Offloads app servers for asymmetric and symmetric cryptography
Upfront fee for each device & hourly cost afterwards
Single tenant (1 physical device per customer)
Must be used within a VPC; can use VPC peering to connect to a CloudHSM
Can use EBS volume encryption, S3 object encryption & key mgmt. w/ HSM, but this requires custom scripting
If you need fault tolerance, you need to get 2 devices & build a CloudHSM cluster
Can integrate with RDS (Oracle & SQL Server) as well as Redshift
Monitor via syslog
DDoS overview & mitigation – https://d0.awsstatic.com/whitepapers/Security/DDoS_White_Paper.pdf (read this twice!)
How to mitigate DDoS?
Minimize the attack surface – fewer/hardened entry points
Place instances inside private subnets where possible
Bastion hosts w/ whitelisted IP accessibility
WAF = Web Application Firewall – an L7 firewall
Use ELB/CloudFront to distribute load to apps
Multi-tier app architecture provides layered protection against attacks
Scale to absorb the attack – design your infra to scale (horizontally and vertically) as needed:
Auto Scaling (for both WAFs and web servers), CloudFront, Route 53, ELBs, WAFs, CloudWatch
Safeguard exposed resources when you can't eliminate Internet entry points. 3 services that can help with this:
CloudFront
Built-in ability to absorb and deter DDoS attacks
Solves UDP and SYN flood DDoS attacks
Geo restrictions – whitelists/blacklists
Origin Access Identity
Route 53
Alias record sets to redirect traffic to CloudFront, a different ELB, or a "DDoS-resilient environment"
Private DNS
WAFs (Web Application Firewalls) – control input & show what the traffic is doing and where it is coming from; many WAFs have a built-in IDS
Solve DDoS attacks that happen at the app layer
Filter traffic and can ID/prevent injection attacks
DDoS mitigation, malware protection, data-loss prevention
Detect suspicious activity and block/report ("WAF sandwich")
Learn normal behavior (IDS/WAFs) – allows you to ID abnormal behavior faster (duh)
Create a plan for attacks:
Validate the design of your architecture against the different attacks
Know what techniques to employ when you get attacked
Know who to contact when you get attacked
Amplification/Reflection Attacks
NTP, SSDP, DNS, SNMP, Chargen
Attacker sends a 3rd-party server a spoofed-IP request
Application Attacks (L7)
Flood of GET requests
Slowloris attack
IDS & IPS
IDS inspects all inbound/outbound network traffic & identifies suspicious patterns that can indicate a network or system attack; monitors only, isn't proactive
IPS is a network threat-prevention technology that examines network traffic to detect & prevent exploitation of systems
An IDS/IPS appliance usually sits in a public subnet w/ agents installed on all EC2 instances it is monitoring; feeds data either to a SOC or an S3 bucket
Domain 7.0: Scalability and Elasticity (15% of exam)
7.1 Demonstrate the ability to design a loosely coupled system
7.2 Demonstrate ability to implement the most appropriate front-end scaling architecture
7.3 Demonstrate ability to implement the most appropriate middle-tier scaling architecture
7.4 Demonstrate ability to implement the most appropriate data storage scaling architecture
7.5 Determine trade-offs between vertical and horizontal scaling
CloudFront – https://aws.amazon.com/cloudfront/faqs/
Can be used to deliver dynamic, static, streaming, and interactive content of a website using a global network of edge locations
Requests for content are automatically routed to the nearest edge location for best possible performance
Optimized to work with other AWS services like S3, EC2, ELB & Route 53
Key concepts:
2 distribution types: Web distributions and RTMP distributions
Geo restrictions (geo blocking)
Whitelist or blacklist by country; done via either API or console
A blacklisted viewer will see an HTTP 403 error; can create custom error pages
Support for GET, HEAD, POST, PUT, PATCH, DELETE & OPTIONS
CloudFront doesn't cache responses to POST, PUT, DELETE or PATCH requests – these requests are proxied back to the origin server
SSL configs – can use either HTTP or HTTPS with CloudFront, and either the default CloudFront URL or a custom URL with your own certificate. If you go with a custom URL:
Dedicated IP custom SSL:
Dedicated IP addresses serve your SSL content at each CloudFront edge location
Expensive: $600 per certificate per month
Supports older browsers
SNI (Server Name Indication) custom SSL:
Relies on the SNI extension of the Transport Layer Security protocol
Allows multiple domains to serve SSL traffic over the same IP address by including the hostname browsers are trying to connect to
Does not support older browsers
Wildcard CNAMEs supported; up to 100 CNAME aliases per distribution
Invalidation
If you delete a file from your origin, it will be deleted from edge locations when the file reaches its expiration time (as defined in the object's HTTP headers)
You can proactively remove an object from all CloudFront edge locations ahead of its expiration time using the Invalidation API
Use in the event of offensive or potentially harmful material
You do get charged for invalidation requests
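An invalidation call can be sketched with boto3 (assumed installed, with credentials configured; the helper names and paths are mine, not from the source):

```python
def invalidation_batch(paths, caller_reference):
    """Build the InvalidationBatch structure CloudFront expects
    (pure helper so it can be tested without AWS access)."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference,  # must be unique per request
    }

def invalidate(distribution_id, paths):
    """Proactively remove objects from all edge locations ahead of
    their expiration time. Invalidations beyond the free monthly
    allowance are billed, matching the note above."""
    import boto3
    import uuid
    cf = boto3.client("cloudfront")
    return cf.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch(paths, str(uuid.uuid4())),
    )
```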
Zone apex support
You can use CloudFront to deliver content from the root domain, or "zone apex", of your website. For example, you can configure both http://www.example.com and http://example.com to point at the same CloudFront distribution, without the performance penalty or availability risk of managing a redirect service.
To use this feature, you create a Route 53 alias record to map the root of your domain to your CloudFront distribution.
Edge caching – dynamic content support
CloudFront supports delivery of dynamic content that is customized or personalized using HTTP cookies.
You specify whether you want CloudFront to forward some or all of your cookies to your origin server; CloudFront then considers the forwarded cookie values when identifying a unique object in its cache.
You get both the benefit of content personalized with a cookie and the performance benefits of CloudFront.
You can also optionally choose to log the cookie values in CloudFront access logs.
ElastiCache – https://aws.amazon.com/elasticache/faqs/
Memcached vs Redis: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/SelectEngine.Uses.html
https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
Lazy loading
App tries to get data from the cache; if no data is available, the cache returns null
App gets the data from the DB, then updates the cache
Only requested data is in the cache
Node failures don't matter, as the request simply goes back to the DB again
Write through
Cache is updated when data is written to the DB (each write is 2 steps: 1 to the DB, 1 to the cache)
Ensures data is never stale
Good for apps that don't have a lot of writes
Infrequently accessed data gets stored in the cache (bad)
If a node is spinning up, it could miss writes & cause missing data (bad)
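The two caching strategies above can be sketched with plain dicts standing in for the cache and the database; in practice the cache would be an ElastiCache (Memcached/Redis) client and the DB a real datastore:

```python
# Toy in-memory stand-ins; names and keys here are illustrative only.
cache, database = {}, {"user:1": "alice"}

def get_lazy(key):
    """Lazy loading: only requested data lands in the cache, and a
    cache-node failure just means the next read falls back to the DB."""
    value = cache.get(key)          # 1. try the cache first
    if value is None:               # 2. miss -> read the DB
        value = database.get(key)
        if value is not None:
            cache[key] = value      # 3. populate the cache for next time
    return value

def put_write_through(key, value):
    """Write-through: every write hits both stores, so cached data is
    never stale -- at the cost of caching data nobody may ever read."""
    database[key] = value           # step 1: write the DB
    cache[key] = value              # step 2: write the cache
```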
Use Memcached if the following apply to your situation:
Does not manage its own persistence (relies on the DB to have the most recent data)
Can be run in a cluster of nodes
Can't back up clusters – goes back to the DB to repopulate
You need the simplest model possible
You need to run large nodes with multiple cores or threads (multithreaded performance)
You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases (horizontal scaling)
You need to partition your data across multiple shards
Can populate with both lazy loading and write through
Great solution for storing session state, making web servers stateless, which allows for easy scaling
Use Redis 2.8.x or Redis 3.2 (non-clustered mode) if the following apply to your situation:
You need complex data types, such as strings, hashes, lists, sets, sorted sets, and bitmaps
You need to sort or rank in-memory data sets
You need persistence of your key store
You need to replicate your data from the primary to one or more read replicas for read-intensive applications
You need automatic failover if your primary node fails (Multi-AZ)
You need publish and subscribe (pub/sub) capabilities – to inform clients about events on the server
You need backup and restore capabilities
You need to support multiple databases
Use Redis 3.2 (clustered mode) if you require all the functionality of Redis 2.8.x with the following differences:
You need to partition your data across 2 to 15 node groups (cluster mode only)
You need geospatial indexing (clustered or non-clustered mode)
You do not need to support multiple databases
Redis (cluster mode enabled) has the following limitations:
No scaling up to larger node types
No changing the number of node groups (partitions)
No changing the number of replicas in a node group (partition)
Kinesis Streams – https://aws.amazon.com/kinesis/streams/faqs/
Enables you to build custom applications that process or analyze streaming data for specialized needs
You can continuously add various types of data such as clickstreams, application logs, and social media to a Kinesis stream from hundreds of thousands of sources
Within seconds, the data will be available for your Kinesis applications to read and process from the stream
Data in Kinesis is stored for 24 hours by default; can be increased to 7 days
Kinesis Streams is not persistent storage – use S3, Redshift, DynamoDB, EMR, etc. to store processed data long term
Synchronously replicates streaming data across 3 AZs
When would you use Kinesis?
Gaming – collect data like player actions into the gaming platform to have a reactive environment based on real-time events
Real-time analytics
Application alerts
Log/event data collection
Mobile data capture
Key concepts:
Data producers (e.g. EC2 instances, IoT sensors, clients, mobile, server)
Kinesis Streams API: PutRecord (single record), PutRecords (multiple records)
Kinesis Producer Library (KPL) – simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis stream
Kinesis Agent – Java app that you can install on Linux devices
Shards
The shard is the base throughput unit of an Amazon Kinesis stream
One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output, and supports up to 1,000 PUT records per second
You specify the number of shards needed when you create a stream. For example, a stream with two shards has a throughput of 2 MB/sec data input and 4 MB/sec data output, and allows up to 2,000 PUT records per second
Can dynamically add/remove shards from a stream via resharding
Data records – the unit of data stored in an Amazon Kinesis stream
Sequence number – a unique identifier for each record, assigned by Streams after you write to the stream with client.putRecord(s)
Partition key – used to segregate and route records to different shards of a stream
Used to group data by shard within a stream
The stream service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given record belongs to
Specified by the app putting the data into the stream
Data (blob) – the data your producer is adding to the stream; max size = 1 MB
Data consumers (e.g. Amazon Kinesis Streams applications)
Typically EC2 instances that query the stream, run analytics against the data & pass it on to persistent storage
SNS Mobile Push – https://aws.amazon.com/sns/faqs/
Subset of SNS
Push notifications can be sent to mobile devices and desktops using one of the following services:
Amazon Device Messaging (ADM)
Apple Push Notification Service (APNS)
Google Cloud Messaging (GCM)
Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
Microsoft Push Notification Service (MPNS) for Windows Phone 7+
Baidu Cloud Push for Android devices in China
Steps:
Request credentials from the mobile platform
Request a token from the mobile platform
Create a platform application object
Create a platform endpoint object
Publish a message to the mobile endpoint
http://docs.aws.amazon.com/sns/latest/dg/mobile-push-pseudo.html
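The last two steps can be sketched with boto3 (assumed installed and configured; the endpoint ARN and message texts are hypothetical). When MessageStructure is "json", SNS expects a "default" entry plus per-platform payloads, each itself a JSON-encoded string:

```python
import json

def platform_message(default_text, apns_alert=None, gcm_text=None):
    """Build the JSON message SNS expects with MessageStructure='json'.
    Per-platform entries (APNS/GCM) are JSON strings inside the outer
    JSON object -- an easy detail to get wrong."""
    msg = {"default": default_text}
    if apns_alert is not None:
        msg["APNS"] = json.dumps({"aps": {"alert": apns_alert}})
    if gcm_text is not None:
        msg["GCM"] = json.dumps({"data": {"message": gcm_text}})
    return json.dumps(msg)

def push_to_endpoint(endpoint_arn, message_json):
    """Publish to a platform endpoint created earlier via
    create_platform_application / create_platform_endpoint.
    Requires boto3 and sns:Publish permission."""
    import boto3
    sns = boto3.client("sns")
    return sns.publish(TargetArn=endpoint_arn,
                       Message=message_json,
                       MessageStructure="json")
```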
Domain 8.0: Cloud Migration and Hybrid Architecture (10% of exam)
8.1 Plan and execute for applications migrations
8.2 Demonstrate ability to design hybrid cloud architectures
VMware Integration
AWS Management Portal for vCenter: https://aws.amazon.com/ec2/vcenter-portal/
Portal installs as a vCenter plug-in
Enables you to migrate VMware VMs to EC2 & manage AWS resources from within vCenter
Use cases:
Migrate VMs to EC2
Reach new geographies from vCenter
Self-service AWS portal from within vCenter
Leverage VMware experience while learning AWS
Migrating to the cloud using Storage Gateway – https://aws.amazon.com/storagegateway/details/
Can use Storage Gateway to migrate on-prem VMs to AWS
Snaps must be consistent: take the VM offline before taking the snap, or use an OS/app tool to flush to disk
Data Pipeline – https://aws.amazon.com/datapipeline/ and http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals
Can create, access & manage it using:
AWS mgmt. console
CLI
SDKs
Query API
Supported compute services: EC2, EMR
Supported services to store data: DynamoDB, RDS, Redshift
Can be extended on-premises:
AWS supplies a Task Runner package that can be installed on your on-premises hosts. This package polls the Data Pipeline service for work to perform; when it's time to run an activity, Data Pipeline issues the appropriate command to the Task Runner.
With Data Pipeline, you can regularly access your data where it's stored, transform and process it at scale, and efficiently transfer the results to AWS services such as S3, RDS, DynamoDB, and Elastic MapReduce (EMR).
Pipeline – the resource that contains the definition of the dependent chain of data sources, destinations, and predefined or custom data processing activities required to execute your business logic
Contains the data nodes, activities, preconditions & schedules
Can run on EC2 or EMR
Consists of:
Task Runner – a package that continuously polls the AWS Data Pipeline service for work to perform. Installed in 1 of 2 ways:
Installed automatically on resources that are launched and managed by the Data Pipeline service
Manually installed on a compute resource that you manage, such as a long-running EC2 instance or an on-premises server
Data node – the end destination for your data. For example, a data node can reference a specific Amazon S3 path. Data Pipeline supports an expression language that makes it easy to reference data generated on a regular basis.
For example, you could specify that your Amazon S3 data format is s3://example-bucket/my-logs/logdata-#{scheduledStartTime('YYYY-MM-dd-HH')}.tgz
Other examples:
A DynamoDB table that contains data for HiveActivity or EmrActivity to use
A MySQL table and database query that represents data for a pipeline activity to use
A Redshift table that contains data for RedshiftCopyActivity to use
Activity – an action that AWS Data Pipeline initiates on your behalf as part of a pipeline. Example activities are EMR or Hive jobs, copies, SQL queries, or command-line scripts. Data Pipeline provides pre-packaged activities such as:
Move/copy data from one location to another
Run an EMR cluster
Run a Hive query
Copy data to/from Redshift tables
Run a custom UNIX/Linux shell command as an activity
Run a SQL query on a DB
Use ShellCommandActivity to specify custom activities
Precondition – a readiness check (consisting of conditional statements that must be true) that can optionally be associated with a data source or activity
Useful if you are running an activity that is expensive to compute and should not run until specific criteria are met (i.e. does the data/table/S3 path/S3 file exist?)
Can specify pre-packaged preconditions and custom preconditions
2 types of preconditions: system managed and user managed
Schedule – defines when your pipeline activities run and the frequency with which the service expects your data to be available
Network Migrations
CIDR reservations:
Biggest you can have is /16; smallest is /28
5 IP addresses are reserved per CIDR block; in a /24:
.0 = network address
.1 = VPC router
.2 = mapping to Amazon-provided DNS
.3 = reserved for future use
.255 = broadcast is not supported in a VPC, so AWS reserves this address
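The arithmetic above is easy to check with the standard library: total addresses in the block minus the 5 AWS reserves (a /24 therefore leaves 256 - 5 = 251 assignable addresses). A small sketch:

```python
import ipaddress

# .0 network, .1 VPC router, .2 DNS, .3 future use, last address
AWS_RESERVED_PER_SUBNET = 5

def usable_hosts(cidr):
    """Number of instance-assignable addresses in a VPC subnet:
    total addresses minus the 5 AWS reserves in every subnet."""
    net = ipaddress.ip_network(cidr)
    if not 16 <= net.prefixlen <= 28:
        raise ValueError("VPC CIDR blocks must be between /16 and /28")
    return net.num_addresses - AWS_RESERVED_PER_SUBNET
```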
VPN to Direct Connect migrations
Most orgs start with VPN & then move to Direct Connect as traffic increases
Once Direct Connect is installed, VPN and Direct Connect should be configured to be in the same BGP community
Then configure BGP so that the VPN has a higher cost than the Direct Connect connection
Overall Summary:
Active Directory
SimpleAD
Microsoft Active Directory-compatible directory from AWS Directory Service that supports common features of an Active Directory.
Cannot connect to existing on-prem AD
AWS Directory Service for Microsoft Active Directory
Managed Microsoft Active Directory that is hosted on AWS cloud.
AD Connector
Proxy service for connecting your on-premises Microsoft Active Directory to the
AWS cloud.
WorkDocs
Can be used to share documents via AD directory services
Can define time duration or passcodes to access the document
API Gateway
Lambda non-proxy integration flow
Method Request -> Integration Request -> Integration Response -> Method Response
Maximum integration timeout for AWS API gateway is 29 seconds
If you want to change the default timeout for an integration request, uncheck Use Default Timeout and set a custom value (anywhere from 50 ms up to the 29-second maximum)
You can capture a response code and rewrite it to something custom via Gateway Responses
Athena
Serverless platform
Automatically executes queries in parallel
If asked whether to use Athena or QuickSight, look for a mention of whether the team has experience with SQL. If they do, pick Athena
Aurora
Can replicate from an external master instance or a MySQL DB instance on AWS RDS.
Aurora serverless is best suited to situations where you can’t predict what traffic will be like
Backup
The following services can be backed up and restored using AWS Backup
EFS, DynamoDB, EBS, RDS, Volume Gateway
Batch
Configures resources, schedules when to run data analytics workloads
Suitable for running a bash script using a job
Batch scheduler evaluates when/where/how to run jobs (no need for integration with CloudWatch Events to schedule)
Key components
Jobs: unit of work (script, exec, docker container)
Job Definitions: specifies how a job is run
Job Queues: Jobs submitted are added to queues
Compute Environment: compute resources that run jobs
If your Batch jobs are stuck in RUNNABLE state check:
Role assigned has adequate permissions
CPU and RAM given as per compute allocation
Check EC2 limits on the account
Beanstalk
No concept of programmable infrastructure / Git source. Can’t do infrastructure as code end
to end without other tooling.
Billing
Billing reports can be delivered to an S3 bucket
Consolidated billing is only available in master accounts (where there are children accounts
under organisations). These reports include activity for all child accounts
CloudFormation
Retain data for S3: set DeletionPolicy on the S3 resource to Retain
Create an RDS snapshot on delete: set the RDS resource's DeletionPolicy to Snapshot
There are three options for the RDS DeletionPolicy: Retain, Snapshot and Delete
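The two DeletionPolicy behaviors can be sketched in a template fragment. This is an illustrative sketch: the resource names are hypothetical and the required RDS Properties are omitted.

```yaml
Resources:
  LogsBucket:                 # hypothetical resource names
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain    # bucket (and its data) survives stack deletion
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot  # final snapshot is taken when the stack is deleted
    # ... required RDS Properties omitted for brevity
```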
To coordinate stack creation with configuration that must be executed on an EC2 instance, use the CreationPolicy attribute (signaled with cfn-signal) or a wait condition.
If you need to reference AZ info within CloudFormation templates you can make use of the
Fn::GetAZs function
Launching EC2 instances with CloudFormation requires IAM permissions to be provided to
the person creating the stack
Intrinsic functions can be used in Properties, Outputs, Metadata attributes and update policy
attribute
CloudFront
Managed content delivery network (CDN)
S3 Transfer Acceleration can be used to speed up transfers to and from S3 over long distances by routing them through edge locations
Origin Access Identity can be used to grant access to objects in s3 without having to give a
bucket public access.
Different HTTP methods for CloudFront forwarding and their uses:
GET, HEAD: You can use CloudFront only to get objects from your origin or to get
object headers.
GET, HEAD, OPTIONS: You can use CloudFront only to get objects from your origin,
get object headers, or retrieve a list of the options that your origin server supports.
GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE: You can use CloudFront to get,
add, update, and delete objects, and to get object headers. In addition, you can
perform other POST operations such as submitting data from a web form.
Support for common content types such as:
Static content (S3 website or web assets)
Live events (streaming video)
Media content (HLS)
The TTL can be lowered (even to 0) on CloudFront so that new content is delivered as soon as it changes
CloudWatch
Cron event to trigger lambda
CloudWatch Events -> Create Rule
Provide a valid schedule expression, e.g. cron(0 8 * * ? *)
Provide a target(s)
Can trigger a number of different services like Lambda, SNS, SQS, CodeBuild etc.
Cross-region dashboards are supported, so if you need to display metrics from different regions on one dashboard, it is possible
Data Migration Service
Suitable for migrating databases like MySQL to Aurora or RDS
Data migrated is encrypted with KMS
By default it uses the AWS-managed aws/dms key; alternatively a customer managed key (CMK) can be provided
The DMS input stream can be throttled to accommodate downstream systems that can't ingest at full speed (e.g. ingesting data into Elasticsearch where the indexing queue fills up)
DirectConnect
Link aggregation groups (LAGs) can bond Direct Connect connections together
A Direct Connect to a VPC provides access to all AZs in the region
Maximum number of Direct Connect connections in a LAG is 4
All must have the same bandwidth
Troubleshooting Direct Connect
Confirm no firewalls are blocking TCP 179 (or ephemeral ports)
Confirm the ASNs match on both sides
DynamoDB
Supports autoscaling
When defining primary keys, prefer partition keys with many distinct values that are accessed relatively evenly (the "many to few" concept)
Supported CloudWatch metrics
ProvisionedWriteCapacityUnits
ProvisionedReadCapacityUnits
ConsumedWriteCapacityUnits
ConsumedReadCapacityUnits
You cannot configure On-Demand for reads and Provisioned for writes separately; the capacity mode applies to both
A numeric attribute holding an epoch timestamp (e.g. one named EXPIRE) can be designated as the table's TTL attribute to expire items automatically
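A minimal sketch of how TTL is used: stamp items with an epoch-seconds expiry and point the table's TTL setting at that attribute. The attribute and table names are hypothetical; the boto3 call assumes the SDK is installed and credentials are configured.

```python
import time

TTL_ATTRIBUTE = "expire_at"  # hypothetical name; any numeric attribute works

def with_ttl(item, seconds_from_now):
    """Return a copy of the item carrying an epoch-seconds expiry in
    the attribute the table's TTL setting points at; DynamoDB deletes
    the item at no cost sometime after that moment passes."""
    expired = dict(item)
    expired[TTL_ATTRIBUTE] = int(time.time()) + seconds_from_now
    return expired

def enable_ttl(table_name):
    """One-time table setting; needs boto3 and dynamodb:UpdateTimeToLive."""
    import boto3
    ddb = boto3.client("dynamodb")
    return ddb.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True,
                                 "AttributeName": TTL_ATTRIBUTE})
```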
Glue
Contains crawlers that connect to S3, JDBC and DynamoDB
Glue has a central metadata repository (data catalog)
Fully serverless ETL
EC2
High network performance
Cluster placement groups are recommended for applications that need low network
latency / high throughput between instances in the same group
You can detach secondary network interfaces when the instance is running or stopped
You can't detach the primary interface
If you need a static MAC address, you have to create an ENI (where a random one will be
assigned). Then reattach that same ENI to different instances going forward.
There is no way to manually configure a MAC address on AWS EC2
Reserved Instances can be pooled among accounts in an AWS Organisation
If 3 t2.mediums are purchased, and 1 is used in 1 account, 2 more could be used in
other accounts within the organisation
Placement groups can suffer from capacity errors if you try to add new instances to the group; it's recommended to relaunch those instances to see if you get capacity. The best thing to do, obviously, is to launch the placement group with the full number of instances you are going to need at the start.
To improve high network throughput make use of Single Root I/O Virtualization (SR-IOV)
In order to hibernate an EC2 instances you require the following
Instance root has to be EBS and not instance store
Instance cannot be in an Autoscaling group (or used by ECS)
Instance root volume must be large enough so RAM can be stored
Hibernation of instances must also be using a HVM AMI type.
It has to be enabled on creation of the EC2 instance as well as be supported on your
AMI
For specifying Drive letters on Windows instances make use of EC2Config
Lost your SSH keys? Two options
Stop the instance, detach the root volume, attach it as another volume to another
EC2, modify the authorized_keys file, move the volume back to the original instance,
start it.
Systems Manager Automation with AWSSupport-ResetAccess
AMI
You cannot create an AMI from an EC2 instance-store (ephemeral) volume via the console
When you create an AMI, the private (PEM) key is not included, but the public keys in authorized_keys remain on the image
You need to ensure that instances launched from the AMI use the same key pair
Autoscaling
AZRebalance will attempt to balance the number of instances in different availability zones.
When associating an ELB with an ASG the ASG gets awareness about unhealthy instances
(and can terminate)
EBS
When using an encrypted EBS volume the following data is encrypted:
Data at rest in the volume
Data moving between the volume and instance
Snapshots created from the volume
Volumes created from the snapshots
Snapshots can be created every 2, 3, 4, 6, 8, 12, 24 hours
Lifecycle policies help retain backups required for compliance/audits, and also delete unnecessary ones to save cost
When using snapshots (if you don’t want downtime) don’t use RAID
Copies of snapshots with retention policies do not have policies carried over during copy.
In order to mount an EBS volume, it must be in the same AZ as the instance you are
mounting to
A root volume's type can be changed without stopping the instance provided the target type is gp2, io1, or standard
sc1 or st1 cannot be used UNLESS they are non-root volumes (and must be at least 500 GB)
When an EBS volume has two tags, multiple lifecycle policies can run at the same time
Encrypted snapshots cannot be copied to unencrypted ones
Unencrypted snapshots can be encrypted when copying them using the --kms-key-id flag (with your CMK)
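The encrypt-on-copy pattern can be sketched with boto3's equivalent of the CLI call (SDK assumed installed and configured; the snapshot ID and key alias are hypothetical):

```python
def copy_snapshot_params(source_region, snapshot_id, kms_key_id=None):
    """Build the parameters for ec2.copy_snapshot. Passing a KMS key
    (the CLI's --kms-key-id) encrypts an unencrypted source snapshot
    during the copy; the reverse direction is not allowed."""
    params = {"SourceRegion": source_region,
              "SourceSnapshotId": snapshot_id}
    if kms_key_id:
        params.update(Encrypted=True, KmsKeyId=kms_key_id)
    return params

def encrypt_snapshot_copy(region, snapshot_id, kms_key_id):
    """Requires boto3 and ec2:CopySnapshot permission."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.copy_snapshot(**copy_snapshot_params(region, snapshot_id,
                                                    kms_key_id))
```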
EFS
Data is distributed across multiple availability zones which provides durability and
availability.
Supports 2 throughput modes
Bursting throughput: uses burst credits to determine if the filesystem can burst
Provisioned throughput
Provides both in-transit and at-rest encryption using AWS KMS
Mount an EFS volume with encryption in transit by
Getting EFS id, create mount targets for EC2 instance, use the mount helper with
the -o tls flag
Does not support Windows-based clients
Storage Gateway / File Gateway is the recommendation if you need file store (using
SMB mount)
Load Balancers
When using Network Load Balancers Secure connections should be TCP 443 with targets also
using TLS (port 443)
When sticky sessions are needed, it's usually recommended to use ElastiCache to store
session state
You don’t want to bind a user to a particular instance under a load balancer
Requires code to retrieve session state from ElastiCache
If you need to get the client IP when using a Classic Load Balancer:
TCP: configure proxy protocol to pass the IP address in a new TCP header
HTTP: send the client IP in the X-Forwarded-For header
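The TCP case above can be sketched with the CLI by creating and attaching a proxy protocol policy to the Classic Load Balancer; the load balancer name and port are placeholders:

```shell
# Create a proxy protocol policy on a Classic Load Balancer...
aws elb create-load-balancer-policy \
  --load-balancer-name my-clb \
  --policy-name EnableProxyProtocol \
  --policy-type-name ProxyProtocolPolicyType \
  --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true

# ...then attach it to the backend port so the client IP arrives in the TCP header.
aws elb set-load-balancer-policies-for-backend-server \
  --load-balancer-name my-clb \
  --instance-port 80 \
  --policy-names EnableProxyProtocol
```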
Cross-zone load balancing can be enabled to spread requests across your AZs
If a static IP is needed with a load balancer, provision a NLB with an attached EIP
Application Load Balancers support SNI
Is able to deal with multiple SSL certificates per listener
ElastiCache
Redis
Can only be upgraded, cannot be downgraded
IAM
AssumeRole can be secured down with an ExternalId
Flow for using a custom identity system
Custom identity broker app, this authenticates the user
Uses GetFederationToken API and passes a permission policy to get temp credentials
from STS
Alternatively can call AssumeRole API to get temp access using role-based access
instead
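The broker flow above can be sketched with the CLI; the user name, policy file, role ARN, and account ID are placeholders:

```shell
# Identity broker requests temporary credentials scoped by a permission policy.
aws sts get-federation-token \
  --name app-user-42 \
  --policy file://scoped-permissions.json \
  --duration-seconds 3600

# Alternative: role-based temporary access via AssumeRole.
aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/AppUserRole \
  --role-session-name app-user-42
```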
Kinesis
Ideal for real-time data ingestion
Kinesis Data Streams
Can store records in order and replay them in the same order later (up to 7 days)
Makes it ideal for financial transactions
Able to have multiple applications consume from the same stream concurrently.
Kinesis Video Streams
HLS can be used for live playback
Use GetHLSStreamingSessionURL and then use the resulting URL in the video player
of your choice
Content delivery typically leverages AWS Elemental MediaLive / MediaPackage and
CloudFront to distribute content globally
Can view either Live or archived video
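The HLS playback steps above can be sketched with the CLI; the stream name is a placeholder:

```shell
# 1. Find the endpoint that serves HLS session URLs for the stream.
ENDPOINT=$(aws kinesisvideo get-data-endpoint \
  --stream-name my-camera-stream \
  --api-name GET_HLS_STREAMING_SESSION_URL \
  --query DataEndpoint --output text)

# 2. Request the HLS URL (LIVE here; use ON_DEMAND for archived video),
#    then open the returned URL in the video player of your choice.
aws kinesis-video-archived-media get-hls-streaming-session-url \
  --endpoint-url "$ENDPOINT" \
  --stream-name my-camera-stream \
  --playback-mode LIVE
```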
KMS
Two types of keys
Master keys: used directly to encrypt and decrypt up to 4 kilobytes of data and can
also protect data keys
Data keys: used to encrypt and decrypt customer data
If you are accessing a very large number of KMS encrypted files at a time there is a chance
you will hit the KMS encrypt request account limit. You might need to open a support case
to resolve
Grants in KMS
Dynamically / programmatically revoke access to a key after its use.
Better than changing roles / policies
Managed Blockchain
Supported frameworks include Hyperledger Fabric and Ethereum.
If you have members who would like to deploy their own blockchain networks they can use
the CloudFormation templates to support ECS clusters or EC2 instances
Migration Hub
AWS Discovery Agent can transmit to Migration hub, then Data exploration can be done in
Athena
Agentless migrations can only pull information like RAM or Disk I/O from VMware
If your OS isn’t supported for import, you can provide the details yourself via an import template
Migration steps from VMware
Schedule migration job
Upload your VMDK and then convert it to an EBS snapshot
Create an AMI from the snapshot
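The last two steps above can be sketched with the CLI; the bucket, key, names, and snapshot ID are placeholders:

```shell
# Import the uploaded VMDK from S3 as an EBS snapshot.
aws ec2 import-snapshot \
  --description "VMware web server" \
  --disk-container "Format=VMDK,UserBucket={S3Bucket=my-import-bucket,S3Key=webserver.vmdk}"

# Once the import task completes, register an AMI from the resulting snapshot.
aws ec2 register-image \
  --name webserver-ami \
  --root-device-name /dev/xvda \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0}"
```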
OpsWorks
Can be managed by CloudFormation AWS::OpsWorks::Stack
This can be part of a nested stack with a parent containing all the VPC, NAT Gateway
etc. resources
Lifecycle events:
Setup, Configure, Deploy, Undeploy, Shutdown
Handles autohealing of instances
Blue/green-style deployments can be accomplished by creating a new stack with identical
configuration
This can be used when making updates to AMIs
Process for deploying with AWS CodePipeline
Create stack, layer and instance in a OpsWorks Stack
Upload app code to bucket, then add your app to OpsWorks stack
Create a pipeline (run it), verify the app deployment in OpsWorks stack
Process for updating OpsWorks stacks to the latest security patches
Run the Update Dependencies stack command
Create new instances to replace the old ones
When you attach a load balancer to a layer
Deregisters currently registered instances
Re-registers layer instances when they come online (removes offline ones)
Handles the starting of routing requests to the registered instances
Organizations
You may only join one organization (even if you receive more than one invite)
Invitations expire after 15 days
To resend an invite, you must cancel the pending one, then create a new invitation
In order to move an account to a different OU you need the following permissions
organizations:MoveAccount
organizations:DescribeOrganization
Accounts can be dragged into different OUs, however OUs can’t be dragged around to new
locations in the organization's structure.
Instead you must create new OUs and then reassign any SCPs you had in place.
Then move the accounts to these new OUs.
If you want to block access to unused services, check the IAM Activity for services (never
used, last used date) and base your blocks on this information
SCPs can only be Deny (not Allow)
Explicit denies will always overrule explicit allows
To apply WAF rules across an organization make use of AWS Firewall Manager
You cannot restrict a member account from the ability to change its root password or
manage MFA settings
Improve consolidated billing by also tagging resources
This will group expenses on the detailed billing report
To access a member account
Use sts:AssumeRole with OrganizationAccountAccessRole
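The access pattern above can be sketched with the CLI; the member account ID and session name are placeholders:

```shell
# Assume the default cross-account role that Organizations creates in each
# member account. The account ID is a placeholder.
aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/OrganizationAccountAccessRole \
  --role-session-name member-admin-session
```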
The master account isn't impacted by SCPs
Redshift
Does not have read replicas
Queries cannot be paused in Redshift
Use redshift workload management groups
Priorities of these workloads can be assigned to these groups
Can create single node cluster via CLI (and in Console as of recently)
Using the RedshiftCommands.sql file from the Billing section of your account you can
analyze billing reports.
Redshift snapshots are normally a very expensive solution, so if cost is important, don’t
select anything to do with snapshots.
Snapshots on Redshift could also be pointless if you can repopulate all the data in the
cluster from S3 instead
RDS
When a primary DB instance fails in a multi-AZ deployment, the CNAME is changed from
primary to standby so there’s no need to change a reference to the other DB in code.
Multi-AZ replication is done synchronously
For redundant architectures Multi-AZ support is used
Read-replicas aren’t used for redundancy, they are used to improve performance.
If encryption is enabled on the RDS instance, it:
Encrypts the underlying storage
Defaults to also encrypting the snapshots as they are created
RDS does not support Oracle RAC
RDS Oracle can read/write from S3 directly.
Option groups should have a role with permissions to access S3
Feature S3_INTEGRATION
If there is an RDS update available that you aren’t ready to apply, you can Defer the updates
indefinitely until you are ready.
Read Replicas require access to backups for maintaining their read replica logs. This means if
you want to disable automatic backups you must remove all Read Replicas first.
RDS for Oracle
Supported backup / restore options
Oracle Import/Export
Oracle Data Pump Import/Export
RDS snapshot / point-in-time recovery
RDS VMware
Manages:
Patching
Multi AZ configurations
Backups based on retention policies
Point-in-time restores (from on-prem or cloud backups)
Route53
Latency based routing
Redirect requests to nearest region
If you have issues with route53 not routing to ‘live’ hosts, check to make sure you have
“Evaluate Target Health” set to “Yes” on the latency alias. Same goes with HTTP health
checks on weighted resources.
Resolve two domains to one domain (test1.example.com, test2.example.com ->
test3.example.com)
CNAME for the records test1.example.com, test2.example.com to
test3.example.com
Resolve a DNS entry to an ALB
Alias record test3.example.com to ALB address
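The alias record above can be sketched with the CLI; the hosted zone IDs and ALB DNS name are placeholders:

```shell
# UPSERT an alias A record pointing test3.example.com at an ALB.
# Zone IDs and the ALB DNS name are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "test3.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "ZALBZONEID",
          "DNSName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
```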
S3
Using the x-amz-server-side-encryption request header when making an API call will ensure
an object is server side encrypted (SSE)
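A minimal sketch of setting that header via the CLI; the bucket and key are placeholders:

```shell
# s3api put-object maps the --server-side-encryption flag to the
# x-amz-server-side-encryption request header on the underlying API call.
aws s3api put-object \
  --bucket my-bucket \
  --key reports/q1.pdf \
  --body q1.pdf \
  --server-side-encryption AES256
```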
If versioning is enabled on S3 after objects are already put in, those objects will have a
version ID of null
Referrer keys in a bucket policy can make sure requests to Objects come from a domain you
operate
The INTELLIGENT_TIERING storage class is used to optimize storage costs automatically for you
SQS
Message group ID can be used on FIFO delivery to ensure messages that belong to the same
message group are always processed one by one.
E.g. a bidding platform with multiple products: FIFO with a message group based on
the product being bid on
Dead-letter queues need to match the queue they are set up for. So a standard SQS queue
needs to use a standard dead-letter queue (not FIFO)
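The message group pattern above can be sketched with the CLI; the queue URL and IDs are placeholders:

```shell
# Send to a FIFO queue; messages sharing a group ID are processed in order.
# The queue URL, group ID, and deduplication ID are placeholders.
aws sqs send-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/111122223333/bids.fifo \
  --message-body '{"product": 42, "bid": 100}' \
  --message-group-id product-42 \
  --message-deduplication-id bid-0001
```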
Systems Manager
Troubleshooting why you can’t run commands on an SSM host
Check the latest SSM Agent is installed on the instances
Verify the instance has an IAM role that lets it talk to the SSM API
Services that can have costs associated to them
On-Premises Instance Management: pay-as-you-go pricing
Parameter Store: calling API costs
System Manager Automation
Schedule log file copying from hosts
State Manager to run a script at a given time
Schedule in Maintenance Windows for the log file moves
Patch management can be applied to instances using the following methods
Tag key/value pairs that identify the resources
Patch groups, where a group requires a particular tag
Manual selection of the hosts to patch
VPC
You cannot create subnets with overlapping CIDR ranges, you’ll get an error on trying to
create.
VPC subnets will have 5 reserved addresses
10.0.0.0: Network address.
10.0.0.1: Reserved by AWS for the VPC router.
10.0.0.2: Reserved by AWS. The IP address of the DNS server is the base of the VPC
network range plus two.
10.0.0.3: Reserved by AWS for future use.
10.0.0.255: Network broadcast address (but no broadcast supported).
When wanting to make changes to a DHCP option set, you must create a new one then
associate it to your VPC, replacing the old one.
Troubleshooting EC2 in VPC unable to talk to data-center over Direct connect?
Make sure route propagation to the Virtual Private Gateway (VGW) is setup
Make sure the IPv4 destination address that routes the traffic over the VGW is a prefix you
want to advertise
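The route propagation check above can be sketched with the CLI; the route table and gateway IDs are placeholders:

```shell
# Enable route propagation from the Virtual Private Gateway into the
# subnet's route table. IDs are placeholders.
aws ec2 enable-vgw-route-propagation \
  --route-table-id rtb-0123456789abcdef0 \
  --gateway-id vgw-0123456789abcdef0
```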
Sharing a SaaS product out via your VPC to customers can be done via an AWS endpoint service
(PrivateLink) to other customers' VPCs
Customers need to use an interface VPC endpoint on their end.
Options for sharing an application running in a shared VPC within Organization
VPN between two VPCs
Use AWS Resource Access Manager to share subnets within the account
VPN
Creating a VPN connection requires the static IP of the customer gateway device
With a dynamic routing type, an Autonomous System Number (ASN) is also required
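The two requirements above can be sketched with the CLI; the public IP, ASN, and gateway IDs are placeholders:

```shell
# Register the on-prem device's static public IP (plus an ASN for dynamic routing).
aws ec2 create-customer-gateway \
  --type ipsec.1 \
  --public-ip 203.0.113.12 \
  --bgp-asn 65000

# Create the VPN connection between the customer gateway and the VGW.
aws ec2 create-vpn-connection \
  --type ipsec.1 \
  --customer-gateway-id cgw-0123456789abcdef0 \
  --vpn-gateway-id vgw-0123456789abcdef0
```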
An option if you need Multicast in a VPC is to build a virtual overlay network
Create ENIs between subnets
Runs on the OS level on the instances in your VPC.
Endpoints
Provide a secure link to access AWS resources from a VPC
NAT Gateway
Used by instances in private subnets to communicate with the internet
Secures private resources like databases and application servers that shouldn’t have
public connectivity
X-Ray
Segments allow for detailed tracing
Annotations can help find specific areas of the application in the tracing records (isolate the
issues / impact area)
Miscellaneous
Below is a set of random pieces of information that didn't really need its own section.
IPS/IDS systems within VPC
Configure to listen / block suspected bad traffic in and out of VPC
The system could be Palo Alto networks
Monitors, alerts and filters on potential bad traffic sent in / out of VPC.
Reducing DDOS surface area
Remove non-critical internet entry IPs
Configure ELB to auto-scale
Rekognition CLI example for detecting faces (bucket and object key are placeholders)
aws rekognition detect-faces --image '{"S3Object":{"Bucket":"my-bucket","Name":"photo.jpg"}}' --attributes ALL
SAML identity provider in IAM
SAML metadata document from the identity provider
Create a SAML IAM identity provider in AWS
Configure the SAML Identity provider with relying party trust
In Identity provider configure SAML assertions for auth response
AWS has its own ways of protecting customers from DDoS
If you are trying to flood a connection or running a pentest, you will likely find that
you’ll be blocked by AWS
You need to notify AWS and have them grant you permission if you are running pentesting
jobs
Want to access Support Ticket API?
You need Business support plans
Alexa for Business
You can have Alexa devices perform tasks for staff (getting info for them, booking
meetings)