AWS Certified Solutions Architect – Professional Master Cheat Sheet
pg. 1
SKILLCERTPRO
Domain 1.0: High Availability and Business Continuity (15% of exam)
1.1 Demonstrate ability to architect the appropriate level of availability based on stakeholder requirements
1.2 Demonstrate ability to implement DR for systems based on RPO and RTO
1.3 Determine appropriate use of multi-Availability Zones vs. multi-Region architectures
1.4 Demonstrate ability to implement self-healing capabilities
Recovery Point Objective (RPO) – The maximum acceptable amount of data loss, measured in time. If your RPO is 1 hour and you have an event at 4pm, you must be able to restore data up to 3pm.
Recovery Time Objective (RTO) – The time it takes after a disruption to restore a business process to its service level. If your RTO is 1 hour and you have an event at 4pm, you need to be back up and running by 5pm.
Benefits of using AWS for DR:
- Minimal hardware required for data replication
- Pay-as-you-use model
- Scalability
- Automated DR deployment (scripts, CFTs, etc.)
DRBD – https://aws.amazon.com/blogs/aws/redundant-disk/
AWS Services for DR (https://aws.amazon.com/disaster-recovery/)
Read this: http://d36cz9buwru1tt.cloudfront.net/AWS_Disaster_Recovery.pdf
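The RPO/RTO arithmetic above can be sketched in a few lines (a toy illustration of the definitions, nothing AWS-specific):

```python
from datetime import datetime, timedelta

def recovery_targets(event_time: datetime, rpo: timedelta, rto: timedelta):
    """Given a disruption time, return the oldest acceptable restore point
    (event - RPO) and the deadline for service restoration (event + RTO)."""
    return event_time - rpo, event_time + rto

restore_point, deadline = recovery_targets(
    datetime(2024, 1, 1, 16, 0),   # event at 4pm
    rpo=timedelta(hours=1),
    rto=timedelta(hours=1),
)
# restore_point is 3pm (worst-case data loss window);
# deadline is 5pm (back in service).
```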
Multiple regions around the globe

Storage
- S3 – 11 9s durability & cross-region replication
  - Mission critical & primary data storage
  - Redundantly stored on multiple devices across multiple facilities within a region
  - Can use cross-region replication to move data from one region to another
- Glacier – 3 hours or longer to recover a file (so not an option if your RTO is 15 min)
- Elastic Block Storage (EBS) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
  - Can create point-in-time snapshots of data volumes
  - Use these snapshots as the starting point of new EBS volumes
  - Protected long-term because they are stored on S3
  - Not automatic by default, must be scripted
- AWS Import/Export Snowball (https://aws.amazon.com/importexport/)
  - Import TO EBS, Glacier & S3
  - Can only export FROM S3
  - If you export from an S3 bucket with versioning turned on, only the latest version is exported
  - Use cases: Cloud Migration, DR, Datacenter Decommission, Content Distribution
- Direct Connect
- AWS Storage Gateways
  - Can be deployed on-prem (ESXi or Hyper-V) or as an EC2 instance
  - Can schedule snapshots
  - Can use with Direct Connect
  - Can implement bandwidth throttling (good for remote sites)
  - If multiple sites, need one gateway in each location
  - Networking ports: 443 externally; 80 (activation only), 3260 (iSCSI), and UDP 53 (DNS) internally
  - Encrypted using SSL in transit and AES-256 at rest
  - Stores data as EBS snapshots in S3
  - Gateway-cached volumes:
    - iSCSI-based block storage
    - Local storage holds frequently accessed data; infrequently accessed data is stored in S3
    - If the link to AWS goes down, you lose access to your data
    - Each volume can go up to 32TB, 32 volumes supported (i.e. 1PB of data can be stored)
    - Need local storage for a cache and an upload buffer
    - Can take point-in-time incremental snapshots of volumes and store them in S3 as EBS snapshots
  - Gateway-stored volumes:
    - iSCSI-based block storage, for when you need your entire data set locally
    - Each volume can go up to 16TB, 32 volumes supported (i.e. 0.5PB of data can be stored)
    - Can take point-in-time incremental snapshots
    - Snapshots provide durable off-site backup to S3 as EBS snapshots
    - Use a snapshot of a gateway-stored volume as the starting point of a new EBS volume, which you can attach to an EC2 instance
  - Gateway VTL/VTS:
    - Get rid of your physical tape library infrastructure
    - Virtual Tape Library -> backed by S3
      - Instant retrieval; 1500 virtual tapes (150TB)
    - Virtual Tape Shelf -> backed by Glacier
      - Up to 24 hours to get a virtual tape back; unlimited tapes
    - Need local storage for a cache and an upload buffer
    - Software supported: NetBackup 7.x, Backup Exec 2012-15, MS System Center 2012 Data Protection Manager, Veeam 7 & 8
    - Dell NetVault 10

Compute
- EC2
  - EC2 VM Import Connector – a virtual appliance that works with vCenter to convert your VMware VMs into EC2 instances in AWS

Networking
- Route53
- ELB
- VPC
- Direct Connect
DBs
- Supported HA options for databases:
  - Oracle – RAC & Data Guard
  - SQL Server – AlwaysOn availability groups & SQL mirroring
  - MySQL – Asynchronous replication
- RDS https://aws.amazon.com/documentation/rds/ and https://aws.amazon.com/rds/faqs/
  - Can copy snapshot data from one region to another
  - Can have a read replica running in another region; available for MySQL, MariaDB, PostgreSQL, and Amazon Aurora
  - Automatic failover in case of:
    - An Availability Zone outage
    - The primary DB instance failing
    - The DB instance's server type being changed
    - The operating system of the DB instance undergoing software patching
    - A manual failover of the DB instance initiated using Reboot with failover
  - RDS Multi-AZ failover (synchronous replication only):
    - MySQL, Oracle & PostgreSQL use synchronous physical replication to keep the standby up to date with the primary
    - SQL Server uses SQL Server native mirroring technology; ALWAYS synchronous
    - High Availability
    - Backups are taken from the secondary (avoids I/O suspension)
    - Restores are taken from the secondary (same reason)
    - Is NOT a scaling solution (use read replicas for scaling)
  - Read Replicas (asynchronous replication only):
    - Read-heavy DB workloads
    - Serve reads while the source DB is unavailable (maintenance, I/O suspension, etc.)
    - Business reporting
    - When creating a new read replica, if Multi-AZ is not enabled the snapshot is of the primary (~1 min I/O suspension); if Multi-AZ is enabled the snapshot is taken from the secondary DB
    - Read replicas themselves cannot be Multi-AZ
    - When created, given a new endpoint DNS address
    - Can be promoted to its own standalone DB
    - Can have up to 5 read replicas
    - MySQL ONLY can have:
      - read replicas in different regions
      - read replicas of read replicas (further increases latency)
    - Cannot snapshot or do automated backups of read replicas
- DynamoDB
  - Offers cross-region replication via an open source tool on GitHub: https://github.com/awslabs/dynamodb-cross-region-library
  - If an application does NOT require Atomicity, Consistency, Isolation, Durability (ACID) compliance, joins & SQL, then consider DynamoDB rather than RDS (more on this in Domain 5)
- Redshift
  - Snapshot the data warehouse to S3 within the same region, or copy it to another region

Orchestration
- CloudFormation
- Elastic Beanstalk
- OpsWorks
DR Scenarios
- Backup & Restore ($)
  - Cheapest & most manual; longest RTO/RPO
  - Select the appropriate tool to back up data to AWS
  - Ensure an appropriate retention policy for the data
  - Ensure security measures are in place (encryption & access policies)
- Pilot Light ($$)
  - Small, most critical core elements of systems run in AWS; when you need recovery, you can quickly provision a full-scale production environment around the critical core
  - 2 options for provisioning from a network perspective:
    - Use pre-allocated IPs (& even MACs with ENIs) & associate them with instances when invoking DR
    - Use an ELB to distribute traffic to multiple instances, then update DNS to point to the AWS EC2 instance or to the LB using a CNAME. Everyone does this option ☺
  1. Set up an EC2 instance to replicate or mirror data
  2. Have all supporting custom software available in AWS
  3. Create & maintain AMIs of key servers where fast recovery is required
  4. Regularly run/test/patch/update these servers
- Warm Standby ($$$)
  - Scaled-down version of a fully functional environment
  - Horizontal scaling is preferred over vertical scaling
  1. Set up an EC2 instance to replicate or mirror data
  2. Create and maintain AMIs
  3. Run the app using a minimal footprint
  4. Patch/update these servers in line with prod
  - To recover – scale up/out your AWS footprint, change DNS (or use Route53 automated health checks) & consider autoscaling
- Multi-Site (active-active) ($$$$)
  - Most expensive and most automated; shortest RPO/RTO
  - Runs on-site AND in AWS as active-active
  - Use Route53 to route traffic to both sites either symmetrically or asymmetrically; change DNS weighting to all-AWS in the event of failure
  - Application logic potentially necessary in the event of site failure
Automated Backups
- Services that have automated backups:
  - RDS
    - Stored on S3
    - MySQL DB engine: only the InnoDB storage engine is supported
    - MariaDB: only the XtraDB storage engine is supported
    - Deleting a DB instance deletes all automated backups (manual snapshots are not deleted)
    - Default retention period is one day (values are 0–35 days)
    - Manual snapshot limit – 50 per region – does not apply to automated backups
    - Restore allows you to change engine edition (e.g. SQL Std to SQL Ent)
  - ElastiCache (only Redis, not Memcached)
  - Redshift
- Services that do NOT have automated backups:
  - EC2
    - Automate using the CLI or Python
    - Stored in S3
    - Snapshots are incremental; you are charged only for the incremental space
    - Each snapshot still contains the base data

Domain 2.0: Costing (5% of exam)
2.1 Demonstrate ability to make architectural decisions that minimize and optimize infrastructure cost
2.2 Apply the appropriate AWS account and billing set-up options based on scenario
2.3 Ability to compare and contrast the cost implications of different architectures
EC2 types https://aws.amazon.com/ec2/
- On-demand https://aws.amazon.com/ec2/pricing/on-demand/
  - Highest hourly rate, no commitments
  - Ideal for auto scaling groups and unpredictable workloads
  - Good for Dev/Test
- Reserved https://aws.amazon.com/ec2/pricing/reserved-instances/
  - Use cases:
    - Steady state, predictable usage
    - Apps that need reserved capacity
    - Upfront payments reduce the hourly rate
  - Standard RI – you are billed whether it's powered on or off
    - Use this when you expect to run the server 24/7
    - 1 & 3 year contracts; 3 years is cheapest
    - There is a marketplace to sell RIs https://aws.amazon.com/ec2/purchasing-options/reserved-instances/marketplace/
    - Payment options:
      - All upfront (up to 68% off for a 3-year term)
      - Partial upfront (up to 60% off for a 3-year term)
      - No upfront (30% off for a 1-year term)
  - Scheduled RI – use when you need a set number of instances for a particular time slice
    - Can only launch instances during that time slice
    - If you launch outside of that window, you are billed on-demand
    - Charges accrue hourly but are billed in monthly increments over the term
    - 1-year term commitment
  - Convertible RI – brand new, not on the test yet; up to 45% off
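Since the exam leans on comparing these payment options, the trade-off can be made concrete with a small effective-rate calculation (all prices below are invented for illustration, NOT real AWS pricing):

```python
HOURS_PER_YEAR = 8760

def effective_hourly(upfront: float, hourly: float, years: int) -> float:
    """Total cost of a reservation spread over its full term, per hour."""
    hours = HOURS_PER_YEAR * years
    return (upfront + hourly * hours) / hours

# Hypothetical prices for one instance type:
on_demand = 0.10                                # $/hour, no commitment
all_up    = effective_hourly(1700.0, 0.0, 3)    # all upfront, 3-year term
partial   = effective_hourly(1000.0, 0.03, 3)   # partial upfront, 3-year term
assert all_up < partial < on_demand             # more upfront -> lower effective rate
```

The larger the upfront payment and the longer the term, the lower the effective hourly rate, which is exactly the ordering of the discount tiers above.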
- Spot https://aws.amazon.com/ec2/spot/pricing/
  - Cheapest; flexible start and end times
  - Grid computing and HPC
  - Bidding-type instance; if you are outbid, those instances drop with little notice
- Dedicated (2 types) https://aws.amazon.com/ec2/dedicated-hosts/faqs/
  - Host (most expensive)
  - Instance (you lose visibility into which host you are running on)

Modifying your RIs http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-modifying.html
- Can switch AZs within the same region
- Can change instance size within the same instance type
- Instance type modifications are supported, but only for Linux – and not RHEL or SUSE
- Cannot change the instance size of Windows RIs
- See the normalization chart for calculating the footprint to modify a non-convertible RI: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-modification-instancemove.html
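The normalization chart boils down to a fixed factor per instance size; a modification is allowed when the total normalized footprint is unchanged. A sketch of that arithmetic (factors per the AWS chart):

```python
# AWS normalization factors per instance size (small = 1 unit)
FACTORS = {"nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
           "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}

def footprint(size: str, count: int) -> float:
    """Total normalized units for a reservation of `count` instances."""
    return FACTORS[size] * count

# A modification is allowed only if the footprint stays the same, e.g.
# 1 x large (4 units) -> 4 x small (4 units):
assert footprint("large", 1) == footprint("small", 4)
# ...but 1 x large -> 3 x small would change the footprint and be rejected:
assert footprint("large", 1) != footprint("small", 3)
```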
RDS RIs https://aws.amazon.com/rds/reserved-instances/
- RI based on 5 criteria:
  - DB Engine
  - DB Instance Class
  - Deployment Type
  - License Model
  - Region
- If any of these 5 items changes, RDS reverts to on-demand billing

How to configure cross-account access:
1. Create any custom policies first
2. Create a role with cross-account access
3. Apply the policy to that role & note down the ARN
4. Grant access to the role
5. Switch to the role
https://aws.amazon.com/blogs/aws/new-cross-account-access-in-the-aws-management-console/
http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html

Why multiple AWS accounts?
- Security
- Billing
- Growth through acquisition
Consolidated billing http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html
- Linked accounts feed into one "paying account" – from 1 to 20 linked accounts (or more with a support ticket)
- The paying account is independent & cannot access resources of linked accounts (and vice versa) by default
- Easy to track charges and allocate costs
- Get volume discounts across all your accounts
- Unused EC2 reserved instances are applied across the group
- CloudTrail is per account and per region, but can be aggregated into one bucket in the paying account

Tagging & Resource Groups
- Great for sorting resources in a complex environment
- Can sort by multiple tag keys (prod, dev, test, app, whatever)
- By default works across all regions, but can be filtered down to individual regions
- By default works with all services, but can also be filtered
- Not all resource types support tagging, but you don't need to know which for the exam

Budgets and CloudWatch Alarms
- Used to track your current costs vs a set "budget" for a billing period
- Updated every 24 hours
- Does not show refunds
- Not automatically created by AWS
- Can be compared against AWS "estimated" costs to see how much budget is left over
- Must create budgets on the payer account (in a consolidated billing scenario)
- Can set alarms when you exceed actual or forecasted budgets, but you will still exceed them – alarms won't stop chargeable services from running

Domain 3.0: Deployment Management (10% of exam)

3.1 Ability to manage the lifecycle of an application on AWS
3.2 Demonstrate ability to implement the right architecture for development, testing, and staging environments
3.3 Position and select most appropriate AWS deployment mechanism based on scenario
CloudFormation https://aws.amazon.com/cloudformation/faqs/
- Services supported (http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-supported-resources.html#d0e194974): API Gateway, Auto Scaling, CloudFront, CloudTrail, CloudWatch, CodeCommit, CodeDeploy, CodePipeline, Data Pipeline, DynamoDB, EC2, ECS, ElastiCache, Elasticsearch, Elastic Beanstalk, Elastic Load Balancer (ELB), Elastic MapReduce (EMR), GameLift, Kinesis, Lambda, IAM (create roles, policies), IoT, OpsWorks, RDS, Redshift, Route53, S3, SimpleDB, SNS, SQS, VPC, WorkSpaces
- Templates & Stacks:
  - Templates:
    - Architectural designs
    - Can create, update & delete templates
    - Written in JSON or YAML
    - You don't need to figure out the order/dependencies for provisioning AWS services – CloudFormation takes care of that for you
- AWS CloudFormation Designer allows you to visualize your templates as diagrams & edit them using a drag & drop interface
- Stacks:
  - Deployed resources based on templates
  - Can create, update & delete stacks using templates
  - Can be deployed using the AWS management console, CLI or APIs
- CloudFormation Template (CFT)
  - The blueprints for the house (in JSON or YAML format); the CloudFormation stack is the actual house ☺
  - Allows you to effectively apply version control to your AWS resources/infra
- Elements of the template:
  - File format & version number (mandatory)
  - List of AWS resources and associated config values (mandatory)
  - Template parameters (optional)
    - Input values supplied at stack creation time
    - Limit of 60
  - Output values (optional)
    - Values returned once a stack has finished building (public IP, ELB address, URL of the completed web app, etc.)
    - Limit of 60
  - List of data tables (optional)
    - Static config values (e.g. AMI names, instance sizes, etc.)
- Intrinsic Function Reference (outputting data) http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html: Fn::Base64, Condition Functions, Fn::FindInMap, Fn::GetAtt (← most likely to be on exam), Fn::GetAZs, Fn::ImportValue, Fn::Join, Fn::Select, Fn::Sub, Ref
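The template elements above can be sketched as a minimal skeleton (the AMI ID, logical names and mapping are made up for illustration; this is not a production template). It is built here as a Python dict so the mandatory and optional sections are explicit, then serialized to the JSON you would hand to CloudFormation:

```python
import json

# Skeleton CFT: format version + Resources are the core; Parameters,
# Mappings (data tables) and Outputs are optional sections.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {                    # input values supplied at stack creation
        "InstanceType": {"Type": "String", "Default": "t2.micro"},
    },
    "Mappings": {                      # static data table, e.g. AMI per region
        "RegionAMI": {"us-east-1": {"ami": "ami-12345678"}},  # hypothetical AMI
    },
    "Resources": {                     # the actual infrastructure
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceType"},
                "ImageId": {"Fn::FindInMap":
                            ["RegionAMI", {"Ref": "AWS::Region"}, "ami"]},
            },
        },
    },
    "Outputs": {                       # values returned after the stack builds
        "PublicIP": {"Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]}},
    },
}

body = json.dumps(template, indent=2)  # what you'd upload to CloudFormation
```

Note how Ref, Fn::FindInMap and Fn::GetAtt from the intrinsic-function list each appear in their natural place: parameters, data tables, and outputs.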
- Supports Chef & Puppet integration:
  - Deploy & configure down to the application layer
- Bootstrap scripts are supported:
  - Install packages, files & services by describing them in the CFT
- Stack creation errors:
  - By default, "automatic rollback on error" is enabled
  - You will be charged for resources that are provisioned, even if there is an error
  - CloudFormation itself is free
- Stacks can wait for applications:
  - Provides a WaitCondition resource that acts as a gate, blocking creation of other resources until a condition is satisfied
- You can specify deletion policies:
  - Can specify that snapshots be created of EBS volumes or RDS DBs prior to deletion
  - Can specify that a resource be preserved and not deleted when its stack is deleted (e.g. an S3 bucket)
- You can update a stack after it's created
- Can be used to create roles in IAM:
  - Then used to grant EC2 instances access to those roles
- Creation & customization of VPCs:
  - Can specify IP address ranges (CIDR blocks, as well as individual IP addresses for specific instances)
  - Can specify pre-existing EIPs
- VPC peering:
  - Can create multiple VPCs inside a single template
  - Can enable VPC peering, but only within the same AWS account
- Route53:
  - CF can create new hosted zones or update existing ones: A records, aliases, CNAMEs, etc.

Elastic Beanstalk https://aws.amazon.com/elasticbeanstalk/faqs/
- Overview:
  - Integrates with VPC & IAM
  - Can provision RDS instances
  - Full control of resources
  - Code is stored in S3
  - Multiple environments are supported (for versioning)
  - Changes from Git repositories are replicated
  - Linux & Windows Server 2012 R2 supported
- Ideal for devs with no AWS experience who need to deploy quickly – they upload their code and Elastic Beanstalk takes care of the rest:
  - Capacity provisioning
  - Load balancing
  - Auto-scaling
  - Application health monitoring
- Supported languages/platforms: Java, .NET, PHP, Node.js, Python, Ruby, Go, Docker (web apps)
- CloudFormation supports Elastic Beanstalk, but Elastic Beanstalk will not provision CFTs
- With it you can:
  - Select the operating system that matches your application requirements (e.g., Amazon Linux or Windows Server 2012 R2)
  - Choose from several available database and storage options
  - Enable login access to Amazon EC2 instances for immediate and direct troubleshooting
  - Quickly improve application reliability by running in more than one Availability Zone
  - Enhance application security by enabling HTTPS on the load balancer
  - Access built-in Amazon CloudWatch monitoring and get notifications on application health and other important events
  - Adjust application server settings (e.g., JVM settings) and pass environment variables
  - Run other application components, such as a memory caching service, side-by-side in Amazon EC2
  - Access log files without logging in to the application servers
- Ways you can provision to Elastic Beanstalk:
  - Upload deployable code
  - Push a Git repository
  - The AWS Toolkit for Visual Studio & Eclipse allows you to deploy straight from the IDE
- Updating – you can push updates from Git and only the deltas are transmitted
- Application files and (optionally) server log files are stored in S3
- Elastic Beanstalk can automatically provision an Amazon RDS DB instance; the connectivity information for the DB instance is exposed to your application via environment variables
- Multiple environments are allowed, to support version control:
  - Designed to support multiple running environments, such as one for integration testing, one for pre-production, and one for production
  - Each environment is independently configured and runs on its own separate AWS resources
  - Elastic Beanstalk also stores and tracks application versions over time, so an existing environment can easily be rolled back to a prior version, or a new environment can be launched using an older version to try to reproduce a customer problem
- FT = multi-AZ but not multi-region
- Supports VPC
- Security:
  - By default, the app is publicly available
  - Can use a VPC to provision a private, isolated section of your app in a virtual network, made private through specific security group rules, NACLs, and custom route tables
- Supports IAM

OpsWorks https://aws.amazon.com/opsworks/faqs/
- A config mgmt solution with automation tools that let you model and control your apps and their supporting infra. AWS OpsWorks makes it easy to manage the complete application lifecycle, including resource provisioning, config mgmt, app deployment, software updates, monitoring, and access control, using Chef.
- What is Chef?
  - An automation platform that turns your infra into code
  - Automates how apps are configured, deployed & managed
  - The Chef server stores your recipes & other config data
  - The Chef client (node) is installed on each server, VM, container or networking device
  - The client periodically polls the Chef server for the latest policy & state of the network; if anything is out of date, the client remediates
- Designed for IT admins and ops-minded devs who want a way to manage apps of nearly any scale and complexity without sacrificing control
- Create a logical architecture, provision resources based on that architecture, deploy your apps and all supporting software and packages in your chosen configuration, then operate and maintain the app through lifecycle stages such as auto-scaling events and software updates
- Turns infra into code – infra becomes versionable, testable, and repeatable
- A GUI to deploy & config your infra quickly
- Consists of 2 elements, stacks & layers:
  - Stack = a group of resources
  - Layer = a layer within the stack (i.e. load balancer layer, application layer, DB layer, etc.)
  - 1 or more layers per stack
  - An instance must be assigned to at least 1 layer
  - Which Chef recipes run is determined by the layer the instance belongs to
  - Preconfigured layers: App, DB, LB, Caching
Domain 4.0: Network Design for a complex large scale deployment (10% of Exam)
4.1 Demonstrate ability to design and implement networking features of AWS
4.2 Demonstrate ability to design and implement connectivity features of AWS
- When you create a NAT instance, don't forget to disable source/destination checks!

VPC Peering (http://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html):
- Connects 2 VPCs within a single region
- Transitive peering is not supported (on purpose)
- Up to 50 peers can be created (soft limit; contact Amazon to go up to 125)
- Allows you to route traffic between the peered VPCs using private IP addresses, as if they were part of the same network
- Can't have matching or overlapping CIDR blocks
- A placement group can span peered VPCs, but you won't get full bandwidth between instances
- Can't reference a security group from the peer VPC as the source/destination for ingress/egress rules – use CIDR blocks instead
- Private DNS values cannot resolve between instances in peered VPCs (use private IP addresses instead)
- How to set up:
  1. Local VPC owner sends a request to the remote VPC owner
  2. Remote VPC owner has to accept
  3. Local VPC adds a route out to its route table
  4. Remote VPC adds a route back to its route table
  5. Security groups & NACLs in both VPCs have to allow the traffic

AWS Direct Connect (https://aws.amazon.com/directconnect/faqs/): https://youtu.be/SMvom9QjkPk
- Can be partitioned into multiple virtual interfaces (VIFs)
  - Use the same connection to access public IP address space (EC2, DynamoDB & S3) via public VIFs, and private resources (internal IP addresses) via private VIFs
- Reduces costs when dealing with large volumes of traffic
- Increases reliability & bandwidth
- Available in 10Gbps, 1Gbps and sub-1Gbps (through Direct Connect Partners)
- Uses 802.1Q Ethernet VLAN trunking
- Is not redundant on its own:
  - You can add redundancy by having 2 connections (2 routers, 2 Direct Connects) or by having a site-to-site VPN in place (using BGP for failover)
- Layer 2 connections are not supported
- When using a VPN to connect to a VPC, you need 2 anchor points:
  - Customer Gateway (CGW): physical or software appliance on your side
  - Virtual Private Gateway (VPG): anchor on the AWS side
- In the US, just 1 Direct Connect is needed to connect to all 4 US regions
- Recommended best practice for an HA solution is to use either 2 DXs, or 1 DX and 1 VPN
- Does not support jumbo frames
- Uses external Border Gateway Protocol (eBGP) for routing

HPC (https://aws.amazon.com/hpc/)
- Batch processing with large, compute-intensive workloads
- Demands high CPU, storage & networking requirements
- Usually requires jumbo frames as well (MTU 9000)
- Enhanced networking (https://aws.amazon.com/ec2/faqs/) is available using SR-IOV on supported instance types: C3, C4, D2, I2, M4, R3
  - Must be HVM, not PV (paravirtual) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Placement Groups (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html)
- Don't span AZs; 1 PG = 1 AZ
- Can span subnets (in the same AZ)
- Only certain supported instance types (C3, C4, D2, I2, M4, R3)
- Existing instances can't be moved into a PG
- Amazon prefers homogeneous instance types, but you can mix:
  - Homogeneous types give a greater likelihood that the launch will succeed
- Best practice is to size the PG for peak load & launch all instances at the same time, as there may not be sufficient capacity in the AZ to add extra instances later on

Elastic Load Balancers (https://aws.amazon.com/elasticloadbalancing/):
- Region-wide load balancer
- Can be used internally or externally
- Can do SSL termination and processing
- Cookie-based sticky sessions (session affinity) http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
  - ELB routes the same client to the same application server
  - AWS best practice is to store session state in a DB instead of using ELB sticky sessions, so that you don't impact the user
- Integrates with auto-scaling
- ELB EC2 health checks (query a page)
- Integrates with CloudWatch:
  - Advanced metrics – load balancing based on CloudWatch metrics (CPU, network usage, disk, etc.)
- Integrates with Route 53 (cloud-based DNS load balancing)
- Supported ports: 25 (SMTP), 80/443, 1024-65535
- Can't assign Elastic IPs to an ELB
- IPv4 & IPv6 supported (but VPCs don't support IPv6 currently)
- Can load balance to the zone apex of a domain name
- Can get a history of ELB API calls by turning on CloudTrail (output to an S3 bucket & inspect the logs in the bucket)
- 1 SSL certificate per ELB, unless you have a wildcard cert

NAT instance vs NAT Gateway
- Evaluate the technical differences between the 2 for your needs; if you don't need a specific feature only supported by an instance, go gateway.
- Instance:
  - Use a script to manage failover between instances
  - Bandwidth depends on the instance type
  - Managed by you
  - Manual port forwarding
  - Use a bastion server to manage
  - Use CloudWatch to see traffic/alarms
- Gateway:
  - Built-in HA; gateways in each AZ are implemented with redundancy
  - Can burst up to 10Gbps
  - Managed by AWS
  - Optimized for handling NAT traffic
  - Port forwarding not supported
  - Bastion server not supported
  - Traffic metrics not supported
- Scaling NATs:
  - Scale UP: increase the instance size, or choose an instance family which supports enhanced networking (C3, C4, D2, I2, M4, R3)
  - Scale OUT: add an additional NAT & subnet, then migrate half of the workloads to the new subnet

Domain 5.0: Data Storage for a complex large scale deployment (15% of exam)
5.1 Demonstrate ability to make architectural trade off decisions involving storage options
5.2 Demonstrate ability to make architectural trade off decisions involving database options
5.3 Demonstrate ability to implement the most appropriate data storage architecture
5.4 Determine use of synchronous versus asynchronous replication
Optimizing S3 – use parallelization for both PUTs and GETs

Optimizing for PUTs: https://aws.amazon.com/blogs/aws/amazon-s3-multipart-upload/
- Parallelizing (multipart upload):
  - Divide your files into small parts & upload those parts simultaneously
  - If 1 part fails, it can be restarted
  - Moves the bottleneck to the network itself, helping to increase aggregate throughput
  - 25-50MB part sizes on high-bandwidth networks
  - 10MB part sizes on mobile networks
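The part-splitting arithmetic behind multipart upload can be sketched as follows (a simplified illustration of part sizing, not the S3 API itself; a real upload would go through the SDK's multipart calls):

```python
def split_parts(total_bytes: int, part_size: int):
    """Byte ranges (start, end-exclusive) for a multipart upload."""
    return [(start, min(start + part_size, total_bytes))
            for start in range(0, total_bytes, part_size)]

MB = 1024 * 1024
parts = split_parts(total_bytes=120 * MB, part_size=25 * MB)  # high-bandwidth sizing
# 120MB at 25MB per part -> 5 parts; each can be uploaded in parallel
# and retried on its own if it fails.
assert len(parts) == 5
assert parts[-1] == (100 * MB, 120 * MB)   # final, smaller part
```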
Optimizing for GETs:
- Use CloudFront:
  - Multiple endpoints globally
  - Low latency, high transfer speeds available
  - Caches objects from S3
  - 2 flavors: RTMP & Web
- Use range-based GETs to get multithreaded performance (http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html):
  - Using the Range HTTP header in a GET request allows you to retrieve a specific range of bytes from an object stored in S3
  - Allows you to send multiple GETs at once
  - Compensates for unreliable network performance
  - Maximizes bandwidth throughput
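Building the Range headers for such a parallel download can be sketched like this (illustrative only; the actual requests would be issued by your HTTP client or the S3 SDK):

```python
def range_headers(object_size: int, chunk: int):
    """One 'bytes=start-end' Range value per chunk (end is inclusive,
    per the HTTP Range header convention)."""
    return [f"bytes={start}-{min(start + chunk, object_size) - 1}"
            for start in range(0, object_size, chunk)]

# A 1000-byte object fetched in 400-byte ranges, each by its own thread:
headers = range_headers(1000, 400)
assert headers == ["bytes=0-399", "bytes=400-799", "bytes=800-999"]
```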
- S3 keys are stored lexicographically (in dictionary order, A-Z):
  - The more random you make your key names within a particular bucket, the better performance you get from S3
  - Really only applies to very large buckets
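One well-known pattern for randomizing key names (an illustration of the idea, not an official AWS requirement) is to prepend a short hash prefix so that sequential names no longer sort adjacently:

```python
import hashlib

def randomized_key(key: str, prefix_len: int = 4) -> str:
    """Prefix a key with a few hex chars of its own MD5 so that
    lexicographically adjacent names stop sorting together."""
    prefix = hashlib.md5(key.encode()).hexdigest()[:prefix_len]
    return f"{prefix}/{key}"

# Sequential log names get scattered hash prefixes:
a = randomized_key("logs/2016-01-01.gz")
b = randomized_key("logs/2016-01-02.gz")
assert a.endswith("logs/2016-01-01.gz") and a[4] == "/"
```

Because the prefix is derived from the key itself, it is deterministic: you can recompute it on read without storing a lookup table.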
- Once you turn on versioning, you can't turn it off – you can only suspend it
- Securing S3:
  - Use bucket policies to restrict deletes
  - You can also use MFA Delete (exactly what it sounds like: you need credentials plus a one-time MFA code to delete anything)
  - Versioning does not protect you against deleting a bucket:
    - Back up your bucket to another, separate S3 bucket owned by a different account
Database Design Patterns (http://media.amazonwebservices.com/AWS_Storage_Options.pdf) ← read the anti-patterns for DBs
- Multi-AZ vs Read Replicas:
  - Multi-AZ: used for DR only, not for scaling; synchronous replication
  - Read Replica: used for scaling out, not DR; asynchronous replication
- RDS use cases:
  - Ideal for existing apps that rely on MySQL, Oracle, SQL Server, PostgreSQL, MariaDB & Aurora
  - Amazon RDS offers full compatibility & direct access to native DB engines; most code, libraries & tools designed to work with these DBs should work unmodified with Amazon RDS
  - Optimal for new apps with structured data that requires more sophisticated querying & joining than can be provided by Amazon's NoSQL offering, DynamoDB
- ACID = RDS:
  - Atomicity – in a transaction with 2 or more discrete pieces of info, all data is committed or none is
  - Consistency – a transaction either creates a new valid state of data, or, if any failure occurs, returns all data to the state before the transaction occurred
  - Isolation – a transaction in process but not committed remains isolated from other transactions
  - Durability – data is available in the correct state even in the event of a failure & system restart
- When NOT to use RDS:
  - Index- & query-focused data – use DynamoDB
  - Numerous Binary Large Objects – BLOBs (audio files, videos, images) – use S3
  - Automated scalability (RDS is good for scaling UP, DynamoDB for scaling OUT) – use DynamoDB
  - Other database platforms (IBM DB2, Informix, Sybase) – use EC2
  - If you need complete, OS-level control of the DB server with full root admin – use EC2
- DynamoDB use cases:
  - Existing or new applications that need a flexible NoSQL DB with low read/write latencies
  - The ability to scale storage & throughput up & down without code changes or downtime
  - Common use cases: mobile apps, gaming, digital ads, live voting, audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for S3 objects, e-comm shopping carts, web session mgmt
  - If you need to automatically scale your DB, think DynamoDB
- Where NOT to use DynamoDB:
  - Apps that need a traditional relational DB
  - Joins and/or complex transactions
  - BLOB data – use S3 (but use DynamoDB to keep track of the metadata)
  - Large data with a low I/O rate – again, use S3
Domain 6.0: Security (20% of exam)
6.1 Design information security management systems and compliance controls
6.2 Design security controls with the AWS shared responsibility model and global infrastructure
6.3 Design identity and access management controls
6.4 Design protection of Data at Rest controls
6.5 Design protection of Data in Flight and Network Perimeter controls
AWS Directory Services (3 flavors) – https://aws.amazon.com/directoryservice/faqs/
- AD Connector:
  - Essentially a custom federation proxy that connects to your existing MS AD structure
  - Once connected, end users use their existing corporate creds to log into AWS applications
  - Existing security policies can be enforced consistently (password expiration/history, account lockouts)
  - Supports MFA; can integrate with existing RADIUS-based MFA infrastructure
  - No information is cached on AWS (unless you have an AD server on AWS, obviously)
  - Availability is tied to your networking – if the connection goes down, you lose AD capabilities
  - Comes in 2 sizes: Small (up to 500 users), Large (up to 5000 users)
  - Manage AWS resources via IAM role-based access on the console
- Simple AD:
  - Used for new AD deployments
  - Managed directory powered by Samba 4, an Active Directory-compatible server
  - MFA not supported
  - Only 2 domain controllers (in 2 different AZs); can't add additional domain controllers
  - No forest/domain trust relationships
  - Can't transfer FSMO roles
  - Can domain-join EC2 instances
  - Provides Kerberos-based SSO
  - No LDAPS support
  - Schema extension not supported
  - Comes in 2 sizes: Small (up to 500 users), Large (up to 5000 users)
- Microsoft AD:
  - AWS managed (nothing to install; AWS handles patching & updates)
  - Powered by Windows Server 2012 R2
  - Supports up to 50k users (200k directory objects)
  - Run directory-aware Windows workloads
  - Create trust relationships between MS AD domains in the AWS cloud & on-prem
  - Deployed across multiple AZs
  - Monitors & automatically detects/replaces failed DCs
  - Data replication & automated daily snapshots are configured out of the box
Security Token Service (STS) – http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html
Grants users limited & temporary access to AWS resources. Users can come from 3 sources:
Federated access (typically Active Directory)
Uses SAML 2.0
User doesn't need to be in IAM; access is granted based on AD creds
SSO lets the user log into the AWS console w/out being assigned IAM creds
Federation w/ mobile apps
Use Facebook/Amazon/Google or other OpenID providers to log in
Cross-account access
Lets users from one AWS account access resources in another
E.g. users from a "Dev" account can access resources in a "Prod" account
Steps to create cross-account access:
In the Dev account, create a user & group
Copy the account ID
Switch to Prod – create a new role "DevAccess"
Assign permissions (policies) to the new role
Copy the role ARN
In the Dev account, create a new policy using the policy generator (service: AWS Security Token Service), using the ARN from the DevAccess role
In the Dev account, attach the new policy to the group
Use the IAM user sign-in link from the Prod account to sign in the Dev user; paste the role-switch URL into the browser to switch over to Prod
Steps for setting up SSO for App X:
Employee logs into the application (enters username/pwd)
App X calls an Identity Broker
The Identity Broker checks with LDAP to confirm that the user account is valid
The ID Broker initiates a call to the AWS Security Token Service (STS). The call must include an IAM policy and a duration (15 min to 36 hours) specifying the permissions to be granted.
STS will (on validation) return 4 values: Access Key, Secret Access Key, the Token, and the duration (15 min min – 12 hr default – 36 hr max)
ID Broker returns the token to the application
App X uses the token to call an AWS resource (say S3)
S3 uses IAM to verify the credentials
IAM verifies the credentials & gives S3 the go-ahead to perform the operation
Terminology:
Federation = joining lists of users from one directory service to another (AD, IAM, Facebook, Google)
Identity Broker = trusted 3rd-party broker that you can use to federate multiple directories
Identity Store = list of users
Identities = users
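The broker flow above can be sketched with STS's GetFederationToken call. This is a minimal sketch, not the exam's reference implementation: it assumes boto3 is available with broker credentials configured, and the helper/policy names are mine.

```python
import json

def federation_token_params(user_name, policy, duration_seconds=3600):
    """Build the parameters an identity broker passes to
    sts.get_federation_token (pure helper, testable offline)."""
    # STS accepts durations between 900 s (15 min) and 129600 s (36 h)
    if not 900 <= duration_seconds <= 129600:
        raise ValueError("duration outside the 15 min - 36 h range")
    return {
        "Name": user_name,
        "Policy": json.dumps(policy),
        "DurationSeconds": duration_seconds,
    }

def broker_get_temp_credentials(user_name, policy, duration_seconds=3600):
    """Call STS once the broker has validated the user against LDAP.
    Requires boto3 and a principal allowed sts:GetFederationToken."""
    import boto3  # imported here so the pure helper works without AWS
    sts = boto3.client("sts")
    resp = sts.get_federation_token(
        **federation_token_params(user_name, policy, duration_seconds))
    creds = resp["Credentials"]
    # The four values STS returns: access key, secret key, token, expiry
    return (creds["AccessKeyId"], creds["SecretAccessKey"],
            creds["SessionToken"], creds["Expiration"])

# Hypothetical scoped-down policy granting read access to one bucket
S3_READ_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Action": ["s3:GetObject"],
                   "Resource": "arn:aws:s3:::example-bucket/*"}],
}
```

The broker would hand the returned token to App X, which then calls S3 directly with the temporary credentials.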
Monitoring your Network
CloudTrail – https://aws.amazon.com/cloudtrail/faqs/
Retrieve a history of API calls and other events for all regions in your account
Enabled on a per-region basis; not enabled by default
Use cases:
Security analysis
Track & monitor changes to AWS resources (great in conjunction with AWS Config)
Compliance aid
Troubleshoot operational issues (answers who, what, when, where)
Recorded info includes:
Identity of the API caller
Time of the API call
Source IP of the API caller
Request parameters
Response elements returned by the AWS service
Delivers logs to an S3 bucket in JSON format
Configured on a per-region basis & can include global services
Logs from different regions can be sent to the same S3 bucket
Captures calls & events made via the AWS mgmt. console, command-line tools, AWS SDKs, or other AWS services (e.g. CloudFormation)
Used for auditing and collecting a history of API calls, NOT a logging service
Can integrate with SNS, CloudWatch & CloudWatch Logs to send notifications when a specific API event occurs
CloudWatch – https://aws.amazon.com/cloudwatch/faqs/
Monitoring service for AWS cloud resources & the applications running on AWS
Gain system-wide visibility into resource utilization
Collect/track/monitor metrics/logs, set alarms, and automatically react to changes in AWS resources
Can monitor:
EC2 instances
DynamoDB tables
RDS DB instances
Custom metrics generated by apps & services
Any log file an application can generate
Operational health
Application performance
By default, CloudWatch Logs stores log data indefinitely; you can change retention for each log group at any time
CloudWatch alarm history is stored for 14 days
CloudTrail logs can be sent to CloudWatch Logs for real-time monitoring; use CloudWatch Logs metric filters to evaluate specific terms, phrases or values in CloudTrail logs
You can monitor events & ship those logs to CloudWatch, a central logging system (Splunk, SumoLogic, etc.), or your own scripted logs stored on S3
Avoid storing logs on non-persistent storage:
Root device volume on an EC2 instance
Ephemeral storage
Best answers are usually S3 or CloudWatch Logs
CloudTrail can be used across multiple accounts into a single S3 bucket (w/ cross-account access)
CloudWatch Logs can be used across multiple accounts (w/ cross-account access)
VPC Flow Logs
Can be turned on at the VPC/subnet/EC2-instance level
Can be filtered for all/accepted/rejected traffic
Can dump logs to CloudWatch for alerting
Cloud Hardware Security Modules (HSMs) – https://aws.amazon.com/cloudhsm/
Physical device that safeguards and manages digital keys; used to protect high-value keys
Onboard secure cryptographic key generation, storage & mgmt.
Offloads app servers for asymmetric and symmetric cryptography
Upfront fee for each device & hourly cost afterwards
Single tenant (1 physical device per customer)
Must be used within a VPC; can use VPC peering to connect to a CloudHSM
Can use EBS volume encryption, S3 object encryption & key mgmt. w/ HSM, but this requires custom scripting
If you need fault tolerance, you need to get 2 devices & build a CloudHSM cluster
Can integrate with RDS (Oracle & SQL Server) as well as Redshift
Monitor via syslog
DDoS overview & mitigation – https://d0.awsstatic.com/whitepapers/Security/DDoS_White_Paper.pdf (read this twice!)
How to mitigate DDoS?
Minimize the attack surface – fewer/hardened entry points
Place instances inside private subnets where possible
Bastion hosts w/ whitelisted IP accessibility
WAF = Web Application Firewall – an L7 firewall
Use ELB/CloudFront to distribute load to apps
Multi-tier app architecture provides layered protection against attacks
Scale to absorb the attack – design your infra to scale (horizontally and vertically) as needed:
Auto Scaling (for both WAFs and web servers), CloudFront, Route 53, ELBs, WAFs, CloudWatch
Safeguard exposed resources when you can't eliminate Internet entry points. 3 services that can help with this:
CloudFront
Built-in ability to absorb and deter DDoS attacks
Solves UDP and SYN flood DDoS attacks
Geo restrictions – whitelists/blacklists
Origin Access Identity
Route 53
Alias record sets to redirect traffic to CloudFront, a different ELB, or a "DDoS-resilient environment"
Private DNS
WAFs (Web Application Firewalls) – control input & show what the traffic is doing and where it is coming from; many WAFs have a built-in IDS
Solve DDoS attacks that happen at the app layer
Filter traffic and can ID/prevent injection attacks
DDoS mitigation, malware protection, data-loss prevention
Detect suspicious activity and block/report ("WAF sandwich")
Learn normal behavior (IDS/WAFs) – allows you to ID abnormal behavior faster (duh)
Create a plan for attacks:
Validate the design of your architecture against the different attacks
Know what techniques to employ when you get attacked
Know who to contact when you get attacked
Amplification/Reflection Attacks
NTP, SSDP, DNS, SNMP, Chargen
Attacker sends a 3rd-party server a spoofed-IP request
Application Attacks (L7)
Flood of GET requests
Slowloris attack
IDS & IPS
IDS inspects all inbound/outbound network traffic & identifies suspicious patterns that can indicate a network or system attack; monitors only, isn't proactive
IPS is a network threat-prevention technology that examines network traffic to detect & prevent exploitation of systems
An IDS/IPS appliance usually sits in a public subnet w/ agents installed on all EC2 instances it is monitoring; feeds data either to a SOC or an S3 bucket
Domain 7.0: Scalability and Elasticity (15% of exam)
7.1 Demonstrate the ability to design a loosely coupled system
7.2 Demonstrate ability to implement the most appropriate front-end scaling architecture
7.3 Demonstrate ability to implement the most appropriate middle-tier scaling architecture
7.4 Demonstrate ability to implement the most appropriate data storage scaling architecture
7.5 Determine trade-offs between vertical and horizontal scaling
CloudFront – https://aws.amazon.com/cloudfront/faqs/
Can be used to deliver dynamic, static, streaming, and interactive content of a website using a global network of edge locations
Requests for content are automatically routed to the nearest edge location for best possible performance
Optimized to work with other AWS services like S3, EC2, ELB & Route 53
Key concepts:
2 distribution types: Web distributions and RTMP distributions
Geo restrictions (geo blocking)
Whitelist or blacklist by country; done via either API or console
A blacklisted viewer will see an HTTP 403 error; can create custom error pages
Support for GET, HEAD, POST, PUT, PATCH, DELETE & OPTIONS
CloudFront doesn't cache responses to POST, PUT, DELETE or PATCH requests – these requests are proxied back to the origin server
SSL configs – can use either HTTP or HTTPS with CloudFront, and either the default CloudFront URL or a custom URL with your own certificate. If you go with a custom URL:
Dedicated IP custom SSL:
Dedicated IP addresses serve your SSL content at each CloudFront edge location
Expensive: $600 per certificate per month
Supports older browsers
SNI (Server Name Indication) custom SSL:
Relies on the SNI extension of the Transport Layer Security protocol
Allows multiple domains to serve SSL traffic over the same IP address by including the hostname browsers are trying to connect to
Does not support older browsers
Wildcard CNAMEs supported; up to 100 CNAME aliases per distribution
Invalidation
If you delete a file from your origin, it will be deleted from edge locations when the file reaches its expiration time (as defined in the object's HTTP headers)
You can proactively remove an object from all CloudFront edge locations ahead of its expiration time using the Invalidation API
Use in the event of offensive or potentially harmful material
You do get charged for invalidation requests
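An invalidation call can be sketched with boto3 (assumed installed, with credentials configured; the helper names and paths are mine, not from the source):

```python
def invalidation_batch(paths, caller_reference):
    """Build the InvalidationBatch structure CloudFront expects
    (pure helper so it can be tested without AWS access)."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference,  # must be unique per request
    }

def invalidate(distribution_id, paths):
    """Proactively remove objects from all edge locations ahead of
    their expiration time. Invalidations beyond the free monthly
    allowance are billed, matching the note above."""
    import boto3
    import uuid
    cf = boto3.client("cloudfront")
    return cf.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch(paths, str(uuid.uuid4())),
    )
```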
Zone apex support
You can use CloudFront to deliver content from the root domain, or "zone apex", of your website. For example, you can configure both http://www.example.com and http://example.com to point at the same CloudFront distribution, without the performance penalty or availability risk of managing a redirect service.
To use this feature, you create a Route 53 alias record to map the root of your domain to your CloudFront distribution.
Edge caching – dynamic content support
CloudFront supports delivery of dynamic content that is customized or personalized using HTTP cookies.
You specify whether you want CloudFront to forward some or all of your cookies to your origin server; CloudFront then considers the forwarded cookie values when identifying a unique object in its cache.
You get both the benefit of content personalized with a cookie and the performance benefits of CloudFront.
You can also optionally choose to log the cookie values in CloudFront access logs.
ElastiCache – https://aws.amazon.com/elasticache/faqs/
Memcached vs Redis: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/SelectEngine.Uses.html
https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
Lazy loading
App tries to get data from the cache; if no data is available, the cache returns null
App gets the data from the DB, then updates the cache
Only requested data is in the cache
Node failures don't matter, as the request simply goes back to the DB again
Write through
Cache is updated when data is written to the DB (each write is 2 steps: 1 to the DB, 1 to the cache)
Ensures data is never stale
Good for apps that don't have a lot of writes
Infrequently accessed data gets stored in the cache (bad)
If a node is spinning up, it could miss writes & cause missing data (bad)
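The two caching strategies above can be sketched with plain dicts standing in for the cache and the database; in practice the cache would be an ElastiCache (Memcached/Redis) client and the DB a real datastore:

```python
# Toy in-memory stand-ins; names and keys here are illustrative only.
cache, database = {}, {"user:1": "alice"}

def get_lazy(key):
    """Lazy loading: only requested data lands in the cache, and a
    cache-node failure just means the next read falls back to the DB."""
    value = cache.get(key)          # 1. try the cache first
    if value is None:               # 2. miss -> read the DB
        value = database.get(key)
        if value is not None:
            cache[key] = value      # 3. populate the cache for next time
    return value

def put_write_through(key, value):
    """Write-through: every write hits both stores, so cached data is
    never stale -- at the cost of caching data nobody may ever read."""
    database[key] = value           # step 1: write the DB
    cache[key] = value              # step 2: write the cache
```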
Use Memcached if the following apply to your situation:
Does not manage its own persistence (relies on the DB to have the most recent data)
Can be run in a cluster of nodes
Can't back up clusters – goes back to the DB to repopulate
You need the simplest model possible
You need to run large nodes with multiple cores or threads (multithreaded performance)
You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases (horizontal scaling)
You need to partition your data across multiple shards
Can populate with both lazy loading and write through
Great solution for storing session state, making web servers stateless, which allows for easy scaling
Use Redis 2.8.x or Redis 3.2 (non-clustered mode) if the following apply to your situation:
You need complex data types, such as strings, hashes, lists, sets, sorted sets, and bitmaps
You need to sort or rank in-memory data sets
You need persistence of your key store
You need to replicate your data from the primary to one or more read replicas for read-intensive applications
You need automatic failover if your primary node fails (Multi-AZ)
You need publish and subscribe (pub/sub) capabilities – to inform clients about events on the server
You need backup and restore capabilities
You need to support multiple databases
Use Redis 3.2 (clustered mode) if you require all the functionality of Redis 2.8.x with the following differences:
You need to partition your data across 2 to 15 node groups (cluster mode only)
You need geospatial indexing (clustered or non-clustered mode)
You do not need to support multiple databases
Redis (cluster mode enabled) has the following limitations:
No scaling up to larger node types
No changing the number of node groups (partitions)
No changing the number of replicas in a node group (partition)
Kinesis Streams – https://aws.amazon.com/kinesis/streams/faqs/
Enables you to build custom applications that process or analyze streaming data for specialized needs
You can continuously add various types of data such as clickstreams, application logs, and social media to a Kinesis stream from hundreds of thousands of sources
Within seconds, the data will be available for your Kinesis applications to read and process from the stream
Data in Kinesis is stored for 24 hours by default; can be increased to 7 days
Kinesis Streams is not persistent storage – use S3, Redshift, DynamoDB, EMR, etc. to store processed data long term
Synchronously replicates streaming data across 3 AZs
When would you use Kinesis?
Gaming – collect data like player actions into the gaming platform to have a reactive environment based on real-time events
Real-time analytics
Application alerts
Log/event data collection
Mobile data capture
Key concepts:
Data producers (e.g. EC2 instances, IoT sensors, clients, mobile, server)
Kinesis Streams API: PutRecord (single record), PutRecords (multiple records)
Kinesis Producer Library (KPL) – simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis stream
Kinesis Agent – Java app that you can install on Linux devices
Shards
The shard is the base throughput unit of an Amazon Kinesis stream
One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output, and supports up to 1,000 PUT records per second
You specify the number of shards needed when you create a stream. For example, a stream with two shards has a throughput of 2 MB/sec data input and 4 MB/sec data output, and allows up to 2,000 PUT records per second
Can dynamically add/remove shards from a stream via resharding
Data records – the unit of data stored in an Amazon Kinesis stream
Sequence number – a unique identifier for each record, assigned by Streams after you write to the stream with client.putRecord(s)
Partition key – used to segregate and route records to different shards of a stream
Used to group data by shard within a stream
The stream service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given record belongs to
Specified by the app putting the data into the stream
Data (blob) – the data your producer is adding to the stream; max size = 1 MB
Data consumers (e.g. Amazon Kinesis Streams applications)
Typically EC2 instances that query the stream, run analytics against the data & pass it on to persistent storage
SNS Mobile Push – https://aws.amazon.com/sns/faqs/
Subset of SNS
Push notifications can be sent to mobile devices and desktops using one of the following services:
Amazon Device Messaging (ADM)
Apple Push Notification Service (APNS)
Google Cloud Messaging (GCM)
Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
Microsoft Push Notification Service (MPNS) for Windows Phone 7+
Baidu Cloud Push for Android devices in China
Steps:
Request credentials from the mobile platform
Request a token from the mobile platform
Create a platform application object
Create a platform endpoint object
Publish a message to the mobile endpoint
http://docs.aws.amazon.com/sns/latest/dg/mobile-push-pseudo.html
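The last two steps can be sketched with boto3 (assumed installed and configured; the endpoint ARN and message texts are hypothetical). When MessageStructure is "json", SNS expects a "default" entry plus per-platform payloads, each itself a JSON-encoded string:

```python
import json

def platform_message(default_text, apns_alert=None, gcm_text=None):
    """Build the JSON message SNS expects with MessageStructure='json'.
    Per-platform entries (APNS/GCM) are JSON strings inside the outer
    JSON object -- an easy detail to get wrong."""
    msg = {"default": default_text}
    if apns_alert is not None:
        msg["APNS"] = json.dumps({"aps": {"alert": apns_alert}})
    if gcm_text is not None:
        msg["GCM"] = json.dumps({"data": {"message": gcm_text}})
    return json.dumps(msg)

def push_to_endpoint(endpoint_arn, message_json):
    """Publish to a platform endpoint created earlier via
    create_platform_application / create_platform_endpoint.
    Requires boto3 and sns:Publish permission."""
    import boto3
    sns = boto3.client("sns")
    return sns.publish(TargetArn=endpoint_arn,
                       Message=message_json,
                       MessageStructure="json")
```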
Domain 8.0: Cloud Migration and Hybrid Architecture (10% of exam)
8.1 Plan and execute for applications migrations
8.2 Demonstrate ability to design hybrid cloud architectures
VMware Integration
AWS Management Portal for vCenter: https://aws.amazon.com/ec2/vcenter-portal/
Portal installs as a vCenter plug-in
Enables you to migrate VMware VMs to EC2 & manage AWS resources from within vCenter
Use cases:
Migrate VMs to EC2
Reach new geographies from vCenter
Self-service AWS portal from within vCenter
Leverage VMware experience while learning AWS
Migrating to the cloud using Storage Gateway – https://aws.amazon.com/storagegateway/details/
Can use Storage Gateway to migrate on-prem VMs to AWS
Snaps must be consistent: take the VM offline before taking the snap, or use an OS/app tool to flush to disk
Data Pipeline – https://aws.amazon.com/datapipeline/ and http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals
Can create, access & manage it using:
AWS mgmt. console
CLI
SDKs
Query API
Supported compute services: EC2, EMR
Supported services to store data: DynamoDB, RDS, Redshift
Can be extended on-premises:
AWS supplies a Task Runner package that can be installed on your on-premises hosts. This package polls the Data Pipeline service for work to perform; when it's time to run an activity, Data Pipeline issues the appropriate command to the Task Runner.
With Data Pipeline, you can regularly access your data where it's stored, transform and process it at scale, and efficiently transfer the results to AWS services such as S3, RDS, DynamoDB, and Elastic MapReduce (EMR).
Pipeline – the resource that contains the definition of the dependent chain of data sources, destinations, and predefined or custom data processing activities required to execute your business logic
Contains the data nodes, activities, preconditions & schedules
Can run on EC2 or EMR
Consists of:
Task Runner – a package that continuously polls the AWS Data Pipeline service for work to perform. Installed in 1 of 2 ways:
Installed automatically on resources that are launched and managed by the Data Pipeline service
Manually installed on a compute resource that you manage, such as a long-running EC2 instance or an on-premises server
Data node – the end destination for your data. For example, a data node can reference a specific Amazon S3 path. Data Pipeline supports an expression language that makes it easy to reference data generated on a regular basis.
For example, you could specify that your Amazon S3 data format is s3://example-bucket/my-logs/logdata-#{scheduledStartTime('YYYY-MM-dd-HH')}.tgz
Other examples:
A DynamoDB table that contains data for HiveActivity or EmrActivity to use
A MySQL table and database query that represents data for a pipeline activity to use
A Redshift table that contains data for RedshiftCopyActivity to use
Activity – an action that AWS Data Pipeline initiates on your behalf as part of a pipeline. Example activities are EMR or Hive jobs, copies, SQL queries, or command-line scripts. Data Pipeline provides pre-packaged activities such as:
Move/copy data from one location to another
Run an EMR cluster
Run a Hive query
Copy data to/from Redshift tables
Run a custom UNIX/Linux shell command as an activity
Run a SQL query on a DB
Use ShellCommandActivity to specify custom activities
Precondition – a readiness check (consisting of conditional statements that must be true) that can optionally be associated with a data source or activity
Useful if you are running an activity that is expensive to compute and should not run until specific criteria are met (i.e. does the data/table/S3 path/S3 file exist?)
Can specify pre-packaged preconditions and custom preconditions
2 types of preconditions: system managed and user managed
Schedule – defines when your pipeline activities run and the frequency with which the service expects your data to be available
Network Migrations
CIDR reservations:
Biggest you can have is /16; smallest is /28
5 IP addresses are reserved per CIDR block; in a /24:
.0 = network address
.1 = VPC router
.2 = mapping to Amazon-provided DNS
.3 = reserved for future use
.255 = broadcast is not supported in a VPC, so AWS reserves this address
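The arithmetic above is easy to check with the standard library: total addresses in the block minus the 5 AWS reserves (a /24 therefore leaves 256 - 5 = 251 assignable addresses). A small sketch:

```python
import ipaddress

# .0 network, .1 VPC router, .2 DNS, .3 future use, last address
AWS_RESERVED_PER_SUBNET = 5

def usable_hosts(cidr):
    """Number of instance-assignable addresses in a VPC subnet:
    total addresses minus the 5 AWS reserves in every subnet."""
    net = ipaddress.ip_network(cidr)
    if not 16 <= net.prefixlen <= 28:
        raise ValueError("VPC CIDR blocks must be between /16 and /28")
    return net.num_addresses - AWS_RESERVED_PER_SUBNET
```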
VPN to Direct Connect migrations
Most orgs start with VPN & then move to Direct Connect as traffic increases
Once Direct Connect is installed, VPN and Direct Connect should be configured to be in the same BGP community
Then configure BGP so that the VPN has a higher cost than the Direct Connect connection
Overall Summary:
Active Directory
SimpleAD
Microsoft Active Directory-compatible directory from AWS Directory Service that supports common features of an Active Directory.
Cannot connect to existing on-prem AD
AWS Directory Service for Microsoft Active Directory
Managed Microsoft Active Directory that is hosted on AWS cloud.
AD Connector
Proxy service for connecting your on-premises Microsoft Active Directory to the
AWS cloud.
WorkDocs
Can be used to share documents via AD directory services
Can define time duration or passcodes to access the document
API Gateway
Lambda non-proxy integration flow
Method Request -> Integration Request -> Integration Response -> Method Response
Maximum integration timeout for AWS API gateway is 29 seconds
If you want to change the default timeout for an integration request, uncheck Use Default Timeout and set a custom value (anywhere from 50 ms up to the 29-second maximum)
You can capture a response code and rewrite it to something custom via Gateway Responses
Athena
Serverless platform
Automatically executes queries in parallel
If asked whether to use Athena or QuickSight, look for a mention of whether the team has experience with SQL. If they do, pick Athena
Aurora
Can replicate from an external master instance or a MySQL DB instance on AWS RDS.
Aurora serverless is best suited to situations where you can’t predict what traffic will be like
Backup
The following services can be backed up and restored using AWS Backup
EFS, DynamoDB, EBS, RDS, Volume Gateway
Batch
Configures resources, schedules when to run data analytics workloads
Suitable for running a bash script using a job
Batch scheduler evaluates when/where/how to run jobs (no need for integration with CloudWatch Events to schedule)
Key components
Jobs: unit of work (script, exec, docker container)
Job Definitions: specifies how a job is run
Job Queues: Jobs submitted are added to queues
Compute Environment: compute resources that run jobs
If your Batch jobs are stuck in RUNNABLE state check:
Role assigned has adequate permissions
CPU and RAM given as per compute allocation
Check EC2 limits on the account
Beanstalk
No concept of programmable infrastructure / Git source. Can’t do infrastructure as code end
to end without other tooling.
Billing
Billing reports can be delivered to an S3 bucket
Consolidated billing is only available in master accounts (where there are children accounts
under organisations). These reports include activity for all child accounts
CloudFormation
Retain data for S3: set DeletionPolicy on the S3 resource to Retain
Create an RDS snapshot on delete: set the RDS resource's DeletionPolicy to Snapshot
There are three options for the RDS DeletionPolicy: Retain, Snapshot and Delete
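The two DeletionPolicy behaviors can be sketched in a template fragment. This is an illustrative sketch: the resource names are hypothetical and the required RDS Properties are omitted.

```yaml
Resources:
  LogsBucket:                 # hypothetical resource names
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain    # bucket (and its data) survives stack deletion
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot  # final snapshot is taken when the stack is deleted
    # ... required RDS Properties omitted for brevity
```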
To coordinate stack creation with configuration that must be executed on an EC2 instance, use the CreationPolicy attribute (signaled with cfn-signal) or a wait condition.
If you need to reference AZ info within CloudFormation templates you can make use of the
Fn::GetAZs function
Launching EC2 instances with CloudFormation requires IAM permissions to be provided to
the person creating the stack
Intrinsic functions can be used in Properties, Outputs, Metadata attributes and update policy
attribute
CloudFront
Managed content delivery network (CDN)
S3 Transfer Acceleration can be used to speed up transfers to and from S3 over long distances by routing them through edge locations
Origin Access Identity can be used to grant access to objects in s3 without having to give a
bucket public access.
Different HTTP methods for CloudFront forwarding and their uses:
GET, HEAD: You can use CloudFront only to get objects from your origin or to get
object headers.
GET, HEAD, OPTIONS: You can use CloudFront only to get objects from your origin,
get object headers, or retrieve a list of the options that your origin server supports.
GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE: You can use CloudFront to get,
add, update, and delete objects, and to get object headers. In addition, you can
perform other POST operations such as submitting data from a web form.
Support for common content types such as:
Static content (S3 website or web assets)
Live events (streaming video)
Media content (HLS)
The TTL can be lowered (even to 0) on CloudFront so that new content is delivered as soon as it changes
CloudWatch
Cron event to trigger lambda
CloudWatch Events -> Create Rule
Provide a valid schedule expression, e.g. cron(0 8 * * ? *)
Provide a target(s)
Can trigger a number of different services like Lambda, SNS, SQS, CodeBuild etc.
Cross-region dashboards are supported, so if you need to display metrics from different regions on one dashboard, it is possible
Data Migration Service
Suitable for migrating databases like MySQL to Aurora or RDS
Data migrated is encrypted with KMS
By default it uses the AWS-managed aws/dms key; alternatively a customer managed key (CMK) can be provided
The DMS input stream can be throttled to accommodate downstream systems that can't ingest at full speed (e.g. ingesting data into Elasticsearch where the indexing queue fills up)
DirectConnect
Link aggregation groups (LAGs) can bond Direct Connect connections together
A Direct Connect to a VPC provides access to all AZs in the region
Maximum number of Direct Connect connections in a LAG is 4
All must have the same bandwidth
Troubleshooting Direct Connect
Confirm no firewalls are blocking TCP 179 (or ephemeral ports)
Confirm the ASNs match on both sides
DynamoDB
Supports autoscaling
When defining primary keys, prefer partition keys with many distinct values that are accessed relatively evenly (the "many to few" concept)
Supported CloudWatch metrics
ProvisionedWriteCapacityUnits
ProvisionedReadCapacityUnits
ConsumedWriteCapacityUnits
ConsumedReadCapacityUnits
You cannot configure On-Demand for reads and Provisioned for writes separately; the capacity mode applies to both
A numeric attribute holding an epoch timestamp (e.g. one named EXPIRE) can be designated as the table's TTL attribute to expire items automatically
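A minimal sketch of how TTL is used: stamp items with an epoch-seconds expiry and point the table's TTL setting at that attribute. The attribute and table names are hypothetical; the boto3 call assumes the SDK is installed and credentials are configured.

```python
import time

TTL_ATTRIBUTE = "expire_at"  # hypothetical name; any numeric attribute works

def with_ttl(item, seconds_from_now):
    """Return a copy of the item carrying an epoch-seconds expiry in
    the attribute the table's TTL setting points at; DynamoDB deletes
    the item at no cost sometime after that moment passes."""
    expired = dict(item)
    expired[TTL_ATTRIBUTE] = int(time.time()) + seconds_from_now
    return expired

def enable_ttl(table_name):
    """One-time table setting; needs boto3 and dynamodb:UpdateTimeToLive."""
    import boto3
    ddb = boto3.client("dynamodb")
    return ddb.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True,
                                 "AttributeName": TTL_ATTRIBUTE})
```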
Glue
Contains crawlers that connect to S3, JDBC and DynamoDB
Glue has a central metadata repository (data catalog)
Fully serverless ETL
EC2
High network performance
Cluster placement groups are recommended for applications that need low network
latency / high throughput between instances in the same group
You can detach secondary network interfaces when the instance is running or stopped
You can't detach the primary interface
If you need a static MAC address, you have to create an ENI (where a random one will be
assigned). Then reattach that same ENI to different instances going forward.
There is no way to manually configure a MAC address on AWS EC2
Reserved Instances can be pooled among accounts in an AWS Organisation
If 3 t2.mediums are purchased, and 1 is used in 1 account, 2 more could be used in
other accounts within the organisation
Placement groups can suffer from capacity errors if you try to add new instances to the group; it's recommended to relaunch those instances to see if you get capacity. The best thing to do, obviously, is to launch the placement group with the full number of instances you are going to need at the start.
To improve high network throughput make use of Single Root I/O Virtualization (SR-IOV)
In order to hibernate an EC2 instances you require the following
Instance root has to be EBS and not instance store
Instance cannot be in an Autoscaling group (or used by ECS)
Instance root volume must be large enough so RAM can be stored
Hibernation of instances must also be using a HVM AMI type.
It has to be enabled on creation of the EC2 instance as well as be supported on your
AMI
For specifying Drive letters on Windows instances make use of EC2Config
Lost your SSH keys? Two options
Stop the instance, detach the root volume, attach it as another volume to another
EC2, modify the authorized_keys file, move the volume back to the original instance,
start it.
Systems Manager Automation with AWSSupport-ResetAccess
AMI
You cannot create an AMI from an EC2 instance-store (ephemeral) volume via the console
When you create an AMI, the private (PEM) key is not included, but the public keys in authorized_keys remain on the image
You need to ensure that instances launched from the AMI use the same key pair
Autoscaling
AZRebalance will attempt to balance the number of instances in different availability zones.
When associating an ELB with an ASG the ASG gets awareness about unhealthy instances
(and can terminate)
EBS
When using an encrypted EBS volume the following data is encrypted:
Data at rest in the volume
Data moving between the volume and instance
Snapshots created from the volume
Volumes created from the snapshots
Snapshots can be created every 2, 3, 4, 6, 8, 12, 24 hours
Lifecycle policies help retain backups required for compliance/audits, and also delete unnecessary ones to save cost
When using snapshots (if you don’t want downtime) don’t use RAID
Copies of snapshots with retention policies do not have policies carried over during copy.
In order to mount an EBS volume, it must be in the same AZ as the instance you are
mounting to
A root volume's type can be changed without stopping the instance provided the target type is gp2, io1, or standard
sc1 or st1 cannot be used UNLESS they are non-root volumes (and must be at least 500 GB)
When an EBS volume has two tags, multiple lifecycle policies can run at the same time
Encrypted snapshots cannot be copied to unencrypted ones
Unencrypted snapshots can be encrypted when copying them using the --kms-key-id flag (with your CMK)
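The encrypt-on-copy pattern can be sketched with boto3's equivalent of the CLI call (SDK assumed installed and configured; the snapshot ID and key alias are hypothetical):

```python
def copy_snapshot_params(source_region, snapshot_id, kms_key_id=None):
    """Build the parameters for ec2.copy_snapshot. Passing a KMS key
    (the CLI's --kms-key-id) encrypts an unencrypted source snapshot
    during the copy; the reverse direction is not allowed."""
    params = {"SourceRegion": source_region,
              "SourceSnapshotId": snapshot_id}
    if kms_key_id:
        params.update(Encrypted=True, KmsKeyId=kms_key_id)
    return params

def encrypt_snapshot_copy(region, snapshot_id, kms_key_id):
    """Requires boto3 and ec2:CopySnapshot permission."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.copy_snapshot(**copy_snapshot_params(region, snapshot_id,
                                                    kms_key_id))
```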
EFS
Data is distributed across multiple availability zones which provides durability and
availability.
Supports 2 throughput modes
Bursting throughput: uses burst credits to determine if the filesystem can burst
Provisioned throughput
Provides both in-transit and at-rest encryption using AWS KMS
Mount an EFS volume with encryption in transit by
Getting EFS id, create mount targets for EC2 instance, use the mount helper with
the -o tls flag
Does not support Windows-based clients
Storage Gateway / File Gateway is the recommendation if you need file store (using
SMB mount)
Load Balancers
When using Network Load Balancers Secure connections should be TCP 443 with targets also
using TLS (port 443)
When sticky sessions are needed, it's usually recommended to use ElastiCache to store
session state
You don’t want to bind a user to a particular instance under a load balancer
Requires code to retrieve session state from ElastiCache
If you need to get the client IP when using a Classic Load Balancer:
TCP: configure proxy protocol to pass the IP address in a new TCP header
HTTP: send the client IP in the X-Forwarded-For header
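The TCP case above can be sketched with the CLI by creating and attaching a proxy protocol policy to the Classic Load Balancer; the load balancer name and port are placeholders:

```shell
# Create a proxy protocol policy on a Classic Load Balancer...
aws elb create-load-balancer-policy \
  --load-balancer-name my-clb \
  --policy-name EnableProxyProtocol \
  --policy-type-name ProxyProtocolPolicyType \
  --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true

# ...then attach it to the backend port so the client IP arrives in the TCP header.
aws elb set-load-balancer-policies-for-backend-server \
  --load-balancer-name my-clb \
  --instance-port 80 \
  --policy-names EnableProxyProtocol
```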
Cross-zone load balancing can be enabled to spread requests across your AZs
If a static IP is needed with a load balancer, provision a NLB with an attached EIP
Application Load Balancers support SNI
Is able to deal with multiple SSL certificates per listener
ElastiCache
Redis
Can only be upgraded, cannot be downgraded
IAM
AssumeRole can be secured down with an ExternalId
Flow for using a custom identity system
Custom identity broker app, this authenticates the user
Uses GetFederationToken API and passes a permission policy to get temp credentials
from STS
Alternatively can call AssumeRole API to get temp access using role-based access
instead
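The broker flow above can be sketched with the CLI; the user name, policy file, role ARN, and account ID are placeholders:

```shell
# Identity broker requests temporary credentials scoped by a permission policy.
aws sts get-federation-token \
  --name app-user-42 \
  --policy file://scoped-permissions.json \
  --duration-seconds 3600

# Alternative: role-based temporary access via AssumeRole.
aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/AppUserRole \
  --role-session-name app-user-42
```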
Kinesis
Ideal for real-time data ingestion
Kinesis Data Streams
Can store records in order and replay them in the same order later (up to 7 days)
Makes it ideal for financial transactions
Able to have multiple applications consume from the same stream concurrently.
Kinesis Video Streams
HLS can be used for live playback
Use GetHLSStreamingSessionURL and then use the resulting URL in the video player
of your choice
Content delivery typically leverages AWS Elemental MediaLive / MediaPackage and
CloudFront to distribute content globally
Can view either Live or archived video
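The HLS playback steps above can be sketched with the CLI; the stream name is a placeholder:

```shell
# 1. Find the endpoint that serves HLS session URLs for the stream.
ENDPOINT=$(aws kinesisvideo get-data-endpoint \
  --stream-name my-camera-stream \
  --api-name GET_HLS_STREAMING_SESSION_URL \
  --query DataEndpoint --output text)

# 2. Request the HLS URL (LIVE here; use ON_DEMAND for archived video),
#    then open the returned URL in the video player of your choice.
aws kinesis-video-archived-media get-hls-streaming-session-url \
  --endpoint-url "$ENDPOINT" \
  --stream-name my-camera-stream \
  --playback-mode LIVE
```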
KMS
Two types of keys
Master keys: used directly to encrypt and decrypt up to 4 kilobytes of data and can
also protect data keys
Data keys: used to encrypt and decrypt customer data
If you are accessing a very large number of KMS encrypted files at a time there is a chance
you will hit the KMS encrypt request account limit. You might need to open a support case
to resolve
Grants in KMS
Dynamically / programmatically revoke access to a key after its use.
Better than changing roles / policies
Managed Blockchain
Supported frameworks include Hyperledger Fabric and Ethereum.
If you have members who would like to deploy their own blockchain networks they can use
the CloudFormation templates to support ECS clusters or EC2 instances
Migration Hub
AWS Discovery Agent can transmit to Migration hub, then Data exploration can be done in
Athena
Agentless migrations can only pull information like RAM or Disk I/O from VMware
If your OS isn’t supported for import, you can provide the details yourself via an import template
Migration steps from VMware
Schedule migration job
Upload your VMDK and then convert it to an EBS snapshot
Create an AMI from the snapshot
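The last two steps above can be sketched with the CLI; the bucket, key, names, and snapshot ID are placeholders:

```shell
# Import the uploaded VMDK from S3 as an EBS snapshot.
aws ec2 import-snapshot \
  --description "VMware web server" \
  --disk-container "Format=VMDK,UserBucket={S3Bucket=my-import-bucket,S3Key=webserver.vmdk}"

# Once the import task completes, register an AMI from the resulting snapshot.
aws ec2 register-image \
  --name webserver-ami \
  --root-device-name /dev/xvda \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0}"
```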
OpsWorks
Can be managed by CloudFormation AWS::OpsWorks::Stack
This can be part of a nested stack with a parent containing all the VPC, NAT Gateway
etc. resources
Lifecycle events:
Setup, Configure, Deploy, Undeploy, Shutdown
Handles autohealing of instances
Blue/green-style deployments can be accomplished by creating a new stack with identical
configuration
This can be used when making updates to AMIs
Process for deploying with AWS CodePipeline
Create stack, layer and instance in a OpsWorks Stack
Upload app code to bucket, then add your app to OpsWorks stack
Create a pipeline (run it), verify the app deployment in OpsWorks stack
Process for updating OpsWorks stacks to the latest security patches
Run the Update Dependencies stack command
Create new instances to replace the old ones
When you attach a load balancer to a layer
Deregisters currently registered instances
Re-registers layer instances when they come online (removes offline ones)
Handles the starting of routing requests to the registered instances
Organizations
You may only join one organization (even if you receive more than one invite)
Invitations expire after 15 days
To resend an invite, you must cancel the pending one, then create a new invitation
In order to move an account to a different OU you need the following permissions
organizations:MoveAccount
organizations:DescribeOrganization
Accounts can be dragged into different OUs, however OUs can’t be dragged around to new
locations in the organization's structure.
Instead you must create new OUs and then reassign any SCPs you had in place.
Then move the accounts to these new OUs.
If you want to block access to unused services, check the IAM Activity for services (never
used, last used date) and base your blocks on this information
SCPs can only be Deny (not Allow)
Explicit denies will always overrule explicit allows
To apply WAF rules across an organization make use of AWS Firewall Manager
You cannot restrict a member account from the ability to change its root password or
manage MFA settings
Improve consolidated billing by also tagging resources
This will group expenses on the detailed billing report
To access a member account
Use sts:AssumeRole with OrganizationAccountAccessRole
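The access pattern above can be sketched with the CLI; the member account ID and session name are placeholders:

```shell
# Assume the default cross-account role that Organizations creates in each
# member account. The account ID is a placeholder.
aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/OrganizationAccountAccessRole \
  --role-session-name member-admin-session
```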
The master account isn't impacted by SCPs
Redshift
Does not have read replicas
Queries cannot be paused in Redshift
Use redshift workload management groups
Priorities of these workloads can be assigned to these groups
Can create single node cluster via CLI (and in Console as of recently)
Using the RedshiftCommands.sql file from the Billing section of your account you can
analyze billing reports.
Redshift snapshots are normally a very expensive solution, so if cost is important, don’t
select anything to do with snapshots.
Snapshots on Redshift could also be pointless if you can repopulate all the data in the
cluster from S3 instead
RDS
When a primary DB instance fails in a multi-AZ deployment, the CNAME is changed from
primary to standby so there’s no need to change a reference to the other DB in code.
Multi-AZ replication is done synchronously
For redundant architectures Multi-AZ support is used
Read-replicas aren’t used for redundancy, they are used to improve performance.
If encryption is enabled on the RDS instance, it:
Encrypts the underlying storage
Defaults to also encrypting the snapshots as they are created
RDS does not support Oracle RAC
RDS Oracle can read/write from S3 directly.
Option groups should have a role with permissions to access S3
Feature S3_INTEGRATION
If there is an RDS update available that you aren’t ready to apply, you can Defer the updates
indefinitely until you are ready.
Read Replicas require access to backups for maintaining their read replica logs. This means if
you want to disable automatic backups you must remove all Read Replicas first.
RDS for Oracle
Supported backup / restore options
Oracle Import/Export
Oracle Data Pump Import/Export
RDS snapshot / point-in-time recovery
RDS VMware
Manages:
Patching
Multi AZ configurations
Backups based on retention policies
Point-in-time restores (from on-prem or cloud backups)
Route53
Latency based routing
Redirect requests to nearest region
If you have issues with route53 not routing to ‘live’ hosts, check to make sure you have
“Evaluate Target Health” set to “Yes” on the latency alias. Same goes with HTTP health
checks on weighted resources.
Resolve two domains to one domain (test1.example.com, test2.example.com ->
test3.example.com)
CNAME for the records test1.example.com, test2.example.com to
test3.example.com
Resolve a DNS entry to an ALB
Alias record test3.example.com to ALB address
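The alias record above can be sketched with the CLI; the hosted zone IDs and ALB DNS name are placeholders:

```shell
# UPSERT an alias A record pointing test3.example.com at an ALB.
# Zone IDs and the ALB DNS name are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "test3.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "ZALBZONEID",
          "DNSName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
```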
S3
Using the x-amz-server-side-encryption request header when making an API call will ensure
an object is server side encrypted (SSE)
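A minimal sketch of setting that header via the CLI; the bucket and key are placeholders:

```shell
# s3api put-object maps the --server-side-encryption flag to the
# x-amz-server-side-encryption request header on the underlying API call.
aws s3api put-object \
  --bucket my-bucket \
  --key reports/q1.pdf \
  --body q1.pdf \
  --server-side-encryption AES256
```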
If versioning is enabled on S3 after objects are already put in, those objects will have a
version ID of null
Referrer keys in a bucket policy can make sure requests to Objects come from a domain you
operate
The INTELLIGENT_TIERING storage class is used to optimize storage costs automatically for you
SQS
Message group ID can be used on FIFO delivery to ensure messages that belong to the same
message group are always processed one by one.
E.g. a bidding platform with multiple products: FIFO with a message group based on
the product being bid on
Dead-letter queues need to match the queue they are set up for. So a standard SQS queue
needs to use a standard dead-letter queue (not FIFO)
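The message group pattern above can be sketched with the CLI; the queue URL and IDs are placeholders:

```shell
# Send to a FIFO queue; messages sharing a group ID are processed in order.
# The queue URL, group ID, and deduplication ID are placeholders.
aws sqs send-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/111122223333/bids.fifo \
  --message-body '{"product": 42, "bid": 100}' \
  --message-group-id product-42 \
  --message-deduplication-id bid-0001
```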
Systems Manager
Troubleshooting why you can’t run commands on an SSM host
Check the latest SSM Agent is installed on the instances
Verify the instance has an IAM role that lets it talk to the SSM API
Services that can have costs associated to them
On-Premises Instance Management: pay-as-you-go pricing
Parameter Store: calling API costs
System Manager Automation
Schedule log file copying from hosts
State Manager to run a script at a given time
Schedule in Maintenance Windows for the log file moves
Patch management can be applied to instances using the following methods
Tag key/value pairs that identify the resources
Patch groups, where a group requires a particular tag
Manual selection of the hosts to patch
VPC
You cannot create subnets with overlapping CIDR ranges, you’ll get an error on trying to
create.
VPC subnets will have 5 reserved addresses
10.0.0.0: Network address.
10.0.0.1: Reserved by AWS for the VPC router.
10.0.0.2: Reserved by AWS. The IP address of the DNS server is the base of the VPC
network range plus two.
10.0.0.3: Reserved by AWS for future use.
10.0.0.255: Network broadcast address (but no broadcast supported).
When wanting to make changes to a DHCP option set, you must create a new one then
associate it to your VPC, replacing the old one.
Troubleshooting EC2 in VPC unable to talk to data-center over Direct connect?
Make sure route propagation to the Virtual Private Gateway (VGW) is setup
Make sure the IPv4 destination address that routes the traffic over the VGW is a prefix you
want to advertise
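The route propagation check above can be sketched with the CLI; the route table and gateway IDs are placeholders:

```shell
# Enable route propagation from the Virtual Private Gateway into the
# subnet's route table. IDs are placeholders.
aws ec2 enable-vgw-route-propagation \
  --route-table-id rtb-0123456789abcdef0 \
  --gateway-id vgw-0123456789abcdef0
```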
Sharing a SaaS product out via your VPC to customers can be done via an AWS endpoint service
(PrivateLink) to other customers' VPCs
Customers need to use an interface VPC endpoint on their end.
Options for sharing an application running in a shared VPC within Organization
VPN between two VPCs
Use AWS Resource Access Manager to share subnets within the account
VPN
Creating a VPN connection requires the static IP of the customer gateway device
With a dynamic routing type, an Autonomous System Number (ASN) is also required
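The two requirements above can be sketched with the CLI; the public IP, ASN, and gateway IDs are placeholders:

```shell
# Register the on-prem device's static public IP (plus an ASN for dynamic routing).
aws ec2 create-customer-gateway \
  --type ipsec.1 \
  --public-ip 203.0.113.12 \
  --bgp-asn 65000

# Create the VPN connection between the customer gateway and the VGW.
aws ec2 create-vpn-connection \
  --type ipsec.1 \
  --customer-gateway-id cgw-0123456789abcdef0 \
  --vpn-gateway-id vgw-0123456789abcdef0
```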
An option if you need Multicast in a VPC is to build a virtual overlay network
Create ENIs between subnets
Runs on the OS level on the instances in your VPC.
Endpoints
Provide a secure link to access AWS resources from a VPC
NAT Gateway
Used by instances in private subnets to communicate with the internet
Secures private resources like databases and application servers that shouldn’t have
public connectivity
X-Ray
Segments allow for detailed tracing
Annotations can help find specific areas of the application in the tracing records (isolate the
issues / impact area)
Miscellaneous
Below is a set of random pieces of information that didn't really need its own section.
IPS/IDS systems within VPC
Configure to listen / block suspected bad traffic in and out of VPC
The system could be Palo Alto networks
Monitors, alerts and filters on potential bad traffic sent in / out of VPC.
Reducing DDOS surface area
Remove non-critical internet entry IPs
Configure ELB to auto-scale
Rekognition CLI example for detecting faces (bucket and object key are placeholders)
aws rekognition detect-faces --image '{"S3Object":{"Bucket":"my-bucket","Name":"photo.jpg"}}' --attributes ALL
SAML identity provider in IAM
SAML metadata document from the identity provider
Create a SAML IAM identity provider in AWS
Configure the SAML Identity provider with relying party trust
In Identity provider configure SAML assertions for auth response
AWS has its own ways of protecting customers from DDoS
If you are trying to flood a connection or running a pentest, you will likely find that
you’ll be blocked by AWS
You need to notify AWS and have them grant you permission if you are running pentesting
jobs
Want to access Support Ticket API?
You need Business support plans
Alexa for Business
You can have Alexa devices perform tasks for staff (getting info for them, booking
meetings)