Cloud Custodian
Fleet Management in AWS
Serverless
BUT
We still have servers
Lots and lots :(
A sea of policies
- Fleet-wide savings policies
  - Off-hours stops for dev environments
  - Garbage-collect EBS, ELB, etc.
  - Detect over-provisioned resources
- Numerous security policies
  - Encrypt all the things
  - Access control
  - SSL ciphers
- Numerous compliance policies
  - Tag compliance / chargeback
  - Current images
  - Backups
Fleet Management
Across lots of federated accounts.
Natural tendency
- One-off scripts
But
- How are they implemented?
- How are they deployed?
- How are they configured?
- How are they managed?
- Who owns them?
Software engineering
- How are they tested?
- Are they reviewed?
Who knows?
Cloud Custodian
• A rules engine for infrastructure management.
• YAML DSL for policies: query resources or subscribe to events, apply filters, take actions.
• Integrated Lambda provisioning and event sources.
• Outputs to Amazon S3, Amazon CloudWatch Logs, and Amazon CloudWatch Metrics.
Open source @ https://github.com/capitalone/cloud-custodian
- name: require-rds-encrypt-and-non-public
  resource: rds
  mode:
    type: cloudtrail
    events:
      - CreateDBInstance
  filters:
    - or:
        - StorageEncrypted: false
        - PubliclyAccessible: true
  actions:
    - type: delete
      skip-snapshot: true
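Every policy follows the same shape: get resources, keep the ones that pass the filters, run the actions on what is left. The sketch below is a hypothetical, in-memory version of that loop, not Custodian's code. It uses made-up RDS records carrying the field names RDS actually returns (StorageEncrypted, PubliclyAccessible), and a fake delete that only records IDs.

```python
from typing import Callable, Dict, List

Resource = Dict[str, object]


def run_policy(
    query: Callable[[], List[Resource]],
    filters: List[Callable[[Resource], bool]],
    actions: List[Callable[[List[Resource]], None]],
) -> List[Resource]:
    """Query resources, keep those passing every filter, then run each action on them."""
    resources = query()
    matched = [r for r in resources if all(f(r) for f in filters)]
    for action in actions:
        action(matched)
    return matched


def fake_rds_query() -> List[Resource]:
    # Hypothetical stand-in for DescribeDBInstances; only the fields the filter reads.
    return [
        {"DBInstanceIdentifier": "db-ok", "StorageEncrypted": True, "PubliclyAccessible": False},
        {"DBInstanceIdentifier": "db-plain", "StorageEncrypted": False, "PubliclyAccessible": False},
        {"DBInstanceIdentifier": "db-public", "StorageEncrypted": True, "PubliclyAccessible": True},
    ]


def unencrypted_or_public(r: Resource) -> bool:
    # The policy's `or` block: either condition is enough to match.
    return r.get("StorageEncrypted") is False or r.get("PubliclyAccessible") is True


deleted: List[str] = []


def fake_delete(resources: List[Resource]) -> None:
    # Records IDs instead of calling the RDS API.
    deleted.extend(str(r["DBInstanceIdentifier"]) for r in resources)


matched = run_policy(fake_rds_query, [unencrypted_or_public], [fake_delete])
print(deleted)  # ['db-plain', 'db-public']
```

In cloudtrail mode the query step is replaced by the resource named in the triggering event. The filter and action steps stay the same.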
Amazon CloudWatch Events
Features
● Powerful infrastructure observation capabilities
● Enables “realtime” rules enforcement and reaction, with wide coverage of AWS product APIs.
Sources
● All CloudTrail events (P99 @ 90 s delivery window as of April 2016)
● EC2 instance state changes (600 ms)
● ASG instance membership changes (600 ms)
● Periodic scheduling (custom)
● Custom events
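A CloudWatch Events rule selects events with a JSON pattern. Each pattern field lists the values it accepts, and nested objects are matched recursively. The sketch below is a simplified matcher that only handles exact-value lists and nested objects, assumed for illustration. The pattern is the kind a cloudtrail-mode policy on CreateDBInstance would need. The sample event is hand-written, not captured from AWS.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified event-pattern match: every pattern key must be present and allowed."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern objects are matched recursively.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern values are lists of allowed values.
            return False
    return True  # extra fields in the event are ignored


rds_create_pattern = {
    "source": ["aws.rds"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["rds.amazonaws.com"],
        "eventName": ["CreateDBInstance"],
    },
}

sample_event = {
    "source": "aws.rds",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "rds.amazonaws.com", "eventName": "CreateDBInstance",
               "awsRegion": "us-east-1"},
}

print(matches(rds_create_pattern, sample_event))  # True
```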
Cloud Custodian
Resource-type policies (EC2 instance, AMI, Auto Scaling group, bucket, ELB, etc.)
Filter resources
Invoke actions on the filtered set
Output resource JSON to S3, metrics to CloudWatch
Vocabularies of actions and filters for policy construction.
- name: ebs-copy-instance-tags
  resource: ebs
  filters:
    - type: value
      key: "Attachments[0].Device"
      value: not-null
  actions:
    - type: copy-instance-tags
      tags:
        - App
        - Env
        - Owner
        - Name
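The sketch below shows what a copy-instance-tags step has to work out for one volume: which attached instance it belongs to, and which of the requested tag keys that instance has. It is pure data handling over DescribeVolumes/DescribeInstances-shaped dicts with made-up IDs, and it only returns the tags that would be applied. It is not the real action, and whether the real action overwrites existing volume tags is not assumed here.

```python
from typing import Dict, List


def tags_to_dict(tags: List[Dict[str, str]]) -> Dict[str, str]:
    # AWS APIs return tags as [{"Key": ..., "Value": ...}, ...].
    return {t["Key"]: t["Value"] for t in tags}


def tags_to_copy(volume: dict, instances: Dict[str, dict], keys: List[str]) -> Dict[str, str]:
    """Tags a volume should receive from the instance it is attached to."""
    attachments = volume.get("Attachments") or []
    # Same guard as the policy's `Attachments[0].Device: not-null` filter.
    if not attachments or not attachments[0].get("Device"):
        return {}
    instance = instances.get(attachments[0]["InstanceId"], {})
    instance_tags = tags_to_dict(instance.get("Tags", []))
    # Only copy the requested keys that the instance actually has.
    return {k: instance_tags[k] for k in keys if k in instance_tags}


instances = {
    "i-0abc": {"Tags": [{"Key": "App", "Value": "billing"}, {"Key": "Env", "Value": "dev"},
                        {"Key": "CostCenter", "Value": "42"}]},
}
attached = {"VolumeId": "vol-1", "Attachments": [{"InstanceId": "i-0abc", "Device": "/dev/xvda"}]}
detached = {"VolumeId": "vol-2", "Attachments": []}

print(tags_to_copy(attached, instances, ["App", "Env", "Owner", "Name"]))  # {'App': 'billing', 'Env': 'dev'}
print(tags_to_copy(detached, instances, ["App"]))  # {}
```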
Filtering Resources
Generic value filter
- JMESPath expressions on the resource's JSON representation
- Lots of operators for matching (in, not-in, absent, not-null, gte, regex, etc.)
Arbitrary nesting of filters with 'or' and 'and' blocks.
Simple key/value pairs are shorthand for equality matches via the value filter.
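The value filter boils down to three steps: look up a key in the resource's JSON, handle special values such as `absent` and `not-null`, and otherwise apply a named operator. The sketch below assumes a plain dotted-path lookup in place of JMESPath, so that it needs only the standard library, and covers a handful of the operators listed above. It treats `not-null` as "present and truthy"; that is an assumption, not confirmed Custodian behavior.

```python
import re
from typing import Any, Dict

_MISSING = object()


def lookup(resource: Dict[str, Any], key: str) -> Any:
    """Follow a dotted path such as 'State.Name' (a stand-in for JMESPath)."""
    current: Any = resource
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "in": lambda a, b: a in b,
    "not-in": lambda a, b: a not in b,
    "gte": lambda a, b: a >= b,
    "less-than": lambda a, b: a < b,
    "regex": lambda a, b: re.match(b, str(a)) is not None,
}


def value_filter(resource: Dict[str, Any], key: str, value: Any, op: str = "eq") -> bool:
    actual = lookup(resource, key)
    # Special values test presence instead of comparing.
    if value == "absent":
        return actual is _MISSING
    if value == "not-null":
        return actual is not _MISSING and bool(actual)
    if actual is _MISSING:
        return False  # a missing key never satisfies a comparison
    return OPS[op](actual, value)


instance = {"InstanceType": "m4.large", "State": {"Name": "running"}, "CpuOptions": {"CoreCount": 2}}
print(value_filter(instance, "State.Name", "running"))                            # True
print(value_filter(instance, "InstanceType", ["t2.micro", "m4.large"], op="in"))  # True
print(value_filter(instance, "InstanceType", r"m4\.", op="regex"))               # True
print(value_filter(instance, "KeyName", "absent"))                                # True
print(value_filter(instance, "CpuOptions.CoreCount", 4, op="gte"))                # False
```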
- type: value
  # Ignore keys that start with 'aws:'
  # as they don't count towards the limit.
  key: "[length(Tags[?!starts_with(Key,'aws:')])][0]"
  op: less-than
  value: 10
- or:
    - "tag:App": absent
    - "tag:Env": absent
    - and:
        - Encrypted: false
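The sketch below is a hypothetical, recursive evaluator for nested `or`/`and` blocks. A filter list is treated as an implicit AND. An `or` needs any child to match and an `and` needs all of them. A single `key: value` pair is an equality check, with `tag:` keys looked up in the resource's Tags and `absent` meaning the key is missing. It runs against the `or` block in the fragment above, using made-up volume records.

```python
from typing import Any, Dict, Optional


def tag_value(resource: Dict[str, Any], key: str) -> Optional[str]:
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def evaluate(resource: Dict[str, Any], node: Any) -> bool:
    """Evaluate a filter tree of lists (implicit AND), or/and blocks, and key: value pairs."""
    if isinstance(node, list):
        return all(evaluate(resource, child) for child in node)
    if "or" in node:
        return any(evaluate(resource, child) for child in node["or"])
    if "and" in node:
        return all(evaluate(resource, child) for child in node["and"])
    key, expected = next(iter(node.items()))
    actual = tag_value(resource, key[4:]) if key.startswith("tag:") else resource.get(key)
    if expected == "absent":
        return actual is None
    return actual == expected


policy_filters = [
    {"or": [
        {"tag:App": "absent"},
        {"tag:Env": "absent"},
        {"and": [{"Encrypted": False}]},
    ]}
]

tagged_encrypted = {"Encrypted": True, "Tags": [{"Key": "App", "Value": "x"}, {"Key": "Env", "Value": "prod"}]}
missing_env = {"Encrypted": True, "Tags": [{"Key": "App", "Value": "x"}]}
unencrypted = {"Encrypted": False, "Tags": [{"Key": "App", "Value": "x"}, {"Key": "Env", "Value": "dev"}]}

print([evaluate(v, policy_filters) for v in (tagged_encrypted, missing_env, unencrypted)])  # [False, True, True]
```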
Multi-Step Workflows
“Poorly tagged instances should be stopped in 1 day, and then terminated in 3.”
- mark-for-op
- marked-for-op
Chain together multiple policies.
- name: ec2-tag-compliance-mark
  resource: ec2
  description: |
    Find all instances with non-compliant tags and mark them to be stopped in 1 day.
  mode:
    type: periodic
    schedule: rate(1 day)
  filters:
    - "tag:maid_status": absent
    - or:
        - "tag:App": absent
        - "tag:Env": absent
        - "tag:Owner": absent
  actions:
    - type: mark-for-op
      op: stop
      days: 1
- name: ec2-tag-compliance-stop
  resource: ec2
  description: |
    Stop poorly tagged instances and schedule termination.
  mode:
    type: periodic
    schedule: rate(1 day)
  filters:
    - type: marked-for-op
      op: stop
    - or:
        - "tag:App": absent
        - "tag:Env": absent
        - "tag:Owner": absent
  actions:
    - stop
    - type: mark-for-op
      op: terminate
      days: 4
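The two policies hand off state through a tag. mark-for-op writes the pending operation and its due date into `maid_status`, and marked-for-op matches once that date has arrived. The sketch below works through that hand-off with plain dicts and fixed dates. The tag value format (`<message>: <op>@YYYY/MM/DD`) and its message text are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

TAG_KEY = "maid_status"  # tag key used by the policies above


def mark_for_op(tags: Dict[str, str], op: str, days: int, today: date) -> None:
    """Record a future operation in the tag (assumed format: 'message: op@YYYY/MM/DD')."""
    when = today + timedelta(days=days)
    tags[TAG_KEY] = f"Resource does not meet policy: {op}@{when:%Y/%m/%d}"


def parse_mark(tags: Dict[str, str]) -> Optional[Tuple[str, date]]:
    value = tags.get(TAG_KEY)
    if not value or "@" not in value:
        return None
    op_part, _, date_part = value.rpartition("@")
    op = op_part.rsplit(":", 1)[-1].strip()
    try:
        y, m, d = (int(x) for x in date_part.split("/"))
        return op, date(y, m, d)
    except ValueError:
        return None  # unparseable tags simply don't match


def marked_for_op(tags: Dict[str, str], op: str, today: date) -> bool:
    """True once the tag names `op` and its due date is today or earlier."""
    parsed = parse_mark(tags)
    return parsed is not None and parsed[0] == op and parsed[1] <= today


tags: Dict[str, str] = {}
mark_for_op(tags, "stop", 1, date(2016, 5, 1))         # day 0: mark policy runs
print(marked_for_op(tags, "stop", date(2016, 5, 1)))   # False, not due yet
print(marked_for_op(tags, "stop", date(2016, 5, 2)))   # True, stop policy fires
mark_for_op(tags, "terminate", 4, date(2016, 5, 2))    # stop policy re-marks
print(marked_for_op(tags, "terminate", date(2016, 5, 6)))  # True
```

The `"tag:maid_status": absent` filter in the first policy keeps an already-marked instance from being marked again on the next run.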
Custodian Vocabularies
asg:
  actions:
    - delete
    - mark-for-op
    - rename-tag
    - suspend
    - tag
    - remove-tag
    - resume
    - propagate-tags
  filters:
    - vpc-id
    - time
    - marked-for-op
    - not-encrypted
    - image-age
    - onhour
    - tag-count
    - offhour
    - launch-config
ec2:
  actions:
    - mark-for-op
    - remove-tag
    - snapshot
    - tag
    - start
    - tag-trim
    - stop
    - terminate
  filters:
    - ebs
    - marked-for-op
    - ephemeral
    - image
    - instance-age
    - onhour
    - tag-count
    - offhour
    - image-age
s3:
  actions:
    - attach-encrypt
    - remove-statements
    - encrypt-keys
    - encryption-policy
    - delete-global-grants
  filters:
    - missing-statement
    - global-grants
    - is-log-target
    - has-statement
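A vocabulary is a per-resource-type registry that maps each filter or action name to an implementation. The sketch below uses a hypothetical decorator-based registry to build one: it resolves every name in a policy dict up front, so a misspelled filter or action fails before anything runs. The `tag-count` semantics and the fake `stop` are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

Registry = Dict[str, Dict[str, Callable[..., Any]]]
FILTERS: Registry = {}
ACTIONS: Registry = {}


def register(registry: Registry, resource_type: str, name: str):
    """Decorator that adds a function to a resource type's vocabulary."""
    def decorator(fn):
        registry.setdefault(resource_type, {})[name] = fn
        return fn
    return decorator


@register(FILTERS, "ec2", "tag-count")
def tag_count(resource: dict, spec: dict) -> bool:
    # Assumed semantics for this sketch: match when the tag count reaches spec["count"].
    return len(resource.get("Tags", [])) >= spec.get("count", 10)


@register(ACTIONS, "ec2", "stop")
def stop(resources: List[dict], spec: dict) -> List[str]:
    # Stand-in: report which instances would be stopped instead of calling EC2.
    return [r["InstanceId"] for r in resources]


def resolve(registry: Registry, resource_type: str, spec: Any) -> Tuple[Callable[..., Any], dict]:
    # Policies may write a bare name ("stop") or a mapping ({"type": "stop", ...}).
    spec = {"type": spec} if isinstance(spec, str) else spec
    try:
        return registry[resource_type][spec["type"]], spec
    except KeyError:
        raise ValueError(f"unknown {spec.get('type')!r} for resource {resource_type!r}") from None


def run(policy: dict, resources: List[dict]) -> List[str]:
    rtype = policy["resource"]
    # Resolve every name first so a typo fails before any action runs.
    filters = [resolve(FILTERS, rtype, f) for f in policy.get("filters", [])]
    actions = [resolve(ACTIONS, rtype, a) for a in policy.get("actions", [])]
    matched = [r for r in resources if all(fn(r, spec) for fn, spec in filters)]
    results: List[str] = []
    for fn, spec in actions:
        results.extend(fn(matched, spec))
    return results


policy = {"name": "many-tags", "resource": "ec2",
          "filters": [{"type": "tag-count", "count": 2}], "actions": ["stop"]}
fleet = [
    {"InstanceId": "i-1", "Tags": [{"Key": "App", "Value": "a"}, {"Key": "Env", "Value": "dev"}]},
    {"InstanceId": "i-2", "Tags": []},
]
print(run(policy, fleet))  # ['i-1']
```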
Additional resource types
- RDS
- ELB
- Redshift
- CloudFormation
- AMI
- EBS
- EBS Snapshot
Metrics
Resource Count
Action Time
Query/Filter Time
Custom
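Per-policy metrics come down to counting matched resources and timing the filter and action phases. The sketch below times two stand-in phases and builds records in the shape of the `MetricData` parameter of CloudWatch `PutMetricData`. The metric names and the `Policy` dimension are hypothetical; Custodian's own names may differ. The records are not sent anywhere here. With boto3 they would be passed to `boto3.client("cloudwatch").put_metric_data(Namespace=..., MetricData=data)` under a namespace of your choosing.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def metric_data(policy: str, count: int, filter_secs: float, action_secs: float) -> List[Dict[str, Any]]:
    """Build records shaped like the MetricData parameter of CloudWatch PutMetricData."""
    dims = [{"Name": "Policy", "Value": policy}]  # hypothetical dimension name
    return [
        {"MetricName": "ResourceCount", "Dimensions": dims, "Value": count, "Unit": "Count"},
        {"MetricName": "QueryFilterTime", "Dimensions": dims, "Value": filter_secs, "Unit": "Seconds"},
        {"MetricName": "ActionTime", "Dimensions": dims, "Value": action_secs, "Unit": "Seconds"},
    ]


# Stand-in phases: "filter" 100 fake resources, then a no-op "action".
resources, filter_secs = timed(lambda: [r for r in range(100) if r % 7 == 0])
_, action_secs = timed(lambda: None)
data = metric_data("ebs-copy-instance-tags", len(resources), filter_secs, action_secs)
print(data[0]["Value"])  # 15
```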
Example Policy - Amazon S3 Encryption
Require encryption for objects
name: s3-require-encryption
resource: s3
description: |
  Apply encryption required policy to new buckets
mode:
  type: cloudtrail
  events:
    - CreateBucket
actions:
  - encryption-policy
  - encrypt-keys
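An "encryption required" bucket policy is usually a Deny statement on `s3:PutObject` conditioned on the `s3:x-amz-server-side-encryption` request header. The sketch below builds such a statement with the Sid `RequireEncryptedPutObject` and merges it into a bucket policy document, replacing any existing statement with the same Sid. The exact statement the encryption-policy action writes may differ; this is an illustration, not its source. The result would be serialized with `json.dumps` and applied through S3's PutBucketPolicy (boto3: `put_bucket_policy(Bucket=..., Policy=...)`).

```python
import json
from typing import Any, Dict


def require_encryption_statement(bucket: str) -> Dict[str, Any]:
    """Deny uploads whose x-amz-server-side-encryption value is not AES256 or aws:kms."""
    return {
        "Sid": "RequireEncryptedPutObject",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
        "Condition": {
            "StringNotEquals": {"s3:x-amz-server-side-encryption": ["AES256", "aws:kms"]}
        },
    }


def with_statement(policy: Dict[str, Any], statement: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the bucket policy with `statement` added (replacing any same-Sid one)."""
    existing = policy.get("Statement", [])
    if isinstance(existing, dict):  # a policy may hold a single statement object
        existing = [existing]
    kept = [s for s in existing if s.get("Sid") != statement["Sid"]]
    return {"Version": policy.get("Version", "2012-10-17"), **policy, "Statement": kept + [statement]}


policy = with_statement({}, require_encryption_statement("example-bucket"))
policy = with_statement(policy, require_encryption_statement("example-bucket"))  # safe to re-run
print(len(policy["Statement"]))  # 1
body = json.dumps(policy)  # what would be sent as the bucket policy
```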
Find ELB/S3 log sinks and switch them to Lambda-based encryption
name: s3-remediate
resource: s3
description: |
  Encryption required policy
mode:
  type: periodic
  schedule: rate(1 day)
filters:
  - type: is-log-target
actions:
  - attach-encrypt
  - type: remove-statements
    statement_ids:
      - RequireEncryptedPutObject
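Removing a statement by Sid is a filter over the policy document's `Statement` list. The sketch below is a hypothetical helper that does this on the JSON text. If no statements remain it returns `None`, on the assumption that the caller would then delete the bucket policy (S3 DeleteBucketPolicy) rather than write an empty one. The account ID and bucket name in the sample are placeholders.

```python
import json
from typing import Iterable, Optional


def remove_statements(policy_json: str, statement_ids: Iterable[str]) -> Optional[str]:
    """Drop statements whose Sid is in statement_ids; None means no statements remain."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # single-statement policies are allowed
        statements = [statements]
    ids = set(statement_ids)
    kept = [s for s in statements if s.get("Sid") not in ids]
    if not kept:
        # Nothing left: a caller would delete the bucket policy rather than write an empty one.
        return None
    policy["Statement"] = kept
    return json.dumps(policy)


current = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "RequireEncryptedPutObject", "Effect": "Deny", "Principal": "*",
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-logs/*"},
        {"Sid": "AllowLogDelivery", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:root"},  # placeholder account
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-logs/*"},
    ],
})
updated = remove_statements(current, ["RequireEncryptedPutObject"])
print([s["Sid"] for s in json.loads(updated)["Statement"]])  # ['AllowLogDelivery']
```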
Roadmap
- Elasticsearch indexing of records/outputs (programmatic dashboards, historical trending)
- Flourish ??
- Cross-language support (Lambda invoke actions)
- Moar filters/actions/resources
https://github.com/capitalone/cloud-custodian/milestones
Incidentally, We’re Hiring ;-)