Patterns for Mobile and IoT backends with serverless paradigms
Serverless IoT at iRobot - QCon San Francisco · 2020-06-05 · Serverless IoT at iRobot Ben Kehoe...
Transcript of Serverless IoT at iRobot - QCon San Francisco · 2020-06-05 · Serverless IoT at iRobot Ben Kehoe...
Serverless IoT at iRobot
Ben Kehoe @ben11kehoeCloud Robotics Research Scientist
iRobot 2017 | 2@ben11kehoe
•Cloud Robotics Research Scientist at iRobot
•Serverless evangelist
•AWS Community Hero
About me
iRobot 2017 | 3@ben11kehoe
iRobot 2017 | 4@ben11kehoe
Cloud Robotics:Connecting robots to the internet to help them do more and better things
iRobot 2017 | 5@ben11kehoe
What is serverless?1
iRobot's journey2
Serverless architecture at iRobot3
Serverless deployment at iRobot4
Serverless ops at iRobot5
Serverless organizations6
iRobot 2017 | 6@ben11kehoe
What is serverless?
1
iRobot 2017 | 7@ben11kehoe
•The wrong first question
What is serverless?
iRobot 2017 | 8@ben11kehoe
•Cheaper
•Faster
•Leaner
•Better
What does serverless do?
iRobot 2017 | 9@ben11kehoe
•FaaS
•Event-driven compute
•Never paying for idle
•Containers?
•Public cloud
What serverless is not
iRobot 2017 | 10@ben11kehoe
Increasing serverless-ness with:
•Service-full + emphemeral compute
•Resources billed → resources used
•Smaller, more abstract control plane
Serverless is a spectrum
iRobot 2017 | 11@ben11kehoe
•Cheaper
•Faster
•Leaner
•Better
What does serverless do?
iRobot 2017 | 12@ben11kehoe
On prem IaaS PaaS Serverless
% cost at idle
Cheaper
Not to scale
iRobot 2017 | 13@ben11kehoe
Faster, Leaner
On prem IaaS PaaS Serverless
Effort
Dev Ops DiffOps
On prem IaaS PaaS Serverless
Codebase size
Business logic Other code Infra
Not to scale
iRobot 2017 | 14@ben11kehoe
iRobot's journey
2
iRobot 2017 | 15@ben11kehoe
2015
iRobot 2017 | 16@ben11kehoe
iRobot 2017 | 17@ben11kehoe
iRobot 2017 | 18@ben11kehoe
iRobot 2017 | 19@ben11kehoe
Then
iRobot 2017 | 20@ben11kehoe
Then Now
iRobot 2017 | 21@ben11kehoe
Then Now Future
iRobot 2017 | 22@ben11kehoe
Then Now Future2015
iRobot 2017 | 23@ben11kehoe
AWS IoT
iRobot 2017 | 24@ben11kehoe
•Serverless• Event-driven
• Scalable
• Integrates with AWS ecosystem
•Device shadows
• Integrates with your process
AWS IoT
iRobot 2017 | 25@ben11kehoe
•Event-driven
•Scalable
•Lean for device makers• Reverse: AWS Greengrass
•Focus
IoT + serverlessA natural fit
iRobot 2017 | 26@ben11kehoe
CA Certificate
Robot certificate
signed by CA
Authenticated
MQTT over TLS
AWS IoT
iRobot 2017 | 27@ben11kehoe
HSM
Public key
Signed
certificate
Assembly line
Great Firewall
CA Certificate
Private key
Certificate
AWS IoT
Factory
iRobot 2017 | 28@ben11kehoe
iRobot 2017 | 29@ben11kehoe
•Fully serverless production cloud
•2 million connected robots by 2018
•Mostly serverless analytics platform
•Basis for future data-powered platform
Long story short: success!
iRobot 2017 | 30@ben11kehoe
•Production application:• 100+ Lambda functions
• 25 AWS services
• 0 unmanaged EC2 instances
•AWS footprint:• ~50 accounts, growing constantly
• 1000s of Lambda deploys per day
•Low single digit FTE supporting operations
iRobot scale
iRobot 2017 | 31@ben11kehoe
•Architecture
•Deployment
•Operations
•Organizational
Long story less short
iRobot 2017 | 32@ben11kehoe
Serverless architecture at iRobot
4
iRobot 2017 | 33@ben11kehoe
def foo(input):
quux = bar(input.baz)
internalState.quux = quux
def bar(input):
# do work
return result
Before serverlessaka the dark ages
EC2
iRobot 2017 | 34@ben11kehoe
def handler(event, context):
quux = Lambda.Invoke(
'bar',
event['baz'])
DynamoDB.PutItem(
'quux',
quux)
def handler(event, context):
# do work
return result
Foo
ExternalState
Bar
DynamoDB
Lambda functions
iRobot 2017 | 35@ben11kehoe
/register Check
Cert
Robots to
Register Queue
Reader
Logging Lifecycle event
PermissionsCreate Shadow
Dead Letter Queue
AWS
IoT
API Gateway
Queues
AWS IoT operations
iRobot 2017 | 36@ben11kehoe
•Call graph → component graph
•Distributed system thinking• Traditionally occurs at system
boundaries
• Serverless: must be treated systematically
•Build robust-by-design systems• Better
Serverless architecture
iRobot 2017 | 40@ben11kehoe
Request IoT topic
Presigned URL
Symmetric
encryption key
Robot
public key
Response IoT topic
Presigned URL
Object encrypted with
symmetric key
HTTPS call
to presigned URL
S3
Key
Management
Service
Robot
certificate
S3
blob store
+
symmetric key
encrypted with
robot public key
Lambda handler
iRobot 2017 | 43@ben11kehoe
Serverless deployment at iRobot
3
iRobot 2017 | 44@ben11kehoe
Then Now Future2015
iRobot 2017 | 45@ben11kehoe
iRobot 2017 | 46@ben11kehoe
•Collective noun for cats: "clowder"
•Designated cat herder: CloudFormation
•Custom resource Lambdas• Github repo: iRobotCorporation/cfnlambda
Deployment tool: cloudr
Source
Zipped
functions
S3
blob
store
CFN
template
CFN
stack
Custom
resource
Lambda
iRobot 2017 | 47@ben11kehoe
Red/Black deployments
iRobot 2017 | 48@ben11kehoe
•Serverless = no overhead to red/back
• IoT makes things tricky
•Data stores, etc. have separate life cycle
Hosting multiple versions
iRobot 2017 | 49@ben11kehoe
Deployed system architecture
Application Proxy resources Resources
v1
v2
v3
v4
A1 B1
C1
B2
D1
A1 B1
C1
B2A1
C1
D1
A1
C1
B2
iRobot 2017 | 53@ben11kehoe
Serverless operations at iRobot
5
iRobot 2017 | 54@ben11kehoe
• Sumo Logic
• Essential for log sleuthing
• Get all data associated with an artifact immediately across all accounts
• Provides quantitative metrics on fleet health
• Alarms and notifications
• Extensive use of CloudWatch as well
Monitoring
iRobot 2017 | 58@ben11kehoe
• Same environment in the platform as production
• Orders of magnitude more churn• Exercises the account limits• Tests metrics to determine relevance and
meaning
• Bonus – Developer activity provides additional visibility into how the platform is currently behaving• Higher volume of deployments in many
different AWS accounts means problems found quickly
• This can alert us prior to problems hitting prod
DevOpsDevelopers can be platform testers, canaries, and guinea pigs
iRobot 2017 | 59@ben11kehoe
Metrics are our portal : Example – AWS IoT• AWS IoT today has a ton of metrics
• At launch, it had <10
• Without throttling metrics, thing shadow updates, or WebSocketmetrics it was hard to debug issues• Especially early on with small numbers
of robots
• Can I connect? How many publishes?
• Load scale, are we over our limits?
Visibility
iRobot 2017 | 60@ben11kehoe
• Enterprise Support has been a valuable resource• They are our eyes and ears within AWS
• Engage with them to run load tests, understand account limits
• Our AWS Support team has made the effort to understand our technology choices
• All of our AWS users, company-wide, benefit from being able to create tickets
AWS Enterprise Support
iRobot 2017 | 61@ben11kehoe
• Metrics, metrics, metrics• Service teams are always on the lookout
for which new metrics to include – connect with them and share your requests!
• Kinesis shard-level metrics, Lambda iterator ages, all added with user input and makes a real difference in understanding system performance
• Personal Health Dashboard• Per-account service health means AWS
can update those affected customers more directly
• When performance is degraded, status is important for ops to show evidence that it isn’t a problem with our software
The future of improved AWS visibilityLooking toward the horizon
iRobot 2017 | 62@ben11kehoe
Serverless organizations
6
iRobot 2017 | 63@ben11kehoe
•Heed the warning
• Information flow is different in serverless architecture
•Organization must change for architecture to succeed
Conway's Law
https://commons.wikimedia.org/wiki/File:Gospers_glider_gun.gif
iRobot 2017 | 64@ben11kehoe
•Servers → serverless is like on-prem → cloud
•Easier overall and in most respects• But also includes new challenges
•Outsourcing doesn’t mean you do zero work
•Being clear about this organizationally is critical
DiffOps
iRobot 2017 | 65@ben11kehoe
•No provider is immune to problems
•Small effects are more common than big outages
•More services = blips could be encountered more frequently
•This comes with the territory• Set expectations internally
• Architecting robustly is key
The cloud has weather
iRobot 2017 | 66@ben11kehoe
• You only know what the provider tells you• Architecture• Security• Operations
• How do they actually do all of the stuff they do?
• Many known unknowns and unknown unknowns
• Unknown unknown unknowns: what you don’t know that they don’t know they don’t know
Visibility
iRobot 2017 | 67@ben11kehoe
•First: gather data
•Root cause: our code or platform?
•Own the impact to your customers
•Diagnose your applications' handling of incident• Live and postmortem
•Aftermath
Reacting to incidents
iRobot 2017 | 68@ben11kehoe
Summing up
7
iRobot 2017 | 69@ben11kehoe
Serverless is:
•Cheaper
•Faster
•Leaner
•Better
Serverless
Serverless-ness goes with:
•Service-full + emphemeral compute
•Resources billed → resources used
•Smaller, more abstract control plane
iRobot 2017 | 70@ben11kehoe
•Successfully transitioned from turnkey to application built on public cloud
•Skipped learning to build elastic cloud infrastructure
•Fully serverless production application
Serverless at iRobot
iRobot 2017 | 71@ben11kehoe
•Serverless deployment is still not a solved problem
•Call graph → component graph
•Visibility is the biggest operations obstacle
Lessons learned
iRobot 2017 | 72@ben11kehoe
•Conway's Law
•Cloud has weather
•Set expectations
•Focus on TCO
Serverless organizations
Questions?