Dev ops and safety critical systems
DevOps and Safety
Critical Systems
LEN BASS
Overview
DevOps: What and why
Architecting for Continuous Deployment
Basis for Partial Continuous Deployment
Partial Continuous Deployment
copyright 2015 Len Bass
Is DevOps for you?
DevOps is a set of practices intended to reduce
the time to market for new features.
Question: How much are you willing to pay to
reduce the time to market for your systems?
Installing DevOps practices takes time and
people.
Partial Continuous Deployment
I will propose something I am calling “partial
continuous deployment”. It involves
Rearchitecting an existing system
Utilizing formal methods to verify the isolation
of safety critical portions.
Convincing yourself and regulators the system
is as safe as existing systems.
Traditional development
[Figure: timeline. Board or marketing has idea, then developers implement, then operators place in production.]
As Software Engineers our view is that there are the
following activities in software development
Requirements
Design
Implementation
Test
Code Complete
Different methodologies will organize these activities in
different ways.
Agile focuses on getting to Code Complete faster than
with other methods.
Where Does the Time Go?
What is wrong?
Code Complete → Code in Production
Between the completion of the code and the
placing of the code into production is a step
called: Deployment
Deploying completed code can be very time
consuming because of concern about errors
that could occur.
Deployment pipeline - build
Developer creates and tests code on local machine.
Checks code into a version control system
Continuous integration server (CI) builds the system and
runs a series of integration tests.
[Figure: deployment pipeline. Pre-commit tests, commit, build image and perform integration tests, UAT / staging / performance tests, deploy to production, promote to normal production.]
Deployment pipeline – staging
and production
After passing the tests, the system is promoted
to a staging environment where it undergoes
more tests including performance, security,
and user acceptance tests.
After passing staging tests, the system is
promoted to provisional production where it
undergoes even more tests.
The system is finally promoted to normal
production but the tests do not necessarily
stop.
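The promotion rule described above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: the stage names follow the slides, while `run_pipeline`, the commit names, and the check functions are hypothetical stand-ins for actual test suites.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check; the checks are hypothetical stand-ins
# for real test suites.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(commit: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Promote a commit stage by stage; stop at the first failing stage."""
    passed: List[str] = []
    for name, check in stages:
        if not check(commit):
            return False, passed  # error found: correct it before promoting
        passed.append(name)
    return True, passed


stages: List[Stage] = [
    ("integration", lambda c: True),
    ("staging", lambda c: "slow" not in c),  # stands in for a performance test
    ("provisional production", lambda c: True),
    ("normal production", lambda c: True),
]

ok, reached = run_pipeline("fix-login", stages)
blocked, reached_blocked = run_pipeline("slow-query", stages)
```

In this sketch the second commit passes integration but is stopped at staging, so it never reaches either production stage.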
Errors can be discovered at any stage
in the pipeline
Every error must either be corrected or prevented.
Preventing errors can be done through some combination
of
Process
Architecture
Tooling
Coordination among teams.
Coordination takes time.
Correcting errors takes time.
Goal of DevOps
The goal of DevOps is to reduce the time to
market without compromising quality by
Reducing the number of errors that occur
during the placing of your code into
production
Reducing the time for correcting errors that
occur
Minimizing the necessity for coordination
among teams
DevOps is a set of practices intended to reduce the time
between committing a change to a system and the change
being placed into normal production, while ensuring high
quality.*
DevOps practices involve developers and operators’
processes, architectures, and tools.
DevOps is also a movement – like agile
*DevOps: A Software Architect’s Perspective
What is DevOps?
TEAR DOWN THAT WALL!!
Categories of DevOps Practices
1. Make Dev more responsible for incident handling
2. Enforce deployment practices uniformly across both dev
and ops
3. Use continuous deployment
4. Develop infrastructure code using same processes as
application code
Overview
DevOps: What and why
Architecting for Continuous Deployment
Basis for Partial Continuous Deployment
Partial Continuous Deployment
Goal of Continuous Deployment
Allow developers to deploy to production
without the necessity for coordination. I.e. an
individual commit can go into production
regardless of the state of other development
activities.
All tests are automated, and the system is promoted
from one stage to the next in the deployment
pipeline when it passes that stage's tests.
Application to safety critical systems
Automated testing is inadequate for safety
critical systems.
Proposal: Manually test only the safety critical
portions of the system. Other portions can have
automated testing. Safety critical portions are a
small percentage of the total system.
Wait just a minute!!
Question: How can you be sure that non safety
critical portions do not have an impact on the
safety critical portions?
Answer: I will get to that.
Architecting for continuous
deployment
Base your system on “microservice architecture” style.
A microservice architecture is
A collection of independently deployable processes
Packaged as services
Communicating only via messages
It is a stripped down version of Service Oriented Architecture (SOA)
~2002 Amazon instituted the
following design rules - 1
All teams will henceforth expose their data and functionality through service interfaces.
Teams must communicate with each other
through these interfaces.
There will be no other form of inter-process
communication allowed: no direct linking, no
direct reads of another team’s data store, no
shared-memory model, no back-doors
whatsoever. The only communication
allowed is via service interface calls over the
network.
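A small sketch of what "communicate only through service interfaces" can look like. Everything here is hypothetical: `InventoryService`, `OrderService`, and the message fields are illustrative names, and the network call is simulated by passing JSON strings in-process so that the example stays runnable.

```python
import json


class InventoryService:
    """Owns its data store; other teams never read it directly."""

    def __init__(self) -> None:
        self._stock = {"widget": 3}  # private to this service

    def handle(self, request: str) -> str:
        """The service interface: JSON request in, JSON response out."""
        message = json.loads(request)
        if message.get("op") == "get_stock":
            item = message.get("item")
            return json.dumps({"item": item, "count": self._stock.get(item, 0)})
        return json.dumps({"error": "unknown operation"})


class OrderService:
    """Depends on inventory only through a message-passing callable."""

    def __init__(self, send_to_inventory) -> None:
        self._send = send_to_inventory  # stands in for a network call

    def can_fulfil(self, item: str, quantity: int) -> bool:
        request = json.dumps({"op": "get_stock", "item": item})
        reply = json.loads(self._send(request))
        return reply.get("count", 0) >= quantity


inventory = InventoryService()
orders = OrderService(inventory.handle)
```

Because `OrderService` sees only messages, the inventory team could change its storage or technology without coordinating with the order team, as long as the interface is preserved.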
Amazon design rules - 2
It doesn’t matter what technology they [services] use.
All service interfaces, without exception, must be
designed from the ground up to be externalizable.
Amazon is providing the specifications for the
“Microservice Architecture”.
In Addition
Amazon has a “two pizza” rule.
No team should be larger than can be fed with two pizzas (~7
members).
Each (micro) service is the responsibility
of one team
This means that microservices are
small and intra team bandwidth
is high
Large systems are made up of many microservices.
There may be as many as 140 in a typical Amazon page.
Micro service architecture
Each user request is satisfied by some
sequence of services.
Most services are not externally
available.
Each service communicates with
other services through service
interfaces.
Service depth may be
– Shallow (large fan out)
– Deep (small fan out, more
dependent services)
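To make "shallow" and "deep" concrete, here is a sketch that measures depth and fan out on two call graphs. The graphs and service names are invented for illustration, and the `depth` function assumes the call graph has no cycles.

```python
from typing import Dict, List

# Hypothetical call graphs: each service maps to the services it calls.
shallow: Dict[str, List[str]] = {
    "page": ["cart", "search", "ads", "reviews"],
    "cart": [], "search": [], "ads": [], "reviews": [],
}
deep: Dict[str, List[str]] = {
    "page": ["cart"],
    "cart": ["pricing"],
    "pricing": ["tax"],
    "tax": [],
}


def depth(graph: Dict[str, List[str]], service: str) -> int:
    """Length of the longest call chain starting at `service` (assumes no cycles)."""
    callees = graph[service]
    if not callees:
        return 1
    return 1 + max(depth(graph, callee) for callee in callees)


def max_fan_out(graph: Dict[str, List[str]]) -> int:
    """Largest number of services called directly by any one service."""
    return max(len(callees) for callees in graph.values())
```

Here `shallow` has depth 2 with a fan out of 4, while `deep` has depth 4 with a fan out of 1.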
How does Microservice Architecture
reduce requirements for coordination?
Coordination decisions can be made
incrementally as system evolves or
be built into the architecture.
Microservice architecture builds most coordination
decisions into architecture
Consequently they only need to be made once for a
system, not once per release.
Is Microservice Architecture sufficient
for continuous deployment?
No. There are other architectural techniques that should
be used.
See http://www.slideshare.net/lenbass/deployability for
more information
Overview
DevOps: What and why
Architecting for Continuous Deployment
Basis for Partial Continuous Deployment
Partial Continuous Deployment
Partial Continuous Deployment
Identify and isolate safety critical portions of an
architecture
Use continuous deployment for non safety
critical portions
Use traditional testing methods for safety critical
portions
Based on two past efforts
Smart Grid security controls
Hardening the deployment pipeline
Smart Grid Security Controls
ASAP SG was a public-private effort to accelerate the adoption of security for smart grid technologies.
50% government – SEI, Oak Ridge National Lab
50% private – American Electric Power, Consumers
Energy, Florida Power & Light, Southern California
Edison
Operated under the auspices of UCA International Users
Group
ASAP SG output
ASAP SG produced “security profiles” for various portions of the Smart Grid.
The process was
Produce a logical architecture through identifying
Roles within the system
Use cases
Communication topology
Use this logical architecture to identify controls to mitigate vulnerabilities
Process documented in http://osgug.ucaiug.org/utilisec/Shared%20Documents/Security%20Profile%20Blueprint/Security_Profile_Blueprint_-_v1_0_-_20101006.pdf
Wide Area Management and Control
Communications Topology
Application to partial continuous
deployment
Observe that in the communications topology
there is no discussion of electric functions, billing
function, or most of the functions of the system.
The focus is on places where security might be
compromised.
In partial continuous deployment, there is a
step to identify a logical architecture that has
roles with safety critical functions.
Hardening Deployment Pipeline
PhD research of Paul Rimba, who received his PhD
(Building High Assurance Secure Applications using
Security Patterns for Capability-based Platforms) from the
University of New South Wales in 2016.
He examined the Jenkins build server from the perspective
of security.
This work is reported in
https://www.computer.org/csdl/proceedings/releng/2015/7070/00/7070a004-abs.html
Process for hardening Jenkins
1. Identify security requirements
2. Create logical architecture
3. Use model checking to identify which components must
be trustworthy from a security perspective
4. Can these components really be trusted?
   1. Yes: done.
   2. No: refactor these components into smaller pieces.
5. Repeat from step 3.
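The loop in steps 3 to 5 can be sketched as follows. This shows only the shape of the iteration, under loud assumptions: the model checker is replaced by a simple flag (`touches_secret`), "can be trusted" is reduced to a size threshold, and `refactor` assumes half of a component can be split off each round. None of these stand-ins come from the original research.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    size: int             # hypothetical measure, e.g. thousands of lines
    touches_secret: bool  # stand-in for what a model checker would report


MAX_TRUSTED_SIZE = 5  # assumption: small enough to review thoroughly


def must_be_trusted(components: List[Component]) -> List[Component]:
    # Stand-in for step 3; a real model checker would derive this set.
    return [c for c in components if c.touches_secret]


def refactor(c: Component) -> List[Component]:
    # Stand-in for step 4 "No": split off the part that does not touch the secret.
    half = c.size // 2
    return [
        Component(c.name + "-core", c.size - half, True),
        Component(c.name + "-rest", half, False),
    ]


def harden(components: List[Component]) -> List[Component]:
    while True:
        too_big = [c for c in must_be_trusted(components) if c.size > MAX_TRUSTED_SIZE]
        if not too_big:
            return components  # step 4 "Yes": done
        target = too_big[0]
        components = [c for c in components if c != target] + refactor(target)


result = harden([Component("builder", 16, True), Component("ui", 40, False)])
trusted = must_be_trusted(result)
```

In this toy run the 16-unit `builder` is split twice, leaving one trusted component of size 4 while the total size of the system is unchanged.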
Output of process
Set of components that deserve to be trusted
Verification that with these trusted components, the
architecture is, in fact, secure.
Hardened Jenkins architecture
[Figure: hardened Jenkins deployment pipeline, divided into a trusted environment and an untrusted environment.
Components: Orchestrator, Code Retriever, Unit Tester, Artifact Builder, Image Builder, Image Verifier, Image Archiver, Deployer.
External stores and services: application code repository (GitHub); Infrastructure-as-Code repository (GitHub) holding image specifications and Opscode Chef recipes; image storage (Amazon S3); AWS OpsWorks.
Steps shown: application code committed to repository; trigger each step of build sequence; pull application source code from repository; run unit tests on source code; build application artifacts; build image containing application and its dependencies; verify image creation, compute image checksum; push image to image storage; pull image from image storage, verify image checksum; deploy image to testing/production environment on AWS OpsWorks; run Chef recipe to deploy image to OpsWorks VM instances.
a) Testing environment: run application tests. All tests passed? No: operator notified about test failure. Yes: deployment continues.
b) Production environment: app starts serving requests; new app version deployed to production.]
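The pipeline computes an image checksum after the build and verifies it again when the image is pulled for deployment. A minimal sketch of that check follows. The slides do not name a hash algorithm, so SHA-256 is an assumption, and the `storage` dictionary is a stand-in for the image storage; the function names are hypothetical.

```python
import hashlib


def compute_checksum(image: bytes) -> str:
    """Return the SHA-256 digest of the image as a hex string (algorithm assumed)."""
    return hashlib.sha256(image).hexdigest()


def verify_image(image: bytes, expected: str) -> bool:
    """True only if the pulled image matches the checksum recorded at build time."""
    return compute_checksum(image) == expected


storage = {}  # stands in for image storage

# Build side: record the checksum, then push the image.
built = b"application + dependencies"
recorded = compute_checksum(built)
storage["app-image"] = built

# Deploy side: pull the image and verify it before deploying.
pulled = storage["app-image"]
intact = verify_image(pulled, recorded)
tampered = verify_image(pulled + b"!", recorded)
```

The check is only meaningful if the recorded checksum is kept where the untrusted environment cannot modify it.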
Application to partial continuous
deployment
Explicit identification of security requirements
Use of model checking to identify trustworthy
components
Determination of whether trustworthy
components should be trusted.
Overview
DevOps: What and why
Architecting for Continuous Deployment
Basis for Partial Continuous Deployment
Partial Continuous Deployment
Partial Continuous Deployment
Process
1. Explicitly state safety requirements. E.g. through FMEA
2. Create logical architecture for target system
3. Use model checking of architecture to identify
components that must be safe for system to be safe.
4. Refactor architecture until safe components are “sufficiently small”
5. Use continuous deployment for components that may
be unsafe
6. Test safe components in normal fashion.
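Steps 3, 5, and 6 can be illustrated with a simple partition of a logical architecture. This sketch swaps model checking for a much weaker technique, reachability over an "influences" graph, and every component name is invented; it shows the intended outcome of the process, not a way to establish safety.

```python
from typing import Dict, List, Set

# Hypothetical logical architecture: influences[a] lists the components
# whose behavior depends on a. All names are illustrative.
influences: Dict[str, List[str]] = {
    "sensor-io": ["brake-control"],
    "brake-control": [],
    "telemetry": ["dashboard"],
    "dashboard": [],
    "config": ["sensor-io", "dashboard"],
}
safety_functions = {"brake-control"}  # e.g. taken from an FMEA


def must_be_safe(graph: Dict[str, List[str]], critical: Set[str]) -> Set[str]:
    """Safety functions plus every component that can influence one of them."""
    result = set(critical)
    changed = True
    while changed:
        changed = False
        for source, targets in graph.items():
            if source not in result and any(t in result for t in targets):
                result.add(source)
                changed = True
    return result


safe = must_be_safe(influences, safety_functions)
plan = {
    name: "traditional testing" if name in safe else "continuous deployment"
    for name in influences
}
```

In this example `config` ends up in the safe set because it influences `sensor-io`, which suggests why step 4 matters: refactoring `config` into separate pieces could shrink the portion that needs traditional testing.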
Caveat
Partial continuous deployment is a proposal.
It has never been tested or implemented.
Gates to implementation (technical)
1. Choose existing system to replicate
2. Make explicit safety requirements
3. Create logical architecture for existing system
4. Model check logical architecture to determine components that are
required to be safe
5. Refine these components until they are as small as possible.
6. Refactor small number of remaining components into microservice
architecture
7. Create test cases for components that are not required to be safe
8. Set up deployment pipeline
9. Implement modified components
10. Manually test components that are required to be safe
Gates to implementation (non-
technical)
Convince regulators that dividing the architecture
into one portion required to be safe and
another portion not required to be safe is a viable
strategy.
Run test system in parallel with actual system in
order to track problems and compare behavior.
Summary
DevOps is a set of practices intended to reduce
time to market
Continuous deployment is one such practice
Partial continuous deployment is a proposal to
adapt continuous deployment to safety critical
systems
The path to production of partial continuous
deployment requires convincing regulators of
the safety of the resulting system.
More Information
Contact [email protected]
DevOps: A Software Architect’s
Perspective is available from your favorite bookseller