Dev ops and safety critical systems


Page 1: Dev ops and safety critical systems

DevOps and Safety Critical Systems

LEN BASS

Page 2: Dev ops and safety critical systems

Overview

DevOps: What and why

Architecting for Continuous Deployment

Basis for Partial Continuous Deployment

Partial Continuous Deployment

copyright 2015 Len Bass

Page 3: Dev ops and safety critical systems

Is DevOps for you?

DevOps is a set of practices intended to reduce the time to market for new features.

Question: How much are you willing to pay to reduce the time to market for your systems?

Installing DevOps practices takes time and people.

Page 4: Dev ops and safety critical systems

Partial Continuous Deployment

I will propose something I am calling “partial continuous deployment”. It involves

Rearchitecting an existing system.

Utilizing formal methods to verify the isolation of safety critical portions.

Convincing yourself and regulators the system is as safe as existing systems.

Page 5: Dev ops and safety critical systems

Traditional development

[Figure: timeline. Board or marketing has idea, then developers implement, then operators place in production.]

Page 6: Dev ops and safety critical systems

Where Does the Time Go?

As software engineers, our view is that software development consists of the following activities:

Requirements

Design

Implementation

Test

Code Complete

Different methodologies will organize these activities in different ways.

Agile focuses on getting to Code Complete faster than with other methods.

[Figure: the “Developers implement” phase of the timeline]

Page 7: Dev ops and safety critical systems

What is wrong?

Code Complete Code in Production

Between the completion of the code and the placing of the code into production is a step called: Deployment

Deploying completed code can be very time consuming because of concern about errors that could occur.

Page 8: Dev ops and safety critical systems

Deployment pipeline - build

Developer creates and tests code on local machine.

Checks code into a version control system.

Continuous integration server (CI) builds the system and runs a series of integration tests.

[Figure: deployment pipeline. Developers run pre-commit tests and commit; then build image and perform integration tests; UAT / staging / performance tests; deploy to production; promote to normal production.]

Page 9: Dev ops and safety critical systems

Deployment pipeline – staging and production

After passing the tests, the system is promoted to a staging environment where it undergoes more tests including performance, security, and user acceptance tests.

After passing staging tests, the system is promoted to provisional production where it undergoes even more tests.

The system is finally promoted to normal production but the tests do not necessarily stop.
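The promotion logic described on the two pipeline slides can be sketched roughly as follows. This is only an illustration: the `Stage` class, the `promote` function, the build identifiers, and the two placeholder checks are all hypothetical names, not part of any real CI tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    tests: List[Callable[[str], bool]]  # each test takes a build id and returns pass/fail


def promote(build_id: str, stages: List[Stage]) -> Optional[str]:
    """Walk a build through the stages in order.

    Returns the name of the last stage whose tests all passed,
    or None if the build failed in the very first stage.
    """
    reached = None
    for stage in stages:
        if not all(test(build_id) for test in stage.tests):
            break  # an error was discovered; the build is not promoted further
        reached = stage.name
    return reached


# Hypothetical placeholder checks, standing in for real test suites.
def always_pass(build_id: str) -> bool:
    return True


def performance_budget(build_id: str) -> bool:
    return not build_id.endswith("-slow")


PIPELINE = [
    Stage("integration", [always_pass]),
    Stage("staging", [always_pass, performance_budget]),
    Stage("provisional production", [always_pass]),
    Stage("normal production", []),
]
```

With these assumed checks, `promote("build-1", PIPELINE)` reaches normal production, while a build that fails the staging performance check stops at the integration stage.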

Page 10: Dev ops and safety critical systems

Errors can be discovered at any stage in the pipeline

Every error must either be corrected or prevented.

Preventing errors can be done through some combination of

Process

Architecture

Tooling

Coordination among teams.

Coordination takes time.

Correcting errors takes time.

Page 11: Dev ops and safety critical systems

Goal of DevOps

The goal of DevOps is to reduce the time to market without compromising quality by

Reducing the number of errors that occur during the placing of your code into production

Reducing the time for correcting errors that occur

Minimizing the necessity for coordination among teams

Page 12: Dev ops and safety critical systems

What is DevOps?

DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.*

DevOps practices involve developers and operators’ processes, architectures, and tools.

DevOps is also a movement – like agile

*DevOps: A Software Architect’s Perspective

[Image: “TEAR DOWN THAT WALL!!”]

Page 13: Dev ops and safety critical systems

Categories of DevOps Practices

1. Make Dev more responsible for incident handling

2. Enforce deployment practices uniformly across both dev and ops

3. Use continuous deployment

4. Develop infrastructure code using same processes as application code

Page 14: Dev ops and safety critical systems

Overview

DevOps: What and why

Architecting for Continuous Deployment

Basis for Partial Continuous Deployment

Partial Continuous Deployment


Page 15: Dev ops and safety critical systems

Goal of Continuous Deployment

Allow developers to deploy to production without the necessity for coordination, i.e., an individual commit can go into production regardless of the state of other development activities.

All tests are automated, and the system is promoted from one stage to the next in the deployment pipeline when it passes its tests.

Page 16: Dev ops and safety critical systems

Application to safety critical systems

Automated testing is inadequate for safety critical systems.

Proposal: Only manually test the safety critical portions of the system. Other portions can have automated testing. Safety critical portions are a small percentage of the total system.

Page 17: Dev ops and safety critical systems

Wait just a minute!!

Question: How can you be sure that non safety critical portions do not have an impact on the safety critical portions?

Answer: I will get to that.

Page 18: Dev ops and safety critical systems

Architecting for continuous deployment

Base your system on “microservice architecture” style.

A microservice architecture is

A collection of independently deployable processes

Packaged as services

Communicating only via messages

It is a stripped down version of Service Oriented Architecture (SOA)
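The “communicating only via messages” property can be illustrated with a small in-process sketch. Everything here is an assumption made for illustration: the `MessageBus` class stands in for the network, and the `inventory` and `orders` services are hypothetical. The JSON round trip is there to show that only serialized data, never shared objects, crosses a service boundary.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class MessageBus:
    """Stand-in for the network: services are reachable only by name,
    and only serialized messages cross the boundary."""

    def __init__(self) -> None:
        self._services: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._services[name] = handler

    def call(self, name: str, request: dict) -> dict:
        # Round-trip through JSON so no object references are shared.
        wire_request = json.loads(json.dumps(request))
        response = self._services[name](wire_request)
        return json.loads(json.dumps(response))


def make_inventory_service() -> Handler:
    stock = {"widget": 3}  # private data store, invisible to other services

    def handle(msg: dict) -> dict:
        return {"sku": msg["sku"], "in_stock": stock.get(msg["sku"], 0) > 0}

    return handle


def make_order_service(bus: MessageBus) -> Handler:
    def handle(msg: dict) -> dict:
        # The order service cannot read the stock table; it must ask.
        availability = bus.call("inventory", {"sku": msg["sku"]})
        status = "accepted" if availability["in_stock"] else "rejected"
        return {"sku": msg["sku"], "status": status}

    return handle


bus = MessageBus()
bus.register("inventory", make_inventory_service())
bus.register("orders", make_order_service(bus))
```

Because the order service depends only on the message format, the inventory service could be redeployed with a different internal data store without any change to the order service.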

Page 19: Dev ops and safety critical systems

~2002 Amazon instituted the following design rules - 1

All teams will henceforth expose their data and functionality through service interfaces.

Teams must communicate with each other through these interfaces.

There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

Page 20: Dev ops and safety critical systems

Amazon design rules - 2

It doesn’t matter what technology they [services] use.

All service interfaces, without exception, must be designed from the ground up to be externalizable.

Amazon is providing the specifications for the “Microservice Architecture”.

Page 21: Dev ops and safety critical systems

In Addition

Amazon has a “two pizza” rule.

No team should be larger than can be fed with two pizzas (~7 members).

Each (micro)service is the responsibility of one team.

This means that microservices are small and intra-team bandwidth is high.

Large systems are made up of many microservices.

There may be as many as 140 in a typical Amazon page.

Page 22: Dev ops and safety critical systems

Microservice architecture

Each user request is satisfied by some sequence of services.

Most services are not externally available.

Each service communicates with other services through service interfaces.

Service depth may be

– Shallow (large fan out)

– Deep (small fan out, more dependent services)
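The shallow versus deep distinction can be made concrete by measuring fan out and depth on a call graph. The sketch below assumes the graph is acyclic and is given as a plain dictionary; the service names in `SHALLOW` and `DEEP` are hypothetical.

```python
from typing import Dict, List

CallGraph = Dict[str, List[str]]  # service name -> services it calls directly


def fan_out(graph: CallGraph, service: str) -> int:
    """Number of services that `service` calls directly."""
    return len(graph.get(service, []))


def depth(graph: CallGraph, service: str) -> int:
    """Number of services on the longest call chain starting at `service`.

    Assumes the call graph has no cycles.
    """
    callees = graph.get(service, [])
    if not callees:
        return 1
    return 1 + max(depth(graph, callee) for callee in callees)


# Shallow: one service fans out to many leaf services.
SHALLOW: CallGraph = {"page": ["ads", "cart", "reviews", "search"]}

# Deep: each service calls one further service.
DEEP: CallGraph = {
    "page": ["checkout"],
    "checkout": ["payment"],
    "payment": ["fraud"],
}
```

In the shallow example the entry service has a fan out of 4 and a depth of 2; in the deep example it has a fan out of 1 and a depth of 4.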

Page 23: Dev ops and safety critical systems

How does Microservice Architecture reduce requirements for coordination?

Coordination decisions can be made incrementally as the system evolves or be built into the architecture.

Microservice architecture builds most coordination decisions into the architecture.

Consequently they only need to be made once for a system, not once per release.

Page 24: Dev ops and safety critical systems

Is Microservice Architecture sufficient for continuous deployment?

No. There are other architectural techniques that should be used.

See http://www.slideshare.net/lenbass/deployability for more information

Page 25: Dev ops and safety critical systems

Overview

DevOps: What and why

Architecting for Continuous Deployment

Basis for Partial Continuous Deployment

Partial Continuous Deployment


Page 26: Dev ops and safety critical systems

Partial Continuous Deployment

Identify and isolate safety critical portions of an architecture

Use continuous deployment for non safety critical portions

Use traditional testing methods for safety critical portions

Page 27: Dev ops and safety critical systems

Based on two past efforts

Smart Grid security controls

Hardening the deployment pipeline


Page 28: Dev ops and safety critical systems

Smart Grid Security Controls

ASAP SG was a public-private effort to accelerate the adoption of security for smart grid technologies.

50% government – SEI, Oak Ridge National Lab

50% private – American Electric Power, Consumers Energy, Florida Power & Light, Southern California Edison

Operated under the auspices of UCA International Users Group

Page 29: Dev ops and safety critical systems

ASAP SG output

ASAP SG produced “security profiles” for various portions of the Smart Grid.

The process was

Produce a logical architecture through identifying

Roles within the system

Use cases

Communication topology

Use this logical architecture to identify controls to mitigate vulnerabilities

The process is documented in http://osgug.ucaiug.org/utilisec/Shared%20Documents/Security%20Profile%20Blueprint/Security_Profile_Blueprint_-_v1_0_-_20101006.pdf

Page 30: Dev ops and safety critical systems

Wide Area Management and Control Communications Topology

[Figure: communications topology diagram]

Page 31: Dev ops and safety critical systems

Application to partial continuous deployment

Observe that in the communications topology there is no discussion of electric functions, billing functions, or most of the functions of the system.

The focus is on places where security might be compromised.

In partial continuous deployment, there is a step to identify a logical architecture that has roles with safety critical functions.

Page 32: Dev ops and safety critical systems

Hardening Deployment Pipeline

PhD research of Paul Rimba, who received his PhD (Building High Assurance Secure Applications using Security Patterns for Capability-based Platforms) from the University of New South Wales in 2016.

He examined the Jenkins build server from the perspective of security.

This work is reported in https://www.computer.org/csdl/proceedings/releng/2015/7070/00/7070a004-abs.html

Page 33: Dev ops and safety critical systems

Process for hardening Jenkins

1. Identify security requirements

2. Create logical architecture

3. Use model checking to identify which components must be trustworthy from a security perspective

4. Can these components really be trusted?

a. Yes: done.

b. No: refactor these components into smaller pieces.

5. After refactoring, repeat from step 3.
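The loop in steps 3 to 5 can be sketched as below. This is not the method used in the cited research: the model-checking step is replaced by a simple set intersection, and “can be trusted” is replaced by an assumed size limit. The component and operation names are hypothetical.

```python
from typing import Dict, Set, Tuple

Architecture = Dict[str, Set[str]]  # component name -> operations it performs


def must_be_trusted(arch: Architecture, sensitive: Set[str]) -> Set[str]:
    """Stand-in for step 3: a component must be trusted if it performs
    any security-sensitive operation. A real process would use a model checker."""
    return {name for name, ops in arch.items() if ops & sensitive}


def can_be_trusted(ops: Set[str], max_ops: int) -> bool:
    """Stand-in for step 4: assume a component is reviewable if it is small."""
    return len(ops) <= max_ops


def refactor(arch: Architecture, name: str, sensitive: Set[str]) -> Architecture:
    """Split one component into a sensitive core and an untrusted remainder."""
    ops = arch[name]
    result = {k: v for k, v in arch.items() if k != name}
    result[name + "-core"] = ops & sensitive
    rest = ops - sensitive
    if rest:
        result[name + "-rest"] = rest
    return result


def harden(arch: Architecture, sensitive: Set[str],
           max_ops: int) -> Tuple[Architecture, Set[str]]:
    """Repeat steps 3 to 5 until every must-be-trusted component is small enough."""
    while True:
        trusted = must_be_trusted(arch, sensitive)
        too_big = sorted(n for n in trusted if not can_be_trusted(arch[n], max_ops))
        if not too_big:
            return arch, trusted
        candidate = too_big[0]
        if arch[candidate] <= sensitive:
            # Only sensitive operations remain, so splitting cannot help.
            raise ValueError(f"{candidate} cannot be split further")
        arch = refactor(arch, candidate, sensitive)


EXAMPLE: Architecture = {
    "build-server": {"fetch-code", "run-tests", "sign-image", "render-ui"},
    "dashboard": {"render-ui"},
}
```

Calling `harden(EXAMPLE, {"sign-image"}, max_ops=2)` splits the hypothetical build server once and leaves a single small trusted component that performs only the sensitive operation.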

Page 34: Dev ops and safety critical systems

Output of process

Set of components that deserve to be trusted

Verification that with these trusted components, the architecture is, in fact, secure.

Hardened Jenkins architecture

[Figure: hardened Jenkins deployment pipeline, divided into a trusted environment and an untrusted environment.

Components: Orchestrator, Code Retriever, Unit Tester, Artifact Builder, Image Builder, Image Verifier, Image Archiver, Deployer.

External systems: application code repository (GitHub); Infrastructure-as-Code repository (GitHub) holding image specifications and Opscode Chef recipes; image storage (Amazon S3); AWS OpsWorks.

Steps shown: application code committed to repository; trigger each step of build sequence; pull application source code from repository; run unit tests on source code; build application artifacts; build image containing application and its dependencies; verify image creation, compute image checksum; push image to image storage; deploy image to testing or production environment on AWS OpsWorks; pull image from image storage, verify image checksum; run Chef recipe to deploy image to OpsWorks VM instances. In the testing environment, application tests are run, and the operator is notified about a test failure if not all tests pass. In the production environment, the app starts serving requests and the new app version is deployed to production.]
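The diagram's “compute image checksum” and “verify image checksum” steps could look roughly like this. The choice of SHA-256 and the function names are assumptions; the slides do not say which checksum the hardened pipeline uses.

```python
import hashlib
import hmac


def image_checksum(image_bytes: bytes) -> str:
    """Compute a checksum when the image is created (SHA-256 is an assumption)."""
    return hashlib.sha256(image_bytes).hexdigest()


def verify_image(image_bytes: bytes, expected_checksum: str) -> bool:
    """Recompute the checksum after pulling the image from storage and
    compare it with the value recorded at build time."""
    return hmac.compare_digest(image_checksum(image_bytes), expected_checksum)
```

The point of the check is that the image passes through storage outside the trusted components, so the deployer should refuse an image whose checksum no longer matches the recorded value.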

Page 35: Dev ops and safety critical systems

Application to partial continuous deployment

Explicit identification of security requirements

Use of model checking to identify trustworthy components

Determination of whether trustworthy components should be trusted.

Page 36: Dev ops and safety critical systems

Overview

DevOps: What and why

Architecting for Continuous Deployment

Basis for Partial Continuous Deployment

Partial Continuous Deployment


Page 37: Dev ops and safety critical systems

Partial Continuous Deployment Process

1. Explicitly state safety requirements, e.g., through FMEA (failure mode and effects analysis)

2. Create logical architecture for target system

3. Use model checking of architecture to identify components that must be safe for system to be safe.

4. Refactor architecture until safe components are “sufficiently small”

5. Use continuous deployment for components that may be unsafe

6. Test safe components in normal fashion.
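Steps 3, 5, and 6 amount to partitioning components and then choosing a release path for each. A minimal sketch follows, with one loud assumption: simple dependency reachability is used in place of model checking, which is much weaker than what the proposal calls for. The component names are hypothetical.

```python
from typing import Dict, List, Set


def required_safe(depends_on: Dict[str, List[str]],
                  safety_functions: Set[str]) -> Set[str]:
    """Everything a safety function depends on, directly or transitively.

    Reachability is only a stand-in for the model checking in step 3.
    """
    required: Set[str] = set()
    stack = list(safety_functions)
    while stack:
        component = stack.pop()
        if component in required:
            continue
        required.add(component)
        stack.extend(depends_on.get(component, []))
    return required


def release_path(component: str, required: Set[str]) -> str:
    """Step 6 for components required to be safe, step 5 for the rest."""
    return "manual test" if component in required else "continuous deployment"


# Hypothetical system: one safety function and one business function.
DEPENDS_ON = {
    "breaker-control": ["telemetry"],
    "billing": ["telemetry", "invoices"],
    "telemetry": [],
    "invoices": [],
}
REQUIRED = required_safe(DEPENDS_ON, {"breaker-control"})
```

Here the shared telemetry component ends up on the manual-test path because the safety function depends on it, which is the kind of coupling that step 4 is meant to shrink through refactoring.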

Page 38: Dev ops and safety critical systems

Caveat

Partial continuous deployment is a proposal.

It has never been tested or implemented.

Page 39: Dev ops and safety critical systems

Gates to implementation (technical)

1. Choose existing system to replicate

2. Make explicit safety requirements

3. Create logical architecture for existing system

4. Model check logical architecture to determine components that are required to be safe

5. Refine these components until they are as small as possible.

6. Refactor small number of remaining components into microservice architecture

7. Create test cases for components that are not required to be safe

8. Set up deployment pipeline

9. Implement modified components

10. Manually test components that are required to be safe

Page 40: Dev ops and safety critical systems

Gates to implementation (non-technical)

Convince regulators that dividing the architecture into one portion required to be safe and another portion not required to be safe is a viable strategy.

Run test system in parallel with actual system in order to track problems and compare behavior.

Page 41: Dev ops and safety critical systems

Summary

DevOps is a set of practices intended to reduce time to market

Continuous deployment is one such practice

Partial continuous deployment is a proposal to adapt continuous deployment to safety critical systems

The path to production of partial continuous deployment requires convincing regulators of the safety of the resulting system.

Page 42: Dev ops and safety critical systems

More Information

Contact [email protected]

DevOps: A Software Architect’s Perspective is available from your favorite bookseller