Formally Verifying a File System: A Successful Failure
CSCI-P515/P415 Spring 2008
Michael Adams ([email protected])
Joseph Near ([email protected])
Aaron Kahn ([email protected])
Overview
Motivation
High Level Design
Approach
Minor Difficulties (and their solutions)
Major (Fatal) Difficulty (and explanation)
The Proposed Solution
Recap/Summation
Motivation
Our goal for this project was to attempt to formally verify a file system
  ◦ We were under the impression that this would be a straightforward task, and as long as the abstraction was simple, there wouldn't be any major problems
Limitations
Are doing:
  ◦ Can take a file number, and read/write to it
  ◦ Create/Delete files
Not doing:
  ◦ Directories
  ◦ File Names
  ◦ Permissions, Users, Groups, etc.
The stuff we're not doing can be added as an abstraction on what we are doing
Design
Develop a B-tree Structure
  ◦ The B-tree is actually serialized onto a disk
    ◦ Disk represented as an array of bytes
Create the B-tree algorithms
  ◦ insert, delete, lookup
Write the File System (read file, write file, create file, etc.) algorithms in terms of the B-tree algorithms.
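The layering described on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the project's Scheme or PVS code: the `Index` class is a hypothetical in-memory stand-in for the B-tree (it offers the same insert, delete, lookup operations but is not serialized to disk), and the file operation names are assumptions chosen to match the slide.

```python
class Index:
    """Hypothetical stand-in for the B-tree: same three operations, kept in memory."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def insert(self, key, value):
        # Return a new index instead of mutating, in state-passing style.
        return Index({**self.entries, key: value})

    def delete(self, key):
        return Index({k: v for k, v in self.entries.items() if k != key})

    def lookup(self, key):
        return self.entries.get(key)


# File system operations written only in terms of insert/delete/lookup.
def create_file(index, file_no):
    return index.insert(file_no, b"")


def write_file(index, file_no, data):
    if index.lookup(file_no) is None:
        raise KeyError(file_no)  # file must be created first
    return index.insert(file_no, data)


def read_file(index, file_no):
    return index.lookup(file_no)


def delete_file(index, file_no):
    return index.delete(file_no)
```

Because the file operations never look inside the index, the in-memory stand-in could in principle be swapped for a serialized B-tree without changing them.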
Process
Initially, we wrote the code in Scheme in order to have a fully working model of “live” code to test on, and then translated it into PVS
In PVS, the file system was abstracted all the way down to a disk representation to allow for better simulation of real problems of writing file systems
  ◦ This turned out to be essential to our learning the difficulties of actually verifying a file system
Additional Structures
In addition to the B-tree, we found that these auxiliary structures were needed
  ◦ A free list
  ◦ Blocks that represent files themselves, but are not part of the B-tree
  ◦ Single block that holds all of the pointers to the root of the free list and the root of the B-tree (similar to a meta-data block)
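One way these auxiliary structures might sit on a byte-array disk is sketched below. The layout is an assumption made for illustration (the slides do not give one): block 0 is the meta-data block holding two 4-byte pointers, free blocks are chained through their first 4 bytes, and the names `BLOCK_SIZE`, `META`, `NIL`, `format_disk` and `allocate` are hypothetical.

```python
import struct

BLOCK_SIZE = 16  # hypothetical block size in bytes
META = 0         # block 0 holds the two root pointers
NIL = 0          # block 0 is never free, so 0 can mean "no block"


def read_ptr(disk, block, slot):
    """Read the 4-byte big-endian pointer in the given slot of a block."""
    return struct.unpack_from(">I", disk, block * BLOCK_SIZE + 4 * slot)[0]


def write_ptr(disk, block, slot, value):
    """State-passing style: return a modified copy of the disk."""
    out = bytearray(disk)
    struct.pack_into(">I", out, block * BLOCK_SIZE + 4 * slot, value)
    return out


def format_disk(num_blocks):
    disk = bytearray(num_blocks * BLOCK_SIZE)
    # Chain blocks 1..n-1 into the free list; the last one points to NIL.
    for b in range(1, num_blocks):
        nxt = b + 1 if b + 1 < num_blocks else NIL
        disk = write_ptr(disk, b, 0, nxt)
    disk = write_ptr(disk, META, 0, 1 if num_blocks > 1 else NIL)  # free-list root
    disk = write_ptr(disk, META, 1, NIL)                           # B-tree root (empty)
    return disk


def allocate(disk):
    """Pop the head of the free list; return (block number, new disk)."""
    head = read_ptr(disk, META, 0)
    if head == NIL:
        raise RuntimeError("disk full")
    return head, write_ptr(disk, META, 0, read_ptr(disk, head, 0))
```

Note that every operation takes a disk and returns a disk, which is the state-passing style the later slides discuss.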
Accomplishments
B-tree in Scheme
  ◦ Thoroughly tested
Were able to successfully translate our code into PVS.
Made a number of discoveries in terms of tricks for proving the algorithms in PVS
  ◦ However, very late in the game, we discovered a fatal limitation of how we modelled things in PVS
    ◦ Have ideas for overcoming the problems in the future
Minor Problems
In a large project, there are many minor problems that are surprisingly difficult to solve
These often require the development of a simple but non-obvious trick
We ran into and solved many of these; here is a sample of what we learned
  ◦ More detail included in report
Search
search(array, start, stop, val)
Search through a sorted array for the first value greater than or equal to the argument; return the position of that value
If no element is greater than or equal to the argument, return the length of the array
Unexpectedly difficult to prove
Measured induction on stop - start
  ◦ stop - start can be negative, so it is not a well-founded measure on its own
Ended up using max(0, stop - start)
Lesson: make sure measure is well-founded; sometimes making it well-founded works
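The search and its measure can be sketched as below. This is a Python rendering for illustration, not the PVS definition: the recursive linear form is an assumption (the recap only says linear search was proved), and `measure` is a hypothetical helper name. The assertion spells out the termination argument: the measure is a natural number that strictly decreases on each recursive call.

```python
def measure(start, stop):
    # stop - start may be negative; clamping at 0 keeps the measure in the naturals.
    return max(0, stop - start)


def search(array, start, stop, val):
    """Position of the first element >= val in array[start:stop], else len(array)."""
    if start >= stop:
        return len(array)
    if array[start] >= val:
        return start
    # Termination: the measure strictly decreases on the recursive call.
    assert measure(start + 1, stop) < measure(start, stop)
    return search(array, start + 1, stop, val)
```

With the unclamped measure stop - start, a call with start greater than stop would begin at a negative value, which is outside the natural numbers that the induction needs.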
Well-formedness
Designed as part of our testing; believed to be an important part of the proof
Theory: algorithms are correct if they have the desired effect and the disk remains well-formed
Assuming a well-formed disk should give us a basis for proving correctness of our operations
Proved that a newly-formatted disk is well-formed
Partially proved that allocation preserves well-formedness
Well-formedness
Realization: well-formedness is irrelevant!
Well-formedness is defined by the observer (in this case, lookup)
  ◦ lookup(key, insert(key, value, disk)) = value
If the observer can correctly interpret the data given to it, then that data is well-formed
Lesson: don't waste time proving things about well-formedness
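The observer law on this slide can be checked on a toy store. The sketch below is an assumption made for illustration, not the project's B-tree: the "disk" is a byte string of (key, value) byte pairs, `insert` appends a pair, and `lookup` scans from the end so the newest pair wins. Nothing defines a separate well-formedness predicate; the law only relates `insert` to what `lookup` observes.

```python
def insert(key, value, disk):
    """Append a (key, value) record; key and value are single bytes (0..255)."""
    return disk + bytes([key, value])


def lookup(key, disk):
    """Scan records from newest to oldest and return the first matching value."""
    for i in range(len(disk) - 2, -1, -2):
        if disk[i] == key:
            return disk[i + 1]
    return None


def observer_law_holds(disk, keys, values):
    """Check lookup(key, insert(key, value, disk)) = value over the given ranges."""
    return all(lookup(k, insert(k, v, disk)) == v for k in keys for v in values)
```

A finite check like this is only a test, of course; in PVS the same equation would be stated and proved as a theorem.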
Proving insert
Many uses of let due to state-passing style
  ◦ Exponential blowup of expression size
  ◦ Sequents become pages long!
Side effects make proofs difficult
  ◦ When an object is side-effected, the sequent clauses no longer apply, even if the change doesn't affect them
  ◦ User has to prove that the sequent clauses still apply
Main Problem: Side Effects
State Passing Style
Good for modelling state
  ◦ Easy to implement, familiar
Bad for Proving!
The problem with side effects
Effects Invalidate Assumptions
Given a property about a disk, we need to prove the same property about a modified disk
Example:
  ◦ If P(disk) then P(write_block(block, disk))
  ◦ Even if the effect does not affect P, we have to prove that P still holds
  ◦ This makes sense: it does not hold automatically!
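A small concrete instance of this obligation is sketched below, under assumptions made for illustration: `block` is taken to be an (index, data) pair, blocks are 4 bytes, and `P` is a hypothetical property saying that block 0 holds a magic value. Writing block 2 leaves `P` true, but only because the two byte ranges are disjoint, which is exactly the fact a proof has to establish; writing block 0 shows that `P` is not preserved in general.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes


def write_block(block, disk):
    """block is an (index, data) pair; returns a new disk with that block replaced."""
    index, data = block
    assert len(data) == BLOCK_SIZE
    start = index * BLOCK_SIZE
    return disk[:start] + data + disk[start + BLOCK_SIZE:]


def P(disk):
    """Example property: block 0 holds the magic bytes b'FS01'."""
    return disk[:BLOCK_SIZE] == b"FS01"


disk = b"FS01" + bytes(8)                       # three blocks, P holds
unrelated = write_block((2, b"abcd"), disk)     # touches only block 2
clobbered = write_block((0, b"abcd"), disk)     # overwrites the magic bytes
```

Here `P(unrelated)` is true and `P(clobbered)` is false, so nothing about the shape of `write_block` alone tells the prover which case it is in.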
Obvious solution: Hoare Logic
Substitution enforces separation of variables
  ◦ So P(x) => P(x) automatically as long as x isn't side-effected
Red herring: this only helps if we use Abstract Data Types
We serialize our ADT into a single disk object
  ◦ Side-effecting one part will side-effect all parts, even if we use Hoare Logic
Naive Solution
Prove that side-effecting one part of the B-Tree, Free list, etc. doesn't affect assumptions about other parts of the disk
Possible, but Impractical
  ◦ For every algorithm
    ◦ For every effect
      ◦ For every clause of the sequent
        ◦ Must prove that the assumption still holds after the effect
  ◦ A few such basic proofs were accomplished
    ◦ But even they were long and easy to get lost in
What we want from a better solution
We want to write ADT style code
We want to write ADT style proofs
We want to push a button and have
  ◦ Serialized style code
  ◦ Serialized style proofs
Is it Possible???
What a solution would look like
Serialization Theorems
  ◦ Example: deserialize(serialize(n)) = n
  ◦ Fairly easy to prove
    ◦ Already done
    ◦ Even grind could do it
Proof that changing one value doesn't affect other values
  ◦ Hmm...
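The round-trip theorem deserialize(serialize(n)) = n can be illustrated for one simple case. The encoding below is an assumption chosen for the example (unsigned integers in a fixed 4-byte big-endian field), not the project's actual serialization, and `WIDTH` and `round_trip_holds` are hypothetical names.

```python
WIDTH = 4  # hypothetical field width in bytes


def serialize(n):
    """Encode an integer 0 <= n < 2**32 as 4 big-endian bytes."""
    return n.to_bytes(WIDTH, "big")


def deserialize(data):
    """Decode the bytes produced by serialize."""
    return int.from_bytes(data, "big")


def round_trip_holds(ns):
    """Check deserialize(serialize(n)) == n for every n given."""
    return all(deserialize(serialize(n)) == n for n in ns)
```

This is the easy half. It says nothing about what happens to one serialized value when a neighbouring one on the same disk is overwritten, which is the part the slide marks with "Hmm...".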
Proof of effect independence
Language Run-Time for ADT is already doing this
  ◦ Objects are serialized to memory
Language Run-Time Limitations
  ◦ Language vs Programmer control of serialization
  ◦ The Garbage collector
    ◦ Known Hard Problem
    ◦ Bad Idea on a Hard Disk
How to avoid GC
We don't need general GC
Side-effect view:
  ◦ Values only “modified” if we hold the only reference
    ◦ Or not reachable from values used in theorems
ADT view:
  ◦ Values only “allocated” if we are “freeing” another value
Solution: ...
Linear Types!!!
What are Linear Types?
Objects must always have exactly one reference
  ◦ No duplication
  ◦ No erasure
No GC needed
  ◦ Look Ma, No Garbage!
  ◦ “Modifying” something is “de-alloc” plus “alloc”
Our algorithms already treat objects as linear
Just need to teach PVS to take advantage of that
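The "modify is de-alloc plus alloc" idea can be imitated at run time, with the caveat that Python has no linear types and this is only a rough analogy. The `Linear` wrapper and `modify` function below are hypothetical: they check the "no duplication" half dynamically (a value can be taken out once), and they do not check "no erasure" at all. A real linear type system would enforce both statically.

```python
class Linear:
    """Holds a value that may be taken out exactly once (run-time check only)."""

    def __init__(self, value):
        self._value = value
        self._consumed = False

    def consume(self):
        if self._consumed:
            raise RuntimeError("linear value used twice")
        self._consumed = True
        return self._value


def modify(cell, f):
    """'Modifying' is de-alloc of the old cell plus alloc of a new one."""
    return Linear(f(cell.consume()))
```

After `b = modify(a, f)`, the old cell `a` can no longer be read, so no assumption stated about `a` can be silently invalidated by later changes to `b`.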
Linear Types vs. Monads
Lost the battle of representing state to monads
Maybe could win the war for formal proofs
Pros and Cons
  ◦ Monads are more General
    ◦ Non-determinism, environments, etc.
  ◦ Linear Types provide more guarantees
    ◦ A reference to a linearly typed object is guaranteed to be the only reference
Recap
File Systems are Full of Bugs
  ◦ But it is critical that they be right
  ◦ Verification could fix this
We designed and implemented a File System
  ◦ B-Tree based
  ◦ Modelled all the way to “disk”
  ◦ Auxiliary structures needed
    ◦ Free List
    ◦ File Blocks
    ◦ Root File System Block
Recap
We proved linear search
  ◦ Lesson: Make sure measures are well-founded
  ◦ Lesson: Make measures well-founded if they aren't
Well-formedness
  ◦ Red-herring
  ◦ Actually defined by observers
Exponential blow-up due to let
  ◦ Possible Improvement in how PVS presents sequents
Recap
Side effects are hard in an unexpected way
  ◦ Implementing side effects in PVS is easy
    ◦ Use State Passing Style (e.g. State monad)
  ◦ Proving side effects in a serialized common store is hard
    ◦ Must prove that every effect keeps the theorems true
Number of Proofs exploded beyond our ability
Recap
Linear Types to the Rescue!!
  ◦ User writes ADT style proofs
  ◦ System converts them to serialized proofs
  ◦ Better than Monads
Need Theory for Linear Types in PVS
Final Results
Ultimately had to declare failure
  ◦ Code is fragmentary
But learned more from failure than success
  ◦ Main deliverable is report and what not to do
  ◦ We have good ideas for how to make future attempts
... and we don't feel too bad because others have estimated verifying a file system to take 2-3 years to accomplish.
  ◦ A mini-challenge: build a verifiable filesystem. Rajeev Joshi, Gerard J. Holzmann