HAB Software Woes
John Graham-Cumming
September 2012
Or “My capsule didn’t crash but my software did”
Background
More than 30 years of programming experience
One HAB flight
◦ GAGA-1
http://blog.jgc.org/2011/04/gaga-1-flight.html
https://github.com/jgrahamc/gaga
Where’s your flight’s complexity?
Example: GAGA-1
◦ One balloon, parachute, polystyrene box
◦ Many metres of cord attached with knots
◦ An off-the-shelf camera
◦ 2,836 lines of code
◦ Common to see defect rates of 2 to 4 per KLOC
◦ So GAGA-1 likely has 5 to 10 errors in it
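The estimate above is plain arithmetic, and it can be reproduced with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the inputs are the figures quoted on the slide. The result, roughly 5.7 to 11.3, is in the same ballpark as the "5 to 10" on the slide.

```python
def estimated_defects(lines_of_code, defects_per_kloc):
    """Expected number of defects for a given rate per 1,000 lines of code."""
    return lines_of_code / 1000 * defects_per_kloc

# 2,836 lines and the 2 to 4 defects per KLOC range are the slide's figures.
low = estimated_defects(2836, 2)
high = estimated_defects(2836, 4)
print(f"Expect roughly {low:.1f} to {high:.1f} defects")
```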
Real Stuff Seen on HAB flights
Complete computer crash
Altitude going negative
Latitude and longitude garbled
Cutdown triggered in back of car
Long periods of no transmission
Not setting the GPS up before launch
Not turning the camera on
Running out of camera disk space
Altitude jumping around rhythmically
The Curse and Joy of Determinism
Computers do what you tell them to
◦ Precisely what you tell them to
◦ Not what you think you told them to do
A Curse
◦ Will do things you don’t expect
◦ Will process bogus input without complaint
The Joy
◦ Easy to test that it does what’s expected
HAB Is A Harsh Environment
Cold
Vibration
Stuff breaks in flight
Software needs to be able to cope with failing hardware
Very important to think about failure modes
YOUR CODE IS ON ITS OWN OUT THERE
Deadly Sins
The “It works!” Fallacy
The Last Minute Change
Being Far Too Clever
Overlooking Odd Behaviour
Copying Other People’s Code
Assuming Finding A Bug Solves The Problem
The “It works!” Fallacy
If you’re an inexperienced (and sometimes experienced) programmer…
◦ You hack some code together
◦ It works once
◦ You assume it will always work
Only solution to this is
◦ Testing
◦ Paranoia
The Last Minute Change
Never, ever change anything in code at the last minute no matter how simple.
Example: HABE 1
◦ Complete camera failure
◦ Maximum integer size in uBASIC on CHDK is 999,999
◦ Last minute change of integer from 600,000 to 1,000,000 caused total failure
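A failure like this can be caught on the ground by checking every constant against the platform limit before the script is loaded. The following is a minimal sketch in Python, not anything from the HABE 1 flight: the 999,999 limit and the two values come from the slide, while the function and the setting name are hypothetical.

```python
# Largest integer the slide reports for uBASIC on CHDK.
UBASIC_MAX_INT = 999_999

def constants_too_large(constants, max_int=UBASIC_MAX_INT):
    """Return the names of constants that exceed the target's integer limit."""
    return [name for name, value in constants.items() if value > max_int]

# "some_setting" is a placeholder name; only the two values are from the slide.
before = {"some_setting": 600_000}
after = {"some_setting": 1_000_000}

print(constants_too_large(before))
print(constants_too_large(after))
```

Running a check like this as part of the pre-flight routine turns a silent in-flight failure into a visible complaint on the bench.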
Being Far Too Clever
Example: GAGA-1
◦Entered the wrong value of 2 * pi in code to do GPS position conversion from radians to degrees
◦Caught before flight because I verified the location of my own back garden
◦Note to self: 2 * pi != 6.2818.
https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/gps.cpp#L113
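To see how much a mistyped constant matters, here is a small Python sketch. It is not the code in gps.cpp: the function names, the tolerance and the 51.5 degree test latitude are all assumptions chosen for illustration. It converts the same angle with the correct and the mistyped value of 2 * pi and applies the "check it against a place you know" test.

```python
import math

def radians_to_degrees(radians, two_pi=2 * math.pi):
    """Convert radians to degrees; two_pi is a parameter only to show the bug."""
    return radians * 360.0 / two_pi

def matches_known_position(measured_deg, expected_deg, tolerance_deg=0.001):
    """Sanity check against a position you already know, such as your garden."""
    return abs(measured_deg - expected_deg) <= tolerance_deg

# Hypothetical known position: a latitude of 51.5 degrees north.
known_latitude_deg = 51.5
latitude_rad = math.radians(known_latitude_deg)

good = radians_to_degrees(latitude_rad)
bad = radians_to_degrees(latitude_rad, two_pi=6.2818)  # mistyped value from the slide

print(matches_known_position(good, known_latitude_deg))
print(matches_known_position(bad, known_latitude_deg))
```

With these numbers the mistyped constant shifts the latitude by more than a hundredth of a degree, which is on the order of a kilometre on the ground. Using the language's own constant (here math.pi) avoids typing the digits at all.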
Overlooking Odd Behaviour
Example: GAGA-1
◦ In tests RTTY output was fine some of the time, garbled at other times
◦Turned out to be interrupts from the GPS messing up the RTTY timing
◦Solution: disable GPS serial interface while sending RTTY string
ALWAYS BE HONEST WITH YOURSELF ABOUT YOUR CODE
EXPECT THE SPANISH INQUISITION!
https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/tsip.cpp#L229
Copying Other People’s Code
Don’t do this, you have no idea what you are copying or who they copied it from
Better practice is to look at other people’s code and…
◦ Write your own version
◦ That you understand
◦ That you are able to test
◦ Example: GAGA-1
Read lots of people’s RTTY code, wrote my own
https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/rtty.cpp
APRS Tracker using copied code
If the altitude in metres contained an 8 or a 9 the altitude reported would be wrong
http://sharon.esrac.ele.tue.nl/users/pe1rxq/aprstracker/aprstracker.html
Assuming Finding The Bug Solves The Problem
Just because you’ve found A bug doesn’t mean it was THE bug
Lots of research in computer science shows bugs tend to cluster
Example: CLOUD1, CLOUD2
◦ Three bugs in printing latitude, longitude and altitude
◦ One fixed on CLOUD1, …
“The One Thing I Didn’t Test”
http://ukhas.org.uk/guides:common_coding_errors_payload_testing
Common problems with uC
Lack of floating point support
Small integers
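Both problems can be explored on a PC before flight. The Python sketch below simulates a 16-bit signed integer, which is the size of an int on 8-bit AVR Arduino boards, and shows one common way around missing floating point: keeping a position as an integer number of microdegrees. The helper name, the altitude and the latitude are assumptions for illustration, and the formatting shown handles positive values only.

```python
def to_int16(value):
    """Wrap an integer the way a two's-complement 16-bit value typically wraps."""
    return (value + 32768) % 65536 - 32768

# Hypothetical altitude in metres that no longer fits in 16 bits.
altitude_m = 33000
print(to_int16(altitude_m))  # comes out negative

# Fixed point instead of floating point: latitude in microdegrees.
latitude_microdeg = 51_500_000  # 51.5 degrees, hypothetical position
whole, frac = divmod(latitude_microdeg, 1_000_000)
latitude_text = f"{whole}.{frac:06d}"
print(latitude_text)
```

A wrap like this is one plausible explanation for a reported altitude that suddenly goes negative.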
You might never be a great programmer…
… but you can be a paranoid tester!
Good Things To Do
No infinite loops
Self-Checking
Unexpected Error Handling
Handle Exceptions
Simulation
Simplify, Simplify, Simplify
Unit Test
Write Log Files
No Infinite Loops
Never sit in a loop waiting forever
Example: ATLAS 3
while (1) {
  // Make sure data is available to read
  if (Serial.available()) {
    b = Serial.read();
    if (bytePos == 8) {
      navmode = b;
      return true;
    }
    bytePos++;
  }
  // Timeout if no valid response in 3 seconds
  if (millis() - startTime > 3000) {
    navmode = 0;
    return false;
  }
}
} // closes the enclosing function, which is not shown on the slide
https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L211
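The same pattern carries over to any language: every wait loop gets a deadline. Below is a minimal Python sketch of the idea, not a translation of the ATLAS 3 code. The read_byte callable is a hypothetical stand-in for polling a serial port, and the clock is a parameter so the function can be tested without real hardware.

```python
import time

def read_nth_byte(read_byte, index, timeout_s=3.0, clock=time.monotonic):
    """Return the byte at position `index` of a reply, or None on timeout.

    read_byte() is a hypothetical callable that returns the next byte,
    or None when nothing is waiting, much like polling a serial port.
    """
    start = clock()
    position = 0
    while clock() - start <= timeout_s:  # the loop always has a way out
        b = read_byte()
        if b is None:
            continue
        if position == index:
            return b
        position += 1
    return None

# A silent device: the call gives up instead of hanging forever.
print(read_nth_byte(lambda: None, index=8, timeout_s=0.05))

# A device that answers with the bytes 0..9.
reply = iter(range(10))
print(read_nth_byte(lambda: next(reply, None), index=8))
```

time.monotonic is used rather than wall-clock time so that a clock adjustment cannot stretch or shorten the timeout.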
Self-Checking
-- Now enter a self-check of the manual mode settings
log( "Self-check started" )
assert_prop( 49, -32764, "Not in manual mode" )
assert_prop( 5, 0, "AF Assist Beam should be Off" )
assert_prop( 6, 0, "Focus Mode should be Normal" )
assert_prop( 8, 0, "AiAF Mode should be On" )
assert_prop( 21, 0, "Auto Rotate should be Off" )
assert_prop( 29, 0, "Bracket Mode should be None" )
assert_prop( 57, 0, "Picture Mode should be Superfine" )
assert_prop( 66, 0, "Date Stamp should be Off" )
assert_prop( 95, 0, "Digital Zoom should be None" )
assert_prop( 102, 0, "Drive Mode should be Single" )
assert_prop( 133, 0, "Manual Focus Mode should be Off" )
assert_prop( 143, 2, "Flash Mode should be Off" )
assert_prop( 149, 100, "ISO Mode should be 100" )
assert_prop( 218, 0, "Picture Size should be L" )
assert_prop( 268, 0, "White Balance Mode should be Auto" )
assert_gt( get_time("Y"), 2009, "Unexpected year" )
assert_gt( get_time("h"), 6, "Hour appears too early" )
assert_lt( get_time("h"), 20, "Hour appears too late" )
assert_gt( get_vbatt(), 3000, "Batteries seem low" )
assert_gt( get_jpg_count(), ns, "Insufficient card space" )
https://github.com/jgrahamc/gaga/blob/master/gaga-1/camera/gaga-1.lua#L96
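A self-check of this kind is easy to build in any language. The Python sketch below shows one possible shape, assuming the checks can be expressed as a value plus a pass/fail rule. It runs every check rather than stopping at the first failure, so a single run reports everything that is wrong. The readings and the card-space threshold are hypothetical; the battery and year thresholds are the ones on the slide.

```python
def run_self_check(checks, log=print):
    """Run every check, log each failure, and return True only if all pass.

    checks is a list of (description, actual, predicate) tuples.
    """
    all_ok = True
    for description, actual, predicate in checks:
        if not predicate(actual):
            log(f"SELF-CHECK FAILED: {description} (got {actual!r})")
            all_ok = False
    return all_ok

# Hypothetical readings; on a real payload these would come from the hardware.
readings = {"battery_mv": 2900, "year": 2012, "free_images": 500}

checks = [
    ("Batteries seem low", readings["battery_mv"], lambda v: v > 3000),
    ("Unexpected year", readings["year"], lambda v: v > 2009),
    ("Insufficient card space", readings["free_images"], lambda v: v > 100),
]

if not run_self_check(checks):
    print("Do not launch until the failures above are fixed")
```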
Self-Checking
Example: ATLAS 3
Makes sure uBlox GPS will work at high altitude; fixes it if not
if ((count % 10) == 0) {
  digitalWrite(6, LOW);
  checkNAV();
  delay(1000);
  if (navmode != 6) {
    setupGPS();
    delay(1000);
  }
  checkNAV();
  delay(1000);
  digitalWrite(6, HIGH);
}
https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L342
Unexpected Error Handling
def temperature():
    t = at.cmd( 'AT#TEMPMON=1' )

    # Command returns something like:
    #
    # #TEMPMEAS: 0,28
    #
    # OK
    #
    # So split on whitespace first to isolate the temperature 0,28
    # and then split on comma to get the temperature

    w = t.split()
    if len(w) < 2:
        logger.log( "Temperature read returned %s" % t )
        return -1000
    m = w[1].split(',')
    if len(m) != 2:
        logger.log( "Temperature read returned %s" % t )
        return -1000
    else:
        return int(m[1])
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/util.py
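The function above checks the shape of the reply but still passes whatever follows the comma straight to int(). A slightly more paranoid variant, sketched here in Python 3 and not taken from the GAGA-1 repository, treats every parsing failure the same way: log the raw reply and return the sentinel. The function name and the log parameter are assumptions; the reply format and the -1000 sentinel are from the slide.

```python
TEMPERATURE_ERROR = -1000  # same sentinel value as the slide's code

def parse_temperature(response, log=print):
    """Extract the temperature from a '#TEMPMEAS: 0,28' style reply."""
    try:
        fields = response.split()
        _level, value = fields[1].split(',')
        return int(value)
    except (AttributeError, IndexError, ValueError):
        # Covers a non-string reply, too few fields, the wrong number of
        # comma-separated parts, and a non-numeric temperature.
        log("Temperature read returned %r" % (response,))
        return TEMPERATURE_ERROR

print(parse_temperature("#TEMPMEAS: 0,28\r\n\r\nOK"))
print(parse_temperature("ERROR"))
```

Keeping the parsing separate from the AT command call also means it can be tested on a PC with canned replies.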
Handle Exceptions
If your language can generate exceptions then you’d better handle them!
Example: GAGA-1
◦ Recovery computer used Python
◦ Exception could have killed it
◦ Global exception handler
Bonus: What’s wrong with that code?
except:
    logger.log( "Caught exception in main loop: %s" % sys.exc_info()[1] )
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/gaga-1.py#L144
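The slide does not give the answer to the bonus question. One plausible reading, offered here as an assumption, is that the logging call inside the handler can itself raise (a full disk, for example), which would defeat the handler. The Python 3 sketch below guards against that; the function names are hypothetical and this is not the code from gaga-1.py.

```python
import sys
import traceback

def safe_log(log, message):
    """Log a message, but never let a logging failure escape."""
    try:
        log(message)
    except Exception:
        # Last resort: stderr. If even that fails, give up quietly.
        try:
            sys.stderr.write(message + "\n")
        except Exception:
            pass

def run_step(step, log):
    """Run one iteration of the main loop; always return control to the caller."""
    try:
        step()
        return True
    except Exception:
        safe_log(log, "Caught exception in main loop:\n" + traceback.format_exc())
        return False

def broken_step():
    raise RuntimeError("GPS returned garbage")

def broken_log(message):
    raise IOError("disk full")

# The loop survives a failing step and a failing logger at the same time.
print(run_step(broken_step, broken_log))
```

Catching Exception rather than using a bare except is a deliberate choice in this sketch: it leaves KeyboardInterrupt and SystemExit alone, which makes bench testing less painful. On a flight computer that must never stop, catching everything may be the better trade-off.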
Simulation
Simulate a flight
Example: UKHAS wiki has example of using a PC as a fake GPS
Example: GAGA-1
◦ To test the embedded Telit module wrote modules that faked the entire Telit Python interface.
http://www.ukhas.org.uk/guides:common_coding_errors_payload_testing
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/GPS.py
https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/MDM.py
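The idea behind a fake module is that the flight code cannot tell it from the real thing. The Python sketch below illustrates this with an invented interface: FakeGPS, get_position and highest_altitude are hypothetical names, not the Telit API and not the contents of GPS.py. The scripted track deliberately includes a bogus fix, since simulation is the cheapest place to find out how the code reacts to one.

```python
class FakeGPS:
    """Stand-in for a GPS module that replays a scripted, non-empty flight."""

    def __init__(self, track):
        self._track = list(track)
        self._index = 0

    def get_position(self):
        """Return the next (lat, lon, alt_m) fix, repeating the last one at the end."""
        fix = self._track[min(self._index, len(self._track) - 1)]
        self._index += 1
        return fix

def highest_altitude(gps, readings):
    """Example of flight logic under test: track the peak altitude seen."""
    peak = None
    for _ in range(readings):
        _lat, _lon, alt = gps.get_position()
        if peak is None or alt > peak:
            peak = alt
    return peak

# Scripted ascent and descent, including one bogus negative altitude.
track = [(51.5, -0.1, 100), (51.5, -0.1, 15000), (51.5, -0.1, -17),
         (51.5, -0.1, 30000), (51.5, -0.1, 200)]
print(highest_altitude(FakeGPS(track), readings=len(track)))
```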
Simplify, Simplify, Simplify
Make your code as simple as possible
Never have ‘duplicated’ or ‘copy and paste’ code
Break it up into small functions that you understand
Make sure you understand the limitations of the functions you call
Unit Test
Break your program up into small, separate functions
Write tests that call each function and make sure it does what you expect.
Lots of ways to do this
◦ Use something like cpptest
◦ ArduinoUnit
◦ Write your own test program
Unit Test Example
In the bad APRS program
Turn metres to feet code into a separate function: int m_to_f(int m)
assertEquals(m_to_f(1000),3300)
assertEquals(m_to_f(2000),6600)
assertEquals(m_to_f(3000),9900)
assertEquals(m_to_f(4000),13200)
assertEquals(m_to_f(5000),16500)
assertEquals(m_to_f(6000),19800)
assertEquals(m_to_f(7000),23100)
assertEquals(m_to_f(8000),26400)
assertEquals(m_to_f(9000),29700)
assertEquals(m_to_f(10000),33000)
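The "write your own test program" option needs very little code. Here is a Python sketch of the same test: the implementation of m_to_f is an assumption (integer arithmetic at 3.3 feet per metre, the approximation the slide's expected values imply, where the exact factor is about 3.28), while the expected values are copied from the slide.

```python
def m_to_f(metres):
    """Convert metres to feet with integer arithmetic, using 3.3 ft per metre."""
    return metres * 33 // 10

# Expected values from the slide. The 8000 m and 9000 m cases matter most,
# because altitudes containing an 8 or a 9 were the ones reported wrongly.
EXPECTED = {
    1000: 3300, 2000: 6600, 3000: 9900, 4000: 13200, 5000: 16500,
    6000: 19800, 7000: 23100, 8000: 26400, 9000: 29700, 10000: 33000,
}

def test_m_to_f():
    for metres, feet in EXPECTED.items():
        assert m_to_f(metres) == feet, f"m_to_f({metres}) gave {m_to_f(metres)}"

test_m_to_f()
print("all m_to_f tests passed")
```

A table of inputs and expected outputs like this is easy to extend with awkward cases such as zero, negative altitudes, or values near the integer limit of the target.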
Write Log Files
Write detailed log files to non-volatile memory for post flight debugging
Data sent via RTTY or APRS is limited
Log exceptions and errors in detail
Make sure you have a timestamp
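As a rough illustration of these points, the Python sketch below appends one timestamped line per event and pushes it to storage straight away, so the last lines before a crash or power loss are more likely to survive. The function name, file name and UTC timestamp format are assumptions made for the example.

```python
import os
import tempfile
import time

def append_log(path, message, now=time.time):
    """Append one timestamped line and push it to storage before returning."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now()))
    with open(path, "a") as f:
        f.write(f"{stamp} {message}\n")
        f.flush()             # empty Python's own buffer
        os.fsync(f.fileno())  # ask the operating system to write to the device

# Demonstration in a temporary directory; a payload would use its SD card.
demo_path = os.path.join(tempfile.mkdtemp(), "flight.log")
append_log(demo_path, "Self-check started")
append_log(demo_path, "Caught exception in main loop: example")
with open(demo_path) as f:
    print(f.read(), end="")
```

Opening and closing the file for every line is slow, but it keeps the window in which data can be lost as small as possible, which is usually the right trade-off for a flight log.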
Perform system testing
Test your entire system before flight
◦ Put your tracker in the garden
◦ Get a GPS lock
◦ Listen to the RTTY on your radio
◦ Look at the decoded RTTY on your computer
◦ Test uploaded data on the tracker*
◦ *I didn’t do that step; on the day, people had to fix the tracker for me.