CLR Reliability under Memory Exhaustion
07/09/04 Windows Reliability Team 1
CLR Reliability under Memory Exhaustion
Solomon Boulos
Temporary Memory Exhaustion causes failures
• Out of Memory (OOM) is temporary
• Shouldn’t cause failure
– Just wait for memory to become available
– System takes action to free up memory
• All managed code depends on CLR
• Testing is difficult
– Exceptions are objects
– Boxing (casting value type to object)
– JIT compilation
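A minimal C# sketch of why these hidden allocations make OOM testing hard: boxing creates a new object each time, so a line with no visible `new` can still fail under memory pressure. The class and method names are hypothetical and not from the talk.

```csharp
using System;

public static class HiddenAllocations
{
    // Boxes the same int twice and reports whether both boxes are one object.
    public static bool BoxesAreSameObject(int value)
    {
        object first = value;   // implicit allocation: boxing copies the int into a new object
        object second = value;  // a second, independent allocation
        return ReferenceEquals(first, second);
    }
}
```

Calling `HiddenAllocations.BoxesAreSameObject(42)` returns `false`, because each conversion to `object` produces its own box.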
Overview
• Previous Work
– Reliability Working Group
– Improvements for Whidbey
• OOM behavior
– Everett (CLR v1.1)
– Whidbey (CLR v2.0)
– WinFX
• Solutions
– Transactions
– Recovery
Reliability Working Group
• Discussion of CLR reliability issues
• Interaction with Yukon and Avalon teams
• FailFast Behavior
• Controversial Decisions
• Fault Injection
Improvements for Whidbey
• CLR hardened to Out of Memory (OOM)
• Constrained Execution Regions (CERs)
– Eagerly Prepared (No JIT Compiling)
– Blocks ThreadAbort
• Reliability Contracts
– Describes reliability attributes of code
– Allows for function calls within CER
• Unhandled Exception Policy
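A hedged C# sketch of how a CER and a reliability contract fit together, using the API names that later shipped in .NET Framework 2.0 (`RuntimeHelpers.PrepareConstrainedRegions` and `ReliabilityContractAttribute`). The talk predates that release, so its API may have differed. The `CerSketch` class, its `balance` field, and the simulated failure are illustrative assumptions. Modern .NET does not support CERs and marks these APIs obsolete.

```csharp
#pragma warning disable SYSLIB0004 // CER APIs are obsolete on modern .NET
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

public static class CerSketch
{
    private static int balance; // hypothetical state to protect

    // The contract promises this method will not corrupt state and will succeed,
    // which is what allows it to be called from inside a CER.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void Restore(int saved)
    {
        balance = saved;
    }

    public static int Update(int delta, bool fail)
    {
        int saved = balance;
        bool committed = false;

        // Marks the catch/finally blocks of the next try as a constrained region.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            balance += delta;
            if (fail) throw new InvalidOperationException("simulated failure");
            committed = true;
        }
        catch (InvalidOperationException)
        {
            // Swallowed here only to keep the sketch small.
        }
        finally
        {
            // Cleanup code that must run: undo the partial update.
            if (!committed) Restore(saved);
        }
        return balance;
    }
}
```

The contract is only a declaration. Nothing in the sketch verifies that `Restore` really avoids allocation or failure.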
My Approach
• Exhaust Memory (Not fault injection)
• Find failure points
• Consistently reproduce results
• Examine underlying causes
• Develop solutions
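The first step, exhausting memory rather than injecting faults, could be sketched in C# as below. This is an assumption about the general technique, not the tool used in the talk. `MemoryPressure.Exhaust` and its parameters are hypothetical, and the cap keeps the sketch from taking down the machine it runs on.

```csharp
using System;
using System.Collections.Generic;

public static class MemoryPressure
{
    // Allocates chunkBytes-sized blocks until an allocation fails or capBytes is reached.
    // The caller keeps the returned list alive to hold the pressure, and drops it to release.
    public static List<byte[]> Exhaust(int chunkBytes, long capBytes)
    {
        var held = new List<byte[]>();
        long total = 0;
        try
        {
            while (total + chunkBytes <= capBytes)
            {
                var block = new byte[chunkBytes];
                // Touch one byte per 4 KB so the pages are actually committed.
                for (int i = 0; i < block.Length; i += 4096) block[i] = 1;
                held.Add(block);
                total += chunkBytes;
            }
        }
        catch (OutOfMemoryException)
        {
            // Memory is exhausted: stop and keep what was allocated so far.
        }
        return held;
    }
}
```

For example, `MemoryPressure.Exhaust(1024 * 1024, 8L * 1024 * 1024)` holds up to 8 MB in 1 MB blocks. Reproducing a failure at a specific amount of available memory would need a much larger cap and a measurement of what remains.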
Everett OOM Behavior
• Different classes of failures
– Catchable Out of Memory (OOM) Exception
– Type Initialization Exception
– Invalid Program exception from JIT compiler
– Fatal OOM Error
– Fatal Execution Engine error
Supporting Data
void ManagedFunction()
{
    Regex* myReg = new Regex("*");
}

Available Memory    Observed Behavior
0-5860K             Fatal Error
5892-5912K          InvalidProgram
5924-5960K          TypeInit
5890-Above          Success
Fault Injection Example
static void Main(string[] args)
{
    try
    {
        // operations in here
    }
    catch (OutOfMemoryException)
    {
        Console.WriteLine("Nothing should get past me.");
    }
}
Whidbey OOM Behavior
• See OOM Exception instead of
– TypeInit
– InvalidProgram
• Exception to Native host is COMPlusException
– Not very helpful
• Fatal OOM only during initialization
– Initialization can be large though (e.g. 10MB)
• CERs provide defense, but dangerous
– CER { for (;;) } cannot be stopped
• Reliability Contracts = Honor System
WinFX Case Studies
• Swallows exceptions
• Shell
– Crashes and restarts
• WinFS
– Silent Process Failure
• Indigo
– False Completion
[Diagram labels: Base OS, Whidbey, WinFX]
Shell Failure
• Exhaust System Memory
• CLR throws OOM Exception
• Shell doesn’t catch
• Escalates to unhandled Win32 exception
• Shell crashes and restarts
– Major disruption to user
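One way a caller might treat OOM as temporary instead of letting it escalate is to catch it, wait, and retry. The C# sketch below illustrates that idea under stated assumptions and is not the Shell's or the CLR's behavior. `OomRetry.Run` and its parameters are hypothetical, and retrying is only reasonable when the failed operation left no partial state behind.

```csharp
using System;
using System.Threading;

public static class OomRetry
{
    // Runs operation, retrying up to maxAttempts times when it throws OutOfMemoryException.
    // The final failure is not caught, so it still propagates to the caller.
    public static T Run<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (OutOfMemoryException) when (attempt < maxAttempts)
            {
                GC.Collect();        // give the runtime a chance to free memory
                Thread.Sleep(delay); // wait for memory to become available
            }
        }
    }
}
```

A call such as `OomRetry.Run(() => LoadIcon(path), 3, TimeSpan.FromMilliseconds(200))`, where `LoadIcon` stands in for any allocating operation, would make up to three attempts before giving up.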
WinFS Test
• Simple Contact Store Functions
– AddContact
– RenameContact
– RemoveContact
– ListContacts
– ReachMemory
WinFS Test Normal Execution
• ListContacts(): “No Contacts Found”
• AddContact(“Shane”): Shane is added
• ListContacts(): “Shane”
• RenameContact(“Shane”, “Bob”): Shane is now Bob
• ListContacts(): “Bob”
• RemoveContact(“Bob”): Bob is now deleted
• ListContacts(): “No Contacts Found”
WinFS Test Stressed Execution
• ListContacts() : “No Contacts Found”
• ReachMemory(8MB): 8MB Available
• AddContact(“Shane”) : Shane should be added
• ListContacts(): “No Contacts Found”
• Process Exits
Indigo Test Specifications
• Client::SendMessage():
– Sends message to server and prints confirmation of sending.
• Client::ReceiveMessage():
– Prints received message.
• Server::SendMessage():
– Sends message to client and prints confirmation of sending.
• Server::ReceiveMessage():
– Prints message and responds with SendMessage()
Indigo Test Behavior
• Normal Execution
– Client::SendMessage()
– Server::ReceiveMessage()
– Server::SendMessage()
– Client::ReceiveMessage()
• Execution with Memory Pressure
– Client::SendMessage()
– Server::ReceiveMessage()
– Server::ExhaustMemory()
– Server::SendMessage()
– Client never receives message
Solutions
• Transactions
– In Memory
– Durable (backed by disk)
• Recovery
– Creates Recovery Log
– Allows state restore
Transaction Participant
public TransactionParticipant(String _originalValue)
{
    originalValue = _originalValue;
    result = originalValue;
}

public void Prepare(IPreparingEnlistment pe)
{
    // do work for transaction
    result = "New Value";
    // all is well, vote prepared
    pe.Prepared();
}
Transaction Participant Continued
public void Commit(IEnlistment e)
{
    // no work to do, vote done
    e.EnlistmentDone();
}

public void Rollback(IEnlistment e)
{
    // restore originalValue
    result = originalValue;
    if ( null != e )
        e.EnlistmentDone();
}
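Simple Transaction Example
TransactionParticipant tp = new TransactionParticipant(txtInput.Text);
try
{
    using (TransactionScope s = new TransactionScope())
    {
        // enlist the participant as a volatile (in-memory) resource
        Transaction.Current.VolatileEnlist(tp, false);
        // mark the scope consistent so that it commits when disposed
        s.Consistent = true;
    }
}
catch (TransactionAbortedException)
{
    // on abort, Rollback has restored the original value
}
txtInput.Text = tp.Result;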
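The slides use pre-release names (`IPreparingEnlistment`, `VolatileEnlist`, `Consistent`). For comparison, here is a hedged sketch of the same participant written against the `System.Transactions` API as it later shipped (`IEnlistmentNotification`, `EnlistVolatile`, `TransactionScope.Complete`). The `ValueParticipant` and `TransactionDemo` names and the `complete` flag are illustrative assumptions.

```csharp
using System;
using System.Transactions;

// Volatile (in-memory) resource manager guarding a single string value.
public sealed class ValueParticipant : IEnlistmentNotification
{
    private readonly string originalValue;
    public string Result { get; private set; }

    public ValueParticipant(string originalValue)
    {
        this.originalValue = originalValue;
        Result = originalValue;
    }

    public void Prepare(PreparingEnlistment pe)
    {
        Result = "New Value"; // do the work for the transaction
        pe.Prepared();        // vote to commit
    }

    public void Commit(Enlistment e) { e.Done(); }

    public void Rollback(Enlistment e)
    {
        Result = originalValue; // restore the original value
        e.Done();
    }

    public void InDoubt(Enlistment e) { e.Done(); }
}

public static class TransactionDemo
{
    // Runs one transaction; when complete is false the scope rolls back on dispose.
    public static string Run(string input, bool complete)
    {
        var tp = new ValueParticipant(input);
        try
        {
            using (var scope = new TransactionScope())
            {
                Transaction.Current.EnlistVolatile(tp, EnlistmentOptions.None);
                if (complete) scope.Complete();
            }
        }
        catch (TransactionAbortedException)
        {
            // The participant has already been rolled back.
        }
        return tp.Result;
    }
}
```

`TransactionDemo.Run("abc", true)` goes through `Prepare` and `Commit`, while `TransactionDemo.Run("abc", false)` leaves the original value in place.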
rNotepad Techniques
• Log user work
– KeyPressed Records
– Resize Records
• Write work to log file every second
• Write checkpoint every 30 seconds
• Upon startup, recover
– Checkpoint speeds up recovery
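The log-plus-checkpoint idea above can be sketched in C# as follows. This is an illustration under assumptions, not rNotepad's actual format: the record layout (`K` for a key press, `C` for a checkpoint) and the `RecoveryLog` class are hypothetical. Resize records and the one-second and thirty-second timers are left out.

```csharp
using System;
using System.IO;
using System.Text;

public static class RecoveryLog
{
    // Appends one key-press record, stored as the character code.
    public static void LogKey(string path, char c)
    {
        File.AppendAllText(path, "K" + (int)c + "\n");
    }

    // Appends a checkpoint holding the full document text (Base64 keeps it on one line).
    public static void Checkpoint(string path, string text)
    {
        File.AppendAllText(path, "C" + Convert.ToBase64String(Encoding.UTF8.GetBytes(text)) + "\n");
    }

    // Rebuilds the text: start from the last checkpoint, then replay later key presses.
    public static string Recover(string path)
    {
        if (!File.Exists(path)) return "";
        string[] lines = File.ReadAllLines(path);
        int last = Array.FindLastIndex(lines, l => l.Length > 0 && l[0] == 'C');

        var text = new StringBuilder();
        if (last >= 0)
        {
            byte[] bytes = Convert.FromBase64String(lines[last].Substring(1));
            text.Append(Encoding.UTF8.GetString(bytes));
        }
        for (int i = last + 1; i < lines.Length; i++)
        {
            if (lines[i].Length > 0 && lines[i][0] == 'K')
                text.Append((char)int.Parse(lines[i].Substring(1)));
        }
        return text.ToString();
    }
}
```

Recovery only replays the records written after the last checkpoint, which is why a checkpoint shortens startup. Without one, every key press since the log began has to be replayed.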
Conclusion
• Testing is difficult but possible
• Temporary memory pressure shouldn’t cause failures
• Transactions and Recovery can provide resilient and recoverable solutions
Questions?
• More info at http://windows/sites/reliavuls/CLR/default.aspx