Exception+Logging=Diagnostics 2011
Date posted: 17-Oct-2014
Exception Handling + Logging = Diagnostics
@paulogaspar7 (twitter)
[email protected] (email and G+)
Diagnostic mechanisms are a good investment…
• If you invest enough and wisely, you get plenty of return
Meaning that…
• If you invest enough time on the right means to get good diagnostics, you will end up saving time
Concepts
• Exceptions are a communication mechanism
• Logging is one of the channels used to convey their information
Exceptional communication
Exceptions…
• Are a possible initial step when communicating failures to:
  • Calling logic blocks
  • Users
  • Sysadmins
• Communicating:
  • Diagnostics information
What information to place on an Exception?
• The calling logic must get enough information to decide what to do, including:
  • What type of error/failure
  • Relevant additional data to:
    • Take any further action (recovery / cleanup)
    • Inform the sysadmin / programmer
    • Inform the user, if needed
• Text message should NOT target the user
  • That is the UI layer's responsibility
(But there should be enough information for the UI layer to build a proper message to the user, if that can be the case)
Exception text messages should target the sysadmin / programmer
• Enough information should be provided for the programmer to rapidly reproduce / diagnose the problem
• This information must be logged
• Logging a proper Stack Trace is essential too
• Other context information relevant for diagnostics might be provided via previous (or subsequent) log entries
Proper exception throwing
Do not be afraid of throwing exceptions
• Throw them for:
  • Validation and constraints (pre / post conditions)
  • Instead of asserts, as permanent checks
• But do NOT throw them:
  • For plain flow control
  • As return mechanism
Proper exception throwing tips
• Tidy up before throwing
• Don't be afraid of throwing exceptions (handling them is the most expensive part)
• Add enough context data, both:
  • As structured data for handling
  • In text message, for diagnostics
• When re-throwing, do NOT lose the stack trace / previous exception data
When using typed exceptions…
• Pick the right exception type, if there is one
  • Know them / search through the ones available
  • Do not use too generic exception types (uninformative)
• If necessary, create new exception types
  • Name the problem, not the thrower
  • Pick the right parent exception type
  • Create them with chaining constructors
  • Add appropriate fields if it helps handling
Examples
• DuplicateKeyException is better than a SqlException with an error code
  • DuplicateKeyException can/should be an extension of SqlException
• InvalidFormData is much more useful if it holds structured information about all validation errors, including the type of each error, involved fields, etc.
  • UI logic can use this information to mark fields with bad data, display error messages for each, etc.
• An XML parsing exception is much more useful if it includes the location of the offending fragment
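
The slides use language-neutral pseudocode, so the sketch below uses Python to make these examples concrete. All names (`SqlError`, `DuplicateKeyError`, `InvalidFormData`, `validate_signup`) and the validation rules are hypothetical illustrations, not part of any real library.

```python
class SqlError(Exception):
    """Generic database failure (hypothetical stand-in for SqlException)."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


class DuplicateKeyError(SqlError):
    """Names the problem, yet is still a SqlError for generic handlers."""

    def __init__(self, table, key, error_code=None):
        super().__init__(f"Duplicate key {key!r} in table {table!r}", error_code)
        self.table = table
        self.key = key


class InvalidFormData(Exception):
    """Carries one structured entry per validation error."""

    def __init__(self, errors):
        # errors: list of (field, kind, detail) tuples
        self.errors = list(errors)
        summary = "; ".join(f"{f}: {k} ({d})" for f, k, d in self.errors)
        # The text message targets the programmer, not the end user.
        super().__init__(f"{len(self.errors)} invalid field(s): {summary}")

    def fields(self):
        return [f for f, _, _ in self.errors]


def validate_signup(form):
    """Collect every validation error before throwing, not just the first."""
    errors = []
    if not form.get("email") or "@" not in form["email"]:
        errors.append(("email", "format", "expected an address with '@'"))
    if len(form.get("password", "")) < 8:
        errors.append(("password", "too_short", "minimum length is 8"))
    if errors:
        raise InvalidFormData(errors)
    return form
```

A UI layer could catch `InvalidFormData` and use `fields()` to mark the offending inputs, while the message text goes to the log.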
Proper exception handling
Proper exception handling must consider all of
• Cleanup (when needed)
• Logging (at selected spots)
• MUST perform ONE of:
  • Propagation
  • Recovery
  • User action request
Cleanup
• Free resources
• Register / queue resources to be cleaned up later:
  • Automatically
  • By a human
• Cancel transactions
• Etc.
Using finally
    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        if (null != res2) res2.close()
        if (null != res1) res1.close()
    }
Using finally... in a bad way
Things can fail inside a finally!
    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        if (null != res2) res2.close()  // if this close() throws, res1 is never closed
        if (null != res1) res1.close()
    }
Using finally…
More robust, but verbose and ugly
    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        try {
            if (null != res2) res2.close()
        }
        finally {
            if (null != res1) res1.close()
        }
    }
Using finally... the right way
    var res1 = openResourceA()
    try {
        var res2 = openResourceB(res1)
        try {
            doSomething(res2)
        }
        finally {
            res2.close()
        }
    }
    finally {
        res1.close()
    }
Using finally... the right way
Fewer indents option
    var res1 = openResourceA()
    try {
        doMore(res1)
    }
    finally {
        res1.close()
    }

    function doMore(res1) {
        var res2 = openResourceB(res1)
        try {
            doSomething(res2)
        }
        finally {
            res2.close()
        }
    }
Using finally... and catching all exceptions
    try {
        var res1 = openResourceA()
        try {
            var res2 = openResourceB(res1)
            try {
                doSomething(res2)
            }
            finally {
                res2.close()
            }
        }
        finally {
            res1.close()
        }
    }
    catch(Exception e) {
        handleAll(e)  // placeholder for your catch-all handling
    }
Using finally… and throwing an exception
    var res1 = openResourceA()
    try {
        var res2 = openResourceB(res1)
        try {
            var x = doSomething(res2)
            if (null == x) {
                throw new DataNotFoundException("Don't have it message…")
            }
        }
        finally {
            res2.close()  // if it was open, it will always be closed
        }
    }
    finally {
        res1.close()  // if it was open, it will always be closed
    }
Using finally… and throwing an exception
• Previous slide is still a simplification:
  • Remember that exceptions can be thrown from inside a finally block
• The DataNotFoundException might be replaced by an exception thrown by one of the close() calls
• Prevent that IF DataNotFoundException is the most important to propagate in your case (how to do it is out of scope for this presentation)
• It MIGHT be enough to just log the DataNotFoundException to cover the low probability event of a close() failure
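
The presentation leaves the "prevent that" technique out of scope. One possible approach, sketched here in Python under stated assumptions, is to remember whether a more important exception is already propagating and, if so, log the close() failure instead of letting it take over. `FlakyResource`, `DataNotFoundError`, `close_quietly_if_failing` and `lookup` are hypothetical names made up for this illustration.

```python
import logging

LOG = logging.getLogger("cleanup.example")


class DataNotFoundError(Exception):
    pass


class FlakyResource:
    """Hypothetical resource whose close() always fails."""

    def __init__(self):
        self.close_attempted = False

    def close(self):
        self.close_attempted = True
        raise OSError("close failed")


def close_quietly_if_failing(resource, primary_failed):
    """Close the resource. Swallow (but log) a close() error only when a
    more important exception is already propagating."""
    try:
        resource.close()
    except Exception:
        if not primary_failed:
            raise  # nothing more important in flight: report the close failure
        LOG.exception("close() failed while handling another exception")


def lookup(resource, value):
    primary_failed = False
    try:
        if value is None:
            raise DataNotFoundError("no data for the requested key")
        return value
    except BaseException:
        primary_failed = True  # remember that the primary failure is in flight
        raise
    finally:
        close_quietly_if_failing(resource, primary_failed)
```

With this arrangement `lookup(FlakyResource(), None)` still raises `DataNotFoundError`, while a close() failure on the success path is reported as usual. Note that Python already keeps a replaced exception reachable through `__context__`, so the concern is mostly about which exception callers get to catch.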
Propagation
Ensure propagation! To avoid drowning exceptions…
• Never have an empty catch statement
  • If the catch should never be reached, throw an (unchecked) exception from inside it, just in case (throw an ImpossibleException; you can create one)
• If you can't handle it, let it go through / re-throw (you can simply let it pass through)
• Remember: logging is NOT enough!!! (your software might even become illegal...)
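
As a small Python illustration of the "never reached" catch, the sketch below raises a custom exception from a handler that is believed to be unreachable, chaining the original. `ImpossibleError` and `parse_port` are hypothetical names chosen for the example.

```python
class ImpossibleError(RuntimeError):
    """Raised from handlers that are believed to be unreachable."""


def parse_port(text):
    """Extract the digits from text and return them as an int, or None."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    if not digits:
        return None
    try:
        return int(digits)
    except ValueError as e:
        # digits only holds 0-9, so int() is not expected to fail here.
        # If it ever does, fail loudly instead of silently swallowing it.
        raise ImpossibleError(f"int() rejected {digits!r}") from e
```

If the assumption behind the "impossible" branch ever breaks, the failure surfaces with its cause attached rather than disappearing in an empty handler.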
Catching and replacing exceptions
    try {
        doSomething()
    }
    catch(Exception e) {
        LOG.error("Some meaningful message", e)
        throw new OtherException("Some message", context, e)
    }

    // Or, at least, chain the exceptions…
    try {
        doSomething()
    }
    catch(Exception e) {
        throw new OtherException(e)
    }
How to propagate exceptions...
• Only handle exceptions at the right level (where you can do something useful)
• Do nothing / just use "finally" without "catch" (quite often, the right thing to do)
• Re-throw the same exception (be careful with the syntax used, to avoid losing the stack trace)
• Create and chain a new exception (chain to keep the stack trace / chain of events)
Valid reasons to replace exceptions
• Replace exceptions to:
  • Hide implementation details from callers
  • Make the exception more meaningful to the caller
  • Add information specific to the problem in hand
• Do NOT replace exceptions…
  • If not adding semantic value (more meaning / additional information)
When replacing exceptions...
• Do NOT lose the stack trace
• Do NOT lose previous diagnostics information
• CHAIN exceptions (new exception refers to the one it replaces)
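
In Python, chaining is expressed with `raise ... from ...`, which stores the replaced exception in `__cause__` so that the formatted traceback lists both. The sketch below assumes a hypothetical `ConfigError` type and `read_timeout` helper; only `traceback.format_exception` is a standard library call.

```python
import traceback


class ConfigError(Exception):
    """More meaningful to the caller than a bare KeyError / ValueError."""

    def __init__(self, message, path):
        super().__init__(message)
        self.path = path  # structured context, usable by handlers


def read_timeout(settings, path):
    try:
        return int(settings["timeout"])
    except (KeyError, ValueError) as e:
        # Replace the exception, but chain it to keep the original trace.
        raise ConfigError(f"bad or missing 'timeout' in {path}", path) from e


def describe(exc):
    """Format an exception together with every exception it is chained to."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
```

A bare `raise` inside an `except` block re-throws the same exception with its original traceback, which covers the "re-throw same exception" case.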
Recovery
• Real recovery:
  • Just retry (for resources w/ intermittent failures)
  • Repair (problematic resource) and retry
  • Collect missing piece of information
  • Etc.
• Next best thing:
  • Do as much as possible and ask the user to take further action (e.g.: register request, and email results when they become available)
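
The "just retry" case can be sketched in Python as a small helper. `TransientError` and `retry` are hypothetical names; a real system would decide for itself which exception types count as intermittent and how long to wait between attempts.

```python
import time


class TransientError(Exception):
    """Marks failures that are worth retrying (an assumption of this sketch)."""


def retry(operation, attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call operation() up to `attempts` times, retrying on TransientError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: propagate with the original traceback
            sleep(delay_seconds)
```

Only the exception type known to be intermittent is caught; anything else propagates immediately, in line with handling exceptions only where something useful can be done.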
User action request
Help the user to proceed when reporting a problem
• Provide information useful to fix the problem (e.g.: which fields have invalid values and why)
• Provide support contact information and a token identifying the problem
• Provide an estimate of when the system is expected to be working correctly again
• Etc.
This is also important for batch processes…
• Provide information on which items failed to be processed and why (and try processing all others)
• A systems operator is still a user
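
For batch processes, a minimal Python sketch of "process all others and report what failed" could look like this. `process_batch` is a hypothetical helper, and catching `Exception` per item is a deliberate choice of the sketch so that one bad item does not stop the run.

```python
def process_batch(items, handler):
    """Apply handler to every item; collect failures instead of stopping."""
    results, failures = [], []
    for item in items:
        try:
            results.append(handler(item))
        except Exception as e:
            # Record which item failed and why, then keep going.
            failures.append((item, f"{type(e).__name__}: {e}"))
    return results, failures
```

The `failures` list is what the operator needs: each entry names the item and the reason, so the failed subset can be fixed and resubmitted.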
Logging
• Logging filtering / channeling mechanisms
  • Namespaces
  • Log level
• Keep in mind you can have multiple channels with different (potentially overlapping) filters
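
As one concrete instance of namespaces, levels and overlapping channels, here is a sketch using Python's standard `logging` module. The logger names (`shop`, `shop.ui`, `shop.db`) and the in-memory streams are assumptions made for the example; real channels would typically be files or remote collectors.

```python
import io
import logging

app_log = logging.getLogger("shop")
app_log.setLevel(logging.DEBUG)
app_log.propagate = False  # keep the example isolated from the root logger

everything = io.StringIO()  # channel 1: all entries from the "shop" namespace
problems = io.StringIO()    # channel 2: warnings and above, database only

all_handler = logging.StreamHandler(everything)
all_handler.setLevel(logging.DEBUG)

problem_handler = logging.StreamHandler(problems)
problem_handler.setLevel(logging.WARNING)              # level filter
problem_handler.addFilter(logging.Filter("shop.db"))   # namespace filter

for handler in (all_handler, problem_handler):
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    app_log.addHandler(handler)

logging.getLogger("shop.ui").warning("slow page render")
logging.getLogger("shop.db").debug("query took 12 ms")
logging.getLogger("shop.db").error("connection lost")
```

The database error lands in both channels, while the UI warning and the debug entry only reach the catch-all one.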
Logging Levels
• Info (usually kept visible)
• Error (usually kept visible)
• Warning (usually kept visible)
• Debug (I like to keep it visible)
• Trace
• Fatal
(Should be above Info. Is it silly? No longer present on some APIs)
Logging Levels
• Info: Important application/service events (startup, shutdown, initialization…)
• Error: Unexpected problems which might affect operation
• Warning: Anomalous conditions which might signal problems (e.g.: recoverable loss of database connectivity)
• Debug: Diagnostic information for potential app logic problems (e.g.: business logic module ins and outs, including exceptions)
• Trace: Maniacally logging every little step the app takes
• Fatal: Fatal error, the application is crashing
Good logging practices
Some aspects to consider for logging
• Message format
• Possible automated analysis
• Log level criteria
• Sensitive data logging criteria (Do not log passwords, credit card numbers, etc.)
• Logging channels
• Log file management
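
One way to enforce a sensitive-data criterion mechanically is a filter that masks values before they reach any channel. The Python sketch below is only illustrative: `RedactCards` is a hypothetical class and the pattern (any run of 13 to 16 digits) is a crude assumption, not a reliable card-number detector.

```python
import logging
import re

# Crude, illustrative pattern: any standalone run of 13 to 16 digits.
CARD_PATTERN = re.compile(r"\b\d{13,16}\b")


class RedactCards(logging.Filter):
    """Masks card-like digit runs before the record reaches a handler."""

    def filter(self, record):
        # Format the message first, then mask, so arguments are covered too.
        record.msg = CARD_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True
```

Attaching the filter to a handler (`handler.addFilter(RedactCards())`) applies it to everything written through that channel. Not logging the value in the first place remains the safer rule.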
Good logs
• Tell you exactly the when, where and how
• Have each interesting event logged once and only once
• Can be analyzed even without the application at hand
• Are reliable (this can be quite important, depending on their use)
• Do not (noticeably) slow down the system
Events to log (could use an audit channel / custom level for some)
• Authentication and authorization [debug, audit]
• System / data access [debug, audit]
• System / data changes [debug, audit]
• Potential threats [warning]
• Resources at limits [warning]
• Health / Availability [info] (Startup, shutdown, faults, delays, backup status, etc.)
Event information to log
• Timestamp (+TZ) (when)
• Component / module (where)
• Full stack traces
• Involved parties (when communicating)
• User (who)
• Action (what)
• Status (result)
• Log level (a.k.a. severity, priority, importance, etc.)
• Reason
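
A structured message format makes these fields easy to analyze automatically. The Python sketch below emits one JSON object per event. `JsonFormatter`, the logger name and the `user` / `action` / `status` keys passed through `extra` are assumptions of the example, not a standard schema.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Renders each log record as one JSON object per line."""

    def format(self, record):
        event = {
            # Timestamp with explicit time zone (UTC)
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "component": record.name,
            "user": getattr(record, "user", None),
            "action": getattr(record, "action", None),
            "status": getattr(record, "status", None),
            "message": record.getMessage(),
        }
        if record.exc_info:
            event["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(event)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("billing.invoices")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

log.info("invoice issued",
         extra={"user": "alice", "action": "issue_invoice", "status": "ok"})
```

Each line in `stream` can then be parsed back with `json.loads` by whatever tool analyzes the logs.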
Stack traces
• Should list the call stack to the exception throw point
• Usually present, per entry:
  • Function name
  • Source file name
  • Line number
• Should list all chained exceptions
• Should avoid redundant entries
Reading Stack Traces
    eu.codebits.somewhere.SomeSillyException
      "You are wrong!!!"
    eu.codebits.someplace.StrictClass.badMethod()
    eu.codebits.someplace.BusinessLogic.call3()
    eu.codebits.someplace.BusinessLogic.call2()
    eu.codebits.someplace.BusinessLogic.call1()
    …
    eu.codebits.someplace.frontEndThing()
    …
    com.my.app.server.WeirThing.weirdCall1001()
    …
    org.somelanguage.engine.Thing.rootCall()
    …
Logging practices I like
Practices I like (some controversial)
• Log all application input and output (user input, DB access, external service calls, etc.)
• Use DEBUG level in production for custom code (remember: premature optimization is the root of all evil) (Actually, I am using DEBUG like an AUDIT level/channel)
• Place DEBUG and exception logging at module boundaries, to avoid log redundancy
• Automate logging at module boundaries
• Identify each request / transaction with a UUID (Reported to the user in case of error)
Automating logging at module boundaries
• Use AOP or Dynamic Proxies + Introspection to intercept calls and apply automated logging
• Can use JSON serialization to present function/method arguments and results
• AOP or Dynamic Proxies mechanisms available on Java, JavaScript, Ruby, JRuby, Perl, Python, Delphi, PHP, C# (.Net???), some C++
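
In Python, a decorator is a lightweight way to intercept calls at a module boundary. The sketch below logs arguments and results as JSON at DEBUG level and logs exceptions once before letting them propagate. `logged_boundary` and the `boundary` logger name are hypothetical; values that JSON cannot serialize fall back to `repr`.

```python
import functools
import json
import logging

LOG = logging.getLogger("boundary")


def logged_boundary(func):
    """Log input, output and exceptions of a boundary function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call = json.dumps({"args": args, "kwargs": kwargs}, default=repr)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log once, with stack trace, then let the exception propagate.
            LOG.exception("%s failed, input=%s", func.__name__, call)
            raise
        LOG.debug("%s input=%s output=%s", func.__name__, call,
                  json.dumps(result, default=repr))
        return result

    return wrapper
```

Applying `@logged_boundary` only to the functions that form a module's public surface keeps each event logged once, at the boundary, instead of at every internal call.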
Logging at Module Boundaries (example)
[Diagram: Presentation, Business Logic, Database and External Service modules, with three LOG points marked at the module boundaries]
Identify each request / transaction with a UUID
• Hides gory details while giving the user a direct handle to the problem
• Allows matching a problem to its request / transaction's log entries (even w/ simple log file + "less")
• Depends on an interception point at the "top" of the request / transaction
• Depends on having a Thread Local mechanism (any platform with synchronization mechanisms and thread identification should do)
• Thread Local mechanisms available on Java, Ruby, JRuby, Perl, Python, Delphi, PHP, C# (.Net???), C++; JavaScript seems to have it on Node.js