Exception+Logging=Diagnostics 2011

Exception Handling +

Logging =

Diagnostics

@paulogaspar7 (twitter)

[email protected] (email and G+)

Diagnostic mechanisms are a good investment…

•  If you invest enough and wisely, you get plenty of return

Meaning that …

•  If you invest enough time on the right means to get good diagnostics, you will end up saving time

Concepts

•  Exceptions are a communication mechanism

•  Logging is one of the channels used to convey their information

Exceptional communication

Exceptions…

•  Are a possible initial step when communicating failures to:
   •  Calling logic blocks
   •  Users
   •  Sysadmins

•  Communicating:
   •  Diagnostics information

What information to place on an Exception?

•  The exception must carry enough information for the calling logic to decide what to do, including:
   •  What type of error/failure
   •  Relevant additional data to:
      •  Take any further action (recovery / cleanup)
      •  Inform the sysadmin / programmer
      •  Inform the user, if needed

•  Text message should NOT target the user
   •  That is the UI layer's responsibility

(But there should be enough information for the UI layer to build a proper message to the user, if that can be the case)

Exception text messages should target the sysadmin / programmer

•  Enough information should be provided for the programmer to rapidly reproduce / diagnose the problem

•  This information must be logged
   •  Logging a proper Stack Trace is essential too
   •  Other context information relevant for diagnostics might be provided via previous (or subsequent) log entries

Proper exception throwing

Do not be afraid of throwing exceptions

•  Throw them for:
   •  Validation and constraints (pre / post conditions)
   •  Instead of asserts, as permanent checks

•  But do NOT throw them:
   •  For plain flow control
   •  As return mechanism
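A minimal Python sketch of the "permanent checks" idea above. The `withdraw` function and the `InvalidAmountError` type are hypothetical names invented for this illustration. Unlike `assert` statements, which Python drops when run with `-O`, these checks stay active in production.

```python
class InvalidAmountError(ValueError):
    """Hypothetical exception type naming the problem: an unusable amount."""

    def __init__(self, amount):
        super().__init__(f"amount must be a positive number, got {amount!r}")
        self.amount = amount  # structured data a handler can inspect


def withdraw(balance, amount):
    # Precondition, checked with an exception rather than an assert.
    if not isinstance(amount, (int, float)) or amount <= 0:
        raise InvalidAmountError(amount)
    if amount > balance:
        raise ValueError(
            f"insufficient funds: balance={balance}, requested={amount}"
        )
    new_balance = balance - amount
    # Postcondition, also kept as a permanent check.
    if new_balance < 0:
        raise RuntimeError(f"negative balance computed: {new_balance}")
    return new_balance
```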

Proper exception throwing tips

•  Tidy up before throwing

•  Don't be afraid of throwing exceptions (handling them is the most expensive part)

•  Add enough context data, both:
   •  As structured data for handling
   •  In text message, for diagnostics

•  When re-throwing, do NOT lose the stack trace / previous exception data

When using typed exceptions…

•  Pick the right exception type, if there is one
   •  Know them / search through the ones available
   •  Do not use too generic exception types (uninformative)

•  If necessary, create new exception types
   •  Name the problem, not the thrower
   •  Pick the right parent exception type
   •  Create them with chaining constructors
   •  Add appropriate fields if it helps handling

Examples

•  DuplicateKeyException is better than a SqlException with an error code
   •  DuplicateKeyException can/should be an extension of an SqlException

•  InvalidFormData is much more useful if it holds structured information about all validation errors, including type of each error, involved fields, etc.
   •  UI logic can use this information to mark fields with bad data, display error messages for each, etc.

•  An XML parsing exception is much more useful if it includes the location of the offending fragment
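The first two examples above could be sketched in Python roughly as follows. All class and function names here (`SqlError`, `DuplicateKeyError`, `FieldError`, `messages_by_field`) are assumptions made up for illustration, not types from any real library.

```python
from dataclasses import dataclass


class SqlError(Exception):
    """Hypothetical base type for database failures."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


class DuplicateKeyError(SqlError):
    """Names the problem; still catchable as a plain SqlError."""

    def __init__(self, table, key, error_code=None):
        super().__init__(f"duplicate key {key!r} in table {table!r}", error_code)
        self.table = table
        self.key = key


@dataclass(frozen=True)
class FieldError:
    field: str   # which form field is involved
    kind: str    # type of validation error
    detail: str  # text the UI layer may adapt for the user


class InvalidFormData(Exception):
    """Carries every validation error as structured data."""

    def __init__(self, errors):
        self.errors = list(errors)
        summary = "; ".join(f"{e.field}: {e.kind} ({e.detail})" for e in self.errors)
        super().__init__(f"{len(self.errors)} invalid field(s): {summary}")


def messages_by_field(exc):
    """What a UI layer might do: group error details per form field."""
    messages = {}
    for error in exc.errors:
        messages.setdefault(error.field, []).append(error.detail)
    return messages
```

The text message stays aimed at the programmer, while the `errors` list gives the UI layer what it needs to mark individual fields.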

Proper exception handling

Proper exception handling must consider all of

•  Cleanup (when needed)

•  Logging (at selected spots)

•  MUST perform ONE of:
   •  Propagation
   •  Recovery
   •  User action request

Cleanup

Cleanup

•  Free resources

•  Register / queue resources to be cleaned up later:
   •  Automatically
   •  By a human

•  Cancel transactions

•  Etc.

Using finally

    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        if (null != res2) res2.close()
        if (null != res1) res1.close()
    }

Using finally... in a bad way

Things can fail inside a finally!

    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        if (null != res2) res2.close()  // if this throws, res1 is never closed
        if (null != res1) res1.close()
    }

Using finally… More robust, but verbose and ugly

    var res1 = null
    var res2 = null
    try {
        res1 = openResourceA()
        res2 = openResourceB(res1)
        doSomething(res2)
    }
    finally {
        try {
            if (null != res2) res2.close()
        }
        finally {
            if (null != res1) res1.close()
        }
    }

Using finally... the right way

    var res1 = openResourceA()
    try {
        var res2 = openResourceB(res1)
        try {
            doSomething(res2)
        }
        finally {
            res2.close()
        }
    }
    finally {
        res1.close()
    }

Using finally... the right way

Fewer indents option

    var res1 = openResourceA()
    try {
        doMore(res1)
    }
    finally {
        res1.close()
    }

    function doMore(bigRes) {
        var res2 = openResourceB(bigRes)
        try {
            doSomething(res2)
        }
        finally {
            res2.close()
        }
    }
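For comparison, here is a runnable Python sketch of the same "one try/finally per resource" pattern. The `Resource` class and the `closed` list are stand-ins invented for the demonstration; `contextlib.closing` is the standard-library helper that expresses the same pairing with a `with` statement.

```python
from contextlib import closing

closed = []  # records the close order, for illustration only


class Resource:
    def __init__(self, name):
        self.name = name

    def close(self):
        closed.append(self.name)


def open_resource_a():
    return Resource("A")


def open_resource_b(parent):
    return Resource("B of " + parent.name)


def do_something(res):
    raise RuntimeError("work failed")  # simulate a failure while working


def run():
    # Each resource is opened outside the try that guards it, so the
    # finally block never has to test for null.
    res1 = open_resource_a()
    try:
        res2 = open_resource_b(res1)
        try:
            do_something(res2)
        finally:
            res2.close()
    finally:
        res1.close()


def run_with():
    # Same guarantees with less indentation: closing() calls close()
    # on exit, innermost resource first.
    with closing(open_resource_a()) as res1, closing(open_resource_b(res1)) as res2:
        do_something(res2)
```

In both variants the failure still propagates to the caller, and the resources are closed in reverse order of opening.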

Using finally... And catching all exceptions

    try {
        var res1 = openResourceA()
        try {
            var res2 = openResourceB(res1)
            try {
                doSomething(res2)
            }
            finally {
                res2.close()
            }
        }
        finally {
            res1.close()
        }
    }
    catch(Exception e) {
        handleAll(e)  // placeholder: handle every failure here
    }

Using finally… and throwing an exception

    var res1 = openResourceA()
    try {
        var res2 = openResourceB(res1)
        try {
            var x = doSomething(res2)

            if (null == x) {
                throw new DataNotFoundException("Don't have it message…")
            }
        }
        finally {
            res2.close() // if it was open, it will always be closed
        }
    }
    finally {
        res1.close() // if it was open, it will always be closed
    }

Using finally… and throwing an exception

•  Previous slide is still a simplification:
   •  Remember that exceptions can be thrown from inside a finally block

•  The DataNotFoundException might be replaced by an exception thrown by one of the close() calls

•  Prevent that IF DataNotFoundException is the most important to propagate in your case (how to do it is out of scope for this presentation)

•  MIGHT be enough to just log the DataNotFoundException to cover the low probability event of a close() failure
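The slides leave the "how" out of scope. As one possible approach only, the Python sketch below shows the masking problem and a way around it: log a failing `close()` instead of letting it propagate. `FlakyResource`, `DataNotFoundError` and `close_quietly` are hypothetical names invented here.

```python
import logging

log = logging.getLogger("cleanup")


class DataNotFoundError(Exception):
    pass


class FlakyResource:
    def close(self):
        raise OSError("close failed")  # simulate a failing cleanup


def close_quietly(resource):
    """Close a resource; log a failure instead of letting it propagate."""
    try:
        resource.close()
    except Exception:
        log.exception("close() failed; the original error keeps propagating")


def lookup_masked():
    res = FlakyResource()
    try:
        raise DataNotFoundError("no such record")
    finally:
        res.close()  # the OSError raised here replaces DataNotFoundError


def lookup_preserved():
    res = FlakyResource()
    try:
        raise DataNotFoundError("no such record")
    finally:
        close_quietly(res)  # failure is logged; DataNotFoundError survives
```

Python happens to keep the replaced exception reachable through `__context__`, but the caller of `lookup_masked` still has to handle an `OSError` rather than the error that actually matters.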

Propagation

Ensure propagation! To avoid drowning exceptions…

•  Never have an empty catch statement
   •  If the catch should never be reached, throw an unchecked exception from inside it, just in case (throw an ImpossibleException; you can create one)

•  If you can't handle it, let it go through / re-throw (you can simply let it pass through)

•  Remember: logging is NOT enough!!! (your software might even become illegal...)
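The "catch that should never be reached" advice above might look like this in Python. `ImpossibleError` and `to_utf8` are names invented for the sketch. The example also shows why the guard is worth having: UTF-8 encoding of a string looks like it cannot fail, yet a Python string holding a lone surrogate does make it fail.

```python
class ImpossibleError(RuntimeError):
    """Raised from a branch that was believed to be unreachable."""


def to_utf8(text):
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as e:
        # "Cannot happen": UTF-8 covers every Unicode character.
        # Instead of an empty except block, fail loudly and keep the cause.
        raise ImpossibleError(f"could not encode {text!r} as UTF-8") from e
```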

Catching and replacing exceptions

    try {
        doSomething()
    }
    catch(Exception e) {
        LOG.error("Some meaningful message", e)
        throw new OtherException("Some message", context, e)
    }

    // Or, at least, chain the exceptions…
    try {
        doSomething()
    }
    catch(Exception e) {
        throw new OtherException(e)
    }
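In Python the equivalent of passing the caught exception to the new one is `raise ... from ...`, which stores the original in `__cause__` and prints both tracebacks. The sketch below assumes a hypothetical `OrderStorageError` type and a `read_order_file` helper that always fails, purely to have something to replace.

```python
import logging

LOG = logging.getLogger("orders")


class OrderStorageError(Exception):
    """Hypothetical replacement type that hides the storage details."""

    def __init__(self, message, order_id):
        super().__init__(message)
        self.order_id = order_id  # context for the handler


def read_order_file(order_id):
    # Stand-in for real I/O; always fails for the demonstration.
    raise FileNotFoundError(f"/data/orders/{order_id}.json")


def load_order(order_id):
    try:
        return read_order_file(order_id)
    except OSError as e:
        LOG.error("could not read order %s", order_id, exc_info=e)
        # "from e" chains the exceptions, keeping the original traceback.
        raise OrderStorageError(f"order {order_id} is unavailable", order_id) from e
```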

How to propagate exceptions...

•  Only handle exceptions at the right level (where you can do something useful)

•  Do nothing / just use "finally" without "catch" (quite often, the right thing to do)

•  Re-throw same exception (careful with syntax used in order to avoid losing stack trace)

•  Create and chain new exception (chain to keep stack trace / chain of events)
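On the "careful with syntax" point: in Python a bare `raise` inside the handler re-throws the same exception object with its original traceback. A small sketch, with `parse_port`, `read_port` and `origin_of` as hypothetical helper names:

```python
import traceback


def parse_port(text):
    return int(text)  # raises ValueError for non-numeric input


def read_port(text):
    try:
        return parse_port(text)
    except ValueError:
        # Cleanup or bookkeeping could go here.
        raise  # same exception, original traceback intact


def origin_of(exc):
    """Name of the function where the exception was first raised."""
    return traceback.extract_tb(exc.__traceback__)[-1].name
```

After `read_port("http")` fails, `origin_of` on the caught exception still reports `parse_port`, the real throw point.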

Valid reasons to replace exceptions

•  Replace exceptions to:
   •  Hide implementation details from callers
   •  Make the exception more meaningful to the caller
   •  Add information specific to the problem in hand

•  Do NOT replace exceptions…
   •  If not adding semantic value (more meaning / additional information)

When replacing exceptions...

•  Do NOT lose the stack trace

•  Do NOT lose previous diagnostics information

•  CHAIN exceptions (new exception refers to the one it replaces)

Recovery

Recovery

•  Real recovery:
   •  Just retry (for resources w/ intermittent failures)
   •  Repair (problematic resource) and retry
   •  Collect missing piece of information
   •  Etc.

•  Next best thing:
   •  Do as much as possible and ask the user to take further action (e.g.: register request, and email results when they become available)
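The "just retry" case can be sketched as a small helper. `TransientError` and `retry` are names made up for this example; the attempt count and delay are arbitrary defaults, not recommendations.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=3, delay_seconds=0.0, retry_on=(TransientError,)):
    """Call operation() up to `attempts` times, then let the error propagate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate, do not swallow
            time.sleep(delay_seconds)
```

Only the exception types listed in `retry_on` are retried; anything else propagates immediately, which keeps the helper from hiding real bugs.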

User action request

Help the user to proceed when reporting a problem

•  Provide information useful to fix the problem (e.g.: which fields have invalid values and why)

•  Provide support contact information and a token identifying the problem

•  Provide an estimate of when the system is expected to be working correctly again

•  Etc.

This is also important for batch processes…

•  Provide information on which items failed to be processed and why (and try processing all others)

•  A systems operator is still a user
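For the batch case, a minimal sketch of "report what failed and why, and process everything else". `process_batch` is a hypothetical helper; catching the broad `Exception` is deliberate here because each failure is recorded and reported rather than dropped.

```python
def process_batch(items, process):
    """Process every item; collect failures instead of stopping at the first."""
    results = []
    failures = []
    for item in items:
        try:
            results.append((item, process(item)))
        except Exception as e:
            failures.append((item, e))  # which item failed, and why
    return results, failures
```

A caller can then print one line per entry in `failures` for the operator, for example the item together with `repr(error)`.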

Logging

Logging

•  Logging filtering / channeling mechanisms
   •  Namespaces
   •  Log level

•  Keep in mind you can have multiple channels with different (potentially overlapping) filters
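With Python's standard `logging` module, channels map to handlers, namespaces to dotted logger names, and each handler can have its own level and filters. The sketch below uses in-memory streams as stand-ins for a main log and a separate database log; the logger names `app`, `app.db` and `app.ui` are assumptions.

```python
import io
import logging


def build_channels():
    main_log = io.StringIO()  # stands in for the console or main log file
    db_log = io.StringIO()    # stands in for a separate, more verbose file

    main_handler = logging.StreamHandler(main_log)
    main_handler.setLevel(logging.WARNING)  # level filter: warnings and up

    db_handler = logging.StreamHandler(db_log)
    db_handler.setLevel(logging.DEBUG)
    db_handler.addFilter(logging.Filter("app.db"))  # namespace filter

    app = logging.getLogger("app")
    app.setLevel(logging.DEBUG)
    app.propagate = False
    for old in list(app.handlers):  # keep repeated calls from stacking handlers
        app.removeHandler(old)
    app.addHandler(main_handler)
    app.addHandler(db_handler)
    return main_log, db_log
```

A warning from `app.db` reaches both channels (the filters overlap), a debug message from `app.db` reaches only the database log, and a warning from `app.ui` reaches only the main log.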

Logging Levels

•  Info (usually kept visible)

•  Error (usually kept visible)

•  Warning (usually kept visible)

•  Debug (I like to keep it visible)

•  Trace

•  Fatal

(Should be above Info. Is it silly? No longer present on some APIs)

Logging Levels

•  Info: Important application/service events (startup, shutdown, initialization…)

•  Error: Unexpected problems which might affect operation

•  Warning: Anomalous conditions which might signal problems (e.g.: recoverable loss of database connectivity)

•  Debug: Diagnostic information for potential app logic problems (e.g.: business logic module ins and outs, including exceptions)

•  Trace: Maniacally logging every little step the app takes

•  Fatal: Fatal error (the application is crashing)

Good logging practices

Some aspects to consider for logging

•  Message format

•  Possible automated analysis

•  Log level criteria

•  Sensitive data logging criteria (Do not log passwords, credit card numbers, etc.)

•  Logging channels

•  Log file management

Good logs

•  Tell you exactly the when, where and how

•  Have each interesting event logged once and only once

•  Can be analyzed even without their application at hand

•  Are reliable (this can be quite important, depending on their use)

•  Do not slow down (noticeably) the system

Events to log (could use an audit channel / custom level for some)

•  Authentication and authorization [debug, audit]

•  System / data access [debug, audit]

•  System / data changes [debug, audit]

•  Potential threats [warning]

•  Resources at limits [warning]

•  Health / Availability [info] (Startup, shutdown, faults, delays, backup status, etc.)

Event information to log

•  Timestamp (+TZ) (when)

•  Component / module (where)

•  Full stack traces

•  Involved parties (when communicating)

•  User (who)

•  Action (what)

•  Status (result)

•  Log level (a.k.a. severity, priority, importance, etc.)

•  Reason
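Several of these fields can be put into every log line with a `logging.Formatter`. The following is only a sketch: the field layout, the `app.auth` logger name and the helper names `make_event_logger` and `log_event` are assumptions, and timestamps are written in UTC with a literal `Z` suffix.

```python
import logging
import time


def make_event_logger(stream):
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s user=%(user)s action=%(action)s "
        "status=%(status)s reason=%(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    formatter.converter = time.gmtime  # UTC, matching the literal "Z"
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger = logging.getLogger("app.auth")  # component / module (where)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    for old in list(logger.handlers):
        logger.removeHandler(old)
    logger.addHandler(handler)
    return logger


def log_event(logger, level, user, action, status, reason):
    # who / what / result travel as structured "extra" fields
    logger.log(level, reason, extra={"user": user, "action": action, "status": status})
```

Stack traces are added by the logging module itself when a call passes `exc_info` or uses `logger.exception`.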

Stack traces

•  Should list the call stack to the exception throw point

•  Usually present, per entry:
   •  Function name
   •  Source file name
   •  Line number

•  Should list all chained exceptions

•  Should avoid redundant entries

Reading Stack Traces

    eu.codebits.somewhere.SomeSillyException
        "You are wrong!!!"
    eu.codebits.someplace.StrictClass.badMethod()
    eu.codebits.someplace.BusinessLogic.call3()
    eu.codebits.someplace.BusinessLogic.call2()
    eu.codebits.someplace.BusinessLogic.call1()
    …
    eu.codebits.someplace.frontEndThing()
    …
    com.my.app.server.WeirThing.weirdCall1001()
    …
    org.somelanguage.engine.Thing.rootCall()
    …

Logging practices I like

Practices I like (some controversial)

•  Log all application input and output (user input, DB access, external service calls, etc.)

•  Use DEBUG level in production for custom code (remember: premature optimization is the root of all evil) (Actually, I am using DEBUG like an AUDIT level/channel)

•  Place DEBUG and exception logging at module boundaries, to avoid log redundancy

•  Automate logging at module boundaries

•  Identify each request / transaction with a UUID (Reported to the user in case of error)

Automating logging at module boundaries

•  Use AOP or Dynamic Proxies + Introspection to intercept calls and apply automated logging

•  Can use JSON serialization to present function/method arguments and results

•  AOP or Dynamic Proxies mechanisms available on Java, Javascript, Ruby, JRuby, Perl, Python, Delphi, PHP, C# (.Net???), some C++
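In Python, a decorator is probably the simplest interception mechanism. The sketch below logs arguments and results as JSON at DEBUG level and logs exceptions with their stack trace before re-raising them. The `logged` decorator and the `boundary` logger name are invented for this example.

```python
import functools
import json
import logging

log = logging.getLogger("boundary")


def logged(func):
    """Log calls, results and exceptions of a module-boundary function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # default=repr keeps non-JSON-serializable arguments loggable
        call = json.dumps({"args": args, "kwargs": kwargs}, default=repr)
        log.debug("-> %s %s", func.__name__, call)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("!! %s raised", func.__name__)
            raise  # logging is not handling: keep propagating
        log.debug("<- %s %s", func.__name__, json.dumps(result, default=repr))
        return result

    return wrapper
```

Applying `@logged` only to the functions that form a module's public boundary keeps each event logged once instead of at every level of the call chain. Arguments that may hold sensitive data would need masking before being serialized.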

Logging at Module Boundaries (example)

[Diagram: Presentation, Business Logic, Database and External Service modules, with LOG points at the module boundaries]

Identify each request / transaction with a UUID

•  Hides gory details while giving the user a direct handle to the problem

•  Allows matching a problem to its request / transaction's log entries (even w/ simple log file + "less")

•  Depends on an interception point at the "top" of the request / transaction

•  Depends on having a Thread Local mechanism (any platform with synchronization mechanisms and thread identification should do)

•  Thread Local mechanisms available on Java, Ruby, JRuby, Perl, Python, Delphi, PHP, C# (.Net???), C++; JavaScript seems to have it in Node.js

@paulogaspar7 (twitter)

[email protected] (email and G+)

Q & A
