(Almost) Four Years On: Metrics, ROI, and Other Stories from a Mature DITA CMS Installation

Keith Schengili-Roberts | November 15, 2010


Agenda

• Intro + ROI
• Things We Didn’t Expect
• Measuring Productivity: Uses of Metadata


Who is This Guy?

Keith Schengili-Roberts

• Manager for documentation and localization for AMD’s Professional Graphics division (formerly ATI)
– Prior to becoming manager of the group, was its information architect

• Lecturer at University of Toronto’s Professional Learning Center since 1999, teaching courses on information architecture and content management (sample slide decks available from: http://www.infoarchcourse.com/)

• Author of four titles on Internet technologies; last title was “Core CSS, 2nd Edition” (2001)



ROI Executive Summary

Proven return on investment (ROI) benefits from using a CMS-based DITA over the previous toolchain:

Productivity/output increases
– Somewhere between 2.3 and 3 times more efficient

Can “do more with what we’ve already got”
– Minimalism and content re-use go a long way
– We have fewer writers than when we started, while our output rate continues to increase

Localization cost savings
– Localization budget is now less than half of what we needed in the year before we started using the DITA CMS
– We are much more productive


What We Do

Documentation & Localization Group at AMD's Graphics Product Group (GPG), formerly ATI

Based in Markham, Ontario

4 writers, 2 process engineers, 2 localizers, 1 manager

CMS: DITA CMS from Ixiasoft (www.ixiasoft.com)

Responsible for:
– End-user documentation, including online help (20%)
– Engineering documentation for ODM/OEM partners (60%)
– Technical training documentation for partners (20%)

Localize in up to 25 languages (mostly end-user and UI)

Primary outputs are PDF and XHTML


Where We Started (i.e., “The Bad Old Days”)

Circa 2003-2006:

• Used unstructured FrameMaker

Localization costs very high

Code page issues made localization QA work hard

Could not reliably keep in sync with major software releases (monthly cadence required for online help; could only do it twice a year)

Writers were deeply siloed

Very little content shared

Content re-use (especially between different docs) very low

Output was efficient but quality was highly variable


Where We Are Now

Have been using Ixiasoft’s DITA CMS in production since February 2007

Have published more than 2,200 documents in that time
– 46% in English
– 54% in the languages to which we localize (21 maximum)

Writers and the documentation process are more nimble
– Any writer can take on another’s projects
– Content re-use rate is good (slightly more than 50% monthly)
– Quality is uniformly better; re-used topics are edited topics

Localization process is streamlined, with more time now available to focus on QA than on administration or fixing formatting issues


Getting ROI by Doing More with What We’ve Already Got

• Using the old toolchain, we spent about 50% of our time formatting content; this equates to an almost equal boost in productivity using the DITA CMS.

• We automate things that can (and should) be automated; no more TOCs or Indexes built by hand.

• Through attrition, we have fewer personnel writing/localizing content; despite this, our output rate has increased.

An information architecture content audit of existing materials emphasized minimalism and re-use within and between document types.

Content re-use is considerable; now, de-siloed writers are more flexible on what they can work on.

We continue our effort to find out what customers find useful, and to give them only the information they require.


ROI: Doing More with Less

Comparative numbers from 2007:

• Numbers show equivalent work on engineering docs (same types/sizes of docs and product release cycle)

• DITA CMS made us faster

• More than doubled output using the same headcount while taking on an expanded range of document types


ROI: Doing More with Less (cont.)

What’s happened since 2007?

Presentation Notes

Dips and peaks are in line with our product release cycles. Numbers cited are for *all* docs produced (which accounts for the seeming jump over the previous figures cited, which looked only at engineering docs). Values are cumulative for that quarter, so Q4 2009 shows 121 English docs published and 103 localized docs published, for a total of 224 docs published. Overall trend is still going up!


ROI: Doing More with Less (cont.)

In 2009, 4 writers were responsible for 366 docs.

• On average, each writer produced 91.5 docs in a year = ~23 per writer per quarter
– This figure does include revisions; however, on average, we do the same number of revisions as we did under the old toolchain (we just do them faster).

• Compare this to some roughly equivalent numbers from another tech writing team covering a similar subject area using our old toolchain:
– They produced 360 docs using 9 writers over the course of a year; their docs were roughly the same size and type, with a similar release cadence
– This = 40 docs per writer per year, or 10 per writer per quarter
– By these numbers, use of the DITA CMS improves efficiency by 2.3 times (your own results may vary)

• The two localization coordinators were responsible for producing 432 docs in the system during 2009.

Presentation Notes

Comparative numbers: 360 docs produced by 9 writers using old toolchain, or 40 per writer per year, or 10 per quarter
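
The 2.3 times figure follows directly from the two per-writer rates quoted on this slide. As a quick check of that arithmetic, here is a minimal Python sketch; the function and variable names are illustrative only and do not come from the CMS or any reporting tool.

```python
def docs_per_writer(docs: int, writers: int) -> float:
    """Average number of documents produced per writer over the period."""
    return docs / writers


# Figures quoted on the slide.
dita_rate = docs_per_writer(366, 4)    # DITA CMS team, 2009
legacy_rate = docs_per_writer(360, 9)  # comparison team, old toolchain

print(f"DITA CMS:      {dita_rate:.1f} docs per writer per year")
print(f"Old toolchain: {legacy_rate:.1f} docs per writer per year")
print(f"Efficiency ratio: {dita_rate / legacy_rate:.1f}x")
```

The ratio is only as comparable as the two teams' documents are, which is why the slide adds "your own results may vary".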


ROI: Localization Cost Savings

• Content re-use in English corresponds directly to translated content re-use

• Eliminated desktop publishing (DTP) charges
• As a result, we are able to produce publications more quickly, more reliably, and less expensively than with our old toolchain:
– One example is our Catalyst Control Center online help: prior to the DITA CMS, we could only hope to update it at most every 6 months; now, we can keep up with the monthly software release cycle.

Presentation Notes

Catalyst Control Center online help is now simultaneously shipped in 21 languages along with English; simply not possible without the DITA CMS in place


CMS-based DITA and Localization Costs

Blue line = localization budget for quarter; red line = actual localization spend

Our annual localization budget is now 2.5 times less than the year before we started using the CMS (2006)

• DITA CMS has more than paid for itself based only on reduced localization costs

The volume of localized content has increased over this time period

[Chart annotations: “Bad Old Days”; Content audit + Single-sourcing; CMS ROI]


DITA Advantages from a Writer’s Perspective

Moving to and implementing DITA is typically a management decision, but there are advantages for the writers:

Learning a new and valued skill (I've had two writers hired out from under me by another firm looking to "do DITA").

As content re-use increases over time, the writers act more as editors, so have a higher "value-add" to the content process.

Significant topic re-use means that writers learn more about other subjects using other writers’ topics, effectively de-siloing the writing team.

Programmatic skills increasingly called into play because there is a need for people who understand XSL and text-parsing languages (such as Python) and also understand publishing.

Things We Didn’t Expect

• Need for a “house” DITA Style Guide
– Also found ways to help enforce it
• Conrefs vs. Cloning
• More nimble options available for doing localization
• Use of tracking-based metadata allows us to do thorough productivity measures
– And allows us to measure useful things we had not initially anticipated


How Much DITA Do You Need?

In terms of the number of tags you need to use, it may be less than you think:

Our initial approach was evolutionary; writers could use any tag they felt necessary, and over time DITA tagging styles were established and made uniform (DITA Style Guide).

Using fewer tags decreases formatting issues/clashes when creating XSL output types.

In all, we actively use fewer than half of all DITA 1.1 tags.

Presentation Notes

I say “evolutionary”, but in reality there were many long discussions between the writing staff and the Information Architect on what should be used in a given context (e.g., simple tables or “regular tables” for figure captions).


Cloud of Relative Tag Usage

• 67 tags displayed, with a minimum threshold of 20 uses
• Tags not included because they are auto-populated/included in our topic templates: othermeta, metadata, prolog, searchtitle, shortdesc, titlealts, navtitle

• Created using “Wordle” from www.wordle.net

Presentation Notes

Many of the “top” tags are those automatically added by the CMS upon topic creation (such as othermeta, navtitle and searchtitle), many of which are then auto-filled by the system where possible. Based on this, it seems likely that there is some “tag overuse” going on, given the over-prominence of the <b> and <i> tags.


Creating a DITA Style Guide

A recommendation for any tech docs group that uses DITA extensively:

• Helps new writers/contributors come up to speed
• Usefully narrows the scope of the XSL work that needs to be done
• Many things are “legal” in DITA but may be poor from a “house style” standpoint, for example:

– Can have unformatted block content between a header and a table in a section

– Tables and figures do not have to have a title

– Can have unlimited nested lists

– Alpha lists can contain more than 26 items

– Lists can contain only a single item


Schematron Can Help Enforce DITA Style

What is Schematron? “Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees.” (www.wikipedia.org)

We use Schematron to point out to the writers potential errors/lapses in our DITA House Style:

Text between a section and table not wrapped in block tags:

A list ought to have more than one item (otherwise, why make it a list?):

Presentation Notes

Schematron is invoked upon topic creation, and the “red X” appears as the writer types; the message only appears if the writer cursors over the “X”. It is invoked automatically upon a new topic’s creation with the following line:

    <?oxygen SCHSchema="../../system/schemas/client/AMDSchematron.sch" ?>

Sample code being used here:

    <sch:pattern id="unenclosed.phrase">
      <sch:title/>
      <sch:rule context="section|fig|ul|ol|linklist|context|table">
        <sch:let name="strcat" value="string-join(text(),'')"/>
        <sch:assert test="not(normalize-space($strcat)) and string-length(normalize-space($strcat))=0">Unenclosed text in element '<sch:value-of select="name(.)"/>': Item(s): "<sch:value-of select="normalize-space($strcat)"/>"</sch:assert>
      </sch:rule>
    </sch:pattern>

    <sch:pattern id="list.short">
      <sch:rule context="ol|ul">
        <sch:let name="prev" value="name(..)"/>
        <sch:let name="numitems" value="count(.//li)"/>
        <sch:assert test="not($numitems=1) or $prev='li'">List should have more than one item. (<sch:value-of select="$prev"/>,<sch:value-of select="$numitems"/>)</sch:assert>
      </sch:rule>
    </sch:pattern>
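
For readers who do not work in Schematron, the logic of the two rules above can be approximated in a few lines of Python. This is only a sketch for illustration, not the validation used in the CMS: the function name `house_style_warnings` and the sample topic fragment are assumptions, and it uses the standard library's `xml.etree.ElementTree` rather than a Schematron processor.

```python
import xml.etree.ElementTree as ET

# Elements that, per the first rule, should not contain loose text of their own.
TEXT_FREE = {"section", "fig", "ul", "ol", "linklist", "context", "table"}


def house_style_warnings(topic_xml: str) -> list:
    """Return warning messages mirroring the two Schematron patterns."""
    root = ET.fromstring(topic_xml)
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    warnings = []
    for elem in root.iter():
        if elem.tag in TEXT_FREE:
            # Direct text nodes: the element's leading text plus each child's tail.
            raw = (elem.text or "") + "".join(child.tail or "" for child in elem)
            loose = " ".join(raw.split())
            if loose:
                warnings.append(f"Unenclosed text in element '{elem.tag}': \"{loose}\"")
        if elem.tag in ("ol", "ul"):
            parent = parent_of.get(elem)
            parent_tag = parent.tag if parent is not None else ""
            # Same test as the rule: flag a single-item list unless it is nested in an <li>.
            if len(elem.findall(".//li")) == 1 and parent_tag != "li":
                warnings.append("List should have more than one item.")
    return warnings


# Hypothetical topic fragment with one stray sentence and one single-item list.
sample = """<section><title>Ports</title>Stray text here.
<ul><li>Only item</li></ul>
<ol><li>One</li><li>Two</li></ol></section>"""

for message in house_style_warnings(sample):
    print(message)
```

In this sample the stray sentence and the single-item list are each reported once, while the two-item list passes.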


XSL Can Help Enforce DITA House Style

We have a DITA house style that says nested lists should be no more than two levels deep.

Here’s Schematron doing its job:

And here is the result if you try to output it:


Conrefs vs. Cloning

At a very early stage we decided not to use conrefs in our DITA content

• Made localization programmatically complicated/inefficient
• Creating a localization kit would mean finding all conrefs in a doc (however many levels they are nested) and then “flattening” them; leads to inefficient segment-matching
• Did not seem cost-effective from an author’s perspective
• Would seem to limit reuse as conref targets become “fixed”; dare not change them without affecting many docs
• Searching for and then defining a single phrase or paragraph to reuse is not always an efficient use of time


Conrefs vs. Cloning

• We instead chose a “clone” approach to topic re-use:

• Essentially, make a copy of an existing topic and use only the parts that you need in your current document

• Original topic and cloned are completely separate (though trackable; parent/child relationship is retained in CMS)

• Cloning is only done when the amount of change is sufficient that the original topic cannot accommodate it

• Writers can more freely re-use existing topics for their own needs

• When a localization kit is made, the segment matching process is efficient


Nimble Localization Processes with DITA XML

Under the old toolchain, localizing a 200+ page document to a single language within a week (without huge expense) was impossible.

DITA XML allows us to be more nimble: for critical large documents, we can send the localization firm finished “parts” as we get them (“70/20/10”):

When roughly 70% of a large document is done, we send it off for translation, followed a week or two later with another 20% of new and updated material, then the last 10% when we complete it.

While this process does cost more than sending in a whole document at once, it reduces the turnaround time from weeks to days, and quality is much improved because it is not done in a rush.

This approach was simply not feasible using our old toolchain; ultimately, the new toolchain is still cheaper and much faster.

Presentation Notes

This approach is also detailed in a 2009 Aberdeen Group Case Study “Translating Product Documentation”

Measuring Productivity: Uses of Metadata

There are three main purposes for metadata:
– Retrieval
– Re-use
– Tracking

• Everyone who has used a search engine is familiar with the “Retrieval” part.

• Authors can add their own metadata to topics to aid in later retrieval for re-use.
– Topic and map dependencies can be checked, and associated topics re-used in other publications.

Presentation Notes

This is pretty much lifted from my slide deck for the Enterprise Content Management course I teach at the University of Toronto’s Professional Learning Centre


Tracking Metadata

Tracking metadata (in our case, mainly dates, author, and topic/map status) is used for understanding trends and managing workflow.

The types of questions we can readily answer include:
– Who created the content (author)?
– When was it created (date)?
– Who modified it (editor)?
– Who reviewed it (reviewer/approver)?
– Where has it been re-used (map relation)?
– Has it been published or translated (status/language)?


How We Measure Productivity

Metric we use is a combination of topics created + topics modified in a monthly/quarterly timeframe:

– Each new topic created counts as 1.
– Modified topics are also counted, though again only as 1.
– Subsequent revisions to the same topic in a given timeframe are not counted.

Provides us with a very good view of ongoing work, and the numbers align with known product release cycles.

Works both as an aggregate measure (total output per month), and as a measure of a writer’s individual productivity.

Maps are also tracked, but are not as good for measuring productivity since they come in many sizes and have widely varying development timelines.

Presentation Notes

We use topics produced by the writing team as the “atomic unit” of measurement; the aggregate value turns out to be a good measure of productivity. Maps are also good and their numbers are tracked, but maps are not a good “unit” measure since they come in many sizes.
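
The counting rule on this slide can be sketched in Python as "one count per topic per timeframe". This is an illustration only: the event tuples and topic ids are made up, the real figures come from CMS tracking metadata, and the sketch assumes that a topic created and then revised in the same month still counts once.

```python
from collections import defaultdict
from datetime import date


def monthly_topic_counts(events):
    """Count distinct topics created or modified in each (year, month).

    events: iterable of (topic_id, event_date) tuples, one per create/modify event.
    """
    touched = defaultdict(set)
    for topic_id, when in events:
        # A set keeps each topic once per month, so repeat revisions are ignored.
        touched[(when.year, when.month)].add(topic_id)
    return {month: len(topics) for month, topics in sorted(touched.items())}


# Hypothetical tracking events.
events = [
    ("t-001", date(2009, 7, 2)),   # created
    ("t-001", date(2009, 7, 15)),  # revised in the same month: not counted again
    ("t-002", date(2009, 7, 20)),  # modified
    ("t-001", date(2009, 8, 3)),   # modified in a new month: counts once for August
]
print(monthly_topic_counts(events))
```

Filtering the events by author before calling the function would give the per-writer view mentioned on the slide; quarterly totals work the same way with a quarter key in place of the month.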


Topics Created/Modified (Monthly)

Presentation Notes

This graph was created using the Business Intelligence and Reporting Tools (BIRT). It shows the values of new topic creation and topic modification month by month by the writers, with the dotted line showing the overall aggregate total.


Topic Production Matches Product Cadence

• Regular peak of production in Q3, typically followed by secondary peak in Q1

[Chart: topic production across product release cycles #1, #2, and #3, each annotated with a main peak and a secondary peak]

Presentation Notes

Extra “dip” in Q2 2009 may be due to the effects of being down one person at the time


Localization Segments Auto-translated within CMS Monthly

• Portion in orange is the percentage that were 100% matches, and were never sent to a localization vendor = pure ROI!

• From July 2008 to July 2009, an avg. of 54% of segments were auto-translated within the system.

Presentation Notes

First peak of new segments came when we did a large revision of an existing software project; second peak of new segments came with the first Windows 7-related software release. Blue represents new segments created, and red the number of segments that were 100% matches and therefore “auto-translated” within the DITA CMS; the percentage figure looks at the relative rate of reuse for *that month*.


Sample Topic Reuse Rate (Monthly)

From Jan 2008 to June 2009, average monthly topic reuse rate = 53.53%

Presentation Notes

Reuse Ratio = (Objects used more than once / Objects published) * 100

Objects used more than once = the number of published topics referenced by more than one map in Authoring (for a given month)

Objects published = the number of topics published (for a given month)
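
The reuse ratio defined in these notes is simple to compute once each published topic is paired with the maps that reference it. The following Python sketch assumes a plain dictionary as input; the name `reuse_ratio` and the sample ids are hypothetical, and how the map references are pulled from the CMS is not shown.

```python
def reuse_ratio(map_refs: dict) -> float:
    """Percentage of published topics referenced by more than one map.

    map_refs: published topic id -> set of ids of the maps that reference it.
    """
    if not map_refs:
        return 0.0  # nothing published in the period
    reused = sum(1 for maps in map_refs.values() if len(maps) > 1)
    return reused / len(map_refs) * 100


# Hypothetical month of published topics.
published = {
    "t-001": {"map-a", "map-b"},
    "t-002": {"map-a"},
    "t-003": {"map-b", "map-c", "map-d"},
    "t-004": {"map-c"},
}
print(f"{reuse_ratio(published):.2f}%")  # 2 of the 4 topics are reused
```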


An Interesting Trend: Topic Ratios

Except in year one, reference topics steadily make up ~74% of all topics used

Presentation Notes

Based on “active” topics contained in the repository which are available for reuse. Excludes connectors/connectormaps and “image topics”. Would love to know what other groups’ ratios are, though I *suspect* that the results will be similar to this for our “style” of documentation (i.e. engineering + end-user docs with little to no marketing content).


What is the Average Size of a Topic?

Maps avg. = 3.47 kb

Concepts avg. = 2.46 kb

References avg. = 7.88 kb

Tasks avg. = 3.20 kb

1 byte = 1 character

1000 bytes (1 kb) = 1000 characters

• Concepts avg. 0.65 of a page of Lorem ipsum text in Word

• References avg. 2.6 pages
– Smallest: half a page
– Largest: ~200 pages

• Tasks avg. 1 page

Presentation Notes

Please note that this is a logarithmic chart, so the range in size depicted is actually larger than it seems at first glance. This is a question I kept hearing at the early conferences I attended on DITA; I always thought that the proper answer to that was “whatever size it needs to be”, and this chart pretty much confirms that.


Questions & Answers
