Dita Metrics in Production: How, When, Where, and Why (and How Much) Redux

DITA METRICS IN PRODUCTION: How, When, Where, and Why (and How Much?) Keith Schengili-Roberts, Manager Documentation & Localization June 16, 2011

An update to an earlier presentation that talks about DITA Metrics looking at my experiences while at AMD, and looking at production metrics as well as ROI.

2 | DITA Metrics in Production: How, When, Where and Why (and How Much)? | June 16, 2011 | Public

BIOGRAPHICAL INFORMATION | a.k.a. Who is this Guy?

Keith Schengili-Roberts

– Manager for AMD’s Documentation and Localization department for the Professional Graphics division (formerly ATI); formerly its Information Architect (migrated team to DITA)

– Award-winning lecturer at University of Toronto’s Professional Learning Center since 1999, teaching courses on information architecture and content management

– Author of four titles on Internet technologies; last title was Core CSS, 2nd Edition (2001)

– Co-Founder of the Semiconductor DITA Implementer’s Group (SDIG) that holds monthly free talks/webinars on using DITA for our domain: http://tech.groups.yahoo.com/group/semidita

– DITA blog at: www.ditawriter.com

WHAT MY GROUP DOES

Documentation & Localization Group at AMD's Graphics Product Group (GPG)

– Formerly ATI, based in Markham, Ontario

– 4 writers, 1½ process engineers, 2 localization co-ordinators, 1 manager

– CMS: DITA CMS from Ixiasoft (www.ixiasoft.com)

– Have been using DITA in production for over four years

Responsible for:

– End-user documentation, including online help

– Engineering documentation for ODM/OEM partners

– Technical training documentation for partners

– Localize in up to 25 languages (mostly end-user and UI)

– Primary outputs are PDF and XHTML

DITA METRICS AND R.O.I.

Most typical context in which one hears about metrics in relation to DITA:

– Scenario: documentation team is trying to justify expense of a DITA-based Content Management System to upper management. The two key costing arguments are:

Higher re-use rates = more efficient process → faster delivery/lower costs

Separation of form from content = more efficient work process, greatly reduced localization costs

– If estimated $ R.O.I. > $ CMS and attendant costs, get CMS

PROVING THE PRODUCTIVITY R.O.I. ARGUMENT

How?

– Take metrics before switching over to CMS in order to set a baseline

– Here are comparative numbers from 2007, covering two quarters immediately pre- and post-CMS implementation:

Demonstrates that using the DITA CMS made us faster, and that we more than doubled output (docs were comparable)

PROVING THE PRODUCTIVITY R.O.I. ARGUMENT (CONT.)

What’s happened since 2007?

PROVING THE REDUCED LOCALIZATION COST ARGUMENT

Track localization budget and compare to actual amount spent; any positive difference goes towards R.O.I.

Blue line = localization budget for year, Red line = actual localization spend

[Chart annotations: “Bad Old Days”, Content audit + Single-sourcing, CMS, ROI $]
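
The budget-versus-actual comparison on this slide, combined with the earlier rule that estimated R.O.I. should exceed the cost of the CMS, can be sketched in a few lines of Python. This is only an illustration: the function names and every dollar figure below are hypothetical and are not taken from the presentation.

```python
from itertools import accumulate


def yearly_savings(budget, actual):
    """Positive budget-minus-actual difference per year; overspend counts as 0."""
    return [max(b - a, 0) for b, a in zip(budget, actual)]


def payback_year(savings, cms_cost):
    """1-based year in which cumulative savings first cover cms_cost, else None."""
    for year, total in enumerate(accumulate(savings), start=1):
        if total >= cms_cost:
            return year
    return None


# Hypothetical localization budget and actual spend for three years.
budget = [500_000, 500_000, 450_000]
actual = [520_000, 380_000, 300_000]

savings = yearly_savings(budget, actual)
print(savings)                                  # [0, 120000, 150000]
print(payback_year(savings, cms_cost=200_000))  # 3
```

Counting an overspend year as zero rather than as a negative value is a design choice made here for simplicity; a stricter accounting would subtract it.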

R.O.I. PROVEN, NOW WHAT?

In our case the Ixiasoft DITA CMS had demonstrably paid for itself in terms of increased efficiency and reduced localization cost by the end of the second year

– Any additional savings from that point on are “gravy”

Metrics now take on a related but different purpose:

– Focus is now on relative production and quality using the CMS; or in other words:

“how do we track what we are doing?”

“how do we do what we do better?”

“what is the ongoing cost of production?”

DITA + CMS = ABILITY TO MEASURE OUTPUTS

DITA CMS has capability to process large amounts of data in its repository, so managers can answer the following questions that relate to production in a DITA-based environment:

– What is the average size of a topic?

– How much content was created?

– How much does it cost to create a topic?

– What is the topic reuse rate?

– What DITA tags are most/least used? Are we using them efficiently?

– What is the ratio of topic types used?

– How much of the same document type have we produced year over year?

WHAT NOT TO MEASURE | DITA Makes these More Problematic

Technical Writing metrics that are of questionable use:

– Hours or days to produce a page

What exactly is a “page” in XML?

Different output formats change outcome (double-spaced output anyone?)

– Documents released per writer

Differences in size (Datasheets vs. Databooks)

Documents with high reuse rates vs. low reuse rates

Measure aggregate values rather than individual effort

– Otherwise you run the risk of getting more of what you measure

GOOD BACKGROUND MATERIAL ON DOC METRICS

Pre-DITA, good general overviews:

– Documentation Metrics: What Do You Really Want to Measure, by Donald S. Le Vie Jr. http://www.stc.org/intercom/PDFs/2000/200012_06-09.pdf

– Practical and Effective Metrics, by Geoffrey J.S. Hart http://www.stc.org/PDF_Files/myjob/geofferyHart.pdf

DITA-specific:

– What is the Best Metric to Measure the Success of Your Reuse of DITA Topics? by Bill Hackos http://www.infomanagementcenter.com/enewsletter/200806/third.htm

– DITA Metrics: Cost Metrics, by Mark A. Lewis http://dita.xml.org/resource/dita-metrics-cost-metrics (look for PDF link)

– The Illusive, Writing Productivity Metric: Making Unit Cost a Competitive Advantage, by Mike Eleder (“Best Practices”, Feb 2011)

DITA TOPICS AS “ATOMIC” UNIT OF MEASURE

Each DITA topic can be thought of as an “atomic” unit of production; while they vary in size, that difference can be measured and averaged in a meaningful way

– Different topic types have their own characteristics; graph shows average, max, min, and count for concepts, references and tasks in our repository

– Reference topics average the equivalent of 2.6 pages of lorem ipsum text in MS Word, roughly twice the size of concepts and tasks
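
The per-type figures described above (average, max, min, and count) are simple to compute once each topic has a size. The sketch below assumes size is measured in words, which is an assumption made for illustration; the slide expresses size in pages, and the sample data is invented.

```python
from collections import defaultdict
from statistics import mean


def size_stats(topics):
    """Summarize topic sizes per topic type.

    `topics` is an iterable of (topic_type, size) pairs; size is assumed
    to be a word count here.
    """
    sizes_by_type = defaultdict(list)
    for topic_type, size in topics:
        sizes_by_type[topic_type].append(size)
    return {
        topic_type: {
            "count": len(sizes),
            "min": min(sizes),
            "max": max(sizes),
            "avg": mean(sizes),
        }
        for topic_type, sizes in sizes_by_type.items()
    }


# Invented sample data.
sample = [
    ("concept", 100), ("concept", 300),
    ("task", 150),
    ("reference", 400), ("reference", 600),
]
for topic_type, stats in size_stats(sample).items():
    print(topic_type, stats)
```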

ONE EXAMPLE: TOPIC PRODUCTION PER QUARTER

Tracked items:

– # of topics created per author

– # of topics modified per author

Results then collated, showing amount of work being done in DITA CMS

Sample search:

Authoring cycle selected (current work)

All document types selected

Select a date range for topic creation, then select a person

TOPIC PRODUCTION PER QUARTER | One View of the Resulting Data

Collated results show all topic production over a quarter

– Results show ongoing work necessary to meet product release cycle

An effective tool with upper management; shows productivity and predicted trends based on previous cycles

[Chart; vertical axis: # of topics]

TOPIC PRODUCTION PER QUARTER | Another View of the Resulting Data

Can further break this down to show how much groups are contributing:

– Same total, but this shows relative contribution per group

Uses document type/author to group topic outputs

[Chart; vertical axis: # of topics]

COST-PER-TOPIC | Identifying the cost of production

Idea came from “Best Practices” article by Mike Eleder (“The Illusive, Writing Productivity Metric: Making Unit Cost a Competitive Advantage”)

Basically:

cost per topic = (monthly tech writer team cost) / (topics produced monthly)

This is a unit cost measure, giving the average monthly cost of producing a topic

– Can estimate the cost of future work based on result

– Original article suggests using only published topics; I use topics created + modified (i.e. using the same figures used in previous two charts) as those numbers are effectively normalized and better reflect ongoing work
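
As a worked example of the formula above, here is a minimal Python sketch that follows the variant described on this slide (topics created plus topics modified). The function name and the sample figures are hypothetical.

```python
def cost_per_topic(team_cost, topics_created, topics_modified):
    """Monthly team cost divided by topics created + modified that month."""
    produced = topics_created + topics_modified
    if produced == 0:
        raise ValueError("no topics were created or modified this month")
    return team_cost / produced


# Hypothetical month: $30,000 team cost, 400 topics created, 800 modified.
unit_cost = cost_per_topic(30_000, 400, 800)
print(unit_cost)        # 25.0

# Rough estimate for a future job of 150 topics at the same unit cost.
print(unit_cost * 150)  # 3750.0
```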

COST-PER-TOPIC | Example chart

Initial version doesn’t provide a clear picture; cost varies based on production figures (dips in August of both years represent busy times: more topics created in a given timeframe = lower cost per topic)

[Chart; vertical axis: $]

COST-PER-TOPIC | Example chart with trend line added

The trend line records the average cost-per-topic over the time measured

This shows an overall downward trend: Avg $28 per topic in 2009, and $25 per topic in 2010
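
One common way to draw such a trend line is an ordinary least-squares fit through the monthly values. The slides do not say how the line was produced, so the method below is an assumption, and the monthly figures are invented.

```python
def linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Needs at least two values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented monthly cost-per-topic values in dollars.
monthly_cost = [30, 28, 26, 24]
slope, intercept = linear_trend(monthly_cost)
print(slope, intercept)  # -2.0 30.0
```

A negative slope corresponds to the downward trend described on the slide.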

[Chart; vertical axis: $]

COST-PER-TOPIC | Example chart based on published data

This graph shows what is published (i.e. what goes out the door) each month; release cycle is more variable than our production schedule; it does show costs of publishing content (High = $81.60, Low = $6.61, Avg = $17.79)

The previous topics created/modified charts provide a more accurate result, as they reflect producing topic content as a running cost

[Chart; vertical axis: $]

COST-PER-LOCALIZED-TOPIC | Example chart

This looks at the cost per localized topic; compares total localized topics produced against actual localization costs invoiced for that month

Invoicing cycles don’t always match our localization cycles, so same caveat applies here as with published doc

Removing outliers, results are interesting: High = $28.15, Min = $0.02, Avg = $5.25

[Chart; vertical axis: $]

REUSE RATES | What to Measure?

On the surface, this seems like a simple question to answer: if a topic is used again, that equals 100% reuse, doesn’t it?

– True, but that’s not the whole picture; topics are often copied (cloned) and then only a small portion of the new topic is changed (a sentence is added, a numerical value is changed, etc)

– So to get a true value for reuse, you need to go past the topic level

Localization tools provide us with a usable option: segments

SEGMENTS AS “QUARKS” OF MEASUREMENT FOR REUSE

If topics are the “atomic” unit of production measurement, then segments are “sub-atomic”

– Topic reuse can take place at the whole topic level, but more often it happens at the sub-topic level, as topics are copied and then modified

– Segmentation is a concept from the localization realm: a segment is a sentence or (more usually) a portion of a sentence

Defined by segmentation rules, which ensure that you are always comparing like to like

Localization tools look for changes (the “delta”) at this level

– Data within our CMS enables us to look at this

– If you don’t have a CMS, try a localization workbench
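
To make segment-level reuse concrete, here is a minimal Python sketch. It assumes a deliberately naive segmentation rule (split after sentence-ending punctuation); real localization tools apply far more detailed segmentation rules, and the function names and sample text are hypothetical.

```python
import re


def segments(text):
    """Split text into sentence-like segments with normalized whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(part.split()) for part in parts if part.strip()]


def reuse_rate(new_text, existing_segments):
    """Fraction of segments in new_text already present in existing_segments."""
    segs = segments(new_text)
    if not segs:
        return 0.0
    reused = sum(1 for seg in segs if seg in existing_segments)
    return reused / len(segs)


original = "Insert the card. Tighten the screw. Power on the system."
cloned = "Insert the card. Tighten both screws. Power on the system."

known = set(segments(original))
print(round(reuse_rate(cloned, known), 2))  # 0.67
```

Here the cloned topic counts as 67% reused instead of 0% (a different topic) or 100% (the same topic), which is the distinction the segment-level view is meant to capture.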

SAMPLE OF SEGMENTATION AS APPLIED TO REUSE

Average monthly topic reuse rate for this time period = 53.53%

[Chart; axes: reuse rate (%) and # of topics]

OTHER QUESTIONS RELATED TO PRODUCTIVITY IMPROVEMENTS

Constraints added to DITA 1.2 spec allow you to limit the number of tags you use

– This is useful, because it helps enforce a “house style” of DITA usage, where authors can more readily find the tags that they need to use

– But how do you usefully determine what you should and should not keep?

If you have an existing base of content, examine how often tags are used and then set a threshold below which tags are excluded

WORDCLOUD SHOWING RELATIVE TAG USAGE

Produced by doing a search on the frequency of use of each DITA tag (e.g. # of times <note> is used in all topics) in our repository, and then using a WordCloud app (wordle.com) to produce the results

In this case it is not the “big” words in which we are interested, but the smallest, which ought to be excluded from ongoing use

EXAMINING THE OUTLIERS

In this case we are looking at the tags that are the least used

– These are candidates for exclusion in the constraint file

– This type of search does not reveal tags with a count = 0; these would need to be identified and excluded as well

Note that at 73 tags total, we are using well under half of all DITA tags available

Based on final results, an effective DITA 1.2 constraint can be created
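
The tag-frequency search and threshold described above can be approximated outside a CMS with Python's standard library. This is a sketch under stated assumptions: the topics are passed in as XML strings without DOCTYPE entity dependencies, and `known_tags` is a hypothetical list of the tags your DITA shell allows, supplied so that tags with a count of 0 are also caught.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_counts(xml_documents):
    """Count how often each element name occurs across all documents."""
    counts = Counter()
    for document in xml_documents:
        root = ET.fromstring(document)
        counts.update(element.tag for element in root.iter())
    return counts


def exclusion_candidates(counts, known_tags, threshold):
    """Tags used fewer than `threshold` times, including never-used ones."""
    return sorted(tag for tag in known_tags if counts.get(tag, 0) < threshold)


docs = [
    "<concept id='c1'><title>A</title><conbody><p>Text</p>"
    "<note>Careful</note></conbody></concept>",
    "<task id='t1'><title>B</title><taskbody><steps><step>"
    "<cmd>Do it</cmd></step></steps></taskbody></task>",
]
known_tags = {"title", "p", "note", "cmd", "lq", "draft-comment"}

counts = tag_counts(docs)
print(counts["title"])                                # 2
print(exclusion_candidates(counts, known_tags, 1))    # ['draft-comment', 'lq']
```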

TRACKING TOPIC TYPE USAGE | Tracking History

Year-over-year comparisons of topic types used provide an idea as to the composition of a “typical” DITA-based document

– While I expected a high percentage of reference topics, I wondered whether there were more topics that ought to be tasks which were instead done as references
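
The topic-type ratio for a given year is a percentage breakdown of a list of topic types. A minimal Python sketch, with invented sample data:

```python
from collections import Counter


def type_shares(topic_types):
    """Percentage share of each topic type in a list of type names."""
    counts = Counter(topic_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic_type: 100 * n / total for topic_type, n in counts.items()}


# Invented sample: one entry per topic created in a year.
year_topics = ["reference"] * 6 + ["task"] * 3 + ["concept"]
print(type_shares(year_topics))
# {'reference': 60.0, 'task': 30.0, 'concept': 10.0}
```

Running this once per year and comparing the results gives the year-over-year view described above.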

TRACKING TOPIC TYPE USAGE | Making Change

Asked writers to be more diligent about writing task topics where they might be tempted to write them as references instead

– Result was a measurable increase in the percentage of task topics created over the course of the year

– A larger number of our documents now include a procedural, step-by-step approach to the subject, which is beneficial to the end-user

COMPARING LIKE PRODUCTION YEAR OVER YEAR

Comparing the number of datasheets vs. databooks is not useful, but comparing how many of each are made over several years is

– Provides insight into relative production efficiency

[Chart; vertical axis: # of docs produced per year]

LOOKING MORE CLOSELY AT YEARLY PRODUCTION

This was created by outputting a list of all maps published within the year, and then manually sorting them by type of publication

BREAKING PRODUCTION NUMBERS DOWN FURTHER

PRODUCTION AND LOCALIZATION NUMBERS

The following was produced by outputting a list of all non-English maps created within the year, and then sorting (and counting) by language

COMPARING ENGLISH TO LOCALIZED PUBLISHED MAPS

A year-over-year comparison of published map output

OTHER POSSIBILITIES | Looking at Quality

Another avenue we are looking at is automated quality metrics

– Are actively investigating the Flesch Reading Ease metric

It follows a simple formula, which could be applied to the content of any topic

We do a lot of engineering documentation, so low scores (indicating words using many syllables aimed at those with more than a basic education) are expected

However, topics containing basic information (such as most concepts) that score especially low may serve as a guide to the writer to re-examine a topic to make it easier to read
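
For reference, the Flesch Reading Ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher scores indicate easier text. The Python sketch below applies that formula; the syllable counter is a crude vowel-group heuristic chosen for brevity, so its scores are approximate, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Approximate syllables as the number of vowel groups (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease of plain text; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


simple = "The cat sat on the mat."
dense = ("Reconfiguration of multidisplay synchronization "
         "necessitates comprehensive initialization.")
print(round(flesch_reading_ease(simple), 3))  # 116.145
print(flesch_reading_ease(dense) < flesch_reading_ease(simple))  # True
```

For DITA topics, the element text would first need to be extracted from the XML before scoring.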

OTHER POSSIBILITIES | Localization

There are further gains to be had on the localization side:

Ixiasoft (DITA CMS) and XML Int’l (XTM Suite) have integrated the CMS with a translation memory (TM) suite that will allow us to use TM within CMS

– Currently use segmentation matching, producing 100% matches that still need to be verified; with integrated TM, only new content will be sent to localization vendor.

SLIGHT TANGENT #1: CREATING A DITA STYLE GUIDE

A recommendation for any tech docs group that uses DITA extensively:

– Helps new writers/contributors come up to speed

– Usefully narrows the scope of the XSL work that needs to be done

– Many things are “legal” in DITA but may be poor from a “house style” standpoint, for example:

Can have unformatted block content between a header and a table in a section

Tables and figures do not have to have a title

Can have unlimited nested lists

Alpha lists can contain more than 26 items

Lists can contain only a single item
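
Several of the lapses listed above can be detected mechanically. The following is a minimal Python sketch, not the Schematron rules our group uses; the function name, the default two-level nesting limit, and the sample topic are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}


def style_warnings(topic_xml, max_depth=2):
    """Return house-style warnings for one topic given as an XML string."""
    warnings = []

    def walk(element, depth):
        if element.tag in LIST_TAGS:
            depth += 1
            if len(element.findall("li")) == 1:
                warnings.append(f"<{element.tag}> has only one item")
            if depth > max_depth:
                warnings.append(f"<{element.tag}> nested {depth} levels deep")
        if element.tag in {"table", "fig"} and element.find("title") is None:
            warnings.append(f"<{element.tag}> has no title")
        for child in element:
            walk(child, depth)

    walk(ET.fromstring(topic_xml), 0)
    return warnings


sample = """<concept id="c1"><title>T</title><conbody>
<ul><li>only item</li></ul>
<fig><image href="a.png"/></fig>
<ul><li>a<ul><li>b<ul><li>c</li><li>d</li></ul></li><li>e</li></ul></li>
<li>f</li></ul>
</conbody></concept>"""

for warning in style_warnings(sample):
    print(warning)
# <ul> has only one item
# <fig> has no title
# <ul> nested 3 levels deep
```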

SCHEMATRON CAN HELP ENFORCE DITA STYLE

What is Schematron? “Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees.” (Wikipedia)

We use Schematron to point out to the writers potential errors/lapses in our DITA House Style:

Text between a section and table not wrapped in block tags:

A list ought to have more than one item:

XSL CAN ALSO HELP ENFORCE DITA HOUSE STYLE

We have a DITA house style that says nested lists should be no more than two levels deep.

Here’s Schematron doing its job:

And here is the result if you try to output it:

SLIGHT TANGENT #2: AUTOMATION EFFORTS

General rule: if a process can be automated, do so.

We have a dedicated programmer who builds tools and automation processes for our group. Sample tools built include:

– “Table-izer”, app that creates DITA XML table from clipboard

– Automated T.O.C.

– DITA Topic Editor, a tag-less editor aimed at SMEs

– Automated index based on a list of keywords

– Change-bar mechanism that diffs output

– Dynamic trademark attribution

QUESTIONS AND ANSWERS

Trademark Attribution

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.

©2011 Advanced Micro Devices, Inc. All rights reserved.