Reports and DITA Metrics IXIASOFT User Conference 2016

Keith Schengili-Roberts, DITA Information Architect, IXIASOFT
Nathalie Laroche, Lead Technical Writer, IXIASOFT
Dustin (Dusty) Clark, Lead DITA Architect, Intel


Page 2: Reports and DITA Metrics IXIASOFT User Conference 2016

Agenda
• Introduction
• DITA Metrics for Production Purposes
• DITA Metrics with the DITA CMS
• Agile, the DITA CMS and DITA Metrics
• Topic Type Usage/Ratios
• DITA Structural Metrics
• DITA Metrics for Checking Consistency
• Migrating to DITA 1.3
• DITA Reuse Metrics
• How To Use the DITA CMS Reports Feature (Live Demo) [Nathalie]
• The DITA QA Plugin (Live Demo) [Dusty]
• Q/A

Page 3: Reports and DITA Metrics IXIASOFT User Conference 2016

Introductions

Keith Schengili-Roberts, DITA Specialist, IXIASOFT
What I do:
• Liaison with OASIS; member of the DITA Adoption and Technical Committees
• Industry researcher
• DITA evangelist
• 10+ years of experience with DITA XML

Nathalie Laroche, Lead Technical Writer and Product Owner, IXIASOFT
What I do:
• Technical writer for more than 20 years
• Working in DITA for 7+ years
• Now also Product Owner for the IXIASOFT web tools

Dusty Clark, Lead DITA Architect, Intel Corporation
What I do:
• Information architect for a distributed team
• Tools developer and systems integrator
• 10+ years of authoring experience, 6+ years of DITA experience

Page 4: Reports and DITA Metrics IXIASOFT User Conference 2016

Why Measure Documentation Production?

Provides the ability to:
• Set more accurate project estimates
• Justify the need for more resources (tools/people)
• Understand the quality of production
• It's also an opportunity to measure value

Page 5: Reports and DITA Metrics IXIASOFT User Conference 2016

Documentation Metrics: The Bad Old Days

Prior to the advent of structured content, documentation managers were limited in what they could easily measure, mainly:
• How many pages/publications were produced over time
• Individual writer productivity
• Painstaking reviews of quality

Page 6: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA and Return on Investment (ROI)

• ROI has been the primary focus of much of the work on DITA metrics
• The topic-based nature of DITA lends itself to cost-based measures
• The book DITA Metrics 101 (2013) looks at this aspect almost exclusively, focusing on justifying the cost of investing in DITA + a CMS

Page 7: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA Metrics for Production Purposes

• But not everything is about ROI:
  § What if you have a mature DITA environment and have already established your ROI?
  § Or what if you are simply looking for ways to use DITA + metrics to measure things that are not cost-related?
• DITA metrics can be used to guide managers, information architects and writers on how to improve their content

Page 8: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA Production Metrics without a CMS

• DITA metrics outside of a CMS are limited to the information contained within the DITA files + the file system
  § You can search for text strings within the XML, and also use date/time info from the filename… and that's about it
  § Make no mistake though, there's plenty of information there to be mined
• But more options are available within the IXIASOFT DITA CMS
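Without a CMS, both sources mentioned above (text inside the XML and information from the file system) can be mined with a short script. The sketch below is a minimal, standard-library-only example; the folder layout, the function names and the use of the file system's modification time (rather than dates parsed from filenames) are assumptions for illustration.

```python
import os
import re
from datetime import datetime, timezone
from pathlib import Path


def load_dita_files(root):
    """Yield (path, raw_text) for every .dita/.ditamap file under `root`."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in (".dita", ".ditamap"):
            yield str(path), path.read_text(encoding="utf-8")


def find_matches(files, pattern):
    """Return the names of files whose raw XML text matches a regex.

    Searching raw text (not a parsed tree) keeps escaped markup such as
    &lt; visible and avoids having to resolve DTDs.
    """
    regex = re.compile(pattern)
    return [name for name, text in files if regex.search(text)]


def modified_since(root, cutoff):
    """List (path, mtime) for .dita files modified after `cutoff`.

    The timestamp comes from the file system, not from the DITA content;
    `cutoff` must be timezone-aware (e.g. tzinfo=timezone.utc).
    """
    recent = []
    for path in Path(root).rglob("*.dita"):
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        if mtime > cutoff:
            recent.append((str(path), mtime))
    return sorted(recent, key=lambda item: item[1])
```

For example, `find_matches(load_dita_files("docs"), r"<b[\s>]")` would list the files under a hypothetical `docs` folder that contain `<b>` elements (the pattern skips `<body>`).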

Page 9: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA Production Metrics with the DITA CMS
• The DITA CMS captures additional information that can be used for metrics, including:
  § Author information
  § Workflow status
  § How many times a topic has been modified/versioned
  § Topic/map dependencies
  § Word count
…and much more!

Page 10: Reports and DITA Metrics IXIASOFT User Conference 2016

Agile and DITA Metrics

• At Scrum meetings, the documentation manager can report on the topics assigned to their group and how "done" they are
• The DITA CMS enables you to capture a snapshot of how "done" (i.e. the workflow status) the topics/images/objects in your map are
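The snapshot itself is a simple aggregation. Assuming you can export each object in a map together with its workflow status (the export format and the status names below are hypothetical placeholders, not actual DITA CMS workflow states), the percentage done could be computed like this:

```python
from collections import Counter


def done_snapshot(objects, done_statuses):
    """Summarize workflow status for the objects in a map.

    `objects` is an iterable of (object_id, status) pairs; `done_statuses`
    is the set of statuses that count as "done" for your team.
    Returns ({status: percent}, percent_done).
    """
    counts = Counter(status for _, status in objects)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    by_status = {status: round(100 * n / total, 1) for status, n in counts.items()}
    done = sum(n for status, n in counts.items() if status in done_statuses)
    return by_status, round(100 * done / total, 1)


# Hypothetical export: the statuses are placeholders, not actual CMS states.
snapshot = done_snapshot(
    [("t1", "Done"), ("t2", "Review"), ("t3", "Done"), ("t4", "Authoring")],
    done_statuses={"Done"},
)
print(snapshot)  # ({'Done': 50.0, 'Review': 25.0, 'Authoring': 25.0}, 50.0)
```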

Page 11: Reports and DITA Metrics IXIASOFT User Conference 2016

Time and Workflow Metrics

• The DITA CMS can track who is responsible for producing which topic, and whether that work is on schedule

• This is possible because workflow data is stored as an object associated with maps/topics; it is not possible with DITA alone

Page 12: Reports and DITA Metrics IXIASOFT User Conference 2016

Content Types and Document Make-up

Looks at the topic types that go into maps
• Why would this matter? It can give you an idea of whether content is being properly "typed", which helps ensure that writers are writing/structuring content properly. Some examples:
  § A typical "Installation Guide" ought to be made up primarily of task topics
  § API documentation ought to have a lot of reference topics
  § You would generally expect to have more maps than bookmaps

Page 13: Reports and DITA Metrics IXIASOFT User Conference 2016

Content Types within a Single Document
• Single document:
  § Joe Gollner and Eliot Kimber have uploaded an excellent set of sample DITA demo files at: github.com/gnostyx/dita-demo-content-collection
  § It is a User Guide for a fictional software application called "Thunderbird"
  § It's a User Guide, but there don't seem to be many task topics to help people use the product…

Page 14: Reports and DITA Metrics IXIASOFT User Conference 2016

Counting String Instances in Excel

• Do a search for each topic type contained in the map, then count the results

• If you can output the results to Excel, simply select the column and use COUNTIF with the string you are looking for, in this case:

=COUNTIF(B2:B100, "concept")
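Outside Excel, the same count can be scripted by reading the root element of each file, since a DITA topic's root element names its type (concept, task, reference, topic and so on). The sketch below is a minimal standard-library example with hypothetical function names; it assumes the files parse without DTD-defined entities.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def topic_type(xml_text):
    """Return the root element name of a DITA file, e.g. 'concept' or 'task'.

    ElementTree does not fetch the DTD named in the DOCTYPE, so files parse
    without a catalog as long as they use no DTD-defined entities.
    """
    return ET.fromstring(xml_text).tag


def topic_type_breakdown(xml_texts):
    """Count topic types and return {type: (count, percent)}, largest first."""
    counts = Counter(topic_type(text) for text in xml_texts)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.most_common()}


def breakdown_for_folder(root, pattern="*.dita"):
    """Apply the breakdown to every matching file under `root`."""
    texts = (p.read_text(encoding="utf-8") for p in Path(root).rglob(pattern))
    return topic_type_breakdown(texts)
```

Called with `pattern="*.ditamap"`, the same function counts `map` versus `bookmap` root elements, which gives the structural ratio discussed later in this deck.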

Page 15: Reports and DITA Metrics IXIASOFT User Conference 2016

Thunderbird Document Metrics

• I would argue that a user-oriented document ought to have a more even balance of concepts and tasks than we see here

• My direction to the Thunderbird technical writers: check that all possible tasks a user might encounter are explained

[Chart: topic type breakdown for the Thunderbird User Guide; Count: 87]

Page 16: Reports and DITA Metrics IXIASOFT User Conference 2016

Content Types within All Documents Over a Year

• This chart looks at the DITA topic breakdown for all documentation produced by IXIASOFT in 2015

• The documentation consists of User/Admin Guides for our DITA CMS and TEXTML software

• Good ratio of concept to task topics

• When I showed this to our lead technical writer, she immediately wanted to investigate the 3% of topics using the generic topic type
  § A nice practical example of how DITA metrics can improve quality!

[Chart: topic type breakdown, IXIASOFT documentation 2015; Count: 1307]

Page 17: Reports and DITA Metrics IXIASOFT User Conference 2016

Tracking Topic Type Usage Over Several Years
• These charts look at several years' worth of semiconductor documents

• While I expected a high percentage of reference topics, I wondered whether some topics that ought to be tasks had been written as references instead

Page 18: Reports and DITA Metrics IXIASOFT User Conference 2016

Tracking Topic Type Usage Directing Change

• Asked writers to be more diligent about writing task topics where they might be tempted to write references instead

• The result was a measurable increase in the percentage of task topics created over the course of the following year

Page 19: Reports and DITA Metrics IXIASOFT User Conference 2016

Ratio of Structural Elements

• Look at the ratio of structural elements, such as maps and bookmaps

• Why? It provides an idea of how content is being structured

• If, for example, you use maps as "sub-maps", you would expect to see more maps than bookmaps
  § That's exactly what we see here

[Chart: ratio of maps to bookmaps; Count: 118]

Page 20: Reports and DITA Metrics IXIASOFT User Conference 2016

Readability Metrics

• Readability statistics provide an idea of how easy or hard a document is to read

• Why? The need for clarity and simplicity. "Most users prefer clear, simple language, [web]site visitors with poor reading skills need it." (Nielsen & Loranger)

• In documentation you want to aim at or below the likely reading level of your audience

• Among the most widely accepted readability metrics are the Flesch-Kincaid reading ease and grade level tests

• You can apply them either topic-by-topic or document-by-document
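The two Flesch-Kincaid scores are simple formulas over words per sentence and syllables per word: reading ease = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), and grade level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below applies them to plain text; its sentence splitting and vowel-group syllable counter are rough heuristics assumed for this example (not part of the published tests), so scores will differ somewhat from dedicated tools. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid(text):
    """Return (reading_ease, grade_level) for plain text, or None if empty."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return None
    # Treat each run of ., ! or ? as one sentence end; assume at least one.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return round(ease, 1), round(grade, 1)


def topic_text(xml_text):
    """Flatten a topic's text content, separating elements with spaces."""
    return " ".join(ET.fromstring(xml_text).itertext())
```

For a topic-by-topic score, call `flesch_kincaid(topic_text(source))`; for a document score, concatenate the text of every topic in the map first.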

Page 21: Reports and DITA Metrics IXIASOFT User Conference 2016

Other Possible DITA Consistency Checks

• If you have a house style that recommends against certain tags (for example: <b>, <i> or <u>), search for topics containing those tags

• If you want to optimize the use of relationship tables, look at the ratio between the number of topics in a map and the number of topicrefs within the relationship table

• Are you adding short descriptions to your topics? Search for topics that lack a <shortdesc>
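These three checks can be scripted against the DITA source. The sketch below assumes topics and maps are parsed with Python's standard ElementTree and uses hypothetical function names. It only looks for a `<shortdesc>` directly under the topic root (not one nested in `<abstract>`), and it counts plain `<topicref>` elements only, so specializations such as `<chapter>` or `<keydef>` are ignored.

```python
import xml.etree.ElementTree as ET

DISCOURAGED_TAGS = {"b", "i", "u"}  # house-style example from the slide


def topic_issues(xml_text):
    """Return a list of house-style issues found in one topic."""
    root = ET.fromstring(xml_text)
    issues = []
    used = sorted({el.tag for el in root.iter() if el.tag in DISCOURAGED_TAGS})
    if used:
        issues.append("discouraged tags: " + ", ".join(used))
    shortdesc = root.find("shortdesc")
    if shortdesc is None or not "".join(shortdesc.itertext()).strip():
        issues.append("missing or empty shortdesc")
    return issues


def reltable_ratio(map_xml):
    """Topicrefs inside <reltable> divided by topicrefs elsewhere in the map."""
    root = ET.fromstring(map_xml)
    in_reltable = sum(len(rt.findall(".//topicref")) for rt in root.iter("reltable"))
    outside = len(root.findall(".//topicref")) - in_reltable
    return in_reltable / outside if outside else 0.0
```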

Page 22: Reports and DITA Metrics IXIASOFT User Conference 2016

A Couple of Sample Results

• Clearly Thunderbird is doing something right! ;)
• The list of non-compliant IXIASOFT topics merits further investigation

Page 23: Reports and DITA Metrics IXIASOFT User Conference 2016

Preparing to Move Content to DITA 1.3

• DITA 1.3 opens up many new possibilities for structuring and describing content

• Taking advantage of the new elements/features starts with finding existing content that could use them. A couple of easy examples:
  § The new XML mention domain means that you can replace element names written with escaped angle brackets (i.e. &lt; and &gt;) with a pair of <xmlelement> tags; for example, &lt;title&gt; becomes <xmlelement>title</xmlelement>
    § This is the most common example; there are other elements in this domain for marking up attributes (<xmlatt>), parameter entities (<parameterentity>), numeric character references (<numcharref>) and more
  § With the new troubleshooting topic type, look for topics containing the word "troubleshoot": these are obvious candidates for conversion
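A search-and-replace for the most common xmlelement case might look like the sketch below. It is a starting point under stated assumptions, not a migration tool: it works on the raw file text (an XML parser would turn `&lt;` back into `<`), converts only bare names like `&lt;title&gt;`, and leaves closing tags and tags with attributes alone. Escaped markup inside a `<codeblock>` is probably meant literally, so review each match before converting. The function names are hypothetical.

```python
import re

# Matches escaped opening element names such as &lt;title&gt;.
ESCAPED_TAG = re.compile(r"&lt;([A-Za-z_][\w.\-]*)&gt;")


def find_escaped_tags(xml_text):
    """Return the element names written as &lt;name&gt; in raw DITA source."""
    return [match.group(1) for match in ESCAPED_TAG.finditer(xml_text)]


def to_xmlelement(xml_text):
    """Rewrite &lt;name&gt; as <xmlelement>name</xmlelement> (DITA 1.3)."""
    return ESCAPED_TAG.sub(r"<xmlelement>\1</xmlelement>", xml_text)
```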

Page 24: Reports and DITA Metrics IXIASOFT User Conference 2016

Results from Search for Angle Bracket Entities

• 40 matches were found in 1502 topics from 2015

[Screenshot: sample DITA file full of &lt;*&gt; examples]

Page 25: Reports and DITA Metrics IXIASOFT User Conference 2016

Results from Search for “trouble*” in Topics

• 38 file matches from 1502 topics; each would need to be investigated

• The example shown is a solid troubleshooting topic candidate
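The "trouble*" search can be reproduced outside the CMS with a case-insensitive regular expression. In this sketch (function names hypothetical), matches inside attribute values and topics that already use the `<troubleshooting>` element are also reported, which is acceptable for a candidate list that someone reviews by hand.

```python
import re
from pathlib import Path

# "trouble" at the start of a word, followed by any word characters.
TROUBLE = re.compile(r"\btrouble\w*", re.IGNORECASE)


def troubleshooting_candidates(files):
    """Return {name: [matched words]} for (name, text) pairs mentioning trouble*."""
    hits = {}
    for name, text in files:
        words = TROUBLE.findall(text)
        if words:
            hits[name] = sorted({word.lower() for word in words})
    return hits


def scan_folder(root):
    """Run the search over every .dita file under `root`."""
    files = ((str(p), p.read_text(encoding="utf-8")) for p in Path(root).rglob("*.dita"))
    return troubleshooting_candidates(files)
```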

Page 26: Reports and DITA Metrics IXIASOFT User Conference 2016

Other DITA 1.3 Possibilities

• Search all maps for key definitions and look for keys that share the same name
  § The introduction of key scopes in DITA 1.3 allows you to share keys (and their values) across maps; finding matching keys suggests opportunities for key scoping

• Search for instances of MathML or SVG graphics
  § DITA 1.3 has MathML and SVG "baked in", so you can insert the markup directly or keep it in separate files and reference it (<mathmlref>, <svgref>)

  § In most instances, search for content contained within <foreign> tags for likely candidates
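Both searches can be sketched with the standard library. The example below (hypothetical function names) collects every name listed in a `@keys` attribute, which may hold several space-separated names, and reports keys defined in more than one map; a second helper flags topics containing `<foreign>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def keys_defined(map_xml):
    """Return every key name declared in @keys anywhere in a map."""
    names = set()
    for element in ET.fromstring(map_xml).iter():
        # @keys may hold several space-separated key names.
        names.update(element.get("keys", "").split())
    return names


def shared_keys(maps):
    """Return {key: [map names]} for keys defined in more than one map.

    `maps` is an iterable of (map_name, map_xml) pairs.
    """
    owners = defaultdict(list)
    for map_name, map_xml in maps:
        for key in sorted(keys_defined(map_xml)):
            owners[key].append(map_name)
    return {key: names for key, names in owners.items() if len(names) > 1}


def has_foreign_content(topic_xml):
    """True if the topic contains a <foreign> element (MathML/SVG candidate)."""
    return next(ET.fromstring(topic_xml).iter("foreign"), None) is not None
```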

Page 27: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA Reuse Metrics

• Arguably the most influential article on this topic is Bill Hackos’ “Reuse of DITA Topics? What is the Best Metric to Measure the Success of Your Reuse of DITA Topics?” (http://ow.ly/X7mzM)

Page 28: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA Reuse Metrics

• Bill Hackos proposed "Percent Repository Words Reused in Context" (PRWRC), where:

PRWRC = (Words in All Produced Content – Words in the Repository) / (Words in the Repository)

  § From his example:
    § Document1 – 25,413 words
    § Document2 – 23,069 words
    § Document3 – 26,366 words
    § Total number of words in the produced documents – 74,848 words
    § Total number of words in the repository – 40,060 words

PRWRC = (74,848 – 40,060)/40,060 = 87%
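The PRWRC formula translates directly into code. This minimal sketch (the function name is hypothetical) reproduces Hackos' example and the IXIASOFT 2015 totals quoted in this deck:

```python
def prwrc(produced_word_counts, repository_words):
    """Percent Repository Words Reused in Context, as a percentage.

    produced_word_counts: word count of each produced deliverable
    repository_words: total words stored once in the repository
    """
    produced = sum(produced_word_counts)
    return (produced - repository_words) / repository_words * 100


# Hackos' example: three documents built from a 40,060-word repository.
print(round(prwrc([25_413, 23_069, 26_366], 40_060)))  # 87
# IXIASOFT 2015 totals (623,078 produced words, 268,663 repository words).
print(round(prwrc([623_078], 268_663)))  # 132
```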

Page 29: Reports and DITA Metrics IXIASOFT User Conference 2016

Example Based on IXIASOFT DITA Documents

Based on 2015 numbers from IXIASOFT documentation:

• Total number of words in the repository: 268,663
• Words in All Produced Content: 623,078
• PRWRC = (623,078 – 268,663)/268,663
• PRWRC = 354,415 / 268,663 = 132%

Page 30: Reports and DITA Metrics IXIASOFT User Conference 2016

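How is a +100% Value Possible?
• Easy: ditaval. When the same source is filtered through several DITAVAL files, the same repository words are published several times, so the produced word count can exceed twice the size of the repository
• Though ditaval is not mentioned in the original article, Bill Hackos does talk about +100% values being entirely possible

• We have a number of publications that are created based on a series of ditaval values, as many as 21 per bookmap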

Page 31: Reports and DITA Metrics IXIASOFT User Conference 2016

How To Use the IXIASOFT DITA CMS Reports Feature

• The DITA CMS contains a tremendous amount of info:
  § Workflow
  § Authors
  § Number of revisions
  § Creation and modification dates
  § Versions
  § Labels
  § Conditions
  § Localization
  § Reviews

• DITA CMS Reports Feature: a data mining tool

Page 32: Reports and DITA Metrics IXIASOFT User Conference 2016

The DITA CMS Reports Feature

3 steps:

1.  Create a query: What information do you want to extract from the Content Store?

2.  Create a viewpoint: How do you want to organize the information?

3.  Create the report: Associate a query with a viewpoint

Page 33: Reports and DITA Metrics IXIASOFT User Conference 2016

Running the Report

§ The DITA CMS runs the query, organizes the results according to the specified viewpoint, and then uses an XSL file to transform the data

§ By default, it generates an HTML report (this can be configured)

§ Two ways of generating a report:
  • Manually: HTML report
  • Scheduler: HTML report + .tsv file of the results

And that's where the fun begins :)

Page 34: Reports and DITA Metrics IXIASOFT User Conference 2016

Step 1: Create a Query

• Search all topics
• Save the query as XML (e.g., "All topics")

Page 35: Reports and DITA Metrics IXIASOFT User Conference 2016

Step 2: Create a Viewpoint

Page 36: Reports and DITA Metrics IXIASOFT User Conference 2016

Step 3: Create the Report

Page 37: Reports and DITA Metrics IXIASOFT User Conference 2016

Open .tsv in Data Mining Tool
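If you prefer a script to a data mining tool, the scheduled .tsv file can be read with Python's `csv` module. The column names used here (`Title`, `Status`) are hypothetical, so check the header row of your own report; the file encoding is assumed to be UTF-8.

```python
import csv
import io
from collections import Counter


def summarize_tsv(tsv_text, column):
    """Count the values of one column in a tab-separated report export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


def summarize_tsv_file(path, column, encoding="utf-8"):
    """Same as summarize_tsv, reading a scheduled report's .tsv file."""
    with open(path, newline="", encoding=encoding) as handle:
        return summarize_tsv(handle.read(), column)
```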

Page 38: Reports and DITA Metrics IXIASOFT User Conference 2016

Other Examples

• Reports of maps and topics that were created in the last release cycle:
  § Is the # of topics what you expected?
  § # is higher: Not enough reuse? Unplanned features?
  § # is lower: Overestimates? Length of topics?
  § Look at topic titles: Can you see possibilities for reuse? (e.g., two topics with the same title, yet another "log in" topic)
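Spotting repeated titles such as yet another "log in" topic is easy to automate once topic IDs and titles are exported from a report. The normalization below (lower-casing and treating punctuation as spaces) is an assumption for this sketch, and the function names are hypothetical; it catches "Log in", "Log In" and "Log-in" but not synonyms such as "Sign in".

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lower-case, turn punctuation into spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def duplicate_titles(topics):
    """Group topic IDs by normalized title, keeping only repeated titles.

    `topics` is an iterable of (topic_id, title) pairs.
    """
    groups = defaultdict(list)
    for topic_id, title in topics:
        groups[normalize_title(title)].append(topic_id)
    return {title: ids for title, ids in groups.items() if len(ids) > 1}
```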

Page 39: Reports and DITA Metrics IXIASOFT User Conference 2016

Other Examples

• Report of topics that were reused in the release cycle:
  § Search for topics that were created before the start of the release and were modified during the release
  § Compare with topics that were created during the release to get a ratio
• Report of modified topics over a week:
  § Performance issues? Look at modification dates; are your writers all checking in at the same time?
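Both reports reduce to simple date comparisons. The sketch assumes you have exported each topic's creation and last-modification dates as `datetime` values and that the export covers only the release cycle in question (no release end date is used); the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def reuse_ratio(topics, release_start):
    """Reused topics (created before the release, modified during it)
    divided by topics created during the release.

    `topics` is a list of (created, last_modified) datetime pairs.
    """
    reused = sum(1 for created, modified in topics if created < release_start <= modified)
    new = sum(1 for created, _ in topics if created >= release_start)
    return reused / new if new else float("inf")


def checkins_by_hour(modified_times):
    """Histogram of modification times by hour of day (0-23)."""
    return Counter(t.hour for t in modified_times)


start = datetime(2015, 6, 1)
sample = [
    (datetime(2015, 1, 5), datetime(2015, 6, 10)),  # reused
    (datetime(2015, 2, 1), datetime(2015, 3, 1)),   # not touched this release
    (datetime(2015, 6, 2), datetime(2015, 6, 3)),   # new
    (datetime(2015, 7, 1), datetime(2015, 7, 2)),   # new
]
print(reuse_ratio(sample, start))  # 0.5
```

A `checkins_by_hour` result with a large spike at one hour (for example, just before people leave) would support the "everyone checks in at the same time" theory.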

Page 40: Reports and DITA Metrics IXIASOFT User Conference 2016

Why Use Reports?

• Reports can be scheduled at specific intervals (e.g., weekly and monthly reports)

• Queries can be complex (Think once, do many times)

• Process can be reproduced systematically (you know it’s always the same query that gets run)

• Reports can be discussed and planned before the start of the work

Page 41: Reports and DITA Metrics IXIASOFT User Conference 2016

The DITA QA Plugin

Page 42: Reports and DITA Metrics IXIASOFT User Conference 2016

Overview

• DITA Open Toolkit plugin
• Part of the DITA Community project on GitHub
• Generates:
  § HTML dashboard overview
  § Detailed CSV report
  § XML data file

• Checks can be customized:
  § Structural
  § Terminology
  § Count metrics

Page 43: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA CMS – QA Output Type

Page 44: Reports and DITA Metrics IXIASOFT User Conference 2016

DITA QA Plugin

• Download: https://github.com/dita-community/org.dita-community.qa

•  IXIASOFT documentation: QA Plugin setup

• Ditanauts blog: http://ditanauts.org/tag/qa

Page 45: Reports and DITA Metrics IXIASOFT User Conference 2016

Setting up the QA Plugin

• Follow the instructions in the IXIASOFT documentation to set it up

• Reminder: the chunk attribute must be set on the root map. It can be:
  § Set using xmltask (for the IXIASOFT CMS)
  § Set on the map itself
  § Set using the setchunk parameter

Page 46: Reports and DITA Metrics IXIASOFT User Conference 2016

Output

• HTML report – dashboard overview
• CSV – detailed list of violations
• Output map – violations ditamap
• Database (.dita) file – database of all collected values

Page 47: Reports and DITA Metrics IXIASOFT User Conference 2016

Database File Output

Page 48: Reports and DITA Metrics IXIASOFT User Conference 2016

Creating Rules

• Rules are XPath if statements

• Add rules to xsl/qachecks/_qa_checks.xsl
• Hint: there's a compiler tool that allows you to maintain your checks in DITA

Page 49: Reports and DITA Metrics IXIASOFT User Conference 2016

Compiler Tool – Keep Rules in DITA

Page 50: Reports and DITA Metrics IXIASOFT User Conference 2016

QA Rule Best Practices

• Supply specific resolutions for each violation
• Keep the list of violations short and impactful
• Be as specific as possible (minimize false positives)
• Match on the @class value instead of the element name
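The plugin's rules are XPath tests in XSLT; the sketch below is a Python analogue (not the plugin's rule syntax, and the function names are hypothetical) that illustrates the last two practices: the rule carries a specific resolution, and it matches on the class token `hi-d/b`, so it also catches any specialization of `<b>` that a rule on the element name would miss. In XSLT this is typically written as `contains(@class, ' hi-d/b ')`. Note that `@class` values are normally supplied as DTD defaults, so they appear in files processed by the DITA Open Toolkit rather than in raw source files.

```python
import xml.etree.ElementTree as ET

RESOLUTION = "Replace <b> with a semantic element (for example <uicontrol> or <term>)"


def has_class_token(element, token):
    """True if the element's @class contains `token` as a whole token."""
    return f" {token} " in f" {element.get('class', '')} "


def check_bold(root):
    """Example rule: flag <b> and its specializations, each with a resolution."""
    return [
        (element.tag, RESOLUTION)
        for element in root.iter()
        if has_class_token(element, "hi-d/b")
    ]
```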

Page 51: Reports and DITA Metrics IXIASOFT User Conference 2016

QA