DocTrain East, October 19, 2007

Transcript of DocTrain East, October 19, 2007

Page 1: DocTrain East, October 19, 2007

DocTrain East, October 19, 2007

Painless XML Authoring? How DITA Simplifies XML

Bob Doyle

[email protected]

[email protected]

617-876-5676

Skype: bobdoyle

Page 2: DocTrain East, October 19, 2007

A brief poll. Who’s heard of…

• Structured writing? Information Mapping?

• Task-oriented Documentation? vs. ?

• Minimalism? John Carroll?

• Single-source publishing? vs. Reuse?

• Component Content Management?

• Topic-based authoring?

• Bob Horn, John Brockmann, JoAnn Hackos, Ginny Redish, Ruth Clark?

Page 3: DocTrain East, October 19, 2007

All heard of DITA?

• Information Typing

• Topics: Concept, Task, and Reference

• DITA Maps

• DITA Open Toolkit

• DITA is Simplified XML

• Specialization

Page 4: DocTrain East, October 19, 2007

A brief survey of tools

• PTC Arbortext (Epic)

• JustSystems XMetal

• Adobe FrameMaker

• Word to DITA (in.vision, Info Mapping)

• XML Spy, oXygen

Page 5: DocTrain East, October 19, 2007

Heard of me?

Ph.D. Astrophysics, Harvard, 1968

Collaborative Observing Program, NASA Skylab 1970-72

Super8 Sound, 1973-78

Merlin and 5 other computer games – 1977-81

iXO Telecomputer – 1980-87

MacPublisher – 1984-1987

Digital Video Editor, New Media Magazine – 1993-1999

Page 6: DocTrain East, October 19, 2007

Parker Brothers Games

Page 7: DocTrain East, October 19, 2007

iXO Telecomputer

• Computer-initiated dialogues (AI)

• Yes, No, Help, Repeat keys

• “Operators are standing by”

• Stock trades, airline reservations, bill paying.

• Hearing-impaired

• Powered from phone line

• Venture capital $13 million

• Never developed the backend database services

• Huge NOL carry-forward

Page 8: DocTrain East, October 19, 2007

MacPublisher

• First Desktop Publishing Program

• 11th Certified Mac Developer

• Shipped in 1984

• LaserWriter in 1985

• First “spot color” text on Apple ImageWriter

• First rotated text/graphics

• Sold 20,000 copies

• MacIndexer

• Mac-Hyphen

• Sold to Letraset in 1987

Page 9: DocTrain East, October 19, 2007

Doing What Recently

CEO, skyBuilders.com

Editor, CMS Review

Related websites – CMS Wiki, CMS Forum, CMS News, CMS Calendar, CMS Glossary, CMSML, CMS Boston, Open Internet Lexicon, TaxoTips

Founder, CM Professionals

Contributing Editor, EContent Magazine

Founder, DITA Users

Related websites – DITA Infocenter, DITA News, DITA Newsletter, DITA Blog, DITA Wiki, and DITA Tutor

Page 10: DocTrain East, October 19, 2007

The First Podcast - 2003

• Christopher Lydon (NPR’s “The Connection”)

• Dave Winer

• Adam Curry

• Bloggercon

• BlogAudio.org

• Lydon’s “Open Source” Show

Page 11: DocTrain East, October 19, 2007

EContent Magazine

• Contributing Editor

• 6 columns per year

• XML Authoring Tools Review

• 12 online columns per year

• EC100 selection

Page 12: DocTrain East, October 19, 2007

Joined OASIS - 2006

• Organization for the Advancement of Structured Information Standards

• Member – DITA Technical Committee

• Member – Learning and Content SC

• Member – Help SC

• Observer – Translation SC

• Member – Editorial Board

• Organizer – Boston DITA User Group

Page 13: DocTrain East, October 19, 2007

DITA Users – Launched in March

• DITA Users is an international membership organization.

• ~400 members from 21 countries.

• Members learn topic-based structured writing.

• Author DITA with DITA Storm browser-based editor.

• Deliverables for web (XHTML), print (PDF), Help (Eclipse) from single-source documents.

• Members have a personal workspace folder.

• Finished work on web to show colleagues and clients.

• Member directory has contact information.

• Discounts on major DITA conferences, on tools (?), on DITA tutorials and workshops, and on the DITA Report.

Page 14: DocTrain East, October 19, 2007

DITA Infocenter – Launched April

• DITA Infocenter is Eclipse-based Online Help

• DITA Architectural Specification (1.0 and 1.1)

• DITA Language Specification (1.0 and 1.1)

• Open Toolkit User Guide (1.3.1)

• Full-text search

• Index of keywords

• Table of contents

• Generated from DITA files with Open Toolkit

Page 15: DocTrain East, October 19, 2007

DITA News – Launched June

• Aggregates blog posts from DITA bloggers.

• Extensive listings of DITA tools from A to Z.

• Events calendar with conference listings, Websites, Publications, Webinars.

• Glossary of DITA terms.

• Content syndicated to other websites.

• Single-source publishing tools.

Page 16: DocTrain East, October 19, 2007

DITA Blog – Launched July

• Group blog

• Anyone may join

• RSS feeds syndicate to DITA News

Page 17: DocTrain East, October 19, 2007

DITA Wiki – Launched July

• Resources with comments and discussions.

• Mediawiki software (Wikipedia)

• Architectural and Language specifications

• Vendors and Products

• Professional Services

• Edited directly by the vendors

• User comments

• People section - major DITA players

• Glossary of terms

Page 18: DocTrain East, October 19, 2007

DITA Newsletter – Launched September

• Monthly summary of DITA news

• Industry mailing list for press releases.

• DITA Mentor Awards

• Next month’s events listings

• Member discount offers

Page 19: DocTrain East, October 19, 2007

DITA Tutor – Launched September

• Learning management system (Moodle LMS)

• Self-paced online tutorials

• Instructor-led online workshops

• Powerpoint presentations

• Some with audio recording

• Recorded webinars

• Courses in DITA techniques

• Certificates of completion.

Page 20: DocTrain East, October 19, 2007

DITA User Groups

[email protected]

• http://dita.xml.org/user-groups

• Encouraging remote attendance

• Recording meeting presentations

• Archiving to DITA Tutor

• Possibly repurpose as eLearning

• What collaboration tools should we use?

Page 21: DocTrain East, October 19, 2007

Structured Writing – 1960’s and 70’s

• Structured writing requires an analysis of content and a reorganization into the smallest possible coherent topics. Decades of research on such analysis and organization have been done by Information Mapping™, who identified common document types, information types, and information blocks (chunks or topics) in use in education and commerce.

• The reduction in structured authoring time may be offset by the increased time needed to analyze the content and break it into reusable chunks. There is no doubt that granular content, with well-defined purposes for each paragraph and sentence, is easier to author than linear content. But you may need skilled (i.e., more expensive) information developers to chunk your material.

Page 22: DocTrain East, October 19, 2007

Task-oriented Documentation – 1980’s

• Task-oriented docs have replaced system-oriented or product-oriented docs - the old comprehensive user manual.

• ROI - The number of calls per month to the help desk on a product will almost certainly change when product documentation is task oriented and minimalist. And task-oriented content can feed directly into help-desk scripts.

Page 23: DocTrain East, October 19, 2007

Minimalism – 1990’s

• Minimalism aims to provide just what the impatient user is looking for. Remember, the web surfer is always just one click away from going to your competition's website. Your job is to strip away unnecessary content and get to the point. You can measure the return by pre-testing and post-testing content that has been re-architected along minimalist principles.

• Minimalism appears to promise reduced costs for the simple reason that there is so much less content in well-prepared minimalist material. But it takes talented people to write succinct, action-oriented procedures that get users to understand quickly what they need to know and successfully do it. And minimalist material is best when it is tested for effectiveness, adding to costs.

Page 24: DocTrain East, October 19, 2007

Single-source Publishing – 1990’s

• The original definition of single-source publishing was providing multiple output formats like Web, Print, and Online Help from the original documents.

• When you have one source for each piece of content, you get the astonishing ability to change it in one place and have the change propagate everywhere. A product name change becomes much more manageable. Your business-critical marketing messages are standardized everywhere. Some call single source a "single source of truth" because you are assured that your customers are not getting mixed messages that can confuse them, reduce sales, and increase the need for tech support.

Page 25: DocTrain East, October 19, 2007

Single-source plus Reuse

• Reusable content has a single source, of course, but reuse generally refers to content originally developed for one context that can be reused in another. This requires content that is topic-based and written for reuse by avoiding explicit references to context.

• The cost savings associated with reuse of content increase greatly when your content goes through a workflow with distinct review and approval stages, for example legal approval. Content that is reused generally can avoid all or most of the extra steps in the workflow that involve accuracy of content. You will still need design approval of the in-context appearance of the reused content.

Page 26: DocTrain East, October 19, 2007

Component Content Management

• The latest buzzword in CMS is "component." Most web content management systems (WCMS) segment content at the level of the web page. While this may be adequate for simple websites written by one or a few content contributors, it is not acceptable for websites whose pages act as portals to diverse kinds of interactive content.

• Modern corporate pages pull content in from multiple sources. Each content block is filled with a content component managed independently of all the other blocks on the page. A component has its own versioning and scheduling, its own writers, reviewers, and approval process.

Page 27: DocTrain East, October 19, 2007

Topic-based authoring

• A topic is a unit of information with a title and some form of content, short enough to be specific to a single subject or answer a single question, but long enough to make sense on its own and be authored as a unit.

• A topic aims to be context-free, so ideally it contains no hard-coded links to other topics.

• In DITA, the topic is the basic unit of authoring and of reuse.

• A topic is a content component

Page 28: DocTrain East, October 19, 2007

Why Concept, Task, and Reference?

• Remember Macintosh doc guidelines?

• Learning MacPaint, Using MacPaint, the MacPaint Reference.

• Today’s O’Reilly Books – Learning PHP, Programming PHP, PHP – the Definitive Reference

• Concept = What is it?

• Task = How do I do…?

• Reference = All the details.

Page 29: DocTrain East, October 19, 2007

What’s a DITA Map?

• The DITA Map provides context for your context-free topics – the content.

• You can have many maps, each one arranging the topics for different requirements – a reference manual, a tutorial, a help desk.

• The map is like a table of contents that rebuilds the book dynamically.
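
As a rough sketch of what such a map can look like, here is a small map for a tutorial. The file names, the id, and the title are invented for illustration; only the element and attribute names (map, topicref, href, type) come from DITA itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Hypothetical map: every file name below is made up for illustration -->
<map id="macpaint_tutorial" title="MacPaint Tutorial">
  <!-- Nesting topicrefs gives the hierarchy of the generated table of contents -->
  <topicref href="what_is_macpaint.dita" type="concept">
    <topicref href="drawing_a_shape.dita" type="task"/>
    <topicref href="saving_a_picture.dita" type="task"/>
  </topicref>
  <topicref href="tool_palette.dita" type="reference"/>
</map>
```

A second map, say for a reference manual, could point at the same .dita files in a different order or nesting without touching the topics themselves.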

Page 30: DocTrain East, October 19, 2007

What’s the DITA Open Toolkit?

• The Open Toolkit is an open-source end-to-end single-source publishing system.

• It takes your topics and your maps and generates multiple output format deliverables, like print (PDF), web (HTML), and Help.

• It is free and has been integrated into leading DITA editing and CMS tools.
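
The toolkit of this era is driven by Apache Ant, so one way to picture it is a small build file that hands the same map to the toolkit once per output format. This is a sketch under assumptions: the directory layout, the project and target names, and the map file are hypothetical, and the property names (args.input, output.dir, transtype) and transtype values should be checked against the documentation for the toolkit version you install.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Ant build file; all paths and names are illustrative only -->
<project name="publish-docs" default="all" basedir=".">

  <!-- Assumed location of the unpacked DITA Open Toolkit -->
  <property name="dita.dir" location="${basedir}/DITA-OT"/>

  <!-- Build every deliverable from the one source map -->
  <target name="all" depends="web,help"/>

  <!-- Web deliverable (XHTML) -->
  <target name="web">
    <ant antfile="${dita.dir}/build.xml">
      <property name="args.input" location="${basedir}/tutorial.ditamap"/>
      <property name="output.dir" location="${basedir}/out/web"/>
      <property name="transtype" value="xhtml"/>
    </ant>
  </target>

  <!-- Help deliverable (Eclipse Help) -->
  <target name="help">
    <ant antfile="${dita.dir}/build.xml">
      <property name="args.input" location="${basedir}/tutorial.ditamap"/>
      <property name="output.dir" location="${basedir}/out/help"/>
      <property name="transtype" value="eclipsehelp"/>
    </ant>
  </target>

</project>
```

Only the transtype and the output folder change between targets; the map and topics stay the same, which is the single-source idea in practice.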

Page 31: DocTrain East, October 19, 2007

Why Simplified XML?

• DITA is XML.

• XML is way harder than HTML and most writers want no part of HTML.

• So how can DITA be easier than XML?

• Because XML separates content from presentation

• And it also separates content from structure

Page 32: DocTrain East, October 19, 2007

What Is Content Anyway?

• It’s not the Presentation or the Structure!

• Separate Presentation Layer from Content

• Structure the Content

• Tag Content with Meaning (semantics) by Metadata

Page 33: DocTrain East, October 19, 2007

Three Kinds of Markup

• The three layers use different “markup”

• Style - <font>, <b>, <i>

• Structure - <p>, <ol>

• Semantics - <name>, <price>, <product>
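
To make the three layers concrete, here is the same small piece of information marked up three ways. These are three separate fragments, not one document, and the product name and price are made up for illustration.

```xml
<!-- 1. Style markup: says only how the text should look -->
<font face="Helvetica"><b>Widget</b> <i>$9.99</i></font>

<!-- 2. Structural markup: says what part of the document the text is -->
<ol>
  <li><p>Widget, $9.99</p></li>
</ol>

<!-- 3. Semantic markup: says what the text means -->
<product>
  <name>Widget</name>
  <price>9.99</price>
</product>
```

Only the third fragment lets software find every price or every product name without guessing from fonts or list positions.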

Page 34: DocTrain East, October 19, 2007

Three Kinds of XML

• The three layers use different technologies

• XSLT Stylesheets (CSS)

• XML Schemas (DTDs)

• XML/DITA Documents

Page 35: DocTrain East, October 19, 2007

Three Different Professions

• The three layers are the work of different professionals

• Designers for Style

• Architects for Structure

• Authors for Content and metadata

Page 36: DocTrain East, October 19, 2007

Simplified XML again

• The DITA Open Toolkit gives you XML with a starter set of stylesheets (XSLTs) and schemas (DTDs), so your organization does not have to invest months or years in development

• But simplified can be too simple…

Page 37: DocTrain East, October 19, 2007

DITA is not for writers alone..

• Without style designers… (XSLTs)

• Without structural architects… (DTDs)

• DITA sucks!

• It’s like publishing your annual report in Notepad text!

• Although topics are components, they don’t have the metadata needed to assemble them intelligently.

Page 38: DocTrain East, October 19, 2007

So what’s the benefit for writers?

• Your work can feed into the dynamic assembly of complex information products

• Websites, Help systems, Custom Print Documentation, Mobile snippets

• You are an assembly line writer in the age of information automation!

• Love it or hate it?

Page 39: DocTrain East, October 19, 2007

Topics are Content Components

• Even subtopic elements can be reusable components

• Elements just need unique IDs

• Then they can be conref’d (content referenced) which means you can include them by reference in other topics.

• Specialized topics have metadata created by the structure architects.
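
A minimal sketch of a conref, assuming two hypothetical files: warnings.dita holds the one authoritative copy of a caution note, and replace_battery.dita pulls it in by reference. The file names, ids, and wording are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<!-- warnings.dita (hypothetical): the single source of the note -->
<topic id="warnings">
  <title>Shared warnings</title>
  <body>
    <!-- The id on the element is what makes it addressable for reuse -->
    <note id="unplug" type="caution">Unplug the unit before opening the case.</note>
  </body>
</topic>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<!-- replace_battery.dita (hypothetical): reuses the note by reference -->
<topic id="replace_battery">
  <title>Replacing the battery</title>
  <body>
    <p>Have a fresh battery ready before you start.</p>
    <!-- conref address pattern: file#topic-id/element-id -->
    <note conref="warnings.dita#warnings/unplug"/>
  </body>
</topic>
```

The referencing element must be the same kind of element as its target (a note pulling in a note). Edit the text once in warnings.dita and every topic that conrefs it picks up the change at the next build.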

Page 40: DocTrain East, October 19, 2007

So what is specialization?

• You can specialize structures

• You can specialize element names

• Then valid topics can be written in DITA-compliant authoring tools without knowing anything about the underlying XML

• And they can be assembled automatically using the metadata implicit in the specialization.

Page 41: DocTrain East, October 19, 2007

Three examples of specialization

• Concepts are specialized topics

• Tasks are specialized topics

• References are specialized topics

• By understanding those specializations, you will know how specialization works

• But remember that specialization is the work of document architects and information designers
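
One way to see how a specialized type remains a topic underneath is the class attribute that the DITA DTDs default onto every element. The fragment below is a sketch of what a processor sees for a task; authors normally never type these values, and the id and text are invented for illustration.

```xml
<!-- class values are normally defaulted by the DTD; spelled out here to expose the ancestry -->
<task id="sample_task" class="- topic/topic task/task ">
  <title class="- topic/title ">Sample task</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Do one thing.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```

Reading each value from left to right gives the ancestry: a step is a list item, steps is an ordered list, and a task is a topic. That is why stylesheets written for generic topics can still process a specialized document.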

Page 42: DocTrain East, October 19, 2007

A close look at a topic

• A topic needs very little to be valid.

• an id attribute in the main topic tag (for reuse)

• a title

• a body (optional in the DITA DTDs, but present in almost every real topic)

Page 43: DocTrain East, October 19, 2007

A close look at a topic…

• It can have dozens of optional elements, many of which are very familiar HTML elements, like paragraphs <p>, lists <ul>, and tables <table>

Page 44: DocTrain East, October 19, 2007

A close look at a topic…

• Elements are shown schematically as colored boxes in a hierarchy.

• They are actually XML tag structures, properly nested and well formed.

<topic id="mytopic">
  <title>My Topic</title>
  <shortdesc>About my topic...</shortdesc>
  <body>
    <p>Some content</p>
    <p>Some more content</p>
  </body>
</topic>

Page 45: DocTrain East, October 19, 2007

The Concept Type

• The concept type specializes topic element names and topic structure.

• The root element is renamed concept and the body element is renamed conbody.

• Any number of paragraphs, lists, tables, etc. may appear, but none of these are allowed after the first section or example.

• Sections and examples can then appear in any order.
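
A small concept topic following these rules might look like the sketch below. The id and all of the text are invented for illustration; note that the loose paragraph and list come first, and nothing but sections and examples follows the first section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical concept topic; content is illustrative only -->
<concept id="what_is_a_map">
  <title>What is a DITA map?</title>
  <shortdesc>A map arranges topics for a particular deliverable.</shortdesc>
  <conbody>
    <!-- Paragraphs, lists, and tables must come before the first section or example -->
    <p>A map points to topics and gives them an order and a hierarchy.</p>
    <ul>
      <li>One set of topics</li>
      <li>Many possible maps</li>
    </ul>
    <!-- From here on, only sections and examples, in any order -->
    <section>
      <title>Why use more than one map?</title>
      <p>Each deliverable can select and arrange the topics it needs.</p>
    </section>
    <example>
      <title>Two deliverables</title>
      <p>A tutorial map and a reference map can share the same topics.</p>
    </example>
  </conbody>
</concept>
```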

Page 46: DocTrain East, October 19, 2007

The Task Type

• The task type specializes topic element names and topic structure.

• The root element is renamed task and the body element is renamed taskbody.

• One task prerequisite and one context (both specializations of section) are followed by steps (a specialization of ordered list).

• Each step must have a command, then optional info, a step example, choices, and a step result.

• The set of steps is followed by the task result, examples, and any task postrequisite.
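
The task structure described above can be sketched as follows. The id and all of the text are invented for illustration; the element names (prereq, context, steps, step, cmd, info, stepxmp, choices, stepresult, result, example, postreq) are the DITA task elements the bullets refer to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical task topic; content is illustrative only -->
<task id="publish_help">
  <title>Publishing a map as Help</title>
  <shortdesc>Generate a Help deliverable from a map.</shortdesc>
  <taskbody>
    <prereq>You need a map and the topics it points to.</prereq>
    <context>Do this whenever the topics have changed.</context>
    <steps>
      <step>
        <!-- cmd is the only required part of a step and must come first -->
        <cmd>Choose the output format.</cmd>
        <info>The format decides which stylesheets are applied.</info>
        <stepxmp>For example, pick Help rather than PDF.</stepxmp>
        <choices>
          <choice>Choose web output for online reading.</choice>
          <choice>Choose print output for a manual.</choice>
        </choices>
        <stepresult>The format is selected.</stepresult>
      </step>
      <step>
        <cmd>Run the build.</cmd>
      </step>
    </steps>
    <result>The deliverable appears in the output folder.</result>
    <example>A tutorial map produces a small Help set.</example>
    <postreq>Review the output before sending it to colleagues.</postreq>
  </taskbody>
</task>
```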

Page 47: DocTrain East, October 19, 2007

The Reference Type

• The reference type specializes topic element names and topic structure.

• The root element is renamed reference and the body element is renamed refbody.

• The refbody includes a properties element (a specialization of simpletable), a three-column table of property types, values, and descriptions.

• The element refsyn (reference syntax) is a specialization of the section element.
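
A reference topic using refsyn and properties might look like this sketch. The id and the descriptive text are invented for illustration; the three columns of the properties table are type, value, and description.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd">
<!-- Hypothetical reference topic; content is illustrative only -->
<reference id="topicref_attributes">
  <title>topicref attributes</title>
  <shortdesc>Attributes commonly set on a topicref in a map.</shortdesc>
  <refbody>
    <!-- refsyn: a section specialized to hold syntax -->
    <refsyn>
      <codeph>&lt;topicref href="file.dita" type="task"/&gt;</codeph>
    </refsyn>
    <!-- properties: a simpletable specialized to three fixed columns -->
    <properties>
      <prophead>
        <proptypehd>Attribute</proptypehd>
        <propvaluehd>Value</propvaluehd>
        <propdeschd>Description</propdeschd>
      </prophead>
      <property>
        <proptype>href</proptype>
        <propvalue>A file name or URL</propvalue>
        <propdesc>Points to the topic to include.</propdesc>
      </property>
      <property>
        <proptype>type</proptype>
        <propvalue>concept, task, or reference</propvalue>
        <propdesc>Describes the kind of topic being referenced.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```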

Page 48: DocTrain East, October 19, 2007

Thank you.

• Contact Bob Doyle

• [email protected]

• [email protected]

• Read my EContent articles

• www.econtentmag.com/About/AboutAuthor.aspx?AuthorID=155

• Please join DITA Users

• www.ditausers.org/membership/how_to_join

• Merlin lives!

• www.theelectronicwizard.com

• This presentation is online at:

• www.ditausers.org/users/bobdoyle/DocTrainEast2007.ppt

Page 49: DocTrain East, October 19, 2007

DITA Users Network – 2007

• DITA Blog

• DITA Infocenter

• DITA News

• DITA Newsletter

• DITA Tutor

• DITA Users

• DITA Wiki

Page 50: DocTrain East, October 19, 2007

DITA Report - November

• Coming November 2007

• Based on my XML Editors Review

• Marketplace analysis

• Vendors and Products Evaluated

• Strategies from 1 to 100s of writers

• Online tour of authoring tools

Page 51: DocTrain East, October 19, 2007

XML Editors

• Altova XML Spy

• Cladonia Exchanger

• Stylus Studio

• SyncRO Soft <oXygen/>

• Adobe FrameMaker

• Arbortext Editor

• XMetal Author

• Syntext Serna

Eight top XML Editors were studied

Chosen from 65 in CMS Review Editor Listings

Published in the June issue of EContent Magazine

Extended version - XML Editors Report

Page 52: DocTrain East, October 19, 2007

Which Editors Do You Use?

• A quick poll of your experience

Page 53: DocTrain East, October 19, 2007

The XML Editors Report

• Personal use license

• Corporate license

• One year of release versions

• Online consulting included

• Screen share to look at interfaces

Page 54: DocTrain East, October 19, 2007

CM Pros Best Practices

Page 55: DocTrain East, October 19, 2007

CMS Trends

• Open Source (and Open Documents)

• Online (ASPs and Web Services)

• Offshore? (Globalization)

• Enabling technologies (XML, Javascript)

• AJAX, Web 2.0

Page 56: DocTrain East, October 19, 2007

Information Architecture and Content Management.

• Two Kinds of Information Architecture

• IA of document sets, books in a library, a website, the World Wide Web – organization, cataloging, metadata tagging, accessibility, findability.

• IA of a single document - page structure, allowed navigation elements and reusable content components.

Page 57: DocTrain East, October 19, 2007

Defining Content Management

• What is a CM System?

• What Is Content Management?

• What Is Content?

Page 58: DocTrain East, October 19, 2007

What is a CM System?

• It is humans using computers and software to assist in managing content.

• It has two main parts:

– The user interface.

– The database (content repository).

• Everything else is magic middleware.

• It helps manage the content lifecycle.

Page 59: DocTrain East, October 19, 2007

What Is Content Management?

• Content management is the whole process from creation and capture of original content to the delivery of different versions to many publishing channels:

• Print

• Web

• Cellphone

• Etc.

Page 60: DocTrain East, October 19, 2007

The Content Lifecycle

• 7 stages

• Organize

• Rules

• Create

• Storage

• Assembly

• Publish

• Archive

• Context

• Users

• Content

Page 61: DocTrain East, October 19, 2007

Brown Television (BTV)

Doug Liman

Page 62: DocTrain East, October 19, 2007

Hi-8 Users Group

Funded Videomaker Magazine, Hi-8 Group became Desktop Video Group in 1992

Page 63: DocTrain East, October 19, 2007

HRTV and Quad Sound

Harvard-Radcliffe Film Workshop was in the basement of Holmes Hall (North/Pforzheimer House) where the old Radcliffe Radio Station and Morse Music Library were located. In the mid-80’s it became HRTV and the radio broadcast booth and adjoining sound rooms became Quad Sound Studios.

Page 64: DocTrain East, October 19, 2007

CMS Review

Page 65: DocTrain East, October 19, 2007

Other CMS Review Sites

• CMS Forum

• CMS Wiki

• CMSML

• CMS News

• CMS Calendar

• CMS Glossary

• CMS Boston

• Memography

• Open Internet Lexicon

• TaxoTips

• List-2-Web

Page 66: DocTrain East, October 19, 2007

CMS Review Glossary

Page 67: DocTrain East, October 19, 2007

Finding a CMS

• The CMSML project at CMS Review and CM Pros

Select two CMS or enter search terms to find CMS that match your criteria.

The directory is a faceted classification scheme.

Click compare to get the results below...

Page 68: DocTrain East, October 19, 2007

CM Professionals

• Nearly 1000 members in 2006

• Website (7/10 Google PageRank)

• Benefits - Mail, Member Directory

• Glossary, Resource Library, Calendar

• Communities - CMSML, DITA, Global

• News, Blog aggregation

• Globalization, Personalization

Page 69: DocTrain East, October 19, 2007

CM Professionals

Page 70: DocTrain East, October 19, 2007

CM Pros Member Directory

Page 71: DocTrain East, October 19, 2007

CM Pros Calendar

Page 72: DocTrain East, October 19, 2007

CM Pros Videos

• Eighty hours of video from Gilbane Conferences, IA Summit, OSCOM, Bloggercons at Harvard.

Bob Boiko interviews Shino

Page 73: DocTrain East, October 19, 2007

CM Pros Communities

• CMS Markup Language

• (and Faceted CMS Directory)

• Globalization – website in 10 languages

• (translations by volunteers)

• DITA

• (JoAnn Hackos, Scott Abel, others)

Page 74: DocTrain East, October 19, 2007

DITA Island

• Second Life meetings on DITA