Unstructured Content vs Structured Content: A Basic Overview

Unstructured vs. Structured Content (say what...?)

Note: No pretty graphics here (sorry, Slideshare best-practices police). This was written to be read, not as a presentation deck for some conference (not that there’s...actually, there is something wrong with that...why would you post a deck that requires a presenter? Seriously. Oh, I guess I can see when...never mind).

What is ‘content’?

In the context of this subject, content is any technical or business documentation. Stories, how-tos, tutorials, manuals, regulations, parts lists, specifications, help desk info, marketing docs, slide decks, PDFs, etc.

Stuff people read or watch to learn things.

What is ‘unstructured’ content?

Unstructured content is content that exists in a silo. A Word doc residing on the computer it was created on. A PDF. A PPT slide deck on someone’s laptop on a plane. A doc attached to an email for review.

This content is not typically found in a central database and must be accessed by opening it after it has been shared. It is important to note that when you share an unstructured document, there are now two versions of that document. Changes made to one are not reflected in the other.

Why is this a problem?

Imagine an organization that has many pieces of unstructured information. They are shared by attaching them to emails, by copying and pasting, or by printing them and passing them around. Each time this is done a new version is created, so the content is duplicated and often changed while the old version still exists somewhere else. This creates the problem of ‘version control’, i.e. which one is the right one or the most current one? Which one has been finalized, edited, approved, etc.?

If you have just a few docs this may be manageable, but if you have more than a few, figuring out what is what becomes a serious resource hog.

Why is this a problem? Part Two: Creation Issues

Errors and omissions.

Let’s look at an unstructured workflow for document creation and distribution. A writer gets instructions to create a User Guide. She writes it in Word, formats it according to company standards and then shares it with an editor via email attachment. The editor marks it up and sends it back. The writer accepts the changes and forwards it to her boss. The boss eventually opens it, but not right away because it got buried in her inbox. She copies and pastes a section she doesn’t understand into an email and sends it back. This goes on until the doc is approved and finalized.

How many versions are now out there?

What is the problem? Part Three: Distribution Issues

Your doc is now approved. Done.

Not quite. It is widely useful content so it is decided that it needs to be on the company website and optimized for desktop and mobile viewing. A downloadable PDF is created for those who like to print things or view offline. The info needs to be repurposed for a Help application. And Marketing wants to grab a concept section for use in a sales collateral piece.

This is a publishing problem. Each use requires reformatting, often extensive. And every time that happens another version is created and potentially altered.

That’s the problem. And there are others but this is enough to get us started.

The Problem, Part Four: Reuse

You created valuable content. Some parts of it can be reused in other docs. Maybe you have a list that tells how to do a common operation that is used with multiple products. You want to take that content out of your doc and put it into a new one. So…

● You open the original
● You copy the list
● You paste it into the new doc

This is what you do every time you need to reuse the content. Then someone changes the original. And you have to find all instances of that content and...fix!
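
To see why this hurts, here is a rough sketch in Python of copy-and-paste reuse. The guide titles and the steps are made up for illustration, and real documents are obviously more than Python lists, but the drift works the same way: once the list is pasted, each copy lives its own life.

```python
# Hypothetical example data: the guides and the step text are invented for illustration.
original_steps = ["Power off the unit", "Remove the cover", "Replace the filter"]

# Copy-and-paste reuse: each new doc gets its own independent copy of the list.
guide_a = {"title": "Model A Guide", "steps": list(original_steps)}
guide_b = {"title": "Model B Guide", "steps": list(original_steps)}

# Later, someone corrects the original.
original_steps[0] = "Power off and unplug the unit"

# The pasted copies did not change, so each one has to be found and fixed by hand.
stale = [doc["title"] for doc in (guide_a, guide_b) if doc["steps"] != original_steps]
print("Docs that still carry the old list:", stale)
```

With two guides this is a nuisance. With two hundred, nobody knows where all the copies are.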

What is the alternative?

The solution to this problem is to move your creation and distribution process to a ‘structured content’ solution. Structured content is created and saved in a central database, typically as XML (a mark-up language), where it can easily be organized and then published to multiple formats without extensive reformatting.

Any reviews and edits done on a document are tracked in one version, and those changes are recorded in a document history.
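
Here is a minimal sketch in Python of what publishing to multiple formats from one source can look like. The topic, its wording and the two little functions (to_html and to_text) are all invented for illustration, and a real structured content tool does far more than this. The idea is simply that the source is written once and every output is generated from it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified topic. Real structured content carries more markup.
SOURCE = """
<topic>
  <title>Replacing the filter</title>
  <p>Replace the filter every six months.</p>
  <p>Use only approved parts.</p>
</topic>
""".strip()

def to_html(topic):
    """Render the topic as a small HTML fragment for the website."""
    parts = [f"<h1>{escape(topic.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in topic.findall("p")]
    return "\n".join(parts)

def to_text(topic):
    """Render the same topic as plain text, e.g. for a printable handout."""
    title = topic.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text for p in topic.findall("p")]
    return "\n".join(lines)

topic = ET.fromstring(SOURCE)
print(to_html(topic))
print()
print(to_text(topic))
```

Change a sentence in the source and both outputs pick it up the next time they are generated. Nobody reformats anything by hand.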

Example of a structured content application

Disclaimer: I work for a software company, Jorsek LLC, that makes a structured content application called easyDITA. There are several options, but easyDITA is the one I’m most familiar with, so I will use its functionality as an example.

A structured content platform typically consists of two parts. One is a Content Management System or CMS. The other is an XML database with a front-end that allows a user to take full advantage of this format without coding knowledge. Content is created in the CMS but resides in the database. easyDITA is a Component CMS or CCMS that combines the two. It may also be used with other CMSs that are optimized for certain kinds of uses, like developing learning materials.

What is DITA?

DITA stands for Darwin Information Typing Architecture. Yes, that is a mouthful, and I’m not going to break down its meaning here. To learn more, visit easyDITA.com for our DITA Primer (yes, shameless plug).

DITA was developed by IBM to replace their outdated and overly complex document management system. Realizing that the problem of unstructured documentation was shared by a wide range of companies and organizations, they decided to release it as an open standard rather than keep it in-house. IBM donated DITA to the standards body OASIS, which approved it as a standard in 2005. DITA itself is a standard for structured content, not an application; platforms such as easyDITA are built on top of it.

Why use DITA?

When you create a document in a DITA CMS, it is stored as XML. XML is a mark-up language not unlike HTML, which underlies the web.

XML tags describe what a piece of content is and how it relates to other pieces of content, rather than how it looks. Because machines can read these tags, the right formatting can be applied automatically when publishing to each format.

DITA gives a content manager the ability to define different ‘types’ of content within a doc (hence the ‘Information Typing’ piece of the acronym). Types may be Topics, Concepts, Tasks, or References, plus specializations of these types.
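
To make the typing idea concrete, here is a minimal sketch in Python. The markup uses standard DITA element names for a Task (task, taskbody, steps, step, cmd), but the topic itself, its id and its wording are made up for illustration, and a real DITA file would also carry a DOCTYPE declaration. The point is only that a program can tell what kind of content it is looking at by reading the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic, invented for illustration. Element names follow DITA's task type.
TASK_XML = """
<task id="replace-filter">
  <title>Replace the filter</title>
  <taskbody>
    <steps>
      <step><cmd>Power off the unit.</cmd></step>
      <step><cmd>Remove the cover.</cmd></step>
      <step><cmd>Swap in the new filter.</cmd></step>
    </steps>
  </taskbody>
</task>
""".strip()

root = ET.fromstring(TASK_XML)

# The root element name is the information type: topic, concept, task or reference.
info_type = root.tag
title = root.findtext("title")
# Each <cmd> holds the instruction for one step of the task.
commands = [cmd.text for cmd in root.iter("cmd")]

print(info_type, "-", title)
for number, command in enumerate(commands, start=1):
    print(f"{number}. {command}")
```

Notice that nothing in the markup says ‘bold’ or ‘12 point’. It says ‘this is a task’ and ‘this is a step’, and the look is decided later, at publishing time.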

Wait, why would I want to ‘type’ things?

Let’s say we have five documents about a product line, produced for different variations of the product. We want to pull out any references to Parts Lists and compile them into a new document for use by the purchasing department.

With DITA, the sections of each document that are Parts Lists are marked with the type Reference, Parts List. Because of this, the writer can search through their DITA database and pull everything tagged with ‘Parts List’ into a new doc, without any copy/pasting and without changing the original in any way.

Types include Topics (what is this about?), Concepts (this works like this), Tasks (first do this, next do this) and References (specifications, lists, references to other docs).
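
The Parts List example can be sketched in a few lines of Python. Big assumptions here: the little in-memory ‘repository’, the file names and the outputclass value "parts-list" are all invented for illustration, and this is not how easyDITA or any other product does it internally. The element names (reference, task, concept, map, topicref) and the outputclass attribute are standard DITA.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: file name -> topic XML. The outputclass value
# "parts-list" is an assumed tagging convention, not a DITA or easyDITA default.
REPO = {
    "model-a-parts.dita": '<reference id="a-parts" outputclass="parts-list">'
                          "<title>Model A parts</title></reference>",
    "model-a-install.dita": '<task id="a-install"><title>Installing Model A</title></task>',
    "model-b-parts.dita": '<reference id="b-parts" outputclass="parts-list">'
                          "<title>Model B parts</title></reference>",
    "model-b-overview.dita": '<concept id="b-overview"><title>How Model B works</title></concept>',
}

def find_parts_lists(repo):
    """Return the file names of all Reference topics tagged as parts lists."""
    hits = []
    for name, xml_text in sorted(repo.items()):
        root = ET.fromstring(xml_text)
        if root.tag == "reference" and root.get("outputclass") == "parts-list":
            hits.append(name)
    return hits

def build_map(title, hrefs):
    """Build a DITA map that points at the topics instead of copying them."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for href in hrefs:
        ET.SubElement(ditamap, "topicref", href=href)
    return ET.tostring(ditamap, encoding="unicode")

parts = find_parts_lists(REPO)
print(build_map("Parts lists for purchasing", parts))
```

The new document is just a list of pointers to the originals. Nothing was copied and nothing was pasted, so when a parts list is updated, the purchasing document is updated with it.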

OK, I’m starting to get it...

If you create or manage content, a structured system like DITA can and will change your life. If nothing else, it is a big time saver, but that is just the beginning.

Version control, collaborative editing and review, and publishing to multiple formats all improve greatly when you use a structured content solution. It also helps cut down on errors and omissions, and even on redundant translation into other languages.

Learn more...

I am relatively new to this whole structured content thing, but as a writer I’ve been looking for something like this, without knowing there was a whole world out there dedicated to making this work. This slide deck is my attempt to answer a few questions I had when I first encountered the concepts. I hope it helps.

Thanks,

Martin

http://www.easydita.com