0 wordprocessing ml overview

Introduction to WordprocessingML

A high-level overview of the structure of a word processing document

Ecma/TC45/2006/010 (Rev.)

The ‘Document’

• A WordprocessingML document file is a collection of multiple ‘subdocuments’, formally called stories:

– The main story

– Header(s) / Footer(s)

– Footnote(s) / Endnote(s)

– Subdocuments

– Frame(s)

– Comment(s)

Shared Story Properties

• All stories* in a document share a common set of properties:

– Style information

– Numbering definitions

– Font information

– Document settings

*with one exception, which we’ll discuss later

Style Information

• A style defines a specific set of formatting properties

– For example, the Normal style in Word 2003 is defined as:

• Font = Times New Roman

• Font Size = 12 point

• Font Language = language of Word (English (US) for me)

• Justification = Left

• Line Spacing = Single

Style Types

• Word supports six different types of styles:

– Paragraph styles

– Character styles

– Linked styles (paragraph + character)

– Table styles

– Numbering styles

– Default paragraph and character properties

Style Cascading/Inheritance

• Multiple style ‘types’ can be applied to the same part of a file, so properties are applied in a specific order.

• The properties set by one type can be removed or supplemented by following types.

• As well, styles of any given type can inherit from other styles of that type.

– e.g. The Heading 1 paragraph style inherits properties from the Normal paragraph style
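The inheritance rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `STYLES` table, its keys, and the Heading 1 values (16 point, bold) are hypothetical and are not taken from Word or from the file format itself.

```python
# Hypothetical, simplified style table: each style lists only the
# properties it sets itself, plus an optional parent of the same type.
STYLES = {
    "Normal": {
        "based_on": None,
        "props": {"font": "Times New Roman", "size_pt": 12, "bold": False},
    },
    "Heading1": {
        "based_on": "Normal",
        "props": {"size_pt": 16, "bold": True},  # illustrative values only
    },
}


def resolve_style(style_id, styles=STYLES):
    """Return the effective properties of a style by walking its parents."""
    chain = []
    seen = set()
    current = style_id
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        seen.add(current)
        chain.append(styles[current])
        current = styles[current]["based_on"]
    effective = {}
    # Apply the root ancestor first so that descendants override it.
    for style in reversed(chain):
        effective.update(style["props"])
    return effective


heading = resolve_style("Heading1")
```

Here `heading` keeps the font inherited from Normal while the size and bold settings come from Heading 1 itself.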

Style Application

(Diagram: style application order. Column labels: Table, Characters, Paragraph, List Item. Layer labels: Table, Paragraph, Character, Direct Formatting, Numbering, Document Defaults. Axis label: Application order.)

Style Example

• Styles are then applied to text via the style’s ID
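As a rough illustration of applying a style by ID, the sketch below parses a style definition and a paragraph that refers to it. It assumes the element and attribute names of the published ECMA-376 standard (`w:style`, `w:styleId`, `w:pStyle`), which may differ from the draft these slides describe; the sample markup and the helper `style_name_for` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample markup: one paragraph style and one paragraph using it.
STYLES_XML = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Heading1">
    <w:name w:val="heading 1"/>
    <w:basedOn w:val="Normal"/>
    <w:rPr><w:b/></w:rPr>
  </w:style>
</w:styles>
"""

PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
  <w:r><w:t>Introduction</w:t></w:r>
</w:p>
"""


def q(local):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{local}"


def style_name_for(paragraph, styles):
    """Follow the paragraph's style ID to the matching style definition."""
    ref = paragraph.find(f"{q('pPr')}/{q('pStyle')}")
    if ref is None:
        return None
    style_id = ref.get(q("val"))
    for style in styles.iter(q("style")):
        if style.get(q("styleId")) == style_id:
            return style.find(q("name")).get(q("val"))
    return None


name = style_name_for(ET.fromstring(PARAGRAPH_XML), ET.fromstring(STYLES_XML))
```

The paragraph carries only the ID `Heading1`; the formatting itself lives in the style definition.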

Numbering Definitions

• A numbering definition consists of nine levels, each of which has formatting properties:

– Paragraph properties (e.g. margins)

– Number properties (e.g. number text, justification, character formatting, etc.)

• A numbered paragraph is specified in two parts:

– The numbering definition instance

– The numbering level

Abstract Numbering Definition

• The abstract numbering definition specifies the properties for any or all of the nine levels in the list

• A numbering definition instance specifies the properties for a specific numbering definition by inheritance:

– References an abstract list definition

– Provides overrides for zero or more levels in the numbering definition

Numbering Example

• Paragraphs are associated with a particular numbering definition instance and level.
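The sketch below shows one way a consumer might resolve a numbered paragraph: the paragraph names an instance and a level, the instance points at an abstract definition, and a level override on the instance wins if present. It assumes the element names of the published ECMA-376 standard (`w:numPr`, `w:num`, `w:abstractNum`, `w:lvlOverride`); the sample markup and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: an abstract definition with two levels, and an instance
# that overrides level 1.
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="decimal"/><w:lvlText w:val="%1."/></w:lvl>
    <w:lvl w:ilvl="1"><w:numFmt w:val="lowerLetter"/><w:lvlText w:val="%2)"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1">
    <w:abstractNumId w:val="0"/>
    <w:lvlOverride w:ilvl="1">
      <w:lvl w:ilvl="1"><w:numFmt w:val="lowerRoman"/><w:lvlText w:val="%2."/></w:lvl>
    </w:lvlOverride>
  </w:num>
</w:numbering>
"""

PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr>
  <w:r><w:t>A numbered item</w:t></w:r>
</w:p>
"""


def q(local):
    return f"{{{W}}}{local}"


def find_by_attr(parent, tag, attr, value):
    """Return the first direct child with the given attribute value."""
    for el in parent.findall(q(tag)):
        if el.get(q(attr)) == value:
            return el
    return None


def resolve_level(paragraph, numbering):
    """Return (number format, level text) for a numbered paragraph."""
    num_pr = paragraph.find(f"{q('pPr')}/{q('numPr')}")
    num_id = num_pr.find(q("numId")).get(q("val"))
    ilvl = num_pr.find(q("ilvl")).get(q("val"))
    num = find_by_attr(numbering, "num", "numId", num_id)
    # An override on the instance takes precedence over the abstract level.
    override = find_by_attr(num, "lvlOverride", "ilvl", ilvl)
    lvl = override.find(q("lvl")) if override is not None else None
    if lvl is None:
        abstract_id = num.find(q("abstractNumId")).get(q("val"))
        abstract = find_by_attr(numbering, "abstractNum", "abstractNumId", abstract_id)
        lvl = find_by_attr(abstract, "lvl", "ilvl", ilvl)
    return lvl.find(q("numFmt")).get(q("val")), lvl.find(q("lvlText")).get(q("val"))


numbering = ET.fromstring(NUMBERING_XML)
paragraph = ET.fromstring(PARAGRAPH_XML)
fmt_and_text = resolve_level(paragraph, numbering)
```

For a paragraph at level 0 of the same instance there is no override, so the lookup would fall through to the abstract definition.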

Font Information

• The font information stores two distinct pieces of information:

– Embedded fonts (when the producer chooses to embed them)

– Font type data

• The latter provides characteristics of the font which are used to find a suitable replacement when the specified font is unavailable

Document Settings

• All settings pertinent to the document are stored in separate parts

• These settings can be divided into two groups:

– Those which affect presentation

• Web settings (e.g. HTML <DIV> and <FRAMESET> data)

• Compatibility options

– ‘Pure’ settings

• View, zoom state

• Defaults

• User preferences (i.e. ‘don’t ask me this again’)

Story Content

• Within each story is the actual content, which consists of what are formally called block level structures:

– Paragraphs

– Tables

– Custom Markup (structured document tags, custom XML)

– Range Permissions

Story Content

• Within each paragraph are what are formally called inline structures:

– Runs

– Custom Markup (structured document tags, custom XML)

– Annotations (comments, tracked changes, bookmarks)

– DrawingML elements

– Fields

– Hyperlinks

Basic Structural Rules

• All text in a word processing document is contained within runs

– A run is a region of text with a common set of properties

• All runs must be contained within a paragraph

– A paragraph is a collection of one or more runs that is displayed as a unit (analogous to the HTML <P> tag)

Example

• A basic paragraph with three different text formats:
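A paragraph of this kind might look like the markup embedded in the sketch below, which also shows how the runs can be read back. It assumes the element names of the published ECMA-376 standard (`w:p`, `w:r`, `w:rPr`, `w:t`, `w:b`, `w:i`); the sample text and the helper `describe_runs` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: one paragraph, three runs, each with its own formatting.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Plain, </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">bold, </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t>italic.</w:t></w:r>
</w:p>
"""


def q(local):
    return f"{{{W}}}{local}"


def describe_runs(paragraph):
    """Return (text, [run property names]) for each run in the paragraph."""
    runs = []
    for run in paragraph.findall(q("r")):
        props = run.find(q("rPr"))
        formats = []
        if props is not None:
            formats = sorted(child.tag.split("}")[1] for child in props)
        text = "".join(t.text or "" for t in run.findall(q("t")))
        runs.append((text, formats))
    return runs


runs = describe_runs(ET.fromstring(PARAGRAPH_XML))
full_text = "".join(text for text, _ in runs)
```

Each change of formatting starts a new run, while the paragraph groups the runs into one displayed unit.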

Basic Structural Rules

• A paragraph may itself be at any location which allows block level content:

– At the top-most level within a story (e.g. header, footer, main document)

– Nested within a table cell

– Nested within a structured document tag or annotation markers

Tables

• Similar to HTML tables, a Word table consists of the table itself, its properties, rows, and cells.

(Diagram: a table with its Properties, Row, and Cell labeled.)

Tables

• Individual table cells can themselves contain any block level content

– This means, for example, that tables can be nested arbitrarily

(Diagram: a nested table.)
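Because a cell holds ordinary block level content, reading a table is naturally recursive. The sketch below walks a table that has a second table nested in one of its cells. It assumes the element names of the published ECMA-376 standard (`w:tbl`, `w:tblPr`, `w:tr`, `w:tc`); the sample markup and the helper `cell_texts` are invented for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: a one-row table whose second cell contains another table.
TABLE_XML = f"""
<w:tbl xmlns:w="{W}">
  <w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>
  <w:tr>
    <w:tc>
      <w:p><w:r><w:t>Outer cell</w:t></w:r></w:p>
    </w:tc>
    <w:tc>
      <w:tbl>
        <w:tr><w:tc><w:p><w:r><w:t>Inner cell</w:t></w:r></w:p></w:tc></w:tr>
      </w:tbl>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
"""


def q(local):
    return f"{{{W}}}{local}"


def cell_texts(table, depth=0):
    """Yield (nesting depth, text) for each non-empty paragraph in a table."""
    for row in table.findall(q("tr")):
        for cell in row.findall(q("tc")):
            for block in cell:
                if block.tag == q("p"):
                    text = "".join(t.text or "" for t in block.iter(q("t")))
                    if text:
                        yield depth, text
                elif block.tag == q("tbl"):
                    # A table inside a cell is handled like any other table.
                    yield from cell_texts(block, depth + 1)


found = list(cell_texts(ET.fromstring(TABLE_XML)))
```

The same function handles any nesting depth, since each nested table is just another block inside a cell.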

Custom Markup

• Custom markup can be applied within the contents of any story in a document

• These tags can take one of three forms:

– Smart tags

– Custom XML markup

– Structured document tags

Custom Defined XML

• A facility for embedding arbitrary user XML within the document at either block or inline levels

Structured Document Tags

• Provide granular semantics at either the block or inline levels

– e.g. region can/cannot be edited; region can/cannot be deleted; region should show a date picker/drop-down list/textbox

– Do not affect layout

• Similar to custom XML, but without the XML schema semantics and with presentation data and more granular properties
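As a rough sketch of what such a tag carries, the example below reads the properties of a block level structured document tag that cannot be deleted and that asks for a date picker. It assumes the element names of the published ECMA-376 standard (`w:sdt`, `w:sdtPr`, `w:sdtContent`, `w:lock`, `w:date`); the sample values and the helper `describe_sdt` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: the properties describe the region, the content holds
# ordinary paragraphs.
SDT_XML = f"""
<w:sdt xmlns:w="{W}">
  <w:sdtPr>
    <w:alias w:val="Due date"/>
    <w:lock w:val="sdtLocked"/>
    <w:date/>
  </w:sdtPr>
  <w:sdtContent>
    <w:p><w:r><w:t>2006-10-01</w:t></w:r></w:p>
  </w:sdtContent>
</w:sdt>
"""

# Only a few of the possible control kinds are checked in this sketch.
KINDS = ("date", "dropDownList", "comboBox", "text")


def q(local):
    return f"{{{W}}}{local}"


def describe_sdt(sdt):
    """Summarize the lock state, control kind, and text of a tag."""
    props = sdt.find(q("sdtPr"))
    lock = props.find(q("lock"))
    kinds = [k for k in KINDS if props.find(q(k)) is not None]
    content = sdt.find(q("sdtContent"))
    text = "".join(t.text or "" for t in content.iter(q("t")))
    return {
        "lock": lock.get(q("val")) if lock is not None else None,
        "kind": kinds[0] if kinds else None,
        "text": text,
    }


info = describe_sdt(ET.fromstring(SDT_XML))
```

The content inside the tag is unchanged document markup, which is why these tags do not affect layout.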

Sections

• Sections in a word processing document specify:

– Page properties

• Page size

• Page orientation

• Margins

– Header/footer references

– Footnote/endnote properties

– Column properties

Sections

• Sections specify (cont'd):

– Line numbering

– Text direction (RTL vs. LTR; top-to-bottom vs. bottom-to-top)

Sections

• Four types of sections:

– Continuous

– Next page (start on next page)

– Even (start on next even page)

– Odd (start on next odd page)
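The section properties listed on these slides could be read along the following lines. The sketch assumes the element names and units of the published ECMA-376 standard (`w:sectPr`, `w:type`, `w:pgSz`, `w:pgMar`, `w:cols`, with page measurements in twentieths of a point); the sample values and the helper `summarize_section` are invented for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: a landscape, two-column section starting on an odd page.
SECT_XML = f"""
<w:sectPr xmlns:w="{W}">
  <w:type w:val="oddPage"/>
  <w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
  <w:cols w:num="2"/>
</w:sectPr>
"""

TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch


def q(local):
    return f"{{{W}}}{local}"


def summarize_section(sect):
    """Return the section type, page size in inches, and column count."""
    kind = sect.find(q("type"))
    size = sect.find(q("pgSz"))
    cols = sect.find(q("cols"))
    return {
        # In the published standard an omitted type means "next page".
        "type": kind.get(q("val")) if kind is not None else "nextPage",
        "width_in": int(size.get(q("w"))) / TWIPS_PER_INCH,
        "height_in": int(size.get(q("h"))) / TWIPS_PER_INCH,
        "orientation": size.get(q("orient"), "portrait"),
        "columns": int(cols.get(q("num"), "1")) if cols is not None else 1,
    }


summary = summarize_section(ET.fromstring(SECT_XML))
```

The four section types on this slide correspond to the values `continuous`, `nextPage`, `evenPage`, and `oddPage` in the published standard.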

Annotations

• Annotations in a word processing document store markup information:

– Tracked revisions (insertion, deletion, move)

– Comments

– Bookmarks

Annotations

• Annotation markup can be represented in three states:

1 – Inline

Annotations

2 – ‘Non-wellformed’

• The markup does not encapsulate the content; instead, there is a start marker and an end marker.

Annotations

3 – Property

• The deletion of a paragraph mark is recorded in the paragraph’s property set
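The three annotation states can be seen side by side in the sketch below: an insertion that wraps its runs, a bookmark whose start and end markers sit in different paragraphs, and a deleted paragraph mark stored as a property. It assumes the element names of the published ECMA-376 standard (`w:ins`, `w:del`, `w:bookmarkStart`, `w:bookmarkEnd`); the sample markup and the helper `classify_annotations` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:ins w:id="1" w:author="A. Author"><w:r><w:t>inserted text</w:t></w:r></w:ins>
    <w:bookmarkStart w:id="2" w:name="span"/>
    <w:r><w:t>start of bookmark</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:rPr><w:del w:id="3" w:author="A. Author"/></w:rPr></w:pPr>
    <w:r><w:t>end of bookmark</w:t></w:r>
    <w:bookmarkEnd w:id="2"/>
  </w:p>
</w:body>
"""


def q(local):
    return f"{{{W}}}{local}"


def classify_annotations(body):
    """Group annotation IDs by the state in which they are stored."""
    found = {"inline": [], "start_end": [], "property": []}
    for p in body.findall(q("p")):
        for child in p:
            if child.tag in (q("ins"), q("del")):
                # State 1: the annotation element wraps the affected runs.
                found["inline"].append(child.get(q("id")))
            elif child.tag == q("bookmarkStart"):
                # State 2: a start marker, paired by ID with an end marker.
                found["start_end"].append(child.get(q("id")))
        # State 3: the annotation lives in the paragraph's property set.
        mark = p.find(f"{q('pPr')}/{q('rPr')}/{q('del')}")
        if mark is not None:
            found["property"].append(mark.get(q("id")))
    return found


states = classify_annotations(ET.fromstring(BODY_XML))
```

The bookmark here could not be written as a wrapping element, because its range crosses a paragraph boundary.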

Headers/Footers

• There are three types of headers and footers in Word:

– Odd page header

– Even page header (optional)

– First page header (optional)

• If one of the optional types is not specified, the odd page header is used
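The fallback rule above can be sketched as a small lookup. This is a simplification under stated assumptions: the dictionary keys and the function `pick_header` are hypothetical, page 1 stands for the first page of the section, and any document-level settings that a real consumer would also consult are ignored.

```python
def pick_header(page_number, headers):
    """Choose a header for a page; only the 'odd' entry is required."""
    if page_number == 1 and "first" in headers:
        return headers["first"]
    if page_number % 2 == 0 and "even" in headers:
        return headers["even"]
    # Fall back to the odd page header when an optional type is missing.
    return headers["odd"]


only_odd = {"odd": "odd header"}
all_three = {"odd": "odd header", "even": "even header", "first": "first header"}

chosen = [pick_header(n, all_three) for n in (1, 2, 3)]
fallback = [pick_header(n, only_odd) for n in (1, 2, 3)]
```

With only the odd page header defined, every page ends up with that header.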

Headers/Footers

• Headers and footers are stored in separate parts, one per header or footer

• Each section refers to its header(s)/footer(s) by an explicit relationship reference:

Headers/Footers

• The type is declared in the header/footer part:
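A minimal sketch of resolving these references follows. It assumes the markup of the published ECMA-376 standard, where the section's `w:headerReference` carries both the relationship ID (`r:id`) and a `w:type` attribute; the draft described in these slides may have recorded the type differently. The part names, relationship IDs, and the helper `header_footer_targets` are invented for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"

# Assumed sample: section properties from the main document part.
SECT_XML = f"""
<w:sectPr xmlns:w="{W}" xmlns:r="{R}">
  <w:headerReference w:type="default" r:id="rId7"/>
  <w:footerReference w:type="default" r:id="rId8"/>
</w:sectPr>
"""

# Assumed sample: the relationships of the main document part.
RELS_XML = f"""
<Relationships xmlns="{PKG_REL}">
  <Relationship Id="rId7" Type="{R}/header" Target="header1.xml"/>
  <Relationship Id="rId8" Type="{R}/footer" Target="footer1.xml"/>
</Relationships>
"""


def header_footer_targets(sect, rels):
    """Map (reference kind, type) to the part that the relationship targets."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.findall(f"{{{PKG_REL}}}Relationship")
    }
    result = {}
    for kind in ("headerReference", "footerReference"):
        for ref in sect.findall(f"{{{W}}}{kind}"):
            rel_id = ref.get(f"{{{R}}}id")
            result[(kind, ref.get(f"{{{W}}}type"))] = targets[rel_id]
    return result


parts = header_footer_targets(ET.fromstring(SECT_XML), ET.fromstring(RELS_XML))
```

The section never names a file directly; it names a relationship, and the relationship names the part.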

Footnotes/Endnotes

• All footnotes are stored in a single part

– Same applies to all endnotes

• Footnote references are positioned by a special tag in run content, which specifies the footnote to reference:

Footnotes/Endnotes

• Within the footnotes part, the actual footnote story content is found via the ID:
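The lookup described on these two slides might be sketched as follows: a run in the main story holds a reference with an ID, and the footnotes part holds the story with the same ID. The sketch assumes the element names of the published ECMA-376 standard (`w:footnoteReference`, `w:footnotes`, `w:footnote`, `w:footnoteRef`); the sample markup and the helper `footnote_texts` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: a paragraph from the main story.
DOCUMENT_P = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>Body text</w:t></w:r>
  <w:r><w:footnoteReference w:id="2"/></w:r>
</w:p>
"""

# Assumed sample: the single part that stores all footnotes.
FOOTNOTES_XML = f"""
<w:footnotes xmlns:w="{W}">
  <w:footnote w:id="2">
    <w:p><w:r><w:footnoteRef/></w:r><w:r><w:t>The note text.</w:t></w:r></w:p>
  </w:footnote>
</w:footnotes>
"""


def q(local):
    return f"{{{W}}}{local}"


def footnote_texts(paragraph, footnotes):
    """Return the text of every footnote referenced from the paragraph."""
    by_id = {fn.get(q("id")): fn for fn in footnotes.findall(q("footnote"))}
    texts = []
    for ref in paragraph.iter(q("footnoteReference")):
        note = by_id[ref.get(q("id"))]
        texts.append("".join(t.text or "" for t in note.iter(q("t"))))
    return texts


notes = footnote_texts(ET.fromstring(DOCUMENT_P), ET.fromstring(FOOTNOTES_XML))
```

Endnotes would follow the same pattern against their own part.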

Glossary Document

• Remember that exception to the ‘all stories share the same data’ rule?

• The glossary document is a completely distinct main story

– Specifies its own styles, lists, fonts, settings

• This story is used to store document fragments which may be inserted at a later time

File Format Types

• Template (DOTX) – classic “DOT”

• Document (DOCX) – classic “DOC”

• Both use the same file format; they are differentiated only by the main content type and the file extension
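To illustrate the content type distinction, the sketch below classifies a package from its content types stream. It assumes the content type strings of the published ECMA-376 standard, which may differ from the draft these slides describe; the sample stream and the helper `classify` are invented for illustration.

```python
import xml.etree.ElementTree as ET

CT = "http://schemas.openxmlformats.org/package/2006/content-types"

# Main part content types as defined in the published standard.
MAIN_TYPES = {
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml": "document",
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.template.main+xml": "template",
}

# Assumed sample: a trimmed content types stream from a template package.
CONTENT_TYPES_XML = f"""
<Types xmlns="{CT}">
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"/>
</Types>
"""


def classify(content_types):
    """Return 'document', 'template', or None based on the main part type."""
    for override in content_types.findall(f"{{{CT}}}Override"):
        kind = MAIN_TYPES.get(override.get("ContentType"))
        if kind:
            return kind
    return None


kind = classify(ET.fromstring(CONTENT_TYPES_XML))
```

Everything else in the package, including the markup of the main story, has the same shape for both kinds of file.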

Disclaimer

This presentation is for informational purposes only, and should not be relied upon as a substitute or replacement for Microsoft formal file format documentation, which is available at the following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions presented in this material are solely those of the author and do not necessarily represent those of Microsoft. Microsoft disclaims all liability for mistakes or inaccuracies in this presentation.