shawn-villaron -
Introduction to WordprocessingML
A high-level overview of the structure of a word processing document
Ecma/TC45/2006/010 (Rev.)
The ‘Document’
• A WordprocessingML document file is a collection of multiple ‘subdocuments’, formally called stories:
– The main story
– Header(s) / Footer(s)
– Footnote(s) / Endnote(s)
– Subdocuments
– Frame(s)
– Comment(s)
Shared Story Properties
• All stories* in a document share a common set of properties:
– Style information
– Numbering definitions
– Font information
– Document settings
*with one exception, which we’ll discuss later
Style Information
• A style defines a specific set of formatting properties
– For example, the Normal style in Word 2003 is defined as:
• Font = Times New Roman
• Font Size = 12 point
• Font Language = language of Word (English (US) for me)
• Justification = Left
• Line Spacing = Single
Style Types
• Word supports six different types of styles:
– Paragraph styles
– Character styles
– Linked styles (paragraph + character)
– Table styles
– Numbering styles
– Default paragraph and character properties
Style Cascading/Inheritance
• Multiple style ‘types’ can be applied to the same part of a file, so properties are applied in a specific order.
• The properties set by one type can be removed or supplemented by following types.
• As well, styles of any given type can inherit from other styles of that type.
– e.g. The Heading 1 paragraph style inherits properties from the Normal paragraph style
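Inheritance within one style type can be sketched as a walk up the chain of parent styles, merging properties so that the derived style wins. This is a minimal sketch over a hypothetical dictionary model, not the file format itself; the `based_on` and `props` keys and the Heading 1 values are assumptions made for illustration.

```python
# Hypothetical, simplified model: style ID -> parent ID plus directly set properties.
STYLES = {
    "Normal": {
        "based_on": None,
        "props": {"font": "Times New Roman", "size_pt": 12, "bold": False},
    },
    "Heading1": {
        "based_on": "Normal",
        "props": {"size_pt": 16, "bold": True},  # illustrative values only
    },
}


def resolve_style(style_id, styles):
    """Merge properties along the inheritance chain, base style first."""
    chain, seen = [], set()
    current = style_id
    while current is not None:
        if current in seen:
            raise ValueError(f"circular inheritance at {current!r}")
        seen.add(current)
        chain.append(styles[current])
        current = styles[current]["based_on"]

    resolved = {}
    for style in reversed(chain):  # base first, so derived styles override
        resolved.update(style["props"])
    return resolved


heading = resolve_style("Heading1", STYLES)
print(heading)
```

Heading 1 keeps the font it never sets itself and overrides the size and weight it does set.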
Style Application
[Figure: application order of style types. Labels: Table, Paragraph, Character, Direct Formatting, Numbering, Document Defaults. Targets shown: Table, Characters, Paragraph, List Item.]
Style Example
• Styles are then applied to text via the style’s ID
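A small sketch of applying a style by ID: a style definition carries a style ID, and a paragraph points at it from its properties. The element names follow the published ECMA-376 schema, which may differ in detail from the draft this presentation describes; the sample text and the helper name `style_name_for_paragraph` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Style definition (lives in the styles part).
STYLES_XML = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Heading1">
    <w:name w:val="heading 1"/>
    <w:basedOn w:val="Normal"/>
    <w:rPr><w:b/><w:sz w:val="32"/></w:rPr>
  </w:style>
</w:styles>
"""

# Paragraph in the main story that references the style by its ID.
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Chapter One</w:t></w:r>
    </w:p>
  </w:body>
</w:document>
"""


def style_name_for_paragraph(paragraph, styles_root):
    """Return the name of the paragraph style referenced by ID, or None."""
    p_style = paragraph.find("w:pPr/w:pStyle", NS)
    if p_style is None:
        return None
    style_id = p_style.get(f"{{{W}}}val")
    for style in styles_root.findall("w:style", NS):
        if style.get(f"{{{W}}}styleId") == style_id:
            return style.find("w:name", NS).get(f"{{{W}}}val")
    return None


styles_root = ET.fromstring(STYLES_XML)
paragraph = ET.fromstring(DOCUMENT_XML).find("w:body/w:p", NS)
resolved_name = style_name_for_paragraph(paragraph, styles_root)
```

The paragraph stores only the ID; everything else about the style is looked up in the styles part.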
Numbering Definitions
• A numbering definition consists of nine levels, each of which has formatting properties
– Paragraph properties (e.g. margins)
– Number properties (e.g. number text, justification, character formatting, etc.)
• A numbered paragraph is specified in two parts:
– The numbering definition instance
– The numbering level
Abstract Numbering Definition
• The abstract numbering definition specifies the properties for any or all of the nine levels in the list
• A numbering definition instance specifies the properties for a specific numbering definition by inheritance:
– References an abstract list definition
– Provides overrides for zero or more levels in the numbering definition
Numbering Example
• Paragraphs are associated with a particular numbering definition instance and level.
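The lookup from a numbered paragraph to its effective level properties can be sketched as three steps: find the instance, read the level from the abstract definition it references, then apply any override. Element names follow the published ECMA-376 schema and may differ from the draft described here; the IDs, formats, and the helper `effective_level` are hypothetical, and only two of the nine levels are defined to keep the sample short.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def w_attr(element, name):
    """Read an attribute in the WordprocessingML namespace."""
    return element.get(f"{{{W}}}{name}")


NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/><w:numFmt w:val="decimal"/><w:lvlText w:val="%1."/>
    </w:lvl>
    <w:lvl w:ilvl="1">
      <w:start w:val="1"/><w:numFmt w:val="lowerLetter"/><w:lvlText w:val="%2)"/>
    </w:lvl>
  </w:abstractNum>
  <w:num w:numId="1">
    <w:abstractNumId w:val="0"/>
    <w:lvlOverride w:ilvl="1"><w:startOverride w:val="3"/></w:lvlOverride>
  </w:num>
</w:numbering>
"""

PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr>
  <w:r><w:t>Second-level item</w:t></w:r>
</w:p>
"""


def effective_level(paragraph, numbering_root):
    """Return (number format, start value) for a numbered paragraph."""
    num_pr = paragraph.find("w:pPr/w:numPr", NS)
    level = w_attr(num_pr.find("w:ilvl", NS), "val")
    num_id = w_attr(num_pr.find("w:numId", NS), "val")

    # Step 1: find the numbering definition instance.
    instance = next(n for n in numbering_root.findall("w:num", NS)
                    if w_attr(n, "numId") == num_id)
    abstract_id = w_attr(instance.find("w:abstractNumId", NS), "val")

    # Step 2: read the level from the abstract definition it references.
    abstract = next(a for a in numbering_root.findall("w:abstractNum", NS)
                    if w_attr(a, "abstractNumId") == abstract_id)
    lvl = next(l for l in abstract.findall("w:lvl", NS)
               if w_attr(l, "ilvl") == level)
    num_fmt = w_attr(lvl.find("w:numFmt", NS), "val")
    start = int(w_attr(lvl.find("w:start", NS), "val"))

    # Step 3: apply the instance's override for that level, if any.
    for override in instance.findall("w:lvlOverride", NS):
        start_override = override.find("w:startOverride", NS)
        if w_attr(override, "ilvl") == level and start_override is not None:
            start = int(w_attr(start_override, "val"))
    return num_fmt, start


level_info = effective_level(ET.fromstring(PARAGRAPH_XML),
                             ET.fromstring(NUMBERING_XML))
```

Here the format comes from the abstract definition while the start value comes from the instance's override.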
Font Information
• The font information stores two distinct pieces of information:
– Embedded fonts (when the producer chooses to embed them)
– Font type data
• The latter provides characteristics of the font which are used to find a suitable replacement when the specified font is unavailable
Document Settings
• All settings pertinent to the document are stored in separate parts
• These settings can be divided into two groups:
– Those which affect presentation
• Web settings (e.g. HTML <DIV> and <FRAMESET> data)
• Compatibility options
– ‘Pure’ settings
• View, zoom state
• Defaults
• User preferences (i.e. ‘don’t ask me this again’)
Story Content
• Within each story is the actual content, which consists of what are formally called block level structures:
– Paragraphs
– Tables
– Custom Markup (structured document tags, custom XML)
– Range Permissions
Story Content
• Within each paragraph are what are formally called inline structures:
– Runs
– Custom Markup (structured document tags, custom XML)
– Annotations (comments, tracked changes, bookmarks)
– DrawingML elements
– Fields
– Hyperlinks
Basic Structural Rules
• All text in a word processing document is contained within runs
– A run is a region of text with a common set of properties
• All runs must be contained within a paragraph
– A paragraph is a collection of one or more runs that is displayed as a unit (analogous to the HTML <P> tag)
Example
• A basic paragraph with three different text formats:
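One way such a paragraph might look is three runs, each with its own run properties. The element names follow the published ECMA-376 schema and may differ from the draft described here; the sample text and the helper `describe_runs` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# One paragraph, three runs: plain, bold, italic.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This is plain, </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">this is bold, </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t>and this is italic.</w:t></w:r>
</w:p>
"""


def describe_runs(paragraph):
    """Return (text, [formatting element names]) for each run in order."""
    described = []
    for run in paragraph.findall("w:r", NS):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        props = run.find("w:rPr", NS)
        # Strip the namespace so "{...}b" becomes "b".
        formats = [] if props is None else [c.tag.split("}")[1] for c in props]
        described.append((text, formats))
    return described


runs = describe_runs(ET.fromstring(PARAGRAPH_XML))
full_text = "".join(text for text, _ in runs)
```

Each change of formatting starts a new run, while the paragraph remains the unit that is laid out.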
Basic Structural Rules
• A paragraph may itself be at any location which allows block level content:
– At the top-most level within a story (e.g. header, footer, main document)
– Nested within a table cell
– Nested within a structured document tag or annotation markers
Tables
• Similar to HTML tables, a Word table consists of the table itself, its properties, its rows, and its cells.
[Figure: table markup with the Properties, Row, and Cell parts labeled]
Tables
• Individual table cells can themselves contain any block level content
– This means that tables can be nested to any depth
Nested table
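A sketch of the table structure, including a table nested inside a cell. Element names follow the published ECMA-376 schema and may differ from the draft described here; the sample is deliberately simplified (for instance, it omits the column grid a real document would carry), and the helpers `cell_texts` and `nesting_depth` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

TABLE_XML = f"""
<w:tbl xmlns:w="{W}">
  <w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
    <w:tc>
      <w:tbl>
        <w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>
        <w:tr><w:tc><w:p><w:r><w:t>inner</w:t></w:r></w:p></w:tc></w:tr>
      </w:tbl>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
"""


def cell_texts(table):
    """Text of each cell's own paragraphs, row by row (nested tables excluded)."""
    rows = []
    for row in table.findall("w:tr", NS):
        cells = []
        for cell in row.findall("w:tc", NS):
            cells.append("".join(
                t.text or ""
                for p in cell.findall("w:p", NS)
                for t in p.iter(f"{{{W}}}t")
            ))
        rows.append(cells)
    return rows


def nesting_depth(table):
    """1 for a flat table, plus one for each level of tables inside cells."""
    inner = [nesting_depth(nested)
             for cell in table.findall("w:tr/w:tc", NS)
             for nested in cell.findall("w:tbl", NS)]
    return 1 + max(inner, default=0)


outer = ET.fromstring(TABLE_XML)
outer_cells = cell_texts(outer)
depth = nesting_depth(outer)
```

Because a cell holds ordinary block level content, the same two functions work unchanged on the inner table.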
Custom Markup
• Custom markup can be applied within the contents of any story in a document
• These tags can take one of three forms:
– Smart tags
– Custom XML markup
– Structured document tags
Custom Defined XML
• A facility for embedding arbitrary user XML within the document at either block or inline levels
Structured Document Tags
• Provide granular semantics at either the block or inline levels
– e.g. region can/cannot be edited; region can/cannot be deleted; region should show a date picker/drop-down list/textbox
– Do not affect layout
• Similar to custom XML, but without the XML schema semantics and with presentation data and more granular properties
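As an illustration, a block-level structured document tag might pair a property set (lock state, control type) with the content it wraps. Element names follow the published ECMA-376 schema and may differ from the draft described here; the alias, the date text, and the helper `describe_sdt` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

SDT_XML = f"""
<w:sdt xmlns:w="{W}">
  <w:sdtPr>
    <w:alias w:val="Due date"/>
    <w:lock w:val="sdtLocked"/>
    <w:date/>
  </w:sdtPr>
  <w:sdtContent>
    <w:p><w:r><w:t>1 January 2007</w:t></w:r></w:p>
  </w:sdtContent>
</w:sdt>
"""

# Property element name -> the kind of control it asks for.
CONTROL_KINDS = {
    "date": "date picker",
    "dropDownList": "drop-down list",
    "text": "textbox",
}


def describe_sdt(sdt):
    """Summarize the lock state, control type, and wrapped text of a tag."""
    props = sdt.find("w:sdtPr", NS)
    lock = props.find("w:lock", NS)
    control = next(
        (label for tag, label in CONTROL_KINDS.items()
         if props.find(f"w:{tag}", NS) is not None),
        None,
    )
    content = sdt.find("w:sdtContent", NS)
    text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
    return {
        "lock": None if lock is None else lock.get(f"{{{W}}}val"),
        "control": control,
        "text": text,
    }


summary = describe_sdt(ET.fromstring(SDT_XML))
```

The properties describe behavior only; the wrapped paragraph is laid out exactly as it would be without the tag.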
Sections
• Sections in a word processing document specify:
– Page properties
• Page size
• Page orientation
• Margins
– Header/footer references
– Footnote/endnote properties
– Column properties
Sections
• Sections specify (cont'd):
– Line numbering
– Text direction (RTL vs. LTR; top-to-bottom vs. bottom-to-top)
Sections
• Four types of sections:
– Continuous
– Next page (start on next page)
– Even (start on next even page)
– Odd (start on next odd page)
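The section properties listed on these slides might be read as in the sketch below. Element names, the twentieths-of-a-point unit, and the `nextPage` default follow the published ECMA-376 schema and may differ from the draft described here; the sample values and the helper `describe_section` are hypothetical, and the margin element is abbreviated to four attributes.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
TWIPS_PER_INCH = 1440  # page measurements are in twentieths of a point

SECT_XML = f"""
<w:sectPr xmlns:w="{W}">
  <w:type w:val="continuous"/>
  <w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>
  <w:cols w:num="2"/>
</w:sectPr>
"""


def describe_section(sect_pr):
    """Summarize section type, page size in inches, orientation, and columns."""
    def attr(path, name, default=None):
        element = sect_pr.find(path, NS)
        if element is None:
            return default
        return element.get(f"{{{W}}}{name}", default)

    return {
        "type": attr("w:type", "val", "nextPage"),
        "width_in": int(attr("w:pgSz", "w")) / TWIPS_PER_INCH,
        "height_in": int(attr("w:pgSz", "h")) / TWIPS_PER_INCH,
        "orientation": attr("w:pgSz", "orient", "portrait"),
        "columns": int(attr("w:cols", "num", "1")),
    }


section = describe_section(ET.fromstring(SECT_XML))
```

This sample describes a continuous, two-column section on a landscape page of 11 by 8.5 inches.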
Annotations
• Annotations in a word processing document store markup information:
– Tracked revisions (insertion, deletion, move)
– Comments
– Bookmarks
Annotations
• Annotation markup can be represented in three states:
1 - Inline
Annotations
2 - ‘Non-wellformed’
• The markup does not encapsulate the content; there is a start marker and an end marker.
Annotations
3 - Property
• The deletion of a paragraph mark is in the paragraph’s property set
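The three states can be illustrated in one fragment: a tracked insertion that wraps its runs, a bookmark whose start and end markers sit in different paragraphs, and a deleted paragraph mark recorded in the paragraph's properties. Element names follow the published ECMA-376 schema and may differ from the draft described here; the author, IDs, text, and the helper `classify_annotations` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def w_attr(element, name):
    """Read an attribute in the WordprocessingML namespace."""
    return element.get(f"{{{W}}}{name}")


BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:rPr><w:del w:id="3" w:author="Reviewer"/></w:rPr></w:pPr>
    <w:ins w:id="1" w:author="Reviewer"><w:r><w:t>inserted text</w:t></w:r></w:ins>
    <w:bookmarkStart w:id="2" w:name="intro"/>
    <w:r><w:t>bookmark begins here</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>and ends here</w:t></w:r>
    <w:bookmarkEnd w:id="2"/>
  </w:p>
</w:body>
"""


def classify_annotations(body):
    """Group annotations by how they are represented in the markup."""
    found = {"inline": [], "markers": [], "property": []}
    starts = {}
    for index, paragraph in enumerate(body.findall("w:p", NS)):
        # 1 - Inline: the annotation element encloses the affected runs.
        for ins in paragraph.findall("w:ins", NS):
            found["inline"].append(
                "".join(t.text or "" for t in ins.iter(f"{{{W}}}t")))
        # 3 - Property: the deleted paragraph mark is a paragraph property.
        if paragraph.find("w:pPr/w:rPr/w:del", NS) is not None:
            found["property"].append(index)
        # 2 - Start and end markers paired by ID, possibly across paragraphs.
        for start in paragraph.findall("w:bookmarkStart", NS):
            starts[w_attr(start, "id")] = (w_attr(start, "name"), index)
        for end in paragraph.findall("w:bookmarkEnd", NS):
            name, start_index = starts[w_attr(end, "id")]
            found["markers"].append((name, start_index, index))
    return found


annotations = classify_annotations(ET.fromstring(BODY_XML))
```

The bookmark could not be written as a single enclosing element here, because it starts in one paragraph and ends in the next.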
Headers/Footers
• There are three types of headers and footers in Word:
– Odd page header
– Even page header (optional)
– First page header (optional)
• If one of the optional types is not specified, the odd page header is used
Headers/Footers
• Headers and footers are stored in separate parts, one per header or footer
• Each section refers to its header(s)/footer(s) by an explicit relationship reference:
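A sketch of how such a reference might be resolved: the section properties carry a relationship ID, and the relationships part for the main document maps that ID to the header or footer part. Namespaces and element names follow the published ECMA-376 and Open Packaging Conventions schemas and may differ from the draft described here. Only the relationship ID is shown on each reference; other attributes are omitted for brevity. The IDs, part names, and the helper `referenced_parts` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
NS = {"w": W}

# Section properties in the main document part.
SECT_XML = f"""
<w:sectPr xmlns:w="{W}" xmlns:r="{R}">
  <w:headerReference r:id="rId7"/>
  <w:footerReference r:id="rId8"/>
</w:sectPr>
"""

# Relationships part that belongs to the main document part.
RELS_XML = f"""
<Relationships xmlns="{PKG_REL}">
  <Relationship Id="rId7" Type="{R}/header" Target="header1.xml"/>
  <Relationship Id="rId8" Type="{R}/footer" Target="footer1.xml"/>
</Relationships>
"""


def referenced_parts(sect_pr, rels_root):
    """Map 'header' and 'footer' to the part names a section refers to."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.findall(f"{{{PKG_REL}}}Relationship")
    }
    parts = {}
    for kind in ("header", "footer"):
        for ref in sect_pr.findall(f"w:{kind}Reference", NS):
            rel_id = ref.get(f"{{{R}}}id")
            parts.setdefault(kind, []).append(targets[rel_id])
    return parts


parts = referenced_parts(ET.fromstring(SECT_XML), ET.fromstring(RELS_XML))
```

The section never names the header file directly, so a part can be renamed by changing only the relationship target.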
Headers/Footers
• The type is declared in the header/footer part:
Footnotes/Endnotes
• All footnotes are stored in a single part
– Same applies to all endnotes
• Footnote references are positioned by a special tag in run content, which specifies the footnote to reference:
Footnotes/Endnotes
• Within the footnotes part, the actual footnote story content is found via the ID:
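The two halves of a footnote can be sketched together: a reference in run content carries an ID, and the footnotes part holds the story with the same ID. Element names follow the published ECMA-376 schema and may differ from the draft described here; the IDs, text, and the helper `footnotes_for` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Paragraph in the main story with a footnote reference in its second run.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>A claim that needs support.</w:t></w:r>
  <w:r><w:footnoteReference w:id="2"/></w:r>
</w:p>
"""

# The single footnotes part, holding every footnote story.
FOOTNOTES_XML = f"""
<w:footnotes xmlns:w="{W}">
  <w:footnote w:id="2">
    <w:p><w:r><w:t>Supporting note.</w:t></w:r></w:p>
  </w:footnote>
</w:footnotes>
"""


def footnotes_for(paragraph, footnotes_root):
    """Return the text of each footnote referenced from a paragraph."""
    by_id = {
        note.get(f"{{{W}}}id"): note
        for note in footnotes_root.findall("w:footnote", NS)
    }
    texts = []
    for ref in paragraph.iter(f"{{{W}}}footnoteReference"):
        note = by_id[ref.get(f"{{{W}}}id")]
        texts.append("".join(t.text or "" for t in note.iter(f"{{{W}}}t")))
    return texts


notes = footnotes_for(ET.fromstring(PARAGRAPH_XML),
                      ET.fromstring(FOOTNOTES_XML))
```

The footnote body is a story in its own right, so it contains ordinary paragraphs and runs.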
Glossary Document
• Remember that exception to the ‘all stories share the same data’ rule?
• The glossary document is a completely distinct main story
– Specifies its own styles, lists, fonts, settings
• This story is used to store document fragments which may be inserted at a later time
File Format Types
• Template (DOTX) – classic “DOT”
• Document (DOCX) – classic “DOC”
• Both use the same file format; the only differences are the main part's content type and the file extension
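A sketch of telling the two apart by content type rather than by extension. The content type strings and the `[Content_Types].xml` layout follow the published Open Packaging Conventions and ECMA-376 and may differ from the draft described here; the packages built below are minimal stand-ins rather than valid documents, and `build_package` and `package_kind` are hypothetical helpers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
BASE = "application/vnd.openxmlformats-officedocument.wordprocessingml"
KINDS = {
    f"{BASE}.document.main+xml": "document",
    f"{BASE}.template.main+xml": "template",
}


def build_package(content_type):
    """Build an in-memory ZIP holding only a content types part (demo only)."""
    types_xml = (
        f'<Types xmlns="{CT_NS}">'
        f'<Override PartName="/word/document.xml" ContentType="{content_type}"/>'
        "</Types>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", types_xml)
    return buffer.getvalue()


def package_kind(package_bytes):
    """Return 'document', 'template', or None from the main content type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    for override in root.findall(f"{{{CT_NS}}}Override"):
        kind = KINDS.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return None


docx_kind = package_kind(build_package(f"{BASE}.document.main+xml"))
dotx_kind = package_kind(build_package(f"{BASE}.template.main+xml"))
```

Checking the content type means a renamed file is still classified by what it declares itself to be.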
Disclaimer
This presentation is for informational purposes only, and should not be relied upon as a substitute or replacement for Microsoft formal file format documentation, which is available at the following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions presented in this material are solely those of the author and do not necessarily represent those of Microsoft. Microsoft disclaims all liability for mistakes or inaccuracies in this presentation.