Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian...

22
Assembling, Repurposing And Assembling, Repurposing And Manipulating Document Content Manipulating Document Content Using The New Office File Using The New Office File Format Format Brian Jones Brian Jones OFF 304 OFF 304 Program Manager Program Manager Microsoft Corporation Microsoft Corporation

Transcript of Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian...

Page 1: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Assembling, Repurposing And Assembling, Repurposing And Manipulating Document Content Manipulating Document Content Using The New Office File FormatUsing The New Office File Format

Brian JonesBrian JonesOFF 304OFF 304Program ManagerProgram ManagerMicrosoft CorporationMicrosoft Corporation

Page 2: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

AgendaAgenda

Overview of the new formatsOverview of the new formats

Role of XML in documentsRole of XML in documents

Evolution of MS Office file formatsEvolution of MS Office file formats

Microsoft Office Open XML format architectureMicrosoft Office Open XML format architecture

Components of the new formatsComponents of the new formats

Reference schemas and custom defined schemasReference schemas and custom defined schemas

Developing against the formatsDeveloping against the formats

Visual Studio Tools for Office (VSTO) supportVisual Studio Tools for Office (VSTO) support

Sample solution scenariosSample solution scenarios

Demos throughoutDemos throughout

Page 3: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Microsoft Office Open XML FormatsMicrosoft Office Open XML Formats

New Default Formats: New XML file formats for Word, Excel and New Default Formats: New XML file formats for Word, Excel and PowerPointPowerPoint

New file type extensionsNew file type extensions

Interoperable: Open, transparent format improves interoperabilityInteroperable: Open, transparent format improves interoperabilityPublished file format specification with royalty-free licensePublished file format specification with royalty-free licenseTransparent, XML format enables new integration scenarios for documents Transparent, XML format enables new integration scenarios for documents and LOB systemsand LOB systems

Added Benefits: compact and robustAdded Benefits: compact and robustZIP container allows for standard compression on all files without user ZIP container allows for standard compression on all files without user effort (Dramatic file size improvements)effort (Dramatic file size improvements)Significantly more robust files to help minimize data lossSignificantly more robust files to help minimize data loss

Backward Compatible: Office 2000, Office XP, Office 2003 will all Backward Compatible: Office 2000, Office XP, Office 2003 will all support the new formatssupport the new formats

Patches for compatibility available by launchPatches for compatibility available by launchOpen, edit and save new formatsOpen, edit and save new formats

Legacy support: Current Office 97-2003 binary file formats supportedLegacy support: Current Office 97-2003 binary file formats supportedSupport for XML formats from Office 2003, Office XP continuedSupport for XML formats from Office 2003, Office XP continued

Developers: Endless potential for developers Developers: Endless potential for developers Build solutions to read, write, and modify Office files (without the need to Build solutions to read, write, and modify Office files (without the need to run run Office APIs)Office APIs)

Page 4: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Office Open XML FormatsOffice Open XML Formats

Brian JonesBrian JonesProgram ManagerProgram ManagerMicrosoft WordMicrosoft WordDemo 1 & 2Demo 1 & 2

Page 5: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

The Role Of XML With The Role Of XML With DocumentsDocumentsScenarioScenario ExampleExampleDocument AssemblyDocument AssemblyServer-based or user-assisted Server-based or user-assisted construction of documents from archived construction of documents from archived content or database contentcontent or database content

Create sales reports from financial and Create sales reports from financial and forecast data stored in a CRM systemforecast data stored in a CRM system

Content ReuseContent ReuseMuch easier to move content between Much easier to move content between documents, including different document documents, including different document typestypes

Apply content stored in Word documents Apply content stored in Word documents to Web pages quickly and efficientlyto Web pages quickly and efficiently

Content TaggingContent TaggingAdd domain-specific metadata to Add domain-specific metadata to document content to enable custom document content to enable custom solutions solutions

Tag presentations using a specific Tag presentations using a specific taxonomy to improve knowledge taxonomy to improve knowledge management efficiency management efficiency

Document InterrogationDocument InterrogationQuery document repositories based on Query document repositories based on custom data, content types or document custom data, content types or document metadatametadata

Search for all documents containing a Search for all documents containing a specific company name or sales contactspecific company name or sales contact

Document SanitizationDocument SanitizationRemove unwanted content like Remove unwanted content like comments or embedded code from your comments or embedded code from your document when appropriatedocument when appropriate

Remove all tracked changes and Remove all tracked changes and comments from a Word document before comments from a Word document before it is publishedit is published

Page 6: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Open XML Formats Open XML Formats ArchitectureArchitecture

User view: single Office User view: single Office “file”“file”

Questionnaire.Questionnaire.docxdocx

File ContainerFile Container

Document PropertiesDocument Properties

CommentsComments

ChartsCharts

Embedded code / macrosEmbedded code / macros

Images, video, soundImages, video, sound

Custom-defined XMLCustom-defined XML

WordML / SpreadsheetML, etc.WordML / SpreadsheetML, etc.Document PartsDocument Parts

Most parts are XMLMost parts are XML

Each XML part is a discreet, compressed Each XML part is a discreet, compressed componentcomponent

Can add, extract and modify individual Can add, extract and modify individual parts without using Office programsparts without using Office programs

Corruption or absence of any part would Corruption or absence of any part would not prohibit the file from being openednot prohibit the file from being opened

Developer view: modular Developer view: modular filefile

Page 7: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Create A Document From Create A Document From ScratchScratch

Brian JonesBrian JonesProgram ManagerProgram ManagerMicrosoft WordMicrosoft WordDemo 3Demo 3

Page 8: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Components Of The New Components Of The New FormatsFormats

We make heavy use of the Open Packaging ConventionsWe make heavy use of the Open Packaging ConventionsThese are the same conventions used by the XPS guys, and you These are the same conventions used by the XPS guys, and you can leverage the same APIs for accessing Office filescan leverage the same APIs for accessing Office files

Package – ZIP ContainerPackage – ZIP Container

Part – The “files” inside the ZIPPart – The “files” inside the ZIP

Content Types – Each part has a content type that is enforced Content Types – Each part has a content type that is enforced on openon open

Relationships – Any part that references another part must do Relationships – Any part that references another part must do so via a relationshipso via a relationship

Document Properties

Application Properties

Custom Doc. Props.

Workbook

Sheet 2

Sheet 3

Sheet 1 Styles

Chart

Strings

Relationship

...

...

Page 9: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Modifying An Excel Modifying An Excel SpreadsheetSpreadsheet

Brian JonesBrian JonesProgram ManagerProgram ManagerMicrosoft WordMicrosoft WordDemos 4 & 5Demos 4 & 5

Page 10: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas

XML Reference XML Reference SchemasSchemas

Display-oriented (e.g. Display-oriented (e.g. Bold, Italics, Tables, Bold, Italics, Tables, Paragraphs, Styles)Paragraphs, Styles)

Open Document FormatOpen Document Format

Enable Archival & File Enable Archival & File Formats InteroperabilityFormats InteroperabilityCustom-defined Custom-defined

SchemasSchemasData-oriented (e.g., Data-oriented (e.g., Price, Invoice)Price, Invoice)

Represents the business Represents the business information stored in the information stored in the documentdocument

Enable System Enable System IntegrationIntegration

Page 11: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas

XML Reference XML Reference SchemasSchemas

Display-oriented (e.g. Display-oriented (e.g. Bold, Italics, Tables, Bold, Italics, Tables, Paragraphs, Styles)Paragraphs, Styles)

Open Document FormatOpen Document Format

Enable Archival & File Enable Archival & File Formats InteroperabilityFormats Interoperability

<w:p> <w:r> <w:rPr><w:b /></w:rPr> <w:t>John Doe</w:t> </w:r> <w:r> <w:rPr><w:i /></w:rPr> <w:t>Health Agency</w:t> </w:r></w:p>

Page 12: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

<ConferenceReport> <Date>3/24/2004</Date> <Summary> <Keyword>XML Conference (Europe)</Keyword> <Abstract>Role of XML on the Desktop</Abstract> </Summary> <Attendees> <Attendee Name=“John Doe”> <Department>Health Agency</Department> <Potential> <Sales>100</Sales> <Growth>25%</Growth> …

</Attendee>

The Role Of XMLThe Role Of XMLReference and custom-defined schemasReference and custom-defined schemas

Custom-defined Custom-defined SchemasSchemas

Data-oriented (e.g., Price, Data-oriented (e.g., Price, Invoice)Invoice)

Represents the business Represents the business information stored in the information stored in the documentdocument

Enable System Enable System IntegrationIntegration

Page 13: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

XML Data StoreXML Data Store

Brian JonesBrian JonesProgram ManagerProgram ManagerMicrosoft WordMicrosoft WordDemo 7Demo 7

Page 14: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Developing Against The Developing Against The FormatsFormats

More Reliable SolutionsMore Reliable Solutions33rdrd party tools were main cause of document corruptions party tools were main cause of document corruptions

Fully Documented FormatsFully Documented FormatsFreely available for download with a royalty free licenseFreely available for download with a royalty free licenseOffice file format schemas - Office file format schemas - Used to validate content for a given part Used to validate content for a given part

Samples, samples, samples Samples, samples, samples In the form of code “snippets” for easier use and integration into your In the form of code “snippets” for easier use and integration into your VSTO solutions VSTO solutions

WinFx Packaging APIs WinFx Packaging APIs Office Open XML Formats use the Open Packaging ConventionsOffice Open XML Formats use the Open Packaging ConventionsAccess/maintain parts and relationships within a fileAccess/maintain parts and relationships within a fileTakes care of all ZIP level functionalityTakes care of all ZIP level functionality

XPath XPath Navigation within content Navigation within content

XML DOM XML DOM Manipulating content Manipulating content

Office Open XML Resource KitOffice Open XML Resource KitTools for constructing and deconstructing the new file formatsTools for constructing and deconstructing the new file formatsDesign time Validation toolDesign time Validation tool

Parses a file and reports on schema, relationship errors and warnings Parses a file and reports on schema, relationship errors and warnings Runtime serialization tool Runtime serialization tool

Flattens package into a single file for ease of development in simple construction Flattens package into a single file for ease of development in simple construction scenarios scenarios

Page 15: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Programming Against The FormatsProgramming Against The Formats

Brian JonesBrian JonesProgram ManagerProgram ManagerMicrosoft WordMicrosoft WordDemo 8Demo 8

Page 16: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

VSTO Support For XML VSTO Support For XML FormatsFormats

VSTO application manifest becomes a VSTO application manifest becomes a partpart

Enables easier deployment and Enables easier deployment and redeployment of VSTO solutionsredeployment of VSTO solutions

Cached data feature of VSTO will be Cached data feature of VSTO will be fully supported in new file formatsfully supported in new file formats

VSTO’s ServerDocument object will VSTO’s ServerDocument object will be able to manipulate the new file be able to manipulate the new file formats without starting Office formats without starting Office applicationsapplications

Page 17: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Sample Solution ScenariosSample Solution Scenarios

Data interoperabilityData interoperability

Content manipulationContent manipulation

Content sharing and reuseContent sharing and reuse

Document assemblyDocument assembly

Document securityDocument security

Managing sensitive informationManaging sensitive information

Document stylingDocument styling

Document profiling Document profiling

Page 18: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Next StepsNext Steps

Schemas: Sneak peak at the Office Schemas: Sneak peak at the Office “12” schemas“12” schemas

We will provide an initial draft of the We will provide an initial draft of the schemas by the end of this week. See schemas by the end of this week. See my blog for my blog for more detailsmore details

Beta 1Beta 1Register for Beta 1 which comes out in Register for Beta 1 which comes out in the fourth quarter of this yearthe fourth quarter of this year

Page 19: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

Recommended Sessions & Recommended Sessions & LabsLabs

Upcoming SessionsUpcoming SessionsOFF 316OFF 316: Word 12: Integrating Business Data into : Word 12: Integrating Business Data into Documents using XML-based Data/View Separation and Documents using XML-based Data/View Separation and ProgrammabilityProgrammability

Tristan Davis - Room 406AB (Thursday @ 11:30)Tristan Davis - Room 406AB (Thursday @ 11:30)PRS 333PRS 333: Advances in Document Workflow, Securing, : Advances in Document Workflow, Securing, Viewing, and Printing Your ContentViewing, and Printing Your Content

Gregg Brown - Room 502AB (Wednesday @ 3:15)Gregg Brown - Room 502AB (Wednesday @ 3:15)OFF 322OFF 322: Building a Solution Using a Spreadsheet in : Building a Solution Using a Spreadsheet in Server-Based ScenariosServer-Based Scenarios

Danny Khen - Room 404AB (Friday @ 8:30)Danny Khen - Room 404AB (Friday @ 8:30)

Prior SessionsPrior SessionsOFF 201OFF 201: Office “12”: Introduction to the Programmable : Office “12”: Introduction to the Programmable Customization Model for the Office “12" User Experience Customization Model for the Office “12" User Experience (Part 1)(Part 1)

Jensen Harris - Room 515AB (Today @ 1)Jensen Harris - Room 515AB (Today @ 1)OFF 302OFF 302: Office “12'': Developing with the Programmable : Office “12'': Developing with the Programmable Customization Model for the Office “12" User Experience Customization Model for the Office “12" User Experience (Part 2)(Part 2)

Andy Himberger - Room 402AB (Today @ 2:45)Andy Himberger - Room 402AB (Today @ 2:45)DAT 304DAT 304: Unleashing the Power of XPS-Based File Formats : Unleashing the Power of XPS-Based File Formats for your Applicationfor your Application

Jesse McGatha - Room 408AB (Today @ 2:45)Jesse McGatha - Room 408AB (Today @ 2:45)

Page 20: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

ResourcesResources

Office Preview Site: http://www.microsoft.com/office/preview/

Brian Jones’s Blog: http://blogs.msdn.com/Brian_Jones/

Office 2003 Reference Schema Information: http://www.microsoft.com/office/xml/

Office Developer Center: http://msdn.microsoft.com/office/

Page 21: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

QuestionsQuestions

Page 22: Assembling, Repurposing And Manipulating Document Content Using The New Office File Format Brian Jones OFF 304 Program Manager Microsoft Corporation.

© 2005 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.