ISO/IEC JTC 1/SC 32 N 1761
ISO/IEC JTC 1/SC 32 N 1761 Date: 2008-05-25
REPLACES: --
ISO/IEC JTC 1/SC 32
Data Management and Interchange
Secretariat: United States of America (ANSI)
Administered by Farance Inc. on behalf of ANSI
DOCUMENT TYPE: Information from JTC1 Secretariat
TITLE: Efficient Binary representation of XML - Presentation
SOURCE: JTC1 Secretariat
PROJECT NUMBER: 1.32.
STATUS: In accordance with JTC 1 Gold Coast resolution 45, the attached document, presented at the Technology Watch meeting on the Gold Coast, is forwarded to SC 6, SC 29, SC 32 and SC 34 to assess potential opportunities.
REFERENCES:
ACTION ID: ACT
REQUESTED ACTION:
DUE DATE:
Number of Pages: 27
LANGUAGE USED: English
DISTRIBUTION: P & L Members, SC Chair, WG Conveners and Secretaries
Dr. Timothy Schoechle, Secretary, ISO/IEC JTC 1/SC 32
Farance Inc *, 3066 Sixth Street, Boulder, CO, United States of America
Telephone: +1 303-443-5490; E-mail: [email protected]
Available from the JTC 1/SC 32 WebSite http://www.jtc1sc32.org/
*Farance Inc. administers the ISO/IEC JTC 1/SC 32 Secretariat on behalf of ANSI
ISO/IEC JTC 1 SWG for Technology Watch Secretariat: US (ANSI)
ISO/IEC JTC1 TW0043 2008-03-14
Document Type: Presentation
Document Title: Efficient Binary representation of XML
Document Source: Dr. Raymond Wong, Australia, National ICT Australia (NICTA)
Project Number:
Document Status: Final
Action ID: ACT or FYI
Due Date:
Distribution: TWG and JTC 1
No. of Pages: 25
Note:
The imagination driving Australia’s ICT future
Efficient Binary Representation of XML
Raymond Wong and Bill Shui
mContext Project
National ICT Australia
Even if you have a large flash memory card:
• The runtime footprint will be huge!
• e.g., runtime footprint = 10 x original storage size = 50 x compressed document size (i.e., the compressed document is 1/5 of the original)
So the size of the memory footprint is critical!
[Figure: Decompression; Runtime footprint]
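The footprint arithmetic on this slide can be written out as a small helper. The 10x expansion factor comes from the slide; the 5:1 compression ratio is implied by the two multipliers (50 / 10 = 5). The function name, its parameters and the 20 MB example are hypothetical.

```python
def runtime_footprint(compressed_size: float,
                      compression_ratio: float = 5.0,
                      expansion_factor: float = 10.0) -> float:
    """Estimate the in-memory footprint of a loaded document (hypothetical helper).

    compression_ratio: original size / compressed size (5:1 implied by the slide).
    expansion_factor: in-memory size / original size (10x per the slide).
    """
    original_size = compressed_size * compression_ratio
    return original_size * expansion_factor


# A hypothetical 20 MB compressed document stored on a flash card:
print(runtime_footprint(20))  # 1000.0 (MB), i.e. 50 x the compressed size
```

The point is that the flash card only has to hold the compressed size, while RAM has to hold fifty times that.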
Problem of simply compressing XML
• When reading the compressed data:
– Decompression is needed
– Space is needed for both the compressed and the decompressed data
[Figure: Compression + Decompression]
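A minimal Python sketch of this problem, using zlib as a stand-in for any general-purpose compressor and the standard-library ElementTree parser. The function name is hypothetical. Before a single node can be read, the whole document is expanded, so the compressed and decompressed buffers coexist in memory, and the parse tree comes on top of both.

```python
import xml.etree.ElementTree as ET
import zlib


def read_compressed_xml(blob: bytes) -> tuple[ET.Element, int]:
    """Decompress and parse a zlib-compressed XML document (hypothetical helper).

    Returns the parsed root and the number of bytes held in the raw buffers
    (compressed + decompressed) when parsing starts; the parse tree itself
    needs further memory on top of that.
    """
    raw = zlib.decompress(blob)       # the whole document is expanded first
    buffered = len(blob) + len(raw)   # both buffers are alive at this point
    root = ET.fromstring(raw)         # parsing works on the expanded text
    return root, buffered


doc = ("<items>" + "<item>value</item>" * 1000 + "</items>").encode("utf-8")
blob = zlib.compress(doc)
root, buffered = read_compressed_xml(blob)
print(len(root), buffered == len(blob) + len(doc))  # 1000 True
```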
Solutions
• Binary XML, to improve processing and space efficiency
• Benefits:
– Retains the existing XML effort invested in managing and storing information.
– Prevents fragmentation into multiple alternative formats.
Minimal requirements
• MUST SUPPORT
– Directly Readable and Writable
– Transport Independence
– Compactness
– Human Language Neutral
– Platform Neutrality
– Integratable into XML Stack
– Royalty Free
– Fragmentable
– Streamable
– Roundtrip Support
– Generality
– Schema Extensions and Deviations
– Format Version
– Identifier
– Content Type Management
– Self Contained
• MUST NOT PREVENT
– Processing Efficiency
– Small Footprint
– Widespread Adoption
– Space Efficiency
– Implementation Cost
– Forward Compatibility
Existing proposals
• Efficient XML (EXI), a proposed W3C standard for binary XML
• ASN.1 X.694 with BER (Basic Encoding Rules)
• ASN.1 X.694 with PER (Packed Encoding Rules)
• XML + gzip
• Fast Infoset (Sun Microsystems)
• FXDI (Fujitsu Binary)
• Xebu
• ASN.1 X.694 with PER + Fast Infoset
• Efficiency Structured XML (esXML)
• BiM (from MPEG-7)
• WBXML (Wireless Binary XML or WAP Binary XML)
Problems when XML data are edited
• Higher CPU usage for repackaging and recompressing the entire dataset.
• More runtime space usage:
– Runtime storage required = old version + newly compressed version.
• None of the proposed standards supports efficient update operations.
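The editing cost can be illustrated with the same zlib-plus-ElementTree stand-in. This is a hypothetical sketch, not any of the listed formats: changing one value still forces the whole document through decompress, parse, re-serialise and recompress, and the old and new compressed versions exist side by side until the old one is discarded.

```python
import xml.etree.ElementTree as ET
import zlib


def update_compressed_xml(blob: bytes, path: str, new_text: str) -> bytes:
    """Change the text of one element in a zlib-compressed XML document.

    Hypothetical helper: even a one-node edit decompresses, parses,
    re-serialises and recompresses the entire document.
    """
    root = ET.fromstring(zlib.decompress(blob))   # full expansion + parse
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    node.text = new_text
    return zlib.compress(ET.tostring(root))       # full re-serialise + recompress


doc = b"<db>" + b"<rec><v>0</v></rec>" * 500 + b"</db>"
old = zlib.compress(doc)
new = update_compressed_xml(old, "rec/v", "1")
# Until `old` is released, storage holds len(old) + len(new) bytes.
```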
mContext binary XML
• Meets both the MUST SUPPORT and the MUST NOT PREVENT requirements.
• Works with and without schema information.
• Small and constant runtime footprint.
• It is not a compressed format, so CPU usage is lower.
• Fast update, navigation and access of XML nodes, regardless of document size.
• Can directly map to existing SAX and DOM interfaces.
• Already able to link to MSXML and Xerces.
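The slides do not spell out mContext's layout. One widely used succinct representation of tree topology is the balanced-parentheses encoding: each node writes an open bit on entry and a close bit on exit (2 bits per node), and navigation reduces to finding matching parentheses. The Python sketch below illustrates that general technique only. It assumes tag names are kept in a separate array in document order, and its class and method names are hypothetical. Real succinct structures answer `find_close` and rank queries in constant time with auxiliary indexes, not with the linear scans used here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


class SuccinctTree:
    """Hypothetical balanced-parentheses store: 1 = open, 0 = close."""

    def __init__(self, root: ET.Element) -> None:
        self.bits: List[int] = []
        self.tags: List[str] = []  # tags[k] belongs to the k-th open bit
        self._encode(root)

    def _encode(self, elem: ET.Element) -> None:
        # Pre-order walk: open bit on entry, close bit on exit.
        self.bits.append(1)
        self.tags.append(elem.tag)
        for child in elem:
            self._encode(child)
        self.bits.append(0)

    def find_close(self, i: int) -> int:
        # Linear scan for the matching close bit (illustration only).
        depth = 0
        for j in range(i, len(self.bits)):
            depth += 1 if self.bits[j] else -1
            if depth == 0:
                return j
        raise ValueError("unbalanced parentheses")

    def first_child(self, i: int) -> Optional[int]:
        # A child, if any, opens immediately after its parent opens.
        return i + 1 if self.bits[i + 1] == 1 else None

    def next_sibling(self, i: int) -> Optional[int]:
        # The next sibling, if any, opens right after this node closes.
        j = self.find_close(i) + 1
        return j if j < len(self.bits) and self.bits[j] == 1 else None

    def tag(self, i: int) -> str:
        # rank1(i) - 1: count of open bits up to i gives the tag index.
        return self.tags[sum(self.bits[: i + 1]) - 1]


t = SuccinctTree(ET.fromstring("<a><b><c/></b><d/></a>"))
print(t.bits)  # [1, 1, 1, 0, 0, 1, 0, 0]
b = t.first_child(0)
print(t.tag(b), t.tag(t.next_sibling(b)))  # b d
```

Because a node is just a bit position, an in-place update touches a local stretch of bits rather than a pointer graph, which is the kind of property the slide's "fast update" claim relies on.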
mContext binary XML
• Compatible with existing algorithms for efficient XPath,
XQuery and XSLT processing.
• Extensible for third-party text compression schemes.
• No more XML text parsing.
• Tested on up to 16 GB of XML data (more than 770 million nodes).
• API available in C/C++, Java and C#.
• Works on mobile devices, desktop and server environments.
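The SAX mapping and the "no more XML text parsing" point can be illustrated with a balanced-parentheses sequence (1 = open, 0 = close) and a tag array in document order: replaying the stored bits produces `startElement`/`endElement` calls on a standard `xml.sax.handler.ContentHandler` without touching any XML text. This is a minimal sketch under that assumption (elements only, no attributes or character data). The `replay` function and the `Recorder` class are hypothetical and are not mContext's API.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def replay(bits: list[int], tags: list[str], handler: ContentHandler) -> None:
    """Drive a SAX ContentHandler from a stored parentheses sequence."""
    stack: list[str] = []
    next_tag = 0
    handler.startDocument()
    for bit in bits:
        if bit:  # open bit: start the next element in document order
            name = tags[next_tag]
            next_tag += 1
            stack.append(name)
            handler.startElement(name, AttributesImpl({}))
        else:    # close bit: end the most recently opened element
            handler.endElement(stack.pop())
    handler.endDocument()


class Recorder(ContentHandler):
    """Collects events so the replayed stream can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def startElement(self, name, attrs):
        self.events.append("<" + name + ">")

    def endElement(self, name):
        self.events.append("</" + name + ">")


rec = Recorder()
replay([1, 1, 1, 0, 0, 1, 0, 0], ["a", "b", "c", "d"], rec)
print("".join(rec.events))  # <a><b><c></c></b><d></d></a>
```

Any existing SAX consumer could be passed in place of `Recorder`, which is the sense in which a stored structure can stand in for a parser.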
Summary
• The size of XML documents will keep increasing.
• Binary XML is needed to sustain the extensive use of XML on both large and small computing devices.
• mContext Binary XML:
– Satisfies the requirements of existing standards groups on binary XML formats.
– Enables fast update, navigation and random access of XML data.
– Uses a succinct structure without compression, for better utilisation of processing resources.
– Provides an API ready to adapt to existing XML-based infrastructures.
The End
For further information, please contact us at
mContext Succinct Binary XML
• Published at the World Wide Web Conference 2007 (WWW2007).
– http://www2007.org/htmlpapers/paper794/
An Example Result
100M data (public domain): commercial software library (MSXML) vs. mContext
• Memory footprint: MSXML 329 MB; mContext 67 MB
• Loading time: MSXML 17.8 s; mContext 0.67 s
• Runtime footprint (search): MSXML 333 MB; mContext 67 MB
• Processing time (search): MSXML 1.814 s; mContext 0.143 s
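The relative improvement for each row can be computed directly from the reported figures. The snippet below only restates the example result as MSXML/mContext ratios and adds no new measurements.

```python
# Figures copied from the example result (MSXML, mContext) for the 100M data set.
results = {
    "Memory footprint (MB)": (329, 67),
    "Loading time (s)": (17.8, 0.67),
    "Runtime footprint, search (MB)": (333, 67),
    "Processing time, search (s)": (1.814, 0.143),
}

# Ratio > 1 means mContext uses less memory or time than MSXML.
ratios = {metric: msxml / mcontext for metric, (msxml, mcontext) in results.items()}
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")
```

This works out to roughly 4.9x less memory, 26.6x faster loading, a 5.0x smaller search footprint and 12.7x faster search processing.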
ASN.1 X.694
• BER
– Uses ASN.1 with BER for encoding.
– Uses X.694 for mapping XSD to ASN.1.
– Advantage:
• Binary tokens and binary text are smaller than the equivalent textual XML
– Disadvantage:
• Requires schema information for the encoding.
• PER
– Same as above, but with higher compression ratio.
– However, it still suffers from the same disadvantage.
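To make the BER bullets concrete: BER encodes every value as tag, length, value (TLV). The sketch below hand-encodes a UTF8String (universal tag 12, 0x0C) and a SEQUENCE (universal tag 16, constructed, 0x30) using the definite-length form. It illustrates the encoding rules only and is not X.694 output. For simplicity the components carry universal tags, while a real X.694-derived module may use context-specific tags, and the helper names are hypothetical. Element names such as `name` and `city` never appear in the bytes, which is why the schema is needed to decode them.

```python
def ber_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_tlv(tag: int, value: bytes) -> bytes:
    """Tag octet, length octets, then the value octets."""
    return bytes([tag]) + ber_length(len(value)) + value


def ber_utf8string(s: str) -> bytes:
    return ber_tlv(0x0C, s.encode("utf-8"))      # UNIVERSAL 12, primitive


def ber_sequence(*components: bytes) -> bytes:
    return ber_tlv(0x30, b"".join(components))    # UNIVERSAL 16, constructed


xml_doc = b"<person><name>Ann</name><city>Sydney</city></person>"
ber = ber_sequence(ber_utf8string("Ann"), ber_utf8string("Sydney"))
print(ber.hex(), len(xml_doc), len(ber))  # 300d0c03416e6e0c065379646e6579 52 15
```

PER goes further by dropping tags and, where the schema bounds them, lengths, which explains both its higher compression and its equally strong schema dependence.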
Fast Infoset
• Fails the compactness test.
• Performs badly without schema knowledge.
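One reason schema knowledge matters is vocabulary. Fast Infoset (ITU-T X.891) replaces repeated names with indices into vocabulary tables. Without prior knowledge, such as a vocabulary derived from a schema, the tables start empty and each name must be written out literally the first time it appears. The Python sketch below shows this general string-table idea with a simplified token stream. It is not the Fast Infoset encoding, and the `tokenize` function is hypothetical.

```python
def tokenize(names: list[str], initial_vocab: tuple[str, ...] = ()) -> list[object]:
    """Replace repeated names by integer indices into a growing table.

    A name seen for the first time is emitted literally (a str) and appended
    to the table; later occurrences are emitted as its index (an int).
    """
    table = {name: i for i, name in enumerate(initial_vocab)}
    out: list[object] = []
    for name in names:
        if name in table:
            out.append(table[name])
        else:
            table[name] = len(table)
            out.append(name)
    return out


names = ["order", "item", "price", "item", "price"]
print(tokenize(names))                              # ['order', 'item', 'price', 1, 2]
print(tokenize(names, ("order", "item", "price")))  # [0, 1, 2, 1, 2]
```

For small documents with few repetitions, the literal first occurrences dominate, so a format that relies on such tables gains little unless the vocabulary is known in advance.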
Fujitsu XML Data Interchange (FXDI)
• Fails the W3C version information test.
• It is also heavily dependent on schema information.
• High compression is achieved when schema information is provided.
• Uses separate encoding and decoding APIs for writing and reading binary XML. However, processing time is much worse when a schema is used.
Xebu
• Comes in two variants: Xebu and Xebu-S.
• http://www.w3.org/XML/EXI/eval/xebu-evaluation.html
• Fails the compactness test.
• Designed mainly for mobile phones; not well tested on larger systems.
• Not fully self-contained.
Efficiency Structured XML (esXML)
• Fails compactness test.
• Uses pointer-based layers in its infoset.
XML in Enterprise Systems
• Effective service-oriented architecture needs efficient XML handling.
• Almost all documents are in XML format.