XML-binary Optimized Packaging

Editors' Copy $Date: 2004/02/25 18:11:18 $ 27 January 2004

This version:
http://www.w3.org/TR/2004/WD-soap12-xop-20040127
Latest version:
http://www.w3.org/TR/soap12-xop
Editors:
Noah Mendelsohn, IBM
Mark Nottingham, BEA
Hervé Ruellan, Canon

Abstract

This document defines the XML-binary Optimized Packaging (XOP) convention, a means of more efficiently serializing XML Infosets that have certain types of content.

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is an editors' copy of the XML-binary Optimized Packaging document. It has been produced by the XML Protocol Working Group (WG), which is part of the Web Services Activity.

Discussion of this document takes place on the public xml-dist-app@w3.org mailing list (public archive) under the email communication rules in the XML Protocol Working Group Charter.

Comments on this document are welcome. Send them to the xmlp-comments@w3.org mailing list (public archive). Note that all outstanding issues against this document are documented in the Working Group Issues List.

Patent disclosures relevant to this specification may be found on the Working Group's patent disclosure page.


Short Table of Contents

1. Introduction
2. XOP Packages
3. XOP Infoset Constructs
4. XOP's Processing Model
5. Identifying XOP Documents
6. Security Considerations
A. References
B. Change Log (Non-Normative)


Table of Contents

1. Introduction
    1.1 Terminology
    1.2 Example
    1.3 Notational Conventions
2. XOP Packages
    2.1 MIME Multipart/Related XOP Packages
3. XOP Infoset Constructs
    3.1 xop:Include Element Information Item
    3.2 href Attribute Information Item
    3.3 xop-mime:content-type Attribute Information Item
4. XOP's Processing Model
    4.1 Creating XOP Packages
    4.2 Interpreting XOP Packages
5. Identifying XOP Documents
6. Security Considerations

Appendices

A. References
B. Change Log (Non-Normative)


1. Introduction

This specification defines the XML-binary Optimized Packaging (XOP) convention, a means of more efficiently serializing XML Infosets that have certain types of content.

A XOP package is created by placing a serialization of an XML Infoset inside of an extensible packaging format (such as MIME Multipart/Related, see [RFC 2387]) and then re-encoding selected portions of its content alongside it, while marking their locations in the XML with a special element that links to the packaged data using URIs.

Optimization in XOP is limited to the content of those elements which contain characters that can be interpreted as the canonical lexical representation of the XML Schema base64Binary datatype (see [XML Schema Part 2] 3.2.16 base64Binary and Errata in XML Schema, E2-54). Attributes, non-base64-compatible character data, and data not in the canonical representation of the base64Binary datatype cannot be successfully optimized by XOP.

Editorial note: HR 
Track any new edition of the XML Schema specification that incorporates the errata, so that the double reference can be replaced by a single one.
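
The restriction to canonical base64Binary content can be illustrated with the following non-normative Python sketch. It assumes that the canonical lexical form is exactly what a standard base64 encoder emits, with no whitespace or line breaks, which is only one reading of E2-54. The function name is_canonical_base64 is hypothetical and does not appear in this specification.

```python
import base64
import binascii


def is_canonical_base64(text: str) -> bool:
    """Return True if text round-trips unchanged through a base64 decoder and encoder."""
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # including whitespace and line breaks.
        octets = base64.b64decode(text.encode("ascii"), validate=True)
    except (binascii.Error, UnicodeEncodeError):
        return False
    # Re-encoding catches forms that decode successfully but are not
    # what an encoder would have produced (e.g. stray padding bits).
    return base64.b64encode(octets).decode("ascii") == text
```

Content for which such a check fails would simply be left in the XML unoptimized.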

This specification uses terminology from the XML Infoset when discussing XML content and structure. This is only a convention for the clear specification of XOP's behaviour. When doing so, this specification abbreviates some Information Item types for clarity; for example Element, when used in this specification, refers to an Element Information Item, and Attribute refers to an Attribute Information Item.

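The remainder of this specification is organized as follows: section 2 describes XOP Packages and their use of MIME Multipart/Related; section 3 defines the XOP Infoset constructs; section 4 specifies the processing model for creating and interpreting XOP Packages; sections 5 and 6 are reserved for identifying XOP documents and for security considerations.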

1.1 Terminology

The following terms are used in this specification:

  • Original XML Infoset - An XML Infoset to be optimized.
  • Extracted Content - Optimized content which has been removed from the Infoset.
  • XOP Infoset - The Original Infoset with any Extracted Content removed and replaced by xop:Include elements.
  • XOP Document - A serialization of the XOP Infoset using XML 1.0.
  • XOP Package - A package containing the XOP Document and any Extracted Content. As a whole, the XOP Package is an alternate serialization of the Original Infoset.
  • Reconstituted XML Infoset - An XML Infoset that has been constructed from the parts of a XOP Package.

Figure 1: Architecture of the XOP framework

1.2 Example

Example 1 shows an XML Infoset prior to XOP processing. Example 2 shows the same Infoset, serialized using the XOP format in a MIME Multipart/Related package. The base64-encoded content of each of the m:photo and m:sig elements has been replaced by a xop:Include element, while the binary octets have been serialized in separate MIME parts.

Example 1: XML Infoset prior to XOP processing
<soap:Envelope
    xmlns:soap='http://www.w3.org/2003/05/soap-envelope' 
    xmlns:xop='http://www.w3.org/2003/12/xop/include' 
    xmlns:xop-mime='http://www.w3.org/2003/12/xop/mime'>
  <soap:Body>
    <m:data xmlns:m='http://example.org/stuff'>
      <m:photo xop-mime:content-type='image/png'>
        /aWKKapGGyQ=
      </m:photo>
      <m:sig xop-mime:content-type='application/pkcs7-signature'>
        Faa7vROi2VQ=
      </m:sig>
    </m:data>
  </soap:Body>
</soap:Envelope>
Example 2: XML Infoset serialized as a XOP package
MIME-Version: 1.0
Content-Type: Multipart/Related;boundary=MIME_boundary;
	      type=text/xml;start="<mymessage.xml@example.org>"
Content-Description: An XML document with my picture and signature in it

--MIME_boundary
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-ID: <mymessage.xml@example.org>

<soap:Envelope
    xmlns:soap='http://www.w3.org/2003/05/soap-envelope'
    xmlns:xop='http://www.w3.org/2003/12/xop/include'
    xmlns:xop-mime='http://www.w3.org/2003/12/xop/mime'>
  <soap:Body>
    <m:data xmlns:m='http://example.org/stuff'>
      <m:photo xop-mime:content-type='image/png'>
        <xop:Include href='cid:http://example.org/me.png'/>
      </m:photo>
      <m:sig xop-mime:content-type='application/pkcs7-signature'>
        <xop:Include href='cid:http://example.org/my.hsh'/>
      </m:sig>
    </m:data>
  </soap:Body>
</soap:Envelope>

--MIME_boundary
Content-Type: image/png
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/me.png>

// binary octets for png

--MIME_boundary
Content-Type: application/pkcs7-signature
Content-Transfer-Encoding: binary
Content-ID: <http://example.org/my.hsh>

// binary octets for signature

--MIME_boundary--

1.3 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

This specification uses a number of namespace prefixes throughout; they are listed below. Note that the choice of any namespace prefix is arbitrary and not semantically significant.

Table 1: Prefixes and Namespaces used in this specification.
Prefix | Namespace | Notes
xop | "http://www.w3.org/2003/12/xop/include" | A normative XML Schema [XML Schema Part 1], [XML Schema Part 2] document for the "http://www.w3.org/2003/12/xop/include" namespace can be found at http://www.w3.org/2000/xp/Group/3/06/Attachments/include.xsd.
xop-mime | "http://www.w3.org/2003/12/xop/mime" | [TBD]
xs | "http://www.w3.org/2001/XMLSchema" | The namespace of XML Schema data types [XML Schema Part 2].

2. XOP Packages

XOP is capable of using a variety of underlying packaging mechanisms. This section specifies how a particular packaging mechanism, MIME Multipart/Related, is used, but does not preclude the use of other packaging mechanisms with the XOP convention.

2.1 MIME Multipart/Related XOP Packages

This section describes how MIME Multipart/Related packaging (as specified in [RFC 2387]) is used with XOP.

The root MIME part is the root part of the package, and MUST be an XML 1.0 serialization [XML 1.0] of the XOP Infoset, as defined below, and MUST be identified with the [ TBD ] media type.

Editorial note: HR 
Need to define the media type.

Except for purposes of determining the root MIME part, as specified by [RFC 2387], ordering of MIME parts MUST NOT be considered significant to XOP processing or to the construction of the XOP Infoset.

Part metadata is reflected in MIME header fields. Specifically, if the URI used in the value of a xop:Include element's href attribute has a 'cid' scheme, the corresponding MIME part's Content-ID header field MUST have a corresponding field-value. Otherwise, the MIME part's Content-Location header field MUST have a field-value identical to the URI in the value of the href attribute.

Furthermore, if a xop-mime:content-type Attribute is found on the Element being optimized (as described in 4. XOP's Processing Model), its value SHOULD be reflected in the field-value of the corresponding MIME part's Content-Type header field.
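
The mapping from an href value to the identifying MIME header field can be sketched as follows (non-normative Python). The function name header_for_href is hypothetical. The sketch follows the convention used in Example 2, where the text after "cid:" is placed in angle brackets, and it ignores any percent-escaping of the cid URI that a real implementation would need to handle.

```python
def header_for_href(href: str) -> tuple[str, str]:
    """Map a xop:Include href value to the MIME header field that identifies its part."""
    scheme, sep, rest = href.partition(":")
    if sep and scheme.lower() == "cid":
        # Content-ID values are written in angle brackets, as in Example 2.
        return ("Content-ID", f"<{rest}>")
    # Any other URI must appear verbatim in Content-Location.
    return ("Content-Location", href)
```

For instance, the href 'cid:http://example.org/me.png' from Example 2 maps to a Content-ID header field with the field-value <http://example.org/me.png>.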

3. XOP Infoset Constructs

XOP operates by transforming the supplied Original Infoset into a more compact XML representation, which is achieved by removing the Character children of Elements to be optimized and replacing them with an Element named xop:Include. The xop:Include Element contains an Attribute with a link to the structure that is created to carry a binary representation of the data removed from the original Element. Details of the construction and processing of XOP serializations are provided in 4. XOP's Processing Model.

The Infoset used as input to XOP processing MUST NOT contain any Element with a namespace name of http://www.w3.org/2003/12/xop/include and a localname of Include. Infosets containing such Elements cannot be serialized using XOP.

The following subsections provide formal definitions for the Elements and Attributes used to construct a XOP serialization.

3.1 xop:Include Element Information Item

The xop:Include Element property values are as follows:

  • namespace name MUST be http://www.w3.org/2003/12/xop/include.
  • localname MUST be Include.
  • children MUST NOT contain any Information Item.
  • attributes MAY comprise more than one Attribute. Among these MUST be the href Attribute (see 3.2 href Attribute Information Item).
  • Other properties such as base-uri, parent and in-scope namespaces MUST be set according to the context.
Editorial note: Gudge 
Should not allow other children either.

3.2 href Attribute Information Item

The href Attribute has the following Infoset property values:

  • namespace name MUST be empty.
  • localname MUST be href.
  • normalized value MUST be a representation of a URI referencing the part of the package containing the data logically included by the parent Element (i.e., the xop:Include Element).
  • owner element MUST be the xop:Include Element which is parent of the Attribute.
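
As a non-normative illustration of the property values given in 3.1 and 3.2, the following Python sketch builds a xop:Include Element with the standard xml.etree.ElementTree module. ElementTree is only an approximation of an Infoset API, and the function name make_include is hypothetical.

```python
import xml.etree.ElementTree as ET

XOP_NS = "http://www.w3.org/2003/12/xop/include"


def make_include(href: str) -> ET.Element:
    """Build an empty xop:Include element whose href points at a package part."""
    # ElementTree spells qualified names as "{namespace}localname".
    include = ET.Element(f"{{{XOP_NS}}}Include")
    # An unqualified attribute name gives an empty namespace name.
    include.set("href", href)
    return include


# Registering the prefix only affects serialization; the prefix itself
# is not semantically significant.
ET.register_namespace("xop", XOP_NS)
print(ET.tostring(make_include("cid:http://example.org/me.png"), encoding="unicode"))
```

The resulting element has no children and a single href Attribute, matching the constraints listed above.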

3.3 xop-mime:content-type Attribute Information Item

The xop-mime:content-type Attribute has the following Infoset property values:

  • namespace name MUST be http://www.w3.org/2003/12/xop/mime.
  • localname MUST be content-type.
  • normalized value MUST be the content-type of the binary data represented as base64 encoded data in the Element parent of this Attribute.
  • owner element MUST be set according to the context.
Editorial note: HR 
Write the corresponding schema.

4. XOP's Processing Model

This section describes XOP's Processing Model, both for creating XOP Packages and interpreting XOP Packages. Unless otherwise stated, processing of XOP Packages MUST be semantically equivalent to performing the specified steps separately, and in the order given.

4.1 Creating XOP Packages

To create a XOP Package from an Original XML Infoset:

  1. Ensure that the Original Infoset contains no Element with a namespace name of http://www.w3.org/2003/12/xop/include and a localname of Include. As discussed in 3. XOP Infoset Constructs, Infosets with such Elements cannot be represented using XOP.
  2. Create an empty package.
  3. Identify within the Original Infoset the Elements to be optimized. The children of each such Element MUST consist only of characters representing the canonical lexical form of base64Binary as described in Errata in XML Schema, E2-54.
  4. Create a XOP Infoset which is a copy of the Original Infoset, but with the children of each Element identified in the previous step replaced by a xop:Include Element (see 3.1 xop:Include Element Information Item) constructed as follows:
    1. Transform the replaced characters into binary data by processing them as base64-encoded data.
    2. Serialize the binary data into a new part of the package, with appropriate metadata corresponding to the normalized value of the href Attribute of the xop:Include Element (see 3.2 href Attribute Information Item).
    3. If the Information Item being optimized (i.e., the parent of the newly inserted xop:Include Element) has a xop-mime:content-type Attribute, its value SHOULD be reflected appropriately in the part's metadata.
  5. Serialize the resulting XOP Infoset into the package as XML 1.0 and identify it as the root part according to the packaging mechanism's convention.

Additional parts MAY be added to the package to satisfy application specific requirements. Other content-specific metadata MAY be reflected in the packaging metadata as appropriate.

If content cannot be successfully encoded into the XOP Infoset, implementations SHOULD behave as if that portion of the Original Infoset was not nominated for optimization.
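
The steps above can be sketched in Python as follows. This is a non-normative illustration under several assumptions that are not part of this specification: the package is modelled as a plain dictionary rather than a MIME Multipart/Related entity, the Elements to be optimized are nominated by qualified name, the Content-ID naming scheme is invented for the example, and xml.etree.ElementTree stands in for an Infoset API. The name create_xop_package is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from copy import deepcopy

XOP_NS = "http://www.w3.org/2003/12/xop/include"
MIME_NS = "http://www.w3.org/2003/12/xop/mime"


def create_xop_package(original: ET.Element, to_optimize: set[str]):
    """Return (XOP Document bytes, parts) for the given Original Infoset.

    to_optimize holds qualified names ("{namespace}localname") of nominated
    elements; parts maps a Content-ID to a (content type or None, octets) pair.
    """
    include_tag = f"{{{XOP_NS}}}Include"
    # Step 1: the input must not already contain xop:Include.
    if next(original.iter(include_tag), None) is not None:
        raise ValueError("Infoset contains xop:Include and cannot be serialized using XOP")
    parts = {}                     # Step 2: the (initially empty) package
    xop_root = deepcopy(original)  # Step 4: the XOP Infoset starts as a copy
    for element in list(xop_root.iter()):
        # Step 3: only nominated elements with character-only content qualify.
        if element.tag not in to_optimize or len(element) or not element.text:
            continue
        try:
            octets = base64.b64decode(element.text, validate=True)  # Step 4.1
        except ValueError:
            continue  # not base64: treat as not nominated
        if base64.b64encode(octets).decode("ascii") != element.text:
            continue  # not in canonical form: treat as not nominated
        content_id = f"part{len(parts) + 1}@example.org"  # hypothetical naming scheme
        content_type = element.get(f"{{{MIME_NS}}}content-type")
        parts[content_id] = (content_type, octets)        # Steps 4.2 and 4.3
        element.text = None
        ET.SubElement(element, include_tag, href=f"cid:{content_id}")
    # Step 5: serialize the XOP Infoset; the caller stores it as the root part.
    return ET.tostring(xop_root, encoding="utf-8"), parts
```

Content that fails to decode, or that is not in canonical form, is left in place, which corresponds to behaving as if that portion had not been nominated for optimization.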

4.2 Interpreting XOP Packages

To create a Reconstituted Infoset from a XOP Package:

  1. Parse the root part of the package as an XML 1.0 document to construct an XML Infoset (see [XML InfoSet]).
  2. Using that Infoset, for each Element which has as its children a xop:Include Element (as defined in 3.1 xop:Include Element Information Item):
    1. Locate the part of the package corresponding to the URI in the xop:Include's href Attribute (i.e., corresponding to the URI encoded in the Attribute's normalized value).
    2. Replace the Element's children with characters containing the canonical base64 encoding of the entity body of the identified package part (i.e., effectively replace the xop:Include Element with the data reconstructed from the package part).
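
The interpretation steps can likewise be sketched in Python (non-normative). The sketch assumes the package parts have already been extracted into a dictionary keyed by the href value exactly as written in the XOP Document. It treats an Element as a match when its only element child is xop:Include, discarding any surrounding whitespace, which is an interpretation rather than something this specification states. The name reconstitute is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

XOP_NS = "http://www.w3.org/2003/12/xop/include"


def reconstitute(xop_document: bytes, parts: dict[str, bytes]) -> ET.Element:
    """Rebuild an Infoset from a XOP Document and the entity bodies of its parts."""
    include_tag = f"{{{XOP_NS}}}Include"
    root = ET.fromstring(xop_document)           # Step 1: parse the root part
    for element in list(root.iter()):            # Step 2: visit every Element
        children = list(element)
        if len(children) != 1 or children[0].tag != include_tag:
            continue
        # Step 2.1: locate the part; a missing part raises KeyError.
        octets = parts[children[0].get("href")]
        # Step 2.2: replace the children with the base64 characters.
        element.remove(children[0])
        element.text = base64.b64encode(octets).decode("ascii")
    return root
```

Because a standard encoder is used in step 2.2, the replacement characters are in the same form that was required of the content when the package was created.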

5. Identifying XOP Documents

[TBD]

6. Security Considerations

[TBD]

A. References

[XML 1.0]
W3C Recommendation "Extensible Markup Language (XML) 1.0 (Second Edition)", Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, 6 October 2000. (See http://www.w3.org/TR/2000/REC-xml-20001006.)
[Namespaces in XML]
W3C Recommendation "Namespaces in XML", Tim Bray, Dave Hollander, Andrew Layman, 14 January 1999. (See http://www.w3.org/TR/1999/REC-xml-names-19990114/.)
[XML InfoSet]
W3C Recommendation "XML Information Set", John Cowan, Richard Tobin, 24 October 2001. (See http://www.w3.org/TR/2001/REC-xml-infoset-20011024/.)
[XML Schema Part 1]
W3C Recommendation "XML Schema Part 1: Structures", Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn, 2 May 2001. (See http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/.)
[XML Schema Part 2]
W3C Recommendation "XML Schema Part 2: Datatypes", Paul V. Biron, Ashok Malhotra, 2 May 2001. (See http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/.)
[XML Schema Part 2 Errata]
W3C Internal Working Draft 7 March 2003 Id: datatypes-with-errata.xml,v 1.5 2003/03/07 19:54:00 (See http://www.w3.org/XML/Group/2002/09/xmlschema-2/datatypes-with-errata.html.)
[RFC 2119]
IETF "RFC 2119: Keywords for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
[RFC 2387]
IETF "The MIME Multipart/Related Content-type", E. Levinson, August 1998. (See http://www.ietf.org/rfc/rfc2387.txt.)

B. Change Log (Non-Normative)

Table 2: Changes since first draft.
Who When What
HR 20040224 Changed children property of xop:Include to be empty.
HR 20040129 Changed include.xsd location.
HR 20040129 Removed starting "Note" from second paragraph of 3. XOP Data Models Constructs.
HR 20040129 Removed Ednote in 1.1.
HR 20040127 Added examples.
HR 20040127 Added request for comments on xmlp-comments@w3.org.
HR 20040126 Misc editorial changes.
HR 20040126 Corrected usage of Data Model terms.
HR 20040123 Implemented Noah's proposed changes.
HR 20040122 Changed MIME/Multipart to MIME Multipart/Related in accordance with RFC2387.
HR 20040121 Converted from html to xml.