2003-11-28
This document specifies the MIFFY format, a means of more efficiently serializing XML infosets that contain certain types of content.
A MIFFY document is created by placing a serialization of the XML infoset inside an extensible packaging format and re-encoding selected portions of its content alongside it. The locations of those portions are marked in the XML with a special element that links to the packaged data using URIs.
Specifically, MIFFY optimizes those XML elements that have base64Binary encoded content, and does so by packaging them in the MIME Multipart/Related format.
The following terms are defined and used by this document:
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].
This specification uses a number of namespace prefixes throughout; they are listed below. Note that the choice of any namespace prefix is arbitrary and not semantically significant (see XML Infoset [XML Infoset]).
This specification refers to XML constructs using the XML Infoset [XML Infoset] terminology, as well as that defined by the XML Query Data Model [XML Query Data Model].
Infosets capture the tree structure, the names of the elements, the character content of elements and attributes, and so on. The Infoset does not model schema data types, such as integer, and thus provides no association between character strings and values.
The purpose of this format is to optimize certain XML-based structures by relying on type information that may be available at serialization time. MIFFY does not specify any particular means by which such type information is to be determined: schema validation is one possibility, but serializers MAY determine or establish types using other means. Type information need be provided only for element nodes that are to be optimized.
Unlike the Infoset, the XQuery 1.0 and XPath 2.0 Data Model ([XML Query Data Model] ... hereinafter referred to as the "data model") provides a model that carries type and value space information for each element and attribute. Accordingly, MIFFY is expressed in terms of that data model. A precondition for use of this format is therefore availability of a data model for the structure to be serialized. Details of the correspondence between Infosets and data models are provided in A. Mapping between Infosets and Data Models. The data model introduces accessors such as dm:string-value, dm:type and dm:typed-value, which are used in this specification.
Many applications of XML do not, in general, include schema type information. Accordingly, this specification does not require that dm:type or dm:typed-value be reconstructed. This format thus implements a "lossy" model, in which type information available to the serializer may be used for purposes of optimization, but need not in general be provided to the deserializer (except insofar as necessary to perform deserialization). The data models at both ends are therefore identical in overall structure, dm:string-value, and dm:children content, but not necessarily with respect to dm:type and dm:typed-value.
A MIFFY document is constructed as a MIME Multipart/Related package with an XML root part, which is a serialization of the Optimized Infoset. These constructs are specified in detail below.
MIFFY Documents MUST be valid MIME Multipart/Related documents, as specified by [rfc2387]. Ordering of MIME parts MUST NOT be considered significant to MIFFY processing or to the construction of the Target Infoset.
The root MIME part MUST be an XML 1.0 serialization [xml1.0] of the Optimized Infoset, and MUST be identified with the [ TBD ] media type.
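The packaging described above can be illustrated with a minimal sketch. It is not a normative encoder. The root media type is still TBD in this draft, so application/xml stands in for it; the xbinc namespace URI, the Content-ID values, the use of cid: URIs in href, and the build_package helper are all assumptions made for illustration.

```python
from typing import Mapping

CRLF = b"\r\n"

# Assumptions (not from the draft): placeholder media type and namespace URI.
ROOT_MEDIA_TYPE = "application/xml"      # stands in for the [ TBD ] media type
XBINC_NS = "http://example.org/xbinc"    # hypothetical namespace URI


def build_package(root_xml: bytes, binary_parts: Mapping[str, bytes],
                  boundary: str = "miffy-boundary") -> bytes:
    """Assemble a Multipart/Related entity: the XML root part, then binary parts.

    A real encoder must pick a boundary that does not occur in any part;
    this sketch does not check for collisions.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        b"MIME-Version: 1.0",
        (f'Content-Type: multipart/related; boundary="{boundary}"; '
         f'type="{ROOT_MEDIA_TYPE}"; start="<root@example.org>"').encode("ascii"),
        b"",
        delimiter,
        f"Content-Type: {ROOT_MEDIA_TYPE}".encode("ascii"),
        b"Content-ID: <root@example.org>",
        b"",
        root_xml,
    ]
    for content_id, data in binary_parts.items():
        lines += [
            delimiter,
            b"Content-Type: application/octet-stream",
            b"Content-Transfer-Encoding: binary",
            f"Content-ID: <{content_id}>".encode("ascii"),
            b"",
            data,  # raw octets, not base64 text
        ]
    lines.append(delimiter + b"--")
    return CRLF.join(lines) + CRLF


# Usage: one element whose base64 content has been replaced by an Include.
payload = bytes([0x89, 0x50, 0x4E, 0x47])
root = (f'<doc xmlns:xbinc="{XBINC_NS}"><photo>'
        f'<xbinc:Include href="cid:photo@example.org"/></photo></doc>'
        ).encode("utf-8")
package = build_package(root, {"photo@example.org": payload})
```

Because part ordering is not significant to MIFFY processing, a decoder in this sketch's model would locate parts by Content-ID rather than by position.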
The Optimized Infoset MAY contain any information item, and SHOULD contain xbinc:Include element information items. Information items other than those defined below MUST be ignored for the purposes of MIFFY processing.
The xbinc:Include element node accessor values are as follows:
The href attribute node has the following data model accessor values:
[ TBD ]
Unless otherwise stated, processing MUST be semantically equivalent to performing the specified steps separately, and in the order given.
To create a MIFFY Document from an XML Infoset:
To create an XML Infoset from a MIFFY document:
Optimization in MIFFY is limited to the content of those element information items which contain characters that can be interpreted as base64-encoded data. Attributes and non-base64-compatible character data cannot be successfully optimized by MIFFY.
Because optimization candidates are transformed to binary data and then re-encoded as canonical base64, care should be taken in selecting them. In particular, if the lexical form of the base64 data must be preserved (e.g., a whitespace-sensitive signature algorithm is being used), it is important to ensure either that the form in the Target Infoset is canonical or that such content is not selected as an optimization candidate.
If an optimization candidate cannot be successfully encoded into the Optimized Infoset, implementations SHOULD behave as if that portion of the Target Infoset were not identified as an optimization candidate.
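The loss of lexical form can be seen in a short sketch using Python's standard base64 module. The sample string is invented for illustration; it contains line breaks, so it is not in canonical form.

```python
import base64

# Non-canonical lexical form: line breaks inside the encoded data.
original = "SGVsbG8s\nIHdvcmxk\nIQ=="

# Serialization: the characters are decoded to the octets carried in a MIME part.
# With the default validate=False, b64decode discards the line breaks.
binary = base64.b64decode(original)

# Deserialization: the octets are re-encoded, yielding canonical base64.
restored = base64.b64encode(binary).decode("ascii")

print(binary)                 # b'Hello, world!'
print(restored)               # SGVsbG8sIHdvcmxkIQ==
print(restored == original)   # False: the line breaks are gone
```

The octets survive the round trip, but a digest or signature computed over the original character sequence would no longer match the restored one.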
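One way to realize this fallback is sketched below, assuming that "cannot be successfully encoded" includes content that fails to decode as base64. The try_optimize helper is hypothetical and is not defined by this specification.

```python
import base64
import binascii
from typing import Optional


def try_optimize(text: str) -> Optional[bytes]:
    """Return the decoded octets, or None if the content is not valid base64.

    None means: leave the element content inline, as if it had never been
    identified as an optimization candidate.
    """
    # Whitespace is removed first (an assumption of this sketch), so that
    # strict validation rejects only genuinely non-base64 characters.
    compact = "".join(text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None


print(try_optimize("SGVs bG8="))     # b'Hello'
print(try_optimize("not base64!"))   # None
```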
[ TBD, depending on media type feedback ]
[ TBD ]
This specification uses the XQuery 1.0 and XPath 2.0 Data Model to augment the information available in Infosets with typing information, which is used as the basis for optimization. This Appendix sets out in detail the correspondence between Infosets and data models, for purposes of implementation of this specification.
The [XML Query Data Model] provides a normative mapping from the Post Schema Validation Infoset to a data model. Except as specified here, that mapping is used to construct data models from infosets during serialization. The differences are as follows:
EDNOTE [NRM]: Should xdt:untypedAtomic be used for leaf nodes with only text content? Seems preferable to me, but for some reason the dm is looser.
The [XML Query Data Model] provides a normative mapping from a Data Model to an Infoset. That mapping is used to construct an infoset during deserialization. Note that this mapping makes use only of dm:string-value and text node dm:children: in no case is the dm:type or dm:typed-value used to construct the Infoset. Thus, this mapping enforces the goal of this feature, which is to use type information as a means of optimization, without affecting application semantics.