MTOM: Whitespace handling and canonical forms for base64Binary

As some of you know, I have a "to do" from the data model task force to
attempt a redraft based on the query data model.  While working through
that I noticed an issue that I believe relates to all MTOM formulations.

Specifically, the Schema Recommendation part 2 [1] defers to RFC 2045 [2]
for the definition of base64Binary.  That in turn provides latitude in the
use of whitespace in base64 serializations, implying that there are
multiple lexical representations for the type (i.e. differing in embedded
whitespace).  Furthermore, the Recommendation compounds the complication by
failing to call out that fact, or to define a preferred canonical
representation.

The schema WG has acknowledged these issues and has prepared an erratum, a
draft of which is at [3].

I believe we need to open an MTOM issue, as transmission of the "binary"
form of a base64Binary info item is not in general sufficient to
reconstruct its lexical form.  Digital signatures computed over the
original lexical form could then fail to verify, among other problems.
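
To make the concern concrete, here is a minimal Python sketch (not part of
any spec text).  The payload, the 76-character wrap, and the use of SHA-256
as a stand-in for a signature digest are all assumptions chosen purely for
illustration.  It shows two lexical forms that decode to the same octets
but hash differently:

```python
import base64
import hashlib

# Arbitrary example octets (hypothetical payload).
payload = bytes(range(100))

# Two lexical forms of the same value: one with no whitespace,
# one with a line break inserted after every 76 characters.
compact = base64.b64encode(payload).decode("ascii")
wrapped = "\n".join(compact[i:i + 76] for i in range(0, len(compact), 76))

# Both forms decode to the same octets; the default (non-validating)
# decoder discards the embedded newline.
decoded_compact = base64.b64decode(compact)
decoded_wrapped = base64.b64decode(wrapped)

# A digest over the characters, as a signature would see them, differs.
digest_compact = hashlib.sha256(compact.encode("ascii")).hexdigest()
digest_wrapped = hashlib.sha256(wrapped.encode("ascii")).hexdigest()

print("same octets: ", decoded_compact == decoded_wrapped == payload)
print("same digest: ", digest_compact == digest_wrapped)
```

A receiver holding only the octets cannot tell which of the two lexical
forms the sender started from.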

Tentatively, I propose a resolution along these lines:  not only must items
identified for optimized transmission in MTOM be in base64Binary, they MUST
be known to be in the canonical representation defined in the schema
erratum (which is lines of exactly 76 characters terminated by #xA, with
special case handling for possibly short last lines).  Alternatively, we
could mandate no whitespace, which is a bit more compact but non-canonical.
I suspect that sticking with canonical will maximize compatibility with
other tools, and is worth the roughly 1.3% overhead (one #xA per 76
characters).  Anyway, I think we have to do something.
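
As a rough illustration of the line-wrapped form and its cost, the Python
sketch below wraps base64 text at 76 characters and measures the added
characters.  The names wrap_base64 and newline_overhead are hypothetical,
and the choice to terminate the final (possibly short) line with #xA is an
assumption of this sketch; the erratum's actual special-case rule for the
last line should be checked against [3].

```python
import base64

LINE_LENGTH = 76  # characters per line, per the description above


def wrap_base64(data: bytes, line_length: int = LINE_LENGTH) -> str:
    """Encode data as base64 in lines of line_length characters.

    Assumption: every line, including a short last line, ends with "\n".
    """
    compact = base64.b64encode(data).decode("ascii")
    lines = [compact[i:i + line_length]
             for i in range(0, len(compact), line_length)]
    return "".join(line + "\n" for line in lines)


def newline_overhead(data: bytes) -> float:
    """Fraction of extra characters relative to the no-whitespace form."""
    compact_len = len(base64.b64encode(data))
    wrapped_len = len(wrap_base64(data))
    return (wrapped_len - compact_len) / compact_len


# 5700 octets encode to exactly 7600 characters, i.e. 100 full lines.
sample = bytes(5700)
print(f"overhead: {newline_overhead(sample):.2%}")
```

For inputs that fill whole lines the overhead is one character in 76; for
very short values the single line terminator weighs proportionally more.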

For the moment, I will include the resolution above in the DM version of
the MTOM draft, primarily as a placeholder.

Noah

[1] http://www.w3.org/TR/xmlschema-2/#base64Binary
[2] http://www.ietf.org/rfc/rfc2045.txt
[3] http://www.w3.org/XML/Group/2002/09/xmlschema-2/datatypes-with-errata.html#base64Binary


------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Thursday, 21 August 2003 20:30:36 UTC