- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 10 Oct 2003 10:36:01 -0400
- To: rsalz@datapower.com
- Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, "xml-dist-app@w3.org" <xml-dist-app@w3.org>
Sorry, but I think there's been some confusion here. The current
discussion bears no immediate relation to XML c14n, DSig, etc. It is
more fundamental than that: it concerns any use of SOAP with MTOM,
whether or not XML DSig or the associated c14n Recs are used. In brief:
we've been referring to canonical forms of schema datatypes, as defined in
the Datatypes Recommendation, as opposed to "canonical" in the sense
introduced by the c14n Recs that are used in conjunction with DSig.
Details follow.
The trick in MTOM is basically to say that, for data known to be in a
lexical form corresponding to xsd:base64Binary, sending the value (in the
XML Schema value-space sense) is sufficient to reconstruct the lexical
form. For integers, this would be like saying that you can reconstruct the
three-character sequence '1' '2' '3' by sending the value that in Java
would be int i = 123. The point is that, for integers, this holds only if
you know that the integer has no leading zero (or that it invariably has
exactly one leading zero, or whatever). In short, the trick works only if
the lexical and value forms correspond exactly one-to-one.
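To make the integer analogy concrete, here is a minimal Python sketch. The function name `roundtrip` and the sample strings are illustrative only; `int()` stands in for the mapping from lexical space to value space, and `str()` picks one lexical form on the way back.

```python
def roundtrip(lexical: str) -> str:
    """Send only the value, then re-serialize it on the receiving side."""
    value = int(lexical)   # lexical space -> value space
    return str(value)      # value space -> the one form str() happens to pick

# Several strings in the xsd:integer lexical space all denote the value 123.
forms = ["123", "0123", "+123", "00123"]
for s in forms:
    status = "preserved" if roundtrip(s) == s else "lost"
    print(f"{s!r} -> {roundtrip(s)!r} ({status})")
```

Only '123' survives the round trip; the other three arrive as '123', which is exactly the information loss described above.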
The problem is that the lexical forms for base64Binary, as proposed in the
schema erratum, allow for variability in whitespace. So, if you just send
the 'value', you can't be sure whether the original characters had
whitespace embedded, as the same value corresponds to more than one
lexical form.
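The same effect for base64Binary, sketched with Python's standard `base64` module. The three strings are made-up lexical forms that differ only in embedded whitespace; whether each exactly matches the erratum's grammar is an assumption here, but all of them decode to the same octets.

```python
import base64

# Hypothetical lexical forms of one value, differing only in whitespace.
forms = [
    "SGVsbG8sIFdvcmxkIQ==",
    "SGVs bG8s IFdv cmxk IQ==",
    "SGVsbG8sIFdv\ncmxkIQ==",
]

# With validate=False (the default), b64decode discards characters outside
# the base64 alphabet, including whitespace, before decoding.
values = {base64.b64decode(f) for f in forms}
print(values)  # {b'Hello, World!'}

# Going back from the value yields exactly one of the three strings.
print(base64.b64encode(b"Hello, World!").decode("ascii"))  # SGVsbG8sIFdvcmxkIQ==
```

A receiver holding only the octets has no way to tell which of the three strings was originally sent.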
The rules of the SOAP Recommendation apply before you even consider use of
XML c14n and/or DSig: they state that any legal SOAP binding must
faithfully transmit the infoset, which means preserving leading zeros on
integers, whitespace in base64Binary, etc., wherever they are present.
Indeed, the Infoset, and thus SOAP envelopes, are not type-aware: at the
level of SOAP envelopes there is no such thing as an integer, just
character sequences. I therefore believe that the MTOM "trick" can be
applied to only one lexical form for each base64Binary value, and I have
suggested that it be the form called out as "canonical" in the erratum
to the Schema Datatypes specification.
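The proposed rule (optimize only the canonical lexical form) can be sketched as a round-trip check: decode strictly, re-encode, and compare the characters. This is a minimal Python illustration, not anything MTOM itself defines; the function name is hypothetical, and it assumes the erratum's canonical form is the padded, whitespace-free encoding that `base64.b64encode` emits.

```python
import base64

def is_canonical_base64(lexical: str) -> bool:
    """True if `lexical` is exactly the string that re-encoding its value yields.

    Assumption: the canonical form is the padded, whitespace-free encoding
    produced by base64.b64encode (standard alphabet, no line breaks).
    """
    try:
        # validate=True rejects whitespace and any other non-alphabet character.
        value = base64.b64decode(lexical, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return base64.b64encode(value).decode("ascii") == lexical

# Hypothetical MTOM-style policy: replace the text with raw octets only when
# the receiver can regenerate the identical characters from the value alone.
for text in ["SGVsbG8sIFdvcmxkIQ==", "SGVs bG8s IFdv cmxk IQ==", "SGVsbG8"]:
    action = "optimize" if is_canonical_base64(text) else "send as text"
    print(f"{text!r}: {action}")
```

Comparing the re-encoded string, rather than just checking for whitespace, keeps the test tied to the real requirement: the receiver must be able to regenerate the identical character sequence.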
This is a separate matter from the particular c14n Recs that have been
built to aid DSig, I think. While it would be plausible to invent new
ones that were datatype-aware and that, for example, stripped leading
zeros on integers and put base64Binary in canonical form, I don't believe
the current c14n Rec does that. Whether it should is a separate
discussion, and not something on which I (or, as far as I can tell,
anyone else in this discussion) have offered a recommendation.

FWIW, I think we should always tread slowly when considering making XML
type-aware. MTOM does it purely for purposes of optimization. XML Query
and XML Schema do it for reasons that I think are important (e.g., so I
can talk about all the age attributes that have a value > 50; you
presumably want to do such comparisons numerically). SOAP has carefully
stayed away from anything that normatively depends on schema validation,
and even the encodings in SOAP 1.2 assign only type names, not value
spaces and semantics. The only reason I can see for doing type-aware
c14n for DSig is if it proves valuable for user applications, or perhaps
in conjunction with XML Query. Certainly nothing in this discussion was
meant to relate directly to the c14n Rec or to DSig; it has merely been
about deciding which lexical forms are subject to MTOM optimization.
Thanks!
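The age example shows why type awareness matters for queries: comparing the raw character sequences (all an untyped, Infoset-level processor has) gives different answers than comparing integer values. A minimal Python sketch with made-up attribute values:

```python
# Hypothetical age attribute values, as character sequences in a document.
ages = ["7", "42", "050", "51", "100"]

# Character-by-character comparison, as an untyped processor would see it.
as_strings = [a for a in ages if a > "50"]

# Comparison in the xsd:integer value space, as a query user intends.
as_numbers = [a for a in ages if int(a) > 50]

print(as_strings)  # ['7', '51']
print(as_numbers)  # ['51', '100']
```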
------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
IBM Corporation Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Rich Salz <rsalz@datapower.com>
10/10/03 10:00 AM
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
cc: Noah Mendelsohn/Cambridge/IBM@Lotus, "xml-dist-app@w3.org"
<xml-dist-app@w3.org>
Subject: Re: New XMLP Issue Relating to Canonical Forms
> XML canonicalization does not perform Unicode normalization on text,
No, but it will add whitespace (a newline) if there are PI or comment
nodes before or after the first element node.
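The rule referred to here can be sketched as follows. This is a hand-rolled Python illustration of how Canonical XML 1.0 treats PI and comment children of the root node, not a real canonicalizer: the `(kind, text)` node model and the function name are hypothetical, each node's text is assumed to be serialized already, and the with-comments variant is assumed.

```python
def render_root_children(children):
    """children: list of (kind, text) pairs, kind in {"comment", "pi", "element"}.

    A PI or comment before the document element gets a trailing newline (#xA);
    one after the document element gets a leading newline.
    """
    out = []
    seen_element = False
    for kind, text in children:
        if kind == "element":
            out.append(text)
            seen_element = True
        elif seen_element:
            out.append("\n" + text)   # follows the document element
        else:
            out.append(text + "\n")   # precedes the document element
    return "".join(out)

doc = [("comment", "<!--a-->"), ("element", "<doc></doc>"), ("pi", "<?p x?>")]
print(repr(render_root_children(doc)))  # '<!--a-->\n<doc></doc>\n<?p x?>'
```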
/r$
--
Rich Salz Chief Security Architect
DataPower Technology http://www.datapower.com
XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html
Received on Friday, 10 October 2003 10:40:52 UTC