- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 10 Oct 2003 10:36:01 -0400
- To: rsalz@datapower.com
- Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, "xml-dist-app@w3.org" <xml-dist-app@w3.org>
Sorry, but I think there's been some confusion here. The current discussion bears no immediate relation to XML c14n, DSig, etc. It's actually more fundamental to any use of SOAP with MTOM, independent of whether XML DSig or the associated c14n Recs are to be used. In brief: we've been referring to canonical forms of schema datatypes, as defined in the datatypes recommendation, as opposed to the term "canonical" as introduced by the c14n Recs that are used in conjunction with DSig. The following explains in more detail.

The trick in MTOM is basically to say that for data known to be in a lexical form corresponding to xsd:base64Binary, sending the value (in the sense of the XML Schema value space) is sufficient to reconstruct the lexical form. This would be like saying for integers that you can reconstruct the three-character sequence '1' '2' '3' by sending the value that in Java would be int i = 123. The point is that, in the case of integers, that's true only if you know that the integer has no leading zero (or that it invariably has one leading zero, or whatever). In short, the trick works only if the lexical and value forms are exactly 1-to-1.

The problem is that the lexical forms for base64Binary, as proposed in the schema erratum, allow for variability in whitespace. So, if you just send the 'value', you can't be sure whether the original characters had whitespace embedded, as the same value corresponds to more than one lexical form.

The rules of the SOAP Recommendation apply before you even consider use of XML c14n and/or DSig: they state that any legal SOAP binding must faithfully transmit the infoset, which means leading zeros if present for integers, whitespace in base64Binary, etc. Indeed, the Infoset, and thus SOAP envelopes, are not type aware: at the level of SOAP envelopes there is no such thing as an integer, just character sequences.
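To make the 1-to-1 point concrete, here is a minimal Python sketch (illustrative only, not part of MTOM or any W3C specification; the helper names are hypothetical). Several integer lexical forms map to the single value 123, and several whitespace variants of a base64Binary string map to the same octet sequence. A receiver holding only the value can rebuild just one of those strings.

```python
import base64

# Integers: several lexical forms, one value.
int_lexicals = ["123", "0123", "+123"]
int_values = {int(s) for s in int_lexicals}  # all collapse to 123

# XML whitespace characters that the base64Binary lexical space tolerates.
XML_WS = " \t\r\n"

def base64_value(lexical: str) -> bytes:
    """Hypothetical helper: map a base64Binary lexical form to its value
    (an octet sequence), discarding any embedded XML whitespace."""
    compact = "".join(ch for ch in lexical if ch not in XML_WS)
    return base64.b64decode(compact, validate=True)

# base64Binary: whitespace variants, one value.
b64_lexicals = ["SGVsbG8=", "SGVs bG8=", "SGVs\nbG8="]
b64_values = {base64_value(s) for s in b64_lexicals}  # all collapse to b"Hello"

def reconstruct(value: bytes) -> str:
    """What a receiver given only the octets can produce: one fixed string."""
    return base64.b64encode(value).decode("ascii")

for s in b64_lexicals:
    print(repr(s), "round-trips:", reconstruct(base64_value(s)) == s)
```

Only the first, whitespace-free string survives the trip through the value space; the other two come back as "SGVsbG8=", which would not be a faithful transmission of the original infoset.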
I therefore believe that the MTOM "trick" can be applied only to one lexical form for each base64Binary value, and I have suggested that it be the form called out as "canonical" in the erratum to the schema datatypes specification.

This is a different business from the particular c14n Recs that have been built to aid DSig, I think. While it would be plausible to invent new ones that were datatype-aware and that, for example, stripped leading zeros on integers and put base64Binary in canonical form, I don't believe the current c14n Rec does that. Whether it should is a separate discussion, and not something on which I (or anyone else in this discussion, as far as I can tell) have offered a recommendation.

FWIW, I think we should always tread slowly when considering making XML type aware. MTOM does it purely for purposes of optimization. Query and Schema do it for reasons that I think are important (e.g. so I can talk about all the age attributes that have a value > 50; you presumably want to do such comparisons numerically). SOAP has carefully stayed away from anything that normatively depends on schema validation, and even the encodings in SOAP 1.2 only assign type names, not value spaces and semantics. The only reason I can see for doing type-aware c14n for DSig is if it proves valuable for user applications, or perhaps in conjunction with XML Query.

Certainly nothing in this discussion was meant to relate directly to the c14n Rec or to DSig. It's merely been to decide which lexical forms are subject to MTOM optimization.

Thanks!
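The proposed rule (optimize only the one canonical lexical form per value) amounts to a simple check: a string qualifies only if re-encoding its decoded value reproduces it exactly. Below is a minimal Python sketch under one stated assumption: that the erratum's canonical form is standard base64 with '=' padding and no embedded whitespace. Verify that against the erratum text; `canonical_base64` and `mtom_optimizable` are hypothetical names, not an MTOM API.

```python
import base64
import binascii

# XML whitespace characters permitted inside base64Binary lexical forms.
XML_WS = " \t\r\n"

def canonical_base64(value: bytes) -> str:
    # ASSUMPTION: canonical form = standard base64 alphabet, '=' padding,
    # no whitespace of any kind.
    return base64.b64encode(value).decode("ascii")

def mtom_optimizable(lexical: str) -> bool:
    """True only if `lexical` is exactly the canonical form of its own value,
    i.e. a receiver given just the octets could rebuild the same characters."""
    compact = "".join(ch for ch in lexical if ch not in XML_WS)
    try:
        value = base64.b64decode(compact, validate=True)
    except binascii.Error:
        return False  # not valid base64 at all, so never optimizable
    return canonical_base64(value) == lexical
```

Anything that fails the check would simply be sent as ordinary character content, so the infoset is preserved either way.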
------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                              Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Rich Salz <rsalz@datapower.com>
10/10/03 10:00 AM
To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
cc: Noah Mendelsohn/Cambridge/IBM@Lotus, "xml-dist-app@w3.org" <xml-dist-app@w3.org>
Subject: Re: New XMLP Issue Relating to Canonical Forms

> XML canonicalization does not perform Unicode normalization on text,

No, but it will add whitespace (a newline) if there are PI or comment nodes before or after the first element node.

/r$
--
Rich Salz                   Chief Security Architect
DataPower Technology        http://www.datapower.com
XS40 XML Security Gateway   http://www.datapower.com/products/xs40.html
XML Security Overview       http://www.datapower.com/xmldev/xmlsecurity.html
Received on Friday, 10 October 2003 10:40:52 UTC