- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 10 Oct 2003 12:01:25 -0400
- To: mgudgin@microsoft.com
- Cc: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>, rsalz@datapower.com, xml-dist-app@w3.org
Martin Gudgin writes:

> Isn't it the case that in MTOM, assuming you actually
> started with the binary ( which is the reality in most
> cases ), then there is no way to tell what, if any,
> whitespace was present in the base64 characters,
> because you didn't have them.

No, I don't think this is true. Even in the case of MTOM with a binary source, SOAP is Infoset and thus characters. I think we need to distinguish what you must know in principle from what you must burn cycles computing if nobody actually needs to see it. SOAP says you must have an Infoset. That means that, if asked, you must know what characters are in the Infoset, and a receiver must be capable of reproducing those characters.

For example, as I think you say in your note, an MTOM message might leave an MTOM sender and be signed using the already published exclusive c14n (which is type-unaware). In that case you will actually have to take the trouble to compute the characters, or at least to compute a signature that will be the same as if you had gone through the character form. The c14n result and the signature will differ according to whether you decide that the lexical form had whitespace between pieces of the binary representation or not. Furthermore, if that message goes through a second, non-MTOM hop, you will have to convert to actual characters for transmission in, e.g., the SOAP 1.2 HTTP binding. Presumably you will check the dsig based on the actual characters transmitted, so you had better specify precisely what they are.

For all these reasons, I believe we must always be able to say what the characters are in the Envelope infoset, even if the source was binary in the implementation. Then again, I completely agree that there are many interesting use cases in which these characters will not be explicitly computed. The simple case where a value starts in binary, is sent through MTOM, and is processed by a receiver using a binary API is one such example, and an important one.
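A minimal Python sketch of why the character form matters: the same binary value has more than one base64 lexical form, and anything computed over the characters (c14n, a signature) sees them as different. Two assumptions are made here. First, the canonical form is taken to be base64 with no whitespace at all, as in the XML Schema 1.0 Second Edition grammar. Second, a plain SHA-256 hash of the characters stands in for a type-unaware c14n plus signature. The helper names (decode_lexical, reconstruct_chars, is_canonical, digest) are hypothetical illustrations, not part of any MTOM or XML Signature API.

```python
import base64
import hashlib

# Characters XML treats as whitespace (space, tab, LF, CR).
XML_WHITESPACE = " \t\n\r"


def decode_lexical(chars: str) -> bytes:
    """Decode a base64Binary lexical form, ignoring XML whitespace."""
    stripped = "".join(c for c in chars if c not in XML_WHITESPACE)
    return base64.b64decode(stripped, validate=True)


def reconstruct_chars(data: bytes) -> str:
    """Characters a receiver or non-MTOM relay would regenerate from binary.

    Assumption: the canonical form is base64 with no whitespace at all.
    """
    return base64.b64encode(data).decode("ascii")


def is_canonical(chars: str) -> bool:
    """True if the characters survive a binary round trip unchanged."""
    return reconstruct_chars(decode_lexical(chars)) == chars


def digest(chars: str) -> str:
    """Hypothetical stand-in for a type-unaware signature: hash of the characters."""
    return hashlib.sha256(chars.encode("utf-8")).hexdigest()


payload = bytes(range(60))  # 60 bytes -> 80 base64 characters
canonical = reconstruct_chars(payload)
# MIME-style form: a newline after every 76 characters, plus a trailing newline.
wrapped = base64.encodebytes(payload).decode("ascii")

print(decode_lexical(canonical) == decode_lexical(wrapped) == payload)  # True
print(is_canonical(canonical))               # True
print(is_canonical(wrapped))                 # False
print(digest(canonical) == digest(wrapped))  # False
```

Both strings carry the same 60 bytes, yet their digests differ. A relay that holds only the binary can regenerate `canonical` but not `wrapped`, so a check like `is_canonical` is the kind of test a sender would need before optimizing an element.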
That suggests why we need to know the character form of optimized values. I think the use cases that prove there must be only one such form are, in some sense, the converse. Let's say that some sender, for whatever reason, does have an element containing characters, and has reason to know that those characters are not in what the Schema erratum calls canonical form, i.e. the whitespace is not where the canonical form says it should be. My claim is that you MUST NOT MTOM-encode such an element, because when it is received, or relayed through a non-MTOM binding, it will not be reconstructed correctly. That's the crucial use case, and I think it's important. I really don't want to ignore the SOAP Rec's fundamental requirement that bindings be capable of reconstructing the Infoset.

> IF we were using new C14N algorithms that were MTOM
> aware, we could dispense with the base64 chars
> altogether, although that would require the algorithm
> to emit a byte stream ( rather than an Xpath node set ).
> Alternatively we could define a transform that
> converts the base64 content of optimized elements into
> some known form.

Agreed, that deals with DSIG, and doing a type-aware (or at least MTOM-aware) c14n may be a good idea. I don't think, however, that it deals directly with the case where you are going through a second hop using a non-MTOM binding (it handles the signature, but nothing else). Nor do I think it eliminates the need to faithfully transmit non-canonical forms if the application has explicitly provided them. Those are the primary reasons I think MTOM has to be viewed as applying to the "canonical lexical representation" only.

Thanks!

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Friday, 10 October 2003 12:04:08 UTC