- From: <MARUYAMA@jp.ibm.com>
- Date: Thu, 15 Apr 1999 11:12:55 +0900
- To: w3c-xml-sig-ws@w3.org
Requiring well-formedness of canonical form is not only unnecessary, but in practice, it is very hard to define technically . The problem is the treatment of namespace prefix. Suppose there are two XML documents: Document 1: <root xmlns:edi='http://ecommerce.org/schema'> <edi:order> : </edi:order> </root> Document 2: <root xmlns:ec='http://ecommerce.org/schema'> <ec:order> : </ec:order> </root> If both the <edi:order> element in Document 1 and the <ec:order> element have the same contents, I think the canonicalized forms of these elements must be equal. However, simply substituting the namespace prefix ("edi:" or "ec:") by the expanded namespace (e.g., "http://ecommerce.org/schema") does not work because <http://ecommerce.org/schema:order> : </http://ecommerce.org/schema:order> is not well-formed any more. There is no simple way to work this around, because namespaces can be nested and simple renaming conventions will have a naming collision problem (the only way I can think of to take a hash of the namespace and encode it as a hexadecimal prefix, as in <537E92D39EA108FF193DA712910A53D9:order>, but I do not see any value in doing this). The problem is that namespace names can be anything and can contain illegal characters (such as colons) that cannot be used in namespace prefix. -- Hiroshi Maruyama Manager, Network Applications, Tokyo Research Laboratory +81-462-73-4576, maruyama@jp.ibm.com Also Associate Professor, Dept. of Computer Science, Tokyo Institute of Technology +81-3-5734-3953, maruyama@cs.titech.ac.jp From: "Joseph M. Reagle Jr. (W3C)" <reagle@w3.org> on 99/04/14 04:04 To: "Signed-XML Workshop" <w3c-xml-sig-ws@w3.org> cc: Paul Grosso <pgrosso@arbortext.com>, "Joel A. Nava" <jnava@adobe.com> (bcc: Hiroshi Maruyama/Japan/IBM) Subject: Canonical Form as non-XML [Note: I'm uncertain of the cross-posting and cc:'ing ettiquette, since w3c-xml-sig-ws@w3.org is a public list and syntax is not, so I just cc'd the authors of the original thread.] Interesting thread in the syntax-WG that I wanted to port over here. I've been advocating that one be able to hash multiple represenations of the data: the bits, some form of canonical XML, and additional representations (such as DOM, or RDF, or a graph notation language, etc.) But I always assumed the canonical XML would be XML. Obviously, it doesn't have to be. My first reaction as to why I would want XML canonical XML was fuzzily similar to the argument why we should do signed-XML at all -- instead of just using S/MIME. People want XML data to continue to available to all XML applications. For instance, somewhere in a document processing chain I might have an application that needs to access the data within an XML structure but it doesn't care about the signatures. Maybe the guy before or beyond him does, but there is no necessary reason to render the content opaque to a non-MIME savy XML application. So my first reaction was why make an application understand another format which a non-signature XML application might have problems with? But this is a half-thought. If an application will canocilize, it will have an input and output regardless. Additionally, I expect that you can never expect the XML on the wire to be canonical: the proof is equal to the work, so you might as well do the work. Consequently, I don't see a necessary reason as to why the canonical XML output must be XML. Comments? Forwarded Text ---- Date: Tue, 13 Apr 1999 11:44:15 -0500 To: w3c-xml-syntax From: Paul Grosso <pgrosso@arbortext.com> Subject: RE: c14n [was: RE: Colons in attribute values] Status: At 08:51 1999 04 13 -0700, Joel A. Nava wrote: >If we are testing processor conformance, does that mean that >the only kind of processors we can test spit out legal XML? > >I thought processors took in legal XML, and generated different >types of output from parsing the XML. In my way of thinking >then, the C14N form is needed as input to the processor. >What am I missing here? I cannot understand how given the canonical form as *input* to an XML processor makes any sense in terms of testing. The only way I know of testing conformance is to require that all processors emit the canonical form. That is what RAST requires of all SGML processors as far as I remember. Then, you hand a (most likely non-canonical!) document with a known canonical form to a processor, and it must emit something that is byte-for-byte equivalent to the known canonical form. If the processor produces the correct canonical form for the complete test suite, it is considered a conforming processor. But note that this does not require that the canonical form be XML. As you point out, there is no requirement that an XML processor ever *emit* XML. In fact, the only emission/reporting requirements are that they give errors as required by the spec. A c14n effort must add a requirement that all XML processors conforming to the canonical testing spec also emit the canonical form. End Forwarded Text ---- ___________________________________________________________ Joseph Reagle Jr. W3C: http://www.w3.org/People/Reagle/ Policy Analyst Personal: http://web.mit.edu/reagle/www/ mailto:reagle@w3.org
Received on Thursday, 15 April 1999 00:11:10 UTC