Re: Canonical Form as non-XML from MARUYAMA@jp.ibm.com on 1999-04-15 (w3c-xml-sig-ws@w3.org from April 1999)

From: <MARUYAMA@jp.ibm.com>
Date: Thu, 15 Apr 1999 11:12:55 +0900
To: w3c-xml-sig-ws@w3.org
Message-ID: <49256754.0015DFF9.00@d22mta10.yamato.ibm.com>
Requiring well-formedness of canonical form is not only unnecessary, but in
practice, it is very hard to define technically.  The problem is the
treatment of namespace prefixes.

Suppose there are two XML documents:

Document 1:
  <root xmlns:edi='http://ecommerce.org/schema'>
     <edi:order>
         :
     </edi:order>
  </root>

Document 2:
  <root xmlns:ec='http://ecommerce.org/schema'>
     <ec:order>
         :
     </ec:order>
  </root>

If both the <edi:order> element in Document 1 and the <ec:order> element
in Document 2 have the same contents, I think the canonicalized forms of
these elements must be equal.  However, simply replacing the namespace
prefix ("edi:" or "ec:") with the expanded namespace (e.g.,
"http://ecommerce.org/schema") does not work because

  <http://ecommerce.org/schema:order>
    :
  </http://ecommerce.org/schema:order>

is not well-formed any more.  There is no simple way to work around this,
because namespaces can be nested and simple renaming conventions will have
a naming collision problem (the only way I can think of is to take a hash
of the namespace and encode it as a hexadecimal prefix, as in
<537E92D39EA108FF193DA712910A53D9:order>, but I do not see any value in
doing this).
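
The argument above can be checked with a small Python sketch. It is only an
illustration under stated assumptions: the element content "x", the helper
names, and the choice of SHA-256 truncated to 32 hex digits are hypothetical
(the message does not name a hash function), and a letter is prepended to
the hashed prefix because an XML name may not begin with a digit.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = "http://ecommerce.org/schema"

# Documents 1 and 2 from the message, with placeholder content "x".
DOC1 = "<root xmlns:edi='http://ecommerce.org/schema'><edi:order>x</edi:order></root>"
DOC2 = "<root xmlns:ec='http://ecommerce.org/schema'><ec:order>x</ec:order></root>"


def expanded_names(xml_text):
    """Return the namespace-expanded tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


def is_well_formed(xml_text):
    """Report whether the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def hashed_prefix(namespace_name):
    """Derive a prefix from a hash of the namespace name (assumed scheme)."""
    digest = hashlib.sha256(namespace_name.encode("utf-8")).hexdigest()
    # Leading "h" is an assumption: XML names cannot start with a digit.
    return "h" + digest[:32].upper()


# Naive substitution of the prefix by the namespace name.
naive = "<{0}:order>x</{0}:order>".format(NS)

# Hash-derived prefix, declared on the element itself.
prefix = hashed_prefix(NS)
hashed = "<{0}:order xmlns:{0}='{1}'>x</{0}:order>".format(prefix, NS)

print(expanded_names(DOC1) == expanded_names(DOC2))  # same expanded names
print(is_well_formed(naive))                         # naive form is rejected
print(is_well_formed(hashed))                        # hashed form parses
```

The two documents compare equal once prefixes are resolved, the naive
substitution is rejected by the parser, and the hashed prefix parses but
carries no information a reader could use.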

The problem is that namespace names can be anything and can contain
characters (such as colons) that are illegal in a namespace prefix.


--
Hiroshi Maruyama
Manager, Network Applications, Tokyo Research Laboratory
+81-462-73-4576, maruyama@jp.ibm.com
Also Associate Professor, Dept. of Computer Science, Tokyo Institute of
Technology
+81-3-5734-3953, maruyama@cs.titech.ac.jp


From: "Joseph M. Reagle Jr. (W3C)" <reagle@w3.org> on 99/04/14 04:04

To:   "Signed-XML Workshop" <w3c-xml-sig-ws@w3.org>
cc:   Paul Grosso <pgrosso@arbortext.com>, "Joel A. Nava" <jnava@adobe.com>
      (bcc: Hiroshi Maruyama/Japan/IBM)
Subject:  Canonical Form as non-XML





[Note: I'm uncertain of the cross-posting and cc:'ing etiquette, since
w3c-xml-sig-ws@w3.org is a public list and syntax is not, so I just cc'd
the authors of the original thread.]

Interesting thread in the syntax-WG that I wanted to port over here. I've
been advocating that one be able to hash multiple representations of the
data: the bits, some form of canonical XML, and additional representations
(such as DOM, or RDF, or a graph notation language, etc.). But I always
assumed the canonical XML would be XML. Obviously, it doesn't have to be.

My first reaction as to why I would want the canonical XML to be XML was
fuzzily similar to the argument why we should do signed-XML at all -- instead
of just using S/MIME. People want XML data to continue to be available to all
XML applications. For instance, somewhere in a document processing chain I
might have an application that needs to access the data within an XML
structure but it doesn't care about the signatures. Maybe the guy before or
beyond him does, but there is no necessary reason to render the content
opaque to a non-MIME-savvy XML application. So my first reaction was why make
an application understand another format which a non-signature XML
application might have problems with? But this is a half-thought. If an
application will canonicalize, it will have an input and output regardless.
Additionally, I expect that you can never expect the XML on the wire to be
canonical: the proof is equal to the work, so you might as well do the work.
Consequently, I don't see a necessary reason as to why the canonical XML
output must be XML.

Comments?


Forwarded Text ----
 Date: Tue, 13 Apr 1999 11:44:15 -0500
 To: w3c-xml-syntax
 From: Paul Grosso <pgrosso@arbortext.com>
 Subject: RE: c14n [was: RE: Colons in attribute values]
 Status:

 At 08:51 1999 04 13 -0700, Joel A. Nava wrote:
 >If we are testing processor conformance, does that mean that
 >the only kind of processors we can test spit out legal XML?
 >
 >I thought processors took in legal XML, and generated different
 >types of output from parsing the XML. In my way of thinking
 >then, the C14N form is needed as input to the processor.
 >What am I missing here?

 I cannot understand how giving the canonical form as *input*
 to an XML processor makes any sense in terms of testing.

 The only way I know of testing conformance is to require that
 all processors emit the canonical form.  That is what RAST
 requires of all SGML processors as far as I remember.  Then,
 you hand a (most likely non-canonical!) document with a known
 canonical form to a processor, and it must emit something that
 is byte-for-byte equivalent to the known canonical form.  If
 the processor produces the correct canonical form for the complete
 test suite, it is considered a conforming processor.

 But note that this does not require that the canonical form be XML.

 As you point out, there is no requirement that an XML processor ever
 *emit* XML.  In fact, the only emission/reporting requirements are
 that they give errors as required by the spec.  A c14n effort must
 add a requirement that all XML processors conforming to the
 canonical testing spec also emit the canonical form.
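
The testing procedure described in the forwarded message above (hand a
processor a document with a known canonical form and compare its output
byte for byte) might be sketched as follows. This is an illustration, not
the RAST procedure itself: the test suite, the function names, and the use
of Python's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+,
which postdates this thread) as a stand-in processor are all assumptions.

```python
from xml.etree.ElementTree import canonicalize


def stand_in_processor(document: bytes) -> bytes:
    """Hypothetical processor under test: emits a canonical form as bytes."""
    return canonicalize(xml_data=document.decode("utf-8")).encode("utf-8")


def is_conforming(processor, suite):
    """A processor conforms if every output equals the known canonical form."""
    return all(processor(source) == expected for source, expected in suite)


# Hypothetical test suite: (input document, known canonical form).
# The first input is deliberately non-canonical.
SUITE = [
    (b"<a  b='1' />", b'<a b="1"></a>'),
    (b'<a b="1"></a>', b'<a b="1"></a>'),
]

print(is_conforming(stand_in_processor, SUITE))
print(is_conforming(lambda document: document, SUITE))  # echoes input unchanged
```

Nothing in the comparison depends on the expected bytes being XML; the
harness only needs the canonical form to be a deterministic byte string.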

End Forwarded Text ----
___________________________________________________________
Joseph Reagle Jr.  W3C:     http://www.w3.org/People/Reagle/
Policy Analyst     Personal:  http://web.mit.edu/reagle/www/
                   mailto:reagle@w3.org
Received on Thursday, 15 April 1999 00:11:10 UTC