Re: Three Issues for C14N Consideration

Joseph,
My proposal on namespace handling was made to satisfy several DSig
requirements on C14N.  One of them which was implicit in my original email
was the context independence of the canonicalization.  Because Dsig
requires to be able to sign an element of  an XML document, all the element
names and attribute names must have "distinguished" name regardless of the
choice of the namespace prefix.

After following the discussion between Don and James Clark, however, I
realized that James Clark's proposal can also be made context independent
as he points out.  In the light of the increased readability as well as
avoidance of complexity of introducing the digest algorithm, I think it is
more reasonable and more preferable to have a C14N definition based on his
proposal, using generated prefixes such as "n1", "n2", ..., and so on for
EACH element.

Here is a concrete example.


Document A
==========

<?xml version="1.0" encoding="Shift_JIS"?>
<root>
 <hello xmlns="http://ibm.com/schemaA"
        xmlns:myns="http://ibm.com/schemaB">
   <myns:goodbye xmlns:yourns="http://ibm.com/schemaC" value="123"
yourns:value="456">
      <order xmlns="http://ibm.com/schemaD">
        <title>XML and Java</title>
        <publisher>Addison Wesley</publisher>
        <price unit="USD">39.95</price>
      </order>
   </myns:goodbye>
 </hello>
</root>

To do context-independent canonicalization, we need to consider all the
namespaces used in a single element (but not a subtree).  For example, see
the "myns:goodbye" element.  This element contains three namespaces
prefixed by "myns", "yourns", and the default namespace (two of them are
inherited from the surronding context). The canonicalized form
(context-independent version of JC's C14N) of this document is as follows
(for readability I added extra spaces for indentation).


<?xml version="1.0" encoding="UTF-8"?>
<root>
 <hello xmlns="http://ibm.com/schemaA">
   <goodbye xmlns="http://ibm.com/schemaB"
            xmlns:n1="http://ibm.com/schemaA"
            xmlns:n2="http://ibm.com/schemaC"
            n1:value="123"
            n2:value="456">
      <order xmlns="http://ibm.com/schemaD">
        <title xmlns="http://ibm.com/schemaD">XML and Java</title>
        <publisher xmlns="http://ibm.com/schemaD">Addison
Wesley</publisher>
        <price xmlns="http://ibm.com/schemaD" unit="USD">39.95</price>
      </order>
   </goodbye>
 </hello>
</root>


The "goodbye" element uses the three namespaces, each of which is declared
in the same document.  This is actually more readable than any of the
hexified versions (either URI or MD5 of URI).  It is not as long as the
hexified versions either if the URIs are not very long.




--
Hiroshi Maruyama
Manager, Network Applications, Tokyo Research Laboratory
+81-462-73-4576, maruyama@jp.ibm.com
Also Associate Professor, Dept. of Computer Science, Tokyo Institute of
Technology
+81-3-5734-3953, maruyama@cs.titech.ac.jp


From: "Joseph M. Reagle Jr." <reagle@w3.org> on 99/06/18 02:41 AM

To:   "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
cc:    (bcc: Hiroshi Maruyama/Japan/IBM)
Subject:  Three Issues for C14N Consideration





If you have strongly held thoughts on any of these issues, let me know (cc
the list).

1. Presently, the C14N of XML includes expanded general entities which
produces a standalone document. Obviously parameter entities aren't a
problem, but PIs might be. Do you care about PIs?
2. Should the canonical form include an XML declaration with version
number?
3. If C14N fails because of the unavailability of an external entity, do
XML Signature applications want to know?

*. Next week the Syntax WG will discuss Don and Hiroshi's n1 vs. hash
prefix
issue.
*. Dan Connolly committed to adding a requirement to the XML Schema
Requirements Document from us and the I18N (internationalization) IG that a
datatype have a normalized/C14N form.

[1]

Date: Thu, 17 Jun 1999 09:47:13 -0700
To: "XML Syntax WG" <w3c-xml-syntax-wg@w3.org>
From: Tim Bray <tbray@textuality.com>
Subject: C14n issue N2 - Processing Instructions
Status:

Suppose we have PIs in the external DTD subset.  The document can
still be standalone="yes", so in principle we can't put these in the
canonical form unless *require* processors to read the external subset.
So I see the following realistic options.  Let's hear opinions and
nail this down next Wednesday.

N2-a. Lose PIs from the canonical form entirely
N2-b. Lose PIs that happen to be in external entities
N2-c. Lose PIs that happen to be in the DTD, internal or external
N2-d. Include all PIs, requiring canonicalizers to read all external
      parts of the DTD.

Note that both options b and c require information that is not
currently provided by the infoset, so we have some liaison work to do.
I must say that the more I think about this, both selection criteria
(it happens to be in the document entity vs. it happens to be in the DTD)
seem unsatisfactory to me.

N2-a and N2-D weren't brought up in the meeting, but the more I think about
them, the more I prefer either of them to either of -b and -c.  I have
an action item to ask the Infoset group about enhancing the PI information,
but I'll hold on that in case this group suddenly develops wild enthusiasm
either for N2-a or N2-d, which would make it unnecessary. -Tim


[2]

Date: Thu, 17 Jun 1999 09:47:13 -0700
To: "XML Syntax WG" <w3c-xml-syntax-wg@w3.org>
From: Tim Bray <tbray@textuality.com>
Subject: C14n issue N4 - XML declaration?
Status:  O

Should the canonical form include the XML declaration?  Subsidiary
question: if so, which of the encoding and standalone declarations
should it include?

Let's hear opinions and nail this down next Wednesday.

Minor pro:
 XML docs with XML declarations are more robustly interchanged
Minor con:
 A couple of dozen extra bytes

Serious issue: if we include the XML declaration, that makes the
version of XML part of the canonical form. The corollory is that
should there ever be an XML 1.1 or 2.0, no XML 1.1/2.0 document
can ever be canonically equivalent to any XML 1.0 document.  Is
this a good or bad thing?

Note: if we leave the XML Decl out, and specify that this C14n
spec applies *only* to XML 1.0 documents, we can postpone the
decision about whether XML1 and XML2 docs can ever be canonically
equivalent to the time we write the XML2 c14n spec.  Sounds good
to me. -Tim


[3]

Date: Thu, 17 Jun 1999 09:47:13 -0700
To: "XML Syntax WG" <w3c-xml-syntax-wg@w3.org>
From: Tim Bray <tbray@textuality.com>
Subject: C14n issue N3 - c14n processor
Status:

Should we introduce the notion of a canonicalization processor in the
C14n WD, and define an interface to it, maybe only a return value?
One advantage is that since canonicalization can fail due to
inaccessability
of external entities, it might be nice to have a deterministic way to
find out when this happens.  On the contra side, every new thing
added to the spec increases complexity and the chance of introducing
errors, and makes conformant interoperability harder to achieve.

Let's hear opinions on this and try to nail it down next wednesday. -Tim

_________________________________________________________
Joseph Reagle Jr.
Policy Analyst           mailto:reagle@w3.org
XML-Signature Co-Chair   http://w3.org/People/Reagle/

Received on Saturday, 19 June 1999 23:05:06 UTC