Re: Request for clarification on Canonical XML

At 11:35 03/07/28 -0400, Joseph Reagle wrote:

>On Thursday 24 July 2003 16:04, Martin Duerst wrote:
> > The canonical form of an XML document is physical representation of the
> > document produced by the method described in this specification. The
> > changes are summarized in the following list:
>
>Hi Martin, had this issue come up while we were writing the spec I'm
>confident we could have provided the clarity, or maybe even an additional
>definition of a "canonical character sequence form" as Graham suggested,
>that you are seeking.

I'm not seeking the addition of a new definition. I don't think it would
be appropriate to add new definitions without starting a new WD-REC
cycle, and I don't think this issue is important enough to justify that.


>However, I think it would be inappropriate to do such
>a definition now, and I'm not sure how to even add a "note" as an erratum.
>It doesn't quite fit into "a Caveat where subsequent experience has shown
>that a recommendation of the specification was incorrect or needs further
>qualification." [1]

I guess it comes sufficiently close; the 'further qualification' part
seems quite adequate.


>I don't object to the spirit of your text, and have tweaked it below:
>
>[[[
>Note: Canonical XML is an octet sequence resulting from characters, from the
>UCS character domain, encoded in UTF-8. This is necessary for the purposes
>of XML Signature and other applications. However, some applications may
>require a canonical form of XML that is a sequence of characters, without
>concern for its encoding and representation as octets. As an example, it
>may be appropriate to choose UTF-16 rather than UTF-8 as the encoding of an
>API in a programming language using UTF-16 to represent Unicode strings,
>such as Java or Python. In such cases, applications are not prohibited from
>defining and using a canonical character sequence that corresponds to the
>characters of a Canonical XML instance.
>]]]

The current text is slightly problematic because it says 'without concern
for its encoding' and then goes straight on to mention UTF-16. UTF-16
in such an API is indeed a sequence of 16-bit code units rather than
octets, but it is still an encoding.
Also, this version of the text no longer mentions abstract modeling.
It might also be better to replace 'may require' with 'may be better
served by'.
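
To make the distinction concrete, here is a minimal Python sketch. The
sample string and all variable names are assumptions made up for
illustration; nothing here is taken from the Canonical XML specification,
and the string is not claimed to be the output of a real canonicalizer.
It only shows that one character sequence maps to different octet
sequences under UTF-8 and UTF-16, and that UTF-16 is still an encoding
(into 16-bit code units) even when no octets are exposed.

```python
# Hypothetical sample text standing in for the characters of a
# canonicalized document (assumption: chosen only for illustration).
text = '<doc attr="\u00e9">\u20ac</doc>'

# The octet form: what Canonical XML defines, using UTF-8.
utf8_octets = text.encode("utf-8")

# The same characters serialized as big-endian UTF-16 octets.
utf16_octets = text.encode("utf-16-be")

# An API based on UTF-16 strings exposes 16-bit code units, not octets.
utf16_units = [
    int.from_bytes(utf16_octets[i:i + 2], "big")
    for i in range(0, len(utf16_octets), 2)
]

# The octet sequences differ, but both decode to the same characters.
same_characters = (
    utf8_octets.decode("utf-8") == utf16_octets.decode("utf-16-be") == text
)

print(len(text), len(utf8_octets), len(utf16_units), same_characters)
```

Comparing or signing the UTF-8 octets and comparing the character
sequence are therefore different operations on the same content, which
is the point the note needs to get across.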


Regards,    Martin.


>I'm not sure if this is any better, and I'm not confident it should be an
>erratum, but perhaps you could use this thread in your discussions with
>others about whether the octet representation is really needed?
>
>[1] http://www.w3.org/2001/03/C14N-errata

Received on Monday, 28 July 2003 14:22:49 UTC