Re: Request for clarification on Canonical XML

On Monday 28 July 2003 13:53, Martin Duerst wrote:
> The current text is slightly problematic because it says 'without concern
> for its encoding' and then goes straight on to mention UTF-16. UTF-16
> indeed does not deal with octets, but it is still an encoding.

So your point is that the UTF-8 encoding can be restrictive because (1) one 
may not want to use any encoding (is that what you mean by "abstract 
modeling"?) or (2) one may want to use a different encoding. 

> Also, this version of the text doesn't mention abstract modeling anymore.
> It might also be better to replace 'may require' with 'may be better
> served with'.

I tweaked it to "might want" so as to avoid the "MAY", while staying terse 
and not presuming to tell them what they are better served with. <smile/> 
OK, how about:

[[[
Note: Canonical XML is an octet sequence that results from encoding 
characters of the UCS character domain in UTF-8. Creating a deterministic octet 
sequence is necessary for XML Signature and other applications. However, 
some applications might want a canonical form of XML in a different 
encoding, or one that is simply a sequence of characters, without concern 
for its encoding. For example, it may be appropriate to choose UTF-16 
rather than UTF-8 as the encoding of an API in a programming language using 
UTF-16 to represent Unicode strings, such as Java or Python. Or, one might 
want to abstractly describe an XML document as an Infoset that includes 
sequences of characters. In such cases, applications are not prohibited 
from defining and using a canonical character sequence that corresponds to 
the characters of a Canonical XML instance.
]]]
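
To make the distinction in the note concrete, here is a minimal Python sketch. 
It is only an illustration under stated assumptions: the sample string is a 
made-up stand-in for the character content of a Canonical XML instance (it is 
not produced by a real canonicalizer), and the helper name same_characters is 
hypothetical. It shows that one character sequence yields different octet 
sequences under UTF-8 and UTF-16, while both decode to the same characters.

```python
# Hypothetical stand-in for the characters of a Canonical XML instance.
# Assumption: this string was NOT generated by an actual C14N implementation.
canonical_chars = '<doc attr="caf\u00e9">text</doc>'

# The octet sequence Canonical XML itself defines: the characters in UTF-8.
utf8_octets = canonical_chars.encode("utf-8")

# An alternative encoding of the same characters (big-endian, no BOM),
# as an application working with UTF-16 strings might choose.
utf16_octets = canonical_chars.encode("utf-16-be")


def same_characters(a: bytes, a_enc: str, b: bytes, b_enc: str) -> bool:
    """Return True if two octet sequences decode to the same character sequence."""
    return a.decode(a_enc) == b.decode(b_enc)


if __name__ == "__main__":
    # The octets differ, so a digest over one will not match a digest
    # over the other, even though the characters are identical.
    print(utf8_octets != utf16_octets)
    print(same_characters(utf8_octets, "utf-8", utf16_octets, "utf-16-be"))
```

The point of the sketch is that anything computed over octets (a signature 
digest, say) depends on the encoding chosen, whereas a comparison at the 
character level does not.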

Received on Monday, 28 July 2003 16:39:03 UTC