RE: Character Encoding Question

Maybe it would be best for you to wait for Martin to clear this up.

--Paul Hoffman, Director
--Internet Mail Consortium

<john>
Yes, after all, it is Martin's sentence in the first place, so I would be
uncomfortable with anything that didn't have his sign-off.

As for the info you provided, thanks, it was very helpful.  In fact, your
answer was all we needed: while I don't know a lot about encodings, I think
our question was really simple, and I have most of the resources available.
The only thing that remained was:  Do we or do we not include UCS-2 in the
list of settings for the XML declaration's encoding attribute (plus the
defaulting mechanism for that attribute) under which we decide not to apply
NFC when converting to the UCS domain, as required by the XPath data model?
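
To make the question concrete, here is a minimal Python sketch of the
decision, not anything taken from the specs.  The names NO_NFC_ENCODINGS,
should_normalize, to_ucs_text and the exempt_ucs2 flag are all hypothetical,
the contents of the list are an assumption, and the text is assumed to have
been decoded already.  The flag stands for exactly the point that is still
open.

```python
import unicodedata

# Hypothetical list of declared encodings for which NFC is skipped.
# Its contents are an assumption made for illustration only.
NO_NFC_ENCODINGS = {"utf-8", "utf-16"}


def should_normalize(declared_encoding: str, exempt_ucs2: bool = False) -> bool:
    """Return True if text declared with this encoding would get NFC applied."""
    name = declared_encoding.strip().lower()
    if name == "ucs-2":
        # The open question: does UCS-2 belong in the exemption list?
        return not exempt_ucs2
    return name not in NO_NFC_ENCODINGS


def to_ucs_text(text: str, declared_encoding: str, exempt_ucs2: bool = False) -> str:
    """Apply NFC to already-decoded text unless the encoding is exempt."""
    if should_normalize(declared_encoding, exempt_ucs2):
        return unicodedata.normalize("NFC", text)
    return text


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
unchanged = to_ucs_text(decomposed, "UTF-8")        # left as two characters
composed = to_ucs_text(decomposed, "ISO-8859-1")    # composed to U+00E9
```

Whichever way the UCS-2 question is answered, only the exempt_ucs2 branch
of the sketch would change.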

Unfortunately, your answer on the representation power of UCS-2 vs. UCS-4
points out another problem we didn't know about before.  The fact that UCS-2
can only encode the BMP means that there is an ambiguity in the XPath data
model when it says that everything is represented in the UCS character
domain.  Which one?  This use of UCS without specifying which one led me to
believe there was no difference.  Clearly, your information indicates
otherwise.  Moreover, the closest thing to a 'hint' I can find in the XPath
spec about which one is meant is the bibliographic citation of ISO 10646,
which specifically mentions "Part 1: Architecture and Basic Multilingual
Plane".  This would seem to imply a focus on the BMP.  In contradiction
though, XML should be expressible in UTF-8, which can represent all of UCS-4.
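
The gap can be shown with a short Python sketch.  The helper name
fits_in_ucs2 is hypothetical, and Python's codecs follow the current Unicode
code point range rather than the full 31-bit UCS-4 space, so this only
illustrates the BMP boundary.

```python
def fits_in_ucs2(ch: str) -> bool:
    """True if a single character lies in the BMP (U+0000..U+FFFF),
    i.e. it fits in one fixed-width 16-bit UCS-2 code unit."""
    return ord(ch) <= 0xFFFF


bmp_char = "\u00e9"           # inside the BMP
supplementary = "\U00010000"  # first code point beyond the BMP

# UTF-8 and the 4-byte form can both carry the supplementary character.
utf8_len = len(supplementary.encode("utf-8"))       # 4 bytes
ucs4_len = len(supplementary.encode("utf-32-be"))   # one 4-byte unit

# UTF-16 needs two 16-bit units (a surrogate pair) for it; plain UCS-2
# has no surrogate mechanism, so it has no way to express the character.
utf16_len = len(supplementary.encode("utf-16-be"))  # 4 bytes = 2 units
```

So a data model that says only "UCS" gives different answers for such a
character depending on which form is meant.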

The problem is that we now have another conformance criterion for
canonicalization and hence for signatures.
</john>

Received on Thursday, 30 November 2000 11:45:58 UTC