RE: Character Encoding Question

Hello John,

There is no problem with UCS-2 and UCS-4. The UCS is a set
(in the math sense) of characters, each with an associated number.
There is only one UCS. Saying just 'UCS' makes no assumptions whatsoever
about representation (UCS-2 and UCS-4 are both 'charset' labels), and
no assumptions about subsetting (UCS-2 can be used, in the right context,
to denote a certain subset of the UCS). So I don't see any problem.
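To make the distinction concrete: the number the UCS assigns to a character is independent of how it is serialized. The sketch below assumes big-endian order with no byte-order mark and the 31-bit code space of ISO/IEC 10646-1 as it stood in 2000; the function names `to_ucs2` and `to_ucs4` are hypothetical, not taken from any spec or library. It writes one code point in each form and shows that UCS-2 has no way to express anything beyond U+FFFF.

```python
# Sketch: serialize one UCS code point as UCS-2 or UCS-4.
# Assumptions: big-endian, no BOM; function names are illustrative only.

BMP_MAX = 0xFFFF         # last code position in the Basic Multilingual Plane
UCS4_MAX = 0x7FFFFFFF    # 31-bit UCS code space (128 groups x 256 planes x 256 rows x 256 cells)


def to_ucs2(cp: int) -> bytes:
    """Two octets per character, so only BMP code points fit."""
    if not 0 <= cp <= BMP_MAX:
        raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot encode it")
    return cp.to_bytes(2, "big")


def to_ucs4(cp: int) -> bytes:
    """Four octets per character, covering the whole 31-bit UCS code space."""
    if not 0 <= cp <= UCS4_MAX:
        raise ValueError(f"{cp:#x} is outside the UCS code space")
    return cp.to_bytes(4, "big")


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10400):
        ucs4 = to_ucs4(cp).hex()
        try:
            ucs2 = to_ucs2(cp).hex()
        except ValueError:
            ucs2 = "(not representable)"
        print(f"U+{cp:04X}: UCS-2={ucs2} UCS-4={ucs4}")
```

The same abstract character number goes in each time; only the octet layout changes. U+10400 lies outside the BMP, so only the UCS-4 column gets a value for it, which is the sense in which 'UCS-2' can name a subset of the one UCS.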

Regards,   Martin.

At 00/11/30 08:45 -0800, John Boyer wrote:

>Maybe it would be best for you to wait for Martin to clear this up.
>
>--Paul Hoffman, Director
>--Internet Mail Consortium
>
><john>
>Yes, after all, it is Martin's sentence in the first place, so I would be
>uncomfortable with anything that didn't have his buy-off.
>
>As for the info you provided, thanks, it was very helpful.  Actually, we
>only needed to know the answer you provided because, while I don't know a
>lot about encodings, I do think our question was really simple and I have
>most of the resources available.  The only thing that remained was:  do we
>or do we not include UCS-2 in the list of settings for the XML declaration's
>encoding attribute (plus the defaulting mechanism for that attribute) under
>which we decide not to apply NFC (Unicode Normalization Form C) when
>converting to the UCS domain, as required by the XPath data model?
>
>Unfortunately, your answer on the representation power of UCS-2 vs. UCS-4
>points out another problem we didn't know about before.  The fact that UCS-2
>can only encode the BMP means that there is an ambiguity in the XPath data
>model when it says that everything is represented in the UCS character
>domain.  Which one?  This use of UCS without specifying which one led me to
>believe there was no difference.  Clearly, your information indicates
>otherwise.  Moreover, the closest thing to a 'hint' about which one in the
>XPath spec is the bibliographic citation of ISO 10646, which specifically
>mentions "Part 1: Architecture and Basic Multilingual Plane".  This would
>seem to imply a focus on the BMP.  On the other hand, XML should be
>expressible in UTF-8, which can represent all of UCS-4.
>
>The problem is that we now have another conformance criterion for
>canonicalization and hence for signatures.
></john>
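John's observation that UTF-8 reaches all of UCS-4, while UCS-2 stops at the BMP, can be checked with a small encoder. This is a sketch that assumes the original UTF-8 definition in RFC 2279 (1 to 6 octets, code points up to 31 bits), which was current when this thread was written; RFC 3629 (2003) later limited UTF-8 to U+10FFFF and at most 4 octets, so this function deliberately accepts values a modern decoder would reject. The name `utf8_rfc2279` is hypothetical.

```python
# Sketch of the RFC 2279 UTF-8 scheme (NOT the narrower RFC 3629 form).
# Encodes any 31-bit UCS-4 code point in 1 to 6 octets.

def utf8_rfc2279(cp: int) -> bytes:
    """Encode one UCS-4 code point using the original 1-6 octet UTF-8."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # ASCII range: a single octet, unchanged
    # (exclusive upper bound, total octets, lead-octet marker bits)
    for limit, n, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                           (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                           (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # each continuation octet carries 6 bits
        cp >>= 6
    out.append(lead | cp)               # remaining high bits go in the lead octet
    return bytes(reversed(out))


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10400, 0x7FFFFFFF):
        print(f"{cp:#010x} -> {utf8_rfc2279(cp).hex()}")
```

For code points up to U+10FFFF (excluding surrogates) the output matches Python's built-in `str.encode("utf-8")`; beyond that only the RFC 2279 scheme has an encoding, e.g. `utf8_rfc2279(0x7FFFFFFF)` yields the six octets `FD BF BF BF BF BF`. A UCS-2 serialization, by contrast, has no form at all for anything past U+FFFF.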

Received on Thursday, 30 November 2000 13:17:44 UTC