RE: I18N issues with the XML Specification

> From: mark.davis@us.ibm.com
> Date: Monday, 10 April 2000 20:59
>
> B. In the context of XML, I believe the corrected formulation
> should be:
>
> 2.a. If there is no BOM as the first codepoint, then "UTF-8",
> "UTF-16BE", "UTF-16LE", "UTF-32BE", and "UTF-32LE" are treated just
> like any other encoding. That is, they must have an XML encoding
> declaration

Not quite.  UTF-8 does not need an encoding declaration; it has been the
default from day one.  For the others I agree: they are "just like any
other encoding", meaning that decoding is fully specified by the tag alone
and that XML parsers are not required to support them.
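
To illustrate the defaulting rule, here is a minimal Python sketch (mine,
not from the spec; the name declared_encoding is hypothetical).  It assumes
the entity has no BOM and uses an ASCII-compatible encoding, so that the
declaration can be read directly from the raw bytes:

```python
import re

# Matches an XML declaration that carries an encoding pseudo-attribute.
# Assumption: the bytes are ASCII-compatible and there is no BOM.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data: bytes) -> str:
    """Return the declared encoding name, or "UTF-8" when none is given."""
    m = _DECL.match(data)
    # No encoding declaration at all: UTF-8 is the default.
    return m.group(1).decode("ascii") if m else "UTF-8"
```

A real parser must also handle BOMs and non-ASCII-compatible encodings
before it can even read the declaration; this sketch leaves that out.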

> 2.b. If there is no BOM as the first codepoint, then "UTF-16" is
> treated as an alias for "UTF-16BE",

I believe this contradicts the spec.  If you say "UTF-16", you MUST have a
BOM to indicate the byte order.  Dropping that requirement would be a
significant change, for which I don't really see a justification.
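
As a rough sketch of that rule as I read it (the function name
utf16_byte_order is hypothetical, and this is an illustration rather than
normative text), a processor handed an entity labelled "UTF-16" would take
the byte order from the BOM and reject the entity if there is none:

```python
import codecs

def utf16_byte_order(data: bytes) -> str:
    """Resolve the byte order of an entity labelled "UTF-16" from its BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "UTF-16LE"
    # No BOM: under this reading there is no fallback to big-endian.
    raise ValueError('entity labelled "UTF-16" has no BOM')
```

Under proposal 2.b the final branch would return "UTF-16BE" instead of
raising an error.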

> and both "UTF-32" and "UCS-4" are treated as
> equivalent to "UTF-32BE".

This is not currently in the XML spec, but perhaps these semantics could be
added to the registrations of "UTF-32" and "UCS-4" as MIME charset tags.
Not sure it's a good idea, though.  Why not use a BOM or a specific tag?

--
François

Received on Monday, 10 April 2000 23:16:48 UTC