- From: François Yergeau <yergeau@alis.com>
- Date: Mon, 10 Apr 2000 22:51:41 -0400
- To: mark.davis@us.ibm.com, "'Tim Bray'" <tbray@textuality.com>
- Cc: "'John Cowan'" <jcowan@reutershealth.com>, "'MURATA Makoto'" <muraw3c@attglobal.net>, "'Rick Jelliffe'" <ricko@gate.sinica.edu.tw>, xml-editor@w3.org, w3c-i18n-ig@w3.org, w3c-xml-core-wg@w3.org
> From: mark.davis@us.ibm.com
> Date: Monday, 10 April 2000 20:59
>
> B. In the context of XML, I believe the corrected formulation
> should be:
>
> 2.a. If there is no BOM as the first codepoint, then "UTF-8",
> "UTF-16BE", "UTF-16LE", "UTF-32BE", and "UTF-32LE" are treated
> just like any other encoding. That is, they must have an XML
> encoding declaration

Not quite. UTF-8 does not need an encoding declaration; it has been the default from day one. For the others I agree: they are "just like any other encoding", decoding is fully specified by the tag alone, and XML parsers are not required to support them.

> 2.b. If there is no BOM as the first codepoint, then "UTF-16"
> is treated as an alias for "UTF-16BE",

I believe this contradicts the spec. If you say "UTF-16", you MUST have a BOM to tell the endianness. Changing that would be a significant change, for which I don't really see a justification.

> and both "UTF-32" and "UCS-4" are treated as
> equivalent to "UTF-32BE".

This is not currently in the XML spec, but perhaps these semantics could be added to the registrations of "UTF-32" and "UCS-4" as MIME charset tags. Not sure it's a good idea, though. Why not use a BOM or a specific tag?

-- François
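
[Illustration, not part of the original message.] The rules argued for in the reply can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an XML parser: the names `sniff_bom` and `resolve_encoding` are hypothetical, the declared label is assumed to have been extracted from the encoding declaration already, and UTF-32/UCS-4 BOMs are deliberately left out because the thread treats their handling as unsettled.

```python
import codecs

# BOMs recognised by this sketch (UTF-32 is intentionally omitted; see lead-in).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def resolve_encoding(data: bytes, declared=None) -> str:
    """Pick an encoding from an optional declared label and the leading bytes.

    Hypothetical helper: `declared` stands in for the label taken from the
    XML encoding declaration (None when the document has no declaration).
    """
    from_bom = sniff_bom(data)
    if declared is None:
        # No declaration: follow the BOM if present, otherwise default to UTF-8.
        return from_bom or "UTF-8"
    label = declared.upper()
    if label == "UTF-16":
        # Plain "UTF-16" does not fix the byte order, so a BOM is required.
        if from_bom in ("UTF-16BE", "UTF-16LE"):
            return from_bom
        raise ValueError('"UTF-16" requires a BOM to tell the endianness')
    # Any other label (e.g. "UTF-16BE") is taken to specify decoding by itself.
    return label
```

For example, `resolve_encoding(b"<doc/>")` falls back to "UTF-8", while `resolve_encoding(b"\x00<", "UTF-16")` raises an error because no BOM is present.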
Received on Monday, 10 April 2000 23:16:48 UTC