- From: John Cowan <jcowan@reutershealth.com>
- Date: Wed, 05 Apr 2000 16:19:24 -0400
- To: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- CC: MURATA Makoto <muraw3c@attglobal.net>, xml-editor@w3.org, w3c-i18n-ig@w3.org, w3c-xml-core-wg@w3.org
Rick Jelliffe wrote:

> I don't see why there is any need to ban the BOM for UTF-16LE and
> UTF-16BE. RFC 2781 imposes an unnecessary burden here. But even if
> it is banned, it does not make autodetection unreliable.

You have the cart before the horse. RFC 2781, like all charset and media-type RFCs, is concerned with giving standard labels to actual practice, not with standardizing the practice. People are already creating BOM-less UTF-16 content; the RFC merely specifies the charset labels needed for this content.

> As in my email responding to John Cowan, where did the WG get the idea
> that an external parsed entity can begin with any character?

It is a fact of XML, if the entity is encoded in either UTF-8 or UTF-16.

> Why? It is just another encoding. Why can't this be handled merely
> by updating Appendix F?

It can.

> I still have not seen any evidence that it is an error
> against XML 1.0, strictly speaking, for an external parsed entity to be
> encoded in UTF-16LE/BE if it has an encoding declaration (whether or not
> it has a BOM).

It all depends on the interpretation of the term "UTF-16" in clause 4.3.3:

# Entities encoded in UTF-16 must begin with the Byte Order Mark [...].

The issue is whether "UTF-16" means only the charset so named in RFC 2781, or whether, in the context of the XML Recommendation, it is a generic term covering all three charsets named there.

I myself agree with you: UTF-16BE and UTF-16LE should be supported if the appropriate encoding declaration is present.

--
Schlingt dreifach einen Kreis um dies!   || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,   || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,            || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.           -- Coleridge (tr. Politzer)
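To make the autodetection question concrete, here is a minimal Python sketch (not part of the thread) of the kind of first-bytes sniffing that Appendix F describes. It is simplified to the UTF-8 and UTF-16 cases; UCS-4 and EBCDIC patterns are omitted. The function name `sniff_encoding` is hypothetical, and the sketch assumes that, absent a BOM or a recognizable `<?` opening, the entity is treated as UTF-8.

```python
def sniff_encoding(head: bytes) -> str:
    """Guess an XML entity's encoding from its first bytes.

    Hypothetical helper: a simplified take on XML 1.0 Appendix F that
    handles only UTF-8 and UTF-16 (UCS-4 and EBCDIC patterns omitted).
    """
    # A Byte Order Mark identifies the encoding unambiguously.
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "UTF-16"
    # No BOM: 16-bit code units are recognizable only if the entity
    # opens with "<?" (an XML declaration or text declaration).
    if head.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if head.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    # Anything else, including "<?xm" and BOM-less UTF-16 that starts
    # with an arbitrary character, falls through to the UTF-8 default.
    return "UTF-8"
```

An entity that opens with a declaration, such as `"<?xml version='1.0' encoding='UTF-16LE'?>".encode("utf-16-le")`, is recognized as BOM-less UTF-16LE from its first four bytes. One that begins with an arbitrary character, such as `"Hello".encode("utf-16-be")`, falls through to the UTF-8 default, which is why the presence of an encoding declaration matters in this argument.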
Received on Wednesday, 5 April 2000 16:19:30 UTC