Re: I18N issues with the XML Specification

Rick Jelliffe wrote:
 
> I don't see why there is any need to ban the BOM for UTF-16LE and
> UTF-16BE. RFC 2781 puts an unnecessary burden here. But even if
> it is banned, it does not make autodetection unreliable.

You have the cart before the horse.  RFC 2781, like all charset and
media-type RFCs, is concerned with giving standard labels
to actual practice, not with standardizing the practice.  People
are already creating BOM-less UTF-16 content; the RFC merely
specifies the charset labels needed for this content.
 
> As in my email responding to John Cowan, where did the WG get the idea
> that an external parsed entity can begin with any character?

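It is a fact of XML, if the entity is encoded in either UTF-8 or UTF-16:
the text declaration is optional for those encodings, so the entity
can begin with any character.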

> Why?  It is just another encoding. Why cannot this be handled merely
> by updating Appendix F?

It can.
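
As a rough illustration of what such an Appendix F style table could
look like once it covers BOM-less UTF-16, here is a minimal sketch.
The function name and the returned labels are hypothetical, the table
is deliberately incomplete (UCS-4 and EBCDIC are ignored), and it only
works when the entity starts with "<?xml":

```python
def sniff_encoding(prefix: bytes) -> str:
    """Guess an encoding family from the first bytes of an entity.

    Hypothetical helper: a simplified subset of Appendix F style
    autodetection, not the table from the XML Recommendation itself.
    """
    # A Byte Order Mark settles both the family and the byte order.
    if prefix.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian, with BOM"
    if prefix.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian, with BOM"
    # No BOM: the bytes of "<?" still reveal a 16-bit byte order.
    if prefix.startswith(b"\x00<\x00?"):
        return "UTF-16BE, no BOM"
    if prefix.startswith(b"<\x00?\x00"):
        return "UTF-16LE, no BOM"
    # ASCII-compatible start: read the encoding declaration to decide.
    if prefix.startswith(b"<?xm"):
        return "ASCII-compatible"
    # Nothing recognizable: without a declaration, UTF-8 is assumed.
    return "unknown"
```

For example, the bytes of "<?xml ...?>" encoded as big-endian UTF-16
without a BOM begin 00 3C 00 3F, which the sketch reports as
"UTF-16BE, no BOM"; the encoding declaration can then be read to
confirm the label.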
 
> I still have not seen any evidence why it is an error
> against XML 1.0, strictly speaking, for an external parsed entity to be
> encoded in UTF-16LE/BE if it has an encoding declaration (whether or not
> it has a BOM).

It all depends on the interpretation of the term "UTF-16" in clause 4.3.3:

# Entities encoded in UTF-16 must begin with the Byte Order Mark [...].

The issue is whether "UTF-16" means only the charset so named in RFC 2781,
or whether, in the context of the XML Rec, it is a generic term covering
all three charsets named there.

I myself agree with you: UTF-16BE and UTF-16LE should be supported if the
appropriate encoding declaration is present.
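
To make the two readings concrete, here is a minimal sketch of how a
checker might apply the BOM rule under each of them.  The function
name and the "generic_reading" flag are hypothetical, and this is only
an illustration of the interpretive question, not a normative test:

```python
BOMS = (b"\xfe\xff", b"\xff\xfe")  # big-endian and little-endian BOMs

def violates_bom_rule(declared: str, data: bytes, generic_reading: bool) -> bool:
    """Return True if the entity breaks the 'must begin with a BOM' rule.

    generic_reading=True treats "UTF-16" in the clause as covering
    UTF-16, UTF-16BE and UTF-16LE; False treats it as only the charset
    literally labelled "UTF-16".
    """
    label = declared.upper()
    if generic_reading:
        covered = label in ("UTF-16", "UTF-16BE", "UTF-16LE")
    else:
        covered = label == "UTF-16"
    # The rule bites only if the label is covered and no BOM is present.
    return covered and not data.startswith(BOMS)
```

Under the narrow reading, a BOM-less entity declared as UTF-16BE does
not violate the rule; under the generic reading it does.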

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)

Received on Wednesday, 5 April 2000 16:19:30 UTC