Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

John Cowan, Tue, 7 Jun 2011 01:52:54 -0400:
> Bjoern Hoehrmann scripsit:
> 
>> Anyone who wants the BOM to take precedence over the HTTP Content-Type
>> header, or the charset parameter within it, is welcome to make an I-D
>> to that effect that updates RFC 2616 and RFC 4288 and possibly others.
>> Trying to sneak in such changes through backdoors is unacceptable. So,
>> if "HTML5" has rules as you suggest, that is most likely an error.
> 
> I fully expect that by 2017 HTML5 will have defined its own version of
> Unicode, its own version of MIME, its own version of HTTP, and its own
> version of TCP/IP.  Compatibility with anything else will no longer be
> an issue.

Your exegesis of what XML 1.0 says about second-guessing (note the 
fragment URI) when there is external encoding information would be 
very welcome: [1]

]]
F.2 Priorities in the Presence of External Encoding Information
The second possible case occurs when the XML entity is accompanied by 
encoding information, as in some file systems and some network 
protocols. When multiple sources of information are available, their 
relative priority and the preferred method of handling conflict should 
be specified as part of the higher-level protocol used to deliver XML. 
In particular, please refer to [IETF RFC 3023] or its successor, which 
defines the text/xml and application/xml MIME types and provides some 
useful guidance. In the interests of interoperability, however, the 
following rule is recommended.
	*	If an XML entity is in a file, the Byte-Order Mark and encoding 
declaration are used (if present) to determine the character encoding.
[[
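
A minimal sketch of the file-case rule quoted above, in Python, may help 
make it concrete. It is an illustration only, not the algorithm of any 
particular parser: the function name sniff_file_encoding and the UTF-8 
fallback are assumptions, and it simplifies the spec's appendix F by 
letting a BOM decide outright and by reading the encoding declaration 
only when the entity starts with ASCII-compatible bytes.

```python
import codecs
import re

# BOM signatures. The UTF-32 entries come first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Encoding declaration at the very start of an ASCII-compatible entity,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_file_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of an XML entity read from a file.

    Hypothetical helper: BOM first, then the encoding declaration,
    then the assumed default. No external (HTTP/MIME) information
    is consulted, which is exactly the file case of the quoted rule.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

Note that this covers only the case where no higher-level protocol 
supplies a charset. What should happen when HTTP does supply one is the 
question under dispute in this thread, and the sketch takes no position 
on it.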

[1] http://www.w3.org/TR/xml/#sec-guessing-with-ext-info
-- 
Leif Halvard Silli

Received on Tuesday, 7 June 2011 14:40:46 UTC