I'm a little late to this discussion, so please forgive me if I'm covering
ground people have already discussed.
But focusing on advice to developers, I'd suggest replacing steps 6 and 7 in
http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
with the following three numbered items.
- Test if the bytes are valid UTF-8. If they are, return that
encoding, with the confidence
<http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence>
*tentative*, and abort these steps.
- *[include note about UTF-8 patterns, maybe reworded a bit.]*
- The user agent may attempt to autodetect the character encoding *[include
rest of #5]*
- Otherwise, return an implementation-defined or user-specified default
character encoding, with the confidence
<http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence>
*tentative*. Due to its widespread use as a default in legacy content,
windows-1252 is recommended as a default in the absence of other
information.
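
To make the proposed ordering concrete, here is a rough sketch in Python. It
is only an illustration of the three items above, not spec text: the names
sniff_encoding, is_valid_utf8 and the autodetect callback are hypothetical,
and a real user agent would run this on a prefix of the stream, where a
multi-byte sequence cut off at the end of the buffer needs extra care.

```python
from typing import Callable, Optional, Tuple

# Default suggested in the proposal above for legacy content.
DEFAULT_ENCODING = "windows-1252"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def sniff_encoding(
    data: bytes,
    autodetect: Optional[Callable[[bytes], Optional[str]]] = None,
    default: str = DEFAULT_ENCODING,
) -> Tuple[str, str]:
    """Return (encoding, confidence) following the three proposed items."""
    # Item 1: valid UTF-8 wins, with tentative confidence.
    if is_valid_utf8(data):
        return ("utf-8", "tentative")
    # Item 2: optional, implementation-specific autodetection
    # (hypothetical callback; returns None when it has no guess).
    if autodetect is not None:
        guess = autodetect(data)
        if guess is not None:
            return (guess, "tentative")
    # Item 3: fall back to the implementation-defined or user default.
    return (default, "tentative")
```

Note that pure ASCII input is also valid UTF-8, so under this ordering it
would be reported as UTF-8 rather than falling through to the default.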
Mark
On Sun, Oct 11, 2009 at 19:57, Ian Hickson <ian@hixie.ch> wrote:
> On 11 Oct 2009, at 18:39, Larry Masinter <masinter@adobe.com> wrote:
>
>> Can someone please explain, again, why the discussion of default
>> configurations of a particular category of user agent in various
>> regions belongs in the definition of the HyperText Markup Language?
>>
>> What benefit can any author of a web page derive, please, from
>> knowing what the default settings of various browsers in products
>> sold into various language environments?
>>
>
> Authors aren't the only target audience of this specification. Implementors
> benefit from advice suggesting default encodings. Users benefit from
> consistency in implementations.
>
> --
> Ian Hickson
>
>