I'm a little late to this discussion, so please forgive me if I'm covering ground that has already been discussed. Focusing on advice to developers, I'd suggest replacing steps 6 and 7 in http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding with the following 3 numbered items.

- Test if the bytes are valid UTF-8. If they are, return that encoding, with the confidence <http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence> *tentative*, and abort these steps.
   - *[include note about UTF-8 patterns, maybe reworded a bit.]*
- The user agent may attempt to autodetect the character encoding *[include rest of #5]*
- Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence <http://dev.w3.org/html5/spec/Overview.html#concept-encoding-confidence> *tentative*. Due to its widespread use as a default in legacy content, windows-1252 is recommended as a default in the absence of other information.

Mark

On Sun, Oct 11, 2009 at 19:57, Ian Hickson <ian@hixie.ch> wrote:

> On 11 Oct 2009, at 18:39, Larry Masinter <masinter@adobe.com> wrote:
>
>> Can someone please explain, again, why the discussion of default
>> configurations of a particular category of user agent in various
>> regions belongs in the definition of the HyperText Markup Language?
>>
>> What benefit can any author of a web page derive, please, from
>> knowing what the default settings of various browsers in products
>> sold into various language environments?
>
> Authors aren't the only target audience of this specification. Implementors
> benefit from advice suggesting default encodings. Users benefit from
> consistency in implementations.
>
> --
> Ian Hickson

Received on Monday, 12 October 2009 04:15:27 GMT
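For illustration only, the three proposed steps in Mark's message could be sketched roughly as below. This is a minimal sketch, not spec text: the function name `determine_encoding`, the `autodetect` callback, and the choice to validate the whole byte string (rather than a limited prefix) are all assumptions made for the example. It also assumes that an autodetected encoding is returned with the confidence *tentative*, since the message only says "include rest of #5".

```python
from typing import Callable, Optional, Tuple

# Hypothetical type: a detector takes the raw bytes and returns an
# encoding label, or None if it cannot make a guess.
Detector = Callable[[bytes], Optional[str]]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8.

    Note: pure ASCII input is also valid UTF-8 and returns True.
    """
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def determine_encoding(
    data: bytes,
    autodetect: Optional[Detector] = None,
    default: str = "windows-1252",
) -> Tuple[str, str]:
    """Sketch of the proposed fallback steps; returns (encoding, confidence)."""
    # Step 1: valid UTF-8 wins, with tentative confidence.
    if is_valid_utf8(data):
        return ("utf-8", "tentative")

    # Step 2: the user agent MAY autodetect (optional, hence the callback).
    if autodetect is not None:
        guess = autodetect(data)
        if guess is not None:
            return (guess, "tentative")

    # Step 3: implementation-defined or user-specified default.
    return (default, "tentative")
```

A caller with no detector simply omits `autodetect`; for example, `determine_encoding(b"caf\xe9")` falls through to the default because a lone 0xE9 byte is not valid UTF-8.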