Re: Auto-detect and encodings in HTML5

On 1 Jun 2009, at 19:37, Larry Masinter wrote:

> New behavior: IF you see, say, <doctype html5> THEN assume default charset
> is UTF8, rather than applying heuristics to guess charset.

See it how? You need to have already decoded the byte stream in order
to see such a string.
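
To illustrate the circularity, here is a rough Python sketch. The
sees_doctype helper and the 1024-byte window are my own assumptions for
the example, not anything a browser or the spec defines. It looks for the
ASCII bytes of a doctype in the raw input: that works when the encoding
happens to be ASCII-compatible, but fails for UTF-16, so the check itself
already presumes something about the encoding.

```python
# Hypothetical helper for illustration only; not a real browser algorithm.
MARKER = b"<!doctype html"


def sees_doctype(raw: bytes) -> bool:
    """Return True if the ASCII bytes of MARKER occur in the first 1024 bytes."""
    # bytes.lower() only lowercases ASCII letters, which is all we need here.
    return MARKER in raw[:1024].lower()


page = "<!DOCTYPE html><title>x</title>"

utf8_bytes = page.encode("utf-8")       # ASCII-compatible: marker bytes intact
utf16_bytes = page.encode("utf-16-le")  # every character is followed by 0x00

found_in_utf8 = sees_doctype(utf8_bytes)
found_in_utf16 = sees_doctype(utf16_bytes)
```

Under these assumptions found_in_utf8 is True and found_in_utf16 is
False, although both byte strings encode the same document.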

> Yes, supplying explicit charset is preferable, but what would break
> if such a new rule were supplied?

The problem is that any HTML 5 content served as text/html without an
explicit charset will be treated as Windows-1252 by all existing user
agents and as UTF-8 by new ones. That inconsistency will cause problems,
because people tend to test in only one browser and assume that whatever
works there works everywhere.
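
As a small sketch of what that inconsistency looks like in practice,
the Python below decodes the same bytes both ways. The sample text is
made up, and "cp1252" is Python's codec name for Windows-1252.

```python
# The same bytes on the wire, as an author saving in UTF-8 would produce them.
text = "café"
raw = text.encode("utf-8")  # b"caf\xc3\xa9"

# What a user agent defaulting to Windows-1252 would display.
as_windows_1252 = raw.decode("cp1252")

# What a user agent defaulting to UTF-8 would display.
as_utf8 = raw.decode("utf-8")

print(as_windows_1252)  # cafÃ©
print(as_utf8)          # café
```

An author who only tests in the second kind of user agent would never
see the mangled form.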

Geoffrey Sneddon

Received on Monday, 1 June 2009 21:10:08 UTC