RE: [HTML5] 2.8 Character encodings

To be specific about my advice on handling of charset:

>> What the document should say, rather than having a 'willful' 
>> misinterpretation, is that ISO-8859-1 means ISO-8859-1, but that for 
>> backward compatibility with existing (broken) web content, HTTP 
>> interpreting agents SHOULD treat characters outside of the ISO-8859-1 
>> repertoire as if they were in Windows-1252.

>That's exactly what it says, as far as I can tell. Could you elaborate on 
>exactly what text in the spec you are objecting to? Maybe I don't 
>understand your request.

My request:

The definition of "charset" in the HTML 4.01 specification
is much more legible and understandable than the current
draft's language, which is opaque. Readopting most of the
text of HTML 4.01 section 5.2 would be a great improvement
in legibility.

Remove the tables in 2.7 Character Encodings from
the body of the specification and put them into a separate
document or appendix, "Browser Implementation Compatibility
Guide", which begins with wording to the effect:

 "For compatibility with legacy content that is deployed
 on the web to varying degrees, the following implementation
 advice is provided. Conforming HTML interpreters MAY apply
 these equivalences, but conforming HTML generators and editing
 tools MUST NOT rely on these mappings. Over time, it
 is expected that use of incorrect charset labels will decrease."
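
To illustrate the split I have in mind, here is a minimal Python
sketch. It is not proposed spec text. The table LEGACY_OVERRIDES and
the function names are hypothetical, and the only equivalence shown is
the ISO-8859-1 to Windows-1252 case discussed above. It assumes a
lenient interpreter may apply the override while a generator encodes
strictly under the label it declares:

```python
# Hypothetical compatibility table (illustrative, not spec text):
# declared charset label -> encoding a lenient interpreter actually applies.
LEGACY_OVERRIDES = {"iso-8859-1": "windows-1252"}


def effective_encoding(label: str, lenient: bool) -> str:
    """Return the encoding to use for a declared charset label."""
    normalized = label.strip().lower()
    if lenient:
        # Interpreters MAY apply the legacy equivalence.
        return LEGACY_OVERRIDES.get(normalized, normalized)
    # Strict reading: ISO-8859-1 means ISO-8859-1.
    return normalized


def decode_body(data: bytes, label: str, lenient: bool = True) -> str:
    """Decode bytes under the declared label; undefined bytes become U+FFFD."""
    return data.decode(effective_encoding(label, lenient), errors="replace")


def encode_body(text: str, label: str) -> bytes:
    """Generator side: encode strictly under the declared label."""
    # Raises UnicodeEncodeError if the text does not fit the declared
    # charset, so a generator cannot lean on the interpreter's override.
    return text.encode(effective_encoding(label, lenient=False))
```

With these definitions, the bytes 0x93 0x94 labelled ISO-8859-1 decode
to curly quotes in lenient mode and to C1 control characters in strict
mode, while a generator that tries to emit curly quotes under an
ISO-8859-1 label fails instead of silently producing Windows-1252.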

Other wording around "willful violation" should be replaced
with advice on how incompatibility should be reduced in the
future.

Larry
-- 
http://larry.masinter.net

Received on Friday, 31 July 2009 06:00:06 UTC