RE: [HTML5] 2.8 Character encodings from Larry Masinter on 2009-07-31 (public-html@w3.org from July 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Thu, 30 Jul 2009 22:43:35 -0700
To: Anne van Kesteren <annevk@opera.com>, Ian Hickson <ian@hixie.ch>, "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
CC: HTML WG <public-html@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118D81808BF@nambx04.corp.adobe.com>

I said:

> What the document should say, rather than having a 'willful'
> misinterpretation, is that ISO-8859-1 means ISO-8859-1, but that
> for backward compatibility with existing (broken) web content,
> HTTP interpreting agents SHOULD treat characters outside of the
> ISO-8859-1 repertoire as if they were in Windows-1252.

and Anne replied:

> The document already says this. (Though it is a MUST, not SHOULD.)

There is a world of difference between normative requirements
and implementation advice. 

The character equivalence tables in section 2.7 should
be scrapped, or put in a separate "legacy content compatibility
guide".
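
To make the kind of table under discussion concrete, here is a
minimal sketch in Python of a label override applied before
decoding. It is an illustration only, not the specification's
table: the names LABEL_OVERRIDES and decode_legacy are
hypothetical, only two entries are shown, and "cp949" is assumed
here as Python's codec name for windows-949.

```python
# Hypothetical override table: declared charset label -> codec actually used.
# Only the two pairs mentioned in this thread are listed.
LABEL_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "euc-kr": "cp949",  # assumed Python codec name for windows-949
}


def decode_legacy(data: bytes, declared_label: str) -> str:
    """Decode bytes using the override for the declared label, if any."""
    label = declared_label.strip().lower()
    codec = LABEL_OVERRIDES.get(label, label)
    # Bytes with no mapping in the chosen codec become U+FFFD
    # instead of raising an exception.
    return data.decode(codec, errors="replace")


if __name__ == "__main__":
    # 0x93 and 0x94 are C1 control codes in ISO-8859-1 but curly
    # quotation marks in Windows-1252.
    print(decode_legacy(b"\x93quoted\x94", "ISO-8859-1"))
```

Under this sketch, content labeled ISO-8859-1 that contains the
bytes 0x93 and 0x94 decodes to curly quotation marks rather than
to control characters. Whether such behavior is a normative
requirement or implementation advice is the question at issue.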

Broken legacy content *does* disappear, and we should be
building a Hypertext Markup Language in which the normative
conformance requirements are restricted to those that
will be useful even in controlled environments.

 
The advice is for the few "public browser" implementors
who feel compelled to increase their compatibility
with existing web sites from 67% to 67.2% of existing
content by supporting odd, broken character
transformations.

What percentage of web sites serve windows-949 content
mislabeled as EUC-KR, for this to be a MUST requirement in HTML5?

The "copy/paste" use case where broken content makes
its way into new web pages and web applications does
not apply.

The charset equivalence tables do not apply anyway
to browsers which do not support the charsets for
which equivalents are supplied.

If HTML5 only requires two charsets, then requiring
support for equivalence tables is nonsensical.


>> IMHO, the willful disregard for compatibility with other
>> specifications in the current specification reflects a consistent
>> error in judgment.

> From the above it is unclear to me whether you 
> understood the specification.

From the above it is unclear to me whether you
understood the word judgment.

Larry

Received on Friday, 31 July 2009 05:44:26 UTC