[whatwg] Character-encoding-related threads from Simon Pieters on 2012-02-14 (public-whatwg-archive@w3.org from February 2012)

From: Simon Pieters <simonp@opera.com>
Date: Tue, 14 Feb 2012 07:54:18 +0100
Message-ID: <op.v9npgssqidj3kv@device-23f190>

On Mon, 13 Feb 2012 18:22:13 +0100, Ian Hickson <ian at hixie.ch> wrote:

>> I think this is like saying that requiring <!DOCTYPE HTML> is an undue
>> burden on authors...
>
> It is. You may recall we tried really hard to make it shorter. At the end
> of the day, however, "<!DOCTYPE HTML>" is the best we could do.

It is a burden, but not a significant one.

>> In practice, authors who don't declare their encoding can silence the
>> validator by using entities for their non-ASCII characters, but they
>> will still get bitten by encoding problems as soon as they want to
>> submit forms or resolve URLs with %-escaped stuff in the query
>> component, and so forth, so it seems to me authors would be better off
>> if we said that the encoding cruft is required cruft just like the
>> doctype cruft.
>
> Hm, that's an interesting point. Can we make a list of features that rely
> on the character encoding and have the spec require an encoding if any of
> those are used?
>
> If the list is long or includes anything that it's unreasonable to expect
> will not be used in most Web pages, then we should remove this particular
> "hole" in the conformance criteria.

The list may well be longer (I haven't checked), but I don't think that
matters. The URL-resolution problem is bad because links stop working for
users whose default encoding differs from the author's, so those users
leave for a competitor's site. The form problem is bad because the
database fills up with content in various encodings with no record of
which is which. By the time the author realizes this and "fixes" it by
declaring the encoding, it is already too late: the data is broken and
very hard to repair.
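
A minimal sketch of both failure modes, using only the Python standard
library. The helper name escape_query, the sample string "café", and the
choice of windows-1252 as the "other" default encoding are assumptions made
for illustration; none of them come from the thread. The code only shows how
the same characters turn into different bytes depending on which encoding is
assumed.

```python
from urllib.parse import quote


def escape_query(value: str, page_encoding: str) -> str:
    """Percent-escape a query value using the bytes of the given encoding.

    Hypothetical helper: it stands in for a browser escaping non-ASCII
    characters in a URL query according to the page's encoding.
    """
    return quote(value, safe="", encoding=page_encoding)


# URL problem: the same link text yields different query strings.
utf8_query = escape_query("café", "utf-8")
legacy_query = escape_query("café", "windows-1252")

# Form problem: bytes submitted as UTF-8 but later read as windows-1252.
stored_bytes = "café".encode("utf-8")
misread_text = stored_bytes.decode("windows-1252")

print(utf8_query)    # caf%C3%A9
print(legacy_query)  # caf%E9
print(misread_text)  # cafÃ©
```

A server that expects one of the two query strings will not match the other,
and once differently encoded submissions are mixed in one table nothing in the
stored bytes says which encoding each row used.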

Letting authors get themselves into a situation where they have broken data,
even though it could easily have been prevented, seems more like an undue
burden to me.

Note that both of these features can be hidden in scripts where validators  
currently don't even look, so I think it's not a good idea to make the  
requirement conditional on these features.

-- 
Simon Pieters
Opera Software

Received on Monday, 13 February 2012 22:54:18 UTC