
[whatwg] Character-encoding-related threads

From: Simon Pieters <simonp@opera.com>
Date: Mon, 13 Feb 2012 08:38:25 +0100
Message-ID: <op.v9lwubnfidj3kv@device-23f190>
On Sat, 11 Feb 2012 00:44:22 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Wed, 7 Dec 2011, Henri Sivonen wrote:
>>
>> I believe I was implementing exactly what the spec said at the time I
>> implemented that behavior of Validator.nu. I'm particularly convinced
>> that I was following the spec, because I think it's not the optimal
>> behavior. I think pages that don't declare their encoding should always
>> be non-conforming even if they only contain ASCII bytes, because that
>> way templates created by English-oriented (or lorem ipsum -oriented)
>> authors would be caught as non-conforming before non-ASCII text gets
>> filled into them later. Hixie disagreed.
>
> I think it puts an undue burden on authors who are just writing small
> files with only ASCII. 7-bit clean ASCII is still the second-most used
> encoding on the Web (after UTF-8), so I don't think it's a small thing.
>
> http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html
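
The "ASCII-only" case being discussed here is mechanical: a page is 7-bit clean when every byte is below 0x80, which means it decodes identically under UTF-8, windows-1252, and every other ASCII-compatible encoding. A minimal sketch of the check (hypothetical helper name, Python):

```python
def is_seven_bit_clean(data: bytes) -> bool:
    """True if the byte stream contains only ASCII bytes (< 0x80),
    i.e. it decodes the same under any ASCII-compatible encoding."""
    return all(b < 0x80 for b in data)

print(is_seven_bit_clean(b"Hello, world"))          # True
print(is_seven_bit_clean("H\u00e9llo".encode("utf-8")))  # False: 0xC3 0xA9
```

This is the property that makes an undeclared encoding "safe" today and silently unsafe the moment non-ASCII text is filled into the template.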

I think this is like saying that requiring <!DOCTYPE HTML> is an undue  
burden on authors who are just writing small files that don't use CSS or  
happen not to be affected by any quirk.

In practice, authors who don't declare their encoding can silence the  
validator by using entities for their non-ASCII characters. But they will  
still get bitten by encoding problems as soon as they want to submit  
forms, resolve URLs with %-escaped characters in the query component, and  
so forth. So it seems to me authors would be better off if we said that  
the encoding cruft is required cruft, just like the doctype cruft.
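
To illustrate the %-escaping point (a minimal sketch using Python's standard library; the character chosen is arbitrary): the same non-ASCII character percent-encodes to different byte sequences depending on which encoding is assumed, so a page that never declared its encoding can produce different query strings depending on what the browser guessed.

```python
from urllib.parse import quote

# "e-acute" escapes differently depending on the assumed page encoding:
# UTF-8 encodes it as two bytes, windows-1252 as one.
value = "\u00e9"
print(quote(value, encoding="utf-8"))         # %C3%A9
print(quote(value, encoding="windows-1252"))  # %E9
```

Whichever side of that mismatch the server expects, the other one is a bug the author only discovers once non-ASCII data shows up.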

-- 
Simon Pieters
Opera Software
Received on Sunday, 12 February 2012 23:38:25 UTC