[whatwg] Character-encoding-related threads from Leif Halvard Silli on 2012-02-13 (public-whatwg-archive@w3.org from February 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 13 Feb 2012 21:48:10 +0100
Message-ID: <20120213214810084738.75206fa0@xn--mlform-iua.no>

Anne van Kesteren, Mon Feb 13 12:02:53 PST 2012:
> On Mon, 13 Feb 2012 20:46:57 +0100, Anne van Kesteren wrote:

>> The list starts with <a> and the moment you do not use UTF-8 (or UTF-16,  
>> but you really shouldn't) you can run into problems. I wonder how  
>> controversial it is to just require UTF-8 and not accept anything else.

Hear, hear!

> I guess one could argue that <a> is already captured by the requirements  
> around URL validation. That would leave <form> and potentially some  
> script-related features. It still seems sensible to me to flag everything  
> that is not labeled as UTF-8,

Indeed. Such a step would require HTML5-compliant authoring tools to 
default to UTF-8. It would also positively affect validators: they 
would have to give "mild" advice about the simplest way to use UTF-8. 
(E.g. if the page is US-ASCII, or US-ASCII with entities, then a 
simple move suffices: just add an encoding declaration.) It is likely 
to have many, many positive side effects.
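
To make the "mild" advice idea concrete, here is a minimal sketch in 
Python. The function name utf8_advice and the message strings are 
hypothetical, made up for illustration; this is not how any existing 
validator is known to work, and encoding-label matching is simplified 
to the single label "utf-8".

```python
from typing import Optional


def utf8_advice(raw: bytes, declared: Optional[str]) -> str:
    """Return a short, hypothetical advice string for a page.

    raw      -- the undecoded bytes of the page
    declared -- the encoding label found for the page, or None
    """
    # Already labeled as UTF-8: nothing to flag.
    if declared is not None and declared.strip().lower() == "utf-8":
        return "ok: page is labeled as UTF-8"

    # Pure US-ASCII (possibly with entities) is byte-for-byte valid
    # UTF-8, so only the declaration has to change.
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        pass
    else:
        return "pure US-ASCII: declare utf-8, no re-encoding needed"

    # Non-ASCII bytes that still decode as UTF-8 also need only a label.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "re-save the page as UTF-8, then declare utf-8"
    return "bytes are already valid UTF-8: declare utf-8"
```

For example, utf8_advice(b"<p>caf&eacute;</p>", None) takes the 
US-ASCII branch, since the entity keeps the bytes within ASCII.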

> but if we want something intermediate we  
> could start by flagging non-UTF-8 pages that use <form> and maybe obsolete  
> <form accept-charset> or obsolete any other value than utf-8 (I filed a  
> bug on that feature already to at least restrict it to a single value).

Going the full way, i.e. flagging all pages regardless of <form>, 
seems the simplest and best.
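
The difference between the intermediate option and the full one can 
be sketched as a single switch. This is only an illustration in 
Python: FormFinder, flag_page and the forms_only parameter are 
hypothetical names, the messages are invented, and label matching is 
again simplified to "utf-8".

```python
from html.parser import HTMLParser
from typing import List, Optional


class FormFinder(HTMLParser):
    """Counts <form> elements and collects non-utf-8 accept-charset values."""

    def __init__(self) -> None:
        super().__init__()
        self.form_count = 0
        self.bad_accept_charsets: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "form":
            return
        self.form_count += 1
        for name, value in attrs:
            if name == "accept-charset" and (value or "").strip().lower() != "utf-8":
                self.bad_accept_charsets.append(value or "")


def flag_page(markup: str, declared: Optional[str], forms_only: bool) -> List[str]:
    """Return hypothetical validator messages for one page.

    forms_only=True  -- intermediate option: flag non-UTF-8 pages with <form>
    forms_only=False -- full option: flag every page not labeled as UTF-8
    """
    finder = FormFinder()
    finder.feed(markup)
    finder.close()

    is_utf8 = (declared or "").strip().lower() == "utf-8"
    messages = []
    if not is_utf8 and (finder.form_count > 0 or not forms_only):
        messages.append("page is not labeled as UTF-8")
    for value in finder.bad_accept_charsets:
        messages.append(f"accept-charset={value!r} is not utf-8")
    return messages
```

With forms_only=False the <form> check drops out of the first rule 
entirely, which is the sense in which the full option is simpler.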
-- 
Leif H Silli

Received on Monday, 13 February 2012 12:48:10 UTC