Re: HTML5 template: required <meta charset="UTF-8">

On Oct 16, 2012, at 1:33 AM, Øyvind Stenhaug wrote:

> On Mon, 15 Oct 2012 20:11:27 +0200, Gérard Talbot <css21testsuite@gtalbot.org> wrote:
> 
>> Le Lun 15 octobre 2012 13:30, Ian Hickson a écrit :
>>> On Mon, 15 Oct 2012, "Gérard Talbot" wrote:
>>>> 
>>>> I believe <meta charset="UTF-8"> is required in HTML5 documents.
>>> 
>>> It's also possible (and IMHO preferred) to just put the character
>>> encoding declaration in the MIME type.
>> 
>> Yes, it is.
>> 
>> But several HTML editors will use the appropriate encoding when they read
>> <meta charset="UTF-8">, e.g. BlueFish 2.2.3. Otherwise, the operating
>> system's default charset or the editor's predefined charset setting may
>> be used. Since creating and submitting tests is definitely an
>> international effort, we should try to minimize sources of errors and
>> sources of incompatibility at design time, at source-coding time.
> 
> I think it should be up to the submitter to deal with this correctly; any potential issues should hopefully be caught in review. The test format page currently says
> 
> "The preferred submission format for CSSWG tests is either XHTML or HTML5, in UTF-8."
> 
> and
> 
> "When using any characters beyond the ASCII set, in any encoding, the character encoding must be specified properly per the specification of the source format."
> 
> This seems sufficient (there is also a mention further down about .htaccess being the way to set HTTP headers).

Yes, it is possible to set specific HTTP headers for individual files, but that is meant for cases where a test requires a specific header, not as the default way to specify a file's character set.

> 
>> Also, if/when documents are checked by conformance checkers or
>> validators (add-on validator), they will report the missing charset.
>> E.g. the Firefox 16.0.1 Error Console reports it. It says incorrect
>> characters will be displayed if the document contains characters outside
>> US-ASCII.
> 
> I think tests (and test resources) should be validated from the server anyway, not locally or by file upload. There could be HTTP headers that matter, e.g. Content-Type overrides <meta charset>.

Note that in addition to the tests possibly being run locally, there are tools running on the server that process the test files, so being able to determine the character set of a file without relying on HTTP headers is important. There's no point in requiring tests to be run from a server (except those testing server interactions, of course); some clients may not be able to access a server.
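
As a rough illustration of what "determining the character set without HTTP headers" could look like, here is a minimal Python sketch. The function name declared_encoding and the 1024-byte window are my own assumptions for the example. It is not the code of any existing server tool, and it is much simpler than the encoding sniffing algorithm in the HTML specification.

```python
import codecs
import re
from typing import Optional

# Hypothetical patterns for this sketch only; a real tool would need a proper parser.
_XML_ENCODING = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9_.:-]+)["\']', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding a file declares for itself, or None if it declares none."""
    # A UTF-8 byte order mark counts as an in-file declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only look near the start of the file (assumed window size).
    head = data[:1024]
    for pattern in (_XML_ENCODING, _META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

For example, declared_encoding(b'<meta charset="UTF-8">') returns "utf-8", while a file with no BOM, XML declaration or meta element yields None, which is the case where a tool would be left guessing.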

> 
> Finally, consider tests for things like http://www.w3.org/TR/CSS21/syndata.html#charset - some of these would even *require* the absence of <meta charset> and/or @charset.

Yes, some do require that, but that doesn't mean it should be the default behavior for tests that don't require it.


The rules are:
1) If the test does not require any characters beyond ASCII, don't use anything beyond ASCII. In this case it is not necessary to specify the encoding, but it doesn't hurt to do so.
2) If you do use characters beyond ASCII, specify the encoding in the file. The only exception is a test of encoding handling that needs the encoding to be left unspecified (or specified incorrectly) in order to function. In that case, avoid non-ASCII in the metadata, or put the metadata in a sidecar file that specifies the appropriate encoding itself.
3) If the test requires non-ASCII and the specific encoding doesn't matter for the test, use UTF-8.
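
Rules 1 and 2 could be sketched as a small check along these lines. This is only an illustration under my own assumptions: the function name, its two flags and the returned messages are made up for the example, and the caller is expected to already know whether the file declares its encoding and whether the test is about encoding handling. Rule 3 is a matter of judgment and is not checked here.

```python
def check_encoding_rules(data: bytes, declares_encoding: bool,
                         tests_encoding_handling: bool = False) -> str:
    """Classify a test file against rules 1 and 2 (hypothetical helper)."""
    # Rule 1: pure ASCII needs no declaration (bytes.isascii needs Python 3.7+).
    if data.isascii():
        return "ok: ASCII only, declaration optional"
    # Exception to rule 2: tests of encoding handling may omit the declaration.
    if tests_encoding_handling:
        return "ok: encoding test, keep metadata ASCII or use a sidecar file"
    # Rule 2: non-ASCII content must declare its encoding in the file.
    if not declares_encoding:
        return "error: non-ASCII content without an in-file encoding declaration"
    return "ok: non-ASCII content with an in-file declaration"
```

Calling it with "é".encode("utf-8") and declares_encoding=False gives the "error" result, which is the situation a reviewer should catch.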

Received on Tuesday, 16 October 2012 17:13:23 UTC