- From: Richard Ishida <ishida@w3.org>
- Date: Thu, 28 Aug 2014 10:59:08 +0100
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www International <www-international@w3.org>, Anne van Kesteren <annevk@annevk.nl>, Philippe Le Hegaret <plh@w3.org>
On 28/08/2014 10:25, "Martin J. Dürst" wrote:
> Hello Richard,
>
> On 2014/08/24 01:39, Richard Ishida wrote:
>> I have just uploaded a set of tests for the Encoding specification. They
>> assess whether browsers support single-byte encodings as described in
>> the Encoding spec[1] indexes, both for preferred encoding labels and
>> aliases.
>>
>> See the results at
>> http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases.
>>
>> As usual you can link to the tests from there.
>>
>> I incorporated a lot of useful work done by Martin Dürst (thanks
>> Martin!) into the test format, but ended up rewriting the tests to
>> provide the flexibility I needed for extending them to the alias labels.
>
> If I can incorporate that into my stuff to help you, please let me know
> (your previous mail mentioned 'four additional lines'; my guess is this
> is more).

Thanks Martin. Yes, this is pretty much a complete rewrite, and it
produces a single result per test, rather than one result per character.

I just incorporated a couple of things from your version - mainly the
comparison script and the markup structure for the table and its CSS -
although the latter is now stored as a javascript file that is shared
among all aliases for a given encoding (which are themselves generated
by a small script).

That javascript file is generated directly from Anne's indexes, and the
fact that it's a separate file means that (a) I don't need to touch them
with an editor, (b) I'm able to edit the other parts of the test file
without touching them (eg. to update metadata, etc), and (c) I only have
to generate the same number of files as there are encodings, rather than
that plus one for all of the aliases.

>
>> There is another results page at
>> http://www.w3.org/International/tests/repository/encoding/indexes/results-indexes
>>
>> that shows just the preferred labels, but it has less information about
>> partial passes.
>
> What do you consider a 'partial pass'? Does it mean that the label is
> recognized but some codepoints are not transcoded correctly?

Yes. Or, put another way, some (usually most) of the codepoints are
decoded as expected, but not all. The details about what are not matched
are given in the test page and summarised on the results page.

>
>> Those of you who saw that page before should note that the results are
>> now slightly different. I haven't tracked down the cause, but I suspect
>> that silent codepoint changes in my editor were to blame for the initial
>> discrepancies.
>
> I have tried to find such a case. I found that for windows-1253, my
> tests give "expected "U+FFFD" but got "ª" (U+00AA)" for 0xAA, but Chrome
> is listed green for windows-1253 (incl. aliases) at
> http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases.
> My version of Chrome is "37.0.2062.94 m". I also found 8 errors for my
> tests on windows-874.
>
> I strongly recommend to not use an editor that silently changes
> codepoints for work with tests.

Yes, I agree. I didn't initially realise that my editor (TextWrangler)
was doing that.

> Maybe I should go as far as suggesting
> to not use *any* editor for tests like these, because one never can be
> sure. The tests have to use the actual codepoints, even in the C0 range
> (except for 0x00 and 0x0D, which aren't testable).

Yes. See above where I mention that I generate the crucial bit
automatically.

Cheers,
RI
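
A minimal sketch of the kind of byte-by-byte check discussed above, written against the Encoding API (TextDecoder) that these tests exercise. This is not Richard's actual harness: the function name and the shape of the `index` argument (an array of 128 entries for pointers 0x00-0x7F, i.e. bytes 0x80-0xFF, each a code point number or null, as in the spec's JSON indexes) are assumptions made for illustration.

```js
// Sketch only: compare a browser's decoder for a given label against the
// expected code points from an Encoding spec index for the upper byte range.
function checkSingleByteDecoder(label, index) {
  const failures = [];
  const decoder = new TextDecoder(label); // throws if the label is not recognised
  for (let byte = 0x80; byte <= 0xFF; byte++) {
    const actual = decoder.decode(new Uint8Array([byte]));
    const pointer = byte - 0x80;
    // An unmapped pointer is expected to decode to U+FFFD.
    const expected = index[pointer] == null
      ? '\uFFFD'
      : String.fromCodePoint(index[pointer]);
    if (actual !== expected) {
      failures.push({ byte, expected, actual });
    }
  }
  // An empty list is a full pass; a short list of mismatches corresponds
  // to what the results pages call a "partial pass".
  return failures;
}
```

As Martin's windows-1253 example above suggests, byte 0xAA has no mapping in that index, so U+FFFD is expected there; a decoder returning U+00AA instead would show up in a check like this as a partial pass.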
Received on Thursday, 28 August 2014 09:59:42 UTC