- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 29 Aug 2014 12:55:02 +0100
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www International <www-international@w3.org>, Anne van Kesteren <annevk@annevk.nl>, Philippe Le Hegaret <plh@w3.org>
On 29/08/2014 11:51, "Martin J. Dürst" wrote:
> On 2014/08/28 18:59, Richard Ishida wrote:
>> On 28/08/2014 10:25, "Martin J. Dürst" wrote:
>
>>> On 2014/08/24 01:39, Richard Ishida wrote:
>
>>>> Those of you who saw that page before should note that the results are
>>>> now slightly different. I haven't tracked down the cause, but I suspect
>>>> that silent codepoint changes in my editor were to blame for the initial
>>>> discrepancies.
>
> The differences between your earlier version of the tests and your later
> version of the tests can be explained that way.
>
>>> I have tried to find such a case. I found that for windows-1253, my
>>> tests give "expected "U+FFFD" but got "ª" (U+00AA)" for 0xAA, but Chrome
>>> is listed green for windows-1253 (incl. aliases) at
>>> http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases.
>>>
>>> My version of Chrome is "37.0.2062.94 m". I also found 8 errors for my
>>> tests on windows-874.
>
> This difference (i.e. between my tests and your later version of the
> tests) still remains unexplained.
There are no correspondences listed in the Encoding index for those 8
bytes. The file
http://encoding.spec.whatwg.org/index-windows-874.txt has two adjacent
lines:
90 0x0E3A ฺ (THAI CHARACTER PHINTHU)
95 0x0E3F ฿ (THAI CURRENCY SYMBOL BAHT)
and the file ends at pointer 123. Together these leave 8 pointers (91 to 94
and 124 to 127, i.e. bytes 0xDB to 0xDE and 0xFC to 0xFF) for which no
correspondence is listed.
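A minimal JavaScript sketch of how to list such gaps mechanically, assuming the index text has already been loaded into a string in the format of index-windows-874.txt (pointer, code point, name on each line; lines starting with # are comments). The function name missingBytes is hypothetical. It relies on the single-byte rule that pointer = byte - 0x80, so pointers 0 to 127 cover bytes 0x80 to 0xFF.

```javascript
// Hypothetical helper: report the bytes whose pointers have no entry in a
// single-byte index. Assumes the index is a string in the
// "pointer<TAB>code point<TAB>name" format; "#" lines are comments.
function missingBytes(indexText) {
  const present = new Set();
  for (const line of indexText.split("\n")) {
    const trimmed = line.trim();
    // Skip blank lines and the comment header.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const pointer = parseInt(trimmed.split(/\s+/)[0], 10);
    if (!Number.isNaN(pointer)) present.add(pointer);
  }
  const missing = [];
  // Single-byte indexes cover pointers 0..127, i.e. bytes 0x80..0xFF.
  for (let pointer = 0; pointer < 128; pointer++) {
    if (!present.has(pointer)) {
      missing.push("0x" + (pointer + 0x80).toString(16).toUpperCase());
    }
  }
  return missing;
}
```

Run against the full index-windows-874.txt, this should report 0xDB to 0xDE and 0xFC to 0xFF, which are exactly the cells in question.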
Your test files include cells for the bytes missing from the index, e.g.
<td class='test' title='0xDB->U+FFFD'></td>
<td class='test' title='0xDC->U+FFFD'></td>
<td class='test' title='0xDD->U+FFFD'></td>
<td class='test' title='0xDE->U+FFFD'></td>
and
<td class='test' title='0xFC->U+FFFD'></td>
<td class='test' title='0xFD->U+FFFD'></td>
<td class='test' title='0xFE->U+FFFD'></td>
<td class='test' title='0xFF->U+FFFD'></td>
Their title attributes indicate that you expect U+FFFD as the result, but
the textContent of each td is some other character rather than U+FFFD. I
don't know where you got those characters from.
In my test files, I use � (U+FFFD) as the textContent of the td, e.g.
"<td class='test' title='0xdb->U+FFFD'>�</td>"+
"<td class='test' title='0xdc->U+FFFD'>�</td>"+
"<td class='test' title='0xdd->U+FFFD'>�</td>"+
"<td class='test' title='0xde->U+FFFD'>�</td>"+
and that appears to match the behaviour of the browsers.
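For illustration, a simplified JavaScript sketch of the single-byte decoder described in the Encoding Standard shows why U+FFFD is the expected textContent: a byte whose pointer has no index entry is an error, which in replacement mode becomes U+FFFD. The function name decodeSingleByte and the Map-based index are my own assumptions, and the excerpt below contains only the two windows-874 lines quoted earlier.

```javascript
// Simplified sketch of a single-byte decoder in replacement mode.
// Assumes the index is a Map from pointer (0..127) to code point.
function decodeSingleByte(bytes, index) {
  let out = "";
  for (const byte of bytes) {
    if (byte < 0x80) {
      // ASCII bytes decode to themselves.
      out += String.fromCodePoint(byte);
      continue;
    }
    const codePoint = index.get(byte - 0x80);
    // No entry for this pointer: emit U+FFFD REPLACEMENT CHARACTER.
    out += codePoint === undefined ? "\uFFFD" : String.fromCodePoint(codePoint);
  }
  return out;
}

// Excerpt of index-windows-874: pointers 90 and 95 only; 91 to 94 are absent.
const win874Excerpt = new Map([
  [90, 0x0E3A], // THAI CHARACTER PHINTHU
  [95, 0x0E3F], // THAI CURRENCY SYMBOL BAHT
]);

// 0xDA -> pointer 90, 0xDB -> pointer 91 (unmapped), 0xDF -> pointer 95.
const decoded = decodeSingleByte([0xDA, 0xDB, 0xDF], win874Excerpt);
// decoded === "\u0E3A\uFFFD\u0E3F"
```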
I think the windows-1253 problem you mention results from the same
circumstance. The Encoding index file has no line for pointer 42 (byte 0xAA):
41 0x00A9 © (COPYRIGHT SIGN)
43 0x00AB « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
Your file says:
<td class='test' title='0xAA->U+FFFD'>ª</td>
Does that solve the mystery?
RI
Received on Friday, 29 August 2014 11:55:36 UTC