- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 29 Aug 2014 12:55:02 +0100
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www International <www-international@w3.org>, Anne van Kesteren <annevk@annevk.nl>, Philippe Le Hegaret <plh@w3.org>
On 29/08/2014 11:51, "Martin J. Dürst" wrote:
> On 2014/08/28 18:59, Richard Ishida wrote:
>> On 28/08/2014 10:25, "Martin J. Dürst" wrote:
>
>>> On 2014/08/24 01:39, Richard Ishida wrote:
>
>>>> Those of you who saw that page before should note that the results are
>>>> now slightly different. I haven't tracked down the cause, but I suspect
>>>> that silent codepoint changes in my editor were to blame for the initial
>>>> discrepancies.
>
> The differences between your earlier version of the tests and your later
> version of the tests can be explained that way.
>
>>> I have tried to find such a case. I found that for windows-1253, my
>>> tests give "expected "U+FFFD" but got "ª" (U+00AA)" for 0xAA, but Chrome
>>> is listed green for windows-1253 (incl. aliases) at
>>> http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases.
>>>
>>> My version of Chrome is "37.0.2062.94 m". I also found 8 errors for my
>>> tests on windows-874.
>
> This difference (i.e. between my tests and your later version of the
> tests) still remains unexplained.

There are no correspondences listed in the Encoding index for those 8 codepoints. The file http://encoding.spec.whatwg.org/index-windows-874.txt has two adjacent lines:

    90 0x0E3A ฺ (THAI CHARACTER PHINTHU)
    95 0x0E3F ฿ (THAI CURRENCY SYMBOL BAHT)

and ends at line 123. This leaves an overall gap of 8 lines for which no correspondence is listed.

Your test files provided lines for those missing from the index, eg.

    <td class='test' title='0xDB->U+FFFD'></td>
    <td class='test' title='0xDC->U+FFFD'></td>
    <td class='test' title='0xDD->U+FFFD'></td>
    <td class='test' title='0xDE->U+FFFD'></td>

and

    <td class='test' title='0xFC->U+FFFD'></td>
    <td class='test' title='0xFD->U+FFFD'></td>
    <td class='test' title='0xFE->U+FFFD'></td>
    <td class='test' title='0xFF->U+FFFD'></td>

and indicate that you expect to get U+FFFD as a result, but other characters appear as the textContent of the td, rather than U+FFFD. I don't know where you got those characters from.

In my test files, I use � for the textContent of the td, eg.

    "<td class='test' title='0xdb->U+FFFD'>�</td>"+
    "<td class='test' title='0xdc->U+FFFD'>�</td>"+
    "<td class='test' title='0xdd->U+FFFD'>�</td>"+
    "<td class='test' title='0xde->U+FFFD'>�</td>"+

and that appears to match the behaviour of the browsers.

I think the windows-1253 problem you mention results from the same circumstance. The Encoding index file has no line for pointer 42.

    41 0x00A9 © (COPYRIGHT SIGN)
    43 0x00AB « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)

Your file says:

    <td class='test' title='0xAA->U+FFFD'>ª</td>

Does that solve the mystery?

RI
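The link between the index gaps and the test cells discussed in the message above can be sketched in Python. This is only an illustration, not the script used to build either set of tests. It assumes the single-byte rule of the Encoding Standard that pointer p corresponds to byte 0x80 + p. The helper names (parse_index, unmapped_bytes, expected_cell) are hypothetical, and the two-line excerpts stand in for the full index files, with the glyph column left out.

```python
# Sketch: find pointers that have no entry in an Encoding index excerpt and
# convert them to the bytes a test should expect to decode to U+FFFD.
# Assumption: for single-byte encodings, byte = 0x80 + pointer.

# Excerpts copied from the message; the real files have many more lines.
WINDOWS_874_EXCERPT = """\
90 0x0E3A (THAI CHARACTER PHINTHU)
95 0x0E3F (THAI CURRENCY SYMBOL BAHT)
"""

WINDOWS_1253_EXCERPT = """\
41 0x00A9 (COPYRIGHT SIGN)
43 0x00AB (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
"""


def parse_index(text):
    """Return {pointer: code point} for each non-comment, non-empty line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pointer, code_point = line.split()[:2]
        entries[int(pointer)] = int(code_point, 16)
    return entries


def unmapped_bytes(entries, first, last):
    """Bytes for the pointers in [first, last] that have no index entry."""
    return [0x80 + p for p in range(first, last + 1) if p not in entries]


def expected_cell(byte):
    """Test cell for an unmapped byte, with U+FFFD as its text content."""
    return "<td class='test' title='0x%02X->U+FFFD'>\ufffd</td>" % byte


w874 = parse_index(WINDOWS_874_EXCERPT)
w1253 = parse_index(WINDOWS_1253_EXCERPT)

gap_874 = unmapped_bytes(w874, 90, 95)     # pointers 91..94
# The message says the index ends at 123, so pointers 124..127 are absent.
tail_874 = unmapped_bytes(w874, 124, 127)
gap_1253 = unmapped_bytes(w1253, 41, 43)   # pointer 42

print([hex(b) for b in gap_874 + tail_874])
print([hex(b) for b in gap_1253])
print(expected_cell(gap_874[0]))
```

Under that assumption the windows-874 gaps come out as bytes 0xDB to 0xDE and 0xFC to 0xFF, and the windows-1253 gap as 0xAA, which are the same bytes named in the title attributes quoted in the message.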
Received on Friday, 29 August 2014 11:55:36 UTC