
Re: Tests for Encoding spec

From: 신정식 <jshin1987+w3@gmail.com>
Date: Sun, 18 Oct 2015 00:53:36 -0700
Message-ID: <CAE1ONj9DmJx5ivd9-EYAZMZwZfJABrWF6m9HL3Xa-B7CRzptvA@mail.gmail.com>
To: Richard Ishida <ishida@w3.org>
Cc: www International <www-international@w3.org>, Anne van Kesteren <annevk@annevk.nl>
On Fri, Oct 16, 2015 at 12:30 PM, Jungshik SHIN (신정식) <jshin1987+w3@gmail.com> wrote:

> I was surprised to see > 200 failures for EUC-KR encoding in Chrome
> because Chrome's copy of ICU has EUC-KR table automatically generated from
> the encoding spec's index file for EUC-KR.
>
>
> http://www.w3.org/International/tests/repo/run?base=encoding&batch=encoding-dbl-byte&test=legacy-mb-korean/euc-kr/euckr-encode.html
>
> Virtually all of them are due to NFC normalization performed by Chrome at
> some point. For instance, U+2126 is normalized to U+03A9 before being
> encoded to EUC-KR. Most of the other failures are due to CJK Compatibility
> characters being mapped to their corresponding canonical characters. (Chrome
> bug: https://code.google.com/p/chromium/issues/detail?id=544242)
>
> The same is true of Shift_JIS failures (23 out of 24).
>


How are you testing encoding (Unicode -> character encoding)? My test with a
form submission (below) with U+03A9 and U+2126 indicates that Chrome does
not apply NFC to them before converting to EUC-KR. Try the following page:
submit the form and look at the resulting URL.

http://www.i18nl10n.com/chrome/euckr_form.html


Virtually all the encoding errors in your test results for EUC-KR,
Shift_JIS, and Big5 are due to the way your test is conducted. The only
exceptions are a couple of cases where Chrome's table has not been updated
per recent spec changes, e.g. U+2022 in Shift_JIS, mentioned in my previous
email.

With a form submission test (which is most relevant for 'encoding'), those
code points would be encoded per spec.
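
The difference between the two test paths can be reproduced outside a browser. The Python sketch below is only an illustration, not the harness behind the results page: it assumes Python's built-in `euc_kr` codec as a stand-in for the spec's EUC-KR index (the two may not agree on every code point), and the helper names `nfc`, `changed_by_nfc` and `percent_encode` are made up for this example. It shows which code points NFC would rewrite before the encoder ever sees them, and what the percent-escaped bytes look like with and without that step.

```python
import unicodedata
from urllib.parse import quote


def nfc(text: str) -> str:
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)


def changed_by_nfc(text: str) -> bool:
    """True if NFC would alter text before it reaches the encoder."""
    return nfc(text) != text


def percent_encode(text: str, encoding: str = "euc_kr") -> str:
    """Encode text with a legacy codec and percent-escape the bytes.

    Unreserved ASCII characters stay literal; all other bytes become %XX.
    The codec here is Python's, assumed to approximate the spec's index.
    """
    return quote(text, safe="", encoding=encoding)


if __name__ == "__main__":
    # U+2126 OHM SIGN, U+03A9 GREEK CAPITAL LETTER OMEGA,
    # U+F900 (a CJK Compatibility Ideograph).
    for ch in ("\u2126", "\u03a9", "\uf900"):
        print(f"U+{ord(ch):04X} changed by NFC: {changed_by_nfc(ch)}")
        for label, candidate in (("as-is", ch), ("after NFC", nfc(ch))):
            try:
                print(f"  {label}: {percent_encode(candidate)}")
            except UnicodeEncodeError:
                print(f"  {label}: not encodable with this codec")
```

If the "as-is" and "after NFC" escapes differ for a code point, a test that normalizes its input somewhere along the way will report an encoder failure even though the encoder itself follows the table.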

Jungshik


>
> The first one (U+2022) : Chrome's table was not updated to cover the
> following new change in the spec:
>  https://encoding.spec.whatwg.org/#shift_jis
>
>    If code point is U+2022, set it to U+FF0D.
>
> I'll update our SJIS table.
>
> BTW, the following summary is incorrect:
>
>
>    1. sjis-encode: Total characters tested 7,326. Firefox fails for 1,
>    *Opera and Safari* for 24. Edge fails because the test doesn't work in
>    that browser. (Characters are not converted to percent-escapes in the
>    href attribute.)
>
>
> 'Opera and Safari for 24' should be 'Chrome and Opera for 24'.
>
> Jungshik
>
>
>
>
> On Fri, Oct 16, 2015 at 8:32 AM, Richard Ishida <ishida@w3.org> wrote:
>
>> fyi, i just published two pages pointing to Encoding spec tests:
>>
>> 1. http://www.w3.org/International/tests/repo/results/encoding-sb-dec
>> moves pre-existing tests to our new i18n test framework, but also adds
>> some changes to koi8-u and a new test for koi8-ru, to conform to the latest
>> Encoding spec text.  I also drafted the results for the major desktop
>> browsers.  Apart from support for koi8-u, there have been many improvements
>> since the last time the test results were recorded.
>>
>> 2. http://www.w3.org/International/tests/repo/results/encoding-dbl-byte
>> these are tests for some double-byte encodings.  In some cases the test
>> needs some attention still, so the results are so far tentative.
>>
>> we are working on producing more tests, and would welcome any offers to
>> help.  There are a few tests provided by Anne & co for which we don't yet
>> display results, but we will try to do so. However, we also need to develop
>> more.
>>
>> ri
>>
>>
>
Received on Sunday, 18 October 2015 07:54:04 UTC
