Re: Tests for Encoding spec from 신정식 on 2015-10-20 (www-international@w3.org from October to December 2015)

From: 신정식 <jshin1987+w3@gmail.com>
Date: Tue, 20 Oct 2015 01:19:50 -0700
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>
Message-ID: <CAE1ONj_=+U3g2NWD=W2rJQzgng_TDSfvbN+=NW3Jnit0GcoJ-Q@mail.gmail.com>

On Mon, Oct 19, 2015 at 11:45 AM, Jungshik SHIN (신정식) <jshin1987+w3@gmail.com> wrote:

>
>
> On Mon, Oct 19, 2015 at 5:27 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>
>> On Mon, Oct 19, 2015 at 2:03 PM, Richard Ishida <ishida@w3.org> wrote:
>> > 1. i'd be happy to change the mechanism for identifying the output of
>> > encoding if i knew how.  The problem, it seems to me, with generating
>> > form submissions is that if you are not looking at the percent
>> > escapes themselves (ie. comparing within the document, by which time
>> > the form submission parameter has been converted to Unicode) you are
>> > reliant on decoding to work for encoding results to be reliable.
>> > It's ok to check the odd character visually by checking the web
>> > address bar, but how to do that for tens of thousands of characters?
>> > I'd be very happy to know if you have a suggestion.
>>
>> If you use application/x-www-form-urlencoded (the default) there will
>> be no Unicode involved. Just percent-encoded bytes. So if you have
>> something on the server that doesn't decode for you, you should be
>> able to get at the raw bytes the browser used to encode.
>>
>>
>>
> Richard, you can look at how Blink/WebKit's layout tests handle this
> issue:
>
>
> https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/LayoutTests/fast/encoding/char-encoding.html
>
> The test checks only a handful of code points, but I guess it can be
> expanded to cover all the code points. Anyway, it can be a starting point.
>
>
>
>> > 2. i suspect that it's actually important for the mechanism of
>> > converting to href values to work too, so i think that this may still
>> > be something that needs fixing.  If what goes into the href value is
>> > not what the user expected, then that is presumably problematic.
>>
>> Yeah, both should definitely work in the end. Everything needs to
>> become predictable for developers.
>>
>
> I agree. After sending my last email, I took a look at Richard's test and
> found that out. I'll find out where href handling goes wrong in Chrome and
> try to fix it.
>

In the JS console of Chrome's DOM Inspector, everything is fine (no NFC
applied):

> var a=document.createElement("a")
undefined
> a
<a></a>
> a.href="https://example.com/?x" + "樂樂" + "x"
"https://example.com/?x樂樂x"
> a.search.substr(1)
"x%E6%A8%82%EF%A4%94x"
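
To compare such output byte by byte rather than by eye, the percent escapes
can be turned back into raw bytes. The following is only a minimal sketch:
percentEncodedToBytes is a hypothetical helper name, not part of any browser
API, and it assumes that every unescaped character in the value is ASCII.

```javascript
// Hypothetical helper: convert a percent-encoded value into raw byte values.
// Assumption: any character that is not part of a %XX escape is plain ASCII.
function percentEncodedToBytes(value) {
  const bytes = [];
  for (let i = 0; i < value.length; i++) {
    const pair = value.slice(i + 1, i + 3);
    if (value[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(pair)) {
      bytes.push(parseInt(pair, 16)); // one escaped byte
      i += 2;                         // skip the two hex digits
    } else {
      bytes.push(value.charCodeAt(i)); // literal ASCII character
    }
  }
  return bytes;
}

// The value reported by a.search.substr(1) in the console session above.
const bytes = percentEncodedToBytes("x%E6%A8%82%EF%A4%94x");

const hex = bytes
  .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
  .join(" ");
console.log(hex); // 78 E6 A8 82 EF A4 94 78

// Decode the bytes as UTF-8 and list the code points that were encoded.
const decoded = new TextDecoder("utf-8").decode(new Uint8Array(bytes));
const codePoints = Array.from(
  decoded,
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" ")); // U+0078 U+6A02 U+F914 U+0078
```

The two ideographs come out as distinct code points (U+6A02 and U+F914),
which is what "no NFC applied" means here.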

It's also fine when the document encoding is UTF-8 (the two characters above
keep their separate identities instead of being folded into one).

However, when the document encoding is EUC-KR, the distinction between them
is lost, apparently because they're subject to NFC.
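
As a rough illustration of why NFC would erase the distinction (this is not
a reproduction of the Chrome behaviour, only a sketch using the standard
String.prototype.normalize and encodeURIComponent; the variable names are
mine): U+F914 is a CJK compatibility ideograph whose canonical decomposition
is the unified ideograph U+6A02, so normalizing it yields the same bytes as
the first character.

```javascript
// The two characters from the console session, written as escapes so that
// they cannot be silently normalized in transit.
const unified = "\u6A02"; // CJK unified ideograph
const compat = "\uF914";  // CJK compatibility ideograph, canonically equivalent

// Without normalization the two percent-encode (as UTF-8) differently.
const plainUnified = encodeURIComponent(unified);
const plainCompat = encodeURIComponent(compat);
console.log(plainUnified); // %E6%A8%82
console.log(plainCompat);  // %EF%A4%94

// After NFC the compatibility ideograph is replaced by the unified one,
// so both characters end up with identical bytes.
const nfcCompat = compat.normalize("NFC");
console.log(nfcCompat === unified);         // true
console.log(encodeURIComponent(nfcCompat)); // %E6%A8%82
```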

I've just filed a Chrome bug:
https://code.google.com/p/chromium/issues/detail?id=545383

Jungshik


Received on Tuesday, 20 October 2015 08:20:29 UTC