- From: 신정식 <jshin1987+w3@gmail.com>
- Date: Wed, 21 Oct 2015 23:38:47 -0700
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>
- Message-ID: <CAE1ONj8=v7P1B14au8GjjoJzfeEsRmF85pQ46x6dQOU9zapSfA@mail.gmail.com>
On Tue, Oct 20, 2015 at 1:19 AM, Jungshik SHIN (신정식) <jshin1987+w3@gmail.com> wrote:

> On Mon, Oct 19, 2015 at 11:45 AM, Jungshik SHIN (신정식) <jshin1987+w3@gmail.com> wrote:
>
>> On Mon, Oct 19, 2015 at 5:27 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>>
>>> On Mon, Oct 19, 2015 at 2:03 PM, Richard Ishida <ishida@w3.org> wrote:
>>> > 1. I'd be happy to change the mechanism for identifying the output of
>>> > encoding if I knew how. The problem, it seems to me, with generating
>>> > form submissions is that if you are not looking at the percent escapes
>>> > themselves (i.e. comparing within the document, by which time the form
>>> > submission parameter has been converted to Unicode), you are reliant on
>>> > decoding working for the encoding results to be reliable. It's OK to
>>> > check the odd character visually in the address bar, but how do you do
>>> > that for tens of thousands of characters? I'd be very happy to hear any
>>> > suggestion you have.
>>>
>>> If you use application/x-www-form-urlencoded (the default) there will
>>> be no Unicode involved. Just percent-encoded bytes. So if you have
>>> something on the server that doesn't decode for you, you should be
>>> able to get at the raw bytes the browser used to encode.
>>>
>> Richard, you can look at how Blink/WebKit's layout tests handle this issue:
>>
>> https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/LayoutTests/fast/encoding/char-encoding.html
>>
>> The test checks only a handful of code points, but I guess it can be
>> expanded to cover all the code points. Anyway, it can be a starting point.
>>
>>> > 2. I suspect that it's actually important for the mechanism of
>>> > converting to href values to work too, so I think that this may still
>>> > be something that needs fixing. If what goes into the href value is not
>>> > what the user expected, then that is presumably problematic.
>>>
>>> Yeah, both should definitely work in the end. Everything needs to
>>> become predictable for developers.
>>>
>> I agree. After sending my last email, I took a look at Richard's test and
>> found that out. I'll find out where href went wrong in Chrome and try to
>> fix it.
>>
> In Chrome's DOM Inspector JS console, everything is fine (no NFC applied):
>
> var a = document.createElement("a")
> undefined
> a
> <a></a>
> a.href = "https://example.com/?x" + "樂樂" + "x"
> "https://example.com/?x樂樂x"
> a.search.substr(1)
> "x%E6%A8%82%EF%A4%94x"
>
> It's also fine when the document encoding is UTF-8 (the two characters
> above do not lose their 'identity' by being folded into one).
>
> However, in EUC-KR, the distinction between them is lost, apparently
> because they're subject to NFC.
>
> I've just filed a Chrome bug:
> https://code.google.com/p/chromium/issues/detail?id=545383

This bug was fixed. With Chrome's canary ('nightly') build, EUC-KR encoding
now passes 100%. SJIS and EUC-JP encoding each failed on only one code point
(which will be fixed when I update Chrome's mapping table per the latest
spec). Big5 encoding failed on only 4 code points (ditto with SJIS/EUC-JP).

Jungshik
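The lost distinction described above can be reproduced without a browser encoder: U+F914 (樂, a CJK Compatibility Ideograph) has a canonical singleton decomposition to U+6A02 (樂), so an NFC pass folds the two characters together before any percent-encoding takes place. A minimal sketch in plain JavaScript (runnable in Node or a browser console; the variable names are illustrative, not from the thread):

```javascript
// U+6A02: the ordinary ideograph; U+F914: its CJK Compatibility Ideograph.
const ordinary = "\u6A02";
const compat = "\uF914";

// NFC maps the compatibility ideograph onto the ordinary one, which is
// why applying NFC before encoding erases the distinction between them.
console.log(compat.normalize("NFC") === ordinary); // true

// Without normalization, the two code points percent-encode (as UTF-8)
// to different byte sequences, matching the console transcript above:
console.log(encodeURIComponent(ordinary)); // "%E6%A8%82"
console.log(encodeURIComponent(compat));   // "%EF%A4%94"
```

Note this only illustrates the UTF-8 case; the EUC-KR bug involved the legacy encoder normalizing its input first, which the spec does not call for.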
Received on Thursday, 22 October 2015 06:39:17 UTC