- From: 신정식 <jshin1987+w3@gmail.com>
- Date: Tue, 20 Oct 2015 01:19:50 -0700
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>
- Message-ID: <CAE1ONj_=+U3g2NWD=W2rJQzgng_TDSfvbN+=NW3Jnit0GcoJ-Q@mail.gmail.com>
On Mon, Oct 19, 2015 at 11:45 AM, Jungshik SHIN (신정식) <jshin1987+w3@gmail.com> wrote:

> On Mon, Oct 19, 2015 at 5:27 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>
>> On Mon, Oct 19, 2015 at 2:03 PM, Richard Ishida <ishida@w3.org> wrote:
>> > 1. i'd be happy to change the mechanism for identifying the output of
>> > encoding if i knew how. The problem, it seems to me, with generating form
>> > submissions is that if you are not looking at the percent escapes themselves
>> > (ie. comparing within the document, by which time the form submission
>> > parameter has been converted to Unicode) you are reliant on decoding to work
>> > for encoding results to be reliable. It's ok to check the odd character
>> > visually by checking the web address bar, but how to do that for tens of
>> > thousands of characters? I'd be very happy to know if you have a
>> > suggestion.
>>
>> If you use application/x-www-form-urlencoded (the default) there will
>> be no Unicode involved. Just percent-encoded bytes. So if you have
>> something on the server that doesn't decode for you, you should be
>> able to get at the raw bytes the browser used to encode.
>
> Richard, you can look at how Blink/WebKit's layout tests handle this issue:
>
> https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/LayoutTests/fast/encoding/char-encoding.html
>
> The test checks only a handful of code points, but I guess it can be
> expanded to cover all the code points. Anyway, it can be a starting point.
>
>> > 2. i suspect that it's actually important for the mechanism of converting to
>> > href values to work too, so i think that this may still be something that
>> > needs fixing. If what goes into the href value is not what the user
>> > expected, then that is presumably problematic.
>>
>> Yeah, both should definitely work in the end. Everything needs to
>> become predictable for developers.
>
> I agree.
> After sending my last email, I took a look at Richard's test and
> found that out. I'll find out where href went wrong in Chrome and try to
> fix it.

In Chrome's DOM Inspector JS console, everything is fine (no NFC applied):

    > var a=document.createElement("a")
    undefined
    > a
    <a></a>
    > a.href="https://example.com/?x" + "樂樂" + "x"
    "https://example.com/?x樂樂x"
    > a.search.substr(1)
    "x%E6%A8%82%EF%A4%94x"

It's also fine when the document encoding is UTF-8: the two characters above (U+6A02 and the compatibility ideograph U+F914, per the percent-encoded output) do not lose their 'identity' by being folded into one. However, in EUC-KR, the distinction between them is lost, apparently because they're subject to NFC.

I've just filed a Chrome bug: https://code.google.com/p/chromium/issues/detail?id=545383

Jungshik

> Jungshik
>
>> --
>> https://annevankesteren.nl/
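The folding described in this message can be reproduced outside the browser. The following is a minimal Python sketch, not Chrome's code path: it percent-encodes the same two characters as UTF-8, once as-is and once after NFC, to show how normalization collapses them. The helper name `percent_encode_utf8` and the variable names are illustrative assumptions; the EUC-KR case from the bug report is not reproduced here.

```python
import unicodedata
from urllib.parse import quote

# The two characters from the console session, written as escapes so
# that no editor or mail client can normalize them silently.
UNIFIED = "\u6a02"  # CJK unified ideograph U+6A02
COMPAT = "\uf914"   # CJK compatibility ideograph U+F914

def percent_encode_utf8(text: str) -> str:
    """Percent-encode every non-unreserved character as UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    return quote(text, safe="", encoding="utf-8")

query = "x" + UNIFIED + COMPAT + "x"

# Without normalization, the two code points stay distinct.
raw = percent_encode_utf8(query)

# NFC replaces U+F914 with its canonical equivalent U+6A02,
# so the distinction between the two characters is lost.
folded = percent_encode_utf8(unicodedata.normalize("NFC", query))

print(raw)     # x%E6%A8%82%EF%A4%94x
print(folded)  # x%E6%A8%82%E6%A8%82x
```

Comparing the two percent-encoded strings, rather than the decoded text, is the same idea as inspecting the raw bytes of a form submission: the check does not depend on any decoder working correctly.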
Received on Tuesday, 20 October 2015 08:20:29 UTC