- From: Phillips, Addison <addison@lab126.com>
- Date: Tue, 25 Aug 2015 16:31:55 +0000
- To: Anne van Kesteren <annevk@annevk.nl>, Lina Kemmel <LKEMMEL@il.ibm.com>
- CC: Martin Dürst <duerst@it.aoyama.ac.jp>, Larry Masinter <masinter@adobe.com>, "www-international@w3.org" <www-international@w3.org>
FWIW, I tried to capture the then-state of this discussion at [1]. It handwaves about why LRM/RLM and friends are not desired in a URL: it focused on documenting what the URL presentation problem is. Last I heard there was no consensus on the approaches in 3987bis or the bidi guidelines draft and browser vendors were (more or less) ignoring the problem.

I don't think the goal was to prohibit the presence of bidi controls in URLs, but rather to allow for their absence (and clear a path to not requiring them). Many URLs are auto-generated from data, for example to support SEO. While the Unicode bidi controls could be valid to include in a path component, the bidi problem is not *inside* a given path component--it exists at the URL component boundaries--at URL-meaningful characters such as ./?&=# that are bidi neutral.

> > Formatting characters are not visible in display, so looking at an IRI
> > containing those invisible characters, one can be misled as to what
> > the real content is. One of the consequences is that one would not be
> > able to recreate the IRI content when typing it.

What's more, the presence or absence of the characters may cause false matches/mismatches at the server.

Addison

[1] https://www.w3.org/International/wiki/IRIStatus

> -----Original Message-----
> From: Anne van Kesteren [mailto:annevk@annevk.nl]
> Sent: Tuesday, August 25, 2015 7:18 AM
> To: Lina Kemmel
> Cc: Martin Dürst; Larry Masinter; www-international@w3.org
> Subject: Re: IRIs and bidirectional formatting characters
>
> On Tue, Aug 25, 2015 at 2:14 PM, Lina Kemmel <LKEMMEL@il.ibm.com>
> wrote:
> > 1. When bidi formatting characters constitute an integral part of the
> > content.
> > Formatting characters are not visible in display, so looking at an IRI
> > containing those invisible characters, one can be misled as to what
> > the real content is. One of the consequences is that one would not be
> > able to recreate the IRI content when typing it.
> > (BTW that's applicable to any UCCs, not necessarily bidi ones.)
>
> Even ASCII, right? rn vs m et al.
>
>
> > Does the RFC suggest anything to fix appearance of bidirectional IRIs
> > instead?
>
> No, this is still a matter of research. To be clear, the IETF has published
>
> https://tools.ietf.org/html/rfc3987
> https://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines
>
> on the subject, but neither is very conclusive or adopted as such by user
> agents. E.g., studying this security issue reported against Chrome might be of
> interest:
>
> https://code.google.com/p/chromium/issues/detail?id=351639
>
>
> The reason I'm working on this is
>
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641
>
> which tries to figure out what
>
> https://url.spec.whatwg.org/
>
> needs to say on the subject. I suspect the considerations mostly need to give
> advice as how to best display URLs, even with bidirectional code points going
> around. The Chrome issue so far has the best leads as to what that might be.
>
>
> --
> https://annevankesteren.nl/
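
As a rough illustration of the false match/mismatch point in this message, here is a minimal Python sketch. It is not taken from the thread or from any of the cited drafts: the function name `find_bidi_controls` and the example.com URLs are made up for the example, and the set of code points is simply the Unicode bidi formatting characters (ALM, LRM, RLM, the embedding/override controls, and the isolate controls). It shows two URL strings that differ only by an invisible RLM placed next to a "/" boundary, so they would typically look alike on screen yet compare unequal as strings.

```python
from __future__ import annotations

# Unicode bidi formatting characters: ALM, LRM, RLM, the embedding and
# override controls (LRE, RLE, PDF, LRO, RLO), and the isolate controls
# (LRI, RLI, FSI, PDI).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(url: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every bidi control found in the string.

    Hypothetical helper for illustration only; it inspects the raw string
    and does not parse the URL into components.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(url)
        if ch in BIDI_CONTROLS
    ]


# Two made-up URLs with Hebrew path segments. The second has an RLM
# (U+200F) inserted just before the "/" between the two segments.
plain = "http://example.com/\u05d0\u05d1/\u05d2\u05d3"
marked = "http://example.com/\u05d0\u05d1\u200f/\u05d2\u05d3"

print(plain == marked)             # False: the strings differ by one code point
print(find_bidi_controls(plain))   # []
print(find_bidi_controls(marked))  # [(21, 'U+200F')]
```

The sketch only detects the characters; whether a server or user agent should keep, strip, or reject them is exactly the open question discussed above.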
Received on Tuesday, 25 August 2015 16:32:17 UTC