RE: IRIs and bidirectional formatting characters

FWIW, I tried to capture the then-state of this discussion at [1]. It hand-waves about why LRM/RLM and friends are not desired in a URL; it focuses on documenting what the URL presentation problem is. Last I heard, there was no consensus on the approaches in 3987bis or the bidi guidelines draft, and browser vendors were (more or less) ignoring the problem.

I don't think the goal was to prohibit the presence of bidi controls in URLs, but rather to allow for their absence (and clear a path to not requiring them). Many URLs are auto-generated from data, for example to support SEO. While the Unicode bidi controls could be valid to include in a path component, the bidi problem is not *inside* a given path component. It exists at the URL component boundaries, at URL-meaningful characters such as ./?&=# that are bidi neutral.
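
To make the boundary point concrete, here is a rough sketch (not taken from any of the drafts) that looks up the Unicode bidi class of each of those delimiters using Python's standard unicodedata module. The names STRONG, DELIMS and bidi_classes are mine, purely for illustration. None of the delimiters has a strong direction (strictly speaking, some are classed as weak rather than neutral), so their placement on screen is decided by the text around them.

```python
import unicodedata

# Bidi classes with a strong direction: left-to-right, right-to-left, Arabic letter.
STRONG = {"L", "R", "AL"}

# The URL-meaningful delimiters mentioned above (illustrative, not exhaustive).
DELIMS = "./?&=#"


def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


classes = bidi_classes(DELIMS)
for ch, cls in classes.items():
    # A delimiter without a strong class takes its direction from its neighbours.
    print(repr(ch), cls, "strong" if cls in STRONG else "not strong")
```

Because the delimiters carry no direction of their own, a run of right-to-left text on either side can visually reorder the components around them.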

> > Formatting characters are not visible in display, so looking at an IRI
> > containing those invisible characters, one can be misled as to what
> > the real content is. One of the consequences is that one would not be
> > able to recreate the IRI content when typing it.

What's more, the presence or absence of the characters may cause false matches/mismatches at the server.
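
A small sketch of the mismatch, assuming a server that compares paths as plain strings (or as their percent-encoded form). The names BIDI_CONTROLS and strip_bidi_controls are hypothetical helpers, and the stripping is only there to show that the two strings differ by the invisible character alone; it is not a suggestion that servers should strip.

```python
from urllib.parse import quote

# Unicode bidi control characters: ALM, LRM, RLM, the embeddings/overrides
# (U+202A..U+202E) and the isolates (U+2066..U+2069).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)


def strip_bidi_controls(text):
    """Return text with all bidi control characters removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


plain = "/path/\u05d0\u05d1"          # two Hebrew letters
marked = "/path/\u200f\u05d0\u05d1"   # same letters, preceded by an invisible RLM

print(plain == marked)                         # the strings are not equal
print(quote(plain))                            # percent-encoded form without the RLM
print(quote(marked))                           # percent-encoded form with the RLM
print(strip_bidi_controls(marked) == plain)    # they differ only by the control
```

The two paths can look the same to a reader, yet a byte-wise lookup treats them as different resources.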

Addison

[1] https://www.w3.org/International/wiki/IRIStatus


> -----Original Message-----
> From: Anne van Kesteren [mailto:annevk@annevk.nl]
> Sent: Tuesday, August 25, 2015 7:18 AM
> To: Lina Kemmel
> Cc: Martin Dürst; Larry Masinter; www-international@w3.org
> Subject: Re: IRIs and bidirectional formatting characters
> 
> On Tue, Aug 25, 2015 at 2:14 PM, Lina Kemmel <LKEMMEL@il.ibm.com>
> wrote:
> > 1. When bidi formatting characters constitute an integral part of the
> > content.
> > Formatting characters are not visible in display, so looking at an IRI
> > containing those invisible characters, one can be misled as to what
> > the real content is. One of the consequences is that one would not be
> > able to recreate the IRI content when typing it.
> > (BTW that's applicable to any UCCs, not necessarily bidi ones.)
> 
> Even ASCII, right? rn vs m et al.
> 
> 
> > Does the RFC suggest anything to fix appearance of bidirectional IRIs
> > instead?
> 
> No, this is still a matter of research. To be clear, the IETF has published
> 
>   https://tools.ietf.org/html/rfc3987
>   https://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines
> 
> on the subject, but neither is very conclusive or adopted as such by user
> agents. E.g., studying this security issue reported against Chrome might be of
> interest:
> 
>   https://code.google.com/p/chromium/issues/detail?id=351639
> 
> The reason I'm working on this is
> 
>   https://www.w3.org/Bugs/Public/show_bug.cgi?id=27641
> 
> which tries to figure out what
> 
>   https://url.spec.whatwg.org/
> 
> needs to say on the subject. I suspect the considerations mostly need to give
> advice as how to best display URLs, even with bidirectional code points going
> around. The Chrome issue so far has the best leads as to what that might be.
> 
> 
> --
> https://annevankesteren.nl/

Received on Tuesday, 25 August 2015 16:32:17 UTC