RE: IRIs and bidirectional formatting characters

> > This was a (perhaps "the") main sticking point for IRI and it's thorny: there is
> > no obvious solution for all use cases, just sets of compromises or potential
> > things we could try to enforce.
> 
> I added something to this effect to the standard:
> 
>   https://url.spec.whatwg.org/#url-rendering
> 
> https://github.com/whatwg/url/commit/d1152b94a16ae91e1f72d128fd5ef589635f0e7c
> 

I think that's a good start, although it doesn't explain what the problem actually is. I understand why you don't want to pollute the standard with extravagant illustrations. If the I18N WG published a suitable reference (a Note, or the update to [1] that we need to do at some point anyway), would that help?

A couple additional points:

You have a suggestion for presentation in the browser address bar. Since that *is* a special environment, perhaps a bit more could be added to the suggestion. For example, you might recommend that the display behave as if each of the path, query, and fragment started with FSI (U+2068), and as if each sequence of characters from the 'userinfo encode set' were preceded by PDI (U+2069) and followed by FSI (U+2068). (Bidi experts: do I have that right? Or would we want LRI (U+2066)?) Although the display would differ from the plain-text presentation, it would not be confusable.
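To make the address-bar idea a little more concrete, here is a minimal Python sketch. The names display_form, isolate_runs and DELIMITERS are hypothetical, and the delimiter set is an assumption: a handful of ASCII URL delimiters rather than the full 'userinfo encode set', since that encode set as defined also covers every code point above U+007E and would otherwise treat Arabic or Hebrew letters as delimiters. Instead of emitting FSI at the start of a component and PDI/FSI around every delimiter run, it wraps each run of non-delimiter text in FSI ... PDI, which should have the same effect without producing empty isolates. The output is a display-only string, never the URL to navigate to or copy.

```python
from urllib.parse import urlsplit

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Hypothetical, ASCII-only delimiter set (an assumption for this sketch,
# not the URL Standard's 'userinfo encode set').
DELIMITERS = frozenset("/?#&=;:@")


def isolate_runs(component: str) -> str:
    """Wrap each maximal run of non-delimiter characters in FSI ... PDI,
    leaving the delimiters themselves outside any isolate."""
    out = []
    run = []
    for ch in component:
        if ch in DELIMITERS:
            if run:
                out.append(FSI + "".join(run) + PDI)
                run = []
            out.append(ch)
        else:
            run.append(ch)
    if run:
        out.append(FSI + "".join(run) + PDI)
    return "".join(out)


def display_form(url: str) -> str:
    """Build a display-only string for an address bar (hypothetical helper).
    Scheme and host are left untouched; path, query and fragment are isolated.
    Note: an empty query ('?' with nothing after it) is dropped in this sketch."""
    parts = urlsplit(url)
    shown = ""
    if parts.scheme:
        shown += parts.scheme + ":"
    if parts.netloc:
        shown += "//" + parts.netloc
    shown += isolate_runs(parts.path)
    if parts.query:
        shown += "?" + isolate_runs(parts.query)
    if parts.fragment:
        shown += "#" + isolate_runs(parts.fragment)
    return shown
```

For example, display_form("https://example.com/\u0639\u0631\u0628\u064a/page?x=1#top") keeps every "/", "?", "=" and "#" outside the isolates, so an RTL path segment cannot visually swallow or reorder its neighbours; stripping FSI and PDI from the result gives back the original string.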

I think the note is a little weak. It currently says:

--
Unfortunately, as rendered URLs are simply strings and can appear anywhere, producing a specific bidirectional algorithm for URLs would unlikely see wide adoption and therefore it is better to embrace how they are rendered today.
--

Suggest:

--
Unfortunately, as URLs are simply strings and can appear anywhere, producing a specific bidirectional algorithm for URLs would be unlikely to see wide adoption. Bidirectional text interacts with the parts of a URL in ways that can cause the presentation to be different from what is actually encoded. Users of bidirectional languages are thus cautioned that this is to be expected, particularly in plain text environments.
--

> >
> > I think that's oversimplifying.  In the case of rn vs. m and so on,
> > with a clear font and sufficient size it is at least possible to see
> > the difference.  With formatting characters, it is not possible _by
> > definition_ to see them, any more than control characters and so on.
> 
> That is a good point. In the above commit I attempted to address this by
> mentioning one such case.

Adding the Latin/Cyrillic 'a' example is good, but maybe also add a reference to UTR #36 (Unicode Security Considerations)? While that report focuses mostly on domain names, many of its considerations also apply to URLs in general.
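To illustrate why neither case can be caught by eye, here is a small Python sketch (the function name suspicious_code_points is hypothetical) that lists every non-ASCII code point in a string together with its Unicode name, and marks format characters (general category Cf, which includes the bidi embedding, override and isolate controls). It is only an inspection aid under that assumption, not an implementation of the mixed-script or confusable checks that UTR #36 discusses.

```python
import unicodedata


def suspicious_code_points(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for each non-ASCII code point.
    Format characters (category Cf) are flagged, since they never render."""
    report = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            if unicodedata.category(ch) == "Cf":
                name += " (invisible format character)"
            report.append((i, f"U+{ord(ch):04X}", name))
    return report
```

With a Cyrillic 'а' (U+0430) in the host and a RIGHT-TO-LEFT OVERRIDE (U+202E) in the path, e.g. suspicious_code_points("https://ex\u0430mple.com/\u202Eabc"), both show up by name even though on screen the first looks like a Latin 'a' and the second is not visible at all. An all-ASCII URL yields an empty list.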

Addison

[1] https://www.w3.org/TR/charmod-resid/

Received on Wednesday, 26 August 2015 18:36:47 UTC