Re: IRIs and bidirectional formatting characters

On Tue, Aug 25, 2015 at 04:18:23PM +0200, Anne van Kesteren wrote:
> On Tue, Aug 25, 2015 at 2:14 PM, Lina Kemmel <LKEMMEL@il.ibm.com> wrote:
> > 1. When bidi formatting characters constitute an integral part of the
> > content.
> > Formatting characters are not visible in display, so looking at an IRI
> > containing those invisible characters, one can be misled as to what the
> > real content is. One of the consequences is that one would not be able
> > to recreate the IRI content when typing it.
> > (BTW that's applicable to any UCCs, not necessarily bidi ones.)
> 
> Even ASCII, right? rn vs m et al.

I think that's oversimplifying.  In the case of rn vs. m and so on,
with a clear font and sufficient size it is at least possible to see
the difference.  With formatting characters, it is not possible _by
definition_ to see them, any more than it is possible to see control
characters.
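A minimal Python sketch of that distinction. It assumes "formatting
characters" can be approximated by the Unicode general category Cf,
which includes the bidi controls such as U+200F RIGHT-TO-LEFT MARK.
The function name and the example IRIs are hypothetical. Letters like
"r", "n" and "m" are ordinary visible characters (category Ll) and are
not flagged, while the invisible mark is:

```python
import unicodedata

def invisible_format_chars(iri: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each format (Cf) character in iri.

    Approximation: treats Unicode general category Cf as "formatting
    characters"; this covers the bidi controls but also others (e.g. ZWSP).
    """
    hits = []
    for i, ch in enumerate(iri):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

# Hypothetical IRIs: the second hides U+200F RIGHT-TO-LEFT MARK after "a".
plain = "http://example.com/abc"
marked = "http://example.com/a\u200fbc"

print(plain == marked)                                  # False, yet they can render identically
print(invisible_format_chars(marked))                   # [(20, 'U+200F', 'RIGHT-TO-LEFT MARK')]
print(invisible_format_chars("http://example.com/rn"))  # [] -- "rn" is visible, if confusable
```

A check like this can only detect the invisible class; the rn/m
class is a visual-similarity problem that no per-character category
test will catch, which is one reason to keep the two apart.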

The various confusion cases are already hard enough without the added
complication of lumping them all together as though they were one
problem.  Since we can distinguish between these different classes,
it's probably wisest to keep the distinction in mind.

> No, this is still a matter of research. To be clear, the IETF has published
> 
>   https://tools.ietf.org/html/rfc3987
>   https://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines

There's no question that IRIs are a mess in this respect, and the
topic isn't helped by the fairly low engagement at the IETF from
people concerned about i18n.  The IETF could actually use some help in
this area, so if anyone has any spare cycles (ha!), I have some
mailing lists to suggest.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com

Received on Wednesday, 26 August 2015 04:40:09 UTC