Re: Bidi issues from Martin Duerst on 2003-08-07 (public-iri@w3.org from August 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 07 Aug 2003 10:55:18 -0400
To: IETF IMAA list <ietf-imaa@imc.org>, ietf-imaa@imc.org
Cc: public-iri@w3.org
Message-Id: <4.2.0.58.J.20030807103622.053204e8@localhost>

Hello Roy,

Adam provides a pretty good explanation. We have also adopted
very much the same rules for IRIs (which contain domain names).
For the current (internal) draft, see
http://www.w3.org/International/iri-edit/draft-duerst-iri.html#Bidi
and please also have a look at the examples at
http://www.w3.org/International/iri-edit/BidiExamples.

Some more details below.

[I have copied public-iri@w3.org because I'm mentioning the IRI solution]

At 02:41 03/08/07 +0000, Adam M. Costello wrote:

>Roy Badami <roy@gnomon.org.uk> wrote:

>As for IMAA, I have no doubt that some sort of bidi check is needed for
>the local part, for the same reasons it is needed for domain labels.
>And I have no doubt that the bidi check in Stringprep is sufficient and
>overkill, just as it is for domain labels.  The only difference is how
>much overkill.  Consider the address foo.bar@example.net.  Stringprep
>is applied to "example" and to "net" independently, but it is applied
>to "foo.bar" all together.  Therefore there might exist strings that
>would be valid domain names but not valid local parts.

Yes, in particular if, e.g., foo is Arabic and bar is Latin.
A rule that leaves absolutely no way to fit these two alphabets
into the local part sounds rather strict to me. But I guess Roy
and others can judge better how much pain that will cause in
practice.
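
To make the difference concrete, here is a minimal sketch of the
bidi requirements from RFC 3454, section 6, applied once to the
whole local part and once to each dot-separated piece. It relies on
Python's standard stringprep module for the RandALCat (D.1) and LCat
(D.2) tables. The function name and the sample strings are
illustrative assumptions, and the mapping, normalization and
prohibited-character steps of Stringprep are deliberately left out.

```python
import stringprep


def passes_bidi_check(s: str) -> bool:
    """Simplified RFC 3454 section 6 check (requirements 2 and 3 only).

    Hypothetical helper: it skips mapping, normalization and the
    prohibited-character tables that a full Stringprep profile applies.
    """
    has_randal = any(stringprep.in_table_d1(c) for c in s)
    if not has_randal:
        # No right-to-left characters at all: nothing to check.
        return True
    if any(stringprep.in_table_d2(c) for c in s):
        # RandALCat and LCat characters must not be mixed.
        return False
    # First and last characters must both be RandALCat.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


arabic = "\u0645\u062b\u0627\u0644"  # sample Arabic word, stands in for "foo"
latin = "bar"
local_part = arabic + "." + latin

# Checked piece by piece (as domain labels are), both pieces pass.
per_piece = [passes_bidi_check(p) for p in local_part.split(".")]

# Checked as one string (as proposed for the local part), it fails.
as_whole = passes_bidi_check(local_part)
```

Here per_piece is [True, True] while as_whole is False, which is the
case described above: a string that would be acceptable as a sequence
of domain labels but not as a local part.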


>I'm not sure--it
>depends on how ASCII dots influence the bidi algorithm.

No, it doesn't depend on the bidi algorithm, just on stringprep.


>But that might be a good thing.  The user interface might understand
>that example.net is a domain name composed of labels, and would be able
>to override the bidi algorithm if necessary to preserve the proper order
>of the labels.  But the local part is an opaque string (except possibly
>when viewed by the mail exchanger for example.net) and is therefore
>fully subject to the bidi algorithm, and therefore needs the protection
>of having Stringprep's bidi check applied to the whole thing.

Please note that currently, IDNA does not specify how to display
domain names. IMAA may do that, but I don't think it does.
For IRIs, we have decided that the only context an application
needs to know is that, overall, this is an identifier. This
simplifies implementation quite a bit. In many cases, in particular
for all-RTL cases, this knowledge is not even necessary, and even
if an IRI is not embedded in an explicit LTR context, the single
missing bit can still be guessed easily.

This is based on feedback we have received both from Israelis
and from Arabs. For example, whenever Arabs have presented
Arabic domain names, they always did it as MOC.BARA.BEW
(reversing not only the characters within each label of
web.arab.com, but also the order of the labels) rather than
BEW.BARA.MOC (reversing only the characters within each label).
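
The two display orders can be reproduced with a short sketch. It
follows the convention used above, where uppercase letters stand in
for right-to-left characters shown in visual order. Both function
names are hypothetical, and this is only a toy model of the visual
result, not an implementation of the Unicode bidi algorithm.

```python
def display_fully_reversed(name: str) -> str:
    """Reverse the whole string: label contents and label order both flip."""
    return name[::-1].upper()


def display_labels_reversed(name: str) -> str:
    """Reverse the characters inside each label, keeping label order."""
    return ".".join(label[::-1] for label in name.split(".")).upper()


logical = "web.arab.com"
preferred = display_fully_reversed(logical)    # "MOC.BARA.BEW"
alternative = display_labels_reversed(logical)  # "BEW.BARA.MOC"
```

The first form needs no knowledge of where the labels begin and end,
since it treats the identifier as a single right-to-left run.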

The solution makes it easier for people without any specific
knowledge of IRI/mail address/domain name syntax to read
these things in the right order.

The proposed display algorithm for IRIs therefore doesn't need
any internal knowledge of the IRI (or email address, or domain
name) structure, so there is no need to apply the bidi check
to the whole left-hand part if that's not deemed appropriate.


Regards,    Martin.
Received on Thursday, 7 August 2003 11:04:58 UTC