Re: Bidi IRI with a Bidi TLD

   Hello, Adil!

I tend to like your proposal, but I have not yet formed a definitive 
opinion. However, I can already say that your third rule seems too 
restrictive to me.
You wrote:
 3. The characters of a registered domain MUST match the Unicode bidi 
class of the TLD if the TLD is an RTL-TLD.
Since your rules 1 and 2 constrain an RTL-TLD to contain only characters 
with bidi class R or only characters with bidi class AL, this rule forbids 
domain names such as "ABC-DE", although the hyphen is allowed even in LDH 
labels. I think that rule 3 should be relaxed to allow innocuous characters 
to appear at innocuous locations (inside a label and not at its ends).
I leave it to you to define what counts as "innocuous characters". At first 
glance, I would say anything except characters with bidi class L, but this 
needs some more reflection.
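
To make the relaxation concrete, here is a minimal Python sketch of one 
possible reading of it. It assumes that the first and last characters of the 
registered name must have the bidi class of the TLD, and that interior 
characters may be anything except bidi class L (the "first glance" 
definition of innocuous, which still needs reflection). The function name 
relaxed_rule3 and its parameters are hypothetical.

```python
import unicodedata


def relaxed_rule3(domain: str, tld_class: str) -> bool:
    """Check a registered name against one reading of the relaxed rule 3.

    tld_class is the single bidi class of the RTL-TLD ("R" or "AL").
    Assumption: the label ends must have the TLD's class, and interior
    characters only need to avoid bidi class L.
    """
    if not domain:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in domain]
    # The ends of the label are not "innocuous locations".
    if classes[0] != tld_class or classes[-1] != tld_class:
        return False
    # Interior characters: anything except strong left-to-right (L).
    return all(c != "L" for c in classes[1:-1])
```

With Hebrew letters standing in for "ABC-DE", the name 
"\u05d0\u05d1\u05d2-\u05d3\u05d4" passes for a TLD of class R, while a name 
that starts or ends with the hyphen, or that contains a Latin letter, does 
not.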


Shalom (Regards),  Mati
       Bidi Architect
       Globalization Center Of Competency - Bidirectional Scripts
       IBM Israel
       Mobile: +972 52 2554160




From:   Adil Allawi <adil@diwan.com>
To:     "public-iri@w3.org" <public-iri@w3.org>
Cc:     "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Date:   05/11/2011 23:24
Subject:        Bidi IRI with a Bidi TLD



Dear all,

Following is my suggestion for a new section in the bidi IRI document. 
This is to improve usability. The main point of this proposal is that the 
domain and TLD always appear together in a URL so that a user can read, 
enter, highlight and copy it, and so that a user looking at a bidi URL 
will always recognize the domain part.

The restriction I propose applies only to the main domain, e.g. the 
"google" in "google.com", not to a subdomain, e.g. the "translate" in 
"translate.google.com". That is, it applies to the name registered with the 
domain registrar.

Adil

------

Restrictions on domain names for Top Level Domains (TLDs)

Definition: Right-To-Left Top Level Domains (RTL-TLD). These are top-level 
domains in languages that use right-to-left characters. That is, the 
Unicode bidi class of the characters that make up the TLD is either R or 
AL (see UAX #9).

Because an IRI must always be rendered left-to-right (see section 2), there 
are a number of cases where an RTL-TLD renders in a way that makes it 
visually unclear which part of a particular URL is the TLD. For example:

Logical representation: http://abc.def.GHI/JKL
Visual representation: http://abc.def.LKJ/IHG

In the above case, the path appears after the registered domain and 
occupies the visual location of the TLD. This can confuse the reader as to 
which is the actual TLD. In order to restrict such confusing cases, the 
following rules will apply:

   1. An RTL-TLD is a TLD in a language whose characters are drawn right 
to left. An LTR-TLD is a TLD in a language whose characters are drawn left 
to right.
   2. The characters in an RTL-TLD MUST always be of the same Unicode bidi 
class.
   3. The characters of a registered domain MUST match the Unicode bidi 
class of the TLD if the TLD is an RTL-TLD.
   4. If the characters of a registered domain contain more than one bidi 
class, the domain MUST be registered to an LTR-TLD.

The MUST requirements guarantee that the registered domain and its 
corresponding TLD will always appear together and in the same order in all 
possible IRIs. Numbers and bidi-neutral characters can be reordered by the 
Unicode bidi algorithm in a way that changes their visual position relative 
to the TLD. The above rules prevent such cases. If the domain registrar 
needs to register a name that contains characters of mixed direction (e.g. 
numbers, punctuation or LTR characters), then the domain can still be 
registered with a TLD that has left-to-right characters.
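
The rules above could be checked mechanically. The following Python sketch 
is only an illustration under stated assumptions: it uses the standard 
unicodedata module to look up bidi classes, applies rule 3 literally (every 
character of the registered domain must have the TLD's class), and treats 
any TLD without R or AL characters as unrestricted. The function names are 
hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}


def bidi_classes(label: str) -> set:
    """Return the set of Unicode bidi classes used in a label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def is_rtl_tld(tld: str) -> bool:
    """Rule 2: an RTL-TLD uses exactly one bidi class, either R or AL."""
    classes = bidi_classes(tld)
    return len(classes) == 1 and classes <= RTL_CLASSES


def registration_allowed(domain: str, tld: str) -> bool:
    """Apply rules 2 to 4 to a registered domain and its TLD."""
    tld_classes = bidi_classes(tld)
    if tld_classes & RTL_CLASSES:
        # The TLD contains RTL characters, so it must satisfy rule 2.
        if len(tld_classes) != 1:
            return False
        # Rule 3: the domain must use exactly the TLD's bidi class.
        # This also enforces rule 4, since a mixed-class domain fails here.
        return bidi_classes(domain) == tld_classes
    # Assumption: names under a non-RTL TLD are not restricted by these rules.
    return True
```

For instance, an all-Hebrew name under an all-Hebrew TLD is accepted, 
whereas the same name with a hyphen, or an all-Hebrew name under an Arabic 
TLD (class R versus class AL), is rejected.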

Examples:

A. This is a good case - the TLD is visually followed by the domain:

Logical representation: http://ABC.DEF.GHI/jkl
Visual representation: http://IHG.FED.CBA/jkl

B. With an LTR second level domain there is a sub-optimal case where the 
path appears next to the sub-domain. But in this case it is still clear 
where the TLD and registered domain are in the IRI:

Logical representation: http://abc.DEF.GHI/JKL
Visual representation: http://abc.LKJ/IHG.FED 
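
The logical-to-visual mappings shown in this message can be reproduced with 
a deliberately simplified model of reordering, sketched here in Python. It 
assumes the convention used in the examples (uppercase ASCII letters stand 
for RTL characters, lowercase for LTR), treats every other character, 
including digits, as neutral, and assumes an LTR base direction. It is not 
the full Unicode bidi algorithm of UAX #9, and the function name 
visual_order is hypothetical.

```python
def visual_order(logical: str) -> str:
    """Reorder a logical string using a simplified bidi model (LTR base)."""
    # Classify: uppercase ASCII = R, lowercase ASCII = L, anything else = N.
    classes = []
    for ch in logical:
        if ch.isascii() and ch.isupper():
            classes.append("R")
        elif ch.isascii() and ch.islower():
            classes.append("L")
        else:
            classes.append("N")

    # Resolve each run of neutrals: R only when surrounded by R on both
    # sides, otherwise the base direction (L). String edges count as L.
    n = len(classes)
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else "L"
        after = classes[j] if j < n else "L"
        resolved = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            classes[k] = resolved
        i = j

    # Reverse every maximal R run; L runs keep their logical order.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and classes[j] == classes[i]:
            j += 1
        segment = logical[i:j]
        out.append(segment[::-1] if classes[i] == "R" else segment)
        i = j
    return "".join(out)
```

Under these assumptions, visual_order("http://abc.DEF.GHI/JKL") yields 
"http://abc.LKJ/IHG.FED", matching example B.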

Received on Sunday, 6 November 2011 08:26:44 UTC