Re: Bidi IRI with a Bidi TLD

Hello Adil,

Many thanks for your proposal!

On 2011/11/06 6:23, Adil Allawi wrote:
>    Dear all,
>
> Following is my suggestion for a new section in the bidi IRI document. This is
> to improve usability. The main point of this proposal is that the domain and
> TLD always appear together in a URL so that a user can read, enter, highlight
> and copy it. Also, so that a user looking at a bi-di URL will always recognize
> the domain part.
>
> The restriction I propose is only for the main domain, e.g. the "google" in
> "google.com", not for a subdomain, e.g. the "translate" in "translate.google.com".
> That is, it applies to the name registered with the domain registrar.
>
> Adil
>
> ------
>
> *Restrictions on domain names for Top Level Domains (TLDs)*
>
> *Definition:* Right-To-Left Top Level Domains (RTL-TLDs) are top-level
> domains in languages written with right-to-left characters. Namely, the Unicode
> bidi class of the characters that make up the TLD is either R or AL (see UAX #9).
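The definition above can be checked mechanically. The following Python sketch relies on the bidi classes reported by the standard-library `unicodedata.bidirectional()` function (which follow whatever Unicode version the interpreter ships with); the helper names `bidi_classes` and `is_rtl_tld` are illustrative only and not part of any specification.

```python
from __future__ import annotations

import unicodedata


def bidi_classes(label: str) -> set[str]:
    """Return the set of Unicode bidi classes (UAX #9) used in a label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def is_rtl_tld(tld: str) -> bool:
    """Illustrative helper: True if every character of the TLD is R or AL."""
    return bool(tld) and bidi_classes(tld) <= {"R", "AL"}


for tld in ["مصر", "ישראל", "com"]:
    print(tld, is_rtl_tld(tld))
```

Run as-is, it reports `True` for the Arabic label "مصر" (the IDN ccTLD of Egypt) and the Hebrew word "ישראל", and `False` for "com".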
>
> As an IRI must always be rendered left-to-right (see section 2), there are a
> number of cases where an IRI with an RTL-TLD renders in a way that makes it
> visually unclear which label is the TLD. For example:
>
> Logical representation: http://abc.def.GHI/JKL
> Visual representation: http://abc.def.LKJ/IHG
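To see why the path ends up in the visual position of the TLD, the reordering can be reproduced with a toy model. This is a minimal sketch under strong assumptions: following the convention of this post, uppercase letters stand for RTL characters and lowercase letters for LTR characters; every other character is treated as neutral; the paragraph direction is LTR; and only the neutral-resolution rules (N1/N2) and a one-level reversal (L2) of UAX #9 are modelled. Digits, explicit embeddings and mirroring are not handled, so this is not a conforming bidi implementation, and the function name `toy_visual_order` is hypothetical.

```python
def toy_visual_order(logical: str) -> str:
    """Toy reordering for an LTR paragraph: uppercase = RTL letter (R),
    lowercase = LTR letter (L), anything else = neutral. Digits and
    explicit embeddings are NOT handled."""
    n = len(logical)
    # Assign a toy bidi type to each character.
    types = ["R" if ch.isupper() else "L" if ch.islower() else "N" for ch in logical]
    resolved = list(types)
    # Rules N1/N2: a run of neutrals takes the direction of the strong
    # characters on both sides if they agree, otherwise the paragraph
    # direction (L). The start and end of the text count as L.
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else "L"
        after = types[j] if j < n else "L"
        direction = before if before == after else "L"
        resolved[i:j] = [direction] * (j - i)
        i = j
    # Rule L2 (one level deep): reverse every maximal run resolved as R.
    pieces = []
    i = 0
    while i < n:
        j = i
        while j < n and resolved[j] == resolved[i]:
            j += 1
        run = logical[i:j]
        pieces.append(run[::-1] if resolved[i] == "R" else run)
        i = j
    return "".join(pieces)


print(toy_visual_order("http://abc.def.GHI/JKL"))  # http://abc.def.LKJ/IHG
```

The same toy model reproduces examples A and B of the proposal: "http://ABC.DEF.GHI/jkl" becomes "http://IHG.FED.CBA/jkl", and "http://abc.DEF.GHI/JKL" becomes "http://abc.LKJ/IHG.FED".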
>
> In the above case, the path appears after the registered domain, in the
> visual location of the TLD. This can confuse the reader as to which label is the
> actual TLD. To prevent such confusing cases, the following rules apply:
>
>      1. An RTL-TLD is a TLD in a language whose characters are written
> right to left. An LTR-TLD is a TLD in a language whose characters are written
> left to right.
>      2. The characters in an RTL-TLD MUST all be of the same Unicode bidi class.

I think that for the two rules above, we could instead refer to the
rules in RFC 5893, Section 2
(http://tools.ietf.org/html/rfc5893#section-2; there, the 6 rules given
are collectively called a 'rule' (singular)).

This would be less restrictive while still providing the necessary guarantees.
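For comparison, the RFC 5893 Bidi Rule can be sketched as a per-label check. The Python function below is an illustrative reading of the six conditions in Section 2, using bidi classes from `unicodedata.bidirectional()`; the function name is hypothetical. RFC 5893 requires every label of a "Bidi domain name" (one containing at least one character of bidi class R, AL or AN) to satisfy the rule, so a caller would apply it label by label.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def satisfies_bidi_rule(label: str) -> bool:
    """Check one label against the six conditions of the RFC 5893 Bidi Rule."""
    if not label:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in label]
    first = classes[0]
    # Condition 1: the first character must be L, R or AL.
    if first not in {"L", "R", "AL"}:
        return False
    # Trailing NSM characters are ignored when checking how the label ends.
    trimmed = list(classes)
    while trimmed[-1] == "NSM":  # cannot empty: first class is not NSM
        trimmed.pop()
    last = trimmed[-1]
    if first in {"R", "AL"}:  # RTL label
        # Condition 2: only these classes may occur in an RTL label.
        if not set(classes) <= RTL_ALLOWED:
            return False
        # Condition 3: end with R, AL, EN or AN (then optional NSMs).
        if last not in {"R", "AL", "EN", "AN"}:
            return False
        # Condition 4: EN and AN must not both be present.
        return not ("EN" in classes and "AN" in classes)
    # LTR label. Condition 5: only these classes may occur.
    if not set(classes) <= LTR_ALLOWED:
        return False
    # Condition 6: end with L or EN (then optional NSMs).
    return last in {"L", "EN"}
```

For instance, it accepts "مصر" and "مصر1" but rejects "abc-" (an LTR label must end in L or EN) and a label mixing Arabic-Indic (AN) and European (EN) digits.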

>      3. The characters of a registered domain MUST match the Unicode bidi class
> of the TLD if the TLD is an RTL-TLD.

By "registered domain", do you always mean the second-level domain? Or do
you mean the third-level domain in cases such as bristol.ac.uk?
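One common way to pin down "registered domain" independently of the label count is to take the label immediately to the left of the longest matching public suffix. The sketch below assumes a tiny, hand-picked suffix set (`SAMPLE_PUBLIC_SUFFIXES`, hypothetical) in place of a real list such as the Public Suffix List; with it, the registered domain of www.cs.bristol.ac.uk is the third-level bristol.ac.uk, while for translate.google.com it is the second-level google.com.

```python
from __future__ import annotations

# Hypothetical, hand-picked sample; a real implementation would load a
# full public-suffix list rather than this tiny set.
SAMPLE_PUBLIC_SUFFIXES = {"com", "uk", "ac.uk", "co.uk"}


def registered_domain(host: str) -> str | None:
    """Return the registered domain (one label plus its public suffix)."""
    labels = host.lower().split(".")
    # Candidate suffixes are tried from longest to shortest, so "ac.uk"
    # wins over "uk" and the result is "bristol.ac.uk", not "ac.uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in SAMPLE_PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None
```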

Is this similar in nature to the range of labels that some modern
browsers (*) show in black (whereas the rest of the URI/IRI is in gray)?
(*): Firefox, Opera, and IE8; Chrome shows the whole domain name in black,
whereas Safari doesn't gray out the path or scheme.

>      4. If the characters of a registered domain contain more than one bidi
> class, the domain MUST be registered under an LTR-TLD.
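Rules 2 to 4, as read here, can be condensed into one check per registered label. The sketch below is hypothetical: the function names are made up, it assumes the registered label has already been separated from any subdomains, and it treats every TLD that is not made up solely of R/AL characters as an LTR-TLD. Under those assumptions, rule 4 adds nothing beyond rule 3 for RTL-TLDs, because a label containing more than one bidi class can never match the single class of the TLD.

```python
from __future__ import annotations

import unicodedata


def bidi_classes(text: str) -> set[str]:
    """Set of Unicode bidi classes (UAX #9) occurring in text."""
    return {unicodedata.bidirectional(ch) for ch in text}


def registration_allowed(label: str, tld: str) -> bool:
    """Hypothetical check of proposed rules 2-4 for one registered label."""
    if not label:
        return False
    tld_classes = bidi_classes(tld)
    # Assumption: a TLD made only of R/AL characters is an RTL-TLD;
    # every other TLD is treated as an LTR-TLD.
    if tld_classes and tld_classes <= {"R", "AL"}:
        # Rule 2: an RTL-TLD uses a single bidi class.
        if len(tld_classes) != 1:
            return False
        # Rule 3: the registered label must use exactly that class.
        # This also enforces rule 4, since a mixed-class label never matches.
        return bidi_classes(label) == tld_classes
    # Under an LTR-TLD the proposal adds no restriction, so mixed-class
    # labels (rule 4) are allowed here.
    return True
```

With it, an all-Arabic label such as "مثال" is accepted under "مصر", while "مثال1" (which adds a European digit) is accepted only under an LTR-TLD such as "com".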

This would create an inherent asymmetry. What's the reason for this? Is
it that such domains already exist and can no longer be prohibited? Is
it that, because IRIs are supposed to be displayed in an LTR context,
this creates fewer problems? Or something else?


> The MUST requirements guarantee that the registered domain and its
> corresponding TLD will always appear together, and in the same order, in all
> possible IRIs. In some cases, the Unicode bidi algorithm may reorder numbers
> and bidi-neutral characters in a way that changes their visual position
> relative to the TLD; the above rules prevent such cases. If the domain
> registrar needs to register a name that contains mixed-direction characters
> (e.g. numbers, punctuation or LTR characters), the domain can still be
> registered under an LTR-TLD.
>
> Examples:
>
> A. A good case: visually, the TLD is followed by the registered domain:
>
> Logical representation: http://ABC.DEF.GHI/jkl
> Visual representation: http://IHG.FED.CBA/jkl
>
> B. With an LTR subdomain, there is a sub-optimal case where the path
> appears next to the subdomain. But even in this case, it is still clear where
> the TLD and the registered domain are in the IRI:
>
> Logical representation: http://abc.DEF.GHI/JKL
> Visual representation: http://abc.LKJ/IHG.FED

The two examples differ both in the first label of the domain name and in
the path component. I think it would be good to (also) have examples that
differ only in the label we are talking about.


Regards,   Martin.

Received on Sunday, 6 November 2011 10:59:50 UTC