Re: Bidi IRI with a Bidi TLD

Hello Adil,

Many thanks for your very instructive explanations.

On 2011/11/07 10:07, Adil Allawi wrote:
>    Martin, to answer your points...
>
> 1. The rules you refer to (http://tools.ietf.org/html/rfc5893#section-2)
> already have a problem, as they allow characters of class L.

Well, yes, that's because they are written in a general way. What we 
would say is that if there's a TLD with an RTL label, then the 
"registered domain" (and everything in between that and the TLD, I 
assume) also has to be an RTL label, where RTL label is defined in point 
(1.) of http://tools.ietf.org/html/rfc5893#section-2.
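
To make this concrete, here is a minimal Python sketch of that constraint. It
assumes that a label's direction is decided by the bidi class of its first
character only (my reading of point (1.)), and that the caller supplies the
position of the "registered domain" label; both function names and the
registered_index parameter are made up for illustration.

```python
import unicodedata

def label_direction(label):
    """Classify a label by the bidi class of its first character.

    Returns "RTL" for R or AL, "LTR" for L, and None otherwise.
    """
    if not label:
        return None
    bidi_class = unicodedata.bidirectional(label[0])
    if bidi_class in ("R", "AL"):
        return "RTL"
    if bidi_class == "L":
        return "LTR"
    return None

def satisfies_rtl_tld_constraint(domain, registered_index):
    """Check the constraint discussed above (hypothetical helper).

    registered_index is the 0-based position, counted from the left, of the
    label regarded as the "registered domain". If the TLD is an RTL label,
    every label from that position up to and including the TLD must be RTL.
    """
    labels = domain.split(".")
    if label_direction(labels[-1]) != "RTL":
        return True  # no constraint when the TLD is not an RTL label
    return all(label_direction(l) == "RTL" for l in labels[registered_index:])

print(satisfies_rtl_tld_constraint("www.علاوي.مصر", 1))
print(satisfies_rtl_tld_constraint("adil.مصر", 0))
print(satisfies_rtl_tld_constraint("www.kingsdale.southwark.sch.uk", 1))
```

Labels to the left of registered_index (such as "www" above) are deliberately
left unconstrained in this sketch.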

> My suggestion is to make my proposed rules as restrictive as possible and only
> give way where absolutely necessary. Ultimately we must convey these rules to
> ISPs, registrars and ordinary people registering domain names. For this to work
> the rule should be very simple to understand without going through the complexity
> of explaining bidi classes.

I understand that with respect to users, not necessarily with respect to 
registrars (there are many other details these have to deal with). And 
registries could always introduce additional restrictions. I don't 
expect, just as an example, that most Arabic TLDs would accept Hebrew 
second-level domain names.

> 2. By "registered domain" I mean the name that you would go to a registrar to
> have registered. After a long search of the internet I found no consistent way
> to refer to parts of a domain name. So the most complex case I have seen is for
> a British school. e.g.:
>
> http://www.kingsdale.southwark.sch.uk/
>
> In this case I would view the '/registered domain/'  as "kingsdale" and
> "southwark.sch.uk" is the /sub-domain/. I would restrict all of these to the
> same bi-di class rule as these should always appear in the same order.

Okay, that helps a lot as an example to understand the motivation for 
the proposal.

> 3. Regarding rule 4 "if the characters of a registered domain contain more than
> one bidi class, the domain MUST be registered to an LTR-TLD".
>
> In this case I suggest that if the domain needs to have mixed bi-di characters
> in it, then it should be registered to ".com.eg" not ".مصر" So:
>
> Adil-علاوي.com.eg is allowed
>
> Adil-علاوي.مصر is not allowed

Well, but the label Adil-علاوي is not allowed in the first place, unless 
we change RFC 5893, which is not on our charter. Is that a problem?
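
For illustration, a rough Python sketch of the six conditions in RFC 5893,
Section 2, as I read them, applied to a single label. The function name is
made up, the bidi classes come from whatever Unicode version the Python
installation ships with, and this is not a complete IDNA validity check.

```python
import unicodedata

# Bidi classes permitted in RTL and LTR labels (conditions 2 and 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}

def passes_bidi_rule(label):
    """Return True if the label meets the six conditions (sketch only)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    # Ignore trailing non-spacing marks when looking at the label's end.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    if end == 0:
        return False  # empty label, or nothing but NSM
    first, last = classes[0], classes[end - 1]
    if first in ("R", "AL"):
        # RTL label: conditions 2, 3 and 4.
        if not set(classes) <= RTL_ALLOWED:
            return False
        if last not in ("R", "AL", "EN", "AN"):
            return False
        return not ("EN" in classes and "AN" in classes)
    if first == "L":
        # LTR label: conditions 5 and 6.
        return set(classes) <= LTR_ALLOWED and last in ("L", "EN")
    return False  # condition 1: first character must be L, R or AL

for candidate in ("Adil-علاوي", "علاوي", "Adil", "مصر"):
    print(candidate, passes_bidi_rule(candidate))
```

The mixed label starts with a character of class L, so it is treated as an
LTR label and is then rejected because it contains characters of class AL.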

Regards,   Martin.

> regards
>
> Adil
>
> On 06/11/2011 10:59, "Martin J. Dürst" wrote:
>>  Hello Adil,
>>
>>  Many thanks for your proposal!
>>
>>  On 2011/11/06 6:23, Adil Allawi wrote:
>>>     Dear all,
>>>
>>>  Following is my suggestion for a new section in the bidi iri document. This is
>>>  to improve usability. The main point of this proposal is that the domain and
>>>  TLD always appear together in a URL so that a user can read, enter, highlight
>>>  and copy it. Also, so that a user looking at a bi-di URL will always recognize
>>>  the domain part.
>>>
>>>  The restriction I propose is only for the main domain e.g. the "google" in
>>>  "google.com" not the subdomain e.g. the "translate" in "translate.google.com".
>>>  That is, the name registered with the domain registrar.
>>>
>>>  Adil
>>>
>>>  ------
>>>
>>>  *Restrictions on domain names for Top Level Domains (TLDs)*
>>>
>>>  *Definition:* Right-To-Left Top Level Domains (RTL-TLD). These are top-level
>>>  domains that are in languages using right-to-left characters. Namely the Unicode
>>>  bidi class of the characters that make up the TLD is either R or AL (see UAX 9).
>>>
>>>  As an IRI must always be rendered left-to-right (see section 2) there are a
>>>  number of cases where an RTL-TLD will render in a way that makes it visually
>>>  unclear what the TLD is in a particular URL. For example:
>>>
>>>  Logical representation: http://abc.def.GHI/JKL
>>>  Visual representation: http://abc.def.LKJ/IHG
>>>
>>>  In the above case the path appears after the registered domain and is in the
>>>  visual location of the TLD. This can confuse the reader as to which is the
>>>  actual TLD. In order to restrict such confusing cases the following rules will
>>>  apply:
>>>
>>>       1. An RTL-TLD is a TLD which is in a language where the characters draw
>>>  right to left. An LTR-TLD is a TLD which is in a language where the characters
>>>  draw left to right.
>>>       2. The characters in an RTL-TLD MUST always be of the same Unicode bidi
>>>  class.
>>
>>  I think that for the two rules above, we could instead refer to the rules in
>>  RFC 5893, Section 2 (http://tools.ietf.org/html/rfc5893#section-2; there the 6
>>  rules given are altogether called a 'rule' (singular)).
>>
>>  This will be less restrictive while still having the necessary guarantees.
>>
>>>       3. The characters of a registered domain MUST match the Unicode bidi class
>>>  of the TLD if the TLD is an RTL-TLD.
>>
>>  With "registered domain", do you always mean second-level domain? Or do you
>>  mean third-level domain e.g. in cases such as bristol.ac.uk ?
>>
>>  Is this similar in nature to the range of labels that some modern browsers (*)
>>  mark in black (whereas the rest of the URI/IRI is in gray)?
>>  (*): Firefox, Opera, and IE8; Chrome has the whole domain name black, whereas
>>  Safari doesn't gray out path or scheme.
>>
>>>       4. if the characters of a registered domain contain more than one bidi
>>>  class, the domain MUST be registered to an LTR-TLD.
>>
>>  This would create an inherent asymmetry. What's the reason for this? Is it
>>  that such domains already exist, and can't be prohibited anymore? Is it that
>>  because IRIs are supposed to be displayed in an LTR context, this creates fewer
>>  problems? Or something else?
>>
>>
>>>  The restriction of MUST guarantees that the registered domain and its
>>>  corresponding TLD will always appear together and in the same order in all
>>>  possible IRIs. There may be cases where numbers and bidi neutral characters may
>>>  be reordered by the Unicode bidi algorithm in a way that changes their visual
>>>  position relative to the TLD. The above rules prevent such cases. If the domain
>>>  registrar needs to register a name that contains characters that are mixed
>>>  direction (e.g. contains numbers, punctuation or LTR characters) then the domain
>>>  can still be registered with a TLD that has left to right characters.
>>>
>>>  Examples:
>>>
>>>  A. This is a good case - the TLD is visually followed by the domain:
>>>
>>>  Logical representation: http://ABC.DEF.GHI/jkl
>>>  Visual representation: http://IHG.FED.CBA/jkl
>>>
>>>  B. With an LTR second level domain there is a sub-optimal case where the path
>>>  appears next to the sub-domain. But in this case it is still clear where the TLD
>>>  and registered domain are in the IRI:
>>>
>>>  Logical representation: http://abc.DEF.GHI/JKL
>>>  Visual representation: http://abc.LKJ/IHG.FED
>>
>>  The two examples differ in the first label in the domain name and in the path
>>  component. I think it would be good to (also) have examples that only differed
>>  in the label we are talking about.
>>
>>
>>  Regards,   Martin.
>>
>>

Received on Monday, 7 November 2011 08:43:12 UTC