- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Mon, 07 Nov 2011 17:42:23 +0900
- To: Adil Allawi <adil@diwan.com>
- CC: "public-iri@w3.org" <public-iri@w3.org>
Hello Adil,

Many thanks for your very instructive explanations.

On 2011/11/07 10:07, Adil Allawi wrote:
> Martin, to answer your points...
>
> 1. The rules you refer to (http://tools.ietf.org/html/rfc5893#section-2;)
> already have a problem as they allow characters of class L.

Well, yes, that's because they are written in a general way. What we
would say is that if there's a TLD with an RTL label, then the
"registered domain" (and everything in between that and the TLD, I
assume) also has to be an RTL label, where RTL label is defined in
point (1.) of http://tools.ietf.org/html/rfc5893#section-2.

> My suggestion is to make my proposed rules as restrictive as possible and only
> give way where absolutely necessary. Ultimately we must convey these rules to
> ISPs, registrars and ordinary people registering domain names. For this to work
> the rule should be very simple to understand without going through the complexity
> of explaining bidi classes.

I understand that with respect to users, not necessarily with respect to
registrars (there are many other details these have to deal with). And
registries could always introduce additional restrictions. I don't
expect, just as an example, that most Arabic TLDs would accept Hebrew
second-level domain names.

> 2. By "registered domain" I mean the name that you would go to a registrar to
> have registered. After a long search of the internet I found no consistent way
> to refer to parts of a domain name. So the most complex case I have seen is for
> a British school. e.g.:
>
> http://www.kingsdale.southwark.sch.uk/
>
> In this case I would view the '/registered domain/' as "kingsdale" and
> "southwark.sch.uk" is the /sub-domain/. I would restrict all of these to the
> same bi-di class rule as these should always appear in the same order.

Okay, that helps a lot as an example to understand the motivation for
the proposal.

> 3.
> Regarding rule 4 "if the characters of a registered domain contain more than
> one bidi class, the domain MUST be registered to an LTR-TLD".
>
> In this case I suggest that if the domain needs to have mixed bi-di characters
> in it, then it should be registered to ".com.eg" not ".مصر" So:
>
> Adil-علاوي.com.eg is allowed
>
> Adil-علاوي.مصر is not allowed

Well, but the label Adil-علاوي is not allowed in the first place, unless
we change RFC 5893, which is not on our charter. Is that a problem?

Regards, Martin.

> regards
>
> Adil
>
> On 06/11/2011 10:59, "Martin J. Dürst" wrote:
>> Hello Adil,
>>
>> Many thanks for your proposal!
>>
>> On 2011/11/06 6:23, Adil Allawi wrote:
>>> Dear all,
>>>
>>> Following is my suggestion for a new section in the bidi iri document. This is
>>> to improve usability. The main point of this proposal is that the domain and
>>> TLD always appear together in a URL so that a user can read, enter, highlight
>>> and copy it. Also, so that a user looking at a bi-di URL will always recognize
>>> the domain part.
>>>
>>> The restriction I propose is only for the main domain e.g. the "google" in
>>> "google.com" not the subdomain e.g. the "translate" in "translate.google.com".
>>> That is, the name registered with the domain registrar.
>>>
>>> Adil
>>>
>>> ------
>>>
>>> *Restrictions on domain names for Top Level Domains (TLDs)*
>>>
>>> *Definition:* Right-To-Left Top Level Domains (RTL-TLD). These are top-level
>>> domains that are in languages using right-to-left characters. Namely the Unicode
>>> bidi class of the characters that make up the TLD is either R or AL (see UAX 9).
>>>
>>> As an IRI must always be rendered left-to-right (see section 2) there exist a
>>> number of cases where an RTL-TLD will render in a way that makes it visually
>>> unclear what the TLD is in a particular URL.
>>> For example:
>>>
>>> Logical representation: http://abc.def.GHI/JKL
>>> Visual representation: http://abc.def.LKJ/IHG
>>>
>>> In the above case the path appears after the registered domain and is in the
>>> visual location of the TLD. This can confuse the reader as to which is the
>>> actual TLD. In order to restrict such confusing cases the following rules will
>>> apply:
>>>
>>> 1. An RTL-TLD is a TLD which is in a language where the characters draw
>>> right to left. An LTR-TLD is a TLD which is in a language where the characters
>>> draw left to right.
>>> 2. The characters in an RTL-TLD MUST always be of the same Unicode bidi
>>> class.
>>
>> I think that for the two rules above, we could instead refer to the rules in
>> RFC 5893, Section 2 (http://tools.ietf.org/html/rfc5893#section-2; there the 6
>> rules given are altogether called a 'rule' (singular)).
>>
>> This will be less restrictive while still having the necessary guarantees.
>>
>>> 3. The characters of a registered domain MUST match the Unicode bidi class
>>> of the TLD if the TLD is an RTL-TLD.
>>
>> With "registered domain", do you always mean second-level domain? Or do you
>> mean third-level domain e.g. in cases such as bristol.ac.uk ?
>>
>> Is this similar in nature to the range of labels that some modern browsers (*)
>> mark in black (whereas the rest of the URI/IRI is in gray)?
>> (*): Firefox, Opera, and IE8; Chrome has the whole domain name black, whereas
>> Safari doesn't gray out path or scheme.
>>
>>> 4. If the characters of a registered domain contain more than one bidi
>>> class, the domain MUST be registered to an LTR-TLD.
>>
>> This would create an inherent asymmetry. What's the reason for this? Is it
>> that such domains already exist, and can't be prohibited anymore? Is it that
>> because IRIs are supposed to be displayed in an LTR context, this creates fewer
>> problems? Or something else?
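
[Illustration] Rules 1 to 4 of the quoted proposal can be sketched in code. This is only a literal reading of the proposal, not of RFC 5893 Section 2: it takes "bidi class" to mean the value returned by Python's `unicodedata.bidirectional`, and it expects single labels without dots. The function names `bidi_classes`, `is_rtl_tld` and `registration_allowed` are hypothetical, chosen here for illustration.

```python
import unicodedata

# Bidi classes that the proposal counts as right-to-left (R or AL, see UAX 9).
RTL_CLASSES = {"R", "AL"}


def bidi_classes(label):
    """Return the set of Unicode bidi classes used by the characters of a label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def is_rtl_tld(tld):
    """Rules 1 and 2, read literally: all characters share one class, R or AL."""
    classes = bidi_classes(tld)
    return len(classes) == 1 and classes <= RTL_CLASSES


def registration_allowed(registered, tld):
    """Rules 3 and 4, read literally (an assumption, not an agreed algorithm)."""
    tld_classes = bidi_classes(tld)
    if tld_classes & RTL_CLASSES:
        # Rule 2: an RTL-TLD must use a single bidi class.
        if len(tld_classes) != 1:
            return False
        # Rule 3: the registered domain must match that class exactly,
        # which also excludes mixed-class names (rule 4).
        return bidi_classes(registered) == tld_classes
    # LTR-TLD: the proposal adds no restriction of its own here.
    return True


print(registration_allowed("Adil-علاوي", "مصر"))  # False
print(registration_allowed("علاوي", "مصر"))       # True
print(registration_allowed("Adil-علاوي", "eg"))   # True
```

Note that the last result only reflects the proposal. As pointed out in the reply at the top of this message, RFC 5893 does not allow the label Adil-علاوي in the first place, so a real check would have to apply the RFC 5893 rule as well.
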
>>
>>> The restriction of MUST guarantees that the registered domain and its
>>> corresponding TLD will always appear together and in the same order in all
>>> possible IRIs. There may be cases where numbers and bidi neutral characters may
>>> be reordered by the Unicode bidi algorithm in a way that changes their visual
>>> position relative to the TLD. The above rules prevent such cases. If the domain
>>> registrar needs to register a name that contains characters that are mixed
>>> direction (e.g. contains numbers, punctuation or LTR characters) then the domain
>>> can still be registered with a TLD that has left to right characters.
>>>
>>> Examples:
>>>
>>> A. This is a good case - the TLD is visually followed by the domain:
>>>
>>> Logical representation: http://ABC.DEF.GHI/jkl
>>> Visual representation: http://IHG.FED.CBA/jkl
>>>
>>> B. With an LTR second level domain there is a sub-optimal case where the path
>>> appears next to the sub-domain. But in this case it is still clear where the TLD
>>> and registered domain are in the IRI:
>>>
>>> Logical representation: http://abc.DEF.GHI/JKL
>>> Visual representation: http://abc.LKJ/IHG.FED
>>
>> The two examples differ in the first label in the domain name and in the path
>> component. I think it would be good to (also) have examples that only differed
>> in the label we are talking about.
>>
>>
>> Regards, Martin.
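
[Illustration] The logical/visual pairs quoted in this thread can be reproduced with a small toy model. It is a heavily simplified sketch, not an implementation of the Unicode bidi algorithm (UAX 9): it follows the convention of the examples (uppercase stands for RTL characters, lowercase for LTR), assumes an LTR paragraph, treats every other character as neutral, and ignores digits and all weak types. The function name `visual_order` is hypothetical.

```python
def visual_order(logical):
    """Toy LTR-paragraph reordering: uppercase = RTL, lowercase = LTR."""

    def char_type(ch):
        if ch.isupper():
            return "R"
        if ch.islower():
            return "L"
        return "N"  # neutral: punctuation such as ':', '/', '.'

    types = [char_type(ch) for ch in logical]
    resolved = list(types)
    n = len(types)

    # Resolve each run of neutrals: it becomes RTL only when it sits between
    # two RTL characters; otherwise it takes the paragraph direction (LTR).
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else "L"
        after = types[j] if j < n else "L"
        fill = "R" if before == "R" and after == "R" else "L"
        resolved[i:j] = [fill] * (j - i)
        i = j

    # Reverse every maximal RTL run; LTR text stays in place.
    out = []
    i = 0
    while i < n:
        if resolved[i] == "R":
            j = i
            while j < n and resolved[j] == "R":
                j += 1
            out.append(logical[i:j][::-1])
            i = j
        else:
            out.append(logical[i])
            i += 1
    return "".join(out)


print(visual_order("http://abc.def.GHI/JKL"))  # http://abc.def.LKJ/IHG
print(visual_order("http://ABC.DEF.GHI/jkl"))  # http://IHG.FED.CBA/jkl
print(visual_order("http://abc.DEF.GHI/JKL"))  # http://abc.LKJ/IHG.FED
```

The model also makes it easy to build pairs that differ only in the label under discussion, as requested above, by changing the case of a single label in the logical string.
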
Received on Monday, 7 November 2011 08:43:12 UTC