Re: Bidi: is stringprep broken?

According to my understanding, and to testing against the Unicode C 
reference implementation, you are correct in stating that the 2 strings ("A-123,456B" and "A456,-123B") will give the same display according to the Unicode algorithm for 
Bidirectional text.

It proves that you have a more creative mind than the people who proposed 
the limitations for Bidi names in IRIs, at least more than mine.

You will admit that your example is more than a little contrived.  The 
limitations set on IRIs intend to avoid ambiguity when converting from the 
display order to the logical order (which in this case is not achieved, 
although the vast majority of users would assume form A-123,456B, because the other form with the comma adjacent to a minus sign makes 
little sense in a domain name).  But those limitations were also designed 
not to restrict too much the potential of creating interesting domain 
names, so a compromise had to be achieved.  I  can find other examples of 
names allowed by the rules which can mislead users trying to induce the 
logical order based on the display order.  All of these examples are quite 
bizarre.

By the way, can you give a reference to "UseSTD13ASCIIRules", for an ignoramus like myself?

Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 
554160


Sent by:        public-iri-request@w3.org
To:        ietf-imaa@imc.org, public-iri@w3.org
cc:         
Subject:        Bidi: is stringprep broken?



I wrote:

 >  > Ergo, we need another display model; this one doesn't work 
 > There are also other real nasties with this display model:

Worse than that, I think the bidi restrictions in stringprep don't
actually achieve their goal of ensuring that you can't have two
different labels that render the same.

Consider the labels:

                 A-123,456B

and

                 A456,-123B

Here, A is HEBREW LETTER ALEF, B is HEBREW LETTER BET (or any
characters of bidi class R that you like, but *not* arabic letters,
which are class AL) and the comma is actually ARABIC COMMA U+060C (or
any character of class CS or ES).

As far as I can tell these both pass nameprep with UseSTD13ASCIIRules
set, and they both render identically under bidi as:

                 B-123,456A

If you don't care about UseSTD13ASCIIRules, you can replace
ARABIC COMMA with COMMA, SOLIDUS or COLON.

I fully expect someone to reply explaining why I'm mistaken, but I've
checked the above as best I can...

                 -roy

Received on Monday, 8 September 2003 05:21:59 UTC