- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Mon, 8 Sep 2003 12:20:18 +0300
- To: Roy Badami <roy@gnomon.org.uk>
- Cc: ietf-imaa@imc.org, public-iri@w3.org
According to my understanding, and to testing against the Unicode C
reference implementation, you are correct in stating that the 2 strings ("A-123,456B" and "A456,-123B") will give the same display according to the Unicode algorithm for
Bidirectional text.
It proves that you have a more creative mind than the people who proposed
the limitations for Bidi names in IRIs, at least more than mine.
You will admit that your example is more than a little contrived. The
limitations set on IRIs intend to avoid ambiguity when converting from the
display order to the logical order (which in this case is not achieved,
although the vast majority of users would assume form A-123,456B, because the other form with the comma adjacent to a minus sign makes
little sense in a domain name). But those limitations were also designed
not to restrict too much the potential of creating interesting domain
names, so a compromise had to be achieved. I can find other examples of
names allowed by the rules which can mislead users trying to induce the
logical order based on the display order. All of these examples are quite
bizarre.
By the way, can you give a reference to "UseSTD13ASCIIRules", for an ignoramus like myself?
Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52
554160
Sent by: public-iri-request@w3.org
To: ietf-imaa@imc.org, public-iri@w3.org
cc:
Subject: Bidi: is stringprep broken?
I wrote:
> > Ergo, we need another display model; this one doesn't work
> There are also other real nasties with this display model:
Worse than that, I think the bidi restrictions in stringprep don't
actually achieve their goal of ensuring that you can't have two
different labels that render the same.
Consider the labels:
A-123,456B
and
A456,-123B
Here, A is HEBREW LETTER ALEF, B is HEBREW LETTER BET (or any
characters of bidi class R that you like, but *not* arabic letters,
which are class AL) and the comma is actually ARABIC COMMA U+060C (or
any character of class CS or ES).
As far as I can tell these both pass nameprep with UseSTD13ASCIIRules
set, and they both render identically under bidi as:
B-123,456A
If you don't care about UseSTD13ASCIIRules, you can replace
ARABIC COMMA with COMMA, SOLIDUS or COLON.
I fully expect someone to reply explaining why I'm mistaken, but I've
checked the above as best I can...
-roy
Received on Monday, 8 September 2003 05:21:59 UTC