- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Mon, 8 Sep 2003 12:20:18 +0300
- To: Roy Badami <roy@gnomon.org.uk>
- Cc: ietf-imaa@imc.org, public-iri@w3.org
According to my understanding, and to testing against the Unicode C reference implementation, you are correct in stating that the 2 strings ("A-123,456B" and "A456,-123B") will give the same display according to the Unicode algorithm for Bidirectional text.

It proves that you have a more creative mind than the people who proposed the limitations for Bidi names in IRIs, at least more than mine. You will admit that your example is more than a little contrived.

The limitations set on IRIs intend to avoid ambiguity when converting from the display order to the logical order (which in this case is not achieved, although the vast majority of users would assume form A-123,456B, because the other form with the comma adjacent to a minus sign makes little sense in a domain name). But those limitations were also designed not to restrict too much the potential of creating interesting domain names, so a compromise had to be achieved.

I can find other examples of names allowed by the rules which can mislead users trying to induce the logical order based on the display order. All of these examples are quite bizarre.

By the way, can you give a reference to "UseSTD13ASCIIRules", for an ignoramus like myself?

Shalom (Regards),
Mati

Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802
Fax: +972 2 5870333
Mobile: +972 52 554160

Sent by: public-iri-request@w3.org
To: ietf-imaa@imc.org, public-iri@w3.org
cc:
Subject: Bidi: is stringprep broken?

I wrote:
> > Ergo, we need another display model; this one doesn't work
> There are also other real nasties with this display model:

Worse than that, I think the bidi restrictions in stringprep don't actually achieve their goal of ensuring that you can't have two different labels that render the same.

Consider the labels:

    A-123,456B

and

    A456,-123B

Here, A is HEBREW LETTER ALEF, B is HEBREW LETTER BET (or any characters of bidi class R that you like, but *not* arabic letters, which are class AL) and the comma is actually ARABIC COMMA U+060C (or any character of class CS or ES).

As far as I can tell these both pass nameprep with UseSTD13ASCIIRules set, and they both render identically under bidi as:

    B-123,456A

If you don't care about UseSTD13ASCIIRules, you can replace ARABIC COMMA with COMMA, SOLIDUS or COLON.

I fully expect someone to reply explaining why I'm mistaken, but I've checked the above as best I can...

-roy
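
For readers who want to check the bidi classes named in the example, here is a minimal Python sketch using the standard `unicodedata` module. It is not part of the original exchange. The `CHARS` table and the `bidi_classes` helper are names made up for this illustration, and ARABIC LETTER ALEF is included only as a sample class AL character. The classes reported come from whatever Unicode data ships with your Python, which may not match the 2003-era data discussed in this thread, so treat the output as a starting point rather than a reproduction of the result.

```python
import unicodedata

# Characters named in the example labels (A and B stand for the Hebrew letters).
# This table is an illustrative assumption, not something from the thread.
CHARS = {
    "HEBREW LETTER ALEF": "\u05D0",
    "HEBREW LETTER BET": "\u05D1",
    "ARABIC LETTER ALEF": "\u0627",  # sample class AL letter, for contrast
    "ARABIC COMMA": "\u060C",
    "COMMA": ",",
    "COLON": ":",
    "SOLIDUS": "/",
    "HYPHEN-MINUS": "-",
    "DIGIT ONE": "1",
}


def bidi_classes(chars=CHARS):
    """Map each character name to its bidi class in the local Unicode data."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


if __name__ == "__main__":
    # The bundled Unicode version matters: bidi classes are not frozen forever.
    print("Unicode data version:", unicodedata.unidata_version)
    for name, cls in bidi_classes().items():
        print(f"{name:20} {cls}")
```

This only looks up character properties. It does not run the bidirectional algorithm itself, so it cannot confirm or refute that the two labels display identically.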
Received on Monday, 8 September 2003 05:21:59 UTC