Re: Bidi: is stringprep broken? from Roy Badami on 2003-09-08 (public-iri@w3.org from September 2003)

From: Roy Badami <roy@gnomon.org.uk>
Date: Mon, 8 Sep 2003 11:26:29 +0100
To: "Matitiahu Allouche" <matial@il.ibm.com>
Cc: Roy Badami <roy@gnomon.org.uk>, ietf-imaa@imc.org, public-iri@w3.org
Message-ID: <16220.22869.152563.294583@moriarty.gnomon.org.uk>

Matitiahu Allouche writes:
 > According to my understanding, and to testing against the Unicode C 
 > reference implementation, you are correct in stating that the 2 strings ("A-123,456B" and "A456,-123B") will give the same display according to the Unicode algorithm for 
 > Bidirectional text.

Thanks for verifying it.  Though it's still possible that I'm mistaken
about it passing nameprep.

 > You will admit that your example is more than a little contrived. 

Yes, and it's probably unlikely ever to be registerable, since it
involves punctuation (and not only that, but punctuation associated
with the wrong script).

My other example (that ARAB.123.com and 123.ARAB.com render the same)
worries me more.

 > I can find other examples of names allowed by the rules which can
 > mislead users trying to induce the logical order based on the
 > display order.  All of these examples are quite bizarre.

I'd be interested in the examples you have.

 > By the way, can you give a reference to "UseSTD13ASCIIRules", for
 > an ignoramus like myself?

RFC3490.  When the UseSTD13ASCIIRules flag is set, ASCII characters
other than alphanumerics and HYPHEN-MINUS are prohibited, in
accordance with traditional hostname rules.  Hence my use of ARABIC
COMMA; I needed a character that was a number separator, was
non-ASCII, and didn't have a compatibility decomposition (since IDNA
uses NFKC).

	-roy

Received on Monday, 8 September 2003 06:26:42 UTC