RE: IRI Templates and Bidi Characters from Brian Smith on 2007-12-03 (uri@w3.org from December 2007)

From: Brian Smith <brian@briansmith.org>
Date: Tue, 4 Dec 2007 00:50:19 +0700
To: "'URI'" <uri@w3.org>
Message-ID: <014d01c835d4$fab369a0$6601a8c0@Junk>

James M Snell wrote:
> 
> I'm fine with this but the spec needs to at least state how 
> and where the bidi formatting codes should be used in order 
> to display the template properly so users don't have to 
> guess.  It also needs to be pointed out explicitly in the 
> spec that typing out a bidi template in logical order without 
> the formatting codes will have surprising and often ambiguous 
> results.

Note that this problem is not really limited to BIDI. For example, it is difficult to tell the difference between
 ブライアン
and:
 ブラｲアン
visually, with many (most) fonts. (The first uses a full-width katakana "I" and the second a half-width katakana "I".) There are also cases where a very common Kanji (a Chinese-origin character) looks very similar to a rare Kanji, so the rare one is likely to be misread as the common one, even by native readers. Even in English, it is hard to tell the difference between http://mmmmmnmnmnmnmnmnmnmnmnmnmn.com and http://mmmmmnmnmnmmmnmnmnmnmnmnmn.com. Once composed characters (which can be encoded either precomposed or as a base character plus combining marks), unnormalized IRIs, and unnormalized IRI templates are added to the equation, there will be many cases where two IRIs or two IRI templates look literally identical when printed, even though they are logically different.
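As an illustration (not part of the original message), here is a minimal Python sketch using only the standard `unicodedata` module. It shows that the two katakana strings above differ at the code-point level, and that only compatibility normalization (NFKC) folds them together, while canonical normalization (NFC) is enough to unify a precomposed and a decomposed "é". The helper `same_after` and the "café" strings are hypothetical examples added here. Whether NFKC folding is appropriate when comparing real IRIs is a separate policy question the sketch does not settle.

```python
import unicodedata


def same_after(form: str, a: str, b: str) -> bool:
    """Compare two strings after applying the given Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The two katakana strings from the message, written with escapes so they are
# unambiguous: HALF uses U+FF72 HALFWIDTH KATAKANA LETTER I where FULL uses
# U+30A4 KATAKANA LETTER I.
FULL = "\u30d6\u30e9\u30a4\u30a2\u30f3"
HALF = "\u30d6\u30e9\uff72\u30a2\u30f3"

# Hypothetical example: precomposed U+00E9 versus "e" + U+0301 COMBINING ACUTE ACCENT.
PRECOMPOSED = "caf\u00e9"
DECOMPOSED = "cafe\u0301"

if __name__ == "__main__":
    print(FULL == HALF)                                 # False: different code points
    print(same_after("NFC", FULL, HALF))                # False: NFC keeps the half-width form
    print(same_after("NFKC", FULL, HALF))               # True: NFKC folds half-width to full-width
    print(PRECOMPOSED == DECOMPOSED)                    # False: different code points
    print(same_after("NFC", PRECOMPOSED, DECOMPOSED))   # True: canonical equivalents
```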

People who design IRIs and IRI templates need to factor readability into the design. And when somebody stores or prints an IRI or IRI template, they need to do it in a way that leaves as little ambiguity as possible, regardless of the languages used.
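One possible way to print an IRI or IRI template with less ambiguity, sketched here as an assumption rather than anything the message proposes, is to spell out every character outside printable ASCII as its code point and Unicode name. The hypothetical `describe` helper below does that with the standard `unicodedata.name` function, so strings that look identical (including ones containing invisible bidi marks) print differently.

```python
import unicodedata


def describe(text: str) -> str:
    """Hypothetical helper: keep printable ASCII as-is and render every other
    character as [U+XXXX NAME], so visually similar strings print differently."""
    parts = []
    for ch in text:
        if " " <= ch <= "~":  # printable ASCII (U+0020..U+007E)
            parts.append(ch)
        else:
            name = unicodedata.name(ch, "UNNAMED")
            parts.append(f"[U+{ord(ch):04X} {name}]")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical IRIs built from the two katakana strings in the message.
    print(describe("http://example.org/\u30d6\u30e9\u30a4\u30a2\u30f3"))
    print(describe("http://example.org/\u30d6\u30e9\uff72\u30a2\u30f3"))
    # An invisible RIGHT-TO-LEFT MARK (U+200F) becomes visible in the output.
    print(describe("http://example.org/a\u200fb"))
```

The output is verbose, so this kind of rendering suits logs, diffs, and review tools better than display to end users.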

- Brian

Received on Monday, 3 December 2007 17:50:33 UTC