RE: IRI Templates and Bidi Characters

From: Brian Smith <brian@briansmith.org>
Date: Tue, 4 Dec 2007 00:50:19 +0700
To: "'URI'" <uri@w3.org>
Message-ID: <014d01c835d4$fab369a0$6601a8c0@Junk>

James M Snell wrote:
> 
> I'm fine with this but the spec needs to at least state how 
> and where the bidi formatting codes should be used in order 
> to display the template properly so users don't have to 
> guess.  It also needs to be pointed out explicitly in the 
> spec that typing out a bidi template in logical order without 
> the formatting codes will have surprising and often ambiguous 
> results.

Note that this problem is not really limited to BIDI. For example, it is difficult to tell the difference between
	ブラｲアン
and:
	ブライアン
visually, in many (most) fonts. (One has a half-width katakana "I" and the other has a full-width katakana "I".) There are also problems where a very common Kanji (a character of Chinese origin) looks very similar to a rare Kanji, so the rare Kanji is likely to be misread as the common one, even by native readers. Even in English, it is hard to tell the difference between http://mmmmmnmnmnmnmnmnmnmnmnmnmn.com and http://mmmmmnmnmnmmmnmnmnmnmnmnmn.com. Once precomposed versus decomposed characters, unnormalized IRIs, and IRI templates are added to the equation, there will be many cases where two IRIs or two IRI templates look identical when printed even though they are logically different.
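To make the "look identical, differ logically" point concrete, here is a small sketch using Python's standard unicodedata module. It assumes the half-width variant is U+FF72 HALFWIDTH KATAKANA LETTER I and the full-width one is U+30A4 KATAKANA LETTER I; the "café" pair is an added illustration of precomposed versus decomposed characters, not taken from the thread. Strings are written with escapes so the difference survives any font.

```python
import unicodedata

# Hypothetical example strings; both render as "ブライアン" in many fonts.
a = "\u30d6\u30e9\uff72\u30a2\u30f3"  # uses U+FF72 HALFWIDTH KATAKANA LETTER I
b = "\u30d6\u30e9\u30a4\u30a2\u30f3"  # uses U+30A4 KATAKANA LETTER I

def code_points(s):
    """Return the code points of s as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

print(a == b)                                # False: logically different
print(code_points(a)[2], code_points(b)[2])  # U+FF72 U+30A4
# NFC keeps compatibility variants such as half-width katakana distinct...
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))    # False
# ...while NFKC folds the half-width form to the full-width one.
print(unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b))  # True

# Precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
c = "caf\u00e9"
d = "cafe\u0301"
print(c == d)                                # False
print(unicodedata.normalize("NFC", c) == unicodedata.normalize("NFC", d))    # True
```

Which normalization form (if any) an IRI processor should apply before comparison is a separate policy question; the sketch only shows that no single form hides every confusable pair.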

People who design IRIs and IRI templates need to factor readability into the design. And, when somebody stores or prints an IRI or IRI template, they need to do so in a way that results in as little ambiguity as possible, regardless of the languages used.
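One way to store or print an IRI with less visual ambiguity is to also show its percent-encoded URI form. The sketch below is a simplified take on the IRI-to-URI mapping in RFC 3987: it assumes every non-ASCII character should be converted to UTF-8 and percent-encoded, and leaves ASCII untouched. The function name `iri_to_uri` and the example.org IRIs are hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets (simplified)."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is passed through unchanged
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

half = "http://example.org/\u30d6\u30e9\uff72\u30a2\u30f3"  # half-width "I"
full = "http://example.org/\u30d6\u30e9\u30a4\u30a2\u30f3"  # full-width "I"
print(iri_to_uri(half))  # http://example.org/%E3%83%96%E3%83%A9%EF%BD%B2%E3%82%A2%E3%83%B3
print(iri_to_uri(full))  # http://example.org/%E3%83%96%E3%83%A9%E3%82%A4%E3%82%A2%E3%83%B3
```

The two printed forms differ visibly (`%EF%BD%B2` versus `%E3%82%A4`), and invisible bidi formatting codes such as U+200E LEFT-TO-RIGHT MARK would likewise show up as `%E2%80%8E`. It does nothing for purely ASCII look-alikes like the "mmmnmn" example, which is why readability still has to be designed in.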

- Brian
Received on Monday, 3 December 2007 17:50:33 UTC