- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 04 Mar 2010 17:23:22 +0900
- To: Shawn Steele <Shawn.Steele@microsoft.com>
- CC: Slim Amamou <slim@alixsys.com>, Larry Masinter <LMM@acm.org>, "public-iri@w3.org" <public-iri@w3.org>, Peter Constable <petercon@microsoft.com>, "(unicode@unicode.org)" <unicode@unicode.org>
On 2010/03/04 2:12, Shawn Steele wrote:

>> An IRI is a sequence of Unicode characters. Is there not
>> already a well-defined way of converting a sequence of
>> Unicode characters to a visual display?
>
> The problem (from my perspective at least) is that the Unicode BIDI rules are somewhat "generic".

Yes indeed. It would be nice if we could add support for more and more stuff with arbitrary complexity to the Unicode bidi algorithm, but I don't see how that could be deployed.

> Unicode expects things like / and . to be used in a context of same-script stuff, like a date, time or number.
> IRIs use them as delimiters for a list of elements (labels in the domain name or folders in the path), in a hierarchical form.
> The Unicode BIDI algorithm doesn't recognize that there's an underlying hierarchy, so it can end up "swapping" pieces in that hierarchy in some cases.

There's of course a lot of hierarchy, but what's more inherent and basic is sequence. The URI spec defines the order of the various components; the hierarchy is more in people's heads than anywhere else.

> I'm not sure UTR#36 is the proper place

I fully agree that UTR#36 is NOT the right place for putting what's currently in section 4 of the IRI WG draft (http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-4). In some sense, this would be equivalent to the IRI spec only saying that IRIs are composed of domain names, path components, query parts, ..., and then saying: Look over there for how to order them on a napkin or on the side of the bus (or on a display).

UTR#36 already has a good section, 2.5 Bidirectional Text Spoofing (http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing), which currently does exactly the right thing, namely to say that bidi display of IDNs and IRIs is, among other things, also a security issue.

[off-topic: 2.5.1 in UTR#36 doesn't belong in 2.5, but should be its own subsection; there is only minor overlap in that Arabic is affected by both bidi and complex shaping.]
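To make the point about "/" and "." a bit more concrete, here is a minimal sketch in Python. It only looks up the Bidi_Class property of each character with the standard unicodedata module; it does not run the bidi algorithm itself. The IRI and the helper name bidi_classes are made up for illustration.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Bidi_Class) pairs for each code point in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hypothetical IRI: Latin scheme, then Hebrew domain labels and a Hebrew path segment.
iri = "http://\u05d0\u05d1.\u05d2\u05d3/\u05d4\u05d5"

for ch, cls in bidi_classes(iri):
    # Print code points rather than glyphs so the output itself is not reordered.
    print(f"U+{ord(ch):04X} {cls}")

# The delimiters ':', '/' and '.' are class CS (Common Separator), a weak type.
# Latin letters are L and Hebrew letters are R, so the separators have no
# direction of their own and take one from the characters around them.
```

Nothing in these classes marks "/" or "." as a boundary between components, so the components on either side of a delimiter are not kept apart in any particular order on display.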
> to clarify display of such ordered lists.

Ok, you got from hierarchy to ordered list, which I think is exactly what I called 'sequence' above.

> Proper BIDI rendering of IRIs isn't just a security, but also a usability, problem.

Very much so. There are two levels here:

- Interoperability as usability: If there isn't a single, well-defined, consistent logical <-> visual mapping for IRIs, they are not usable at all.

- Immediate human usability: It should be possible for humans to build an easily understandable and actionable mental model (or use an existing mental model that they already have) for bidi IRIs and their visual ordering.

Regards,    Martin.

> It does seem like perhaps this concept should be mentioned in Unicode somewhere. (IRIs aren't the only place that similar ordered lists happen).
>
> -Shawn
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 4 March 2010 08:24:10 UTC