- From: Matitiahu Allouche <matitiahu.allouche@gmail.com>
- Date: Wed, 18 Apr 2012 16:19:52 +0300
- To: "'Richard Ishida'" <ishida@w3.org>, <public-i18n-core@w3.org>
Please note that the latest version of the document is 02 (as listed in the subject), not 00 as referenced by Richard.

I have a number of comments on specific clauses in the document, but it is more urgent to agree or disagree on the general principles on which the document is based.

A. First of all, we should agree on whose problems the document is supposed to solve. This is not stated in the document, but I see 3 classes of "users":
- Site administrators who create IRIs
- Consumers who see IRIs in print (on paper, on bus sides, etc.) or on screen
- Implementers who have to implement the rules

The main requirements stated in the document are:
1. user-predictable conversion between visual and logical representation;
2. the ability to include a wide range of characters in various parts of the IRI; and
3. minor or no changes or restrictions for implementations.

The first requirement is for the benefit of consumers, the second for administrators, and the third for implementers.

If I were to set the priorities, I would put consumers reading IRIs on paper or bus sides first, then consumers seeing IRIs on screen, with the requirement that IRIs should appear identically on paper and everywhere on screen, whether in a browser or in an application where they can be part of plain text.

The current document does not completely satisfy its own first requirement, since the visual IRI "http://abc.123.FED" (where uppercase letters stand for RTL characters) can be interpreted equally reasonably as the logical IRI "http://abc.123.DEF" or "http://abc.DEF.123". It does not satisfy the third requirement either, since it states that IRIs must be rendered as if within an LTR embedding, which is a kind of special treatment.

B. The document seems to hesitate between handling IRIs with the UBA transparently for the application (i.e. the application does not have to do anything special to display IRIs) and handling them specially.
On one hand, it says "Bidirectional IRIs MUST be rendered by using the Unicode Bidirectional Algorithm", so it seeks transparency. On the other hand, it says "Bidirectional IRIs MUST be rendered in the same way as they would be if they were in a left-to-right embedding; i.e., as if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL FORMATTING (PDF).", which means special handling.

Another paragraph states: "To make sure that it does not affect the rendering of bidirectional IRIs too much, some restrictions on bidirectional IRIs are necessary. These restrictions are given in terms of delimiters (structural characters, mostly punctuation such as "@", ".", ":", and "/") and components (usually consisting mostly of letters and digits)." The document does not specify what the announced restrictions are (and the reference to RFC3987bis does not clarify anything, for me at least). My guess is that the authors favor some special handling that would prevent interference between components (what appears between delimiters), but this is not detailed, and of course it would harm the transparency requirement.

In fact, what is sorely missing is a precise definition of how an IRI with domain, path, fragment and query, all potentially including RTL characters, should be displayed. The problem is that there is currently no consensus on that matter. Since the target is not clearly painted, the arrow does not know where to go.

C. So I see 2 possible avenues:

1) IRIs are handled transparently. This is ideal for implementers. Then more restrictions should be placed on IRI creators to make sure that an IRI on a bus side can be interpreted unambiguously. The restrictions may not be enforceable for path and query, but this is not critical, since an IRI on a bus side will typically be short and will not include these parts.
IRIs on screen can hopefully be clicked on, or copied and pasted into the address line of a browser, and will not need to be typed manually.

2) IRIs are handled specially. This allows displaying IRIs according to whatever rules are agreed upon, including separating the components in the path, fragment and query parts. This puts a burden on implementers, who must identify IRIs within plain text, but many applications already do this in order to make IRIs clickable. The difficult part here will be to reach a consensus on how to display mixed LTR/RTL IRIs.

I think that the discussion above should be resolved before commenting on finer points of the document.

Shalom (Regards),
Mati
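The ambiguity described under A can be checked mechanically. The following is a toy model, not a conforming implementation of the Unicode Bidirectional Algorithm (UAX #9): it assumes the common convention that uppercase Latin letters stand for RTL letters, and it implements only the rules needed for letters, digits and IRI delimiters in a left-to-right paragraph without explicit embeddings (W2, W4, W6, W7, N1, N2, I1 and L2). The function `component_isolated` is a hypothetical illustration of the kind of per-component special handling discussed under B and C 2); it is not a scheme defined by the draft.

```python
import re

# Toy classification (assumption): uppercase ASCII letters stand in for RTL
# letters such as Hebrew, lowercase ASCII letters for LTR letters. Characters
# not listed are treated as Other Neutral, a simplification: Unicode gives some
# ASCII punctuation, such as '#', '+' and '-', other classes.
def bidi_class(ch):
    if "A" <= ch <= "Z":
        return "R"
    if "a" <= ch <= "z":
        return "L"
    if "0" <= ch <= "9":
        return "EN"  # European Number
    if ch in ".,:/":
        return "CS"  # Common Separator, as in Unicode
    return "ON"  # Other Neutral


def resolve_levels(text):
    """Embedding levels for an LTR paragraph with no explicit embeddings."""
    types = [bidi_class(c) for c in text]
    n = len(types)
    # W2: a European number whose preceding strong type is R becomes AN.
    last = "L"  # start-of-sequence type for an LTR paragraph
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last = t
        elif t == "EN" and last == "R":
            types[i] = "AN"
    # W4: a single separator between two numbers of the same type takes it.
    for i in range(1, n - 1):
        prev, nxt = types[i - 1], types[i + 1]
        if types[i] == "CS" and prev == nxt and prev in ("EN", "AN"):
            types[i] = prev
    # W6: remaining separators become neutral.
    types = ["ON" if t == "CS" else t for t in types]
    # W7: a European number whose preceding strong type is L becomes L.
    last = "L"
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last = t
        elif t == "EN" and last == "L":
            types[i] = "L"
    # N1/N2: a run of neutrals takes the direction of the text on both sides
    # when they agree (numbers count as R), else the paragraph direction L.
    i = 0
    while i < n:
        if types[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ON":
            j += 1
        before = "L" if i == 0 or types[i - 1] == "L" else "R"
        after = "L" if j == n or types[j] == "L" else "R"
        types[i:j] = [before if before == after else "L"] * (j - i)
        i = j
    # I1: at base level 0, R goes up one level and numbers go up two.
    return [{"L": 0, "R": 1, "AN": 2, "EN": 2}[t] for t in types]


def visual_order(text):
    """Reorder logical text into display order (rule L2)."""
    levels = resolve_levels(text)
    chars = list(text)
    # L2: from the highest level down to 1, reverse each maximal run of
    # characters at that level or higher.
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


# Hypothetical special handling: each component between delimiters is laid
# out on its own, and components are placed left to right.
DELIMS = re.compile(r"([.:/?#@])")

def component_isolated(iri):
    parts = DELIMS.split(iri)
    return "".join(p if DELIMS.fullmatch(p) else visual_order(p) for p in parts)


for logical in ("http://abc.123.DEF", "http://abc.DEF.123"):
    print(logical, "->", visual_order(logical), "|", component_isolated(logical))
# http://abc.123.DEF -> http://abc.123.FED | http://abc.123.FED
# http://abc.DEF.123 -> http://abc.123.FED | http://abc.FED.123
```

Under plain UBA rendering, both logical IRIs produce the same visual string "http://abc.123.FED", which is the ambiguity in question. Isolating each component removes it, at the cost of the special handling that option 2) implies; whether left-to-right placement of RTL components is what RTL readers expect is the open consensus question, and the sketch takes no position on it.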
Received on Thursday, 19 April 2012 20:41:24 UTC