comments on draft-ietf-iri-bidi-guidelines [forwarded by moderator]

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 08 May 2012 08:45:25 +0900
Message-ID: <4FA85E95.8000908@it.aoyama.ac.jp>
To: "public-iri@w3.org" <public-iri@w3.org>
This post refers to the document at 
http://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines-02 .

I have a number of comments on specific clauses in the document, but it 
is more urgent to agree or disagree on the general principles on which 
the document is based.

A. First of all, we should agree on whose problems the document is 
supposed to solve. This is not stated in the document, but I see three 
classes of "users":

- Site administrators who create IRIs

- Consumers who see IRIs in print (on paper, on bus sides, etc.) or on 
screen

- Implementers who have to implement the rules.

The main requirements stated in the document are:

1. user-predictable conversion between visual and logical 
representation; 2. the ability to include a wide range of characters in 
various parts of the IRI; and 3. minor or no changes or restrictions for 
implementations.

The first requirement is for the benefit of consumers, the second one 
for administrators, and the third one for implementers.

If I were to set the priorities, I would say that the first concern is 
for consumers reading IRIs on paper or on bus sides, then for consumers 
seeing IRIs on screen, with the requirement that IRIs should appear 
identically on paper and everywhere on screen, whether in a browser or 
in an application where they can be part of plain text.

The current document does not completely satisfy its own first 
requirement, since the visual IRI "http://abc.123.FED" can be 
interpreted equally reasonably as the logical IRI "http://abc.123.DEF" 
or "http://abc.DEF.123".
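
To make the ambiguity concrete, here is a rough Python sketch of a 
heavily reduced subset of the UBA: only rules W7, N1/N2, I1 and L2, at 
paragraph level 0, with every separator treated as a neutral. It is not 
a conforming implementation, and the function names are my own. Hebrew 
letters stand in for the uppercase "DEF".

```python
import unicodedata


def simple_bidi_class(ch):
    """Collapse Unicode bidi classes to L, R, EN or N (assumption: a
    drastic simplification; CS, ES, ET, ON, WS etc. all become neutral)."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    if cls == "EN":
        return "EN"
    return "N"


def visual_order(logical):
    """Reorder a logical string for display in an LTR embedding (level 0)."""
    types = [simple_bidi_class(c) for c in logical]
    n = len(types)

    # W7: a European number takes type L if the last strong type was L.
    last_strong = "L"  # start of text counts as L at paragraph level 0
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    # N1/N2: neutrals take the surrounding direction if both sides agree
    # (EN counts as R here), otherwise the embedding direction (L).
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = "L" if i == 0 or types[i - 1] == "L" else "R"
        after = "L" if j == n or types[j] == "L" else "R"
        resolved = before if before == after else "L"
        for k in range(i, j):
            types[k] = resolved
        i = j

    # I1: at an even embedding level, R goes up one level, EN goes up two.
    levels = [{"L": 0, "R": 1, "EN": 2}[t] for t in types]

    # L2: from the highest level down to 1, reverse each contiguous run
    # of characters at that level or higher.
    chars = list(logical)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


DEF = "\u05d0\u05d1\u05d2"  # Hebrew alef, bet, gimel in logical order
first = visual_order("http://abc.123." + DEF)
second = visual_order("http://abc." + DEF + ".123")
print(first == second)
```

Under these simplifying assumptions both logical IRIs reorder to the 
same visual string, so a reader of the printed form cannot tell which 
one was meant.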

It does not satisfy the third requirement either, since it states that 
IRIs must be rendered as if within an LTR embedding, which is a kind of 
special treatment.

B. The document seems to hesitate between handling IRIs with the UBA 
transparently for the application (i.e. the application does not have to 
do anything special for displaying IRIs) and special handling.

On the one hand, it says "Bidirectional IRIs MUST be rendered by using the 
Unicode Bidirectional Algorithm", so it seeks transparency. On the other 
hand, it says "Bidirectional IRIs MUST be rendered in the same way as 
they would be if they were in a left-to-right embedding; i.e., as if 
they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and 
followed by U+202C, POP DIRECTIONAL FORMATTING (PDF).", which means 
special handling.
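
For illustration only, the special handling implied by the second quoted 
sentence amounts to something like the following Python sketch. The 
helper name is hypothetical, and an actual renderer might establish the 
embedding through markup or its layout engine rather than by inserting 
characters.

```python
LRE = "\u202A"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def wrap_in_ltr_embedding(iri):
    """Return the IRI preceded by LRE and followed by PDF (hypothetical
    helper; the draft only says rendering must behave *as if* this were done)."""
    return LRE + iri + PDF


wrapped = wrap_in_ltr_embedding("http://example.com")
print(len(wrapped) - len("http://example.com"))  # two formatting characters added
```

An application that does this has to recognize the IRI in the text 
first, which is exactly what makes it non-transparent.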

Another paragraph states: 
"To make sure that it does not affect the rendering of bidirectional 
IRIs too much, some restrictions on bidirectional IRIs are necessary. 
These restrictions are given in terms of delimiters (structural 
characters, mostly punctuation such as "@", ".", ":", and "/") and 
components (usually consisting mostly of letters and digits)."
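
As a rough sketch of what "delimiters" and "components" could mean in 
practice, the Python below splits an IRI on the four delimiters named in 
the quote and reports which strong directions each component contains. 
The delimiter set is limited to those four examples and the function 
names are my own assumptions; the draft does not define either.

```python
import re
import unicodedata

# Assumption: only the delimiters named in the quoted paragraph.
DELIMITER_PATTERN = re.compile(r"([@.:/])")


def split_iri(iri):
    """Split an IRI into alternating components and delimiters."""
    return [part for part in DELIMITER_PATTERN.split(iri) if part]


def strong_directions(component):
    """Return the set of strong directions found in a component."""
    directions = set()
    for ch in component:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            directions.add("LTR")
        elif cls in ("R", "AL"):
            directions.add("RTL")
    return directions


for part in split_iri("http://abc.\u05d0\u05d1\u05d2"):
    print(repr(part), sorted(strong_directions(part)))
```

Any restriction stated "in terms of delimiters and components" would 
presumably be a predicate over such a list, but which predicate is the 
open question.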

The document does not specify what the announced restrictions are (and 
the reference to RFC3987bis does not clarify anything, for me at least).

My guess is that the authors are in favor of some special handling that 
would prevent interference between components (what appears between 
delimiters), but this is not detailed, and of course that would harm the 
transparency requirement.

In fact, what is sorely missing is a precise definition of how an IRI 
with domain, path, fragment and query all potentially including RTL 
characters should be displayed. The problem is that currently there is 
no consensus on that matter. Since the target is not clearly painted, 
the arrow does not know where to go.

C. So I see two possible avenues:

1) IRIs are handled transparently. This is ideal for implementers. Then 
some more restrictions should be placed on IRI creators to make sure 
that an IRI on a bus side can be interpreted unambiguously. The 
restrictions may not be enforceable for path and query, but this is not 
critical, since an IRI on a bus side will typically be short and will 
not include these parts. IRIs on screen can hopefully be clicked on, or 
copied and pasted into the address line of a browser, and will not be 
typed manually.

2) IRIs are handled specially. This allows displaying IRIs according to 
whatever rules are agreed upon, including separating the components in 
path, fragment and query parts. This puts a burden on implementers who 
must identify IRIs within plain text, but many applications already do 
this in order to allow clicking on IRIs. The difficult part here will be 
to get a consensus on how to display mixed LTR/RTL IRIs.

I think that the discussion above should be resolved before commenting 
on finer points of the document.

Shalom (Regards),  Mati
Received on Monday, 7 May 2012 23:46:12 UTC
