Re: IRI Templates and Bidi Characters

From: James M Snell <jasnell@gmail.com>
Date: Sun, 02 Dec 2007 11:00:37 -0800
Message-ID: <475300D5.4000705@gmail.com>
To: Brian Smith <brian@briansmith.org>
CC: 'URI' <uri@w3.org>

Brian Smith wrote:
> [snip]
> This all seems to make it difficult to create a valid BIDI IRI template 
> using a regular text editor (one that doesn't know anything about IRI 
> templates). In particular, the implicit LTR overrides mean that the editor 
> of a BIDI IRI template will see something differently from how the processor 
> processes it, right? The only way the editor can present the template correctly 
> is if it inserts explicit overrides. If explicit overrides are necessary anyway 
> for accurate editing, why are implicit overrides needed?
> 

Yes, it does make it more difficult and the explicit overrides are
required for the editor to render the template correctly.  The implicit
overrides are needed in case the explicit overrides are not provided.

> I don't know a lot about BIDI, but I would think it would be a lot simpler to use 
> the exact same rules as IRIs, remove all implicit overrides, suggest when explicit 
> overrides should be provided, and specify when and how overrides are inserted/coalesced 
> during the substitution phase. Is there a reason that wouldn't work?
> 

The rules specified by RFC 3987 are not sufficient, as they lead to some
rather unfortunate visual effects in templates that contain a mix of RTL
and LTR characters.

For instance,

A template with logical order:

  {-join|ABCD|EF,g=HI}JK/lmn

Comes out rendered as:

  {-join|FE|DCBA,g=KJ{IH/lmn

Things aren't much better if we wrap the {...} token in LRE/PDF:

  KJ{-join|FE|DCBA,g=KJ}/lmn

The main difficulty here is the mixture of LTR and RTL characters,
which is specifically why RFC 3987 indicates that components SHOULD NOT
mix LTR/RTL characters.  With {...} tokens, however, it is impossible to
avoid mixing characters, so we have to jump through some hoops to get
things to render properly.

Regardless of any of this, explicit bidi formatting codes have to be
stripped from the template prior to processing, so that part is already
covered :-)
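
As a rough illustration of that stripping step, here is a minimal Python
sketch.  The function name strip_bidi_formatting is hypothetical, and the
set of characters is an assumption: it covers the seven bidi formatting
characters RFC 3987 names (LRM, RLM, LRE, RLE, LRO, RLO, PDF) and nothing
else.

```python
# Assumption: "explicit bidi formatting codes" means the seven characters
# RFC 3987 lists as bidi formatting characters.
BIDI_FORMATTING = {
    "\u200e",  # LRM, left-to-right mark
    "\u200f",  # RLM, right-to-left mark
    "\u202a",  # LRE, left-to-right embedding
    "\u202b",  # RLE, right-to-left embedding
    "\u202c",  # PDF, pop directional formatting
    "\u202d",  # LRO, left-to-right override
    "\u202e",  # RLO, right-to-left override
}


def strip_bidi_formatting(template: str) -> str:
    """Return the template with every bidi formatting character removed.

    Hypothetical helper: it only drops the characters listed above and
    leaves everything else, including the {...} tokens, untouched.
    """
    return "".join(ch for ch in template if ch not in BIDI_FORMATTING)
```

A processor would presumably run something like this once on the raw
template before parsing any {...} tokens, so that formatting codes added
purely for the editor's benefit never reach the expanded IRI.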

Also, there is another issue with the {...} that I realized last night.

To illustrate by example:

L:  http://ab.CD.EF/{-prefix|~|XY}GH/IJ/kl.html
V:  http://ab.FE.DC/JI/HG{-prefix|~|YX}/kl.html

The problem with this is that it is ambiguous.  The following template
renders exactly the same way:

L:  http://ab.CD.EF/GH/IJ{-prefix|~|XY}/kl.html
V:  http://ab.FE.DC/JI/HG{-prefix|~|YX}/kl.html

There is absolutely no way of differentiating the two templates
visually; neither the rules I provided nor the rules for bidi IRIs can
help to resolve the ambiguity.  The only way I can see to resolve the
ambiguity is to require that IRI Templates always be rendered in
logical order, while still allowing the varname to be rendered in visual
order, in which case we end up with:

L:  http://ab.CD.EF/{-prefix|~|XY}GH/IJ/kl.html
V:  http://ab.CD.EF/{-prefix|~|YX}GH/IJ/kl.html

and

L:  http://ab.CD.EF/GH/IJ{-prefix|~|XY}/kl.html
V:  http://ab.CD.EF/GH/IJ{-prefix|~|YX}/kl.html

Basically, the rendering rules would be:

  <LRO>http://ab.CD.EF/GH/IJ{-prefix|~|<LRE>XY<PDF>}/kl.html<PDF>

If a default was specified for a var, it would be:

  <LRO>http://ab.CD.EF/GH/IJ{-prefix|~|<LRE>XY<PDF>=abc}/kl.html<PDF>

This would seem to be the only way of ensuring that the template is
rendered with zero ambiguity.
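
The rendering rule above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the function name
mark_up_for_display and its helpers are hypothetical, the variable list is
assumed to be the part of a {...} token after the last "|", and variables
are assumed to be comma-separated with an optional "=default" suffix.

```python
import re

LRO = "\u202d"  # left-to-right override
LRE = "\u202a"  # left-to-right embedding
PDF = "\u202c"  # pop directional formatting

# Assumption: tokens are brace-delimited and do not nest.
_TOKEN = re.compile(r"\{([^{}]*)\}")


def _wrap_var(var: str) -> str:
    # Wrap only the varname in LRE...PDF; any "=default" stays outside.
    name, sep, default = var.partition("=")
    return f"{LRE}{name}{PDF}{sep}{default}"


def _wrap_token(match: re.Match) -> str:
    # Treat everything after the last "|" as the variable list; a token
    # without "|" is treated as a bare variable list.
    head, bar, varlist = match.group(1).rpartition("|")
    wrapped = ",".join(_wrap_var(v) for v in varlist.split(","))
    return "{" + head + bar + wrapped + "}"


def mark_up_for_display(template: str) -> str:
    """Force logical order for the template as a whole (LRO...PDF) while
    letting each varname render in its own embedding (LRE...PDF)."""
    return LRO + _TOKEN.sub(_wrap_token, template) + PDF
```

Applied to http://ab.CD.EF/GH/IJ{-prefix|~|XY=abc}/kl.html, this produces
the same placement of <LRO>, <LRE>, and <PDF> as the second rule shown
above.  The markup is for display only; it would be stripped again before
the template is processed.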

- James

> Regards,
> Brian
> 
> 
Received on Sunday, 2 December 2007 19:00:57 UTC