Re: IRI Templates and Bidi Characters

Brian Smith wrote:
> [snip]
>> 1. IRI Templates MUST be stored and transmitted in logical order.
>> 2. IRI Templates MUST be rendered using the Unicode bidi algorithm.
>> 7. The IRI Template MAY contain bidi formatting characters necessary
>>    to ensure that the template is properly rendered.  The bidi
>>    formatting characters MAY be stored and transmitted with the
>>    template but the template processor MUST remove all bidi formatting
>>    characters from the template prior to processing.
> 
> I agree with these completely. How about changing "The IRI template MAY 
> contain..." to "It is RECOMMENDED that the IRI Template contain..." and 
> "BIDI overrides SHOULD be preserved as long as possible (until the template 
> is expanded into an IRI)." That way, the author's preferred rendering will 
> be used throughout processing.
> 

I don't think it needs to be any stronger than a MAY, since many
templates won't need any bidi formatting characters at all.  Also,
there is very little reason to preserve or even transmit the formatting
characters, because they can be restored easily given rules #3 and #4.
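
To make the "remove prior to processing" step in rule #7 concrete, here
is a minimal sketch. The class and method names are hypothetical, not
taken from any spec or library, and the set of characters treated as
bidi formatting characters (LRM, RLM, LRE, RLE, PDF, LRO, RLO) is my
assumption about what a processor would strip.

```java
// Hypothetical helper, for illustration only.
final class BidiStrip {
    // Assumed set: LRM (200E), RLM (200F), and LRE, RLE, PDF, LRO, RLO (202A to 202E).
    static boolean isBidiControl(char c) {
        return c == '\u200E' || c == '\u200F' || (c >= '\u202A' && c <= '\u202E');
    }

    // Returns the template with every bidi formatting character removed,
    // leaving all other characters in their original (logical) order.
    static String stripBidi(String template) {
        StringBuilder sb = new StringBuilder(template.length());
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (!isBidiControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Since stripping never reorders anything, the processor always sees the
same logical-order template whether or not the author included the
formatting characters.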

>> 5. Variable names SHOULD NOT contain a mix of LTR and RTL characters
>> 6. Variable names containing RTL characters SHOULD start and end with
>>    RTL characters.
> 
> There should not be inconsistency between the grammar and the prose. I 
> think these requirements should be left out, or changed to recommendations, 
> since the grammar allows such names. 

Assuming the varname is explicitly wrapped in LRE/PDF, these can be dropped.

> 
>> 3. IRI Templates SHOULD be rendered as if they were in a Left-to-Right
>>    Override (preceded by U+202D and followed by U+202C). As with IRIs,
>>    there is no need to use the explicit override if the 
>>    template can be displayed properly without it.
>> 4. Template variable names SHOULD be rendered as if they were in a
>>    Left-to-Right embedding (preceded by U+202A and followed by U+202C).
>>    This will ensure that variable names containing RTL characters will
>>    be properly rendered without affecting the ordering of the rest of
>>    the template.  There is no requirement to use the explicit 
>>    embedding if the template can be displayed properly without it.
> 
> I think these should be changed to recommendations for authors of IRI templates 
> to explicitly include these overrides in their templates using the recommended 
> mechanism for the embedding document (markup or override characters). By 
> rules #1, #2, and #7, the template author can choose whatever rendering(s) of the 
> template he deems to be best understood.
> 

The language above actually accounts for that. Note that the use of the
formatting characters is not a requirement.  The statement is "as if
they were in a left-to-right override" and "as if they were in a
left-to-right embedding".  The following, for instance, would be
perfectly acceptable because the template will be rendered properly in
the browser.

  <bdo dir="ltr">{-prefix|/|<span dir="ltr">ABCD</span>}</bdo>
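
The same effect can be had with the formatting characters themselves
where markup is not available. The following is only a sketch of rules
#3 and #4, with hypothetical names throughout. It wraps the whole
template in LRO...PDF and each variable name in LRE...PDF, under the
simplifying assumption that whatever follows the last '|' inside a
{...} expression is a single variable name.

```java
// Hypothetical helper, for illustration only.
final class BidiDisplay {
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char PDF = '\u202C'; // pop directional formatting
    static final char LRO = '\u202D'; // left-to-right override

    // Wraps the template in LRO...PDF and each variable name in LRE...PDF.
    // Assumes the input carries no bidi formatting characters already.
    static String forDisplay(String template) {
        StringBuilder sb = new StringBuilder().append(LRO);
        int i = 0;
        while (i < template.length()) {
            int open = template.indexOf('{', i);
            int close = open < 0 ? -1 : template.indexOf('}', open);
            if (close < 0) {
                // No further complete expression: copy the rest unchanged.
                sb.append(template, i, template.length());
                break;
            }
            // The name starts after the last '|' in the expression,
            // or right after '{' when the expression has no '|'.
            int bar = template.lastIndexOf('|', close);
            int nameStart = (bar > open ? bar : open) + 1;
            sb.append(template, i, nameStart)
              .append(LRE).append(template, nameStart, close).append(PDF)
              .append('}');
            i = close + 1;
        }
        return sb.append(PDF).toString();
    }
}
```

Here the LRO/PDF pair plays the role of the bdo element in the markup
example and the LRE/PDF pair plays the role of the span.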

> I agree that your recommendations are good. But, almost no IRI template 
> processors will render IRI templates, and almost all renderers of IRI templates 
> will be text editors, web browsers, word processors, etc. that are ignorant 
> of these requirements. If there was a renderer with specific knowledge of IRI 
> templates, it would probably want to display the IRI template in both the 
> logical order and the visual order, to aid in debugging. So, basically, these 
> requirements mean nothing because they won't be implemented.

FWIW, there's already one implementation [1].

Example:

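  // Template is in the org.apache.abdera.i18n.templates package; see [1]
  Template template = new Template("...");
  String forprocess = template.getPattern();            // for processing
  String fordisplay = template.getPatternForDisplay();  // for rendering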

[1]
http://svn.apache.org/repos/asf/incubator/abdera/java/trunk/dependencies/i18n/src/main/java/org/apache/abdera/i18n/templates/

> 
> The final suggestion I have is that the requirements should be written such that 
> BIDI markup is taken into account everywhere where the override characters are 
> allowed, since usually BIDI markup is preferred over using the Unicode overrides 
> (according to  http://www.w3.org/TR/unicode-xml/).

See above.  The text could likely be made more explicit in this regard.

- James

Received on Monday, 3 December 2007 03:49:56 UTC