RE: IRI Templates and Bidi Characters

James M Snell wrote:
> I'm fine with this but the spec needs to at least state how 
> and where the bidi formatting codes should be used in order 
> to display the template properly so users don't have to 
> guess.  It also needs to be pointed out explicitly in the 
> spec that typing out a bidi template in logical order without 
> the formatting codes will have surprising and often ambiguous results.

I see. You are saying that if I see an IRI template in printed documentation and want to copy it into my source code, I cannot reliably do so when BIDI characters are present. How about recommending that, whenever there is a chance of ambiguity, BIDI formatting characters be rendered explicitly in printed IRI templates, much like IETF specifications render whitespace using symbols like <SP> and <CRLF>?
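
A rough sketch of what such explicit rendering could look like, in Python. The helper name show_bidi and the <LRO>-style spellings are assumptions for illustration only (they reuse the usual Unicode abbreviations); nothing here comes from the draft.

```python
# Hypothetical helper: spell out bidi formatting characters as visible
# symbols, in the spirit of <SP> and <CRLF> in IETF specifications.
BIDI_NAMES = {
    "\u200e": "<LRM>",  # LEFT-TO-RIGHT MARK
    "\u200f": "<RLM>",  # RIGHT-TO-LEFT MARK
    "\u202a": "<LRE>",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "<RLE>",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "<PDF>",  # POP DIRECTIONAL FORMATTING
    "\u202d": "<LRO>",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "<RLO>",  # RIGHT-TO-LEFT OVERRIDE
}


def show_bidi(template: str) -> str:
    """Return the template with each bidi formatting character made visible."""
    return "".join(BIDI_NAMES.get(ch, ch) for ch in template)
```

A printed template produced this way can be typed back in without guessing where the invisible characters sit.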

> 1. IRI Templates MUST be stored and transmitted in logical order.
> 2. IRI Templates MUST be rendered using the unicode bidi algorithm
> 7. The IRI Template MAY contain bidi formatting characters necessary
>    to ensure that the template is properly rendered.  The bidi
>    formatting characters MAY be stored and transmitted with the
>    template but the template processor MUST remove all bidi formatting
>    characters from the template prior to processing.

I agree with these completely. How about changing "The IRI template MAY contain..." to "It is RECOMMENDED that the IRI Template contain..." and adding "BIDI overrides SHOULD be preserved as long as possible (until the template is expanded into an IRI)"? That way, the author's preferred rendering will be used throughout processing.
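
To illustrate "preserved as long as possible", here is a minimal Python sketch under stated assumptions: the {name} variable syntax, the function names, and the naive substitution (no percent-encoding) are all hypothetical and not taken from the draft. The stored template keeps its formatting characters; they are removed only inside the expansion step.

```python
import re

# Bidi formatting characters covered in this sketch:
# LRM, RLM, and the embedding/override controls U+202A..U+202E.
BIDI_FORMATTING = re.compile("[\u200e\u200f\u202a-\u202e]")


def strip_bidi(template: str) -> str:
    """Remove bidi formatting characters, leaving the logical-order text."""
    return BIDI_FORMATTING.sub("", template)


def expand(template: str, values: dict) -> str:
    """Naive expansion: strip formatting characters only at this point."""
    logical = strip_bidi(template)
    # Replace each {name} with its value; encoding rules are omitted here.
    return re.sub(r"\{([^{}]+)\}", lambda m: values[m.group(1)], logical)
```

Because stripping happens only in expand, any tool that displays the stored template still sees the author's overrides.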

> 5. Variables names SHOULD NOT contain a mix of LTR and RTL characters
> 6. Variable names containing RTL characters SHOULD start and end with
>    RTL characters.

There should not be inconsistency between the grammar and the prose. I think these requirements should be left out, or changed to recommendations, since the grammar allows such names. 

> 3. IRI Templates SHOULD be rendered as if they were in a Left-to-Right
>    Override (preceded by U+202D and followed by U+202C). As with IRIs,
>    there is no need to use the explicit override if the 
>    template can be displayed properly without it.
> 4. Template variable names SHOULD be rendered as if they were in a
>    Left-to-Right embedding (preceded by U+202A and followed by U+202C).
>    This will ensure that variable names containing RTL characters will
>    be properly rendered without affecting the ordering of the rest of
>    the template.  There is no requirement to use the explicit 
>    embedding if the template can be displayed properly without it.

I think these should be changed to recommendations for authors of IRI templates to explicitly include these overrides in their templates using the recommended mechanism for the embedding document (markup or override characters). By rules #1, #2, and #7, the template author can choose whatever rendering(s) of the template he deems to be best understood.
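
As a sketch of what an author-side tool might do with rules #3 and #4, assuming a {name} variable syntax (the function name and the syntax are hypothetical, chosen only for illustration):

```python
import re

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def add_bidi_controls(template: str) -> str:
    """Wrap each variable name in LRE..PDF and the whole template in LRO..PDF."""
    embedded = re.sub(
        r"\{([^{}]+)\}",
        lambda m: "{" + LRE + m.group(1) + PDF + "}",
        template,
    )
    return LRO + embedded + PDF
```

An author would run something like this once, when writing the template into a document, rather than expecting every renderer to apply the rules.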

I agree that your recommendations are good. However, almost no IRI template processors will render IRI templates, and almost all renderers of IRI templates will be text editors, web browsers, word processors, etc. that are ignorant of these requirements. If there were a renderer with specific knowledge of IRI templates, it would probably want to display the template in both the logical order and the visual order, to aid in debugging. So, in practice, these requirements mean nothing because they won't be implemented.

The final suggestion I have is that the requirements should be written so that BIDI markup is taken into account wherever the override characters are allowed, since BIDI markup is usually preferred over the Unicode overrides (according to http://www.w3.org/TR/unicode-xml/).

- Brian

Received on Monday, 3 December 2007 02:42:29 UTC