Re: Moving Right Along on the Inclusions Table...

Martin Duerst wrote:

<quote>
Checking RFC 3987, I also
found that the text there may need to be clarified (it needs to
be updated to take into account combining marks at the end of
components anyway). [cc: the IRI mailing list]

It currently says:

   1.  A component SHOULD NOT use both right-to-left and left-to-right
       characters.

   2.  A component using right-to-left characters SHOULD start and end
       with right-to-left characters.

I think that at least should be changed to:

   1.  A component SHOULD NOT use both right-to-left and left-to-right
       letters.

   2.  A component using right-to-left characters SHOULD start and end
       with right-to-left letters.
<end of quote>

While I fully agree with Martin's intent, I am not sure that the proposed 
wording accomplishes its purpose.

First, changing the wording of rule 1 from "characters" to "letters" 
would allow a single component to mix LTR and RTL characters that are not 
letters, which could confuse readers who are not expert in the 
Unicode Bidirectional Algorithm.
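
To illustrate the difference between the two wordings of rule 1, here is a
minimal Python sketch. It rests on assumptions that RFC 3987 does not spell
out: "right-to-left character" is read as Unicode bidi class R or AL,
"left-to-right character" as bidi class L, and "letter" as general category
L*. The function name mixes_directions is hypothetical.

```python
import unicodedata

# Assumption: "directional" means a strong bidi class.
STRONG_RTL = {"R", "AL"}
STRONG_LTR = {"L"}

def mixes_directions(component, letters_only):
    """Return True if the component contains both RTL and LTR characters.

    With letters_only=True, only characters whose general category is a
    letter (L*) are considered, as in the "letters" wording of rule 1.
    """
    rtl = ltr = False
    for ch in component:
        if letters_only and not unicodedata.category(ch).startswith("L"):
            continue
        bidi = unicodedata.bidirectional(ch)
        rtl = rtl or bidi in STRONG_RTL
        ltr = ltr or bidi in STRONG_LTR
    return rtl and ltr

# Hebrew alef and bet followed by LEFT-TO-RIGHT MARK (bidi class L, not a letter).
sample = "\u05D0\u05D1\u200E"
print(mixes_directions(sample, letters_only=False))  # "characters" wording
print(mixes_directions(sample, letters_only=True))   # "letters" wording
```

Under these assumptions the "characters" wording flags the sample as mixed,
while the "letters" wording lets it through.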

Secondly, the new wording of rule 2 does not allow combining marks at the 
end of components.  I suggest the following.

a) Leave rule 1 as is:

   1.  A component SHOULD NOT use both right-to-left and left-to-right
       characters.

b) Change rule 2 as follows:

   2.  A component using right-to-left characters SHOULD start with a
       right-to-left letter and end with a right-to-left letter 
       optionally followed by combining marks.
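
As a rough illustration of how rules 1 and 2 as suggested above could be
checked, here is a minimal Python sketch. It is only one possible reading:
it assumes that "right-to-left" means Unicode bidi class R or AL,
"left-to-right" means bidi class L, "letter" means general category L*, and
"combining mark" means general category M*. The name check_component is
hypothetical.

```python
import unicodedata

# Assumption: strong bidi classes stand in for "RTL" and "LTR".
STRONG_RTL = {"R", "AL"}

def is_rtl_letter(ch):
    """True for a letter (category L*) with bidi class R or AL."""
    return (unicodedata.category(ch).startswith("L")
            and unicodedata.bidirectional(ch) in STRONG_RTL)

def is_combining_mark(ch):
    """True for general categories Mn, Mc, Me."""
    return unicodedata.category(ch).startswith("M")

def check_component(component):
    """Return (rule1_ok, rule2_ok) for one IRI component."""
    classes = {unicodedata.bidirectional(ch) for ch in component}
    has_rtl = bool(classes & STRONG_RTL)
    has_ltr = "L" in classes

    # Rule 1: no mixing of RTL and LTR characters.
    rule1_ok = not (has_rtl and has_ltr)

    # Rule 2 only applies to components using RTL characters.
    if not has_rtl:
        return rule1_ok, True

    # Strip the optional trailing combining marks, then require an
    # RTL letter at both ends of what remains.
    core = component
    while core and is_combining_mark(core[-1]):
        core = core[:-1]
    rule2_ok = bool(core) and is_rtl_letter(core[0]) and is_rtl_letter(core[-1])
    return rule1_ok, rule2_ok

print(check_component("\u05E9\u05DC\u05D5\u05DD"))  # Hebrew letters only
print(check_component("\u05D1\u05B7"))              # bet + combining patah
print(check_component("\u05D01"))                   # alef + European digit
```

In this sketch a component ending in a Hebrew point such as patah passes
rule 2, whereas a test that required the very last character to be a letter
would reject it.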


Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333
           Mobile: +972 52 2554160

Received on Tuesday, 2 January 2007 13:43:43 UTC