- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 12 Feb 2004 16:32:38 -0500
- To: "Michel Suignard" <michelsu@windows.microsoft.com>
- Cc: <public-iri@w3.org>, "Mark Davis" <mark.davis@jtcsv.com>, bidi@unicode.org
Hello Michel,

Many thanks for your text. I have taken a different approach. The new text now reads:

<<<<<<<<
When rendered, bidirectional IRIs MUST be rendered using the Unicode Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be rendered in the same way as they would be rendered if they were in a left-to-right embedding, i.e. as if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can also be done in a higher-order protocol (e.g. the dir='ltr' attribute in HTML).

There is no requirement to actually use the above embedding if the display is still the same without the embedding. For example, a bidirectional IRI in a text with left-to-right base directionality (such as used for English or Cyrillic) that is preceded and followed by whitespace and strong left-to-right characters does not need an embedding. Also, a bidirectional relative IRI that only contains strong right-to-left characters and weak characters, that starts and ends with a strong right-to-left character, and that appears in a text with right-to-left base directionality (such as used for Arabic or Hebrew) and is preceded and followed by whitespace and strong characters does not need an embedding.

In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM) may be sufficient to force the correct display behavior. However, the details of the Unicode Bidirectional Algorithm are not always easy to understand. Implementers are strongly advised to err on the side of caution and to use embedding in all cases where they are not completely sure that the display behavior is unaffected without the embedding.

The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits higher-level protocols to influence bidirectional rendering. Such changes by higher-level protocols MUST NOT be used if they change the rendering of IRIs.
The bidirectional formatting characters that may be used before or after the IRI to assure correct display are themselves not part of the IRI. IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of the IRI, but do not themselves appear visually. It would therefore not be possible to correctly input an IRI with such characters.
<<<<<<<<

The old text read:

>>>>>>>>
When rendered, bidirectional IRIs MUST be rendered using the Unicode Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be rendered with an overall left-to-right (ltr) direction. The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits higher-level protocols to influence bidirectional rendering. Such changes by higher-level protocols MUST NOT be used if they change the rendering of IRIs.

In text with a left-to-right base directionality or embedding (such as used for English or Cyrillic), the Unicode Bidirectional Algorithm will automatically use an overall ltr direction for the IRI. In text with an rtl base directionality or embedding (such as used for Arabic or Hebrew), setting a different embedding direction for the IRI is needed. Setting the embedding direction can be done in a higher-order protocol (e.g. the dir='ltr' attribute in HTML). If this is not available (e.g. in plain text), setting the embedding is done with Unicode bidi formatting codes, i.e. U+202A, LEFT-TO-RIGHT EMBEDDING (LRE) before the IRI, and U+202C, POP DIRECTIONAL FORMATTING (PDF) after the IRI, both not being part of the IRI itself.

IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of the IRI, but do not themselves appear visually. It would therefore not be possible to correctly input an IRI with such characters.
>>>>>>>>

There are several changes, in particular:
- Making clear that the required display behavior is that of an ltr embedding (not just ltr base directionality).
- Tightening the case(s) that don't actually need the embedding, to avoid the cases that were wrongly included, as found by Michael.
- Describing a case where no embedding is necessary in a purely rtl context (what Jony was looking for).

The rest is mostly just moving things around a bit. Please check and tell me if I have missed something.

At 17:04 04/02/11 -0800, Michel Suignard wrote:
>Martin, here is my new proposed text (in quotes) for replacement of the
>2nd paragraph of clause 4.1:
>
><<
>When rendered, bidirectional IRIs MUST be rendered using the Unicode
>Bidirectional Algorithm [UNIV4] [UNI9] with an overall left-to-right
>(ltr) direction.
>To achieve this, the IRI is embedded left-to-right in
>all the following cases:
>1. If the current embedding level before the IRI is odd (right-to-left)
>2. If the last character with a strong directionality before the IRI is
>right-to-left
>3. If the first character with a strong directionality after the IRI is
>right-to-left.

I think these three conditions would cover all the necessary cases, but they would also force embedding in Jony's case, which is not necessary and which I wanted to avoid.

>No additional bidirectional rendering change by higher-level protocols
>is allowed.
>
>Note: Embedding the IRI left-to-right can be achieved by embedding the
>text with LRE...PDF. If the maximum allowed embedding level is exceeded
>(above 62), the IRI overall left-to-right direction may not be enforced.
>
>>

I prefer not to mention the 62-level case. It is part of the bidi algorithm, and the limit is set so high that it shouldn't affect anything but pathological cases anyway.
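For comparison, Michel's three conditions translate fairly directly into code. The following is only a sketch under simplifying assumptions: the helper names (`needs_ltr_embedding`, `render`, etc.) are hypothetical, the embedding level before the IRI is passed in rather than computed, and explicit formatting codes in the surrounding text are ignored. It uses Python's `unicodedata.bidirectional` to find strong characters, wraps the IRI in LRE...PDF when any condition holds, and rejects IRIs that contain bidi formatting characters, following the draft's MUST NOT.

```python
import unicodedata

# Bidirectional formatting characters named in the draft text.
LRM, RLM = "\u200E", "\u200F"
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
BIDI_FORMATTING = {LRM, RLM, LRE, RLE, PDF, LRO, RLO}

RTL_STRONG = {"R", "AL"}
STRONG = {"L"} | RTL_STRONG


def last_strong(text):
    """Bidi class of the last strong character in text, or None."""
    for ch in reversed(text):
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            return cls
    return None


def first_strong(text):
    """Bidi class of the first strong character in text, or None."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            return cls
    return None


def needs_ltr_embedding(level_before, before, after):
    """Michel's three conditions (simplified sketch, hypothetical helper)."""
    return (
        level_before % 2 == 1                   # 1. odd (rtl) embedding level
        or last_strong(before) in RTL_STRONG    # 2. last strong char before is rtl
        or first_strong(after) in RTL_STRONG    # 3. first strong char after is rtl
    )


def render(level_before, before, iri, after):
    """Return display text, wrapping the IRI in LRE...PDF when needed.
    The LRE/PDF characters are added for display only, not part of the IRI."""
    if any(ch in BIDI_FORMATTING for ch in iri):
        raise ValueError("IRIs MUST NOT contain bidi formatting characters")
    if needs_ltr_embedding(level_before, before, after):
        return before + LRE + iri + PDF + after
    return before + iri + after
```

As Martin points out, condition 1 alone already forces embedding in Jony's case: `needs_ltr_embedding(1, "\u05D0 ", " \u05D1")` returns `True` for any IRI in an rtl paragraph, even when the new text says the display would be the same without it.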
>The small diagram (to be seen in monospaced chars) shows the desired
>result
>
>-String before-| IRI |-String after--
>       L         ON          L
>(For the string before and after, the IRI behaves as bidi 'ON')

I'm not actually sure that that's possible with an embedding. For example, in rule W1 of the bidi algorithm, we have

   sor NSM -> sor L (assuming sor is L)

A sor of L could result from the closing of an ltr embedding. So if I understand the way to calculate sor/eor correctly, the IRI would appear as L to the surroundings.

>(For the
>IRI itself, string before and after behave as bidi 'L')

That I think is correct.

>BTW I am interpreting clause W2 of the Unicode Bidi algorithm concerning
>the strong type enumeration as including as well the embedding
>characters (at least the LRE) as it is necessary in the logic expressed
>above.

Yes. That's expressed by the sor, which would be L in the case of starting an ltr embedding.

Regards, Martin.

>I have tried one of the sample bidi algorithms (Asmus Freytag
>version) and it behaves that way.
>
>Michel
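Martin's sor/eor argument can be checked with a small worked example. The sketch below assumes the level-run rules as they read in the Unicode 4.0 era: sor/eor take the type of the higher of the two levels at a run boundary (L if even, R if odd), and W1 gives each NSM the type of the previous character, or sor at the start of a run. Later versions of UAX #9 add isolates (isolating run sequences, and NSM after an isolate initiator or PDI becoming ON), which this sketch ignores. The function names are hypothetical.

```python
def boundary_type(level_a, level_b):
    """sor/eor at a level-run boundary: L if the higher level is even, else R."""
    return "R" if max(level_a, level_b) % 2 == 1 else "L"


def apply_w1(types, sor):
    """Rule W1 (simplified): each NSM takes the type of the previous
    character, or sor if it starts the level run."""
    resolved = []
    prev = sor
    for t in types:
        if t == "NSM":
            t = prev
        resolved.append(t)
        prev = t
    return resolved


# rtl paragraph (level 1); the IRI sits inside LRE...PDF (level 2).
base_level, iri_level = 1, 2

# The IRI's own run: its sor is L, so the text before looks like L to the IRI.
iri_sor = boundary_type(base_level, iri_level)

# The run after the PDF: its sor is also L, so an NSM right after the IRI
# resolves to L. The IRI looks like L (not ON) to the following text.
after_sor = boundary_type(iri_level, base_level)
after_run = apply_w1(["NSM", "R"], after_sor)
```

Both boundaries resolve to L, which matches Martin's reading: the diagram's "ON" behavior for the IRI as seen from outside cannot be obtained with a plain ltr embedding.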
Received on Thursday, 12 February 2004 17:10:10 UTC