- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 12 Feb 2004 16:32:38 -0500
- To: "Michel Suignard" <michelsu@windows.microsoft.com>
- Cc: <public-iri@w3.org>, "Mark Davis" <mark.davis@jtcsv.com>, bidi@unicode.org
Hello Michel,
Many thanks for your text. I have taken a different approach. The new text
now reads:
<<<<<<<<
When rendered, bidirectional IRIs MUST be rendered using the Unicode
Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be
rendered in the same way as they would be rendered if they were in a
left-to-right embedding, i.e. as if they were preceded by U+202A,
LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP
DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can
also be done in a higher-order protocol (e.g. the dir='ltr'
attribute in HTML).
There is no requirement to actually use the above embedding if the
display is still the same without the embedding. For example, a
bidirectional IRI in a text with left-to-right base directionality
(such as used for English or Cyrillic) that is preceded and followed
by whitespace and strong left-to-right characters does not need an
embedding. Also, a bidirectional relative IRI that only contains
strong right-to-left characters and weak characters and that starts
and ends with a strong right-to-left character and appears in a text
with right-to-left base directionality (such as used for Arabic or
Hebrew) and is preceded and followed by whitespace and strong
characters does not need an embedding.
In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM) may be
sufficient to force the correct display behavior. However, the
details of the Unicode Bidirectional Algorithm are not always easy to
understand. Implementers are strongly advised to err on the side of
caution and to use embedding in all cases where they are not
completely sure that the display behavior is unaffected without the
embedding.
The Unicode Bidirectional Algorithm ([UNI9], Section 4.3) permits
higher-level protocols to influence bidirectional rendering. Such
changes by higher-level protocols MUST NOT be used if they change the
rendering of IRIs.
The bidirectional formatting characters that may be used before or
after the IRI to assure correct display are themselves not part of
the IRI. IRIs MUST NOT contain bidirectional formatting characters
(LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual
rendering of the IRI, but do not themselves appear visually. It
would therefore not be possible to correctly input an IRI with such
characters.
<<<<<<<<
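To illustrate the embedding rule and the prohibition on formatting characters in the new text above, here is a minimal Python sketch. The function names are hypothetical and not part of the draft. It only shows the plain-text case, where the embedding is done with LRE and PDF rather than by a higher-order protocol.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# The seven bidirectional formatting characters the draft forbids inside an IRI.
BIDI_FORMATTING = {
    "\u200E",  # LRM
    "\u200F",  # RLM
    "\u202A",  # LRE
    "\u202B",  # RLE
    "\u202D",  # LRO
    "\u202E",  # RLO
    "\u202C",  # PDF
}


def contains_bidi_formatting(iri: str) -> bool:
    """Return True if the IRI contains any forbidden formatting character."""
    return any(ch in BIDI_FORMATTING for ch in iri)


def embed_for_display(iri: str) -> str:
    """Wrap an IRI in LRE ... PDF for display in plain text.

    The wrapping characters are added around the IRI for rendering only;
    they are not part of the IRI itself.
    """
    if contains_bidi_formatting(iri):
        raise ValueError("IRI must not contain bidi formatting characters")
    return LRE + iri + PDF
```

Wrapping unconditionally, as this sketch does, corresponds to the advice to err on the side of caution when it is not clear that the display is the same without the embedding.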
The old text read:
>>>>>>>>
When rendered, bidirectional IRIs MUST be rendered using the Unicode
Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be
rendered with an overall left-to-right (ltr) direction. The Unicode
Bidirectional Algorithm ([UNI9], Section 4.3) permits higher-level
protocols to influence bidirectional rendering. Such changes by
higher-level protocols MUST NOT be used if they change the rendering
of IRIs.
In text with a left-to-right base directionality or embedding (such
as used for English or Cyrillic), the Unicode Bidirectional Algorithm
will automatically use an overall ltr direction for the IRI. In text
with a rtl base directionality or embedding (such as used for Arabic
or Hebrew), setting a different embedding direction for the IRI is
needed. Setting the embedding direction can be done in a higher-
order protocol (e.g. the dir='ltr' attribute in HTML). If this is
not available (e.g. in plain text), setting the embedding is done
with Unicode bidi formatting codes, i.e. U+202A, LEFT-TO-RIGHT
EMBEDDING (LRE) before the IRI, and U+202C, POP DIRECTIONAL
FORMATTING (PDF) after the IRI, both not being part of the IRI
itself.
IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM,
LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of
the IRI, but do not themselves appear visually. It would therefore
not be possible to correctly input an IRI with such characters.
>>>>>>>>
There are several changes, in particular:
- Making clear that the required display behavior is that of an ltr
embedding (not just ltr base directionality).
- Tightening the case(s) that don't actually need the embedding
to avoid the cases that were wrongly included, as found by Michael.
- Describing a case where no embedding is necessary in a purely
rtl context (what Jony was looking for).
The rest is mostly just moving things around a bit. Please check and
tell me if I have missed something.
At 17:04 04/02/11 -0800, Michel Suignard wrote:
>Martin, here is my new proposed text (in quotes) for replacement of the
>2nd paragraph of clause 4.1:
>
><<
>When rendered, bidirectional IRIs MUST be rendered using the Unicode
>Bidirectional Algorithm [UNIV4] [UNI9] with an overall left-to-right
>(ltr) direction.
>To achieve this, the IRI is embedded left-to-right in
>all the following cases:
>1. If the current embedding level before the IRI is odd (right-to-left)
>2. If the last character with a strong directionality before the IRI is
>right-to-left
>3. If the first character with a strong directionality after the IRI is
>right-to-left.
I think these three conditions would cover all the necessary cases,
but they would also force embedding in Jony's case, which is not
necessary and which I wanted to avoid.
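For discussion, the three conditions could be sketched roughly as follows in Python. This is only an approximation under stated assumptions: the function and parameter names are hypothetical, the caller is assumed to already know the current embedding level, and "strong" is taken to mean the bidi classes L, R and AL as reported by the standard `unicodedata` module.

```python
import unicodedata

STRONG = {"L", "R", "AL"}   # strong bidi classes
STRONG_RTL = {"R", "AL"}    # strong right-to-left classes


def _first_strong(chars):
    """Return the bidi class of the first strong character, or None."""
    for ch in chars:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            return cls
    return None


def needs_ltr_embedding(before: str, after: str, embedding_level: int = 0) -> bool:
    """Apply the three proposed conditions to the text around an IRI."""
    # 1. The current embedding level before the IRI is odd (right-to-left).
    if embedding_level % 2 == 1:
        return True
    # 2. The last strong character before the IRI is right-to-left.
    if _first_strong(reversed(before)) in STRONG_RTL:
        return True
    # 3. The first strong character after the IRI is right-to-left.
    if _first_strong(after) in STRONG_RTL:
        return True
    return False
```

Note that the IRI itself is not an input here, which is why a check of this shape cannot exempt the purely rtl case: it returns True for any IRI at an odd embedding level.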
>No additional bidirectional rendering change by higher-level protocols
>is allowed.
>
>Note: Embedding the IRI left-to-right can be achieved by embedding the
>text with LRE...PDF. If the maximum allowed embedding level is exceeded
>(above 62), the IRI overall left-to-right direction may not be enforced.
> >>
I prefer not to mention the 62 levels case. It is part of the bidi
algorithm, and the limit is set so high that it shouldn't affect
anything but pathological cases anyway.
>The small diagram (to be seen in monospaced chars) shows the desired
>result
>
>-String before-|     IRI     |-String after--
>       L              ON             L
>(For the string before and after, the IRI behaves as bidi 'ON')
I'm not actually sure that that's possible with an embedding.
For example in rule W1 in the bidi algorithm, we have
sor NSM -> sor L
(assuming sor is L)
A sor of L could result from a closing of an ltr embedding.
So if I understand the way to calculate sor/eor correctly,
the IRI would appear as L to the surroundings.
>(For the
>IRI itself, string before and after behave as bidi 'L')
That I think is correct.
>BTW I am interpreting clause W2 of the Unicode Bidi algorithm concerning
>the strong type enumeration as including as well the embedding
>characters (at least the LRE) as it is necessary in the logic expressed
>above.
Yes. That's expressed by the sor, which would be L in the case of
starting an ltr embedding.
Regards, Martin.
>I have tried one of the sample bidi algorithm implementations (Asmus
>Freytag's version) and it behaves that way.
>
>Michel
Received on Thursday, 12 February 2004 17:10:10 UTC