- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 18 Feb 2005 07:01:53 -0000
- To: <eyalroz@technion.ac.il>, <public-i18n-core@w3.org>
- Cc: <bidi@unicode.org>
Eyal,

<br> is intended for use as presentation-oriented markup, not structural markup. The HTML 4.01 spec clearly describes it as equivalent to a line separator (i.e. white space), as opposed to a paragraph delimiter [1]. Lines are not semantically important in HTML. (You can think of the effect as similar to what you'd expect from reducing the width of a window or box containing text, but applied to a single line at a time.)

The way Internet Explorer handles it looks clever in the context of the example being used, but actually produces incorrect results in other situations. Try this code in a browser.

First, let's look at the original example from the bugzilla inclusion [for the code snippets I replaced body with p and changed the Hebrew characters to xxxx, and I display the text between the markup in the order it appears *in memory*]:

    <p dir="rtl">
    1. xxxxxx xxxx English.<br>
    2. xxx.
    </p>

This produces this in Mozilla:

    English. xxxx xxxxxx .1
    .xxx .2

and this in IE:

    .English xxxx xxxxxx .1
    .xxx .2

Now compare that to:

    <p dir="rtl">
    1. xxxxxx xxxx English,<br>
    and more xxx.
    </p>

In Mozilla you'll see:

    English, xxxx xxxxxx .1
    .xxx and more

(which is correct) and in IE you'll see:

    ,English xxxx xxxxxx .1
    .xxx and more

(which is incorrect).

So Mozilla is actually doing the right thing.

The way to think about this is that the <br> should actually be irrelevant for things like voice browsers. It is only there to force the line to visually wrap at a given point. In this way it is exactly the same as a 'soft carriage return' or 'forced line break' in other types of software. Authors typically use these things for making line lengths the same without setting the text box (often a questionable practice). In translation, for example, it is routine to remove all such forced line breaks before translation.
This is because the line breaks will need to fall at different places in the translated string, since text on a line expands or contracts at different rates as translated words are substituted. After translation, if necessary, forced line breaks are put back in at the appropriate places.

In terms of the bidi algorithm, the code snippets above are equivalent to

    <p dir="rtl">1. xxxxxx xxxx English. 2. xxxx.</p>

and

    <p dir="rtl">1. xxxxxx xxxx English, and more xxxx.</p>

So how do we get the original example to look how we want? Like the old joke: "What's the best way to get to <your city>?" "Well, I wouldn't start from here." The code is a bad implementation.

If this is a numbered list, one ought to use list markup.

If this is output from a text box, and creating list elements is really too complicated (although things like wikis manage to figure it out), then enforce the use of carriage returns as paragraph separators, and put each line in a separate p element. Text input boxes should enforce appropriate behaviour by wrapping lines automatically when they are too wide for the box.

It is possible for the author of the text to produce results that look better using &rlm; (the right-to-left mark, U+200F) or its equivalent, but this rides roughshod over the true structural problems of the text.

So in summary, the problem is not with the bidi algorithm, nor with <br>, but with the way the text has been marked up.
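As a rough illustration of the two alternatives just described, the markup might look something like the sketch below. It is untested and not taken from the bug report: xxxx again stands in for Hebrew characters, and the margin:0 style is borrowed from Mark Davis's suggestion quoted further down, on the assumption that the author wants no paragraph spacing.

```html
<!-- Alternative 1: list markup. The browser generates the numbers,
     and each li is a separate block, so it is resolved as its own
     bidi paragraph. -->
<ol dir="rtl">
  <li>xxxxxx xxxx English.</li>
  <li>xxx.</li>
</ol>

<!-- Alternative 2: one p element per line. Each p is its own bidi
     paragraph, so the full stop after "English" is a neutral at the
     end of an RTL paragraph and takes the paragraph direction.
     margin:0 (an assumption here) removes the paragraph spacing. -->
<p dir="rtl" style="margin:0">1. xxxxxx xxxx English.</p>
<p dir="rtl" style="margin:0">2. xxx.</p>
```

In both cases the trailing full stop should end up to the left of "English" without any control characters, because the line now ends a paragraph rather than merely wrapping inside one.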
Hope that helps,
RI

[1] http://www.w3.org/TR/html401/struct/text.html#edef-BR

============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/

W3C Internationalization:
http://www.w3.org/International/

Publication blog:
http://people.w3.org/rishida/blog/

> -----Original Message-----
> From: public-i18n-core-request@w3.org
> [mailto:public-i18n-core-request@w3.org] On Behalf Of Eyal
> Rozenberg (by way of Martin Duerst <duerst@w3.org>)
> Sent: 16 February 2005 04:45
> To: public-i18n-core@w3.org
> Subject: Re: An issue with the Unicode BiDi Algorithm
>
> Mark Davis wrote:
> >... it appears to be that the bug filers want to treat <br> as if it
> >really does start another paragraph, but one without paragraph spacing.
> > ...
> >So it seems like what the people really want would be to use a <p
> >style="margin:0"> instead of a <br>.
>
> But then, why should <br> ever be used? What I mean is, <p>'s
> and <br>'s have semantic significance, they're not just
> vehicles for visual style which you override with something
> like "margin:0". e.g. I may want to break a line without
> breaking the paragraph, and it is reasonable for me to want
> to write an RTL sentence which ends with an LTR word before
> the period on the first line, followed on the next line by an
> RTL sentence which happens to begin with, say, a number. They
> may be two sentences forming a single paragraph semantically,
> which should not have to be split up just so as to display
> like one would expect them to.
>
> So, is there some compelling reason why neutrals at ends of
> lines should not have the same direction as that of the
> paragraph (with no control characters present of course)?
>
> Eyal
>
> PS - I'm assuming it is appropriate for me to also CC the two
> mailing lists; if that is not the case, please let me know.
Received on Friday, 18 February 2005 07:01:54 UTC