Re: Should <br> be removed from the HTML5 spec?

Entities like &ls; and &ps; seem nice to have. Even without those, we can 
always use numeric character references like &#x2028; and &#x2029;.
The really important thing is to convince browser developers to support 
these characters properly, in whatever form they appear in the text.
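
For reference, a minimal Python sketch of what those two numeric character 
references decode to, and how the Unicode character database classifies them 
for bidi purposes. The helper name describe() is only illustrative; 
html.unescape and unicodedata are from the standard library.

```python
import html
import unicodedata

# Decode the two numeric character references mentioned above.
ls = html.unescape("&#x2028;")
ps = html.unescape("&#x2029;")

def describe(ch):
    """Return (code point, Unicode name, bidi class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
    )

print(describe(ls))  # LINE SEPARATOR, bidi class WS (whitespace)
print(describe(ps))  # PARAGRAPH SEPARATOR, bidi class B (paragraph separator)
```

Only U+2029 carries the bidi class B, which is what makes it a paragraph 
break for the bidi algorithm; U+2028 is treated as whitespace within the 
same paragraph.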

Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Fax: +972 2 5870333    Mobile: +972 52 2554160




From:   Amit Aronovitch <aronovitch@gmail.com>
To:     public-i18n-bidi@w3.org
Date:   05/11/2010 10:04
Subject:        Should <br> be removed from the HTML5 spec?
Sent by:        public-i18n-bidi-request@w3.org



Following the correspondence on HTML5 bugs #10828 and #11211,
I started to think that maybe a more radical approach is the way to go.

Quick summary: 

1) The issue at hand is the current incompatibility in how <br> is 
handled:
       * The HTML4 spec (implemented by Gecko and Opera) says that <br> should 
be "soft" (not introduce a bidi paragraph break).
       * The implementations in IE and WebKit (as well as common usage on 
websites) assume a "hard" <br> (treated as a bidi paragraph separator).

2) Our suggestion, originally in #10828, was split into two bugs:
    * #10828 suggests that <br> should be "hard" by default (contrary to 
HTML4).
    * #11211 suggests that some way should be provided to produce a "soft 
<br>".

3) While #10828 seems to be on the road to acceptance, we are having 
trouble with use cases for #11211:
    * The "natural" use case (I assume, based on old HTML tutorials, that 
this was the original purpose of <br>) is poems. 
        However, while there are *a lot* of websites with poems, I could 
not find an RTL poem that contains numbers and LTR words (I am sure such
        sites exist; I would appreciate a link if you find one). 
        The same goes for mail addresses (they do include numbers and Latin 
words, but the <br>'s are normally positioned in non-sensitive places).
    * A use case brought up by Adil (comment 18 on #10828): a site 
displaying a newspaper page in a way that should match the actual printed 
page.
       This was criticized by Ian Hickson for being "a bit of an abuse of 
HTML".

Now, considering the last point, the editor does have a point. These 
linebreaks are a matter of display. However, the thing that troubles me is 
that 
*the same argument applies equally well to the "hard" <br>* (bug 
#10828).
As opposed to <p>, which reflects a logical partition of the text into DOM 
nodes, <br> is a matter of textual display.
Since HTML5 aims to represent the pure logical structure of a document, 
maybe the right place for such things is in the interaction between the 
contents (text) and CSS.

Hence, the new suggestion:

(1) Add mandatory entities (temporary names):
    &ps; for U+2029 (paragraph separator, replacement for "hard" <br>)
    &ls; for U+2028 (line separator, replacement for "soft" <br>)

(2) Remove <br> from HTML5 spec.
    Add a comment saying that <br> was deprecated, stating explicitly that 
when upgrading from HTML4, <br> should be replaced with &ps; if the 
intended
    use was a hard break, or with &ls; if the intended use was compliant 
with the HTML4 spec (i.e. soft break).
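
To make the upgrade step concrete, here is a rough Python sketch of such a 
replacement. It is only an illustration under stated assumptions: the 
function name replace_br() and the "hard" flag are hypothetical, the regex 
is a naive match on <br> tags rather than a real HTML parser, and the 
characters are inserted directly because the &ps; and &ls; entities are 
only proposed names.

```python
import re

PS = "\u2029"  # PARAGRAPH SEPARATOR: proposed replacement for a "hard" <br>
LS = "\u2028"  # LINE SEPARATOR: proposed replacement for a "soft" <br>

# Naive pattern: matches <br>, <br/>, <br /> in any letter case.
# It does not handle attributes, comments, or <br> inside scripts.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def replace_br(markup, hard):
    """Replace every <br> with U+2029 (hard) or U+2028 (soft).

    Hypothetical helper; the author of the page has to decide which
    kind of break was intended.
    """
    return BR_TAG.sub(PS if hard else LS, markup)

poem = "first line<br>second line<BR />third line"
print(repr(replace_br(poem, hard=False)))
```

The choice between the two cannot be automated in general, since it depends 
on what the page author meant by each <br>.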

In a private discussion with Aharon, he said that this approach might, in 
practice, work against our goal of reaching compatibility. I'm sure he can 
explain this better than me.
The purpose of this post is to try to reach some consensus before posting 
the idea in bugzilla.

 thanks for taking the time to read,
            Amit A.

Received on Sunday, 7 November 2010 10:46:05 UTC