Re: [XHTML2] Unicode line and paragraph separators

On 4 Apr 2003 at 4:46, Masayasu wrote:

> "Ernest Cline" <ernestcline@mindspring.com> wrote:
> 
> > I can see both pluses and minuses to this but how about using the 
> > Unicode characters U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR 
> > either instead of or in addition to the <l> and <p> elements?
> 
> "Unicode in XML and other Markup Languages" Note classifies those
> characters as "characters not suitable for use with markup" [1].
> It is quite unlikely that XHTML 2.0 would advocate such usage
> against this guideline.
> 
> [1] http://www.w3.org/TR/unicode-xml/#Line

I wasn't aware of the TR, but having read it, I only agree with it in 
part.  The only reason these characters are not recommended is the 
existence of paragraph and line markup in (X)HTML.  Clearly the 
separator characters should not be used in HTML 4.01 and earlier or 
in XHTML 1.1 and earlier, as those standards were written without 
these formatting characters being considered.  Therefore, a 
large number of existing working implementations would be broken if 
those characters were to become significant as anything more than 
whitespace for those earlier standards.  However, since XHTML2 will be 
starting fresh, any implementation will have to deal with its ways of 
doing things, such as the current working draft's use of <l></l> 
instead of <br/>. Therefore, I do not see this TR as an absolute bar 
against a decision to use the separators instead of <p></p> and <l></l> 
in XHTML2. The existence of the TR does mean that the change to use 
format characters instead of markup should be made only if a good case 
can be made for them.  Because so much is being changed with XHTML2, 
the question is which is the better choice for indicating lines and 
paragraphs, markup or formatting characters.  

As I pointed out earlier, both approaches have their advantages. 
Current practice would suggest the use of <p> and <l>. Using separators 
would allow for more compact coding of documents and would enable the 
XHTML2 grammar to be simplified, because both <p> and <l> must be 
special-cased so that they cannot contain themselves.  If there 
were no earlier (X)HTML standards, I think that the separator model 
would clearly be superior. If XHTML2 were not already engaged in the 
pruning of existing (X)HTML elements, then markup elements would 
clearly be the preferred choice. However, the earlier standards do 
exist and XHTML2 is pruning a significant number of (X)HTML elements, 
meaning that the choice must be made on another basis.
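
As a rough illustration of the "more compact coding" point above, the 
following Python sketch rewrites separator-delimited text as <p>/<l> 
markup so the two encodings can be compared side by side.  The function 
name, the sample text, and the choice to nest <l> inside <p> are 
assumptions made for the example, not anything taken from the XHTML2 
working draft.

```python
from html import escape

LINE_SEP = "\u2028"       # U+2028 LINE SEPARATOR
PARAGRAPH_SEP = "\u2029"  # U+2029 PARAGRAPH SEPARATOR


def separators_to_markup(text: str) -> str:
    """Rewrite separator-delimited text as <p>/<l> markup.

    Hypothetical mapping for illustration: each paragraph becomes a <p>,
    and a paragraph containing line separators has each line wrapped in <l>.
    """
    parts = []
    for paragraph in text.split(PARAGRAPH_SEP):
        if not paragraph:
            continue  # skip the empty piece left by a trailing separator
        lines = paragraph.split(LINE_SEP)
        if len(lines) == 1:
            body = escape(lines[0])
        else:
            body = "".join(f"<l>{escape(line)}</l>" for line in lines)
        parts.append(f"<p>{body}</p>")
    return "".join(parts)


# Made-up sample: one plain paragraph, then a two-line paragraph.
sample = "First paragraph.\u2029Roses are red,\u2028Violets are blue.\u2029"
markup = separators_to_markup(sample)
print(markup)
print(len(sample), "characters with separators,", len(markup), "with markup")
```

The separator form spends one character per boundary, while the markup 
form spends an open and a close tag per paragraph and per line.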

One potential basis for making the choice is whether paragraphs and 
lines are more semantic or presentational in nature. In making that 
determination, I think that looking at sentences would also be of use. 
Paragraphs, sentences, and lines are both semantic and presentational. 
There is no sentence markup because, most of the time, there is no 
benefit, and in those few circumstances where there is a benefit, 
<span> can handle the need.  Traditionally, paragraphs and lines were 
indicated by markup because there were no adequate formatting 
characters to indicate their boundaries, but they now exist.  If <p> 
and <l> would rarely be used for reasons beyond simply marking 
paragraph or line boundaries, a case could be made to shift the 
paradigm for indicating paragraphs and lines from markup to format 
characters for XHTML2.  If applying attributes (or other uses that 
depend upon paragraphs or lines being elements, such as styling or 
scripting) occurs commonly enough, then they 
should remain elements.  (It is quite possible that one method should 
be chosen for lines and another method for paragraphs.)  In the 
referenced technical note, all occurrences of attributes on the <p> 
elements could have been handled by applying them to containing 
<section>, <blockquote> or <td> elements in XHTML2, and every 
<br> could have been replaced by a simple line separator.  
However, one document does not make a point.  Is there any data on how 
often <p> does something more than mark paragraph boundaries in current 
(X)HTML practice?  That is what the choice should be based upon in my 
opinion.  (If someone has a set of representative (X)HTML documents 
that could be analyzed, that would also be of use.)
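
One way such a survey might be run is sketched below in Python, using 
the standard library's html.parser to count how many <p> and <br> start 
tags carry attributes.  The class name, the survey function, and the two 
sample documents are invented for the example; a real analysis would 
need a representative corpus and would also have to look at style sheet 
selectors and scripts that target <p>.

```python
from html.parser import HTMLParser


class ParagraphUsage(HTMLParser):
    """Tally <p> and <br> start tags, and how many of them carry attributes."""

    def __init__(self):
        super().__init__()
        self.counts = {"p": 0, "br": 0}
        self.with_attrs = {"p": 0, "br": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value).
        # Self-closing tags such as <br/> are also routed through here.
        if tag in self.counts:
            self.counts[tag] += 1
            if attrs:
                self.with_attrs[tag] += 1


def survey(documents):
    """Return (total tags, tags with attributes) summed over all documents."""
    counts = {"p": 0, "br": 0}
    with_attrs = {"p": 0, "br": 0}
    for doc in documents:
        parser = ParagraphUsage()  # fresh parser so documents stay independent
        parser.feed(doc)
        parser.close()
        for tag in counts:
            counts[tag] += parser.counts[tag]
            with_attrs[tag] += parser.with_attrs[tag]
    return counts, with_attrs


# Made-up sample documents, standing in for a real corpus.
docs = [
    '<p>One</p><p class="note">Two<br/>Three</p>',
    "<P>Four<BR>Five</P>",
]
counts, with_attrs = survey(docs)
print("tags seen:", counts)
print("tags with attributes:", with_attrs)
```

A low share of attributed <p> tags in a large sample would favor the 
separator approach, and a high share would favor keeping the elements.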

Received on Friday, 4 April 2003 00:51:56 UTC