Re: [XHTML2] Unicode line and paragraph separators from Ernest Cline on 2003-04-06 (www-html@w3.org from April 2003)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Sun, 06 Apr 2003 12:13:09 -0400
To: "www-html" <www-html@w3.org>
Message-ID: <3E9019D5.12974.294181B@localhost>
On 5 Apr 2003 at 20:59, Simon wrote:

> Ernest Cline wrote:-
> 
> <The main use of <p> in current (X)HTML practice is to apply
>  styling to paragraphs.>
> 
> Eh? The main use of <p> in current (X)HTML practice is to denote
> paragraphs, not to style them.

Oh? And how do most end users know that the content of a <p> is a 
paragraph? By the way that the user agent applies a style to all of  
the <p> elements in a document so that it looks like a paragraph. 
Despite the wishes of coding purists, current (X)HTML is largely 
written on the basis of its presentation and not its structure.

> <Is there somewhere a representative set of current (X)HTML pages
>  that could be used to make such a determination of how often the
>  <p> element is used for reasons in addition to breaking text up
>  into paragraphs?>
> 
> I don't understand why there needs to be another reason.

Because there exists another way to code paragraph breaks in the 
default character set of XML, Unicode, that does not incur the overhead 
associated with using markup, namely the paragraph separator U+2029.  
This method did not exist in the default character set of HTML, 
ISO-8859-1, and hence for HTML, paragraphs had to be elements.  XHTML1.0 was 
an element for element reworking of HTML4.01 into XML without making 
any attempt to change how things were done.  XHTML1.1 is basically 
XHTML1.0 Strict with Ruby markup added.  HTML is a presentational 
language with some structural elements added because of how they 
affected the presentation.
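
To put a rough number on the overhead mentioned above, here is a minimal Python sketch that encodes the same three paragraphs both ways and counts UTF-8 bytes. The sample text and the function names are made up for illustration, and the markup form assumes bare <p>...</p> tags with no attributes or whitespace between elements.

```python
# Hypothetical comparison of <p> markup versus U+2029 separators.
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def markup_form(paragraphs):
    """Wrap every paragraph in a <p> element (7 bytes of tags each)."""
    return "".join(f"<p>{p}</p>" for p in paragraphs)


def separator_form(paragraphs):
    """Join paragraphs with U+2029 (3 bytes in UTF-8 per separator)."""
    return PARAGRAPH_SEPARATOR.join(paragraphs)


# Illustrative sample text, not taken from any real document.
paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]

markup_bytes = len(markup_form(paragraphs).encode("utf-8"))
separator_bytes = len(separator_form(paragraphs).encode("utf-8"))

print(markup_bytes, separator_bytes)
```

The difference per paragraph is small in absolute terms, so this only illustrates the size side of the argument, not the grammar side.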

XHTML2 is breaking new ground and is removing a lot of the clutter that 
has built up over the years so that it can be what we hope will be a 
purely structural language.  However, XHTML2 will not, should not and 
cannot hope to cover every single type of structure.  That is why the 
generic elements <div> and <span> exist.  Every element that is kept 
should have to demonstrate why it should be kept.

Since the presentational aspects of <p> can be obtained through the use of 
the paragraph separator and appropriate CSS, <p> should only be 
retained if the other uses, such as styling or scripting on a specific 
paragraph, are used often enough that it is worth retaining.  This is 
especially the case for <p> because it has a unique and complex grammar 
due to the desire to keep certain block elements (namely, <p> and those 
elements such as <div> and <section> that can be thought of as containing 
groups of paragraphs) from being inside it.  Using the paragraph 
separator instead would vastly simplify the grammar.  I am not saying 
that a case cannot be made for keeping <p>.  I am saying that the case 
has not yet been made.
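
For the plain case, the two encodings carry the same information, which can be sketched in a few lines of Python. The function name separators_to_p is hypothetical, and the sketch assumes the input is character data with no other markup in it; it simply splits at U+2029 and wraps each piece in <p>.

```python
from html import escape

PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def separators_to_p(text):
    """Convert U+2029-separated text into a run of <p> elements.

    Empty pieces (for example from a trailing separator) are dropped,
    and markup-significant characters are escaped.
    """
    pieces = [p for p in text.split(PARAGRAPH_SEPARATOR) if p]
    return "".join(f"<p>{escape(p)}</p>" for p in pieces)


print(separators_to_p("One\u2029Two & three"))
```

What the separator form cannot carry is a hook on one particular paragraph, such as an id or class attribute, which is the kind of use the case for keeping <p> would have to rest on.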

> A paragraph is a structural element, not a presentational one.
> A paragraph is supposed to be a distinct section of a piece of
> writing and a paragraph break is supposed to denote either the end
> of a thought/idea or the end of a particular speaker.

Sentences and words are structural elements, yet they do not have 
markup associated with them.  There are two reasons why that is the 
case.  First, unlike <p>, no markup was necessary in HTML to be able to 
get the desired presentational effects.  Second, authors are rarely  
concerned with providing non-presentational effects on sentences and 
words.  When they are, markup such as <span>, <em> or <dfn> can be 
used.  As I noted earlier, the presentational effects of <p> can be 
achieved by other means if so desired. The question therefore becomes: 
are the benefits of marking up paragraphs worth the extra overhead of 
doing so?
Received on Sunday, 6 April 2003 12:12:52 UTC