
Re: [XHTML2] Unicode line and paragraph separators

From: Ernest Cline <ernestcline@mindspring.com>
Date: Sun, 06 Apr 2003 22:03:21 -0400
To: "www-html" <www-html@w3.org>
Message-ID: <3E90A429.13609.790A8D@localhost>

On 6 Apr 2003 at 18:11, Simon wrote:

> ----- Original Message -----
> From: "Ernest Cline" <ernestcline@mindspring.com>
> Subject: Re: [XHTML2] Unicode line and paragraph separators
> > Oh? And how do most end users know that the content of a <p> is a
> > paragraph? By the way that the user agent applies a style to all of
> > the <p> elements in a document so that it looks like a paragraph.
> But with XHTML 2.0, we are not actually concerned with what the end user
> sees. We are concerned by how the document is structured.

True, but structure that serves mainly to provide styling serves little 
purpose.  The majority of uses of <p> in current webpages is 
presentational in nature.  How much so, I am currently in the midst of 

> > Because there exists another way to code paragraph breaks in the
> > default character set of XML, Unicode, that does not incur the overhead
> > associated with using markup, namely the paragraph separator U+2029.
> Why do we need another way of marking up a paragraph when a perfectly good
> one already exists? Furthermore, why should that new way differ from the
> usual method of marking up an element - by wrapping it in start and end
> tags?

Because there might be a better way, or to paraphrase you, "Why do we 
need another way of marking up hypertext when a perfectly good one 
already exists?" After all, the single genuinely new concept in XHTML2 
that cannot be duplicated by existing markup is <nl> and even that can 
be achieved by some fairly basic scripting. The reason is we are 
seeking a better way that involves eliminating unnecessary elements. <p> 
may be such an example of an unnecessary element. (Altho <p> probably 
has a wider degree of utility than <code>, <kbd>, <samp>, and <var>.)

> > Every element that is kept should have to demonstrate why it should be
> > kept.
> I agree with that.
> > Since the presentational aspects of <p> can be obtained thru the use of
> > the paragraph separator and appropriate CSS, <p> should only be
> > retained if the other uses, such as styling or scripting on a specific
> > paragraph, are used often enough that it is worth retaining.
> We have no idea what kind of user agent may be employed to 'read' our
> documents. It may be a visual browser, it may be an audio browser, it may
> also be some kind of search bot or similar tool. A paragraph is a BLOCK of
> text that includes one or more sentences and encompasses a concept, or it
> may be being used to indicate different speakers. It is just as fundamental
> as a sentence. In fact a sentence on its own would normally be considered a
> paragraph in any case. Therefore the paragraph should be considered a
> structural construct. The default formatting given to it by a UA isn't
> relevant to its existence.

Agreed, but the default formatting is why the majority of paragraphs in 
current HTML are marked up as paragraphs, even when the content of the 
<p> is not really a paragraph.

> > Sentences and words are structural elements, yet they do not have
> > markup associated with them.  There are two reasons why that is the
> > case.  First, unlike <p>, no markup was necessary in HTML to be able to
> > get the desired presentational effects.  Second, authors are rarely
> > concerned with providing non-presentational effects on sentences and
> > words.
> Actually, I think you have missed the most important reason. Sentence and
> word structure within paragraphs can be radically different from language to
> language, yet exist in almost all of them.
> One thing occurs to me. If you are suggesting we ignore the structural
> significance of paragraphs and treat them simply as separated chunks of
> text, is that not reducing them to something similar to an unordered list?
> Perhaps paragraphs should be marked up as lists instead.

I wouldn't say ignore, so much as say deemphasize. After all, there is 
no problem with being able to determine what is a sentence or a word 
in existing HTML if the need is there. It can either be done by 
scanning the text if all sentences or words need to be analyzed or by 
placing markup such as <span> or some more appropriate markup around 
the sentence or word in question.
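
As a rough illustration of the "scanning the text" approach, here is a
minimal Python sketch. It assumes paragraphs are delimited by U+2029, and
it uses deliberately naive rules for sentences and words that only suit
English-like text. The function names are hypothetical and not taken from
any specification.

```python
import re

PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def split_paragraphs(text):
    """Split text into paragraphs on U+2029, dropping empty pieces."""
    return [p.strip() for p in text.split(PARAGRAPH_SEPARATOR) if p.strip()]


def split_sentences(paragraph):
    """Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    followed by whitespace. Abbreviations will be split incorrectly."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def split_words(sentence):
    """Naive rule (an assumption): a word is a run of word characters,
    optionally containing one internal apostrophe."""
    return re.findall(r"\w+(?:'\w+)?", sentence)


if __name__ == "__main__":
    text = "First sentence. Second one!\u2029Another paragraph here."
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            print(split_words(sentence))
```

Rules this simple will not carry over to languages that do not separate
words with spaces, so a real scanner would need language-specific logic.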

However, please tell me that you were trying to be humorous with that 
last remark.  In doing my little survey of web pages, I have seen so 
much bad coding practice, it is not funny. I have seen <p>'s used to 
hold content that should have been lists, list-items, headings and 
other semantic elements as well. <td><p>Text</p></td> is also 
depressingly common even in cases where the content was not even a 
sentence, much less a paragraph.
Received on Sunday, 6 April 2003 22:03:05 UTC
