Re: DogFood (and inline/block constraints) from David Carlisle on 2007-12-12 (www-html@w3.org from December 2007)

From: David Carlisle <davidc@nag.co.uk>
Date: Wed, 12 Dec 2007 14:55:39 GMT
To: ian@hixie.ch
Cc: www-html@w3.org
Message-Id: <200712121455.lBCEtdrT032184@edinburgh.nag.co.uk>
Ian

> However, this is far from a resolved issue. What would be especially 
> useful is a study of use cases -- occurrences where people mix inlines and 
> blocks, and why unmixed alternatives don't really address the needs of the 
> author. (Your blog would be a great start to look for such cases.)

HTML has always stood out amongst marked up document formats in having a
very restricted content model for paragraphs that doesn't allow block
level markup. I always viewed div as "p with a fixed content model"
(which isn't really the intention of div, but a very plausible way of
using it.)

DocBook, TEI and the W3C's xmlspec markup all allow block level markup in
paragraphs, as does (La)TeX.

Consider the following two paragraphs:

The subject of this paragraph is the equation
  E=mc^2
where c denotes the speed of light.


I have a list of three things
1, the first thing,
2, the second thing and
3, the third thing.
This list is not very interesting.


In the first case the paragraph consists of a single sentence: the
"where..." is not a new paragraph, and it doesn't want to be marked up as
<p class="no-indentation">..
It is just the end of the sentence and the end of the paragraph, so it
should be in the same block as the start of the sentence:
<p>The subject...



The second case with a list is similar, although there at least the text
following the list is a different sentence.


Apart from forcing users to mark up the text in a way that is at
variance with the intended meaning (you can't have a sentence that spans
two paragraphs, even if that sentence contains a quotation that itself
has block structure), this restricted content model causes many problems
when mapping from other markup languages to HTML (if HTML p is used to
model paragraphs).


See, for example, a recent comment on converting from the W3C's xmlspec markup:

http://lists.w3.org/Archives/Public/spec-prod/2007OctDec/0002.html

  I made a couple of other fixes to the standard XSLT but the one that
  probably needs doing the most that I haven't done is that a list inside
  a paragraph in xmlspec generates invalid XHTML. 


A lightning survey of other document types:

DocBook paragraphs

http://www.docbook.org/tdg/en/html/para.html

   A Para is a paragraph. Paragraphs in DocBook may contain almost all
  inlines and most block elements. 


TEI paragraphs

http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html

  [ a bit inscrutable but note that the content model contains (at least)
  lists as well as inline text]


XHTML2 paragraphs

http://www.w3.org/TR/xhtml2/mod-structural.html#sec_8.6.

   In comparison with earlier versions of HTML, where a paragraph could
   only contain inline text, XHTML2's paragraphs represent the
   conceptual idea of a paragraph, and so may contain lists,
   blockquotes, pre's and tables as well as inline text. 


LaTeX paragraphs
    [couldn't find a good URI to cite, but trust me the LaTeX system
    goes to some lengths to support nested block structures such as
    displayed mathematics and lists within a paragraph]


All of the above document types are commonly used for authoring and
converted to HTML for display. The proposal to restrict the content
model for div but not follow XHTML2 in opening up the content model for
p makes that conversion significantly harder, and makes the resulting
HTML significantly less structurally useful, as you need to introduce
spurious paragraphs together with extra CSS styling to suppress any
typographic display that would normally be associated with a paragraph.
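
To make the conversion problem concrete, here is a rough sketch (not taken
from any real converter) of what a tool has to do with the list paragraph
above. The data model, the function names and the "continuation" class are
all invented for illustration: a paragraph is a Python list whose items are
either inline text or a ("list", [...]) tuple for a nested list.

```python
from html import escape

# Hypothetical model: one logical paragraph, mixing inline text and a list.
para = [
    "I have a list of three things",
    ("list", ["the first thing,", "the second thing and", "the third thing."]),
    "This list is not very interesting.",
]


def render_list(entries):
    """Render the nested list block as an HTML ordered list."""
    return "<ol>" + "".join(f"<li>{escape(e)}</li>" for e in entries) + "</ol>"


def to_inline_only_p(paragraph):
    """Flatten the paragraph for a content model where p holds only inlines.

    Every run of text after the first becomes a spurious extra <p>, tagged
    with a made-up class so CSS can suppress the usual paragraph styling.
    """
    out = []
    seen_text = False
    for item in paragraph:
        if isinstance(item, str):
            attr = ' class="continuation"' if seen_text else ""
            out.append(f"<p{attr}>{escape(item)}</p>")
            seen_text = True
        else:
            _kind, entries = item
            out.append(render_list(entries))
    return "\n".join(out)


def to_nested_p(paragraph):
    """Emit the same paragraph as a single p, as XHTML2's model would allow."""
    parts = []
    for item in paragraph:
        if isinstance(item, str):
            parts.append(escape(item))
        else:
            _kind, entries = item
            parts.append(render_list(entries))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    print(to_inline_only_p(para))
    print(to_nested_p(para))
```

The first function has to invent a second p element that was never in the
source document; the second keeps the one-paragraph structure intact.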



David

Received on Wednesday, 12 December 2007 14:56:00 UTC