[Bug 6530] <div class="p"> is an abomination from bugzilla@wiggum.w3.org on 2009-02-04 (www-xml-schema-comments@w3.org from January to March 2009)

From: <bugzilla@wiggum.w3.org>
Date: Wed, 04 Feb 2009 19:23:31 +0000
To: www-xml-schema-comments@w3.org
Message-Id: <E1LUnLP-0005gk-75@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6530





--- Comment #3 from C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>  2009-02-04 19:23:30 ---
    [Executive summary: 
    The answer to the question "If W3C can't get this right, who can?" 
    is essentially "But this IS right -- or as nearly right as HTML's
    faulty document grammar allows one to get".]

I should probably fess up; the editor who produced the div elements
with class="p" is me.

The root of the problem is that two schools of thought analyse modern
technical documentation in two different ways.  One school of thought
distinguishes rigorously between character-level styles and
paragraph-level styles, and holds that objects with paragraph-level
styling do not nest.  Many word processors take essentially this view;
perhaps it simplifies the layout calculations.  The other school of
thought observes that after a short block-style example

  <eg>like this one</eg>

it is not unusual for the same paragraph -- or even the same sentence
-- to continue.

I have known intelligent, thoughtful people on both sides of this
question, and I don't want to re-argue it here.  The two schools of
thought exist, and their analyses have observable consequences for the
document grammars they write.

HTML's rules for p, ul, ol, etc. align it with the first school.  The
rules for p, list, etc. in the XMLspec / specprod vocabulary align it
with the second; this reflects its heritage from TEI (and probably
also DocBook).

Replacing all the occurrences of div with class="p" by 'p' elements
would result in an invalid document, and thus in a document
unpublishable on the W3C /TR page.
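
To make the validity problem concrete, here is a minimal sketch in
Python. It is not part of the editorial system; the function name, the
sample markup, and the (deliberately partial) list of block-level
element names are assumptions made for illustration. It reports
block-level elements found directly inside a 'p', which is what a
validator would complain about after a naive div-to-p replacement.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of block-level element names that XHTML 1.0 does
# not allow inside 'p'; this is an assumption, not the full DTD list.
BLOCK = {"p", "div", "ul", "ol", "dl", "pre", "blockquote", "table"}

def block_children_of_p(xhtml):
    """Return the names of block-level elements found directly inside any 'p'."""
    root = ET.fromstring(xhtml)
    return [child.tag
            for p in root.iter("p")
            for child in p
            if child.tag in BLOCK]

# A paragraph that continues after a list, as the second school allows.
as_div = ('<body><div class="p">After a list '
          '<ul><li>like this one</li></ul> the paragraph continues.'
          '</div></body>')

# The naive replacement of div class="p" by 'p'.
as_p = as_div.replace('<div class="p">', '<p>').replace('</div>', '</p>')

print(block_children_of_p(as_div))  # no 'p' elements, so nothing is reported
print(block_children_of_p(as_p))    # the 'ul' inside 'p' is reported
```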

Translating from the second style to the first style is always
possible in theory, sometimes easy, and often feasible if the
stylesheet author has a high enough pain threshold, but in my
experience it can be remarkably error prone.  When I began maintaining
the editorial system, our stylesheets routinely produced invalid XHTML
for this reason among others.  We were able to change the stylesheets
to make them produce better XHTML, but chunk-level objects of many
different kinds can appear inside of specprod paragraphs, there are
very complicated interactions with the diff markup, and from time to
time the first version of a change I installed would turn out to break
something else.  As would the second through fifth versions of the
change.  Writing each fix six different ways can really eat into a
time budget.

For a while we tried tidy, but I was unable to find ways to prevent
tidy from introducing unwanted white space in semantically sensitive
locations, so we no longer use it.

Eventually, I did in the stylesheets for the XSD spec what I had long
ago done in my stylesheets for TEI markup.  Since the HTML 'p' element
does not model paragraphs as I understand paragraphs, but the HTML
'div' element does, I began translating 'p' elements in TEI (and now
in specprod) into 'div', not 'p'.  (In my TEI stylesheets, the class
attribute gets the value 'real-P', which captures my sentiments but
seemed unnecessarily truculent for the XSD spec.)

Ultimately, I guess my defense of the current markup is that 'div
class="p"' is a better semantic match for the 'p' of the source
document than the HTML 'p' element.  And unlike the HTML 'p' element
it does not require jumping through hoops to make its usage valid.
That is to say, the answer to the question "If W3C can't get this
right, who can?" is essentially "but this IS right; translating a
specprod 'p' into HTML 'p' is tag abuse".

This suggests, of course, that we ought to eliminate ALL of the 'p'
elements in the output, in favor of 'div class="p"', for consistency.
But some are produced by templates I don't own.

It might be entertaining to write a stylesheet which does nothing at
all but try to turn 'div class="p"' elements into appropriate
sequences of HTML 'p' and other elements; if nothing else, it's a good
advertisement for the grouping features of XSLT 2.0.  So I'll put this
on my someday pile.
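
A rough sketch of what such a splitting pass might look like, written
in Python rather than XSLT so that it can be run directly. Everything
here is hypothetical: the function name, the list of block-level
names, and the sample input are assumptions, and no such stylesheet
exists in the editorial system. In XSLT 2.0 the same grouping would
presumably be expressed with xsl:for-each-group and group-adjacent.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative subset of element names treated as block-level (assumption).
BLOCK = {"ul", "ol", "dl", "pre", "blockquote", "table", "div"}

def split_div_p(div):
    """Split one <div class="p"> into a flat list of 'p' and block elements."""
    out = []
    current = None  # the 'p' currently collecting inline content, if any

    def open_p():
        nonlocal current
        if current is None:
            current = ET.Element("p")
            out.append(current)
        return current

    def add_text(text):
        # Skip empty text, and whitespace that would only open an empty 'p'.
        if not text or (current is None and not text.strip()):
            return
        p = open_p()
        if len(p):
            p[-1].tail = (p[-1].tail or "") + text
        else:
            p.text = (p.text or "") + text

    add_text(div.text)
    for child in div:
        node = copy.deepcopy(child)
        tail, node.tail = node.tail, None
        if node.tag in BLOCK:
            out.append(node)
            current = None  # a block element ends the current inline run
        else:
            open_p().append(node)
        add_text(tail)
    return out

src = ('<div class="p">Before <code>x</code> and '
       '<ul><li>item</li></ul> after.</div>')
for piece in split_div_p(ET.fromstring(src)):
    print(ET.tostring(piece, encoding="unicode"))
```

Note what the output loses: the two 'p' elements on either side of the
list no longer record that they belong to one paragraph, which is the
semantic mismatch described above.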

I'm sympathetic to the cause of semantic cleanliness.  
But in the immediate future, any change would require a rather large
investment of effort, which would seem to produce either a small
benefit, or a small decrement in quality.  Given that our editorial
resources have just been cut back severely, I'm not confident this
issue will make the cut.


Received on Wednesday, 4 February 2009 19:23:40 UTC