- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 04 Feb 2009 19:23:31 +0000
- To: www-xml-schema-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6530

--- Comment #3 from C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> 2009-02-04 19:23:30 ---

[Executive summary: The answer to the question "If W3C can't get this right, who can?" is essentially "But this IS right -- or as nearly right as HTML's faulty document grammar allows one to get".]

I should probably fess up; the editor who produced the div elements with class="p" is me.

The root of the problem is that two schools of thought analyse modern technical documentation in two different ways.

One school of thought distinguishes rigorously between character-level styles and paragraph-level styles, and holds that objects with paragraph-level styling do not nest. Many word processors take essentially this view; perhaps it simplifies the layout calculations.

The other school of thought observes that after a short block-style example <eg>like this one</eg> it is not unusual for the same paragraph -- or even the same sentence -- to continue.

I have known intelligent, thoughtful people on both sides of this question, and I don't want to re-argue it here. The two schools of thought exist, and their analyses have observable consequences for the document grammars they write.

HTML's rules for p, ul, ol, etc. align it with the first school. The rules for p, list, etc. in the XMLspec / specprod vocabulary align it with the second; this reflects its heritage from TEI (and probably also Docbook).

Replacing all the occurrences of div with class="p" by 'p' elements would result in an invalid document, and thus in a document unpublishable on the W3C /TR page.

Translating from the second style to the first style is always possible in theory, sometimes easy, and often feasible if the stylesheet author has a high enough pain threshold, but in my experience it can be remarkably error prone. When I began maintaining the editorial system, our stylesheets routinely produced invalid XHTML for this reason among others.
We were able to change the stylesheets to make them produce better XHTML, but chunk-level objects of many different kinds can appear inside of specprod paragraphs, there are very complicated interactions with the diff markup, and from time to time the first version of a change I installed would turn out to break something else. As would the second through fifth versions of the change. Writing each fix six different ways can really eat into a time budget.

For a while we tried tidy, but I was unable to find ways to prevent tidy from introducing unwanted white space in semantically sensitive locations, so we no longer use it.

Eventually, I did in the stylesheets for the XSD spec what I had long ago done in my stylesheets for TEI markup. Since the HTML 'p' element does not model paragraphs as I understand paragraphs, but the HTML 'div' element does, I began translating 'p' elements in TEI (and now in specprod) into 'div', not 'p'. (In my TEI stylesheets, the class attribute gets the value 'real-P', which captures my sentiments but seemed unnecessarily truculent for the XSD spec.)

Ultimately, I guess my defense of the current markup is that 'div class="p"' is a better semantic match for the 'p' of the source document than the HTML 'p' element. And unlike the HTML 'p' element it does not require jumping through hoops to make its usage valid.

That is to say, the answer to the question "If W3C can't get this right, who can?" is essentially "but this IS right; translating a specprod 'p' into HTML 'p' is tag abuse".

This suggests, of course, that we ought to eliminate ALL of the 'p' elements in the output, in favor of 'div class="p"', for consistency. But some are produced by templates I don't own.

It might be entertaining to write a stylesheet which does nothing at all but try to turn 'div class="p"' elements into appropriate sequences of HTML 'p' and other elements; if nothing else, it's a good advertisement for the grouping features of XSLT 2.0.
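
For illustration only, here is a minimal sketch of the kind of transformation described above, written in Python rather than XSLT 2.0. It is not the stylesheet the message has in mind. It assumes un-namespaced element names, handles only the direct children of the 'div', and uses a small hypothetical set of block-level names (BLOCK); the function name split_paragraph is likewise made up for this example. Adjacent text and inline children are grouped into 'p' runs, and each block-level child ends the current run and is passed through on its own.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative, incomplete set of block-level element names.
BLOCK = {"pre", "ul", "ol", "dl", "table", "blockquote", "div"}


def split_paragraph(div):
    """Flatten one <div class="p"> into a list of <p> and block elements.

    Note: this detaches the tails of the div's children, so the input
    tree should be treated as consumed.
    """
    out = []
    current = None  # the <p> run currently being built, if any

    def run():
        # Return the open <p> run, starting a new one if necessary.
        nonlocal current
        if current is None:
            current = ET.Element("p")
            out.append(current)
        return current

    def add_text(text):
        if not text:
            return
        if current is None and not text.strip():
            return  # whitespace next to a block element starts no paragraph
        p = run()
        if len(p):
            # Text after an inline child lives in that child's tail.
            p[-1].tail = (p[-1].tail or "") + text
        else:
            p.text = (p.text or "") + text

    add_text(div.text)
    for child in list(div):
        tail, child.tail = child.tail, None
        if child.tag in BLOCK:
            current = None  # a block-level child closes the current run
            out.append(child)
        else:
            run().append(child)
        add_text(tail)
    return out
```

A usage example, with a paragraph that continues after a block-level example:

```python
src = '<div class="p">Consider <code>x</code>:<pre>a b</pre>which continues.</div>'
for part in split_paragraph(ET.fromstring(src)):
    print(ET.tostring(part, encoding="unicode"))
```

Even this toy version shows where the semantic loss comes from: the continuation after the 'pre' becomes a second, separate 'p', although in the source it belongs to the same paragraph.
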
So I'll put this on my someday pile.

I'm sympathetic to the cause of semantic cleanliness. But in the immediate future, any change would require a rather large investment of effort, which would seem to produce either a small benefit, or a small decrement in quality. Given that our editorial resources have just been cut back severely, I'm not confident this issue will make the cut.
Received on Wednesday, 4 February 2009 19:23:40 UTC