- From: Benjamin Hawkes-Lewis <benjaminhawkeslewis@hotmail.com>
- Date: Sun, 17 Sep 2006 11:57:35 +0100
- To: www-style@w3.org
David Woolley writes:

> A general diatribe against HTML, and for TEI, is not a good start.

My post certainly wasn't intended as a "general diatribe against HTML, and for TEI", so I am sorry if that's how it came across. To describe HTML as more "conservative" than TEI with regard to semantic markup is *not* the same as saying it is *worse* than TEI. I don't even especially like how TEI handles quotations. I was actually trying to emphasize just how weird <Q> is when compared with the rest of HTML.

> HTML is not designed for professional typesetters.

Neither is TEI, of course, if that's what you were implying. The computer age has placed much of what was once "professional typesetting" into the hands of the ordinary citizen. To most content creators I've encountered, democratized WYSIWYG typography is vastly more approachable than markup of any sort, let alone semantic markup. (This partly reflects mass acculturation.) Moreover, the CSS specifications include plenty of functionality that has traditionally been reserved for "professional typesetting", such as support for drop capitals [1]. However, the markup of quotations has little to do with typesetting, and the punctuation of quotations is simply a matter of correct punctuation, hardly "professional typesetting".

> It was originally designed to allow any reasonably intelligent person
> to create useful documents.

As an aside, I haven't seen much freely available evidence for this as an *explicit* design goal of HTML. If you look at the 1992 "Design Constraints" [2] for example, HTML's simplicity aimed to make things easy for browser and editor developers, not content authors; if you look at the 1997 HTML 3.2 specification [3] there's an assertion that HTML's "generic semantics ... are appropriate for representing information from a wide range of applications."
Interestingly, HTML 4.0 claimed that HTML was "the publishing language of the World Wide Web" [4], dropping its previous formulation as "a simple format" [3, 5-6]. Tim Berners-Lee has stated that "HTML is too complex" [7].

Having said that, comprehensibility to a "reasonably intelligent person" (at the very least!) must be an implicit design goal for *any* document markup language, and I'm sure it was for HTML too. In my previous post, I criticized (X)HTML for deviating from *that* design goal with counter-intuitive specifications for headings, <p>, <ol>, and <dl>. This wasn't meant as a "general" attack on (X)HTML, but a warning about the minefield of lossy abstractions that you walk into when you start replacing typography familiar from print with semantic markup.

Did you mean to imply that understanding <q> is beyond the abilities of a "reasonably intelligent person", or that it would be especially difficult to develop tools that generate documents including <q>? I would have thought <q> to be one of the most easily comprehensible elements in the HTML 4.01 specification. Unlike elements such as <span>, <dl>, <object>, and <frame>, it maps directly to a concept most of us learn in school.

It is admittedly confusing that print typography uses quotation punctuation for styling things other than quotations from sources, as with scare quotes. But print typography also uses italic for more purposes than emphasis (book titles for example), which hasn't stopped people designing WYSIWYG authoring tools which confusingly generate <em> with an "I" button [8]. Do you think creating <em> was a mistake?

Neither MediaWiki [9] nor John Gruber's Markdown [10] includes syntax comparable to <q>, but as these languages are intermediate interfaces between the user and the complexities of HTML, that may be a knock-on effect of browsers' problems with <q>.
DocBook includes a <quote> element because using "an element for quotations is frequently more convenient than entering the character entities for the quotation marks by hand, and makes it possible for a presentation system to alter the format of the quotation marks" [11]. Jukka "Yucca" Korpela's proposal for a "simple and intuitive" document markup language that could be easily written by hand [12] includes markup for quotations similar to that in HTML 4.01 [13].

> that compromise often includes relying on normal punctuation in the
> text, without specific markup.

There seem to be two vaguely contradictory assumptions in what you're saying:

A) Correct punctuation is too arcane to facilitate within a markup language designed for "reasonably intelligent" people.

B) "Reasonably intelligent" people can be relied on to produce "normal punctuation in the text".

Also, your reference to "normal punctuation in the text" ignores the fact that wherever Problem C arises from "normal punctuation", it would require CSS or some new sort of markup to produce punctuation at the start of each line -- unless you think people should split lines of inner quotations to arbitrary lengths with <br>? Or by "normal punctuation" do you actually mean not the punctuation the author is used to, but rather whichever arbitrary punctuation is within the capabilities of current (X)HTML?

Anyway, if you think <q> shouldn't exist at all, then you're welcome to advocate that it should be deprecated in HTML, removed from XHTML 2.0, and that the quote styling properties in CSS should be dropped. I don't think that would be my first choice, but it would be an improvement over the current situation. But if we're *not* going to do that [14-16], I don't see why we shouldn't aim to create a specification for <q> in (X)HTML + CSS that actually works. I think it is a bad idea to create specifications -- even if they require at least "reasonable intelligence" -- that include broken components.
It's confusing and damaging to confidence in the specification as a whole.

> An attempt to introduce a purely semantic inline quote element was
> essentially ignored by users.

Given it was never effectively implemented by Internet Explorer or Jaws, content creators [e.g. 17-19] had little or no choice in the matter. Indeed, this argument from tag soup would be more effective against <blockquote>, which has near universal browser support but is often ignored or used incorrectly by authors [e.g. 20-26]!

It's worth noting that the <blockquote> and <q> elements also have cite attributes. If browsers had actually implemented those (much as many of them implemented the title attribute for <acronym>), it might have made the elements rather more popular.

References
----------

[1] http://www.w3.org/TR/REC-CSS2/selector.html#first-letter
[2] http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/HTMLConstraints.html
[3] http://www.w3.org/TR/REC-html32
[4] http://www.w3.org/TR/1998/REC-html40-19980424/
[5] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
[6] http://www.w3.org/MarkUp/html-spec/html-spec_1.html#SEC1
[7] http://www.3i.com/isight/pages/interview.html
[8] http://www.fckeditor.net/demo/
[9] http://www.mediawiki.org/wiki/Help:Formatting
[10] http://daringfireball.net/projects/markdown/syntax
[11] http://www.oasis-open.org/docbook/documentation/reference/html/quote..html
[12] http://www.cs.tut.fi/~jkorpela/data/utd.html#goals
[13] http://www.cs.tut.fi/~jkorpela/data/utd.html#q
[14] http://www.w3.org/TR/xhtml2/mod-text.html#sec_9.8.
[15] http://www.w3.org/TR/css3-content/#quotes-specify
[16] http://whatwg.org/specs/web-apps/current-work/#the-q
[17] http://forum.textpattern.com/viewtopic.php?id=12599
[18] http://wordpress.org/support/topic/37476
[19] http://historytalk.typepad.com/basic/2006/05/scholarship_on_.html
[20] http://www.quotationspage.com/quote/14577.html
[21] http://education.yahoo.com/reference/quotations/quote/22493
[22] http://www.bartleby.com/66/59/12359.html
[23] http://www.uky.edu/AS/Classics/rhetoric.html
[24] http://en.wikipedia.org/wiki/We_shall_fight_on_the_beaches
[25] http://news.bbc.co.uk/onthisday/hi/dates/stories/june/4/newsid_3500000/3500865.stm
[26] http://www.nla.gov.au/pub/gateways/archive/38/p14a01.html

---------------------
Benjamin Hawkes-Lewis
Received on Sunday, 17 September 2006 11:32:07 UTC