Re: CSS and quotation typography from Benjamin Hawkes-Lewis on 2006-09-17 (www-style@w3.org from September 2006)

From: Benjamin Hawkes-Lewis <benjaminhawkeslewis@hotmail.com>
Date: Sun, 17 Sep 2006 11:57:35 +0100
To: www-style@w3.org
Message-Id: <pan.2006.09.17.10.57.21.681953@hotmail.com>

David Woolley writes:

> A general diatribe against HTML, and for TEI, is not a good start.

My post certainly wasn't intended as a "general diatribe against HTML,
and for TEI", so I am sorry if that's how it came across. To describe
HTML as more "conservative" than TEI with regard to semantic markup is
*not* the same as saying it is *worse* than TEI. I don't even
especially like how TEI handles quotations. I was actually trying to
emphasize just how weird <Q> is when compared with the rest of HTML.

> HTML is not designed for professional typesetters. 

Neither is TEI, of course, if that's what you were implying.

The computer age has placed much of what was once "professional
typesetting" into the hands of the ordinary citizen. To most content
creators I've encountered, democratized WYSIWYG typography is vastly
more approachable than markup of any sort, let alone semantic markup.
(This partly reflects mass acculturation.) Moreover, the CSS
specifications include plenty of functionality that has traditionally
been reserved for "professional typesetting", such as support for drop
capitals [1].

However, the markup of quotations has little to do with typesetting,
and the punctuation of quotations is simply a matter of correct
punctuation, hardly "professional typesetting".

> It was originally designed to allow any reasonably intelligent person
> to create useful documents.

As an aside, I haven't seen much freely available evidence for this as an
*explicit* design goal of HTML. If you look at the 1992 "Design
Constraints" [2] for example, HTML's simplicity aimed to make things
easy for browser and editor developers, not content authors; if you
look at the 1997 HTML 3.2 specification [3] there's an assertion that
HTML's "generic semantics ... are appropriate for representing
information from a wide range of applications." Interestingly, HTML 4.0
claimed that HTML was "the publishing language of the World Wide Web"
[4], dropping its previous formulation as "a simple format" [3, 5-6].
Tim Berners-Lee has stated that "HTML is too complex" [7].

Having said that, comprehensibility to a "reasonably intelligent person"
(at the very least!) must be an implicit design goal for *any* document
markup language, and I'm sure it was for HTML too. In my previous post,
I criticized (X)HTML for deviating from *that* design goal with
counter-intuitive specifications for headings, <p>, <ol>, and <dl>. This
wasn't meant as a "general" attack on (X)HTML, but a warning about the
minefield of lossy abstractions that you walk into when you start
replacing typography familiar from print with semantic markup.

Did you mean to imply that understanding <q> is beyond the abilities of
a "reasonably intelligent person", or that it would be especially
difficult to develop tools that generate documents including <q>?  I
would have thought <q> to be one of the most easily comprehensible
elements in the HTML 4.01 specification. Unlike elements such as <span>,
<dl>, <object>, and <frame>, it maps directly to a concept most of us
learn in school.

It is admittedly confusing that print typography uses quotation
punctuation for styling things other than quotations from sources, as
with scare quotes. But print typography also uses italic for more
purposes than emphasis (book titles for example), which hasn't stopped
people designing WYSIWYG authoring tools which confusingly generate <em>
with an "I" button [8]. Do you think creating <em> was a mistake?

Neither MediaWiki [9] nor John Gruber's Markdown [10] includes syntax
comparable to <q>, but as these languages are intermediate interfaces
between the user and the complexities of HTML, that may be a knock-on
effect of browsers' problems with <q>. DocBook includes a <quote>
element because using "an element for quotations is frequently more
convenient than entering the character entities for the quotation marks
by hand, and makes it possible for a presentation system to alter the
format of the quotation marks" [11]. Jukka "Yucca" Korpela's proposal
for a "simple and intuitive" document markup language that could be easily
written by hand [12] includes markup for quotations similar to that
in HTML 4.01 [13]. 

> that compromise often includes relying on normal punctuation in the
> text, without specific markup.

There seem to be two vaguely contradictory assumptions in what you're
saying:

A) Correct punctuation is too arcane to facilitate within a markup
   language designed for "reasonably intelligent" people.

B) "Reasonably intelligent" people can be relied on to produce "normal
   punctuation in the text".

Also, your reference to "normal punctuation in the text" ignores the
fact that wherever Problem C arises from "normal punctuation", it would
require CSS or some new sort of markup to produce punctuation at the
start of each line -- unless you think people should split lines of
inner quotations to arbitrary lengths with <br>? Or by "normal
punctuation" do you actually mean not the punctuation the author is used
to, but rather whichever arbitrary punctuation is within the
capabilities of current (X)HTML?

Anyway, if you think <q> shouldn't exist at all, then you're welcome to
advocate that it should be deprecated in HTML, removed from XHTML 2.0,
and that the quote styling properties in CSS should be dropped. I don't
think that would be my first choice, but it would be an improvement over
the current situation.

But if we're *not* going to do that [14-16], I don't see why we shouldn't
aim to create a specification for <q> in (X)HTML + CSS that actually
works. I think it is a bad idea to create specifications -- even if they
require at least "reasonable intelligence" -- that include broken
components. It's confusing and damaging to confidence in the
specification as a whole.
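
For concreteness, here is a minimal sketch of what a working
specification is supposed to make possible. It assumes a browser that
implements the CSS2 quotes property and generated content; the quoted
sentence is invented purely for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>Nested quotations (illustrative sketch)</title>
<style type="text/css">
/* Outer quotations get double marks, nested ones single marks.
   The escapes are U+201C/U+201D and U+2018/U+2019. */
q { quotes: "\201C" "\201D" "\2018" "\2019"; }

/* The marks are generated content, so none are typed in the text. */
q:before { content: open-quote; }
q:after  { content: close-quote; }
</style>
</head>
<body>
<!-- Illustrative text only; not a real quotation. -->
<p>She wrote: <q>He called it <q>a lossy abstraction</q> and left.</q></p>
</body>
</html>
```

In a user agent that ignores the rules above, the paragraph may appear
with no quotation marks at all, which is the kind of breakage I mean.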

> An attempt to introduce a purely semantic inline quote element was
> essentially ignored by users.

Given it was never effectively implemented by Internet Explorer or JAWS,
content creators [e.g. 17-19] had little or no choice in the matter.
Indeed, this argument from tag soup would be more effective against
<blockquote>, which has near universal browser support but is often
ignored or used incorrectly by authors [e.g. 20-26]!

It's worth noting that the <blockquote> and <q> elements also have cite
attributes. If browsers had actually implemented those (much as many of
them implemented the title attribute for <acronym>), it might have made
the elements rather more popular.
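
As a rough sketch of how the cite attribute could be surfaced even
without native browser support, an author style sheet might print the
source URI using attr(). This assumes support for CSS2 attribute
selectors and generated content; the example.org addresses and the
quoted text are hypothetical placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<title>Exposing cite (illustrative sketch)</title>
<style type="text/css">
/* Show the source after any block quotation that declares one. */
blockquote[cite]:after {
  display: block;
  content: "Source: " attr(cite);
}

/* For inline quotations, close the quotation and then append the
   source in square brackets. */
q[cite]:after { content: close-quote " [" attr(cite) "]"; }
</style>
</head>
<body>
<!-- Placeholder URIs and text, for illustration only. -->
<blockquote cite="http://example.org/speech">
<p>Placeholder text of a longer quotation.</p>
</blockquote>
<p>As the author put it,
<q cite="http://example.org/article">a placeholder remark</q>.</p>
</body>
</html>
```

This only displays the URI as text; turning it into something a reader
can follow would still need browser support or scripting.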

References
----------

 [1] http://www.w3.org/TR/REC-CSS2/selector.html#first-letter

 [2] http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/HTMLConstraints.html

 [3] http://www.w3.org/TR/REC-html32

 [4] http://www.w3.org/TR/1998/REC-html40-19980424/

 [5] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

 [6] http://www.w3.org/MarkUp/html-spec/html-spec_1.html#SEC1

 [7] http://www.3i.com/isight/pages/interview.html

 [8] http://www.fckeditor.net/demo/

 [9] http://www.mediawiki.org/wiki/Help:Formatting

[10] http://daringfireball.net/projects/markdown/syntax

[11] http://www.oasis-open.org/docbook/documentation/reference/html/quote..html

[12] http://www.cs.tut.fi/~jkorpela/data/utd.html#goals

[13] http://www.cs.tut.fi/~jkorpela/data/utd.html#q

[14] http://www.w3.org/TR/xhtml2/mod-text.html#sec_9.8.

[15] http://www.w3.org/TR/css3-content/#quotes-specify

[16] http://whatwg.org/specs/web-apps/current-work/#the-q

[17] http://forum.textpattern.com/viewtopic.php?id=12599

[18] http://wordpress.org/support/topic/37476

[19] http://historytalk.typepad.com/basic/2006/05/scholarship_on_.html

[20] http://www.quotationspage.com/quote/14577.html

[21] http://education.yahoo.com/reference/quotations/quote/22493

[22] http://www.bartleby.com/66/59/12359.html

[23] http://www.uky.edu/AS/Classics/rhetoric.html

[24] http://en.wikipedia.org/wiki/We_shall_fight_on_the_beaches

[25] http://news.bbc.co.uk/onthisday/hi/dates/stories/june/4/newsid_3500000/3500865.stm

[26] http://www.nla.gov.au/pub/gateways/archive/38/p14a01.html

---------------------
Benjamin Hawkes-Lewis
Received on Sunday, 17 September 2006 11:32:07 UTC