CSS and quotation typography

Summary
-------

How can CSS replicate arcane typographic rules that depend on
punctuation around and within quotations?

Introduction
------------

Proper typography for quotations helps clarify a document for both
sighted users and screen readers. To ensure effective machine
processing, voice browsing, and scholarly disambiguation, the meaning of
quotation markup must not be corrupted by irregular and unpredictable
punctuation arising from defects in (X)HTML and CSS.

This ideal of semantic markup poses a major problem for web authors,
since most are habituated to context and typography providing textual
semantics. To the extent that (X)HTML has attempted to replace such
traditional markers, it has often fallen short of print's flexibility --
from the limitation of headings to six levels, to the <P> "paragraph"
elements that could not contain quotations that themselves include
paragraphs, to the <OL> which pretended numbering was purely a matter of
style, to the markup of dialogue with "definition lists".

In reality, (X)HTML remains deeply conservative when it comes to
replacing or disambiguating context and typography with semantic markup
-- far more conservative than alternative markup such as TEI [1]. One
area that has been largely untouched by (X)HTML is punctuation. There is
no markup for exclamations (!), questions (?), rhetorical questions
(? again), statements (.), clauses (; or ,), or parenthetical remarks
(()). By contrast, TEI can distinguish explicitly between a stop (.)
used to end an abbreviation (etc.) and a stop used to end a sentence
like this.

It is somehow typical that the one substantial assault by (X)HTML on the
Babel of punctuation should be in arguably the most complex area of all:
the typography of quotations. In theory, at least, this is great. Much
existing print styling does a poor job at making it clear when
extraneous material has been inserted into the source for editorial or
stylistic reasons, risking distortion of the quoted author's meaning.
This is a serious problem since it is not always possible for
third-party readers to consult the original source, even when motivated
to do so.

However, the typography of quotations varies wildly between languages,
dialects, publishers, authors, over time and space, and within
individual texts. It thus poses a substantial challenge to automation.

<Q> and (X)HTML
---------------

(Those familiar with the history of <Q> may wish to skip this section.)

The <Q> element was first introduced in the HTML 3.0 draft [2]:

> The <Q> element is used for a short quotation. It is typically shown
> enclosed in quotation marks appropriate to the language context. For
> English these would be matching double or single quotation marks,
> alternating for nested quotes. The language context is set by the LANG
> attribute.

Earlier "logical styles" transformed gracefully when presented as plain
text by user agents that did not implement the associated formatting.
But this was not the case with the <Q>, <INS>, and <DEL> elements
introduced in the HTML 3.0 draft. The warning of the HTML 1.0 draft [3]
that "information providers should attempt not to rely on them as
essential to the information content" became problematic.

In the end, the <Q> element did not make it into HTML 3.2 [4] and had to
wait until HTML 4.0 [5] to appear in an actual standard. The HTML 4.01
specification [6] states that:

> Visual user agents must ensure that the content of the <Q> element is
> rendered with delimiting quotation marks. Authors should not put
> quotation marks at the beginning and end of the content of a <Q>
> element. User agents should render quotation marks in a
> language-sensitive manner (see the lang attribute). Many languages
> adopt different quotation styles for outer and inner (nested)
> quotations, which should be respected by user-agents.

It continues:

> We recommend that style sheet implementations provide a mechanism for
> inserting quotation marks before and after a quotation delimited by
> <BLOCKQUOTE> in a manner appropriate to the current language context
> and the degree of nesting of quotations.

User agent support for default culturally-aware formatting of <Q> is
notoriously poor -- even with cutting-edge browsers. Internet Explorer 7
RC and ELinks 0.10.6 ignore <Q> entirely; Amaya 9.5.1, the current
WebKit build, Konqueror 3.52, and Opera 9.01 follow RFC 2070 [7] by
adding straight double quotation marks (U+0022 or ") regardless of
quotation depth or language; and Lynx 2.8.5 and Firefox Bon Echo render
nested quotation marks correctly for US English but ignore language
information. Jaws 7.10 neither reads punctuation added to <Q> by the
browser or CSS nor distinguishes quotations in any other fashion.

Partly as a consequence, the current XHTML 2.0 draft [8] mandates a less
optimistic user agent treatment of <Q>:

> Visual user agents must not by default add delimiting quotation marks
> (as was the case for the <q> element in earlier versions of XHTML and
> HTML). It is the responsibility of the document author to add any
> required quotation marks, either directly in the text, or via a style
> sheet.

The only common idea between the two specifications is that in some
circumstances CSS can specify the delimiting quotation marks. And
indeed, the CSS 2 specification includes a crude quotes property and a
lang selector to do just that [9].
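
As a minimal sketch of that mechanism, the rules below follow the
pattern of the CSS 2 specification's own examples [9]: the quotes
property lists pairs of opening and closing marks from the outermost
level inward, and the marks are rendered as generated content. The
particular marks chosen for each language are illustrative assumptions,
not a recommendation.

```css
/* Outer pair first, then the pair used for nested quotations. */
q:lang(en) { quotes: '\201C' '\201D' '\2018' '\2019' }
q:lang(fr) { quotes: '\AB\2009' '\2009\BB' '\201C' '\201D' }

/* Render the marks as generated content around every <q>. */
q:before { content: open-quote }
q:after  { content: close-quote }
```

Nothing in these rules can react to punctuation next to or inside the
element; they only choose marks by language and nesting depth.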

Problems with the current specifications
----------------------------------------

The practical problems of this approach have frequently provoked
discussion [10-14]. Unfortunately, the existing CSS standards are (or
appear to be) ill-equipped to express real-world variation in quotation
punctuation. *This is true even for English*, whose idiosyncrasies have
tended to become embedded in web technology. I've identified three clear
areas, and one more debatable case, where the current standards are
deficient:

Problem A: interaction with punctuation immediately outside <q>
---------------------------------------------------------------

Authors cannot specify whether an end quotation marker should be placed
*after* commas (U+002C or ,) and full stops (U+002E or .) following a
<q>, as is typical in US English, or *before* such punctuation, as is
typical of British English [15].

For example, let's say the author must quote from Marx's Communist
Manifesto:

> Owing to the extensive use of machinery, and to the division of
> labour, the work of the proletarians has lost all individual
> character, and, consequently, all charm for the workman. He becomes an
> appendage of the machine, and it is only the most simple, most
> monotonous, and most easily acquired knack, that is required of him.

In typical US English:

> According to Marx, the modern worker “becomes an appendage of the
> machine.”

In typical British English, the same full stop is placed after the
quotation:

> According to Marx, the modern worker ‘becomes an appendage of the
> machine’.

Note that without CSS content selection support, rendering the US
English version requires the author to either corrupt the markup by
including punctuation from outside the quotation within <q>, like so:

> According to Marx, the modern worker <q>becomes an appendage of the
> machine.</q>

-- or to hack around the mess with <span>, like so:

> According to Marx, the modern worker <q>becomes an appendage of the
> machine</q><span class="punctuation-after-q">.</span>
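
For illustration only, here is roughly how a style sheet might exploit
that <span> to produce the US arrangement. It is an untested sketch that
assumes CSS 2 generated content, example marks, and a hypothetical class
"punctuation-follows" added to each <q> that is followed by such a span
(CSS 2 selectors cannot detect this by themselves).

```css
:lang(en-US) { quotes: '\201C' '\201D' '\2018' '\2019' }

q:before { content: open-quote }
q:after  { content: close-quote }

/* Hypothetical class: this <q> leaves its closing mark to the span. */
:lang(en-US) > q.punctuation-follows:after { content: none }

/* Print the closing mark after the relocated comma or full stop. */
:lang(en-US) > q.punctuation-follows + span.punctuation-after-q:after {
  content: close-quote
}
```

This only moves the problem: the markup now carries two presentational
hooks instead of one.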

Problem B: interaction with punctuation immediately within <q>
---------------------------------------------------------------

Authors cannot specify whether end quotation marks should be conflated
with preceding end punctuation. In French, only one quotation marker is
put at the end of two quotations ending simultaneously [16]. In Russian,
if quoted text happens to end with a double reversed comma quotation
mark (U+201F or ‟) or a right-pointing guillemet (U+00BB or »), this
should not be repeated to mark the end of the quotation itself [17].
The Oxford Guide to Style gives the following example for Russian:

> Он ответил: «Я приехал вчера на пароходе «Казань»

The interior guillemets indicate the name of a ship, Казань. No
additional guillemet is necessary to signify the end of the quotation.
This too could only be catered for with a hack-ish <span>:

> Он ответил: <q>Я приехал вчера на пароходе «Казань<span
> class="guillemet">»</span></q>
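
One way such a <span> could be used, sketched here on the assumption
that the closing guillemet of the <q> is generated by CSS: hide the
marked interior guillemet so that the single generated mark serves both
the ship's name and the quotation. The class name "guillemet" comes from
the markup above; the choice of marks is illustrative.

```css
q:lang(ru) { quotes: '\AB' '\BB' '\201E' '\201C' }
q:lang(ru):before { content: open-quote }
q:lang(ru):after  { content: close-quote }

/* Suppress the literal » that would otherwise double the closing mark. */
q:lang(ru) span.guillemet { display: none }
```

A user agent that adds its own marks to <q> but ignores the style sheet
would still show both guillemets.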

Problem C: punctuation for wrapped lines of <q>
------------------------------------------------

Authors cannot specify punctuation for the start and end of each wrapped
line of a quotation. With French, for example, in a nested quotation, a
left-pointing guillemet (U+00AB or «) followed by a thin space (U+2009)
must be placed at the commencement of each line [18-19]. The Oxford
Guide to Style gives the following example:

> On lit dans le Radical: « Une malheureuse erreur a été
> commise par un de nos artistes du boulevard. Ayant à dire:
> « Mademoiselle, je ne veux qu’un mot de vous! », il a fait
> entendre ces paroles: « Mademoiselle, je ne veux qu’un
> « mou de veau! »  

Note the guillemet at the beginning of the last line! Note also the
final guillemet ends *two* quotations at once, as discussed in
Problem B above.

Problem D: rendering lengthy <q> elements as blocks
---------------------------------------------------

This is the more debatable case.

In some typographic styles, a long quotation is displayed for
clarity's sake as a block even if it doesn't include a paragraph. The
Oxford Guide to Style advises [20]:

> Do not break off quotations in text for display unless the matter
> exceeds sixty words in prose, or about five lines when set in normal
> type and measure. If the quoted extract comprises fewer than about
> sixty words, it is run on in the text matter, with quotation marks at
> the beginning and end.
>
> Short quotations may be broken off if the context demands it, for
> example if the author sets them out as examples or specimens ...

It is not entirely clear how such cases should be treated in (X)HTML.
Most would probably say to use a <BLOCKQUOTE>, but in fact things are
rather more complicated than that. There is a rather unfortunate
tendency to treat <p> as though it were simply a block like <div>,
rather than an authorial division of thought and argument. It is
arguably bad practice for a <BLOCKQUOTE> to mark up paragraphs that do
not exist, as in the actual example given by the HTML 4.01 specification
[21]:

> <BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
> <P>They went in single file, running like hounds on a strong scent,
> and an eager light was in their eyes. Nearly due west the broad swath
> of the marching Orcs tramped its ugly slot; the sweet grass of Rohan
> had been bruised and blackened as they passed.</P>
> </BLOCKQUOTE>

If you consult the original text [22], you'll find that the paragraph
does not, in fact, finish where this markup implies, but continues:
"Presently Aragorn gave a cry and turned aside." I suppose more accurate
markup might end:

> Nearly due west the broad swath of the marching Orcs tramped its ugly
> slot; the sweet grass of Rohan had been bruised and blackened as they
> passed.<INS title="editorial omission">…</INS></P>
> </BLOCKQUOTE>

But consider a paragraph just a little earlier in the same text:

>     At the bottom they came with a sudden strangeness on the grass of
> Rohan. It swelled like a green sea up to the very foot of the Emyn
> Muil. The falling stream vanished into a deep growth of cresses and
> water-plants, and they could hear it tinkling away in green tunnels,
> down long gentle slopes towards the fens of Entwash Vale far away.
> They seemed to have left winter clinging to the hills behind. Here the
> air was softer and warmer, and faintly scented, as if spring was
> already stirring and the sap was flowing again in herb and leaf.
> Legolas took a deep breath, like one that drinks a great draught after
> long thirst in barren places.

If we were to extract the text from "It swelled" to "herb and leaf", we
would face a hard decision between (at least) three ways of marking up
the quotation unambiguously in HTML 4.01:

> <P>Tolkien evokes the Three Hunters' arrival to the grass plains of
> Rohan with characteristic eloquence: <Q>It swelled ... herb and
> leaf.</Q> This upbeat landscape description begins to set the scene
> for the reappearance of their long-lost wizard.</P>

-- or --

> <DIV CLASS="paragraph">Tolkien evokes the Three Hunters' arrival to
> the grass plains of Rohan with characteristic eloquence:
> <BLOCKQUOTE><DIV>It swelled ... herb and leaf.</DIV></BLOCKQUOTE> This
> upbeat landscape description begins to set the scene for the
> reappearance of their long-lost wizard.</DIV>

-- or --

> <DIV CLASS="paragraph">Tolkien evokes the Three Hunters' arrival to
> the grass plains of Rohan with characteristic eloquence:
> <BLOCKQUOTE><P><INS title="editorial omission">…</INS>It swelled ...
> herb and leaf.<INS title="editorial omission">…</INS></P></BLOCKQUOTE>
> This upbeat landscape description begins to set the scene for the
> reappearance of their long-lost wizard.</DIV>

Such markup maps uncomfortably onto typographic styling of over-lengthy
quotations, since it hardcodes the presentational decision about
quotation length into the (X)HTML. It would perhaps be better to be able
to blockify <Q> with CSS if its content exceeds a given length in words,
lines, or characters.
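
Pending such a feature, the presentational half can at least be sketched
in CSS 2. The class "long-quotation" below is hypothetical and would
have to be added by the author or by a script that counts words, so the
length decision still leaks into the markup; the margins are arbitrary.

```css
/* Display an over-length inline quotation as a block. */
q.long-quotation {
  display: block;
  margin: 1em 3em;
}

/* Displayed quotations conventionally drop their quotation marks. */
q.long-quotation:before,
q.long-quotation:after {
  content: none;
}
```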

Solutions?
----------

I think CSS 3 drafts include a solution for Problem C, as the
::line-marker pseudo-element [23] can be used together with the
quotes property [24], like so:

> :lang(fr) > q::line-marker { quotes: '' '' '«\2009' '' }

Would this work? If it would, it might be worth including this or
something similar as an example, not least to maximize its chances of
implementation.

But do the new CSS specifications offer any solutions to Problems A and
B? If not, shouldn't they? If for some reason they can't (for example,
if selection by and interpolation into text content is impossible),
don't we need to abandon the whole idea of using CSS to add quotation
punctuation and ensure that (X)HTML does not rely on it? I note that the
section for content selectors [25] remains ominously "intentionally left
blank", even though that draft is in the Last Call stage [26].

How does the alternative of adding the quotation punctuation directly to
the (X)HTML measure up to the challenges? Not very well. In the case of
HTML 4.01, it would prevent authors using the <Q> element. It is hard
to see how Problem C could be catered for within (X)HTML of any variety
without either hard-coding line breaks within quotations or introducing
new attributes for the <Q> element to specify such punctuation.

References
----------

 [1] http://www.tei-c.org/

 [2] http://www.w3.org/MarkUp/html3/logical.html

 [3] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt 

 [4] http://www.w3.org/TR/REC-html32

 [5] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2.2

 [6] http://www.w3.org/TR/html4/struct/text.html#h-9.2.2.1

 [7] http://www.ietf.org/rfc/rfc2070.txt

 [8] http://www.w3.org/TR/xhtml2/mod-text.html#sec_9.8.
 
 [9] http://www.w3.org/TR/REC-CSS2/generate.html#quotes

[10] "Why the quote element doesn't add quotes by default" thread:
     http://lists.w3.org/Archives/Public/www-html/2004Aug/0009.html 

[11] "[CSS21] Features at risk: quote in CSS or Q in HTML" thread:
     http://lists.w3.org/Archives/Public/www-style/2005Jul/0357.html

[12] "<q> element, XHTML2, and CSS" thread:
     http://lists.w3.org/Archives/Public/public-i18n-core/2005JulSep/0018.html

[13] "the q element" thread:
     http://lists.w3.org/Archives/Public/w3c-wai-gl/2006JulSep/0124.html

[14] http://diveintomark.org/archives/2002/05/04/the_q_tag

[15] The Oxford Guide to Style (2002), 238.

[16] Ibid. 277.

[17] Ibid. 341.

[18] Ibid. 277.

[19] Jean Méron, En question: la grammaire typographique: les guillemets
     (1999), http://listetypo.free.fr/meron/new/Guilles.pdf, 37-38.

[20] Oxford Guide to Style, 193-4.

[21] http://www.w3.org/TR/html4/struct/text.html#h-9.2.2

[22] J. R. R. Tolkien, The Two Towers, Chapter 2.

[23] http://www.w3.org/TR/css3-content/#line-markers

[24] http://www.w3.org/TR/css3-content/#quotes-specify

[25] http://www.w3.org/TR/css3-selectors/#content-selectors

[26] http://www.w3.org/Style/CSS/current-work#selectors

---------------------
Benjamin Hawkes-Lewis

Received on Friday, 15 September 2006 16:19:33 UTC