- From: Benjamin Hawkes-Lewis <benjaminhawkeslewis@hotmail.com>
- Date: Fri, 15 Sep 2006 15:11:10 +0100
- To: www-style@w3c.org
Summary
-------

How can CSS replicate arcane typographic rules that depend on punctuation around and within quotations?

Introduction
------------

Proper typography for quotations helps clarify a document for both sighted users and screen readers. To ensure effective machine processing, voice browsing, and scholarly disambiguation, the meaning of quotation markup must not be corrupted by irregular and unpredictable punctuation arising from defects in (X)HTML and CSS.

This ideal of semantic markup poses a major problem for web authors, since most are habituated to context and typography providing textual semantics. To the extent that (X)HTML has attempted to replace such traditional markers, it has often fallen short of print's flexibility -- from the limitation of headings to six levels, to the <P> "paragraph" elements that could not contain quotations that themselves include paragraphs, to the <OL> which pretended numbering was purely a matter of style, to the markup of dialogue with "definition lists". In reality, (X)HTML remains deeply conservative when it comes to replacing or disambiguating context and typography with semantic markup -- far more conservative than alternative markup such as TEI [1].

One area that has been largely untouched by (X)HTML is punctuation. There is no markup for exclamations (!), questions (?), rhetorical questions (? again), statements (.), clauses (; or ,), or parenthetical remarks ( ()). By contrast, TEI can distinguish explicitly between a stop (.) used to end an abbreviation (etc.) and a stop used to end a sentence like this.

It is somehow typical that the one substantial assault by (X)HTML on the Babel of punctuation should be in arguably the most complex area of all: the typography of quotations. In theory, at least, this is great.
Much existing print styling does a poor job of making it clear when extraneous material has been inserted into the source for editorial or stylistic reasons, risking distortion of the quoted author's meaning. This is a serious problem since it is not always possible for third-party readers to consult the original source, even when motivated to do so.

However, the typography of quotations varies wildly between languages, dialects, publishers, authors, over time and space, and within individual texts. It thus poses a substantial challenge to automation.

<Q> and (X)HTML
---------------

(Those familiar with the history of <Q> may wish to skip this section.)

The <Q> element was first introduced in the HTML 3.0 draft [2]:

> The <Q> element is used for a short quotation. It is typically shown
> enclosed in quotation marks appropriate to the language context. For
> English these would be matching double or single quotation marks,
> alternating for nested quotes. The language context is set by the LANG
> attribute.

Earlier "logical styles" transformed gracefully when presented as plain text by user agents that did not implement the associated formatting. But this was not the case with the <Q>, <INS>, and <DEL> elements introduced in the HTML 3.0 draft. The warning of the HTML 1.0 draft [3] that "information providers should attempt not to rely on them as essential to the information content" became problematic. In the end, the <Q> element did not make it into HTML 3.2 [4] and had to wait until HTML 4.0 [5] to appear in an actual standard.

The HTML 4.01 specification [6] states that:

> Visual user agents must ensure that the content of the <Q> element is
> rendered with delimiting quotation marks. Authors should not put
> quotation marks at the beginning and end of the content of a <Q>
> element. User agents should render quotation marks in a
> language-sensitive manner (see the lang attribute).
> Many languages
> adopt different quotation styles for outer and inner (nested)
> quotations, which should be respected by user-agents.

It continues:

> We recommend that style sheet implementations provide a mechanism for
> inserting quotation marks before and after a quotation delimited by
> <BLOCKQUOTE> in a manner appropriate to the current language context
> and the degree of nesting of quotations.

User agent support for default culturally-aware formatting of <Q> is notoriously poor -- even with cutting-edge browsers. Internet Explorer 7 RC and ELinks 0.10.6 ignore <Q> entirely; Amaya 9.5.1, the current WebKit build, Konqueror 3.52, and Opera 9.01 follow RFC 2070 [7] by adding straight double quotation marks (U+0022 or ") regardless of quotation depth or language; and Lynx 2.8.5 and Firefox Bon Echo render nested quotation marks correctly for US English but ignore language information. Jaws 7.10 neither reads punctuation added to <Q> by the browser or CSS, nor distinguishes quotations in any other fashion.

Partly as a consequence, the current XHTML 2.0 draft [8] mandates a less optimistic user agent treatment of <Q>:

> Visual user agents must not by default add delimiting quotation marks
> (as was the case for the <q> element in earlier versions of XHTML and
> HTML). It is the responsibility of the document author to add any
> required quotation marks, either directly in the text, or via a style
> sheet.

The only idea common to the two specifications is that in some circumstances CSS can specify the delimiting quotation marks. And indeed, the CSS 2 specification includes a crude quotes property and a lang selector to do just that [9].

Problems with the current specifications
----------------------------------------

The practical problems of this approach have frequently provoked discussion [10-14]. Unfortunately, the existing CSS standards are (or appear to be) ill-equipped to express real-world variation in quotation punctuation.
*This is true even for English*, whose idiosyncrasies have tended to become embedded in web technology. I've identified three clear areas, and one more debatable case, where the current standards are deficient:

Problem A: interaction with punctuation immediately outside <q>
---------------------------------------------------------------

Authors cannot specify whether an end quotation marker should be placed *after* commas (U+002C or ,) and full stops (U+002E or .) following a <q>, as is typical in US English, or *before* such punctuation, as is typical of British English [15].

For example, let's say the author must quote from Marx's Communist Manifesto:

> Owing to the extensive use of machinery, and to the division of
> labour, the work of the proletarians has lost all individual
> character, and, consequently, all charm for the workman. He becomes an
> appendage of the machine, and it is only the most simple, most
> monotonous, and most easily acquired knack, that is required of him.

In typical US English:

> According to Marx, the modern worker “becomes an appendage of the
> machine.”

In typical British English, the same full stop is placed after the quotation:

> According to Marx, the modern worker ‘becomes an appendage of the
> machine’.

Note that without CSS content selection support, rendering the US English version requires the author either to corrupt the markup by including punctuation from outside the quotation within <q>, like so:

> According to Marx, the modern worker <q>becomes an appendage of the
> machine.</q>

-- or to hack around the mess with <span>, like so:

> According to Marx, the modern worker <q>becomes an appendage of the
> machine</q><span class="punctuation-after-q">.</span>

Problem B: interaction with punctuation immediately within <q>
---------------------------------------------------------------

Authors cannot specify whether end quotation marks should be conflated with preceding end punctuation.
In French, only one quotation marker is put at the end of two quotations ending simultaneously [16]. In Russian, if quoted text happens to end with a double reversed comma quotation mark (U+201F or ‟) or a right-pointing guillemet (U+00BB or »), this should not be repeated to mark the end of the quotation itself [17]. The Oxford Guide to Style gives the following example for Russian:

> Он ответил: «Я приехал вчера на пароходе «Казань»

The interior guillemets indicate the name of a ship, Казань. No additional guillemet is necessary to signify the end of the quotation. This too could only be catered for with a hack-ish <span>:

> Он ответил: <q>Я приехал вчера на пароходе «Казань<span
> class="guillemet">»</span></q>

Problem C: punctuation for wrapped lines of <q>
------------------------------------------------

Authors cannot specify punctuation for the start and end of each wrapped line of a quotation. With French, for example, in a nested quotation, a left-pointing guillemet (U+00AB or «) followed by a thin space (U+2009) must be placed at the commencement of each line [18-19]. The Oxford Guide to Style gives the following example:

> On lit dans le Radical: « Une malheureuse erreur a été
> commise par un de nos artistes du boulevard. Ayant à dire:
> « Mademoiselle, je ne veux qu’un mot de vous! », il a fait
> entendre ces paroles: « Mademoiselle, je ne veux qu’un
> « mou de veau! »

Note the guillemet at the beginning of the last line! Note also that the final guillemet ends *two* quotations at once, as discussed in Problem B above.

Problem D: rendering lengthy <q> elements as blocks
---------------------------------------------------

This is the more debatable case. In some typographic styles, a long quotation is displayed for clarity's sake as a block even if it doesn't include a paragraph. The Oxford Guide to Style advises [20]:

> Do not break off quotations in text for display unless the matter
> exceeds sixty words in prose, or about five lines when set in normal
> type and measure.
> If the quoted extract comprises fewer than about
> sixty words, it is run on in the text matter, with quotation marks at
> the beginning and end.
>
> Short quotations may be broken off if the context demands it, for
> example if the author sets them out as examples or specimens ...

It is not entirely clear how such cases should be treated in (X)HTML. Most would probably say to use a <BLOCKQUOTE>, but in fact things are rather more complicated than that. There is a rather unfortunate tendency to treat <p> as though it were simply a block like <div>, rather than an authorial division of thought and argument. It is arguably bad practice for a <BLOCKQUOTE> to mark up paragraphs that do not exist, as in the actual example given by the HTML 4.01 specification [21]:

> <BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
> <P>They went in single file, running like hounds on a strong scent,
> and an eager light was in their eyes. Nearly due west the broad swath
> of the marching Orcs tramped its ugly slot; the sweet grass of Rohan
> had been bruised and blackened as they passed.</P>
> </BLOCKQUOTE>

If you consult the original text [22], you'll find that the paragraph does not, in fact, finish where this markup implies, but continues: "Presently Aragorn gave a cry and turned aside." I suppose more accurate markup might end:

> Nearly due west the broad swath of the marching Orcs tramped its ugly
> slot; the sweet grass of Rohan had been bruised and blackened as they
> passed.<INS title="editorial omission">…</INS></P>
> </BLOCKQUOTE>

But consider a paragraph just a little earlier in the same text:

> At the bottom they came with a sudden strangeness on the grass of
> Rohan. It swelled like a green sea up to the very foot of the Emyn
> Muil. The falling stream vanished into a deep growth of cresses and
> water-plants, and they could hear it tinkling away in green tunnels,
> down long gentle slopes towards the fens of Entwash Vale far away.
> They seemed to have left winter clinging to the hills behind. Here the
> air was softer and warmer, and faintly scented, as if spring was
> already stirring and the sap was flowing again in herb and leaf.
> Legolas took a deep breath, like one that drinks a great draught after
> long thirst in barren places.

If we were to extract the text from "It swelled" to "herb and leaf", we would face a hard decision between (at least) three ways of marking up the quotation unambiguously in HTML 4.01:

> <P>Tolkien evokes the Three Hunters' arrival to the grass plains of
> Rohan with characteristic eloquence: <Q>It swelled ... herb and
> leaf.</Q> This upbeat landscape description begins to set the scene
> for the reappearance of their long-lost wizard.</P>

-- or --

> <DIV CLASS="paragraph">Tolkien evokes the Three Hunters' arrival to
> the grass plains of Rohan with characteristic eloquence:
> <BLOCKQUOTE><DIV>It swelled ... herb and leaf.</DIV></BLOCKQUOTE> This
> upbeat landscape description begins to set the scene for the
> reappearance of their long-lost wizard.</DIV>

-- or --

> <DIV CLASS="paragraph">Tolkien evokes the Three Hunters' arrival to
> the grass plains of Rohan with characteristic eloquence:
> <BLOCKQUOTE><P><INS title="editorial omission">…</INS>It swelled ...
> herb and leaf.<INS title="editorial omission">…</INS></P></BLOCKQUOTE>
> This upbeat landscape description begins to set the scene for the
> reappearance of their long-lost wizard.</DIV>

Such markup maps uncomfortably onto typographic styling of over-lengthy quotations, since it hardcodes the presentational decision about quotation length into the (X)HTML. It would perhaps be better to be able to blockify <Q> with CSS if its content exceeds a given length in words, lines, or characters.

Solutions?
----------

I think CSS 3 drafts include a solution for Problem C, as the ::line-marker pseudo-element [23] can be used together with the quotes property [24], like so:

> :lang(fr) > q::line-marker { quotes: '' '' '«\2009' '' }

Would this work? If it would, it might be worth including this or something similar as an example, not least to maximize its chances of implementation.

But do the new CSS specifications offer any solutions to Problems A and B? If not, shouldn't they? If for some reason they can't (for example, if selection by and interpolation into text content is impossible), don't we need to abandon the whole idea of using CSS to add quotation punctuation and ensure that (X)HTML does not rely on it? I note that the section for content selectors [25] remains ominously "intentionally left blank", even though that draft is in the Last Call stage [26].

How does the alternative of adding the quotation punctuation directly to the (X)HTML measure up to the challenges? Not very well. In the case of HTML 4.01, it would prevent authors from using the <Q> element. It is hard to see how Problem C could be catered for within (X)HTML of any variety without either hard-coding line breaks within quotations or introducing new attributes for the <Q> element to specify such punctuation.

References
----------

[1] http://www.tei-c.org/
[2] http://www.w3.org/MarkUp/html3/logical.html
[3] http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
[4] http://www.w3.org/TR/REC-html32
[5] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2..2
[6] http://www.w3.org/TR/html4/struct/text.html#h-9.2.2.1
[7] http://www.ietf.org/rfc/rfc2070.txt
[8] http://www.w3.org/TR/xhtml2/mod-text.html#sec_9.8.
[9] http://www.w3.org/TR/REC-CSS2/generate.html#quotes
[10] "Why the quote element doesn't add quotes by default" thread: http://lists.w3.org/Archives/Public/www-html/2004Aug/0009.html
[11] "[CSS21] Features at risk: quote in CSS or Q in HTML" thread: http://lists.w3.org/Archives/Public/www-style/2005Jul/0357.html
[12] "<q> element, XHTML2, and CSS" thread: http://lists.w3.org/Archives/Public/public-i18n-core/2005JulSep/0018..html
[13] "the q element" thread: http://lists.w3.org/Archives/Public/w3c-wai-gl/2006JulSep/0124.html
[14] http://diveintomark.org/archives/2002/05/04/the_q_tag
[15] The Oxford Guide to Style (2002), 238.
[16] Ibid. 277.
[17] Ibid. 341.
[18] Ibid. 277.
[19] Jean Méron, En question: la grammaire typographique: les guillemets (1999), http://listetypo.free.fr/meron/new/Guilles.pdf, 37-38.
[20] Oxford Guide to Style, 193-4.
[21] http://www.w3.org/TR/html4/struct/text.html#h-9.2.2
[22] J. R. R. Tolkien, The Two Towers, Chapter 2.
[23] http://www.w3.org/TR/css3-content/#line-markers
[24] http://www.w3.org/TR/css3-content/#quotes-specify
[25] http://www.w3.org/TR/css3-selectors/#content-selectors
[26] http://www.w3.org/Style/CSS/current-work#selectors

---------------------
Benjamin Hawkes-Lewis
Received on Friday, 15 September 2006 16:19:33 UTC