[XHTML2] On the subject of handling quotations

In order to handle quotes correctly we must be able to handle the
quotation marks correctly.  Because in English at least there can be
a difference in how quote marks are handled in a block quote and in
an inline quote, XHTML 2 needs to be able to distinguish between the two
and handle quotation marks correctly.  This is fundamental to the nature
of a quotation and not just presentational.  Hence, XHTML2 should be
able to handle this even in the absence of styling.

There are three schemes I wish to consider.  The first one would be for
quotation marks to be included with the content of the quotation element
as is currently the case for the (X)HTML <blockquote> element.  The
second would be to use heuristics to determine what quote marks to use
as is currently the case for the (X)HTML <q> element.  The third would
be to use a child element like the <rp> element from the XRuby Module
to provide optional quotes that a user agent could include as warranted.
I will defer as long as possible the issue of block vs. inline, as this
is clearly secondary to the matter of quotation marks.  I will begin by
examining how all three approaches work when the content is part of a
single document and then proceed to consider how they work when
inclusion or transclusion is done.

QUOTATION MARKS AS CONTENT

With quotation marks as content, the first question we need to consider
is where they get put.

Example 1A:
<!-- XHTML2 fragment -->
<p>Text1 "<quote>Text2</quote>" Text3</p>

Example 1B:
<!-- XHTML2 fragment -->
<p>Text1 <quote>"Text2"</quote> Text3</p>

If <quote> is displayed inline, there is no display difference between
Examples 1A and 1B, but if <quote> is block, then Example 1A is
incorrect when displayed, while Example 1B will display correctly.
Thus, if quotation marks are to be handled only as content, they must
be inside the quotation element, unless <quote> is to be restricted to
being an inline-only element, which is clearly not acceptable.

This placement causes problems both from the viewpoint of styling and
from the viewpoint of inclusion.  Whether quote marks are used for
a quotation given a block presentation is a stylistic preference.
If the quote marks are not desired, there is no way in CSS to remove
them or alter them. With inclusion, it is impossible to shift to the
correct set of quotation marks for embedded quotes.

Example 2A:
<!-- file "example.2" -->
<quote>"Look at Spot,"</quote> said Dick.
<!-- XHTML2 fragment -->
When McGuffey wrote, <quote>"<xi:include href="example.2"/>"</quote>

Example 2B:
<!-- file "example.2" -->
<quote>"Look at Spot,"</quote> said Dick.
<!-- XHTML2 fragment -->
<p><xi:include href="example.2"/> <quote>"See Spot run."</quote></p>

In Example 2A, a correct presentation should change the quotation marks
around "Look at Spot," to single quote marks, while in Example 2B they
should remain double quotation marks.  Hardcoding of one type of
quotation marks causes problems with both styling and inclusion.  As a
result, it can be concluded that XHTML should not handle quotation marks
only as content.

QUOTATION MARKS AS A SUBELEMENT

For the same reasons as given earlier, quotation marks given as
a subelement should have the subelement contained inside the quotation
element.  The advantage of using an element is that it allows for
alternatives to be provided.

Example 3A:
<!-- file "example.3" -->
<q><qc>"<qc>'</qc></qc>Look at Spot,<qc>"<qc>'</qc></qc></q> said Dick.
<!-- XHTML2 fragment -->
When McGuffey wrote, <q>"<xi:include href="example.3"/>"</q>

Example 3B:
<!-- file "example.3" -->
<q><qc>"<qc>'</qc></qc>Look at Spot,<qc>"<qc>'</qc></qc></q> said Dick.
<!-- XHTML2 fragment -->
<p><xi:include href="example.3"/> <q>"See Spot run."</q></p>

This solves the problem we had earlier as the user agent can select
the appropriate quote marks in both Examples 3A and 3B.  The usage of
a subelement also solves the quotation mark problem for block
quotations, as it enables alternate quotation marks to be specified
by styling, simply by making only the desired marks visible.
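
As a rough sketch of the selection a user agent would have to do, here
is one way it might be written in Python.  This is only an illustration
under my own assumptions: that each level of <qc> nesting holds the mark
for one deeper level of quotation, that the deepest alternative is reused
when the nesting runs out, and that inclusion has already been done (it
is faked here by pasting the text of "example.3" into the parent).  The
names pick_mark and render are made up for the example.

```python
import xml.etree.ElementTree as ET

def pick_mark(qc, depth):
    """Descend `depth` levels of nested <qc>; stop early if none is deeper."""
    node = qc
    for _ in range(depth):
        inner = node.find("qc")
        if inner is None:
            break
        node = inner
    return node.text or ""

def render(elem, depth=0):
    """Flatten to text; `depth` counts the <q> elements enclosing `elem`."""
    is_q = elem.tag == "q"
    out = [elem.text or ""]
    for child in elem:
        if is_q and child.tag == "qc":
            # Keep only the alternative that matches this quote's depth.
            out.append(pick_mark(child, depth))
        else:
            out.append(render(child, depth + 1 if is_q else depth))
        out.append(child.tail or "")
    return "".join(out)

# Stand-in for the content of file "example.3".
INCLUDED = """<q><qc>"<qc>'</qc></qc>Look at Spot,<qc>"<qc>'</qc></qc></q> said Dick."""

# Examples 3A and 3B with the xi:include already resolved.
doc_3a = ET.fromstring('<p>When McGuffey wrote, <q>"' + INCLUDED + '"</q></p>')
doc_3b = ET.fromstring('<p>' + INCLUDED + ' <q>"See Spot run."</q></p>')

print(render(doc_3a))
print(render(doc_3b))
```

The same included text gets single marks when it lands inside another
quote and double marks when it does not, which is the behavior hardcoded
marks could not give.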

All is not rosy, though.  There remain two potential problems.

One problem is how to handle included quotes in multilingual documents
correctly.  The problem arises from the fact that the style of the quote
marks for inline quotes is derived not from the language of the text that
is being quoted, but from the language that is used to frame the quote.

Hopefully you'll let me get away with only describing the example I have
in mind.  My knowledge of XPointer and XPath is only sufficient for me
to say it can be done, not give the exact code needed to do it.

Imagine a source document in French that contains a quote in Latin.
Now another document, in English, transcludes the quote.

The problem comes in what to transclude.  If the quotation element is
transcluded, then we should get the French double angle bracket quote
characters coming along for the ride, which is not desirable.
If only the content inside the quote is transcluded, then we lose
access to the attributes of the quotation element such as xml:lang.
However, this is not as dire as it may seem at first glance.
Technically, the proper way for the French document to handle this is:
<q><qc/><span xml:lang="la">quoted text</span><qc/></q>
not:
<q xml:lang="la"><qc/>quoted text<qc/></q>
as the quote marks are French and not Latin.

However, an XML Include implementation is only required to support
the element() scheme, not the more complicated xpointer() scheme.
To ensure proper handling of quotes without imposing a higher level
of required competence for XML Include when used with XHTML, it would
be useful, if this content model is followed, to require the quoted
text to be enclosed inside another element.  This element need not be
a special-purpose element, though; simply excluding PCDATA from the
content model of the quotation element would be sufficient.  Limiting
the quoted text to a single subelement would also help transclusions,
but is not essential, and this model is getting cluttered with elements
already.

Indeed, the other problem is that we're having to add a lot of elements
in an XHTML document, just to support quote marks correctly.
Instead of a single element, this scheme needs six elements just to
handle a simple English quote correctly in all conceivable
circumstances.  (The quotation element, two pairs of nested quote mark
elements, and a quotation text element.)  This model works, but it is
very kludgy.

QUOTATION MARKS BY HEURISTIC

This would mean going back to having the user agent provide the
quoting characters as specified in HTML 4/XHTML 1 for the <q> element.
There are two main problems with this.  The first is that a commonly used
user agent for HTML 4 has never supported adding the quotes, nor the
associated CSS quotes property.  Thus it may prove difficult to get
even a basic version of this widely supported.  The second is that the
proper quote marks depend heavily upon the language; it is not
reasonable to expect a user agent to know the quoting conventions
of all of the hundreds of languages specifiable by RFC 3066.

Automatic addition of quote marks should be given a minimal level of
support that is required for conformance, while encouraging the user
agent to improve on it for the languages it knows.  An obvious minimal
level of support would be equivalent to the following CSS style rules:
* { quotes: "\0022" "\0022" "'" "'" }
q:before { content: open-quote }
q:after { content: close-quote }
and then have any more intricate level of support come from styling.
(Whether from CSS or some other source.)

Then a user agent could provide the following CSS or its equivalent:
:lang(en)>*, :root:lang(en) { quotes: "\201C" "\201D" "\2018" "\2019"}
:lang(fr)>*, :root:lang(fr) { quotes: "\00AB" "\00BB" "\2039" "\203A"}
etc.
but it wouldn't be required to do so.
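
To make the effect of those rules concrete, here is a small Python
sketch of the same heuristic.  It is my own approximation, not a CSS
implementation: the table simply copies the two :lang() rules above, the
fallback copies the required minimum, and the names marks and render are
invented for the example.  The point it tries to show is that the marks
come from the language of the parent (the framing text), not from the
xml:lang on the quotation itself.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Required minimum, mirroring the * { quotes: ... } rule.
DEFAULT_QUOTES = ('"', '"', "'", "'")

# Optional refinements, mirroring the :lang(en) and :lang(fr) rules.
LANG_QUOTES = {
    "en": ("\u201C", "\u201D", "\u2018", "\u2019"),
    "fr": ("\u00AB", "\u00BB", "\u2039", "\u203A"),
}

def marks(frame_lang, depth):
    """Return (open, close) for a quote at `depth` framed by `frame_lang`."""
    primary = (frame_lang or "").split("-")[0].lower()
    quotes = LANG_QUOTES.get(primary, DEFAULT_QUOTES)
    # Past the last listed pair, keep reusing it (as the CSS quotes property does).
    pair = min(depth, len(quotes) // 2 - 1)
    return quotes[2 * pair], quotes[2 * pair + 1]

def render(elem, frame_lang=None, depth=0):
    """Flatten to text, wrapping each <q> in marks chosen by its parent's language."""
    own_lang = elem.get(XML_LANG, frame_lang)
    is_q = elem.tag == "q"
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, own_lang, depth + 1 if is_q else depth))
        parts.append(child.tail or "")
    body = "".join(parts)
    if is_q:
        open_mark, close_mark = marks(frame_lang, depth)
        return open_mark + body + close_mark
    return body

english = ET.fromstring('<p xml:lang="en">He said <q xml:lang="la">alea iacta est</q>.</p>')
french = ET.fromstring('<p xml:lang="fr">Il a dit <q xml:lang="la">alea iacta est</q>.</p>')
print(render(english))
print(render(french))
```

A user agent that knows nothing about a language just falls back to the
straight quotes, so the worst case is plain but not wrong.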

And of course, if a user agent supports styling, the author would be able
to customize the quote characters, whether to get what he feels is most
pleasant to the eye or because he is using a language which he expects
user agents may not support themselves.

A FINAL COMMENT

There is one last question that needs to be considered: the handling of
other punctuation adjacent to a quote and how it interacts with the
quotation marks of the quote.  The general rule seems to be: place the
additional marks outside the quote, but for American English, it is
common to place an adjacent comma or period inside the final quote
mark instead of outside.  Unfortunately I can't think of a very good
solution for this.  The best I can think of, using the subelement and
heuristic methods outlined above, would be one of the following:

Example 4A:
<q><qc>"<qc>'</qc></qc>Text<qc>."<qc>.'</qc></qc></q>

Example 4B:
<style>
q.ip {quotes: "\201C" ".\201D" "\2018" ".\2019"}
span.ip {display: none}
</style>
...
<q class="ip">Text</q><span class="ip">.</span>

Since the difference between placing such punctuation inside
or outside is purely presentational, I don't mind relying upon styling
in this case.

IN SUMMARY
Handling quotes correctly will require either a very complicated
and mostly redundant structure for the quotation element that hand coders
will absolutely despise, or reverting to what HTML 4 calls for, i.e.,
for the user agent to add the quote marks.


Ernest Cline
ernestcline@mindspring.com

Received on Friday, 12 December 2003 19:16:11 UTC