Re: Why the quote element doesn't add quotes by default (was Re: http://www.w3.org/TR/2004/WD-xhtml2-20040722/xhtml2-diff.html)

On Tue, 3 Aug 2004, Masayasu Ishikawa wrote:

> Short summary:
>
>   The q element in earlier version of (X)HTML placed the burden of adding
>   "proper" quotation marks on the wrong side.  The quote element in
>   XHTML 2.0 shifts the burden of adding "proper" quotation marks from
>   user agents to authors, who know what are "proper" quotation marks
>   for their documents.

Short summary of my comments:

  There's no reason not to include the q element in XHTML 2.0, with
  essentially the same definition as in HTML 4.01, if XHTML 2.0 is not
  designed to be compatible with "old" user agents (such as IE 6 or
  current indexing robots).

> The basic problem is that the q element requires arcane knowledge of
> language-sensitive quotation marks, and no user agent would be able to
> capture all the possible combination of all languages around the world.

No, I don't think that's the basic problem. The basic problem is that
<q> was designed not to degrade gracefully. Browsers that do not recognize
or do not support <q> markup now render just the content, omitting the
potentially vital information that it's a quotation. Markup like
<q><qm>"</qm>To be or not to be, that is the question<qm>"</qm>.</q>
would degrade gracefully. Here qm elements would contain quotation marks
that are to be omitted by user agents that support the q element.
(Compare the ideas behind Ruby markup.)
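
To make the idea concrete, here is a minimal sketch in Python of the two
renderings. The qm element is only my suggestion above, not part of any
specification, and the function names and the regular-expression
"parsing" are illustrative assumptions; a real user agent would work on
a parsed document tree.

```python
import re

SAMPLE = '<q><qm>"</qm>To be or not to be, that is the question<qm>"</qm>.</q>'


def render_legacy(markup: str) -> str:
    """A user agent that knows neither q nor qm: drop the tags, keep the text."""
    return re.sub(r"</?[a-zA-Z]+>", "", markup)


def render_q_aware(markup: str, open_mark: str = "\u201c",
                   close_mark: str = "\u201d") -> str:
    """A user agent that supports q: discard the qm fallbacks, add its own marks."""
    without_fallback = re.sub(r"<qm>.*?</qm>", "", markup)
    return without_fallback.replace("<q>", open_mark).replace("</q>", close_mark)


print(render_legacy(SAMPLE))   # author-supplied ASCII marks survive
print(render_q_aware(SAMPLE))  # fallback marks replaced by the agent's own
```

The old browser shows the quotation marks the author typed; the new one
throws them away and is free to pick better ones.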

It's a lot of work to support the quotation mark rules for all the
languages of the world. But it would be reasonable to support just
the hundred or so most used languages and use default rendering
(with ASCII quotation marks) for the rest. Besides, I think we can
realistically hope that the Common Locale Data Repository (CLDR) project
of the Unicode Consortium will produce, as part of the locale data,
information about such conventions in a manner that can be directly fed
into software. (Quotation mark usage rules are not included
in the project's current scope, but it is very natural to expect that
they will be addressed. Those rules are essential to text processing
programs, for example.)
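
A rough sketch in Python of what "support the common languages, fall
back to ASCII" could look like. The table is hand-written, hypothetical,
and far from complete; it is not data from the Unicode Consortium's
repository, and actual conventions vary within each language.

```python
ASCII_FALLBACK = ('"', '"')

# Illustrative entries only: (opening mark, closing mark) per language code.
QUOTE_MARKS = {
    "en": ("\u201c", "\u201d"),  # English: “ ”
    "fr": ("\u00ab", "\u00bb"),  # French: « »
    "de": ("\u201e", "\u201c"),  # German: „ “
    "fi": ("\u201d", "\u201d"),  # Finnish: ” ”
}


def quote_marks(lang: str) -> tuple:
    """Look up marks for a language tag, trying shorter tags, then ASCII."""
    tag = lang.lower()
    while tag:
        if tag in QUOTE_MARKS:
            return QUOTE_MARKS[tag]
        tag = tag.rpartition("-")[0]  # "fr-ca" -> "fr" -> ""
    return ASCII_FALLBACK


def wrap_quotation(text: str, lang: str) -> str:
    """Surround text with the quotation marks chosen for the language."""
    open_mark, close_mark = quote_marks(lang)
    return open_mark + text + close_mark
```

With this, wrap_quotation("oui", "fr-CA") uses the guillemets, and a
language missing from the table simply gets ASCII quotation marks.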

After all, if we don't think that even relatively trivial things like
punctuation character variation can be handled, then what's the point of
telling authors to use language markup (for all changes in language too!)?
The W3C documents about language markup promise quite a few cool things,
like language-sensitive text formatting, spelling checks, etc. If the
reality is that even selecting quotation marks is way above the state of
the art, then please give me a break.

> This situation effectively shows that the "minimal" level of support
> for the q element is certainly not difficult, but very few implementors
> dare to go beyond that level.

But if XHTML 2.0 makes the minimal support mandatory, you will have a
new start. If some browser claims XHTML 2.0 conformance and accepts
XHTML 2.0 documents for rendering, yet fails to do such an extremely
simple thing as putting ASCII quotation marks around <q> content
if it can't do any better, then it's hopeless anyway. In fact, if you
added optional <qm> markup (though it would admittedly be odd - a
compatibility element in a language designed to be incompatible),
then authors could even take precautions against such misbehavior.

> This situation rather discourages the use of the q element, e.g. even if
> a French author does know what the French quotation marks should be,
> the specification says that authors should not put quotation marks
> by themselves around q, and most browsers just end up with ", which
> is not at all satisfactory.

The French rules are actually a good reason _for_ <q> markup. You can
enter guillemets as characters (in any way you like), in current HTML and
in the future, but how will you tell visual browsers to leave a
fine (thin) space between the text and the guillemets? If you use
the THIN SPACE character, you face the problem that it allows a line
break. If you use NO-BREAK SPACE, you get it typographically wrong
(too wide). Using <q> markup along with language information _allows_
a browser to create the best possible rendering. (Admittedly this raises
the question whether we need markup for questions and exclamations as
well.)

> Not using appropriate
> markup for quotations is worse than not having appropriate quotation marks.

Is it? What would quotation markup be used for, then? I can imagine quite
a few _possible_ uses (like search engines specifically searching for
occurrences of words inside quotations), but realistically, what do you
expect, during this century?

> Another difficult aspect of handling language-sensitive quotation marks
> is that existing practice vary whether quotation marks are considered
> as part of the content of the parent of the quoted text, or that of
> the quoted text itself.

Indeed. And there is variation within each language, too. And you might
have difficulties in finding out what the official rules, and the
de facto rules, for nested quotations are. But such things need to be
addressed anyway in the world of computing. A good-quality text processing
program needs to know how to change ASCII quotation marks or apostrophes
into something more suitable, by language-sensitive rules. Building such
things into a browser means work but not rocket science.
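
In its very simplest form, such a conversion might look like the
following Python sketch. It is deliberately naive: it assumes that
double quotation marks come in properly matched pairs, and it ignores
apostrophes and nested quotations, which are exactly where the
language-sensitive rules get harder. The function name is mine.

```python
def convert_ascii_quotes(text: str, marks=("\u201c", "\u201d")) -> str:
    """Replace each ASCII double quote by an opening or closing mark in turn."""
    open_mark, close_mark = marks
    pieces = []
    inside_quotation = False
    for ch in text:
        if ch == '"':
            # Alternate: the first mark opens, the next closes, and so on.
            pieces.append(close_mark if inside_quotation else open_mark)
            inside_quotation = not inside_quotation
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Passing marks=("\u00ab", "\u00bb") gives guillemets instead of English
quotation marks.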

> So we concluded that it would be reasonable to place the burden of
> adding "proper" quotation marks on authors rather than implementors.

That's where the burden has been, and very few authors take the burden.
It has always been possible, for example, to use guillemets (which
belong to ISO 8859-1) in HTML documents in languages and orthographic
styles that use guillemets for quotations. But it is rare to see them used
in such situations. Placing the burden on authors alone means, in effect,
saying that proper use of quotation marks isn't generally relevant.
(I use the word "alone", since the <q> markup approach naturally
expects authors to use that markup for quotations.)

> The I18N WG recommended that using styling would be a preferable way
> and encouraged CSS implementors to support relevant feature more widely
> and consistently.

But how could you do such styling in CSS when your markup is
<quote>"To be or not to be, that is the question".</quote>
and you have no way in CSS to tell the browser not to render some
characters in the content? If you use the correct English quotation marks
in the content, you won't need any CSS styling. If you omit the quotation
marks from the content and add them in CSS, then you are relying on CSS
in conveying essential semantic information, and, besides, if authors
really did such things, millions of people would write the same CSS rules.
Well, not the same actually - authors would write _wrong_ rules.
Even the CSS 2 specification presents _wrong_ rules for quotation marks.
If those rules are hard to get right even for people who write
specifications, standards, and browsers, is it realistic to expect
that ordinary authors have a fair chance of getting them right?
(OK, admittedly we can expect a person to know the punctuation rules of
their native language. The expectation is mostly wrong, but fair.
But people use other languages in Web authoring, too.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Tuesday, 3 August 2004 03:22:04 UTC