Why the quote element doesn't add quotes by default (was Re: http://www.w3.org/TR/2004/WD-xhtml2-20040722/xhtml2-diff.html) from Masayasu Ishikawa on 2004-08-03 (www-html-editor@w3.org from July to September 2004)

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Tue, 03 Aug 2004 14:40:49 +0900 (JST)
To: Christian.Hujer@itcqis.com
Cc: www-html@w3.org, www-html-editor@w3.org
Message-Id: <20040803.144049.74739023.mimasa@w3.org>
Christian Wolfgang Hujer <Christian.Hujer@itcqis.com> wrote:

> Section 9.8 The quote element
> What's the rationale behind requiring the author to add quotes via style or 
> content instead of inserting them by default ("default stylesheet")?

This question comes up frequently, so I'll explain the rationale behind
this.

Short summary:

  The q element in earlier versions of (X)HTML placed the burden of adding
  "proper" quotation marks on the wrong side.  The quote element in
  XHTML 2.0 shifts the burden of adding "proper" quotation marks from
  user agents to authors, who know what the "proper" quotation marks
  are for their documents.

Longer story:

Back in 2001, the HTML Working Group reviewed all elements/attributes
in the XHTML namespace to decide whether they should be carried over to
XHTML 2.0.  A question arose whether the q element should be altered to
NOT supply the quotation marks by default, and we discussed it with the
I18N WG and the CSS WG.

The basic problem is that the q element requires arcane knowledge of
language-sensitive quotation marks, and no user agent would be able to
capture all the possible combinations of all languages around the world.
So, it would be unavoidable that each user agent would end up supporting
only a certain subset of language-sensitive quotation marks, which may
differ from one user agent to another - the least common denominator
would be quite small, or even empty.  So the result is unpredictable,
and authors can't be sure what kind of quotation marks will be rendered,
even though they do know what kind of quotation marks they intended.

While the HTML 4 spec didn't indicate what a user agent should minimally
do [1], RFC 2070 included the following note [2]:

      NOTE -- minimal support for the Q element is to surround the
      contents with some kind of quotes, like the plain ASCII double
      quotes.  As this is rather easy to implement, and as the lack of
      any visible quotes may affect the perceived meaning of the text,
      user-agent implementors are strongly requested to provide at least
      this minimal level of support.

And this fallback behavior is another reason why the q element was not
used widely.  In the early days, the main reason was of course the
complete lack of support.  However, by 2001 many "modern" browsers
provided at least "minimal" support for the q element.  To list a few
(caution: this is the implementation status as of 2001, and it may have
improved since then):

- Lynx has been supporting nested handling of the q element so that
  it alternates double-quotes and single-quotes with directionality
  of start and end single-quotes (i.e. something like "... `...' ..."),
  since 27 May 1996.
- Opera has supported the q element since version 4 (but only minimally),
  and also supports relevant CSS properties.
- Mozilla/Netscape 6 also support it, but they just insert " around
  <q>...</q>, in a non-language-sensitive manner.  They also support
  relevant CSS properties, but didn't handle nesting of quotes properly
  at that time.
- Amaya alternates " and ', but it's not language-sensitive.
- Alis Tango is able to configure quotation marks, but strangely its
  configuration is affected by the language of the *user interface*,
  so if a user chooses Japanese UI, Tango inserts Japanese quotation
  marks regardless of the language of the document, even in English or
  French context.
- IE5/Mac tries to be somewhat language-sensitive, but its behavior is
  sometimes strange, e.g. it uses the combination of U+201C - U+201D and
  U+2018 - U+2019 if the language is "en", but it merely uses " and '
  for "en-US", "en-GB" and so on, and for some languages it uses strange
  quotation marks.  It doesn't support relevant CSS properties or other
  means to override the default quotation marks.
- iCab implements the q element in a language-sensitive manner to some
  extent, but doesn't provide a way to override the default quotation marks.
- IE/Win lacks support for the q element in all versions.
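
To make the difference between the "minimal" fallback and the nested
handling described above concrete, here is a rough sketch in Python.
It is not taken from any of these browsers; the function name, the
list-based document model and the ASCII_PAIRS table are assumptions
made up for illustration only.

```python
# Hypothetical model: a document is a list whose items are either
# strings (plain text) or nested lists (a q element and its content).

# Outer pair first, then inner pair, roughly what Lynx is described as doing.
ASCII_PAIRS = [('"', '"'), ("`", "'")]

# The "minimal" fallback: plain ASCII double quotes at every level.
MINIMAL_PAIRS = [('"', '"')]


def render(node, depth=0, pairs=ASCII_PAIRS):
    """Return the text of node, wrapping each nested list in quote marks."""
    if isinstance(node, str):
        return node
    body = "".join(render(child, depth + 1, pairs) for child in node)
    if depth == 0:
        # The outermost list is the document itself, not a quotation.
        return body
    # Cycle through the available pairs as the nesting gets deeper.
    open_mark, close_mark = pairs[(depth - 1) % len(pairs)]
    return open_mark + body + close_mark


doc = ["He said ", ["she said ", ["no"], " twice"], "."]
print(render(doc))                       # alternating marks
print(render(doc, pairs=MINIMAL_PAIRS))  # same mark at every level
```

Note that neither table knows anything about the language of the
document, which is exactly the limitation discussed here.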

This situation effectively shows that the "minimal" level of support
for the q element is certainly not difficult, but very few implementors
dare to go beyond that level.  Ironically IE5/Mac and iCab tried to
implement it far better than other user agents, but neither of them
provided a way to override the default quotation marks.  So, for example,
neither of them renders Japanese quotation marks correctly, yet authors
cannot override the poor "fallback" quotation marks on those user agents.

This situation rather discourages the use of the q element, e.g. even if
a French author does know what the French quotation marks should be,
the specification says that authors should not put quotation marks
themselves around q, and most browsers just end up with ", which
is not at all satisfactory.  Given that situation, it is quite possible
that some authors just insert French quotation marks directly and don't
use the q element at all.  Even the latest draft of "HTML Techniques
for WCAG 2.0" says the following [3]:

    The q element marks up inline quotations.

    NOTE: The q element, though designed for semantic markup, is
    unsupported, or poorly-supported, in most browsers. So this is
    a future technique.

This is not a document written in the last century; it was written
in 2004.  Probably the "future" will never come.  Not using appropriate
markup for quotations is worse than not having appropriate quotation marks.

Another difficult aspect of handling language-sensitive quotation marks
is that existing practice varies as to whether quotation marks are
considered part of the content of the parent of the quoted text, or
part of the quoted text itself.  We researched a bunch of publications,
only to find that there's no consistent rule across the world.

For example, the quotation marks around English quoted text inside French
text are typically rendered as French quotation marks.  On the
other hand, when languages like Chinese, German, Indonesian, Korean,
Malay are quoted inside Japanese text, quotation marks are typically
rendered in the language of the *quoted text*, not as Japanese quotation
marks.  These are all real-world examples, and those examples effectively
show that there are diverse practices around the world, and it is not
at all trivial to determine the "proper" quotation marks in an appropriate
context.  The rule may even differ by local convention, or by author's
preference.
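
The two practices can be sketched in Python as a choice of which
language's marks to use.  This is only an illustration: the QUOTES
table, the function name and the "follow" parameter are all made up
for this sketch, and the marks listed are just commonly seen ones, not
a claim about what is "proper" for any given document.

```python
# Illustrative table only; the marks an author actually wants may differ.
QUOTES = {
    "en": ("\u201C", "\u201D"),              # left/right double quotation marks
    "fr": ("\u00AB\u00A0", "\u00A0\u00BB"),  # guillemets with no-break spaces
    "ja": ("\u300C", "\u300D"),              # corner brackets
    "de": ("\u201E", "\u201C"),              # low-9 opening, high closing
}


def quote(text, quoted_lang, parent_lang, follow="parent", table=QUOTES):
    """Wrap text in the marks of the parent or of the quoted language."""
    if follow not in ("parent", "quoted"):
        raise ValueError("follow must be 'parent' or 'quoted'")
    lang = parent_lang if follow == "parent" else quoted_lang
    open_mark, close_mark = table[lang]
    return open_mark + text + close_mark


# English quoted inside French text, using the marks of the parent.
print(quote("Hello", "en", "fr", follow="parent"))
# German quoted inside Japanese text, using the marks of the quoted text.
print(quote("Guten Tag", "de", "ja", follow="quoted"))
```

Nothing in the markup alone tells a user agent which value of "follow"
the author has in mind, and that is the crux of the problem.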

If we require that user agents should have default style rules,
implementors would have to prepare a great number of language-sensitive
style rules, and even if they do a great job, they won't be able to
cover all possible combinations of various languages around the world,
and even if they can, that may not match the author's preference/
convention.  On the other hand, it is rather rare that a document
includes multilingual quotations, and authors only have to provide
a few style rules that are necessary for their documents.  And they
do know their preference/convention.

So we concluded that it would be reasonable to place the burden of
adding "proper" quotation marks on authors rather than implementors.
The I18N WG recommended styling as the preferable approach
and encouraged CSS implementors to support the relevant features more
widely and consistently.  Then, each author may have their own default
style rules, and may include them in their author style sheet.  We could
provide some sample style rules, but they MUST NOT be in the default
XHTML 2.0 style sheet.
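
As a rough idea of how small such an author style sheet can be, the
Python sketch below prints a few CSS rules from an author's own table
of quotation marks.  It is not an official sample: the function name
and the table are made up for illustration.  The "quotes" property,
the :lang() selector and the open-quote/close-quote values are CSS 2
features; the element name is a parameter so the same sketch works for
"q" or for "quote".

```python
def quote_rules(table, element="quote"):
    """Return CSS text that sets quotation marks per language for element."""

    def css_string(mark):
        # Write every character as a CSS hex escape; the trailing space
        # terminates the escape and is not part of the resulting string.
        return '"' + "".join("\\%X " % ord(ch) for ch in mark) + '"'

    lines = []
    for lang, marks in table.items():
        values = " ".join(css_string(m) for m in marks)
        lines.append("%s:lang(%s) { quotes: %s; }" % (element, lang, values))
    # Generated content that actually inserts the marks around the element.
    lines.append("%s:before { content: open-quote; }" % element)
    lines.append("%s:after { content: close-quote; }" % element)
    return "\n".join(lines)


# Hypothetical author preference: outer and inner marks for French only.
print(quote_rules({"fr": ("\u00AB", "\u00BB", "\u2039", "\u203A")}))
```

An author with a single-language document would need only one such
language rule plus the two generated-content rules.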

That's what was agreed between the HTML, I18N, and CSS WGs more than
three years ago, and why the quote element doesn't add quotes by default.

[1] http://www.w3.org/TR/html4/struct/text.html#edef-Q
[2] http://www.rfc-editor.org/rfc/rfc2070.txt
[3] http://www.w3.org/TR/2004/WD-WCAG20-HTML-TECHS-20040730/#q

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Tuesday, 3 August 2004 01:40:58 UTC