Re: <q> Research & Conclusions

Chris Wilson wrote:
> Thanks, Ben, for such a great capture of where we are on this. It was very
> helpful.

It's also very cool to see such detailed feedback from a Microsoft employee on
this public list. You might not like this new message so much, though. :)


Executive Summary
=================
Common practice is for authors to punctuate quotations themselves, in other
elements and in attributes, choosing from the endless variety of sometimes
complicated and ever-changing editorial conventions. By their very nature,
these conventions can never be standardised and implemented interoperably
and internationally. I conclude that <q> MUST NOT generate quote marks.


Research
========

I've gone through all the examples of <q> that Philip Taylor published via a
link in this recent message:

<http://lists.w3.org/Archives/Public/public-html/2008Oct/0255.html>

What I found authors doing with <q> on those pages is documented with a
basic statistical breakdown followed by case-by-case notes:

<http://projectcerbera.com/web/study/2008/quotes#pt>

In places, I reviewed archived versions of the websites. It seems <q> is
becoming *even less* widely used on the Web, even though quoting text
remains popular.


General Findings
----------------

Ben Millard wrote:
> My impression of the few sites I've seen using <q> is that their authors
> are standards-savvy. They are equipped (although perhaps not willing!) to
> adjust their content or apply author CSS if browser behaviour changes.

Hah, if only! Basically, the way <q> gets used is a mess. To some extent,
this is "par for the course" with specialist phrase elements:

<http://projectcerbera.com/web/study/2008/quotes#pt>


Unique to <q>
-------------

The difference with <q> is the automatic quote marks. From the pages I've
now studied in detail, they seem more trouble than they're worth:

* Because browsers generate quote marks, some authors use <q> as a
presentational element, much like using <em> for italics. This seems
especially common in Germany, possibly due to Firefox's market share there:
    * <q> for &ldquo;.
    * </q> for &rdquo;.
* <q> for anything quoted, defined, cited, emphasised or otherwise removed
from the author's normal writing. (This is often done inconsistently, mixed
with <i> or punctuation and so on.)
* Non-English authors have to use a little-known CSS feature to adjust their
content's punctuation, as language-sensitive quoting is not implemented in
the major browsers currently shipping (afaik).
* Standards-savvy authors who like using the <q> element and want its text
styled (usually italic), but don't want the generated quote marks, turn off
generated quotes in CSS and type the quote marks themselves, either outside
or inside the <q>.
* Authors sometimes seem to use <q> without realising it generates quote
marks, presumably because IE6 and IE7 don't generate them.
* Authors who use <q> in their templates (such as "random quote" features)
often use quote marks without <q> in their content (either straight, curly
or a mixture of both).
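For illustration, this is roughly what those standards-savvy authors put in
their CSS. It's a minimal sketch that assumes a UA following the CSS 2.1
default stylesheet (q:before { content: open-quote } and
q:after { content: close-quote }); the italic rule is just a stand-in for
whatever styling they actually want:

```css
/* Author stylesheet (sketch): keep the <q> styling, lose the generated marks. */
q {
  quotes: none;        /* open-quote and close-quote now render nothing */
  font-style: italic;  /* stand-in for the styling these authors want */
}

/* Alternatively (or as well), don't generate the pseudo-elements at all. */
q:before,
q:after {
  content: none;
}
```

The quote marks themselves then get typed by hand (&ldquo; and &rdquo;),
either outside or inside the <q>.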

Far from being a convenience feature, generated quote marks make <q>
difficult and confusing, so <q> either gets misused or not used at all. The
exceptions are unusually standards-savvy pages, like Howcome's profile.

Authors of non-English pages who use <q> correctly must jump through hoops
to get the punctuation they want. For mainstream authors, that hoop is much
smaller and higher than it is for people on this list.


Compatibility with the Web
--------------------------

Ben Millard wrote:
>>* It's impossible to fully internationalise the generation of punctuation
>> on <q>.
>
> Technically, it appears correct, but you can definitely (imo) hit the bulk
> of use cases.

* In the pages Philip collected, there are two distinct approaches to German
quoting: inward-pointing chevrons (»like this«) versus a low curly quote
paired with a high curly quote („like this“). A handful of pages use English
quoting for German text in German documents.
* It is impossible to automate quoting to the liking of mutually exclusive
authoring conventions simultaneously.
* In IRC recently, Philip mentioned that English is in a transitional phase
from single-outside-double to double-outside-single. You can't standardise a
moving target. It might move back or move somewhere else entirely.
    <http://krijnhoetmer.nl/irc-logs/whatwg/20081030#l-332>
* Conventions at the 3rd level of nesting in English are unclear (as
mentioned by David Baron at the HTMLWG meeting). The options are:
    * Alternate between double and single as depth increases.
    * Use double for the first level, single for anything deeper.
* The complex traditions of quoting in French and Russian are either beyond
CSS or so difficult to set up in CSS that only the most hardcore CSS gurus
would stand a chance of making it work. (It's beyond me, if that helps put
it in perspective.)
* Changing punctuation by using a CSS property is counter-intuitive and
inconsistent with current authoring practice on the web.
* Changing punctuation automatically via the lang attribute is of doubtful
value, because editorial conventions differ about quoting text from other
languages, as raised at various points elsewhere in recent discussion. If
that text itself quotes other text, which may or may not be from other
languages, then...well...all bets are off! :D
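To show why these conventions clash, here is a sketch using the CSS 2.1
quotes property, where each pair of strings serves one nesting level and the
last pair is reused for anything deeper. The class names nest-double-single
and nest-alternate are hypothetical, made up for this example. The German
rules use the :lang(de) > q pattern from the CSS 2.1 spec, so the marks
follow the language of the surrounding text rather than that of the quote:

```css
/* English, option 1: double first, single for anything deeper.
   The last pair is reused at every deeper level. */
html.nest-double-single q { quotes: "\201C" "\201D" "\2018" "\2019"; }

/* English, option 2: alternate double and single as depth increases.
   Each level needs its own pair, so this only alternates as deep as
   the list goes (four levels here); level five onward stays single. */
html.nest-alternate q {
  quotes: "\201C" "\201D" "\2018" "\2019" "\201C" "\201D" "\2018" "\2019";
}

/* German, chevrons: »outer ›inner‹ outer« */
:lang(de) > q { quotes: "\00BB" "\00AB" "\203A" "\2039"; }

/* German, low-high curly quotes: „outer ‚inner‘ outer“
   Same selector as the chevron rule, so only one of the two can apply:
   whichever comes later in the stylesheet wins. */
:lang(de) > q { quotes: "\201E" "\201C" "\201A" "\2018"; }
```

A spec-mandated UA stylesheet would have to pick one rule per language, so
every page following the other convention would still need author CSS.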


Conclusions
===========

The more I study this, the more I think <q> shouldn't generate quote marks.
It's impossible to do a good job of it in an internationalised way:

* Authors who use IE6 or IE7 don't expect quote marks to be
generated on <q>, so they punctuate it themselves. They'll get a nasty
surprise when they upgrade to IE8 if it changes this behaviour. So the
backward compatibility story seems better for IE if it does not start
generating quotes on <q>.
* Throughout the Web's history, the leading web browser has never generated
quotes on <q> (afaik). To most users, there will be no difference in making
<q> quoteless by default.
* <q> is often used incorrectly for things where quote marks would not be
missed, such as titles of works, which use title case and have context, or
to decorate slogans, where quote marks make little sense to start with.
* UAs would avoid the bug reports and support queries about why some
nationalities get localised quote mark magic but others don't. And why the
magic in the browser doesn't quite match the conventions they see in some
fraction of the books in that language.
* HTML5 would avoid having to choose which of the mutually exclusive,
extremely numerous and ever-changing quote mark conventions get locked into
the spec.
* Author CSS remains available to authors who prefer generated quote marks.
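As a sketch, if UAs stopped generating quote marks on <q>, an author who
still wants them could add a few lines like these. Setting quotes explicitly
avoids depending on whatever marks the UA would otherwise pick:

```css
/* Author stylesheet (sketch): opt back in to generated quote marks. */
q { quotes: "\201C" "\201D" "\2018" "\2019"; }  /* choose the marks explicitly */
q:before { content: open-quote; }
q:after  { content: close-quote; }
```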

-- 
Ben 'Cerbera' Millard
<http://projectcerbera.com/web/study/>

Received on Wednesday, 5 November 2008 07:33:15 UTC