Re: Correct usage of the q element from Jukka K. Korpela on 2004-02-13 (www-html@w3.org from February 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 13 Feb 2004 09:07:13 +0200 (EET)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0402130828340.20289@korppi.cs.tut.fi>

On Thu, 12 Feb 2004, Ernest Cline wrote:

> The only benefit I can see to having <q> provide the quotation marks
> instead of making them content is that it makes providing correctly
> nested quotation marks in transcluded portion of documents easy.

I hadn't thought of that. Yes, it makes sense in principle.

> The problem is tho, support for transclusion is extremely limited at
> present.

Yes and no - there's the SGML way, which was always formally part of HTML
but never supported, and history repeats itself in the X* world. But
there are in fact several ways to perform inclusion by other means,
such as server-side includes (SSI).

But HTML itself is unsuitable for "transclusion" for reasons far more
serious than the problem addressed by the idea of <q>. To begin with, HTML
(including XHTML) is defined so that you cannot simply include a document
verbatim in another. There's really no reason why things _couldn't_ be
that way. A document is an element, which is a tree structure. There's no
logical reason why the root element could not re-appear in the tree. But
allowing this would be a major change.

On the more practical side, assume that we include (sorry,
transclude) just a fragment of a document, an element or sequence
of elements that can validly be put inside a <blockquote>.
What does this _mean_, apart from the fact that virtually all interested
parties will understand the element as meaning 'indent'?

For example, does any software that creates a table of contents from the
headings in a document (this is one of the few kinds of meaningful
processing of structural markup that really exists) actually pay attention
to <blockquote>? If <blockquote> contains any headings, they should surely
be excluded from the ToC. Or does any software that checks the recommended
use of headings (no skipping of levels etc.) process <blockquote>
adequately, effectively as a separate realm?
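
To make the ToC question concrete, here is a minimal Python sketch of a
table-of-contents builder that treats <blockquote> as a separate realm and
leaves out any headings inside it. It is only an illustration under stated
assumptions: the names TocBuilder and build_toc are hypothetical, and the
standard-library html.parser is used purely for convenience, not because
any existing tool works this way.

```python
from html.parser import HTMLParser


class TocBuilder(HTMLParser):
    """Collect h1-h6 headings, skipping any nested inside <blockquote>."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.quote_depth = 0   # number of currently open <blockquote> elements
        self.current = None    # (level, [text parts]) while inside a heading
        self.toc = []          # resulting list of (level, heading text)

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1
        elif tag in self.HEADINGS and self.quote_depth == 0:
            # Only headings outside any quotation start a ToC entry.
            self.current = (int(tag[1]), [])

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1
        elif tag in self.HEADINGS and self.current is not None:
            level, parts = self.current
            self.toc.append((level, "".join(parts).strip()))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current[1].append(data)


def build_toc(html):
    """Return [(level, text), ...] for headings that are not quoted."""
    parser = TocBuilder()
    parser.feed(html)
    parser.close()
    return parser.toc


doc = (
    "<h1>Title</h1>"
    "<blockquote><h2>Quoted heading</h2><p>text</p></blockquote>"
    "<h2>Real section</h2>"
)
print(build_toc(doc))
```

A checker for skipped heading levels could reuse the same quote_depth
counter so that quoted headings do not disturb the outer sequence.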

My point is that nobody takes quotation markup seriously now except a few
enthusiasts just for the sake of principle.

> <q> was just simply ahead of its time

Actually the problem was that it was not taken into HTML _from the
beginning_. There would be no fundamental problem then.

> It also doesn't help that a major browser implements <q> incorrectly.
> which contributes to why authors don't use it.

Yes, and it's too late to change that now. Authors are used to using
quotation marks. Such things don't change easily. Actually it is easier to
change a few browsers than to change people.

And there's nothing fundamentally wrong with quotation marks.
If you write any software that tries to recognize quotations in
Web pages, it would be just a theoretical exercise to play with
<q> or <blockquote>, and the latter would give you wrong results
far more often than not. Recognizing "..." would be much more relevant.
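
As a rough sketch of what recognizing "..." could look like, the Python
fragment below pulls out spans enclosed in straight or curly double
quotation marks from plain text. The function name find_quoted_spans and
the choice of a regular expression are assumptions made for illustration.
It finds quoted spans only; it cannot tell whether a span is a real
quotation or some other use of quotation marks.

```python
import re

# Matches "..." with straight quotes, or the curly pair U+201C ... U+201D.
# Spans are not allowed to cross a line break, to limit runaway matches.
QUOTED = re.compile(r'"([^"\n]+)"|\u201c([^\u201d\n]+)\u201d')


def find_quoted_spans(text):
    """Return the text of every double-quoted span, in document order."""
    return [m.group(1) or m.group(2) for m in QUOTED.finditer(text)]


print(find_quoted_spans('He said "hello there" and left.'))
```

Other quotation conventions (single quotes, guillemets, low-high pairs)
would need additional patterns.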

So what does this imply for markup? Well, what we would actually
need is markup for indicating that "..." is _not_ a quotation, when
it's actually used for something else! What I mean is that markup could be
used to disambiguate the meaning of quotation marks, rather than replace
them.

> However, even if <q> and transclusion worked as they should work,
> there would still be the problem that the flattened text that results from
> stripping away the markup is not the same as one would want if one
> produced a plain text file, which should be the goal for a Text Markup
> Language, whether or not it is eXtensible or Hyper.

I don't quite agree. Markup can be _essential_ in the sense that
the fundamental meaning of text is thoroughly changed by it. It is
debatable whether <h1> is essential. <blockquote> is, in the general case.
If you remove <ol> and <li> markup, does the resulting string of
characters really correspond to the intended meaning? The HTML 2.0
specification required that <em> and <strong> be rendered as different
from each other and from normal text. To me, this reflects an idea of
essential markup. (For some odd reason, the requirement has been dropped,
and some browsers actually fail to comply with it.)
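
The <ol>/<li> question can be tried out directly. The Python sketch below
strips all markup and keeps only the character data; the names TextOnly
and strip_markup, and the sample fragment, are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data only; every tag is silently dropped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the document's text with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = "<p>Do not <em>ever</em> mix:</p><ol><li>bleach</li><li>ammonia</li></ol>"
print(strip_markup(sample))  # prints: Do not ever mix:bleachammonia
```

The list items run together and the emphasis disappears, so the flattened
string no longer says what the marked-up text said.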

Some markup can just be omitted without affecting the fundamental meaning
of a document. Some markup is essential. The HTML specifications have
never tried to draw the line. It's not simply structural vs.
presentational. Structural markup can be non-essential (and in poor usage,
presentational markup can be essential).

> <quote><mark>"</mark>This is a quotation.<mark>"</mark></quote>

We might just as well have
<question>How are you<mark>?</mark></question>
and this might make sense for some applications.

But if a markup language contains a large number of elements and
attributes that _could_ be used, it can become very confusing.
There's already the WAI recommendation that says that quotations should be
indicated in markup and not using quotation marks. This is a symptom
of building recommendations on wishful thinking (and actually
reduces accessibility if followed).

Actually, I think it might be best to start from scratch. Deprecate
<blockquote> and <q>, and say that quotations should be indicated as
quotations using suitable wordings and punctuation characters.
Define <quote> as an element that can be used both at block level
and as inline markup. Authors _may_ use it to indicate quotations
for the purpose of automated analysis and processing. It should not
be expected to affect rendering by default in any way, but user
agents _may_ use it as additional information, e.g. when choosing
how some text is spoken. That is, a UA could decide that
<quote>"..."</quote> is an actual quotation whereas "..." might
be just an indication of "metaphorical" use of a word, for example.
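
A minimal Python sketch of how a user agent or analysis tool might apply
this distinction follows. It assumes the proposed <quote> element, which
does not exist in any HTML specification, and the names QuoteClassifier
and classify_quoted are hypothetical. For simplicity it only sees quoted
spans that lie within a single run of text, so a span interrupted by
inline markup such as <em> is missed.

```python
import re
from html.parser import HTMLParser

QUOTED = re.compile(r'"([^"]+)"')


class QuoteClassifier(HTMLParser):
    """Label each "..." span by whether it sits inside a <quote> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <quote> elements
        self.found = []   # list of (kind, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "quote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "quote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        kind = "quotation" if self.depth > 0 else "other"
        for match in QUOTED.finditer(data):
            self.found.append((kind, match.group(1)))


def classify_quoted(html):
    """Return [(kind, text), ...] where kind is "quotation" or "other"."""
    parser = QuoteClassifier()
    parser.feed(html)
    parser.close()
    return parser.found


page = '<p>She said <quote>"I agree"</quote>, but the "fix" failed.</p>'
print(classify_quoted(page))
```

A speech-oriented UA could then use the "quotation" label when choosing
how to speak a span, and leave spans labelled "other" alone.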

(The whole block vs. inline distinction is a mess, and should not be
carried over to any new markup elements.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Friday, 13 February 2004 02:07:21 UTC