Re: Correct usage of the q element from Jukka K. Korpela on 2004-02-15 (www-html@w3.org from February 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sun, 15 Feb 2004 18:58:15 +0200 (EET)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0402151833460.8544@korppi.cs.tut.fi>
On Fri, 13 Feb 2004, Christoph Päper wrote:

> *Jukka K. Korpela*:
> > On Thu, 12 Feb 2004, Ernest Cline wrote:
> >
> >> The problem is tho, support for transclusion is extremely limited at
> >> present.
> >
> > Yes and no - there's the SGML way that always was formally part
> > of HTML but was never supported,
>
> Are you speaking about entities?

Yes. The point is that if browser vendors had wanted to implement it, they
could have done so and, for once, claimed conformance to the specifications.
Remembering that useful features even simpler than that have remained
unimplemented, I'm sceptical about any new features that are essentially
more complex. In fact, I think the HTML specifications should be augmented
with what so many people want: a simple include. It can be handled at a
different level (preprocessing or server-side processing), but if added to
HTML in a simple way, it would be useful in many cases and harmless in
others:
<include src="...">
alternate content for user agents that do not support include,
such as a link
</include>
(defined as simple, "seamless" inclusion).
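
A simple include of this kind is easy to emulate at the preprocessing
level. What follows is a minimal Python sketch, not part of any
specification: the <include src="..."> syntax is the proposal above, while
the function name expand_includes, the regular-expression matching (no
nesting), and resolving src relative to a base directory are assumptions
made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical preprocessor for the proposed <include src="..."> element.
# Matches <include src="...">alternate content</include>; nesting is not handled.
INCLUDE_RE = re.compile(r'<include\s+src="([^"]*)"\s*>(.*?)</include>', re.DOTALL)


def expand_includes(html: str, base_dir: Path) -> str:
    """Replace each include element with the referenced file's contents
    ("seamless" inclusion); keep the alternate content if the file
    cannot be read."""

    def substitute(match: re.Match) -> str:
        src, fallback = match.group(1), match.group(2)
        try:
            return (base_dir / src).read_text(encoding="utf-8")
        except OSError:
            # Missing or unreadable file: behave like a non-supporting user agent.
            return fallback

    return INCLUDE_RE.sub(substitute, html)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "footer.html").write_text("<p>Footer</p>", encoding="utf-8")
        page = ('<include src="footer.html"><a href="footer.html">Footer</a></include>'
                '<include src="missing.html"><a href="missing.html">Missing</a></include>')
        print(expand_includes(page, base))
```

A real preprocessor would rather use an HTML parser and refuse src values
that point outside the site; the regular expression just keeps the sketch
short.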

> I wish they could be defined language dependent, thus
>
>   <q lang="en">&sq;Life's a bitch. And then you die.&eq;</q>
>
> would be rendered with high-66 and high-99, whereas in
>
>   <q lang="fr">&sq;L'État, c'est moi.&eq;</q>
>
> the entity references are computed to « + thin space and thin space + ».

I think that is very descriptive of the problems. Very knowledgeable
people seem to think that quotation style should depend on the language of
the quoted text, not on the language of the surrounding content.

Moreover, the browser would need to support over 7,000 languages, or it
would discriminate against some (most) languages by supporting just some
of them. The first alternative is hardly realistic. The quotation styles
have not even been _described_ adequately. Even version 3 of the Unicode
Standard presented wrong examples: it showed a French quotation without
those thin spaces. So can you expect the programmers of a normal browser
to create _correct_ support for the languages of the world?

Language support is nice to have when we get it, provided we get it
reasonably correct. But basic rendering, such as the presence of quotation
marks, should not depend on such support.
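
To illustrate that principle, here is a hypothetical Python sketch of
choosing marks for <q>. The table is deliberately tiny, the marks are keyed
on the language of the surrounding text rather than that of the quotation,
and a language missing from the table still gets quotation marks (the ASCII
surrogate) instead of none. The names QUOTE_MARKS and render_q, and ASCII
as the fallback, are assumptions; the French entry uses thin spaces as
described in the quoted message.

```python
THIN_SPACE = "\u2009"

# Hypothetical, deliberately incomplete table of (opening, closing) marks.
QUOTE_MARKS = {
    "en": ("\u201c", "\u201d"),                            # high-66 and high-99
    "fr": ("\u00ab" + THIN_SPACE, THIN_SPACE + "\u00bb"),  # guillemets with thin spaces
}

# Basic rendering must not depend on language support:
# unknown languages still get (ASCII) quotation marks.
FALLBACK = ('"', '"')


def primary_subtag(lang: str) -> str:
    """Reduce a language tag such as 'fr-CA' to its primary subtag 'fr'."""
    return lang.split("-")[0].lower()


def render_q(text: str, context_lang: str) -> str:
    """Wrap text in quotation marks chosen by the language of the
    surrounding content (context_lang), not of the quotation itself."""
    open_mark, close_mark = QUOTE_MARKS.get(primary_subtag(context_lang), FALLBACK)
    return f"{open_mark}{text}{close_mark}"


if __name__ == "__main__":
    print(render_q("L'État, c'est moi.", "en"))  # French quotation in English text
    print(render_q("Life's a bitch.", "fr"))
    print(render_q("moi", "fi"))                 # not in the table: ASCII fallback
```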

> > If you write any software that tries to recognize quotations from
> > Web pages, it would be just a theoretical exercise to play with
> > <q> or <blockquote>, and the latter would give you wrong results
> > far more often than not. Recognizing "..." would be much more relevant.
>
> That's not as simple as you make it sound here, though, realizing the very
> different pairings of quotation marks throughout the Latin alphabet world
> (e.g. »...« vs. «...» vs. »...»).

I'm pretty sure that ASCII quotation marks "..." dominate over all other
quotation marks, despite not being correct punctuation in any language, as
far as I know. They are simply the de facto surrogate. Even guillemets are
not used much, although they are technically almost as safe as ASCII
quotation marks. Well, there is the IE misbehavior of sometimes breaking a
line between a right-pointing guillemet and an immediately (i.e., with no
space) following word. Until leading browsers get such simple
character-level things right, I wouldn't expect them to correctly implement
anything i18n-related that is essentially more complex.

My point was that at present, by recognizing "..." as a quotation, you
guess right far more often than by looking at <blockquote> and <q>.
So whatever various programs _might_ do, it's not realistic to think that
they will take any markup for quotations any more seriously than authors
have done. In special applications, such as site-specific indexing in a
well-managed authoring environment, things can be very different, but then
again, the management can tell authors to write the quotations in a
particular style at character level.
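
As a rough illustration of how simple the character-level heuristic is,
here is a Python sketch that extracts text between ASCII quotation marks.
The function name find_ascii_quotations is hypothetical, and the approach
is only a guess at what such software might do: it also catches scare
quotes, titles, and inch marks, and it ignores other pairings such as
»...« or «...».

```python
import re
from typing import List

# A run of text between two ASCII quotation marks on a single line.
ASCII_QUOTE_RE = re.compile(r'"([^"\n]+)"')


def find_ascii_quotations(text: str) -> List[str]:
    """Return the contents of every "..." span, in document order."""
    return ASCII_QUOTE_RE.findall(text)


if __name__ == "__main__":
    print(find_ascii_quotations('He said "hello" and then "goodbye".'))
```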

> > (The whole block vs. inline distinction is a mess, and should not be
> > carried over to any new markup elements.)
>
> <del>element</del><ins>language</ins>

Maybe so, but I was thinking about XHTML 2.0, which I would classify as a
dialect of HTML (as currently sketched, it's in practice closer to HTML 4
than HTML 4 was to HTML 3.2, though some of the differences imply that
XHTML 2.0 documents would not display correctly on current browsers).
Introducing <quote> as a text-level counterpart of <blockquote> means
carrying the distinction to new elements. So would <blockcode>, which
sounds like something we need, but it would be simpler to allow <code>
to contain block elements.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Sunday, 15 February 2004 11:58:18 UTC