Re: [XHTML2] CITELANG, TITLELANG attributes from Jukka K. Korpela on 2004-07-28 (www-html@w3.org from July 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 28 Jul 2004 16:47:38 +0300 (EEST)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0407281610490.7916@korppi.cs.tut.fi>

On Wed, 28 Jul 2004, Ian Hickson wrote:

> HTML pages are dynamic.

I think you mean "HTML documents can be modified by client-side scripts".
The word "dynamic" is just a buzzword, though sometimes useful.

> Script can dynamically modify elements,
> attributes, and so forth, all on the fly, even while the "title" is being
> displayed (and then authors expect the UI to be updated on the fly too).

And this implies some "cost", as compared with just displaying or
analyzing a "static" document. The ease of implementing modifications
should not be a decisive factor in markup language design, and it largely
depends on the DOM and the scripting language more than on the markup
language.

> It's pretty simply really. As a general rule, things you want to have
> render inline should be in elements, and things you want to have render in
> UI should be in attributes.

That sounds like an awfully browser-technocratic view on the matter.
Remember that there might be no UI proper. For example, a document might
just get printed on paper, or scanned by an indexing robot. The markup
language specification should define the meanings of elements and
attributes, not the way they are processed.

For search engine indexing, for example, it would be simplest if
attributes were used to describe properties of elements only, typically
using coded values, not to include textual content of any kind.
An indexing robot could thus ignore all attributes and process element
content only, unless it decides to use element properties like xml:lang.
And this is probably how robots currently work in a typical case - but
obviously they _should_ index things like advisory titles, alternate texts
for images, and summaries for tables (perhaps giving them more weight than
copy text!).
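
To make the two indexing strategies concrete, here is a minimal sketch in
Python using the standard html.parser module. It is only an illustration,
not a description of how any actual robot works; the names IndexingParser,
index_terms, TEXT_ATTRIBUTES and SAMPLE are made up for this example, and
the choice of title, alt and summary as the "textual" attributes is an
assumption based on the cases mentioned above.

```python
from html.parser import HTMLParser

# Assumption: these are the attributes that carry human-readable text
# rather than coded values (advisory titles, alternate texts, summaries).
TEXT_ATTRIBUTES = {"title", "alt", "summary"}


class IndexingParser(HTMLParser):
    """Collects index terms from element content, optionally from text attributes."""

    def __init__(self, include_text_attributes=False):
        super().__init__()
        self.include_text_attributes = include_text_attributes
        self.terms = []

    def handle_starttag(self, tag, attrs):
        # The simple strategy ignores every attribute.
        if not self.include_text_attributes:
            return
        for name, value in attrs:
            if name in TEXT_ATTRIBUTES and value:
                self.terms.extend(value.split())

    def handle_data(self, data):
        # Element content is always indexed.
        self.terms.extend(data.split())


def index_terms(markup, include_text_attributes=False):
    """Return the list of terms found in the markup, in document order."""
    parser = IndexingParser(include_text_attributes)
    parser.feed(markup)
    parser.close()
    return parser.terms


# Hypothetical sample document fragment.
SAMPLE = (
    '<p>Sales <img src="chart.png" alt="bar chart"> for '
    '<span title="fiscal year">FY</span> 2004</p>'
)

print(index_terms(SAMPLE))
print(index_terms(SAMPLE, include_text_attributes=True))
```

With the default setting only the element content is indexed; with
include_text_attributes=True the words from alt and title are picked up as
well. Weighting them differently from copy text is left out of the sketch.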

> > Sorry I fail to see the point here. Surely XHTML specifications need to
> > define the semantics of valid constructs only.
>
> That's the mentality that got us into the Tag Soup mess -- by not defining
> what should happen when the author makes a mistake, you end up forcing
> every UA to copy the market leader's error handling.

I think we have exactly opposite views on this. The Tag Soup mess is based on
treating tags as commands, typically treating them as visual formatting
commands more than anything else. It might have been a good idea to
document such processing years ago, but it's too late to document Tag Soup
HTML now. Still less do we need to document Dynamic Element Soup. If you
modify an HTML document as a tree structure and produce something that
does not comply with HTML syntax, you should be on your own; anything that
says what _should_ happen then will just encourage authors to play such
games.

> > But as I learned in this thread (thanks Anne!), the current draft has
> > made <summary> an element, which sounds logical. Are you saying that
> > this should be taken back?
>
> No, because in the case of <summary> I would expect the content to be
> shown inline, instead of the rest of the table.

Do you mean it should be treated as a <caption> of second grade?

The way I think <summary> should be defined is this: conforming user
agents are required to present the content of the element in a manner that
associates it with the table _or_ to indicate that a summary is available
and make it optionally accessible (e.g., by displaying an icon that will
show the summary on mouseover or when clicked); and speech browsers would
normally be expected to take the first option. It should be optional but
always accessible information. That way, authors could really make use of
it and rely on it.

> Much like "alt" should never have been an attribute.

Indeed. But it was easier to implement as an attribute. :-)

> Given that HTML4 says the attribute is "for user agents rendering to
> non-visual media", it's unclear what you expect desktop UAs to actually
> _do_ with it.

Fair point. The definition is unfortunately restrictive - and this is one
reason why the summary attribute has been used so little or with strange
dummy content like summary="" or summary="layout table" or
with pathetic content like summary="statistics". Authors will either not
use an attribute or will use it in strange ways until they have some
experience of how the attribute might actually _work_ in browsers, i.e.
how it might affect the browsing experience.

Pragmatically we might say that the summary attribute or element should be
primarily written for those users who do not see the table (or do not see
it as a whole). But it can be useful to explain the structure of a table
even to those users who do see the table - not all people can intuitively
see the structure as the author expects.

> Unicode language tags -- like Unicode BFCs -- will probably be quite
> unpopular with experts, especially in a markup context.

And completely unknown to others. Exactly my point; I tried to make a
reductio ad absurdum: if the W3C requires markup of all language changes
(for compliance with WAI even at the most elementary level, A) _and_
defines a markup language that does not permit such indications, then it
is saying that authors should use Unicode language tags, which is
indescribably absurd. So either WAI rules should be relaxed or the markup
language redesigned with regard to text data in attributes.
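
The asymmetry between element content and attribute values can be shown
with a small sketch in Python, using the standard xml.etree.ElementTree
module. The helper names text_runs and attribute_runs and the sample
fragment are invented for this illustration. The sketch assumes the usual
xml:lang inheritance rule: a language declared on an element applies to
its content and to its attribute values, unless overridden on a
descendant element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_runs(element, inherited_lang=None):
    """Yield (language, text) pairs for element content in document order."""
    lang = element.get(XML_LANG, inherited_lang)
    if element.text and element.text.strip():
        yield (lang, element.text.strip())
    for child in element:
        yield from text_runs(child, lang)
        # Text following a child belongs to the parent element,
        # so it takes the parent's language.
        if child.tail and child.tail.strip():
            yield (lang, child.tail.strip())


def attribute_runs(element, name, inherited_lang=None):
    """Yield (language, value) pairs for the named attribute on each element."""
    lang = element.get(XML_LANG, inherited_lang)
    if name in element.attrib:
        # An attribute value cannot contain markup, so the whole value
        # gets a single language: that of its element.
        yield (lang, element.get(name))
    for child in element:
        yield from attribute_runs(child, name, lang)


# Hypothetical fragment: the same sentence as content and as a title.
SAMPLE = (
    '<p xml:lang="en" title="The motto is festina lente">'
    'The motto is <span xml:lang="la">festina lente</span>, he said.</p>'
)
root = ET.fromstring(SAMPLE)

print(list(text_runs(root)))
print(list(attribute_runs(root, "title")))
```

In the element content, the Latin phrase is a separate run labelled "la".
In the title attribute, the whole value is a single run labelled "en",
because there is nowhere to put the language change.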

> How would you mark up mixed languages in text/plain documents?

I would not. If I need to indicate language changes in text, I will use
markup (or invent some markup of my own, using textual explanations
instead of tags).

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 28 July 2004 09:47:53 UTC