Re: [XHTML2] CITELANG, TITLELANG attributes from Ian Hickson on 2004-07-28 (www-html@w3.org from July 2004)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 28 Jul 2004 09:20:52 +0000 (UTC)
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: www-html@w3.org
Message-ID: <Pine.LNX.4.58.0407280842190.2401@dhalsim.dreamhost.com>

On Wed, 28 Jul 2004, Jukka K. Korpela wrote:
> On Tue, 27 Jul 2004, Ian Hickson wrote:
>> On Wed, 28 Jul 2004, Jukka K. Korpela wrote:
>>>
>>> If the TITLE attributes were replaced by TITLE elements, for example,
>>> making each TITLE element specify, by definition, an informative title
>>> for its parent element, the problem would vanish in a puff of logic: the
>>> TITLE element could and should be allowed to have normal inline content,
>>> including elements with their own LANG (oops, sorry, xml:lang)
>>> attributes.
>>
>> IMHO the resulting increase in complexity in implementations, error
>> handling, and DOM access, far outweighs the extremely rare and highly
>> theoretical benefit of being able to mark different parts of the text as
>> being in different languages.
>
> The whole issue of language markup is currently theoretical only, for
> almost all purposes - user agents that make use of it tend to make
> _wrong_ use of it (say, arbitrarily changing fonts, instead of doing
> something really language-dependent).

There are _some_ uses for more general language information -- Google's
filtering of content based on language, for instance. But I agree it is
largely not used.


> What complexity are you referring to? What complexity would follow from
> the simple idea of nesting an element inside another instead of using an
> attribute?

When the title is an attribute, getting the title string to pass to a
function which is then going to display the title consists of:

   1. Get the value of the attribute.

If the title is the textual content of a child element, you have to:

   1. Walk the child list looking for a matching element.
   2. Define what happens if there is more than one matching element.
   3. Walk all the descendants of that element, finding all the text
      nodes.
   4. Define what should happen with elements such as <select> that might
      have found themselves children of the element, and what to do with
      elements representing other titles that are nested inside this one.
   5. Expand any bidi formatting (which, in CSS UAs, requires performing
      the CSS cascade, inheritance, and computation steps) so that the
      resulting string has the appropriate bidi formatting characters.
   6. Concatenate the text nodes from step 3 interspersed with the bidi
      formatting characters from step 5.
   7. Define how the element should be hidden from normal rendering.
   8. Define whether the contents of the element should be selected
      when the contents of the document are all selected.
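
As a rough illustration of the difference, here is a minimal sketch in
Python using the standard xml.dom.minidom module. The function names, the
"first match wins" rule and the "skip nested titles" rule are assumptions
made up for this example, not anything a spec defines. It only attempts
steps 1 to 4 and step 6; the bidi, rendering and selection steps (5, 7
and 8) are not attempted at all.

```python
from xml.dom import minidom


def title_from_attribute(element):
    # Attribute case: one lookup. minidom returns "" if the attribute is absent.
    return element.getAttribute("title")


def title_from_child(element):
    # Step 1: walk the child list looking for matching elements.
    matches = [
        node for node in element.childNodes
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "title"
    ]
    if not matches:
        return ""

    # Step 2 (assumed policy): if there are several matches, the first one wins.
    first = matches[0]

    # Steps 3 and 4: collect descendant text nodes in document order.
    # Assumed policy: nested <title> elements are skipped entirely.
    parts = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                parts.append(child.data)
            elif child.nodeType == child.ELEMENT_NODE and child.tagName != "title":
                walk(child)

    walk(first)

    # Step 6, minus the bidi formatting characters from step 5.
    return "".join(parts)


doc = minidom.parseString(
    '<a title="Attr">'
    '<title>Le <em xml:lang="de">Titel</em><title>inner</title></title>'
    '<title>second</title>'
    'link</a>'
)
link = doc.documentElement
print(title_from_attribute(link))  # prints "Attr"
print(title_from_child(link))      # prints "Le Titel"
```

Even with every hard question answered by fiat, the element version is
several times longer, and each of those fiat answers is something that
would have to be specified and tested.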

To summarise, elements are _hard_.

Note that simply saying "it must be the first element" or "you must not
nest these elements" and so forth doesn't get you out of any of this,
since it is trivial to mutate the DOM to get it into these states. The
behaviour has to be well-defined in all these cases.
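
To make that concrete, here is a small sketch, again in Python with
xml.dom.minidom, that starts from a document obeying both rules and then
mutates it into a state that breaks them. The element names and the helper
name are hypothetical, chosen only for illustration.

```python
from xml.dom import minidom


def mutate_into_awkward_state():
    # Start from a document that obeys "one title, first child, not nested".
    doc = minidom.parseString("<a><title>ok</title>link</a>")
    link = doc.documentElement
    first_title = link.firstChild

    # Mutation 1: a second title child is appended after parsing.
    second = doc.createElement("title")
    second.appendChild(doc.createTextNode("another"))
    link.appendChild(second)

    # Mutation 2: a title is nested inside the first title.
    nested = doc.createElement("title")
    nested.appendChild(doc.createTextNode("nested"))
    first_title.appendChild(nested)

    return link


link = mutate_into_awkward_state()
title_children = [
    node for node in link.childNodes
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "title"
]
print(len(title_children))  # prints 2
```

Nothing in the DOM calls objects to either mutation, so whatever consumes
the tree still has to decide what the title is afterwards.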


> To take an analogous case, we currently have the CAPTION element which
> may be used (only) inside a TABLE element and the SUMMARY attribute that
> may be used for a TABLE element.

Great example. Implementing "summary" in a meaningful way is significantly
easier than implementing "caption". By orders of magnitude.


> I don't see the possibility as extremely rare. Consider a link - a
> typical element to which we might wish to assign a TITLE. If the
> document where the link appears is in French and the linked document is
> in German, for example, it would be very natural to make the "advisory
> title" contain the name of the linked document in both French and in
> German, in many cases.

That is a very rare case. Add to this the likelihood of the author
actually bothering with _any_ language markup at all, and you have an
extremely rare case.


> If you think about the potential benefits of language markup (which are,
> after all, the only reason for considering language markup at all), then
> surely they apply to "advisory titles" as well.

But do these potential benefits outweigh the costs? I'm not convinced.


> For example, we would like to have a speech browser read the title using
> adequate algorithms for speech generation for each language. And
> "advisory titles" are typical examples of _short_ texts where heuristics
> so often fail - if you just try to guess whether the language changes
> within such a text, from the characteristics of the short string itself,
> you can't be very successful.

Like I said. Highly theoretical. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 28 July 2004 05:21:10 UTC