- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Wed, 28 Jul 2004 14:36:42 +0300 (EEST)
- To: www-html@w3.org
On Wed, 28 Jul 2004, Ian Hickson wrote:

> There are _some_ uses for more general language information -- Google's
> filtering of content based on language, for instance.

As far as I know, Google uses its own guesswork ("heuristics") when deciding the language of a document, ignoring both HTTP headers and language markup in (X)HTML.

I still think that it is worthwhile to make language information available, hoping that search engines and other parties will start making use of it. And maybe it's a good thing that many authors mistakenly believe that their lang or xml:lang attributes are of some use - since if authors use them, search engines have more reason to make use of them. The role of XHTML specifications in this process is to make language information mechanisms both well-defined and easy for authors to use.

> When the title is an attribute, getting the title string to pass to a
> function which is then going to display the title consists of:
>
> 1. Get the value of the attribute.
>
> If the title is the textual content of a child element, you have to:
[ do something more complex ]

I'm not sure I understand the complexity. Can't a parser simply recognize each <title> element as it encounters it and associate it with the internal data structure corresponding to the parent element?

> To summarise, elements are _hard_.

I still don't see the problem, I'm afraid, but if elements are _hard_, then the problem lies in the very idea of markup, which revolves around elements. Attributes are just properties of elements. If you turn something that is in essence a container for textual data (which might need some inline markup), and hence should be an element, into an attribute containing plain text for the sake of implementation efficiency, then I think it's time to consider where this would all end.
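To make the parser question above concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a hypothetical markup in which an <a> element may carry its advisory title either as a title attribute or as a <title> child element; the element names follow this thread rather than any published DTD, and the helper names are made up for illustration. Reading the attribute is one lookup; reading the child element is a short tree walk, and in return the walk keeps xml:lang changes inside the title, which a plain-text attribute has no way to record.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def title_from_attribute(elem):
    """Attribute case: a single lookup that yields plain text (or None)."""
    return elem.get("title")


def title_runs_from_child(elem, inherited_lang=None):
    """Element case (hypothetical markup): return the first <title> child
    of elem as a list of (text, language) runs, or None if there is none.

    ElementTree has no parent pointers, so the caller passes the language
    inherited from elem's ancestors."""
    lang = elem.get(XML_LANG, inherited_lang)
    title = elem.find("title")  # first direct <title> child only
    if title is None:
        return None
    runs = []
    _collect_runs(title, title.get(XML_LANG, lang), runs)
    return runs


def _collect_runs(node, lang, runs):
    """Walk node's content in document order, tagging each text run with
    the innermost xml:lang in effect."""
    if node.text:
        runs.append((node.text, lang))
    for child in node:
        _collect_runs(child, child.get(XML_LANG, lang), runs)
        if child.tail:  # text after the child is in node's language
            runs.append((child.tail, lang))


# Usage with hypothetical markup: a French paragraph linking to a German
# document, offering the advisory title both ways.
doc = ET.fromstring(
    '<p xml:lang="fr">Voir '
    '<a href="guide.de.html" title="Guide / Leitfaden">'
    '<title>Guide / <span xml:lang="de">Leitfaden</span></title>'
    'le guide</a></p>'
)
a = doc.find("a")
print(title_from_attribute(a))
# prints: Guide / Leitfaden
print(title_runs_from_child(a, doc.get(XML_LANG)))
# prints: [('Guide / ', 'fr'), ('Leitfaden', 'de')]
```

The sketch deliberately takes only the first direct <title> child and does not try to settle the DOM-mutation cases raised in the reply (several or nested <title> elements); a speech browser could hand each (text, language) run to the matching synthesis rules, whereas the attribute version gives it only an untagged string.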
> Note that simply saying "it must be the first element" or "you must not
> nest these elements" and so forth doesn't get you out of any of this,
> since it is trivial to mutate the DOM to get it into these states. The
> behaviour has to be well-defined in all these cases.

Sorry, I fail to see the point here. Surely XHTML specifications need to define the semantics of valid constructs only.

> > To take an analogous case, we currently have the CAPTION element which
> > may be used (only) inside a TABLE element and the SUMMARY attribute that
> > may be used for a TABLE element.
>
> Great example. Implementing "summary" in a meaningful way is significantly
> easier than implementing "caption". By orders of magnitude.

But as I learned in this thread (thanks Anne!), the current draft has made <summary> an element, which sounds logical. Are you saying that this should be taken back?

(Since all browsers implement the "caption" element, though poorly, whereas "summary" is virtually unimplemented, there's a mismatch between actual browser behavior and the difference in implementation difficulty that you refer to.)

> > I don't see the possibility as extremely rare. Consider a link - a
> > typical element to which we might wish to assign a TITLE. If the
> > document where the link appears is in French and the linked document is
> > in German, for example, it would be very natural to make the "advisory
> > title" contain the name of the linked document in both French and in
> > German, in many cases.
>
> That is an very rare case.

Is it? To the extent that documents contain a mixture of languages in some sense, this seems to be a very typical example. And if there is no mixture of languages, the hreflang and citelang issue becomes a non-issue.

Well, I guess we have partly drifted in other directions as well.

> > For example, we would like to have a speech browser read the title using
> > adequate algorithms for speech generation for each language. And
> > "advisory titles" are typical examples of _short_ texts where heuristics
> > so often fail - if you just try to guess whether the language changes
> > within such a text, from the characteristics of the short string itself,
> > you can't be very successful.
>
> Like I said. Highly theoretical. :-)

If we regard such issues as negligible, then I think many parts of the WAI recommendations should be rewritten. I'm especially thinking about the _priority 1_ requirement that all changes in language in a document be indicated in markup. It is highly illogical to impose such a requirement while defining the markup language so that not all changes _can_ be indicated. Or should we read the current (and planned) situation as requiring authors to use Unicode language tags inside attribute values whenever such an attribute contains even a single foreign word?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 28 July 2004 07:37:47 UTC