RE: meta content-language

> The spec could make multiple language tags in Content-Language non-conforming and could make processing pick the first language tag.

In addition to being incompatible with existing Web content, I really don't see why we need to change the Content-Language meta tag from indicating the target audience to indicating the processing language. Since browsers don't use this information today when processing the text, we'd be better off formalizing existing practice than changing the semantics.
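To make the two readings concrete, here is a minimal Python sketch contrasting them for a value such as content="en, ja": keeping every tag as the intended audience (existing practice), versus the quoted proposal of keeping only the first tag as the processing language. The function names are hypothetical and come from no spec or browser. The sketch assumes the HTTP Content-Language syntax (a comma-separated list of tags) and does not validate the tags themselves.

```python
from typing import List, Optional


def parse_content_language(value: str) -> List[str]:
    """Split a Content-Language value such as "en, ja" into its language tags.

    Keeps every tag, treating the list as the document's intended audience.
    """
    # The value is a comma-separated list; whitespace around each tag is ignored
    # and empty entries are dropped.
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def first_tag_only(value: str) -> Optional[str]:
    """The quoted proposal: reduce the value to a single processing language."""
    tags = parse_content_language(value)
    return tags[0] if tags else None


# Value as it might appear in <meta http-equiv="Content-Language" content="en, ja">
audience = parse_content_language("en, ja")
processing = first_tag_only("en, ja")
```

Under the second reading the "ja" audience information is simply discarded, which is the incompatibility with existing content.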

> > 2. the meta approach is really not used by anything according to the tests I did
> Given that people do put and have put language declarations there, is it good to keep ignoring that data?

We don't have to ignore it. We can use that data for its most useful purpose, which is as metadata about the author's intentions (much like the "keywords" <meta> was supposed to work).

> Of course, if the data is *wrong* significantly more often than lang='' (assuming that the correctness level of lang='' establishes an implicit data quality baseline), it would be good to ignore it. My guess is that HTTP-level Content-Language is more likely to be wrong (it sure is less obvious to diagnose) than any HTML-level declaration.

You could insert the never-ending saga of <meta> charset vs. HTTP charset here for comparison purposes :-). 

> > 3. the question of inheritance is unclear when using the meta statement for declaring the text-processing language
> The spec now makes it clear.

... and Richard and I are trying to get you to clarify it differently.

I would add: having an overarching "default text-processing language" above the <html> element would probably create additional problems for implementations of the CSS :lang pseudo-class and other features that select by language within a document, because something outside the parse tree would then affect the value of the (implied) xml:lang/HTML lang attribute.

> > If the meta statement continues to be allowed, I suggest that it is used in the same way as a Content-Language declaration in the HTTP header, ie. as metadata about the document as a whole, but that such usage is kept separate from use for defining the language of a range of content. As far as I can tell, although Frontpage uses it and people on the Web recommend its use, it has no effect at all on content, and wouldn't be missed if it were dropped.
> What purpose does metadata serve if it isn't actionable?

There are many uses for finding out the author's intended audience. A document might, for example, be mostly in Japanese although it serves an English-speaking audience: it might consist of examples of Japanese writing with short descriptions in English. Other documents might be side-by-side (parallel) translations. The text-processing language in these cases will follow specific spans of text; the audience, however, might not be one of the two streams of text.

Another use would be language negotiation. The text-processing language isn't as interesting as the author's intended audience in this case. A server might apply BCP 47's Lookup or Filtering algorithms to a user's Accept-Language to select content. Having the author's intended audience(s) in a Content-Language <meta> tag would make that easier.
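A minimal Python sketch of that kind of negotiation, assuming the server already knows the author's audience tags for each variant of a page (for example, read from a Content-Language <meta>). The function names are hypothetical; lookup and basic_filter follow the Lookup and Basic Filtering schemes of RFC 4647 (part of BCP 47), while the Accept-Language parsing is deliberately simplified (no tag validation; a weight is only checked for being positive).

```python
from typing import List, Optional, Sequence


def parse_accept_language(header: str) -> List[str]:
    """Return the language ranges in an Accept-Language value, highest q first."""
    weighted = []
    for index, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        lang_range = parts[0]
        if not lang_range:
            continue
        q = 1.0  # quality defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat an unparseable weight as "not acceptable"
        if q > 0:  # q=0 means "not acceptable"
            # Sort key: descending q, then original order for ties.
            weighted.append((-q, index, lang_range))
    weighted.sort()
    return [lang_range for _, _, lang_range in weighted]


def lookup(ranges: Sequence[str], available: Sequence[str],
           default: Optional[str] = None) -> Optional[str]:
    """RFC 4647 Lookup: return the single best available tag, or default."""
    by_lower = {tag.lower(): tag for tag in available}  # matching is case-insensitive
    for lang_range in ranges:
        if lang_range == "*":
            continue  # Lookup ignores the wildcard range
        subtags = lang_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # progressively truncate the range
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()  # never leave a dangling singleton such as "x"
    return default


def basic_filter(ranges: Sequence[str], available: Sequence[str]) -> List[str]:
    """RFC 4647 Basic Filtering: return every available tag matched by a range."""
    matches: List[str] = []
    for lang_range in ranges:
        r = lang_range.lower()
        for tag in available:
            t = tag.lower()
            # A range matches a tag equal to it or beginning with it plus "-".
            if (r == "*" or t == r or t.startswith(r + "-")) and tag not in matches:
                matches.append(tag)
    return matches


# Hypothetical audience tags an author might have declared for three variants.
variants = ["en", "ja", "fr-CA"]
ranges = parse_accept_language("de-CH, fr;q=0.9, en;q=0.5")
best = lookup(ranges, variants, default="en")
candidates = basic_filter(ranges, variants)
```

Note how the two schemes differ on "fr": Lookup only truncates the user's range, so it never reaches "fr-CA" and falls through to "en", while Basic Filtering treats "fr" as a prefix and returns "fr-CA" as a candidate ahead of "en".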

Anyway, that's my €0,02.


Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Friday, 15 August 2008 15:27:36 UTC