Re: XMLLiterals and language

From: Martin Duerst <duerst@w3.org>
Date: Wed, 19 Jan 2005 14:58:01 +0900
Message-Id: <6.0.0.20.2.20050119143451.026eb988@localhost>
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, Reto Bachmann-Gmuer <reto@gmuer.ch>
Cc: www-rdf-interest@w3.org, www-international@w3.org

At 20:12 05/01/18, Jeremy Carroll wrote:

 >I suggest the following markup for that example:
 ><ex:Book>
 >     <dc:title rdf:parseType="Literal"><span xml:lang="de"><span
 >  xml:lang="la">Carpe diem</span></span></dc:title>
 ></ex:Book>
 >
 >
 >but I'm not wholly up on the conventions for using lang tags to indicate
 >one language quote inside another ... (cc-ing to www-international for a
 >further opinion).
 >I think if you want to know about the language of a piece of XHTML you
 >have to process it as XHTML, hmmmm, I suppose for an XHTML page there is
 >often metadata about the page such as the accept-language headers which
 >give some sort of overview.

I'm not really sure this flies. "Carpe diem" is either German or Latin.
You can't have it both ways. Strictly speaking, xml:lang applies to
element and attribute content, but the only content that xml:lang="de"
could apply to would be the two letters "la", and they are clearly
independent of language. The GEO Task Force (now the GEO WG) has
worked quite a bit on the distinction between "document processing
language" (maybe better "text processing language") and document
metadata; see http://www.w3.org/International/questions/qa-http-and-lang.
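
To make the scoping point concrete, here is a minimal sketch in Python
(standard library only) that walks a fragment and reports which xml:lang
value is in effect for each run of text. The function name
text_languages is my own invention for illustration, and the second
fragment is a made-up example; nothing here is a normative algorithm.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def text_languages(elem, inherited=None):
    """Yield (text, language) for each non-empty text run under elem.

    The nearest enclosing xml:lang wins; outer values are overridden.
    """
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield elem.text.strip(), lang
    for child in elem:
        yield from text_languages(child, lang)
        # Text following a child belongs to the parent, so it takes
        # the parent's language, not the child's.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), lang

nested = '<span xml:lang="de"><span xml:lang="la">Carpe diem</span></span>'
nested_runs = list(text_languages(ET.fromstring(nested)))

mixed = '<p xml:lang="de">Er sagte <q xml:lang="la">Carpe diem</q> und ging</p>'
mixed_runs = list(text_languages(ET.fromstring(mixed)))
```

In the nested case the only text run comes out tagged "la"; the outer
"de" has no text of its own left to apply to.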

The typical case where this distinction becomes relevant is a
mixed-language document, where each piece of the document can be in a
different language, and the primary language of the document as a whole
can be either one of these languages (if the other languages are just
inserts) or more than one language (e.g. for a document with two
languages in parallel).

It seems to me that what Reto is looking for is a way to define
a "primary language" for a small piece of data that itself is in
a different language. Because such divergent cases are very rare,
it seems they have been overlooked up to now.

To me, the right thing to do seems to be to define the "primary"
or "intended" language separately (e.g. with a separate property),
but to define that property so that it defaults to the text
processing language.
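
A rough sketch of that defaulting rule, in Python. The class and field
names (TaggedLiteral, text_lang, intended_lang) are hypothetical and
only illustrate the idea; they do not correspond to any existing RDF
property or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaggedLiteral:
    text: str
    # Language of the characters themselves (what xml:lang would say).
    text_lang: Optional[str] = None
    # Hypothetical separate property: the audience the literal is meant for.
    intended_lang: Optional[str] = None

    def effective_intended_lang(self) -> Optional[str]:
        """Return the intended language, defaulting to the text language."""
        if self.intended_lang is not None:
            return self.intended_lang
        return self.text_lang

# The divergent case: a Latin title on a book meant for German readers.
title = TaggedLiteral("Carpe diem", text_lang="la", intended_lang="de")

# The common case: no separate intended language is given.
plain = TaggedLiteral("Nutze den Tag", text_lang="de")
```

With this shape the common case needs no extra markup, and only the
rare divergent case has to state the second language.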

Please note that e.g. SMIL and SVG have a <switch> element (see e.g.
http://www.w3.org/TR/2003/REC-SVG11-20030114/struct.html#SwitchElement)
and use the attribute systemLanguage (not the best name, userLanguage
would be better) to indicate the 'intended' language.
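
As an illustration of how such a <switch> picks content, here is a
simplified Python sketch. It assumes the selection rule as I understand
it from SVG 1.1: the first direct child whose systemLanguage matches a
user language (exactly, or with the user language as a prefix followed
by "-") is chosen, and a child without the attribute always qualifies.
The helper names and the sample fragment are made up, and other test
attributes are ignored.

```python
import xml.etree.ElementTree as ET

def language_matches(attr_value, user_langs):
    """Simplified systemLanguage test against the user's languages."""
    offered = [t.strip().lower() for t in attr_value.split(",") if t.strip()]
    for user in (u.lower() for u in user_langs):
        for tag in offered:
            # Exact match, or the user language is a prefix of the offered tag.
            if tag == user or tag.startswith(user + "-"):
                return True
    return False

def pick_switch_child(switch, user_langs):
    """Return the first direct child of <switch> that qualifies, or None."""
    for child in switch:
        value = child.get("systemLanguage")
        if value is None or language_matches(value, user_langs):
            return child
    return None

switch = ET.fromstring(
    '<switch xmlns="http://www.w3.org/2000/svg">'
    '<text systemLanguage="de">Nutze den Tag</text>'
    '<text systemLanguage="la">Carpe diem</text>'
    '<text>Seize the day</text>'
    '</switch>'
)

latin_choice = pick_switch_child(switch, ["la"])
fallback_choice = pick_switch_child(switch, ["fr"])
```

Note that the attribute here describes who the content is intended
for, which need not be the language the text itself is written in.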

Regards,    Martin. 
Received on Wednesday, 19 January 2005 07:12:42 GMT