Re: XMLLiterals and language from Martin Duerst on 2005-01-21 (www-rdf-interest@w3.org from January 2005)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 21 Jan 2005 09:14:08 +0900
To: Reto Bachmann-Gmuer <reto@gmuer.ch>
Cc: Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-rdf-interest@w3.org, www-international@w3.org
Message-Id: <6.0.0.20.2.20050121085505.0791ae08@localhost>

At 01:14 05/01/20, Reto Bachmann-Gmuer wrote:
 >
 >Martin Duerst wrote:
 >
 >>It seems to me that what Reto is looking for is a way to define
 >>a "primary language" for a small piece of data that itself is in
 >>a different language. Because such divergent cases are very rare,
 >>it seems they have been overlooked up to now.
 >>
 >>
 >I don't think this cases are that rare, looking at German computer books
 >many titles consist only of English words, however they are the German titles

Yes, but then there is also the question of whether these are really
still English, or already German. This is a question that always
comes up with loanwords that are gradually integrated into a language.
English in particular has integrated a huge number of Latin words (and
also French ones, if one goes back in history), but most people just
use them as part of English.

The process of adopting a word from another language is of course
a gradual one, so that it's very difficult to define the line where
the word 'moves' from one language to another. For tagging purposes,
a certain variation just has to be accepted.

 >(the first is relevant for pronunciation, the latter for semantic processing).

For pronunciation, the fact that a word is used in German can be
as important as the fact that the word is of Latin origin. Same for
words from English, although there might be more variation between
different German speakers than for words of Latin origin.

This reminds me of cases where e.g. hyphenation and pronunciation
or some other processing aspects are based on different languages.
Again, xml:lang doesn't deal with that; the solution is to use
xml:lang to tag what you think the language 'is', or what it is
mostly, and to use other means (stylesheets,...) to indicate
processing for diverging cases.

Also, there are even German words where without any additional
information, pronunciation is pretty much going to fail. So
if you need to make sure you get the correct pronunciation,
you had better make sure you have a dedicated means of indicating it.

 >>To me, the right thing to do seems to be to define the "primary"
 >>or "intended" language separately (e.g. with a separate property),
 >>but to define that property so that it defaults to the text
 >>processing language.
 >>
 >Having a primary language for Literals would be fine, however I think the
 >text processing language (specified in the xml) should default to the
 >primary language (which imho should be defined by means of rdf) rather than
 >the other way round. This seems more coherent with plain-literals and
 >particularly it does not require RDF-Processors to understand and parse XML
 >in order to do things like filtering by language.

Well, I can only agree.
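
To make the defaulting rule concrete, here is a minimal sketch in Python.
It assumes a model in which an XML literal can carry a primary language at
the RDF level, which RDF as currently defined does not provide. The names
XmlLiteral, primary_lang, text_processing_lang and filter_by_primary_lang
are made up for illustration and do not come from any specification or
library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Expanded name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class XmlLiteral:
    """Hypothetical XML literal with an RDF-level primary language."""
    lexical: str                 # XML content of the literal
    primary_lang: Optional[str]  # language stated by means of RDF (assumed)


def text_processing_lang(lit: XmlLiteral) -> Optional[str]:
    """Return xml:lang if one element spans the whole literal, else the primary language."""
    # Wrap the content so that mixed content without a single root still parses.
    root = ET.fromstring("<wrapper>" + lit.lexical + "</wrapper>")
    children = list(root)
    has_outer_text = bool((root.text or "").strip())
    if len(children) == 1 and not has_outer_text:
        has_tail_text = bool((children[0].tail or "").strip())
        lang = children[0].get(XML_LANG)
        if lang and not has_tail_text:
            return lang
    # No xml:lang covering the whole literal: default to the primary language.
    return lit.primary_lang


def filter_by_primary_lang(literals: List[XmlLiteral], lang: str) -> List[XmlLiteral]:
    """Filter on the RDF-level language only; the XML is never parsed here."""
    return [lit for lit in literals if lit.primary_lang == lang]
```

With this shape, a German book title consisting of English words could be
given as XmlLiteral('<span xml:lang="en">Programming Perl</span>', "de"):
filtering for "de" finds it without touching the XML, while a processor
that cares about pronunciation can still look up the text processing
language.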

The way it was originally intended, there wasn't any conflict between
xml:lang and 'defined by means of rdf', as there still isn't such a conflict
for plain literals. The problem is that as defined currently, RDF ignores
xml:lang on XML Literals, and does not have a way in the model to add
language information to XML Literals in the same way as for plain literals.
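
As a rough sketch of that gap in Python: a plain literal is a lexical form
plus an optional language tag, while a typed literal (which is what an XML
Literal is) is a lexical form plus a datatype URI, with no slot for a
language. The class and function names below are illustrative only and are
not taken from any RDF library.

```python
from dataclasses import dataclass
from typing import Optional, Union

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class PlainLiteral:
    lexical: str
    lang: Optional[str] = None  # the language tag is part of the literal


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str               # datatype URI; there is no language slot


def model_language(lit: Union[PlainLiteral, TypedLiteral]) -> Optional[str]:
    """Language visible in the model, without looking inside the lexical form."""
    if isinstance(lit, PlainLiteral):
        return lit.lang
    # Any xml:lang inside an XML literal is just part of its lexical form.
    return None
```

Here model_language(PlainLiteral("Programmieren mit Perl", "de")) gives
"de", whereas a TypedLiteral with datatype RDF_XMLLITERAL gives None even
if its content starts with <span xml:lang="de">.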

Regards,     Martin.

Received on Friday, 21 January 2005 03:23:31 UTC