- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 21 Jan 2005 09:14:08 +0900
- To: Reto Bachmann-Gmuer <reto@gmuer.ch>
- Cc: Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-rdf-interest@w3.org, www-international@w3.org
At 01:14 05/01/20, Reto Bachmann-Gmuer wrote:
>
>Martin Duerst wrote:
>
>>It seems to me that what Reto is looking for is a way to define
>>a "primary language" for a small piece of data that itself is in
>>a different language. Because such divergent cases are very rare,
>>it seems they have been overlooked up to now.
>>
>>
>I don't think this cases are that rare, looking at German computer
>books many titles consist only of English words, however they are the
>German titles

Yes, but then there is also the question of whether these are really
still English, or already German. This is a question that always comes
up with loanwords that are gradually integrated into a language.
English in particular has integrated a huge number of Latin words (and
also French ones, if one goes back in history), but most people just
use them as part of English. The process of adopting a word from
another language is of course a gradual one, so it is very difficult
to define the line where the word 'moves' from one language to
another. For tagging purposes, a certain variation just has to be
accepted.

>(the first is relevant for pronunciation, the latter for semantic
>processing).

For pronunciation, the fact that a word is used in German can be as
important as the fact that the word is of Latin origin. The same goes
for words from English, although there might be more variation between
different German speakers than for words of Latin origin.

This reminds me of cases where e.g. hyphenation and pronunciation, or
some other processing aspects, are based on different languages.
Again, xml:lang doesn't deal with that; the solution is to use
xml:lang to tag what you think the language 'is', or what it mostly
is, and to use other means (stylesheets,...) to indicate processing
for diverging cases.

Also, there are even German words where, without any additional
information, pronunciation is pretty much going to fail.
So if you need to make sure you get the correct pronunciation, you had
better make sure you have a dedicated means of indicating it.

>>To me, the right thing to do seems to be to define the "primary"
>>or "intended" language separately (e.g. with a separate property),
>>but to define that property so that it defaults to the text
>>processing language.
>>
>Having a primary language for Literals would be fine, however I think
>the text processing language (specified in the xml) should default to
>the primary language (which imho should be defined by means of rdf)
>rather than the other way round. This seems more coherent with
>plain-literals and particularly it does not require RDF-Processors to
>understand and parse XML in order to do things like filtering by
>language.

Well, I can only agree. The way it was originally intended, there
wasn't any conflict between xml:lang and 'defined by means of rdf',
just as there still isn't such a conflict for plain literals. The
problem is that, as currently defined, RDF ignores xml:lang on XML
Literals, and the model has no way to add language information to XML
Literals in the same way as for plain literals.

Regards,    Martin.
Received on Friday, 21 January 2005 03:23:31 UTC