Re: XMLLiterals and language

From: Reto Bachmann-Gmuer <reto@gmuer.ch>
Date: Wed, 19 Jan 2005 23:04:46 +0100
Message-ID: <41EED97E.3070104@gmuer.ch>
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
CC: Martin Duerst <duerst@w3.org>, www-rdf-interest@w3.org, www-international@w3.org

Jeremy Carroll wrote:

>I am not at all convinced that this issue is irrelevant outside the
>semantic web domain. E.g. a text-to-speech system should pronounce
>English words quite differently when in an Italian mode, since Italian
>speakers typically use Italian pronunciation rules for English words
>being used in Italian sentences. As an English mother-tongue speaker
>with reasonable Italian, the sentences I find most difficult to
>understand are such mixed sentences.
>
><span xml:lang="it">
>Abbiamo fatto questo lavoro per il progetto
><span xml:lang="en">"Question How"</span>
></span>
>
>the words "question how" are pronounced quite differently from how they
>are in English (even when the mother-tongue Italian speaker is a fluent
>English speaker). (bitter experience here!)
>  
>
I did not mean to say that the distinction between 
context-language/specific language could not be meaningful outside the 
semantic web. But if I wrote a German book called "Semantic Web" I 
certainly wouldn't describe it with

<ex:Book rdf:about="#semBook">
<dc:title rdf:parseType="Literal">
<span xml:lang="de"><span xml:lang="en">Semantic Web</span></span>
</dc:title>
</ex:Book>

if this says that the title should be pronounced with a German accent. 
But I'd like the following to express that the German title of the book 
consists of English words (which is relevant for text-to-speech systems).

<ex:Book rdf:about="#semBook">
<dc:title rdf:parseType="Literal" xml:lang="de"><span 
xml:lang="en">Semantic Web</span></dc:title>
</ex:Book>
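
A minimal Python sketch of that reading, using only the standard library. The helper name language_runs and the cut-down title element (namespaces omitted) are assumptions made for illustration; it reads xml:lang straight from the XML, which is the behaviour being asked for here, not necessarily what an RDF parser does with parseType="Literal". The attribute on the title gives the language of the title as a whole, while each text run carries the nearest enclosing xml:lang, which is what a text-to-speech system would need.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(element, inherited=None):
    """Yield (language, text) pairs, resolving xml:lang by inheritance."""
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from language_runs(child, lang)
        # Text after a child belongs to the parent, so it uses the
        # parent's language, not the child's.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


# Hypothetical stand-in for the dc:title element above (namespaces omitted).
title = ET.fromstring(
    '<title xml:lang="de"><span xml:lang="en">Semantic Web</span></title>'
)

context_language = title.get(XML_LANG)  # language of the title as a whole
runs = list(language_runs(title))       # language of each run of text
```

Here context_language is the value a language filter would look at, and runs is what a speech synthesiser would iterate over.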

Supposing ex:Book is defined as the abstraction over the different 
translations of a book (i.e. "Das Kapital" and "Capital" are the same 
ex:Book), the following should express the title of the book in its 
different translations:

<ex:Book rdf:about="#semBook">
<dc:title rdf:parseType="Literal" xml:lang="de"><span 
xml:lang="en">Semantic Web</span></dc:title>
<dc:title rdf:parseType="Literal" xml:lang="en">Semantic Web</dc:title>
<dc:title rdf:parseType="Literal" xml:lang="fr">Web sémantique</dc:title>
</ex:Book>
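
As a rough sketch of the language filtering this would allow, the Python below picks titles by the xml:lang on the dc:title element itself, without looking inside the literal. The function name titles_by_language, the wrapping rdf:RDF element, and the ex: namespace URI are assumptions added so the snippet parses; the code reads the attribute directly from the XML and does not model how an RDF/XML parser treats xml:lang on parseType="Literal".

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

# The example above, wrapped with namespace declarations.
# The ex: namespace URI is a placeholder assumed for this sketch.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/terms#">
  <ex:Book rdf:about="#semBook">
    <dc:title rdf:parseType="Literal" xml:lang="de"><span
      xml:lang="en">Semantic Web</span></dc:title>
    <dc:title rdf:parseType="Literal" xml:lang="en">Semantic Web</dc:title>
    <dc:title rdf:parseType="Literal" xml:lang="fr">Web sémantique</dc:title>
  </ex:Book>
</rdf:RDF>
"""


def titles_by_language(rdf_xml):
    """Map each title's context language to its plain text content."""
    root = ET.fromstring(rdf_xml)
    return {
        title.get(XML_LANG): "".join(title.itertext()).strip()
        for title in root.iter(DC_TITLE)
    }


titles = titles_by_language(RDF_XML)
german_title = titles["de"]  # the German title, although its words are English
```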

I'm aware that this could be expressed on a graph level using a more 
complex vocabulary. However, given that properties defined in many 
vocabularies make perfect sense with both plain and XHTML literals, 
they should be usable in a similar way. While one could argue that an 
rdfs:description is not meant to be HTML with layout, at least marking a 
foreign-language quotation within the rdfs:description so that 
text-to-speech systems can read it correctly seems like something that 
should be promoted and implementable in an easy and standardized way.

reto

>Jeremy
>
>Reto Bachmann-Gmuer wrote:
>  
>
>>Martin Duerst wrote:
>>
>>    
>>
>>>It seems to me that what Reto is looking for is a way to define
>>>a "primary language" for a small piece of data that itself is in
>>>a different language. Because such divergent cases are very rare,
>>>it seems they have been overlooked up to now.
>>> 
>>>
>>>      
>>>
>>I don't think these cases are that rare. Looking at German computer books, 
>>many titles consist only of English words; however, they are the German 
>>titles (the former is relevant for pronunciation, the latter for semantic 
>>processing).
>>
>>    
>>
>>>To me, the right thing to do seems to be to define the "primary"
>>>or "intended" language separately (e.g. with a separate property),
>>>but to define that property so that it defaults to the text
>>>processing language.
>>>
>>>      
>>>
>>Having a primary language for literals would be fine; however, I think 
>>the text processing language (specified in the XML) should default to 
>>the primary language (which imho should be defined by means of RDF) 
>>rather than the other way round. This seems more coherent with 
>>plain literals, and in particular it does not require RDF processors to 
>>understand and parse XML in order to do things like filtering by language.
>>
>>    
>>
>>>I'm glad to report that I just found the 'payload' module in
>>>RSS 1.1 (http://inamidst.com/rss1.1/payload) that uses XML
>>>Literals rather than encoding. Great!
>>>      
>>>
>>That's cool, and it would be cooler with the possibility to specify a 
>>language for the whole payload (even when some of the rare cases apply).
>>
>>reto
>>
Received on Wednesday, 19 January 2005 21:56:53 GMT
