
RE: Language X within scope of language Y

From: Peter Constable <petercon@microsoft.com>
Date: Wed, 19 Jan 2005 09:24:18 -0800
Message-ID: <F8ACB1B494D9734783AAB114D0CE68FE04CD96D2@RED-MSG-52.redmond.corp.microsoft.com>
To: "Misha Wolf" <Misha.Wolf@reuters.com>, <www-rdf-interest@w3.org>, <www-international@w3.org>
Cc: <ietf-languages@iana.org>

I would agree that "en-IT" expresses "English as written/spoken in
Italy", but I think it is going too far to say that <foo
xml:lang="en-IT"> is what should be used for English content that is
expected to be reproduced by a text-to-speech processor with an Italian
accent.

I would say that the semantics for xml:lang="ll-CC" should be based on
normal usage, which would be "language ll as written/spoken by a
native-speaking community in country CC". In the case of "en-IT", that
would mean "English as written/spoken by a native-speaker English
community presumed to exist in Italy".
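
To make the "ll-CC" reading concrete, here is a minimal sketch in Python
that splits such a tag into its language and country parts. The names
LangTag and split_tag are hypothetical, and the sketch assumes only the
simple two-part form discussed here; it is not a full language-tag
parser.

```python
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str           # the "ll" part, e.g. "en"
    region: Optional[str]   # the "CC" part, e.g. "IT", or None if absent


def split_tag(tag: str) -> LangTag:
    """Split a simple 'll' or 'll-CC' tag; any other subtags are ignored."""
    parts = tag.split("-")
    language = parts[0].lower()
    region = None
    # Treat the second subtag as a country code only if it is two letters.
    if len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha():
        region = parts[1].upper()
    return LangTag(language, region)


print(split_tag("en-IT"))
```

Nothing in the split itself says whether "IT" describes the speakers,
the audience or the accent; that interpretation has to come from the
agreed semantics of the tag.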

If you have a case in which processing must apply certain rules from
language B to content that is nominally in language A, such as applying
an Italian or German accent to English text, or treating English text as
the German label for something and wanting to apply English rules for
semantics and German rules for something else, then I think the
relationships that need to be described go beyond the capabilities of a
single attribute like xml:lang.
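
As a rough illustration of that limit, the sketch below (Python,
standard library only) walks the mixed Italian/English fragment quoted
further down in this thread and records every xml:lang value in scope
for each element. The helper name language_chain is hypothetical.
xml:lang itself only gives you the innermost value; treating the
enclosing value as, say, the pronunciation context is an
application-level assumption, not something the attribute defines.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_chain(root):
    """Yield (element, chain) pairs, where chain lists the xml:lang
    values in scope from the outermost to the innermost element."""
    def walk(elem, chain):
        lang = elem.get(XML_LANG)
        if lang is not None:
            chain = chain + [lang]
        yield elem, chain
        for child in elem:
            yield from walk(child, chain)
    return walk(root, [])


doc = ET.fromstring(
    '<span xml:lang="it">Abbiamo fatto questo lavoro per il progetto '
    '<span xml:lang="en">"Question How"</span></span>'
)

results = [(elem.text.strip(), chain) for elem, chain in language_chain(doc)]
for text, chain in results:
    # chain[-1] is what xml:lang inheritance reports for the element;
    # earlier entries are the enclosing scopes it does not expose.
    print(text, chain)
```

The inner span carries both "it" and "en" in its chain, but the
attribute alone cannot say which rules (pronunciation, semantics) should
come from which of the two.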


Peter Constable



> -----Original Message-----
> From: ietf-languages-bounces@alvestrand.no
> [mailto:ietf-languages-bounces@alvestrand.no] On Behalf Of Misha Wolf
> Sent: Wednesday, January 19, 2005 8:54 AM
> To: www-rdf-interest@w3.org; www-international@w3.org
> Cc: ietf-languages@iana.org
> Subject: Language X within scope of language Y
> 
> [IETF Languages list copied]
> 
> I think that we must not try to redefine the meaning of:
> 
>    <foo xml:lang="Y">
>       ...
>       <bar xml:lang="X">
>       ...
> 
> I agree that "en-IT" expresses "English as written/spoken in Italy",
> but that wasn't, I think, the problem that Reto was writing about in:
> http://lists.w3.org/Archives/Public/www-rdf-interest/2005Jan/0125.html
> 
> Misha
> 
> 
> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Stephen Deach
> Sent: 19 January 2005 16:39
> To: Jeremy Carroll; Reto Bachmann-Gmuer
> Cc: Martin Duerst; www-rdf-interest@w3.org; www-international@w3.org
> Subject: Re: XMLLiterals and language
> 
> 
> Isn't encoding dialect the purpose of the variant component of a
> locale specifier?
> 
> Also,
>    What's wrong with "en-IT" for English as spoken in Italy ?
> 
> 
> At 2005.01.19-16:29(+0000), Jeremy Carroll wrote:
> 
> 
> 
> >I am not at all convinced that this issue is irrelevant outside the
> >semantic web domain. E.g. a text-to-speech system should pronounce
> >English words quite differently when in an Italian mode, since Italian
> >speakers typically use Italian pronunciation rules for English words
> >being used in Italian sentences. As an English mother-tongue speaker
> >with reasonable Italian, the most difficult sentences I find to
> >understand are such mixed sentences.
> >
> ><span xml:lang="it">
> >Abbiamo fatto questo lavoro per il progetto
> ><span xml:lang="en">"Question How"</span>
> ></span>
> >
> >the words "question how" are pronounced quite differently from in
> >English (even when the mother-tongue Italian speaker is a fluent
> >English speaker). (bitter experience here!)
> >
> >Jeremy
> >
> >Reto Bachmann-Gmuer wrote:
> > >
> > > Martin Duerst wrote:
> > >
> > >> It seems to me that what Reto is looking for is a way to define
> > >> a "primary language" for a small piece of data that itself is in
> > >> a different language. Because such divergent cases are very rare,
> > >> it seems they have been overlooked up to now.
> > >>
> > >>
> > > I don't think these cases are that rare. Looking at German computer
> > > books, many titles consist only of English words, yet they are the
> > > German titles (the first is relevant for pronunciation, the latter
> > > for semantic processing).
> > >
> > >> To me, the right thing to do seems to be to define the "primary"
> > >> or "intended" language separately (e.g. with a separate property),
> > >> but to define that property so that it defaults to the text
> > >> processing language.
> > >>
> > > Having a primary language for Literals would be fine, however I
> > > think the text processing language (specified in the XML) should
> > > default to the primary language (which imho should be defined by
> > > means of RDF) rather than the other way round. This seems more
> > > coherent with plain literals, and in particular it does not require
> > > RDF processors to understand and parse XML in order to do things
> > > like filtering by language.
> > >
> > >> I'm glad to report that I just found the 'payload' module in
> > >> RSS 1.1 (http://inamidst.com/rss1.1/payload) that uses XML
> > >> Literals rather than encoding. Great!
> > >
> > >
> > > That's cool, and it would be cooler with the possibility to specify
> > > a language for the whole payload (even when some of the rare cases
> > > apply).
> > >
> > > reto
> > >
> > >
> 
> 
> ---Steve Deach
>     sdeach@adobe.com
> 
> 
> 
> 
> 
Received on Thursday, 20 January 2005 04:41:24 GMT
