Re: ACTION-161 "Talk to shaun about BCP47 compatibility" from Shaun McCance on 2012-07-08 (public-multilingualweb-lt@w3.org from July 2012)

From: Shaun McCance <shaunm@gnome.org>
Date: Sun, 08 Jul 2012 18:06:47 -0400
To: public-multilingualweb-lt@w3.org
Message-ID: <1341785207.2196.46.camel@recto>
On Sun, 2012-07-08 at 18:38 +0200, Felix Sasaki wrote:
> Hi Shaun,
> 
> 
> with
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0010.html
> as a basis, at 
> http://www.w3.org/2012/07/05-mlw-lt-minutes.html#item13
> we discussed autoLanguageProcessingRule.
> 
> 
> One aspect that came up was whether this should be specific to
> transliteration - Yves mentioned that you have implemented this not
> only for transliteration, but also for machine translation.

As Yves pointed out already, I don't have this implemented. It's
just something that I brought up as a possibility on the ITS IG
mailing list last year:

http://lists.w3.org/Archives/Public/public-i18n-its-ig/2011Apr/0004.html

With respect to the "t" extension, my understanding is that it
identifies already localized content. The first example in the
RFC is "ja-t-it" to mean "Japanese translated from Italian".
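
To make that reading concrete, here is a minimal sketch of how such a tag
splits around the "t" singleton. It is not taken from the RFC or from any
existing implementation. The function name split_t_extension is made up for
illustration, and the parsing is deliberately simplified: no validation, no
handling of private-use "x" tags. It assumes that fields inside the extension
start with a two-character key (a letter followed by a digit, such as "m0")
and that everything before the first key is the source language tag.

```python
import re


def split_t_extension(tag):
    """Split a BCP 47 tag into (target, source, fields) around the "t" singleton.

    Hypothetical helper for illustration only. Tags are compared
    case-insensitively, so the result is lowercased.
    """
    subtags = tag.lower().split("-")
    # The first subtag is the primary language, so look for "t" after it.
    if "t" not in subtags[1:]:
        return "-".join(subtags), None, {}
    i = subtags.index("t", 1)
    target = "-".join(subtags[:i])

    # Collect the extension's subtags, stopping at the next singleton.
    rest = []
    for sub in subtags[i + 1:]:
        if len(sub) == 1:
            break
        rest.append(sub)

    source, fields, key = [], {}, None
    for sub in rest:
        if re.fullmatch(r"[a-z][0-9]", sub):
            key = sub              # a field key such as "m0"
            fields[key] = []
        elif key is None:
            source.append(sub)     # still inside the source language tag
        else:
            fields[key].append(sub)

    return (
        target,
        "-".join(source) or None,
        {k: "-".join(v) for k, v in fields.items()},
    )


print(split_t_extension("ja-t-it"))
print(split_t_extension("und-Latn-t-und-cyrl"))
```

Under these assumptions, "ja-t-it" splits into the content language "ja" and
the source "it", and "und-Latn-t-und-cyrl" into "und-latn" and "und-cyrl". In
both cases the part before "t" describes the content as it is now, and the
part after "t" describes what it was derived from.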

I'm interested in marking how the source content ought to be
translated, not marking how the translated content was done.
(I'm actually personally only interested in transliteration,
but I recognize that it's a subset of a larger problem that
should have a more generic solution.)

Please correct me if I've misunderstood the "t" extension.

Aside: I haven't pushed for this data category since then because
my translators were less enthusiastic about it than I thought
they'd be. I think I didn't do a good enough job explaining
how it would reduce their workload. Then again, translation
memory probably hides most of the work anyway.

--
Shaun

> That leads to the question of what the relation to the BCP 47 "t"
> extension should be. As input, see the RFC for the "t" extension
> http://tools.ietf.org/html/rfc6497
> which has transliteration as an example
> und-Latn-t-und-cyrl
> 
> 
> and the discussion at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0155.html
> (
> >> 5) WRT to the tags that Mark mentioned in 1. below: are the "transform"
> >> XML files here
> >> http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47 the
> )
> This discussion showed that the fields for the "t" extension include
> also values for machine translation, see
> http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47/transform_mt.xml
> [
> <key extension="t" name="t0" description="Machine Translation:
> Used to indicate content that has been machine
> translated, or a request for a particular type of machine translation
> of content.
> The first subfield in a sequence would typically be
> a 'platform' or vendor designation." since="21.0.2">
>   <type name="und" description="The choice of
> machine translation is not specified. Used when the only information
> known (or requested) is that the text was machine translated."
> since="21.0.2" />
> 
> 
> ]
> 
> 
> For other "transform" fields, see
> http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47/transform.xml
> We now want to make sure that - if we provide a data category
> "autoLanguageProcessingRule" - that this is somehow consistent with
> the BCP 47 approach, or that at least we have a good story why it
> doesn't need to be consistent. Do you have any thoughts about this?
> 
> 
> Looking very much forward to your feedback,
> 
> 
> Felix
> 
> 
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
> 
>
Received on Sunday, 8 July 2012 22:07:11 UTC