Re: BCP 47 "t" extension follow up and locale identifier definition from Felix Sasaki on 2012-06-26 (public-multilingualweb-lt@w3.org from June 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 26 Jun 2012 14:58:55 +0200
To: Mark Davis ☕ <mark@macchiato.com>, "Phillips, Addison" <addison@lab126.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <CAL58czr051AXyPMLBnQMSnbGdH8hgLyChk6ugU3zTOzUJs0pTw@mail.gmail.com>
(Apologies for cross-posting and thanks to Addison for pointing out
www-international),

Thanks for your feedback, Addison and Mark. To summarize the main points:

1) We use BCP 47 language tags in a dedicated piece of markup, e.g.
<span its:filterLocale="de-ch,fr-ch,it-ch">Swiss legal notice, only to be
taken into account for localization into a Swiss locale</span>
2) We use comma as the delimiter instead of semicolon, see "span" element
above
3) We need to make the relation to BCP 47 filtering clear.
4) We don't need text to point out the "u" extension - people may or may
not use it, but if we go for BCP47 people can use any extension they want.
5) WRT the tags that Mark mentioned in 1. below: are the "transform" XML
files here
http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47 the
currently registered fields for transforms?
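
To make points 1) to 3) concrete, here is a rough sketch of how a
comma-separated its:filterLocale value could be checked against a target
locale. It assumes the attribute is read as a list of language ranges and
matched with RFC 4647 basic filtering; whether ITS should adopt exactly that
scheme is the open question in 3). The function names are hypothetical and
not taken from any ITS draft.

```python
def parse_locale_list(value: str) -> list[str]:
    """Split a comma-separated attribute value into language ranges."""
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: the range equals the tag, or is a prefix
    of it that ends at a subtag boundary. Comparison is case-insensitive."""
    rng, candidate = language_range.lower(), tag.lower()
    return rng == "*" or candidate == rng or candidate.startswith(rng + "-")


def applies_to(filter_locale: str, target_tag: str) -> bool:
    """Hypothetical helper: does any listed range match the target tag?"""
    return any(
        basic_filter_match(rng, target_tag)
        for rng in parse_locale_list(filter_locale)
    )


swiss_only = "de-ch,fr-ch,it-ch"
print(applies_to(swiss_only, "fr-CH"))
print(applies_to(swiss_only, "de-DE"))
```

With this reading the Swiss notice would be kept for fr-CH and dropped for
de-DE.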

Felix

2012/6/25 Mark Davis ☕ <mark@macchiato.com>

> > Since the "t" extension is also meant to express process related
> > information, we want to coordinate the values that can be used via that
> > extension with what we define - or just refer to them. What would be the
> > best way to achieve this?
>
> There are two possible ways that would work.
>
>    1. Reference LDML for the tags, and propose registrations for any
>    additional ones you need.
>    2. Do #1, but because you have a separate field (that doesn't have to
>    be a BCP47 tag), you can reserve strings that could not be BCP47 subtags
>    for your own use.
>
> We do something similar for short TZ identifiers. We use UN LOCODE codes
> where they exist; where they don't, we use codes that are longer or shorter
> so that they will not collide with future UN LOCODE codes.
>
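A small sketch of option 2 as I read it: a reserved string is safe if it
could never be a BCP 47 subtag. It relies only on the syntactic rule from
RFC 5646 that every subtag is 1 to 8 ASCII letters or digits. The function
names and the example values are hypothetical.

```python
import re

# RFC 5646 ABNF: every subtag is 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def could_be_bcp47_subtag(value: str) -> bool:
    """True if the string has the shape of a BCP 47 subtag."""
    return _SUBTAG.fullmatch(value) is not None


def is_safe_reserved_string(value: str) -> bool:
    """Hypothetical check: a non-empty string that can never collide
    with a present or future subtag, because it is too long or contains
    characters that subtags cannot contain."""
    return bool(value) and not could_be_bcp47_subtag(value)


for candidate in ("phonebk", "mt", "humanReviewed", "post_edit"):
    print(candidate, is_safe_reserved_string(candidate))
```

This is the same idea as the TZ identifiers below: pick a shape that the
registry cannot produce.
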
> > locales...
>
> I agree with Addison on all of the locale issues.
>
> ------------------------------
> Mark <https://plus.google.com/114199149796022210033>
> *— Il meglio è l’inimico del bene —*
>
>
>
> On Mon, Jun 25, 2012 at 12:06 PM, Phillips, Addison <addison@lab126.com> wrote:
>
>> I have a number of thoughts about the locales question. I have not talked
>> to Mark about this and he may not agree with any or all of the below.
>>
>> It would be more useful, in my opinion, to define section 5.1.3 as a BCP
>> 47 language priority list (with language tags between the separators). I
>> would tend to prefer commas to semi-colons (since these are more common in
>> HTML and in headers, etc.). This “filter” isn’t quite the same thing as BCP
>> 47 “filtering” matching schemes (or is it?) and that probably should be
>> highlighted.
>>
>> The link to the locale ID section didn’t work, but I found it searching
>> the document. I was dismayed to see the underscore conversion. What purpose
>> does it serve? I’ve found that using language tags with no
>> hyphen/underscore mapping makes for a cleaner, less complex implementation.
>> For one thing, a common case is likely to be a mapping directly between the
>> two. Inserting a transformation adds needless complexity at the markup
>> level. [An implementation can internally map it, if necessary.]
>>
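A minimal sketch of the internal mapping Addison mentions, assuming the
underscore form differs from the language tag only in its separator (real
platform locale IDs may differ in more ways than that). The function names
are hypothetical.

```python
def to_language_tag(locale_id: str) -> str:
    """Map an underscore-separated locale ID to hyphen-separated form."""
    return locale_id.replace("_", "-")


def to_underscore_form(language_tag: str) -> str:
    """Inverse mapping, for platforms that expect underscores internally."""
    return language_tag.replace("-", "_")


print(to_language_tag("zh_Hans_CN"))
print(to_underscore_form("de-CH"))
```

The markup carries plain BCP 47 tags; only the implementation that needs
underscores would call the second function.
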
>> I don’t particularly care for the somewhat artificial distinction between
>> a language tag and a locale identifier in the document. BCP 47 makes having
>> such a separation much less relevant. That is, “de-DE” is a perfectly
>> useful locale identifier—and it’s a valid language tag as well. The “u”
>> extension doesn’t ruin this relationship: “de-DE-u-co-phonebk” is also a
>> valid language tag (besides being useful as a locale identifier). The extra
>> subtags may be ignorable in a translation process, but this doesn’t ruin a
>> locale identifier’s utility as a language tag. Where I encounter the most
>> issues tends to be when mapping must be done between the two concepts
>> instead of tags being useful in both contexts.
>>
>> I do recognize that you need a separate **field** for “locale” (how
>> language materials are packaged/delivered) from the source or target
>> language of the content in ITS. But I think that the identifiers themselves
>> should not be different from one another. For example, I can see something
>> like the following:
>>
>>    <someElement xml:lang="zh-Hans" its:filterLocale="zh-CN">中文</someElement>
>>
>> Finally, you go out of your way to say:
>>
>> --
>>
>> Implementations of ITS 2.0 are not expected to process the "u" extension
>> for further locale information as defined in RFC 6067 <http://tools.ietf.org/html/rfc6067>.
>>
>> --
>>
>> I think you should reconsider this text: it’s not normative but might be
>> read as a normative direction, and implementations of ITS 2.0 might need to
>> interact with the “u” extension: Java 7, for example, has several built-in
>> locales that make use of the extension. I think what you mean is that the
>> extension is ignored in language fields (such as sourceLanguage, etc.)??
>>
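If "ignored in language fields" is the intended meaning, one way an
implementation could do that is to drop the "u" extension before comparing
tags. The sketch below assumes a well-formed tag and relies on the BCP 47
rule that an extension starts at a single-character subtag and runs until
the next one, with "x" starting private use. The function name is
hypothetical.

```python
def strip_extension(tag: str, singleton: str = "u") -> str:
    """Return the tag without the extension introduced by `singleton`."""
    subtags = tag.split("-")
    kept: list[str] = []
    skipping = False
    for index, subtag in enumerate(subtags):
        if index > 0 and len(subtag) == 1:
            if subtag.lower() == "x":
                # Private use runs to the end of the tag; keep it untouched.
                kept.extend(subtags[index:])
                break
            # A new singleton starts a new extension; skip only the wanted one.
            skipping = subtag.lower() == singleton.lower()
        if not skipping:
            kept.append(subtag)
    return "-".join(kept)


print(strip_extension("de-DE-u-co-phonebk"))
print(strip_extension("en-u-ca-gregory-t-ja"))
```

The tag stays usable as a locale identifier elsewhere; only the language
comparison sees the shortened form.
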
>> <chair hat=”on”> btw, you should use the www-international@ list instead
>> of our public WG list going forwards. I have moved public-i18n-core@ to
>> bcc: and copied the winter list for you :-)
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect (Lab126)
>> Chair (W3C I18N WG)
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Monday, June 25, 2012 5:53 AM
>> *To:* Mark Davis
>> *Cc:* public-multilingualweb-lt@w3.org; public-i18n-core@w3.org
>> *Subject:* BCP 47 "t" extension follow up and locale identifier
>> definition
>>
>> Dear Mark, with CC to the MultilingualWeb LT and the i18n core public
>> list,
>>
>> I have an action ACTION-133 to follow up on the BCP 47 discussion we had
>> with your contribution on 12 June. Thanks again for your presentation.
>>
>> In our requirements document
>> http://www.w3.org/TR/2012/WD-its2req-20120524/
>> we have several requirements related to processes, see e.g.
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#Process_Model
>>
>> Since the "t" extension is also meant to express process related
>> information, we want to coordinate the values that can be used via that
>> extension with what we define - or just refer to them. What would be the
>> best way to achieve this?
>>
>> A related issue: we need to specify information about locales, see e.g.
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#locale-filter
>> The current thinking about locale identifiers is here
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#Identification_of_Language_and_Local
>>
>> At the Dublin workshop there was already some feedback from Richard
>> (IIRC): if we have a dedicated field for a locale identifier, then a basic
>> BCP 47 language tag (without the underscore conversion) might do it. Do you
>> have any thoughts on this?
>>
>>
>> Thanks a lot for your feedback in advance,
>>
>> Felix
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 26 June 2012 12:59:26 UTC