Re: BCP 47 "t" extension follow up and locale identifier definition

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 27 Jun 2012 00:14:28 +0200
Message-ID: <CAL58czrM-+phCwb84TmsA9yYWdav0yd+JFn7ZOyj3VxNaN_P-g@mail.gmail.com>
To: Mark Davis ☕ <mark@macchiato.com>
Cc: "Phillips, Addison" <addison@lab126.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "www-international@w3.org" <www-international@w3.org>
Great, thank you for your feedback, Mark and Addison. We will create a
section about the LocaleFilter data category and come back to you for a
review,

Felix

2012/6/26 Mark Davis ☕ <mark@macchiato.com>

> 5) Yes
>
> ------------------------------
> Mark <https://plus.google.com/114199149796022210033>
> *— Il meglio è l’inimico del bene —*
>
> On Tue, Jun 26, 2012 at 7:49 AM, Phillips, Addison <addison@lab126.com> wrote:
>
>> Hi Felix,
>>
>> You missed: “remove the hyphen-to-underscore conversion” :-). Otherwise,
>> looks like what we’d suggested.
>>
>> Addison
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Tuesday, June 26, 2012 5:59 AM
>> *To:* Mark Davis ☕; Phillips, Addison; public-multilingualweb-lt@w3.org;
>> www-international@w3.org
>> *Subject:* Re: BCP 47 "t" extension follow up and locale identifier
>> definition
>>
>> (Apologies for cross-posting and thanks to Addison for pointing out
>> www-international),
>>
>> Thanks for your feedback, Addison and Mark. To summarize the main points:
>>
>> 1) We use BCP 47 language tags in a dedicated piece of markup, e.g.
>>
>> <span its:filterLocale="de-ch,fr-ch,it-ch">Swiss legal notice, only to be
>> taken into account for localization into a Swiss locale</span>
>>
>> 2) We use a comma as the delimiter instead of a semicolon; see the "span"
>> element above.
>>
>> 3) We need to make the relation to BCP 47 filtering clear.
>>
>> 4) We don't need text to point out the "u" extension - people may or may
>> not use it, but if we go for BCP 47 people can use any extension they want.
>>
>> 5) With regard to the tags that Mark mentioned in 1. below: are the
>> "transform" XML files here
>> http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47 the
>> currently registered fields for transforms?
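A minimal sketch of how a processor might check the comma-separated list from point 1 against a target locale. It assumes RFC 4647 basic filtering is the matching scheme, which the thread has not settled (see point 3). The function names are hypothetical and not part of ITS 2.0.

```python
def parse_filter_locale(value):
    """Split a comma-separated filterLocale value into language ranges."""
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_filter_match(language_range, tag):
    """RFC 4647 basic filtering: the range matches if it equals the tag or
    is a prefix of it that ends at a subtag boundary (case-insensitive)."""
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        return True
    return tag == language_range or tag.startswith(language_range + "-")


def applies_to_locale(filter_value, target_locale):
    """True if any range in the (assumed) filterLocale list matches."""
    return any(
        basic_filter_match(language_range, target_locale)
        for language_range in parse_filter_locale(filter_value)
    )


# Example from point 1: the Swiss notice is kept for de-CH, dropped for de-DE.
swiss = "de-ch,fr-ch,it-ch"
keep_for_swiss_german = applies_to_locale(swiss, "de-CH")
keep_for_german = applies_to_locale(swiss, "de-DE")
```

Under this assumption a range such as "de-ch" also matches longer tags like "de-CH-u-co-phonebk", but not the shorter tag "de".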
>>
>> Felix
>>
>> 2012/6/25 Mark Davis ☕ <mark@macchiato.com>
>>
>> > Since the "t" extension is also meant to express process related
>> > information, we want to coordinate the values that can be used via that
>> > extension with what we define - or just refer to them. What would be the
>> > best way to achieve this?
>>
>> There are two possible ways that would work.
>>
>>    1. Reference LDML for the tags, and propose registrations for any
>>    additional ones you need.
>>    2. Do #1, but because you have a separate field (that doesn't have to
>>    be a BCP47 tag), you can reserve strings that could not be BCP47 subtags
>>    for your own use.
>>
>> We do something similar for short TZ identifiers. We use UN LOCODE codes
>> where they exist; where they don't, we use codes that are longer or shorter
>> so that they will not collide with future UN LOCODE codes.
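One possible reading of option 2, as a sketch: a BCP 47 subtag is 1 to 8 ASCII letters or digits, so a value that is longer or contains other characters cannot collide with a future subtag. The helper names and sample values are made up for illustration.

```python
import re

# RFC 5646 limits every subtag to 1-8 ASCII alphanumeric characters.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def could_be_bcp47_subtag(value):
    """True if the string has the shape of a single BCP 47 subtag."""
    return _SUBTAG.fullmatch(value) is not None


def is_collision_free(value):
    """True if the value can never be (or be split into) BCP 47 subtags.

    Values containing a hyphen are rejected, since each hyphen-separated
    piece could itself look like a subtag.
    """
    return value != "" and "-" not in value and not could_be_bcp47_subtag(value)


# Hypothetical values for a separate ITS field.
safe = is_collision_free("review_stage")   # underscore and more than 8 chars
unsafe = is_collision_free("phonebk")      # has the shape of a subtag
```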
>>
>> > locales...
>>
>> I agree with Addison on all of the locale issues.
>>
>> ------------------------------
>>
>> Mark <https://plus.google.com/114199149796022210033>
>>
>> *— Il meglio è l’inimico del bene —*
>>
>> On Mon, Jun 25, 2012 at 12:06 PM, Phillips, Addison <addison@lab126.com>
>> wrote:
>>
>> I have a number of thoughts about the locales question. I have not talked
>> to Mark about this and he may not agree with any or all of the below.
>>
>> It would be more useful, in my opinion, to define section 5.1.3 as a BCP
>> 47 language priority list (with language tags between the separators). I
>> would tend to prefer commas to semi-colons (since these are more common in
>> HTML and in headers, etc.). This “filter” isn’t quite the same thing as BCP
>> 47 “filtering” matching schemes (or is it?) and that probably should be
>> highlighted.
>>
>> The link to the locale ID section didn’t work, but I found it searching
>> the document. I was dismayed to see the underscore conversion. What purpose
>> does it serve? I’ve found that using language tags with no
>> hyphen/underscore mapping makes for a cleaner, less complex implementation.
>> For one thing, a common case is likely to be a mapping directly between the
>> two. Inserting a transformation adds needless complexity at the markup
>> level. [An implementation can internally map it, if necessary.]
>>
>> I don’t particularly care for the somewhat artificial distinction between
>> a language tag and a locale identifier in the document. BCP 47 makes having
>> such a separation much less relevant. That is, “de-DE” is a perfectly
>> useful locale identifier, and it’s a valid language tag as well. The “u”
>> extension doesn’t ruin this relationship: “de-DE-u-co-phonebk” is also a
>> valid language tag (besides being useful as a locale identifier). The extra
>> subtags may be ignorable in a translation process, but this doesn’t ruin a
>> locale identifier’s utility as a language tag. Where I encounter the most
>> issues tends to be when mapping must be done between the two concepts
>> instead of tags being useful in both contexts.
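To illustrate the point that the extra subtags can be ignored without a separate identifier syntax, here is a small sketch that drops extension and private-use sequences from a tag. It relies on the BCP 47 rule that such sequences begin with a single-character subtag. The function name is hypothetical, and tags whose first subtag is a singleton (for example "x-private" or "i-klingon") are returned unchanged.

```python
def strip_extensions(tag):
    """Return the tag without extension or private-use sequences.

    In BCP 47, language, script, region and variant subtags are all at
    least two characters long, so the first single-character subtag after
    the primary language marks where extensions begin.
    """
    subtags = tag.split("-")
    for index, subtag in enumerate(subtags):
        if index > 0 and len(subtag) == 1:
            return "-".join(subtags[:index])
    return tag


# The locale identifier from the message, reduced to its language part.
language_part = strip_extensions("de-DE-u-co-phonebk")
```

A translation process could compare `language_part` with other language fields while the full tag stays available for locale-sensitive behaviour such as collation.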
>>
>> I do recognize that you need a separate *field* for “locale” (how
>> language materials are packaged/delivered) from the source or target
>> language of the content in ITS. But I think that the identifiers themselves
>> should not be different from one another. For example, I can see something
>> like the following:
>>
>>    <someElement xml:lang="zh-Hans" its:filterLocale="zh-CN">中文</someElement>
>>
>> Finally, you go out of your way to say:
>>
>> --
>>
>> Implementations of ITS 2.0 are not expected to process the "u" extension
>> for further locale information as defined in RFC 6067
>> <http://tools.ietf.org/html/rfc6067>.
>>
>> --
>>
>> I think you should reconsider this text: it’s not normative but might be
>> read as a normative direction, and implementations of ITS 2.0 might need to
>> interact with the “u” extension: Java 7, for example, has several built-in
>> locales that make use of the extension. I think what you mean is that the
>> extension is ignored in language fields (such as sourceLanguage, etc.)??
>>
>> <chair hat=”on”> btw, you should use the www-international@ list instead
>> of our public WG list going forwards. I have moved public-i18n-core@ to
>> bcc: and copied the winter list for you :-)
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect (Lab126)
>> Chair (W3C I18N WG)
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Monday, June 25, 2012 5:53 AM
>> *To:* Mark Davis
>> *Cc:* public-multilingualweb-lt@w3.org; public-i18n-core@w3.org
>> *Subject:* BCP 47 "t" extension follow up and locale identifier
>> definition
>>
>> Dear Mark, with CC to the MultilingualWeb LT and the i18n core public
>> list,
>>
>> I have an action ACTION-133 to follow up on the BCP 47 discussion we had
>> with your contribution on 12 June. Thanks again for your presentation.
>>
>> In our requirements document
>>
>> http://www.w3.org/TR/2012/WD-its2req-20120524/
>>
>> we have several requirements related to processes, see e.g.
>>
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#Process_Model
>>
>> Since the "t" extension is also meant to express process related
>> information, we want to coordinate the values that can be used via that
>> extension with what we define - or just refer to them. What would be the
>> best way to achieve this?
>>
>> A related issue: we need to specify information about locales, see e.g.
>>
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#locale-filter
>>
>> The current thinking about locale identifiers is here
>>
>> http://www.w3.org/TR/2012/WD-its2req-20120524/#Identification_of_Language_and_Local
>>
>> At the Dublin workshop there was already some feedback from Richard
>> (IIRC): if we have a dedicated field for a locale identifier, then a basic
>> BCP 47 language tag (without the underscore conversion) might do it. Do you
>> have any thoughts on this?
>>
>> Thanks a lot for your feedback in advance,
>>
>> Felix
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 26 June 2012 22:14:55 UTC
