RE: BCP 47 "t" extension follow up and locale identifier definition

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 26 Jun 2012 07:49:13 -0700
To: Felix Sasaki <fsasaki@w3.org>, Mark Davis ☕ <mark@macchiato.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476AAC7B677D@EX-SEA31-D.ant.amazon.com>
Hi Felix,

You missed: “remove the hyphen-to-underscore conversion” :-). Otherwise, looks like what we’d suggested.


From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Tuesday, June 26, 2012 5:59 AM
To: Mark Davis ☕; Phillips, Addison; public-multilingualweb-lt@w3.org; www-international@w3.org
Subject: Re: BCP 47 "t" extension follow up and locale identifier definition

(Apologies for cross-posting and thanks to Addison for pointing out www-international),

Thanks for your feedback, Addison and Mark. To summarize the main points:

1) We use BCP 47 language tags in a dedicated piece of markup, e.g.
<span its:filterLocale="de-ch,fr-ch,it-ch">Swiss legal notice, only to be taken into account for localization into a Swiss locale</span>
2) We use a comma as the delimiter instead of a semicolon; see the "span" element above.
3) We need to make the relation to BCP 47 filtering clear.
4) We don't need text to point out the "u" extension - people may or may not use it, but if we go for BCP47 people can use any extension they want.
5) WRT the tags that Mark mentioned in 1. below: are the "transform" XML files here http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47 the currently registered fields for transforms?
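
To illustrate points 1) to 3), here is a minimal Python sketch of how a comma-separated its:filterLocale value might be checked against a target locale. It assumes each list item is treated as a basic language range and matched with RFC 4647 basic filtering (case-insensitive prefix match on subtag boundaries). The function names are hypothetical, and whether ITS should use exactly this matching scheme is the open question in point 3).

```python
from typing import List


def parse_filter_locale(value: str) -> List[str]:
    """Split a comma-separated attribute value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: the range matches if it equals the tag,
    or is a prefix of the tag that ends on a subtag boundary."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def applies_to(filter_value: str, target_locale: str) -> bool:
    """True if any range in the list matches the target locale (assumed semantics)."""
    return any(
        basic_filter_match(r, target_locale)
        for r in parse_filter_locale(filter_value)
    )
```

With the value from the example above, applies_to("de-ch,fr-ch,it-ch", "de-CH") is true, while applies_to("de-ch,fr-ch,it-ch", "de-DE") is false.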


2012/6/25 Mark Davis ☕ <mark@macchiato.com>
> Since the "t" extension is also meant to express process related information, we want to coordinate the values that can be used via that extension with what we define - or just refer to them. What would be the best way to achieve this?
There are two possible ways that would work.

 1.  Reference LDML for the tags, and propose registrations for any additional ones you need.
 2.  Do #1, but because you have a separate field (that doesn't have to be a BCP47 tag), you can reserve strings that could not be BCP47 subtags for your own use.
We do something similar for short TZ identifiers. We use UN LOCODE codes where they exist; where they don't, we use codes that are longer or shorter so that they will not collide with future UN LOCODE codes.

> locales...

I agree with Addison on all of the locale issues.


— Il meglio è l’inimico del bene —

On Mon, Jun 25, 2012 at 12:06 PM, Phillips, Addison <addison@lab126.com> wrote:
I have a number of thoughts about the locales question. I have not talked to Mark about this and he may not agree with any or all of the below.

It would be more useful, in my opinion, to define section 5.1.3 as a BCP 47 language priority list (with language tags between the separators). I would tend to prefer commas to semi-colons (since these are more common in HTML and in headers, etc.). This “filter” isn’t quite the same thing as BCP 47 “filtering” matching schemes (or is it?) and that probably should be highlighted.

The link to the locale ID section didn’t work, but I found it searching the document. I was dismayed to see the underscore conversion. What purpose does it serve? I’ve found that using language tags with no hyphen/underscore mapping makes for a cleaner, less complex implementation. For one thing, a common case is likely to be a mapping directly between the two. Inserting a transformation adds needless complexity at the markup level. [An implementation can internally map it, if necessary.]
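
As a rough sketch of the bracketed remark, an implementation that needs underscore-style identifiers internally could derive them from the BCP 47 tag at its own boundary, leaving the markup untouched. The Python below is illustrative only: the function names are hypothetical, the casing follows the RFC 5646 formatting convention (lowercase language, titlecase script, uppercase region, lowercase after a singleton), and it assumes a well-formed, non-grandfathered tag.

```python
def canonical_case(tag: str) -> str:
    """Apply the conventional BCP 47 casing to a well-formed tag."""
    out = []
    after_singleton = False
    for i, subtag in enumerate(tag.split("-")):
        if i == 0 or after_singleton:
            out.append(subtag.lower())
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(subtag.title())   # script, e.g. Hans
        elif len(subtag) == 2 and subtag.isalpha():
            out.append(subtag.upper())   # region, e.g. CN
        else:
            out.append(subtag.lower())
        if len(subtag) == 1:
            # Everything after a singleton (extension or private use) stays lowercase.
            after_singleton = True
    return "-".join(out)


def to_internal_id(tag: str) -> str:
    """Map a language tag to an underscore-separated internal identifier."""
    return canonical_case(tag).replace("-", "_")
```

For example, to_internal_id("zh-hans-cn") gives "zh_Hans_CN", and the markup itself never has to carry the underscore form.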

I don’t particularly care for the somewhat artificial distinction between a language tag and a locale identifier in the document. BCP 47 makes having such a separation much less relevant. That is, “de-DE” is a perfectly useful locale identifier—and it’s a valid language tag as well. The “u” extension doesn’t ruin this relationship: “de-DE-u-co-phonebk” is also a valid language tag (besides being useful as a locale identifier). The extra subtags may be ignorable in a translation process, but this doesn’t ruin a locale identifier’s utility as a language tag. Where I encounter the most issues tends to be when mapping must be done between the two concepts instead of tags being useful in both contexts.
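
To make the "ignorable" point concrete, here is a small Python sketch of how a translation process might drop extension subtags while keeping the rest of the tag. The function name is hypothetical, and it assumes a well-formed tag in which any single-character subtag after the first position starts an extension or private-use sequence.

```python
def strip_extensions(tag: str) -> str:
    """Return the tag up to, but not including, the first singleton subtag.

    A leading singleton (as in a wholly private-use tag such as "x-private")
    is left alone.
    """
    subtags = tag.split("-")
    for i, subtag in enumerate(subtags):
        if i > 0 and len(subtag) == 1:
            return "-".join(subtags[:i])
    return tag
```

Here strip_extensions("de-DE-u-co-phonebk") returns "de-DE", so the same identifier serves as a locale with the extension and as a plain language tag without it.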

I do recognize that you need a separate *field* for “locale” (how language materials are packaged/delivered) from the source or target language of the content in ITS. But I think that the identifiers themselves should not be different from one another. For example, I can see something like the following:

   <someElement xml:lang="zh-Hans" its:filterLocale="zh-CN">中文</someElement>

Finally, you go out of your way to say:

Implementations of ITS 2.0 are not expected to process the "u" extension for further locale information as defined in RFC 6067<http://tools.ietf.org/html/rfc6067>.

I think you should reconsider this text: it’s not normative but might be read as a normative direction, and implementations of ITS 2.0 might need to interact with the “u” extension: Java 7, for example, has several built-in locales that make use of the extension. I think what you mean is that the extension is ignored in language fields (such as sourceLanguage, etc.)??

<chair hat=”on”> btw, you should use the www-international@ list instead of our public WG list going forwards. I have moved public-i18n-core@ to bcc: and copied the winter list for you :-)


Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Monday, June 25, 2012 5:53 AM
To: Mark Davis
Cc: public-multilingualweb-lt@w3.org; public-i18n-core@w3.org
Subject: BCP 47 "t" extension follow up and locale identifier definition

Dear Mark, with CC to the MultilingualWeb LT and the i18n core public list,

I have an action ACTION-133 to follow up on the BCP 47 discussion we had with your contribution on 12 June. Thanks again for your presentation.

In our requirements document

we have several requirements related to processes, see e.g.

Since the "t" extension is also meant to express process related information, we want to coordinate the values that can be used via that extension with what we define - or just refer to them. What would be the best way to achieve this?

A related issue: we need to specify information about locales, see e.g.

The current thinking about locale identifiers is here

At the Dublin workshop there was already some feedback from Richard (IIRC): if we have a dedicated field for a locale identifier, then a basic BCP 47 language tag (without the underscore conversion) might do it. Do you have any thoughts on this?

Thanks a lot for your feedback in advance,


Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 26 June 2012 14:49:52 UTC
