RE: BCP 47 "t" extension follow up and locale identifier definition

From: Phillips, Addison <addison@lab126.com>
Date: Mon, 25 Jun 2012 12:06:06 -0700
To: Felix Sasaki <fsasaki@w3.org>, Mark Davis <mark@macchiato.com>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476AAC7B6139@EX-SEA31-D.ant.amazon.com>
I have a number of thoughts about the locales question. I have not talked to Mark about this and he may not agree with any or all of the below.

It would be more useful, in my opinion, to define section 5.1.3 as a BCP 47 language priority list (with language tags between the separators). I would tend to prefer commas to semi-colons (since these are more common in HTML and in headers, etc.). This “filter” isn’t quite the same thing as BCP 47 “filtering” matching schemes (or is it?) and that probably should be highlighted.
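To make the comparison with BCP 47 filtering concrete, here is a minimal Python sketch of a comma-separated language priority list checked with RFC 4647 basic filtering. The function names are hypothetical and the list syntax is an assumption for illustration; none of this is taken from the ITS 2.0 draft.

```python
def parse_priority_list(value):
    """Split a comma-separated priority list into language ranges.

    Assumption: items are separated by commas and may carry surrounding
    whitespace; empty items are dropped.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_filter_match(language_range, tag):
    """RFC 4647 basic filtering: a range matches a tag if they are equal
    (case-insensitively) or if the range is a prefix of the tag that ends
    at a subtag boundary. The wildcard "*" matches every tag."""
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def matches_any(priority_list, tag):
    """True if any range in the priority list matches the tag."""
    return any(basic_filter_match(r, tag)
               for r in parse_priority_list(priority_list))
```

With this sketch, the list "de, fr-CA" selects content tagged "de-DE" but not content tagged "fr-FR", and the range "de" does not match the unrelated tag "den" because the prefix must end at a hyphen.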

The link to the locale ID section didn’t work, but I found it searching the document. I was dismayed to see the underscore conversion. What purpose does it serve? I’ve found that using language tags with no hyphen/underscore mapping makes for a cleaner, less complex implementation. For one thing, a common case is likely to be a mapping directly between the two. Inserting a transformation adds needless complexity at the markup level. [An implementation can internally map it, if necessary.]

I don’t particularly care for the somewhat artificial distinction between a language tag and a locale identifier in the document. BCP 47 makes having such a separation much less relevant. That is, “de-DE” is a perfectly useful locale identifier—and it’s a valid language tag as well. The “u” extension doesn’t ruin this relationship: “de-DE-u-co-phonebk” is also a valid language tag (besides being useful as a locale identifier). The extra subtags may be ignorable in a translation process, but this doesn’t ruin a locale identifier’s utility as a language tag. Where I encounter the most issues tends to be when mapping must be done between the two concepts instead of tags being useful in both contexts.
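As a rough illustration of how the extra subtags can be ignored where they are not needed, the following Python sketch separates the "u" extension from the rest of a tag. The function name is hypothetical, the parsing is deliberately simplified (it does not validate the tag, and it silently drops "u" attributes that precede the first key), and it is not an ITS 2.0 or RFC 6067 reference implementation.

```python
def split_u_extension(tag):
    """Return (tag without the "u" extension, dict of "u" keywords).

    Simplifying assumptions: the tag is well-formed; two-letter subtags
    inside the extension are keys and longer subtags are their values;
    everything after a private-use "x" singleton is left untouched.
    """
    subtags = tag.split("-")
    base = []
    keywords = {}
    i = 0
    while i < len(subtags):
        s = subtags[i].lower()
        if i > 0 and s == "x":
            # Private-use sequence: keep the remainder verbatim.
            base.extend(subtags[i:])
            break
        if i > 0 and s == "u":
            # Consume the extension up to the next singleton subtag.
            i += 1
            key = None
            while i < len(subtags) and len(subtags[i]) > 1:
                part = subtags[i].lower()
                if len(part) == 2:
                    key = part
                    keywords[key] = []
                elif key is not None:
                    keywords[key].append(part)
                i += 1
            continue
        base.append(subtags[i])
        i += 1
    return "-".join(base), {k: "-".join(v) for k, v in keywords.items()}
```

For "de-DE-u-co-phonebk" this yields the plain tag "de-DE" plus the keyword mapping of "co" to "phonebk", so the same identifier can serve a translation process (which looks only at the first part) and a locale-aware process (which also reads the keywords).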

I do recognize that you need a separate *field* for “locale” (how language materials are packaged/delivered) from the source or target language of the content in ITS. But I think that the identifiers themselves should not be different from one another. For example, I can see something like the following:

   <someElement xml:lang="zh-Hans" its:filterLocale="zh-CN">中文</someElement>

Finally, you go out of your way to say:

--
Implementations of ITS 2.0 are not expected to process the "u" extension for further locale information as defined in RFC 6067<http://tools.ietf.org/html/rfc6067>.
--

I think you should reconsider this text. It’s not normative but might be read as a normative direction, and implementations of ITS 2.0 might need to interact with the “u” extension: Java 7, for example, has several built-in locales that make use of it. I think what you mean is that the extension is ignored in language fields (such as sourceLanguage)?

<chair hat=”on”> btw, you should use the www-international@ list instead of our public WG list going forwards. I have moved public-i18n-core@ to bcc: and copied the winter list for you :-)

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.





From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Monday, June 25, 2012 5:53 AM
To: Mark Davis
Cc: public-multilingualweb-lt@w3.org; public-i18n-core@w3.org
Subject: BCP 47 "t" extension follow up and locale identifier definition

Dear Mark, with CC to the MultilingualWeb LT and the i18n core public list,

I have an action ACTION-133 to follow up on the BCP 47 discussion we had with your contribution on 12 June. Thanks again for your presentation.

In our requirements document
http://www.w3.org/TR/2012/WD-its2req-20120524/

we have several requirements related to processes, see e.g.
http://www.w3.org/TR/2012/WD-its2req-20120524/#Process_Model


Since the "t" extension is also meant to express process related information, we want to coordinate the values that can be used via that extension with what we define - or just refer to them. What would be the best way to achieve this?

A related issue: we need to specify information about locales, see e.g.
http://www.w3.org/TR/2012/WD-its2req-20120524/#locale-filter

The current thinking about locale identifiers is here
http://www.w3.org/TR/2012/WD-its2req-20120524/#Identification_of_Language_and_Local

At the Dublin workshop there was already some feedback from Richard (IIRC): if we have a dedicated field for a locale identifier, then a basic BCP 47 language tag (without the underscore conversion) might do it. Do you have any thoughts on this?

Thanks a lot for your feedback in advance,

Felix

--
Felix Sasaki
DFKI / W3C Fellow

Received on Monday, 25 June 2012 19:06:37 GMT
