RE: ITS 2.0 LocaleFilter definition (Re: [Moderator Action] RE: BCP 47 "t" extension follow up and locale identifier definition) from Phillips, Addison on 2012-08-07 (www-international@w3.org from July to September 2012)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 7 Aug 2012 08:56:17 -0700
To: Felix Sasaki <fsasaki@w3.org>, Mark Davis ☕ <mark@macchiato.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC347708FAF4F1F@EX-SEA31-D.ant.amazon.com>
Hello Felix,

Thanks for the update. A few minor comments.

The examples show something like:

  <legalnotice
    its:localeFilterList="en-CA, fr-CA">
   <para>This legal notice is only for Canadian locales.</para>


However, if one is doing extended filtering, the range *-CA would cover all Canadian locales, including, for example, minority languages, such as the various aboriginal languages or Chinese.
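To make the difference concrete, here is a minimal Python sketch of RFC 4647 extended filtering (section 3.3.2) applied to a comma-separated filter list. The function names are illustrative and not taken from ITS 2.0. The sketch assumes well-formed tags and ranges, and it assumes that content applies when any range in the list matches.

```python
def extended_filter_match(lang_range: str, tag: str) -> bool:
    """Return True if `tag` matches `lang_range` under RFC 4647 extended filtering."""
    range_subtags = lang_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtags must match, unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # a wildcard matches zero or more subtags
        elif t >= len(tag_subtags):
            return False                # range subtags left over, tag exhausted
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip past a singleton (extension or "x")
        else:
            t += 1                      # skip a non-matching subtag in the tag
    return True


def locale_filter_applies(filter_list: str, tag: str) -> bool:
    """Assumed semantics: the content applies if any range in the list matches."""
    ranges = [item.strip() for item in filter_list.split(",") if item.strip()]
    return any(extended_filter_match(rng, tag) for rng in ranges)


print(locale_filter_applies("en-CA, fr-CA", "zh-Hant-CA"))  # False
print(locale_filter_applies("*-CA", "zh-Hant-CA"))          # True
```

With the explicit list, a Chinese-language Canadian locale is excluded; with the range *-CA it is included.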

Still, I think it’s valuable to have a comma-separated language priority list as an example, which is what I think you’re actually demonstrating here.

In any case, thanks for the changes. This looks good.

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.




From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Tuesday, August 07, 2012 2:28 AM
To: Phillips, Addison; Mark Davis ☕; public-multilingualweb-lt@w3.org; www-international@w3.org
Subject: Re: ITS 2.0 LocaleFilter definition (Re: [Moderator Action] RE: BCP 47 "t" extension follow up and locale identifier definition)

Hi Addison, all again,

FYI, I had an action item to discuss extended filtering in the MLW-LT working group. It looks like we will have consensus on having extended filtering, see the latest edits at

http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#LocaleFilter


Best,

Felix
2012/7/23 Felix Sasaki <fsasaki@w3.org>
Hi Addison, all,

coming back to the "locale" definition in ITS 2.0 we had discussed a while ago: Shaun McCance from the MLW-LT working group has created a locale filter definition, see

http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#LocaleFilter


I think this implements 1-4 below - do you want to have a look?

Thanks,

Felix
2012/6/26 Phillips, Addison <addison@lab126.com>
Hi Felix,

You missed: “remove the hyphen-to-underscore conversion” :-). Otherwise, looks like what we’d suggested.

Addison

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Tuesday, June 26, 2012 5:59 AM
To: Mark Davis ☕; Phillips, Addison; public-multilingualweb-lt@w3.org; www-international@w3.org
Subject: Re: BCP 47 "t" extension follow up and locale identifier definition

(Apologies for cross-posting and thanks to Addison for pointing out www-international),

Thanks for your feedback, Addison and Mark. To summarize the main points:

1) We use BCP 47 language tags in a dedicated piece of markup, e.g.
<span its:filterLocale="de-ch,fr-ch,it-ch">Swiss legal notice, only to be taken into account for localization into a Swiss locale</span>
2) We use comma as the delimiter instead of semicolon, see "span" element above
3) We need to make the relation to BCP 47 filtering clear.
4) We don't need text to point out the "u" extension - people may or may not use it, but if we go for BCP47 people can use any extension they want.
5) WRT the tags that Mark mentioned in 1. below: are the "transform" XML files here http://unicode.org/cldr/trac/browser/tags/release-21-0-2/common/bcp47 the currently registered fields for transforms?
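
Points 1) to 3) can be illustrated with a short Python sketch: split the comma-delimited attribute value into language ranges, then apply RFC 4647 basic filtering (section 3.3.1) to a candidate tag. The helper names are hypothetical, and which matching scheme ITS 2.0 should use was still under discussion at this point in the thread, so basic filtering is only an assumption here.

```python
def parse_filter_list(value: str) -> list[str]:
    """Split a comma-delimited attribute value into language ranges."""
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_filter_match(lang_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: the range equals the tag or is a prefix
    of the tag that ends at a hyphen boundary (case-insensitive)."""
    rng, candidate = lang_range.lower(), tag.lower()
    return rng == "*" or candidate == rng or candidate.startswith(rng + "-")


ranges = parse_filter_list("de-ch,fr-ch,it-ch")
print(ranges)                                               # ['de-ch', 'fr-ch', 'it-ch']
print(any(basic_filter_match(r, "fr-CH") for r in ranges))  # True
print(any(basic_filter_match(r, "fr-FR") for r in ranges))  # False
```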

Felix

2012/6/25 Mark Davis ☕ <mark@macchiato.com>
> Since the "t" extension is also meant to express process related information, we want to coordinate the values that can be used via that extension with what we define - or just refer to them. What would be the best way to achieve this?
There are two possible ways that would work.

 1.  Reference LDML for the tags, and propose registrations for any additional ones you need.
 2.  Do #1, but because you have a separate field (that doesn't have to be a BCP47 tag), you can reserve strings that could not be BCP47 subtags for your own use.
We do something similar for short TZ identifiers. We use UN LOCODE codes where they exist; where they don't, we use codes that are longer or shorter so that they will not collide with future UN LOCODE codes.
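
As a rough illustration of option 2: a BCP 47 subtag is 1 to 8 ASCII letters or digits, so any string that is longer, or that contains another character, can never collide with a current or future subtag. The Python sketch below checks only that syntactic shape; it does not consult any registry, and the sample values are made up for the example.

```python
import re

# Syntactic shape of a single BCP 47 subtag: 1 to 8 ASCII alphanumerics.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def could_be_bcp47_subtag(value: str) -> bool:
    """Return True if `value` has the shape of a BCP 47 subtag."""
    return _SUBTAG.fullmatch(value) is not None


def is_safe_reserved_value(value: str) -> bool:
    """A reserved string is collision-free if it cannot be a subtag."""
    return not could_be_bcp47_subtag(value)


print(is_safe_reserved_value("phonebk"))      # False: could be a subtag
print(is_safe_reserved_value("postediting"))  # True: longer than 8 characters
print(is_safe_reserved_value("my_value"))     # True: contains an underscore
```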

> locales...

I agree with Addison on all of the locale issues.

________________________________
Mark<https://plus.google.com/114199149796022210033>

— Il meglio è l’inimico del bene —

On Mon, Jun 25, 2012 at 12:06 PM, Phillips, Addison <addison@lab126.com> wrote:
I have a number of thoughts about the locales question. I have not talked to Mark about this and he may not agree with any or all of the below.

It would be more useful, in my opinion, to define section 5.1.3 as a BCP 47 language priority list (with language tags between the separators). I would tend to prefer commas to semi-colons (since these are more common in HTML and in headers, etc.). This “filter” isn’t quite the same thing as BCP 47 “filtering” matching schemes (or is it?) and that probably should be highlighted.

The link to the locale ID section didn’t work, but I found it searching the document. I was dismayed to see the underscore conversion. What purpose does it serve? I’ve found that using language tags with no hyphen/underscore mapping makes for a cleaner, less complex implementation. For one thing, a common case is likely to be a mapping directly between the two. Inserting a transformation adds needless complexity at the markup level. [An implementation can internally map it, if necessary.]

I don’t particularly care for the somewhat artificial distinction between a language tag and a locale identifier in the document. BCP 47 makes having such a separation much less relevant. That is, “de-DE” is a perfectly useful locale identifier—and it’s a valid language tag as well. The “u” extension doesn’t ruin this relationship: “de-DE-u-co-phonebk” is also a valid language tag (besides being useful as a locale identifier). The extra subtags may be ignorable in a translation process, but this doesn’t ruin a locale identifier’s utility as a language tag. Where I encounter the most issues tends to be when mapping must be done between the two concepts instead of tags being useful in both contexts.
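
To show how a translation process could ignore the extra subtags without needing a second kind of identifier, here is a small Python sketch. It assumes a well-formed tag and cuts it at the first singleton subtag, which drops extensions such as "u" as well as any private-use ("x") sequence. The function name is hypothetical.

```python
def strip_extensions(tag: str) -> str:
    """Drop extension and private-use subtags from a well-formed language tag."""
    subtags = tag.split("-")
    kept = [subtags[0]]
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton starts an extension or private-use sequence;
            # everything from here on is ignorable for this purpose.
            break
        kept.append(subtag)
    return "-".join(kept)


print(strip_extensions("de-DE-u-co-phonebk"))  # de-DE
print(strip_extensions("de-DE"))               # de-DE
```

The same string serves as both the locale identifier and the language tag; the mapping is a truncation, not a conversion between two formats.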

I do recognize that you need a separate *field* for “locale” (how language materials are packaged/delivered) from the source or target language of the content in ITS. But I think that the identifiers themselves should not be different from one another. For example, I can see something like the following:

   <someElement xml:lang="zh-Hans" its:filterLocale="zh-CN">中文</someElement>

Finally, you go out of your way to say:

--
Implementations of ITS 2.0 are not expected to process the "u" extension for further locale information as defined in RFC 6067<http://tools.ietf.org/html/rfc6067>.
--

I think you should reconsider this text: it’s not normative but might be read as a normative direction, and implementations of ITS 2.0 might need to interact with the “u” extension: Java 7, for example, has several built-in locales that make use of the extension. I think what you mean is that the extension is ignored in language fields (such as sourceLanguage, etc.)??

<chair hat=”on”> btw, you should use the www-international@ list instead of our public WG list going forwards. I have moved public-i18n-core@ to bcc: and copied the winter list for you :-)

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.





From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Monday, June 25, 2012 5:53 AM
To: Mark Davis
Cc: public-multilingualweb-lt@w3.org; public-i18n-core@w3.org
Subject: BCP 47 "t" extension follow up and locale identifier definition

Dear Mark, with CC to the MultilingualWeb LT and the i18n core public list,

I have an action ACTION-133 to follow up on the BCP 47 discussion we had with your contribution on 12 June. Thanks again for your presentation.

In our requirements document
http://www.w3.org/TR/2012/WD-its2req-20120524/

we have several requirements related to processes, see e.g.
http://www.w3.org/TR/2012/WD-its2req-20120524/#Process_Model


Since the "t" extension is also meant to express process related information, we want to coordinate the values that can be used via that extension with what we define - or just refer to them. What would be the best way to achieve this?

A related issue: we need to specify information about locales, see e.g.
http://www.w3.org/TR/2012/WD-its2req-20120524/#locale-filter

The current thinking about locale identifiers is here
http://www.w3.org/TR/2012/WD-its2req-20120524/#Identification_of_Language_and_Local

At the Dublin workshop there was already some feedback from Richard (IIRC): if we have a dedicated field for a locale identifier, then a basic BCP 47 language tag (without the underscore conversion) might do it. Do you have any thoughts on this?

Thanks a lot for your feedback in advance,

Felix

--
Felix Sasaki
DFKI / W3C Fellow





Received on Tuesday, 7 August 2012 15:56:54 UTC