Re: Comment on LTLI WD from Mark Davis on 2006-05-02 (www-i18n-comments@w3.org from May 2006)

From: Mark Davis <mark.davis@icu-project.org>
Date: Tue, 02 May 2006 10:25:49 -0700
To: Felix Sasaki <fsasaki@w3.org>
CC: Addison Phillips <addison@yahoo-inc.com>, www-i18n-comments@w3.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4457961D.9050400@icu-project.org>
>
>
> There is not yet an Internet standard for locale identifiers. However, 
> there is one for natural language identifiers, [RFC 3066bis] 
> <http://www.w3.org/International/core/langtags/#rfc3066bis>. Since 
> these language identifiers can imply a locale and in the absence of a 
> standard for locale interchange, language identifiers are often used 
> by software as the source for locale identification. Language and 
> locale are distinct properties and should not be used interchangeably, 
> but there is a relationship between these parameters in the area of 
> resource selection and localization.
>
> The danger of using one for the other lies in the distinction between 
> them. A language preference controls only the language of the textual 
> content, while locale objects are used to control culturally affected 
> (software) behavior within the system. For example, making the 
> assumption that the language parameter /ja/ (Japanese) means the data 
> should be presented in the locale-determined format for Japan could be 
> a mistake if the requester actually lives and works in Australia.
>
This overstates the issue. There is no danger in using a language tag for 
locale identification. The danger is in presuming that the region code 
in the language tag is a reliable indication of the physical location or 
governing policies for the user. There is also the issue of whether this 
document is to give workable recommendations, or only survey the field. 
I find the former more useful.

Here is a suggested reformulation, drawing on Addison's message of 4/27.

=>

The notion of a locale is a computing concept, not a real world object. The actual definition depends entirely on the operating environment, programming language, and application's requirements. However, virtually all specifications of locale identifiers share some core features, and allow for the creation of functional, interoperable applications.

The minimal requirement is the ability to specify the natural language; thus there is industry convergence on the use of [RFC 3066bis] <http://www.w3.org/International/core/langtags/#rfc3066bis> as the core of a locale identifier. For example, [CLDR] uses [RFC 3066bis] as the core of a locale identifier, and provides syntax for extensions for non-linguistic information, such as preferred currency or timezone. [other examples...]

For locale identifiers it is common (and recommended) to allow either "_" or "-" as subtag delimiters on input, and canonicalize to "_" for uniqueness on output. When extracting a language identifier from a locale identifier, any "_" separators must be converted to "-", and any extensions need to be either removed or encapsulated as extensions (such as with "x-" syntax).
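
A minimal sketch of the delimiter handling described in the paragraph above, in Python. The function names are hypothetical, and the "@key=value" form for the non-linguistic extension part is an assumption borrowed from ICU-style identifiers; nothing here is part of the proposed text. The "x-" encoding shown is just one possible way to encapsulate extensions, and the sketch does not enforce subtag length limits or normalize case.

```python
import re


def canonical_locale_id(raw):
    """Accept "-" or "_" between subtags on input; always emit "_"."""
    # Assumption: anything after "@" is an extension part and is left untouched.
    base, sep, ext = raw.strip().partition("@")
    base = "_".join(part for part in re.split(r"[-_]", base) if part)
    return base + sep + ext


def language_tag(locale_id, keep_extensions=False):
    """Extract a language identifier: "_" becomes "-", extensions are
    either dropped or encapsulated after a private-use "x-" singleton."""
    base, _, ext = canonical_locale_id(locale_id).partition("@")
    tag = base.replace("_", "-")
    # Hypothetical encoding: split the extension on non-alphanumerics.
    private = [part for part in re.split(r"[^A-Za-z0-9]+", ext) if part]
    if keep_extensions and private:
        tag += "-x-" + "-".join(private)
    return tag
```

For instance, canonical_locale_id("en-US") and canonical_locale_id("en_US") both give "en_US", while language_tag("de_DE@currency=EUR") drops the currency preference and gives "de-DE".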

There is one area with a significant semantic difference between locale and language identifiers. In locale identifiers, the region code is often presumed to be an indication of the physical location or governing policies for the user; this is not the case for language identifiers, where the region is used only to discriminate regional variants in language usage. Thus some degree of caution should be used when heuristically using language identifiers as locale identifiers.



Felix Sasaki wrote:
> Hi Addison, Mark, all,
>
> I started implementing these comments, and the discussion on the locale
> versus language example at
> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0020.html .
>
> please have a look at  http://www.w3.org/International/core/langtags/ .
> I have not used change markup, since in this early stage I expect a
> lot of changes.
>
>
> Mark Davis wrote:
>   
>> I think we need to have a clear discussion about what constitutes a
>> locale before progressing further. To my mind (language, timezone),
>> such as (en_US, Etc/GMT) is one of the clearest cases of a locale, so I
>> don't know what your mental image of a locale is.
>>
>> Addison Phillips wrote:
>>     
>>> Hi folks! Nice to see this work progressing...
>>>
>>> ---
>>> Section 1.1: The text describing locales is vague and/or possibly
>>> sloppy. I think you would be better off being very clear that RFC
>>> 3066/successor refers to language identification ONLY. Locales can be
>>> inferred from language identifiers (i.e. Accept-Language) or use
>>> identical tags in data items (elements, attributes, headers, etc.)
>>> that serve only the purpose of locale identification. This will help
>>> preserve (for example) clarity in specs such as XSL F&O where there
>>> has never been a locale identifier...
>>>       
>
> I made a new try, please have a look.
>
>   
>>> Section 1.2: eliminate comma from first sentence.
>>>       
>
> done.
>
>   
>>> Section 1.2: "However, such formats might apply the definitions made
>>> in this specification, see e.g. [LDML]." This sentence is unclear.
>>> Change to say: "One possible source of locale data and data formats is
>>> [LDML]"??
>>>       
>
> done.
>
>   
>>> Section 1.3: "Web Service Internationalization" should read "Web
>>> services Internationalization"
>>>       
>
> done.
>
>   
>>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single section.
>>>       
>
> done.
>
>   
>>> Section 2.2:
>>>       
>
> following Martin's proposal at
> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0006.html
> , this is now a subsection 1.4.
>
>>> This section mixes languages and locales as if they were
>   
>>> the same thing. I think this is dangerous. We spent a lot of time in
>>> WSTF building text to deal with this in a purposeful way. Language
>>> tags are for languages. Locales can be inferred from language tags
>>> (the locale mechanism used inside your programming environment may use
>>> very different identifiers, cf. LCIDs). Thus item (2) in the list is
>>> wrong.
>>>
>>> Comment: I think you should import text (with minor editing) from Web
>>> Services Usage Scenarios to describe languages and locales and only
>>> then launch into values. In particular, I commend you to Section 3.1
>>> and Section 3.1.1 of
>>> http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
>>>       
>
> I reused and adapted section 3.1.1 of ws-i18n-scenarios, please have a look.
>
>   
>>> Section 2.2: The following is correctly identified as a Bad Thing, but
>>> I would suggest you remove it altogether 
>>>       
>
> done.
>
>>> because you suggest that it
>   
>>> is sometimes okay to infer this. This is just bad practice or an
>>> application assumption ("default currency"). In fact, this is Section
>>> I-018 of WSUS
>>> (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018)
>>> "Note that sometimes information is heuristically inferred from
>>> language or locale identifiers. For example, software might infer that
>>> if the locale is "fr-FR" that the user's preferred currency is EUR.
>>> However, that is only a guess because that locale ID does not specify
>>> the preferred currency. The user may actually be living in the UK, and
>>> do most transactions in GBP"
>>>
>>> Section 2.2: Example 1: This is a bad example because time zone is
>>> always orthogonal to locale (and language). If you're going to say
>>> anything about time zones, you should probably require the use of
>>> Olson identifiers in specifications (a subject beyond the scope of
>>> this document??)
>>>       
>
> I got rid of the example.
>
>   
>>> Section 2.3: references are to RFC 3066bis? Should be to draft-matching.
>>>       
>
> done & changed in response to Martin's comment, is now section 2.2.
>
>   
>>> Section 3: Item 3: Specifications that define operations on language
>>> values really should accept both basic and extended ranges.
>>>       
>
> does that mean that we break nearly all existing operations on language
> values? I'm looking for a conformance criterion which allows CSS and
> folks to say "in CSS 2.0, we do basic ranges, and that's fine". A new
> version of CSS or spec XXX should do both, but I don't want to break
> existing RECs.
>
>
>   
>>> What's
>>> important to specify is the matching scheme itself.
>>>
>>> Item 5: I don't like this item at all. 
>>>       
>
> I got rid of it.
>
>>> If you want to use an IRI to
>   
>>> point to some "information item", fine: that's your own choice and
>>> none of our business. But this requirement as written means nothing
>>> and will only serve to confuse people. I think you'd be better off
>>> sticking with saying something like "use the same format for locale
>>> IDs as language tags". If someone can propose a workable IRI solution,
>>> you can then incorporate that. The point (I think) is to avoid having
>>> nine ways of identifying a locale.
>>>
>>> Editorial: In the note, this phrase "are conform to these criteria"
>>> should say "conformant"
>>>       
>
> done.
>
>   
>>> General: I really think you should write about language identification
>>> and then about inferring locale from it. In particular, I would
>>> suggest that you consider adding something like these requirements:
>>>       
>
> I'd like to discuss these proposals with the core group first (see "cc"
> of this mail).
>
>
>   
>>> - Specifications MUST NOT use the xml:lang attribute to convey locale
>>> information. // specs must not promote poor behavior. Xml:lang
>>> identifies natural language usage in a document.
>>>       
>
> o.k.
>
>   
>>> - Specifications MUST define the default behavior for matching of
>>> language content (see draft-matching, Section 3.4.1)
>>>       
>
> same concern as above: danger of breaking existing RECs. We will get *a
> lot* of last call comments with such a criterion ..
>
>   
>>> - Specifications that use HTTP 1.1 SHOULD allow an application to
>>> infer a user's locale preferences from the HTTP Accept-Language
>>> header. // or something like this, eh?
>>>       
>
> how does this criterion relate to the following? It sounds like "HTTP
> 1.1" will be an exception to the following criterion?
>
>   
>>> - Specifications that define the exchange of locale information MUST
>>> define locale identifiers in terms of RFC 3066bis language tags and
>>> MAY define specific extensions or private-use codes to identify
>>> additional information. // this is the big one
>>>       
>
>
> Looking forward to more feedback.
>
> Best regards, Felix.
>
>
>   
>>> ----
>>> As always, my best regards,
>>>
>>> Addison
>>>
>>> Addison Phillips
>>> Internationalization Architect - Yahoo! Inc.
>>>
>>> Internationalization is an architecture.
>>> It is not a feature.
>>>
>>>
>>>
>>>   
>>>       
>
>
>
Received on Tuesday, 2 May 2006 17:26:17 UTC