Re: Comment on LTLI WD from Mark Davis on 2006-05-02 (public-i18n-core@w3.org from April to June 2006)

From: Mark Davis <mark.davis@icu-project.org>
Date: Tue, 02 May 2006 10:28:19 -0700
To: Mark Davis <mark.davis@icu-project.org>
CC: Felix Sasaki <fsasaki@w3.org>, Addison Phillips <addison@yahoo-inc.com>, www-i18n-comments@w3.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <445796B3.9010001@icu-project.org>
Sorry for the extra tags; Thunderbird converted some superfluous tags 
left in when I copied and modified the text.

Mark

Mark Davis wrote:
>
>>
>>
>> There is not yet an Internet standard for locale identifiers. 
>> However, there is one for natural language identifiers, [RFC 3066bis] 
>> <http://www.w3.org/International/core/langtags/#rfc3066bis>. Since 
>> these language identifiers can imply a locale and in the absence of a 
>> standard for locale interchange, language identifiers are often used 
>> by software as the source for locale identification. Language and 
>> locale are distinct properties and should not be used 
>> interchangeably, but there is a relationship between these parameters 
>> in the area of resource selection and localization.
>>
>> The danger of using one for the other lies in the distinction between 
>> them. A language preference controls only the language of the textual 
>> content, while locale objects are used to control culturally affected 
>> (software) behavior within the system. For example, making the 
>> assumption that the language parameter "ja" (Japanese) means the data 
>> should be presented in the locale-determined format for Japan could 
>> be a mistake if the requester actually lives and works in Australia.
>>
> This overstates the issue. There is no danger in using a language tag 
> for locale identification. The danger is in presuming that the region 
> code in the language tag is a reliable indication of the physical 
> location or governing policies for the user. There is also the issue 
> of whether this document is to give workable recommendations, or only 
> survey the field. I find the former more useful.
>
> Here is a suggested reformulation, drawing on Addison's message of 4/27.
>
> =>
>
> The notion of a locale is a computing concept, not a real world 
> object. The actual definition depends entirely on the operating 
> environment, programming language, and application's requirements. 
> However, virtually all specifications of locale identifiers share some 
> core features, and allow for the creation of functional, interoperable 
> applications.
>
> The minimal requirement is the ability to specify the natural 
> language; thus there is industry convergence on the use of [RFC 
> 3066bis] <http://www.w3.org/International/core/langtags/#rfc3066bis> 
> as the core of a locale identifier. For example, [CLDR] uses 
> [RFC 3066bis] as the core of a locale identifier, and provides syntax 
> for extensions for non-linguistic information, such as preferred 
> currency or time zone. [other examples...]
>
> For locale identifiers, it is common (and recommended) to allow either 
> "_" or "-" as subtag delimiters on input, and to canonicalize to "_" 
> on output so that each locale identifier has a unique form. When 
> extracting a language identifier from a locale identifier, any "_" 
> separators must be converted to "-", and any extensions need to be 
> either removed or encapsulated as extensions (such as with the "x-" 
> private-use syntax).
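A minimal Python sketch of the delimiter handling just described. It assumes ICU-style locale identifiers in which non-linguistic extensions follow "@" as key=value pairs separated by ";" (for example "de_DE@currency=EUR"). That syntax, the function names, and the policy of dropping any extension that does not fit the 1-8 character subtag limit are illustrative assumptions, not part of the proposed text.

```python
import re

# A language-tag subtag is 1-8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def canonicalize_locale_id(raw: str) -> str:
    """Accept "_" or "-" between subtags on input; emit "_" on output.

    Assumption: anything after "@" is ICU-style keyword data and is left
    untouched, so hyphens inside keyword values are not rewritten.
    """
    base, sep, keywords = raw.partition("@")
    return base.replace("-", "_") + sep + keywords


def locale_to_language_tag(locale_id: str, keep_extensions: bool = False) -> str:
    """Derive a language identifier from a (hypothetical ICU-style) locale ID.

    "_" separators become "-". Keywords after "@" are removed by default;
    with keep_extensions=True each key=value pair is carried over as
    private-use subtags ("x-key-value") when both parts are valid subtags,
    and dropped otherwise.
    """
    base, _, keywords = locale_id.partition("@")
    subtags = base.replace("_", "-").split("-")
    if keep_extensions and keywords:
        private = []
        for pair in keywords.split(";"):
            key, _, value = pair.partition("=")
            if _SUBTAG.fullmatch(key) and _SUBTAG.fullmatch(value):
                private += [key, value]
        if private:
            subtags += ["x", *private]
    return "-".join(subtags)
```

Keeping the "@" keywords out of the delimiter conversion avoids rewriting hyphens inside values such as time zone names (e.g. "America/Port-au-Prince"), which could never be valid subtags anyway.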
>
> There is one area with a significant semantic difference between 
> locale and language identifiers. In locale identifiers, the region 
> code is often presumed to be an indication of the physical location or 
> governing policies for the user; this is not the case for language 
> identifiers, where the region is used only to discriminate regional 
> variants in language usage. Thus some degree of caution should be used 
> when heuristically using language identifiers as locale identifiers.
>
>
>
> Felix Sasaki wrote:
>> Hi Addison, Mark, all,
>>
>> I started implementing these comments, as well as the discussion on the
>> locale versus language example at
>> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0020.html
>>
>> Please have a look at http://www.w3.org/International/core/langtags/
>> I have not used change markup, since at this early stage I expect a
>> lot of changes.
>>
>>
>> Mark Davis wrote:
>>  
>>> I think we need to have a clear discussion about what constitutes a
>>> locale before progressing further. To my mind, a (language, timezone)
>>> pair such as (en_US, Etc/GMT) is one of the clearest cases of a locale,
>>> so I don't know what your mental image of a locale is.
>>>
>>> Addison Phillips wrote:
>>>    
>>>> Hi folks! Nice to see this work progressing...
>>>>
>>>> ---
>>>> Section 1.1: The text describing locales is vague and/or possibly
>>>> sloppy. I think you would be better off being very clear that RFC
>>>> 3066 and its successor refer to language identification ONLY. Locales
>>>> can be inferred from language identifiers (e.g. Accept-Language) or
>>>> use identical tags in data items (elements, attributes, headers, etc.)
>>>> that serve only the purpose of locale identification. This will help
>>>> preserve (for example) clarity in specs such as XSL F&O where there
>>>> has never been a locale identifier...
>>>>       
>>
>> I made a new try, please have a look.
>>
>>  
>>>> Section 1.2: eliminate comma from first sentence.
>>>>       
>>
>> done.
>>
>>  
>>>> Section 1.2: "However, such formats might apply the definitions made
>>>> in this specification, see e.g. [LDML]." This sentence is unclear.
>>>> Change to say: "One possible source of locale data and data formats is
>>>> [LDML]"??
>>>>       
>>
>> done.
>>
>>  
>>>> Section 1.3: "Web Service Internationalization" should read "Web
>>>> services Internationalization"
>>>>       
>>
>> done.
>>
>>  
>>>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single 
>>>> section.
>>>>       
>>
>> done.
>>
>>  
>>>> Section 2.2:
>>>>       
>>
>> This is now subsection 1.4, following Martin's proposal at
>> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0006.html
>>
>>>> This section mixes languages and locales as if they were
>>>> the same thing. I think this is dangerous. We spent a lot of time in
>>>> WSTF building text to deal with this in a purposeful way. Language
>>>> tags are for languages. Locales can be inferred from language tags
>>>> (the locale mechanism used inside your programming environment may use
>>>> very different identifiers, cf. LCIDs). Thus item (2) in the list is
>>>> wrong.
>>>>
>>>> Comment: I think you should import text (with minor editing) from Web
>>>> Services Usage Scenarios to describe languages and locales and only
>>>> then launch into values. In particular, I commend you to Section 3.1
>>>> and Section 3.1.1 of
>>>> http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
>>>>       
>>
>> I reused and adapted section 3.1.1 of ws-i18n-scenarios, please have 
>> a look.
>>
>>  
>>>> Section 2.2: The following is correctly identified as a Bad Thing, but
>>>> I would suggest you remove it altogether, because you suggest that it
>>>> is sometimes okay to infer this.
>>
>> done.
>>
>>>> This is just bad practice or an
>>>> application assumption ("default currency"). In fact, this is Section
>>>> I-018 of WSUS
>>>> (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018)
>>>> "Note that sometimes information is heuristically inferred from
>>>> language or locale identifiers. For example, software might infer that
>>>> if the locale is "fr-FR" that the user's preferred currency is EUR.
>>>> However, that is only a guess because that locale ID does not specify
>>>> the preferred currency. The user may actually be living in the UK, and
>>>> do most transactions in GBP"
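The point of the quoted note can be made concrete: a currency derived from the region subtag should be marked as a guess and should yield to an explicit user setting. The sketch below assumes a small, hypothetical region-to-currency table (real data would come from a source such as CLDR) and hypothetical function names; it only looks at two-letter region subtags.

```python
from typing import Optional, Tuple

# Hypothetical sample data; not a complete or authoritative mapping.
REGION_DEFAULT_CURRENCY = {"FR": "EUR", "GB": "GBP", "US": "USD", "JP": "JPY"}


def region_subtag(language_tag: str) -> Optional[str]:
    """Return the first two-letter region subtag, if any."""
    for subtag in language_tag.replace("_", "-").split("-")[1:]:
        if len(subtag) == 1:
            # A singleton ("x", ...) starts extensions or private use: stop.
            break
        if len(subtag) == 2 and subtag.isalpha():
            return subtag.upper()
    return None


def preferred_currency(
    language_tag: str, explicit: Optional[str] = None
) -> Tuple[Optional[str], bool]:
    """Return (currency, inferred); inferred=True marks a heuristic guess."""
    if explicit:
        # An explicit user preference always wins over any inference.
        return explicit, False
    region = region_subtag(language_tag)
    guess = REGION_DEFAULT_CURRENCY.get(region) if region else None
    return guess, guess is not None
```

For the user in the quoted example, whose locale is "fr-FR" but who lives in the UK, the application would pass explicit="GBP" and the region-based guess is never consulted.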
>>>>
>>>> Section 2.2: Example 1: This is a bad example because time zone is
>>>> always orthogonal to locale (and language). If you're going to say
>>>> anything about time zones, you should probably require the use of
>>>> Olson identifiers in specifications (a subject beyond the scope of
>>>> this document??)
>>>>       
>>
>> I got rid of the example.
>>
>>  
>>>> Section 2.3: references are to RFC 3066bis? Should be to 
>>>> draft-matching.
>>>>       
>>
>> done & changed in response to Martin's comment, is now section 2.2.
>>
>>  
>>>> Section 3: Item 3: Specifications that define operations on language
>>>> values really should accept both basic and extended ranges.
>>>>       
>>
>> Does that mean that we break nearly all existing operations on language
>> values? I'm looking for a conformance criterion which allows CSS and
>> folks to say "in CSS 2.0, we do basic ranges, and that's fine". A new
>> version of CSS or spec XXX should do both, but I don't want to break
>> existing RECs.
>>
>>
>>  
>>>> What's important to specify is the matching scheme itself.
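To illustrate why the matching scheme matters, here is a minimal sketch of basic and extended filtering. It follows the descriptions in RFC 4647, sections 3.3.1 and 3.3.2 (the published form of draft-matching), not the 2006 draft text itself; the function names are hypothetical.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Basic filtering: the range equals the tag, or is a prefix of it
    ending at a "-" boundary. "*" matches every tag. Case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter_match(language_range: str, tag: str) -> bool:
    """Extended filtering: "*" may stand for any subtag in the range, and
    unmatched tag subtags may be skipped unless they are singletons."""
    rng = language_range.lower().split("-")
    sub = tag.lower().split("-")
    # The first subtags must match (or the range must start with "*").
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    i = j = 1
    while i < len(rng):
        if rng[i] == "*":          # wildcard: skip it in the range
            i += 1
        elif j >= len(sub):        # range has more subtags than the tag
            return False
        elif rng[i] == sub[j]:     # matching subtags: advance both
            i += 1
            j += 1
        elif len(sub[j]) == 1:     # cannot skip past a singleton ("x", ...)
            return False
        else:                      # skip an unmatched tag subtag
            j += 1
    return True
```

With these definitions the range "de-DE" matches the tag "de-Latn-DE" only under extended filtering, which is one reason a specification needs to say which scheme it uses.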
>>>>
>>>> Item 5: I don't like this item at all.       
>>
>> I got rid of it.
>>
>>>> If you want to use an IRI to
>>>> point to some "information item", fine: that's your own choice and
>>>> none of our business. But this requirement as written means nothing
>>>> and will only serve to confuse people. I think you'd be better off
>>>> sticking with saying something like "use the same format for locale
>>>> IDs as language tags". If someone can propose a workable IRI solution,
>>>> you can then incorporate that. The point (I think) is to avoid having
>>>> nine ways of identifying a locale.
>>>>
>>>> Editorial: In the note, this phrase "are conform to these criteria"
>>>> should say "conformant"
>>>>       
>>
>> done.
>>
>>  
>>>> General: I really think you should write about language identification
>>>> and then about inferring locale from it. In particular, I would
>>>> suggest that you consider adding something like these requirements:
>>>>       
>>
>> I'd like to discuss these proposals with the core group first (see "cc"
>> of this mail).
>>
>>
>>  
>>>> - Specifications MUST NOT use the xml:lang attribute to convey locale
>>>> information. // specs must not promote poor behavior. xml:lang
>>>> identifies natural language usage in a document.
>>>>       
>>
>> o.k.
>>
>>  
>>>> - Specifications MUST define the default behavior for matching of
>>>> language content (see draft-matching, Section 3.4.1)
>>>>       
>>
>> Same concern as above: danger of breaking existing RECs. We will get *a
>> lot* of last call comments with such a criterion.
>>
>>  
>>>> - Specifications that use HTTP 1.1 SHOULD allow an application to
>>>> infer a user's locale preferences from the HTTP Accept-Language
>>>> header. // or something like this, eh?
>>>>       
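If a specification adopts the suggestion to infer locale preferences from Accept-Language, the receiving application has to order the header's language ranges by q-value and then choose among the locales it supports. A minimal sketch, assuming the q-value syntax of HTTP/1.1 (RFC 2616, section 14.4) and the "lookup" scheme later published in RFC 4647, section 3.4; the function names and the fallback default are hypothetical. The result is only a starting guess at the user's locale, subject to the caution about region codes expressed earlier in this thread.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return language ranges ordered by descending q-value.

    Ties keep header order; ranges with q=0 (or an unparsable q) are dropped.
    """
    weighted = []
    for index, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        language_range = parts[0]
        if not language_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            weighted.append((-q, index, language_range))
    weighted.sort()
    return [language_range for _, _, language_range in weighted]


def lookup(ranges: List[str], available: List[str], default: str) -> str:
    """Pick one available tag by progressively truncating each range."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in ranges:
        if language_range == "*":
            continue  # the wildcard carries no information for lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # Never leave a trailing singleton such as "x".
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

Lookup returns exactly one tag, which suits choosing a single locale for formatting; filtering, by contrast, returns every matching item and suits selecting content.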
>>
>> How does this criterion relate to the following? It sounds like "HTTP
>> 1.1" will be an exception to the following criterion?
>>
>>  
>>>> - Specifications that define the exchange of locale information MUST
>>>> define locale identifiers in terms of RFC 3066bis language tags and
>>>> MAY define specific extensions or private-use codes to identify
>>>> additional information. // this is the big one
>>>>       
>>
>>
>> Looking forward to more feedback.
>>
>> Best regards, Felix.
>>
>>
>>  
>>>> ----
>>>> As always, my best regards,
>>>>
>>>> Addison
>>>>
>>>> Addison Phillips
>>>> Internationalization Architect - Yahoo! Inc.
>>>>
>>>> Internationalization is an architecture.
>>>> It is not a feature.
>>>>
>>>>
>>>>
>>>>         
>>
>>
>>   
>
>
>
Received on Tuesday, 2 May 2006 17:28:36 UTC