- From: Mark Davis <mark.davis@icu-project.org>
- Date: Tue, 02 May 2006 10:28:19 -0700
- To: Mark Davis <mark.davis@icu-project.org>
- CC: Felix Sasaki <fsasaki@w3.org>, Addison Phillips <addison@yahoo-inc.com>, www-i18n-comments@w3.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Sorry for the extra tags; Thunderbird converted some superfluous tags
left in when I copied and modified the text.
Mark
Mark Davis wrote:
>
>>
>>
>> There is not yet an Internet standard for locale identifiers.
>> However, there is one for natural language identifiers, [RFC 3066bis]
>> <http://www.w3.org/International/core/langtags/#rfc3066bis>. Because
>> these language identifiers can imply a locale, and because there is no
>> standard for locale interchange, software often uses language
>> identifiers as the source for locale identification. Language and
>> locale are distinct properties and should not be used
>> interchangeably, but there is a relationship between these parameters
>> in the area of resource selection and localization.
>>
>> The danger of using one for the other lies in the distinction between
>> them. A language preference controls only the language of the textual
>> content, while locale objects are used to control culturally affected
>> (software) behavior within the system. For example, making the
>> assumption that the language parameter /ja/ (Japanese) means the data
>> should be presented in the locale-determined format for Japan could
>> be a mistake if the requester actually lives and works in Australia.
>>
> This overstates the issue. There is no danger in using a language tag
> for locale identification. The danger is in presuming that the region
> code in the language tag is a reliable indication of the physical
> location or governing policies for the user. There is also the issue
> of whether this document is to give workable recommendations, or only
> to survey the field. I find the former more useful.
>
> Here is a suggested reformulation, drawing on Addison's message of 4/27.
>
> =>
>
> The notion of a locale is a computing concept, not a real world
> object. The actual definition depends entirely on the operating
> environment, programming language, and application's requirements.
> However, virtually all specifications of locale identifiers share some
> core features, and allow for the creation of functional, interoperable
> applications.
>
> The minimal requirement is the ability to specify the natural
> language; thus there is industry convergence on the use of [RFC
> 3066bis] <http://www.w3.org/International/core/langtags/#rfc3066bis>
> as the core of a locale identifier. For example, [CLDR] uses [RFC
> 3066bis] as the core of a locale identifier, and provides syntax for
> extensions for non-linguistic information, such as preferred currency
> or timezone. [other examples...]
>
> For locale identifiers it is common (and recommended) to allow either
> "_" or "-" as subtag delimiters on input, and to canonicalize to "_"
> for uniqueness on output. When extracting a language identifier from a
> locale identifier, any "_" separators must be converted to "-", and any
> extensions need to be either removed or encapsulated as extensions
> (such as with "x-" syntax).
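As an illustration of the delimiter handling described above, here is a minimal Python sketch. It assumes, purely for the example, that extensions follow an ICU-style "@" suffix (as in "de_DE@currency=EUR"); the function names are hypothetical, and it implements only the "remove" option for extensions, leaving case normalization and "x-" encapsulation aside.

```python
import re

# Either "-" or "_" is accepted as a subtag delimiter on input.
_DELIMITER = re.compile(r"[-_]")


def _split_base(locale_id: str):
    """Split off a (hypothetical) ICU-style "@keyword=value" extension part."""
    base, at, keywords = locale_id.partition("@")
    subtags = [s for s in _DELIMITER.split(base) if s]  # drop empty subtags
    return subtags, at + keywords


def canonicalize_locale_id(locale_id: str) -> str:
    """Canonicalize to "_" delimiters; any extension part is kept as-is."""
    subtags, extensions = _split_base(locale_id)
    return "_".join(subtags) + extensions


def to_language_tag(locale_id: str) -> str:
    """Extract a language tag: "_" becomes "-" and extensions are removed."""
    subtags, _extensions = _split_base(locale_id)
    return "-".join(subtags)
```

For example, "zh-Hant_TW" canonicalizes to "zh_Hant_TW", and "de_DE@currency=EUR" yields the language tag "de-DE".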
>
> There is one area with a significant semantic difference between
> locale and language identifiers. In locale identifiers, the region
> code is often presumed to be an indication of the physical location or
> governing policies for the user; this is not the case for language
> identifiers, where the region is used only to discriminate regional
> variants in language usage. Thus some degree of caution should be used
> when heuristically using language identifiers as locale identifiers.
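The caution above can be made concrete with a small Python sketch that pulls the region subtag out of a language tag and treats it only as a hint. The function name is hypothetical, and the rule it uses (the first two-letter or three-digit subtag after the primary language and before any singleton) is a simplification of the RFC 3066bis syntax that ignores grandfathered tags. For Felix's "ja" example it finds no region at all, and for "en-AU" it reports a regional variety of English, not where the user lives.

```python
from typing import Optional


def region_hint(language_tag: str) -> Optional[str]:
    """Return the region subtag of a language tag, or None if it has none.

    Heuristic only: the region marks a regional language variant and says
    nothing reliable about where the user lives or which policies apply.
    """
    subtags = language_tag.split("-")
    if len(subtags[0]) == 1:
        return None  # "x-..." private use or "i-..." grandfathered tag
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            return None  # a singleton starts extensions/private use
        if (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        ):
            return subtag.upper()
    return None
```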
>
>
>
> Felix Sasaki wrote:
>> Hi Addison, Mark, all,
>>
>> I started implementing these comments, and the discussion on the locale
>> versus language example at
>> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0020.html
>> .
>>
>> please have a look at http://www.w3.org/International/core/langtags/ .
>> I have not used change markup, since at this early stage I expect a
>> lot of changes.
>>
>>
>> Mark Davis wrote:
>>
>>> I think we need to have a clear discussion about what constitutes a
>>> locale before progressing further. To my mind, a (language, timezone)
>>> pair, such as (en_US, Etc/GMT), is one of the clearest cases of a
>>> locale, so I don't know what your mental image of a locale is.
>>>
>>> Addison Phillips wrote:
>>>
>>>> Hi folks! Nice to see this work progressing...
>>>>
>>>> ---
>>>> Section 1.1: The text describing locales is vague and/or possibly
>>>> sloppy. I think you would be better off being very clear that RFC
>>>> 3066 and its successor refer to language identification ONLY. Locales can be
>>>> inferred from language identifiers (e.g. Accept-Language) or use
>>>> identical tags in data items (elements, attributes, headers, etc.)
>>>> that serve only the purpose of locale identification. This will help
>>>> preserve (for example) clarity in specs such as XSL F&O where there
>>>> has never been a locale identifier...
>>>>
>>
>> I made a new try, please have a look.
>>
>>
>>>> Section 1.2: eliminate comma from first sentence.
>>>>
>>
>> done.
>>
>>
>>>> Section 1.2: "However, such formats might apply the definitions made
>>>> in this specification, see e.g. [LDML]." This sentence is unclear.
>>>> Change to say: "One possible source of locale data and data formats is
>>>> [LDML]"??
>>>>
>>
>> done.
>>
>>
>>>> Section 1.3: "Web Service Internationalization" should read "Web
>>>> services Internationalization"
>>>>
>>
>> done.
>>
>>
>>>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single
>>>> section.
>>>>
>>
>> done.
>>
>>
>>>> Section 2.2:
>>>>
>>
>> following Martin's proposal at
>> http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0006.html
>> , this is now a subsection 1.4.
>>
>>>>
>>>> This section mixes languages and locales as if they were
>>>> the same thing. I think this is dangerous. We spent a lot of time in
>>>> WSTF building text to deal with this in a purposeful way. Language
>>>> tags are for languages. Locales can be inferred from language tags
>>>> (the locale mechanism used inside your programming environment may use
>>>> very different identifiers, cf. LCIDs). Thus item (2) in the list is
>>>> wrong.
>>>>
>>>> Comment: I think you should import text (with minor editing) from Web
>>>> Services Usage Scenarios to describe languages and locales and only
>>>> then launch into values. In particular, I commend you to Section 3.1
>>>> and Section 3.1.1 of
>>>> http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
>>>>
>>
>> I reused and adapted section 3.1.1 of ws-i18n-scenarios, please have
>> a look.
>>
>>
>>>> Section 2.2: The following is correctly identified as a Bad Thing, but
>>>> I would suggest you remove it altogether because you suggest that it
>>>> is sometimes okay to infer this. This is just bad practice or an
>>>> application assumption ("default currency"). In fact, this is Section
>>>> I-018 of WSUS
>>>> (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018)
>>>> "Note that sometimes information is heuristically inferred from
>>>> language or locale identifiers. For example, software might infer that
>>>> if the locale is "fr-FR" that the user's preferred currency is EUR.
>>>> However, that is only a guess because that locale ID does not specify
>>>> the preferred currency. The user may actually be living in the UK, and
>>>> do most transactions in GBP"
>>>>
>>
>> done.
>>
>>>> Section 2.2: Example 1: This is a bad example because time zone is
>>>> always orthogonal to locale (and language). If you're going to say
>>>> anything about time zones, you should probably require the use of
>>>> Olson identifiers in specifications (a subject beyond the scope of
>>>> this document??)
>>>>
>>
>> I got rid of the example.
>>
>>
>>>> Section 2.3: references are to RFC 3066bis? Should be to
>>>> draft-matching.
>>>>
>>
>> done & changed in response to Martin's comment, is now section 2.2.
>>
>>
>>>> Section 3: Item 3: Specifications that define operations on language
>>>> values really should accept both basic and extended ranges.
>>>>
>>
>> does that mean that we break nearly all existing operations on language
>> values? I'm looking for a conformance criterion which allows CSS and
>> folks to say "in CSS 2.0, we do basic ranges, and that's fine". A new
>> version of CSS or spec XXX should do both, but I don't want to break
>> existing RECs.
>>
>>
>>
>>>> What's
>>>> important to specify is the matching scheme itself.
>>>>
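For readers weighing Item 3 and Felix's question about CSS, a rough Python sketch of the two filtering schemes may help. It follows the basic and extended filtering algorithms as described in draft-matching (later published as RFC 4647, sections 3.3.1 and 3.3.2); treat it as an approximation to be checked against that text. The function names are illustrative only.

```python
from typing import List


def _subtags(value: str) -> List[str]:
    # Comparisons are case-insensitive, so normalize once up front.
    return value.lower().split("-")


def basic_filter_match(language_range: str, tag: str) -> bool:
    """Basic filtering: the range equals the tag or is a prefix ending at '-'."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True  # the wildcard range matches every tag
    return t == r or t.startswith(r + "-")


def extended_filter_match(language_range: str, tag: str) -> bool:
    """Extended filtering: '*' subtags match anything; other subtags in the
    tag may be skipped, but never across a singleton such as 'x'."""
    rs, ts = _subtags(language_range), _subtags(tag)
    if rs[0] != "*" and rs[0] != ts[0]:
        return False  # the primary subtags must match
    i = j = 1
    while i < len(rs):
        if rs[i] == "*":
            i += 1  # wildcards in later positions are simply skipped
        elif j >= len(ts):
            return False  # range still has subtags but the tag ran out
        elif rs[i] == ts[j]:
            i += 1
            j += 1
        elif len(ts[j]) == 1:
            return False  # cannot skip past a singleton (extension/private use)
        else:
            j += 1  # skip an intervening tag subtag, e.g. a script
    return True
```

The key difference shows up with "de-DE": as a basic range it does not match "de-Latn-DE", while as an extended range it does, because the script subtag is skipped.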
>>>> Item 5: I don't like this item at all. If you want to use an IRI to
>>>> point to some "information item", fine: that's your own choice and
>>>> none of our business. But this requirement as written means nothing
>>>> and will only serve to confuse people. I think you'd be better off
>>>> sticking with saying something like "use the same format for locale
>>>> IDs as language tags". If someone can propose a workable IRI solution,
>>>> you can then incorporate that. The point (I think) is to avoid having
>>>> nine ways of identifying a locale.
>>>>
>>
>> I got rid of it.
>>
>>>> Editorial: In the note, this phrase "are conform to these criteria"
>>>> should say "conformant"
>>>>
>>
>> done.
>>
>>
>>>> General: I really think you should write about language identification
>>>> and then about inferring locale from it. In particular, I would
>>>> suggest that you consider adding something like these requirements:
>>>>
>>
>> I'd like to discuss these proposals with the core group first (see "cc"
>> of this mail).
>>
>>
>>
>>>> - Specifications MUST NOT use the xml:lang attribute to convey locale
>>>> information. // specs must not promote poor behavior. Xml:lang
>>>> identifies natural language usage in a document.
>>>>
>>
>> o.k.
>>
>>
>>>> - Specifications MUST define the default behavior for matching of
>>>> language content (see draft-matching, Section 3.4.1)
>>>>
>>
>> same concern as above: danger of breaking existing RECs. We will get *a
>> lot* of last call comments with such a criterion.
>>
>>
>>>> - Specifications that use HTTP 1.1 SHOULD allow an application to
>>>> infer a user's locale preferences from the HTTP Accept-Language
>>>> header. // or something like this, eh?
>>>>
>>
>> how does this criterion relate to the following? It sounds like "HTTP
>> 1.1" will be an exception to the following criterion?
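The suggested Accept-Language requirement could be implemented along these lines. This minimal Python sketch parses a header value into language ranges ordered by q-value, following the HTTP/1.1 (RFC 2616) syntax, and then, as a heuristic only, turns each specific range into a "_"-delimited locale ID in the style of Mark's reformulation. The function names are hypothetical, and a real application would still need to match these ranges against the locales it actually supports.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return language ranges from an Accept-Language value, highest q first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        fields = [f.strip() for f in item.split(";")]
        language_range = fields[0]
        if not language_range:
            continue  # tolerate empty list elements such as "en,,fr"
        q = 1.0  # a missing qvalue defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed qvalue as "not acceptable"
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, position, language_range))
    # Sort by descending q; ties keep header order via position.
    return [r for _, _, r in sorted(weighted)]


def candidate_locale_ids(header: str) -> List[str]:
    """Heuristic only: reuse each specific range as a "_"-delimited locale ID."""
    return [r.replace("-", "_") for r in parse_accept_language(header) if r != "*"]
```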
>>
>>
>>>> - Specifications that define the exchange of locale information MUST
>>>> define locale identifiers in terms of RFC 3066bis language tags and
>>>> MAY define specific extensions or private-use codes to identify
>>>> additional information. // this is the big one
>>>>
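To illustrate the last proposed requirement, the Python sketch below builds a locale identifier from an RFC 3066bis language tag plus private-use subtags. The key/value convention after "x" and the keys "cur" and "tz" are hypothetical, invented for this example; no specification in this thread defines them. It only checks that each private-use subtag is 1 to 8 ASCII letters or digits, which is why a value such as "America/Los_Angeles" is rejected.

```python
import re
from typing import Mapping

# Private-use subtags must be 1 to 8 ASCII letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}\Z")


def make_locale_id(language_tag: str, extras: Mapping[str, str]) -> str:
    """Append hypothetical key/value pairs to a language tag as private use.

    Example convention (invented for this sketch): {"cur": "usd"} becomes
    "-x-cur-usd". Keys are emitted in sorted order so output is stable.
    """
    if not extras:
        return language_tag
    if "x" in language_tag.lower().split("-"):
        raise ValueError("tag already has a private-use section")
    parts = [language_tag, "x"]
    for key, value in sorted(extras.items()):
        for subtag in (key, value):
            if not _PRIVATE_SUBTAG.match(subtag):
                raise ValueError(f"not a valid private-use subtag: {subtag!r}")
        parts.extend([key, value])
    return "-".join(parts)
```

For example, make_locale_id("en-US", {"cur": "usd"}) returns "en-US-x-cur-usd".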
>>
>>
>> Looking forward to more feedback.
>>
>> Best regards, Felix.
>>
>>
>>
>>>> ----
>>>> As always, my best regards,
>>>>
>>>> Addison
>>>>
>>>> Addison Phillips
>>>> Internationalization Architect - Yahoo! Inc.
>>>>
>>>> Internationalization is an architecture.
>>>> It is not a feature.
Received on Tuesday, 2 May 2006 17:28:36 UTC