Re: Comment on LTLI WD from Mark Davis on 2006-04-27 (public-i18n-core@w3.org from April to June 2006)

From: Mark Davis <mark.davis@icu-project.org>
Date: Thu, 27 Apr 2006 09:49:02 -0700
To: Felix Sasaki <fsasaki@w3.org>
CC: Addison Phillips <addison@yahoo-inc.com>, www-i18n-comments@w3.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4450F5FE.2050409@icu-project.org>
Part of the problem is that the Java Locale is really misnamed (mea 
culpa!!). It, like the CLDR locale or the ICU locale, is really a 
language (with a bit of extra cruft since it wasn't clearly separated 
originally).

And part of the issue is that "locale" means *such* different things to 
different people. If you define it as a set of preferences associated 
with a particular user community, it is overly broad. If you narrow it 
to a given user community with a shared language and physical location, 
it is better defined, but perhaps too narrow. Between those there is a 
broad range of interpretation. For a given company, the people in a 
given postal code might be the granularity they need; or a given tax 
region (e.g. city + county + state in the US); or a given timezone. But 
physical location is also too narrow, since what you might want is the 
set of users associated with a given policy (e.g. subject to US tax law) 
no matter where they are physically located.
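
To make that range of interpretations concrete, here is a minimal Python 
sketch. It is not taken from LTLI, Java, CLDR, or ICU; the names 
UserPreferences, guess_currency, and effective_currency, and the small 
region-to-currency table, are hypothetical and exist only for 
illustration. It treats language, time zone, currency, and tax policy as 
independent fields, and shows why a currency derived from a language tag 
(the "fr-FR" case quoted below) is only a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserPreferences:
    """Hypothetical bundle of independent, locale-like preferences."""
    language: str                      # language tag, e.g. "en_US" or "fr-FR"
    time_zone: str                     # Olson identifier, e.g. "Etc/GMT"
    currency: Optional[str] = None     # ISO 4217 code, only if stated explicitly
    tax_policy: Optional[str] = None   # e.g. "US", wherever the user is located


def guess_currency(language_tag: str) -> Optional[str]:
    """Heuristic only: a language tag does not specify a currency."""
    # Illustrative table, deliberately tiny and not authoritative.
    region_defaults = {"FR": "EUR", "GB": "GBP", "US": "USD"}
    parts = language_tag.replace("_", "-").split("-")
    # Simplification: treat the first two-letter subtag after the
    # language subtag as the region.
    region = next(
        (p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()),
        None,
    )
    return region_defaults.get(region)


def effective_currency(prefs: UserPreferences) -> Optional[str]:
    """Use the explicit setting; fall back to the guess only if none is given."""
    return prefs.currency or guess_currency(prefs.language)


# A (language, timezone) pair with nothing else stated.
pair = UserPreferences(language="en_US", time_zone="Etc/GMT")

# A French speaker living in the UK who transacts in GBP.
expat = UserPreferences(
    language="fr-FR", time_zone="Europe/London", currency="GBP"
)
```

With these definitions, effective_currency(expat) uses the explicit GBP 
setting, whereas guess_currency("fr-FR") alone would have suggested EUR. 
Keeping the fields separate is one way to avoid overloading the language 
tag with things it was never meant to carry.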

Felix Sasaki wrote:
> cc'ing also to public-i18n-core, so that Mary can see the discussion.
>
> During the last i18n core call, Mary was surprised that Mark proposed
> time zone as a case of a locale, since in Java it is separated from the
> Locale class. Mary also said she would propose a different example, so I
> would like to wait a bit for that.
>
> - Felix
>
> Mark Davis wrote:
>   
>> I think we need to have a clear discussion about what constitutes a
>> locale before progressing further. To my mind, (language, timezone),
>> such as (en_US, Etc/GMT), is one of the clearest cases of a locale, so I
>> don't know what your mental image of a locale is.
>>
>> Addison Phillips wrote:
>>     
>>> Hi folks! Nice to see this work progressing...
>>>
>>> ---
>>> Section 1.1: The text describing locales is vague and/or possibly
>>> sloppy. I think you would be better off being very clear that RFC
>>> 3066 or its successor refers to language identification ONLY. Locales can be
>>> inferred from language identifiers (i.e. Accept-Language) or use
>>> identical tags in data items (elements, attributes, headers, etc.)
>>> that serve only the purpose of locale identification. This will help
>>> preserve (for example) clarity in specs such as XSL F&O where there
>>> has never been a locale identifier...
>>>
>>> Section 1.2: eliminate comma from first sentence.
>>>
>>> Section 1.2: "However, such formats might apply the definitions made
>>> in this specification, see e.g. [LDML]." This sentence is unclear.
>>> Change to say: "One possible source of locale data and data formats is
>>> [LDML]"??
>>>
>>> Section 1.3: "Web Service Internationalization" should read "Web
>>> services Internationalization"
>>>
>>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single section.
>>>
>>> Section 2.2: This section mixes languages and locales as if they were
>>> the same thing. I think this is dangerous. We spent a lot of time in
>>> WSTF building text to deal with this in a purposeful way. Language
>>> tags are for languages. Locales can be inferred from language tags
>>> (the locale mechanism used inside your programming environment may use
>>> very different identifiers, cf. LCIDs). Thus item (2) in the list is
>>> wrong.
>>>
>>> Comment: I think you should import text (with minor editing) from Web
>>> Services Usage Scenarios to describe languages and locales and only
>>> then launch into values. In particular, I commend you to Section 3.1
>>> and Section 3.1.1 of
>>> http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
>>>
>>> Section 2.2: The following is correctly identified as a Bad Thing, but
>>> I would suggest you remove it altogether because you suggest that it
>>> is sometimes okay to infer this. This is just bad practice or an
>>> application assumption ("default currency"). In fact, this is Section
>>> I-018 of WSUS
>>> (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018)
>>> "Note that sometimes information is heuristically inferred from
>>> language or locale identifiers. For example, software might infer that
>>> if the locale is "fr-FR" that the user's preferred currency is EUR.
>>> However, that is only a guess because that locale ID does not specify
>>> the preferred currency. The user may actually be living in the UK, and
>>> do most transactions in GBP."
>>>
>>> Section 2.2: Example 1: This is a bad example because time zone is
>>> always orthogonal to locale (and language). If you're going to say
>>> anything about time zones, you should probably require the use of
>>> Olson identifiers in specifications (a subject beyond the scope of
>>> this document??)
>>>
>>> Section 2.3: references are to RFC 3066bis? Should be to draft-matching.
>>>
>>> Section 3: Item 3: Specifications that define operations on language
>>> values really should accept both basic and extended ranges. What's
>>> important to specify is the matching scheme itself.
>>>
>>> Item 5: I don't like this item at all. If you want to use an IRI to
>>> point to some "information item", fine: that's your own choice and
>>> none of our business. But this requirement as written means nothing
>>> and will only serve to confuse people. I think you'd be better off
>>> sticking with saying something like "use the same format for locale
>>> IDs as language tags". If someone can propose a workable IRI solution,
>>> you can then incorporate that. The point (I think) is to avoid having
>>> nine ways of identifying a locale.
>>>
>>> Editorial: In the note, this phrase "are conform to these criteria"
>>> should say "conformant"
>>>
>>> General: I really think you should write about language identification
>>> and then about inferring locale from it. In particular, I would
>>> suggest that you consider adding something like these requirements:
>>>
>>> - Specifications MUST NOT use the xml:lang attribute to convey locale
>>> information. // specs must not promote poor behavior. xml:lang
>>> identifies natural language usage in a document.
>>>
>>> - Specifications MUST define the default behavior for matching of
>>> language content (see draft-matching, Section 3.4.1)
>>>
>>> - Specifications that use HTTP 1.1 SHOULD allow an application to
>>> infer a user's locale preferences from the HTTP Accept-Language
>>> header. // or something like this, eh?
>>>
>>> - Specifications that define the exchange of locale information MUST
>>> define locale identifiers in terms of RFC 3066bis language tags and
>>> MAY define specific extensions or private-use codes to identify
>>> additional information. // this is the big one
>>>
>>> ----
>>> As always, my best regards,
>>>
>>> Addison
>>>
>>> Addison Phillips
>>> Internationalization Architect - Yahoo! Inc.
>>>
>>> Internationalization is an architecture.
>>> It is not a feature.
>>>
Received on Thursday, 27 April 2006 16:49:31 UTC