RE: Comment on LTLI WD from Addison Phillips on 2006-04-27 (public-i18n-core@w3.org from April to June 2006)

From: Addison Phillips <addison@yahoo-inc.com>
Date: Thu, 27 Apr 2006 12:02:40 -0700
To: "'Mark Davis'" <mark.davis@icu-project.org>, "'Felix Sasaki'" <fsasaki@w3.org>
Cc: <www-i18n-comments@w3.org>, <public-i18n-core@w3.org>
Message-ID: <000a01c66a2d$28672a00$9fcd15ac@ds.corp.yahoo.com>

Locale is a vague computing concept, not a real-world object. As you say, what it is depends entirely on your operating environment, programming language, and application's requirements.

It is the biggest of the big knobs in setting personal preferences (notably language, but also regionally or culturally affected items). It is an identifier used to activate functionality in our APIs, which is distinct from language codes in W3C document formats (i.e. distinct from document metadata).

My problem with redefining what locales are and what they do in this document is that we already have too many problems explaining how to build properly internationalized programs or properties. I pointed back to the Web Services Usage Scenarios document because it took a step in the direction of making this clear. 

Virtually all "programming constructs" which share the name locale (or are intended to be a surrogate/replacement for one) are similar to your description of a Java Locale. Exchanging "gross locale identifiers" for these allows both the unwashed masses and us to create applications that are well-enabled and interoperable.

I think LTLI should not be in the business of criticizing everyone's locale model. The document should focus on "classical" locales and describe how to exchange this fragment of information. The "gross locale identifier", in my opinion, should use an RFC 3066bis tag, since most environments can directly or indirectly map such a tag to a locale (and already have functionality that does this). 
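
To make the "map a tag to a locale" step concrete, here is a minimal sketch in Python. It assumes a simplified tag shape (language, optional script, optional region) rather than the full RFC 3066bis grammar, and the function name `tag_to_posix_locale` is hypothetical, not taken from any specification or library. The mapping is deliberately lossy: script, variants, and extensions are dropped.

```python
import re

# Simplified pattern: language, optional script, optional region.
# Assumption: anything after those subtags (variants, extensions,
# private use) is accepted but ignored; this is not a full validator.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-.*)?$"
)


def tag_to_posix_locale(tag):
    """Return a POSIX-style identifier such as 'en_US', or None.

    None means the tag did not start with a 2-3 letter language subtag.
    """
    match = _TAG.match(tag)
    if match is None:
        return None
    # Conventional casing: lowercase language, uppercase region.
    result = match.group("language").lower()
    region = match.group("region")
    if region:
        result += "_" + region.upper()
    return result
```

Under these assumptions, `tag_to_posix_locale("zh-Hant-TW")` yields `"zh_TW"`; a real environment would more likely rely on its own lookup or fallback mechanism than on a regular expression.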

A separate document (WS-I18N) describes what you are talking about, Mark: the mix of parameters necessary to create truly globalized Web applications. This includes language preference, time zone, geocode, and other information necessary to a rich global experience. I think that is the right approach. 
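
As a rough illustration of keeping those parameters separate from the locale identifier, the sketch below carries each one in its own field. The class name `UserContext` and its field names are hypothetical and are not the WS-I18N format; the only point is that time zone and currency travel alongside the language tag instead of being guessed from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserContext:
    """Hypothetical container; not a structure defined by WS-I18N."""

    language_tag: str                # language tag, e.g. "fr-FR"
    time_zone: Optional[str] = None  # Olson identifier, e.g. "Europe/London"
    currency: Optional[str] = None   # ISO 4217 code, e.g. "GBP"
    geocode: Optional[str] = None    # free-form location hint


def stated_currency(ctx):
    """Return the currency the user stated, or None.

    Deliberately does not guess a currency from the language tag.
    """
    return ctx.currency


# A French-speaking user who lives in the UK and pays in GBP.
expat = UserContext("fr-FR", time_zone="Europe/London", currency="GBP")
unknown = UserContext("fr-FR")
```

Here `stated_currency(expat)` returns `"GBP"`, while `stated_currency(unknown)` returns `None` rather than assuming EUR from the "fr-FR" tag.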

For this document, I think that language and locale should be clearly defined from a Web perspective and the identification of each clearly spelled out. This means, IMO, avoiding ancillary or confusing topics in favor of more generic examples.

Addison

Addison Phillips
Internationalization Architect - Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: Mark Davis [mailto:mark.davis@icu-project.org]
> Sent: 27 April 2006 9:49
> To: Felix Sasaki
> Cc: Addison Phillips; www-i18n-comments@w3.org; public-i18n-core@w3.org
> Subject: Re: Comment on LTLI WD
> 
> Part of the problem is that the Java Locale is really misnamed (mea
> culpa!!). It, like the CLDR locale or the ICU locale, is really a
> language (with a bit of extra cruft since it wasn't clearly separated
> originally).
> 
> And part of the issue is that "locale" means *such* different things to
> different people. If you define it as a set of preferences associated
> with a particular user community, it is overly broad. If you narrow it
> to a given user community with a shared language and physical location,
> it gets narrower; but perhaps too narrow. But there is a broad range of
> interpretation. For a given company, the people in a given postal code
> might be the granularity they need; or a given tax region (eg city +
> county + state in the US); or a given timezone. But physical location is
> also too narrow, since what you might want is the set of users
> associated with a given policy (eg subject to US tax law) no matter
> where they are physically located.
> 
> Felix Sasaki wrote:
> > cc'ing also to public-i18n-core, so that Mary can see the discussion,
> >
> > During the last i18n core call, Mary was surprised that Mark proposed
> > time zone as a case of a locale, since in Java it is separated from the
> > Locale class. Mary said also she would propose a different example, so I
> > would like to wait a bit for that.
> >
> > - Felix
> >
> > Mark Davis wrote:
> >
> >> I think we need to have a clear discussion about what constitutes a
> >> locale before progressing further. To my mind, a (language, timezone)
> >> pair such as (en_US, Etc/GMT) is one of the clearest cases of a locale, so I
> >> don't know what your mental image of a locale is.
> >>
> >> Addison Phillips wrote:
> >>
> >>> Hi folks! Nice to see this work progressing...
> >>>
> >>> ---
> >>> Section 1.1: The text describing locales is vague and/or possibly
> >>> sloppy. I think you would be better off being very clear that RFC
> >>> 3066/successor refers to language identification ONLY. Locales can be
> >>> inferred from language identifiers (i.e. Accept-Language) or use
> >>> identical tags in data items (elements, attributes, headers, etc.)
> >>> that serve only the purpose of locale identification. This will help
> >>> preserve (for example) clarity in specs such as XSL F&O where there
> >>> has never been a locale identifier...
> >>>
> >>> Section 1.2: eliminate comma from first sentence.
> >>>
> >>> Section 1.2: "However, such formats might apply the definitions made
> >>> in this specification, see e.g. [LDML]." This sentence is unclear.
> >>> Change to say: "One possible source of locale data and data formats is
> >>> [LDML]"??
> >>>
> >>> Section 1.3: "Web Service Internationalization" should read "Web
> >>> services Internationalization"
> >>>
> >>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single
> >>> section.
> >>>
> >>> Section 2.2: This section mixes languages and locales as if they were
> >>> the same thing. I think this is dangerous. We spent a lot of time in
> >>> WSTF building text to deal with this in a purposeful way. Language
> >>> tags are for languages. Locales can be inferred from language tags
> >>> (the locale mechanism used inside your programming environment may use
> >>> very different identifiers, cf. LCIDs). Thus item (2) in the list is
> >>> wrong.
> >>>
> >>> Comment: I think you should import text (with minor editing) from Web
> >>> Services Usage Scenarios to describe languages and locales and only
> >>> then launch into values. In particular, I commend you to Section 3.1
> >>> and Section 3.1.1 of
> >>> http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
> >>>
> >>> Section 2.2: The following is correctly identified as a Bad Thing, but
> >>> I would suggest you remove it altogether because you suggest that it
> >>> is sometimes okay to infer this. This is just bad practice or an
> >>> application assumption ("default currency"). In fact, this is Section
> >>> I-018 of WSUS
> >>> (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018)
> >>> "Note that sometimes information is heuristically inferred from
> >>> language or locale identifiers. For example, software might infer that
> >>> if the locale is "fr-FR" that the user's preferred currency is EUR.
> >>> However, that is only a guess because that locale ID does not specify
> >>> the preferred currency. The user may actually be living in the UK, and
> >>> do most transactions in GBP"
> >>>
> >>> Section 2.2: Example 1: This is a bad example because time zone is
> >>> always orthogonal to locale (and language). If you're going to say
> >>> anything about time zones, you should probably require the use of
> >>> Olson identifiers in specifications (a subject beyond the scope of
> >>> this document??)
> >>>
> >>> Section 2.3: references are to RFC 3066bis? Should be to
> >>> draft-matching.
> >>>
> >>> Section 3: Item 3: Specifications that define operations on language
> >>> values really should accept both basic and extended ranges. What's
> >>> important to specify is the matching scheme itself.
> >>>
> >>> Item 5: I don't like this item at all. If you want to use an IRI to
> >>> point to some "information item", fine: that's your own choice and
> >>> none of our business. But this requirement as written means nothing
> >>> and will only serve to confuse people. I think you'd be better off
> >>> sticking with saying something like "use the same format for locale
> >>> IDs as language tags". If someone can propose a workable IRI solution,
> >>> you can then incorporate that. The point (I think) is to avoid having
> >>> nine ways of identifying a locale.
> >>>
> >>> Editorial: In the note, this phrase "are conform to these criteria"
> >>> should say "conformant"
> >>>
> >>> General: I really think you should write about language identification
> >>> and then about inferring locale from it. In particular, I would
> >>> suggest that you consider adding something like these requirements:
> >>>
> >>> - Specifications MUST NOT use the xml:lang attribute to convey locale
> >>> information. // specs must not promote poor behavior. xml:lang
> >>> identifies natural language usage in a document.
> >>>
> >>> - Specifications MUST define the default behavior for matching of
> >>> language content (see draft-matching, Section 3.4.1)
> >>>
> >>> - Specifications that use HTTP 1.1 SHOULD allow an application to
> >>> infer a user's locale preferences from the HTTP Accept-Language
> >>> header. // or something like this, eh?
> >>>
> >>> - Specifications that define the exchange of locale information MUST
> >>> define locale identifiers in terms of RFC 3066bis language tags and
> >>> MAY define specific extensions or private-use codes to identify
> >>> additional information. // this is the big one
> >>>
> >>> ----
> >>> As always, my best regards,
> >>>
> >>> Addison
> >>>
> >>> Addison Phillips
> >>> Internationalization Architect - Yahoo! Inc.
> >>>
> >>> Internationalization is an architecture.
> >>> It is not a feature.
Received on Thursday, 27 April 2006 19:04:11 UTC