Re: [WSTF] My view... [long] from Addison Phillips [wM] on 2003-01-16 (public-i18n-ws@w3.org from January 2003)

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Thu, 16 Jan 2003 10:57:09 -0800
To: Debasish Banerjee <debasish@us.ibm.com>
CC: Martin Duerst <duerst@w3.org>, public-i18n-ws@w3.org
Message-ID: <3E270085.5060303@webmethods.com>
Hi Deb,

Response below.

Addison

Debasish Banerjee wrote:
> 
> Addison and Martin,
> 
> Is there a W3C group working on the concept of standardizing locale?

This is it ;-). The problem is deciding what to standardize and how to 
standardize it. W3C's mission doesn't include maintaining registies.

What we're really all talking about in this group right now is the need 
to decide what to standardize about locales and what the approach is. In 
my opinion, we need to standardize locale negotiation, hence my interest 
in tags or "simple structures". I think it a difficult-if-not-impossible 
task to standardize the actual data stores and detailed locale setting 
or preference exchange.

For the rest of this message I'm going to refer to tags and tagging, as 
a synonym for "tags or simple structures", okay?

> 
> The problems that Addison pointed out are indeed thorny, and all of us have
> probably faced some of them while designing internationalized portable
> applications . In this context, I have a couple of questions for the table.
> 
> 1.  Shall we try to define i18nContext data structure covering all the
> imaginable locale-affected categories (Addison's 'langauge'-->'EURO' list
> can be a starting point), or shall we restrict ourselves to a few most
> important fields (language, and country for sure, make code set too where
> applicable) and push the rest to that 'catch-all' variant filed, or shall
> we go for an open-ended structure something like ULocale (again as Addison
> pointed out, there can be problems with interpreting the not-so-common
> fields)?

Not "all imaginable" categories. In my experience, vendor specific 
extensions are useless in a heterogenous environment, but vendors 
persist in using them. Providing an extension mechanism allows an escape 
from objections.

ULocale is open-ended from that perspective. I made it open-ended to 
address the specific concerns of Tex and others who felt that existing 
locale identifiers were "too restrictive". I don't care so much about 
the additional fields, except as a way of getting by objections... at 
least at first I didn't. Later I found that I needed them to deal with 
variants.

Deciding to include or not include the "optional" fields as "core" (and 
which ones) should be a discussion that is held at some point. I think 
my current list (far more than) exhausts my needs.

 From that point of view, it might even make sense to define:

a) the tagging standard with only its required fields plus the 
extenstion mechanism.
b) specific extensions as separate "add-on" documents. For example, we 
could define the "currency" keyword and map it to the ISO currency 
codes. Or we could define the "calendar" keyword and provide a list of 
potential values. Implementers could choose to ignore these extensions.

This allows the project to be broken up into achievable pieces, while 
helping eliminate objections.
> 
> 2.  Shall we prescribe the architecture of a pluggable locale adapter for
> Web service containers? The adapters for specific platforms can perform all
> the necessary conversions. The conversion will not be architected; at best
> we may provide some non-prescriptive conversion suggestions.

I think that any tag or structure definition will, by its very nature, 
describe the adapter requirements. If you look at HTTP's use of RFC3066 
or if you look at ULocale you'll see an exact definition of "how to read 
the tag". What that tag "means" or is mapped to in a specific 
implementation is not defined. That will have to be implementation 
dependent (hence your "adapter").

I actually think it makes sense NOT to define the exact behavior of 
every system. If we can just get the tags accepted/required, then 
i18n-aware companies (like, say, IBM) will exploit them and Visigoth 
companies will at least emit them correctly.

Non-prescriptive conversion suggestions I think would be necessary. 
Probably sample implementation(s) would be needed at first too.
> 
> Thanks,
> 
> Deb
> 
> 
>                                                                                                                                                     
>                       "Addison Phillips                                                                                                             
>                       [wM]"                    To:       "Martin Duerst" <duerst@w3.org>, <public-i18n-ws@w3.org>                                   
>                       <aphillips@webmet        cc:       Debasish Banerjee/Rochester/IBM@IBMUS                                                      
>                       hods.com>                Subject:  RE: [WSTF] My view... [long]                                                               
>                                                                                                                                                     
>                       01/15/2003 07:58                                                                                                              
>                       PM                                                                                                                            
>                                                                                                                                                     
>                                                                                                                                                     
> 
> 
> 
> 
> Hi Martin,
> 
> Thanks for the comments. A few notes below, which probably need more
> thought.
> 
> Addison
> 
> 
>>-----Original Message-----
>>From: public-i18n-ws-request@w3.org
>>[mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst
>>Sent: Wednesday, January 15, 2003 2:33 PM
>>To: Addison Phillips [wM]; public-i18n-ws@w3.org
>>Cc: debasish@us.ibm.com
>>Subject: Re: [WSTF] My view... [long]
>>
>>
>>>I consider "locale" to be an environment setting or settings that govern
>>>how data is (to be) processed. Language preference may either be
>>
>>an aspect
>>
>>>of "locale" or may be a separate environment setting.
>>
>>And 'locale' also has several different aspects. That lets me
>>generally favor a hierarchy of
>>
>>   i18n context
>>      - language
>>      - date format
>>      - number format
>>      - sort order
>>      - ...
>>
>>rather than
>>
>>   language
>>   locale
>>     - date format
>>     - number format
>>     - sort order
>>     - ...
>>
> 
> 
> Yes, exactly so. But I have some concerns with "dissolving" locales into
> "i18n context" soup, as it were. The foremost is that most operating
> environments (the language the actual service is written in, as opposed to
> the container) want a native locale object to enable those various
> settings.
> So those items are "subsettings" of a locale object, and not necessarily a
> full-fledged item themselves.
> 
> I would put language as a sub-item of locale, but for the fact that there
> are existing systems that deal with language. The question is whether to
> coexist, thus:
> 
>   i18n context
>     - language
>     - locale
>         -language
>         -region etc.
> 
> Or even:
> 
>   language
>   i18n-context
>      - locale
>         - numberfmt, etc.
> 
> 
>>>In traditional software, the developer doesn't need to
>>
>>explicitly obtain a
>>
>>>Locale object or setting in order to write internationalized code. The
>>>default behavior of locale-aware functions obtain and use the
>>
>>locale from
>>
>>>the system environment. Yes, the developer must call the correct
>>
>>functions
>>
>>>or methods, but s/he must only obtain a locale when the desire is to
>>>override the user's preference (the default).
>>
>>I guess one important point may be that developers should not have to
>>care about language/locale issues when these are marginal (e.g. language
>>of error messages), but that there may be an advantage to force them
>>to have to think about such issues explicitly if they are central
>>to the application (e.g. currency conversions in some cases).
> 
> 
> 
> Yes, exactly. It also means that well-written code running on multi-locale
> capable Web service containers gets multi-locale operation for free. This
> may include a great deal of code that is merely well-written, not
> explicitly
> internationalized. This is, to my mind, the best indicator "we've done
> something right".
> 
>>
>>>---
>>>Aside: These last two scenarios Deb described in an off-list email last
>>>week. These are "runAsClient", "runAsSpecified" and "runAsServer"
>>>scenarios, Deb.
>>>---
>>
>>Deb or Addison, could you send these scenarios to the list or to Kentaro
>>for inclusion in the document?
> 
> 
> The email thread is a bit chaotic. We'd probably have to forward the whole
> series.
> 
>>
>>>I'm on record as favoring tags over an XML structure, but I'm willing to
>>>be persuaded (and try persuasion ;-)).
>>
>>I think the question of whether to use a tag or a (simple) structure
>>is a syntax question, and thus could work both ways. A more important
>>question in my view is whether we want the solution to effectively
>>have a limited number of values (e.g. language x country (x variant)),
>>or whether we want to be open-ended, and how the tag/simple structure
>>gets associated with the actual data for the language/locale.
>>
> 
> 
> "Tag vs. structure" might then better be described as "tag" vs. "data". If
> the specific setting transfer is not our goal, then whether we use a URN or
> an XML structure is merely syntax, as you say.
> 
> Having played with ULocale tags for the better part of a year, I have a
> better feel for where limited vs. open work than I did originally. ULocale
> tags are "open-ended" insofar as they allow an arbitrary number of
> additional, optional, or vendor-defined fields. The problem is getting
> agreement on what those fields and their values mean. So, in practice, they
> are closed.
> 
> What I've found is that there is a distinct set of "values" that have
> meaning and may actually be necessary for total interoperability. These
> are:
> 
> - language
> - region
> - charset (UNIX processes need it, can't get away from it)
> - collation
> - script (writing system, as with Trad and Simplified, or Japanese
> kana-only
> locale)
> - orthography (as Bokmal/Nynorsk)
> - currency (euro?)
> - other variant (all one-time-events, e.g. EURO)
> 
> This historical problem with variant is that it is a catch-all field that
> does too many (different) things. In my own work, I've balled the last
> three
> up into "variant" and used script only on occasion. The remaining four
> (lang, region, charset, collation) maintain their importance to me, even in
> an all Unicode environment.
> 
> For me it's because I sometimes need to do Java->POSIX. Although, as Deb
> points out, you probably wouldn't write an ML Web services container in
> XPG4-style C, you might use a Java container to invoke services or connect
> to resources (via JCA for example) that are written in such a style. Since
> the service invocation may be within the context of a native interface exec
> call, calling setlocale() on the service itself isn't quite the nasty
> problem that it would otherwise be and a complete POSIX locale is needed to
> invoke the service in a locale sensitive way.
> 
> That is, my Solaris box has a locale called "ja_JP" and it is quite
> different than "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP"
> isn't enough information to know which locale to initialize and I can't
> even
> enumerate the locales in some environments (like XPG4) to do it via
> inspection. A C program that is smart might try to guess its way through
> the
> locales: this is like the plug-in idea that Deb suggested.
> 
> In any case, in modern programming this is less of an issue. But I still
> think that the variations I've listed are important
> in diverse cases, too many to just ignore them.
> 
>>>I have use cases galore from webMethods that relate to the
>>
>>various aspects
>>
>>>of the above.
>>
>>Please send them in so that we can integrate them in our doc.
> 
> 
> As I get to them. The problem is always time....::sigh::
> 
>>
>>>Some things to think about:
>>>
>>>--RFC3066 language tags (that is xx-YY) are sufficient for identifying
>>>Java locales and possibly C# CultureInfos, but not for POSIX,
>>
>>C/C++, Mac,
>>
>>>and other native programming environments.
>>
>>Can you tell us what's missing in the later cases? For Posix/C/C++,
>>clearly the encoding is missing, but with XML, this falls out of the
>>equation. Anything else?
> 
> 
> POSIX/C++: charset/encoding. Have to have them even though XML files are
> Unicode.
> Mac (non OS/X): script or script code
> Win32: collation (you can live without, of course)
> Databases: encoding or collation?
> Host systems: CCSID or code page?
> // ... // too many more to really list
> Note that 3066 tags don't cover Java variants (for example). As long as you
> can live with "loss-of-precision" like that, it's okay, but I don't think
> we
> can live with it.
> 
> 
>>
>>Regards,     Martin.
>>
>>
> 
> 
> 
> 
> 
> 
> 


-- 
Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

+1 408.962.5487  mailto:aphillips@webmethods.com
-------------------------------------------
Internationalization is an architecture. It is not a feature.

Chair, W3C I18N WG Web Services Task Force
http://www.w3.org/International/ws
Received on Thursday, 16 January 2003 14:03:57 UTC