- From: Debasish Banerjee <debasish@us.ibm.com>
- Date: Thu, 16 Jan 2003 12:06:29 -0600
- To: "Addison Phillips [wM]" <aphillips@webmethods.com>
- Cc: "Martin Duerst" <duerst@w3.org>, public-i18n-ws@w3.org
Addison and Martin, Is there a W3C group working on the concept of standardizing locale? The problems that Addison pointed out are indeed thorny, and all of us have probably faced some of them while designing internationalized portable applications . In this context, I have a couple of questions for the table. 1. Shall we try to define i18nContext data structure covering all the imaginable locale-affected categories (Addison's 'langauge'-->'EURO' list can be a starting point), or shall we restrict ourselves to a few most important fields (language, and country for sure, make code set too where applicable) and push the rest to that 'catch-all' variant filed, or shall we go for an open-ended structure something like ULocale (again as Addison pointed out, there can be problems with interpreting the not-so-common fields)? 2. Shall we prescribe the architecture of a pluggable locale adapter for Web service containers? The adapters for specific platforms can perform all the necessary conversions. The conversion will not be architected; at best we may provide some non-prescriptive conversion suggestions. Thanks, Deb "Addison Phillips [wM]" To: "Martin Duerst" <duerst@w3.org>, <public-i18n-ws@w3.org> <aphillips@webmet cc: Debasish Banerjee/Rochester/IBM@IBMUS hods.com> Subject: RE: [WSTF] My view... [long] 01/15/2003 07:58 PM Hi Martin, Thanks for the comments. A few notes below, which probably need more thought. Addison > -----Original Message----- > From: public-i18n-ws-request@w3.org > [mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst > Sent: Wednesday, January 15, 2003 2:33 PM > To: Addison Phillips [wM]; public-i18n-ws@w3.org > Cc: debasish@us.ibm.com > Subject: Re: [WSTF] My view... [long] > > > > >I consider "locale" to be an environment setting or settings that govern > >how data is (to be) processed. Language preference may either be > an aspect > >of "locale" or may be a separate environment setting. > > And 'locale' also has several different aspects. That lets me > generally favor a hierarchy of > > i18n context > - language > - date format > - number format > - sort order > - ... > > rather than > > language > locale > - date format > - number format > - sort order > - ... > Yes, exactly so. But I have some concerns with "dissolving" locales into "i18n context" soup, as it were. The foremost is that most operating environments (the language the actual service is written in, as opposed to the container) want a native locale object to enable those various settings. So those items are "subsettings" of a locale object, and not necessarily a full-fledged item themselves. I would put language as a sub-item of locale, but for the fact that there are existing systems that deal with language. The question is whether to coexist, thus: i18n context - language - locale -language -region etc. Or even: language i18n-context - locale - numberfmt, etc. > > >In traditional software, the developer doesn't need to > explicitly obtain a > >Locale object or setting in order to write internationalized code. The > >default behavior of locale-aware functions obtain and use the > locale from > >the system environment. Yes, the developer must call the correct > functions > >or methods, but s/he must only obtain a locale when the desire is to > >override the user's preference (the default). > > I guess one important point may be that developers should not have to > care about language/locale issues when these are marginal (e.g. language > of error messages), but that there may be an advantage to force them > to have to think about such issues explicitly if they are central > to the application (e.g. currency conversions in some cases). Yes, exactly. It also means that well-written code running on multi-locale capable Web service containers gets multi-locale operation for free. This may include a great deal of code that is merely well-written, not explicitly internationalized. This is, to my mind, the best indicator "we've done something right". > > > >--- > >Aside: These last two scenarios Deb described in an off-list email last > >week. These are "runAsClient", "runAsSpecified" and "runAsServer" > >scenarios, Deb. > >--- > > Deb or Addison, could you send these scenarios to the list or to Kentaro > for inclusion in the document? The email thread is a bit chaotic. We'd probably have to forward the whole series. > > > >I'm on record as favoring tags over an XML structure, but I'm willing to > >be persuaded (and try persuasion ;-)). > > I think the question of whether to use a tag or a (simple) structure > is a syntax question, and thus could work both ways. A more important > question in my view is whether we want the solution to effectively > have a limited number of values (e.g. language x country (x variant)), > or whether we want to be open-ended, and how the tag/simple structure > gets associated with the actual data for the language/locale. > "Tag vs. structure" might then better be described as "tag" vs. "data". If the specific setting transfer is not our goal, then whether we use a URN or an XML structure is merely syntax, as you say. Having played with ULocale tags for the better part of a year, I have a better feel for where limited vs. open work than I did originally. ULocale tags are "open-ended" insofar as they allow an arbitrary number of additional, optional, or vendor-defined fields. The problem is getting agreement on what those fields and their values mean. So, in practice, they are closed. What I've found is that there is a distinct set of "values" that have meaning and may actually be necessary for total interoperability. These are: - language - region - charset (UNIX processes need it, can't get away from it) - collation - script (writing system, as with Trad and Simplified, or Japanese kana-only locale) - orthography (as Bokmal/Nynorsk) - currency (euro?) - other variant (all one-time-events, e.g. EURO) This historical problem with variant is that it is a catch-all field that does too many (different) things. In my own work, I've balled the last three up into "variant" and used script only on occasion. The remaining four (lang, region, charset, collation) maintain their importance to me, even in an all Unicode environment. For me it's because I sometimes need to do Java->POSIX. Although, as Deb points out, you probably wouldn't write an ML Web services container in XPG4-style C, you might use a Java container to invoke services or connect to resources (via JCA for example) that are written in such a style. Since the service invocation may be within the context of a native interface exec call, calling setlocale() on the service itself isn't quite the nasty problem that it would otherwise be and a complete POSIX locale is needed to invoke the service in a locale sensitive way. That is, my Solaris box has a locale called "ja_JP" and it is quite different than "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP" isn't enough information to know which locale to initialize and I can't even enumerate the locales in some environments (like XPG4) to do it via inspection. A C program that is smart might try to guess its way through the locales: this is like the plug-in idea that Deb suggested. In any case, in modern programming this is less of an issue. But I still think that the variations I've listed are important in diverse cases, too many to just ignore them. > > >I have use cases galore from webMethods that relate to the > various aspects > >of the above. > > Please send them in so that we can integrate them in our doc. As I get to them. The problem is always time....::sigh:: > > > >Some things to think about: > > > >--RFC3066 language tags (that is xx-YY) are sufficient for identifying > >Java locales and possibly C# CultureInfos, but not for POSIX, > C/C++, Mac, > >and other native programming environments. > > Can you tell us what's missing in the later cases? For Posix/C/C++, > clearly the encoding is missing, but with XML, this falls out of the > equation. Anything else? POSIX/C++: charset/encoding. Have to have them even though XML files are Unicode. Mac (non OS/X): script or script code Win32: collation (you can live without, of course) Databases: encoding or collation? Host systems: CCSID or code page? // ... // too many more to really list Note that 3066 tags don't cover Java variants (for example). As long as you can live with "loss-of-precision" like that, it's okay, but I don't think we can live with it. > > > Regards, Martin. > >
Received on Thursday, 16 January 2003 13:11:32 UTC