- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Thu, 30 Jan 2003 19:19:38 -0800
- To: "Addison Phillips [wM]" <aphillips@webmethods.com>, "Martin Duerst" <duerst@w3.org>, <public-i18n-ws@w3.org>
- Cc: <debasish@us.ibm.com>
(I'm catching up on some mail here). I'd like to find out more about why you think the charset needs to be included in the locale specification. I believe that it is orthogonal to the whole notion. It may be necessary in the short term to have some kind of mechanism on a POSIX-style system to look up the "best" locale + charset combination given a locale, but that does not seem worth burdening the general concept of a locale with it. Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Addison Phillips [wM]" <aphillips@webmethods.com> To: "Martin Duerst" <duerst@w3.org>; <public-i18n-ws@w3.org> Cc: <debasish@us.ibm.com> Sent: Wednesday, January 15, 2003 17:58 Subject: RE: [WSTF] My view... [long] > > Hi Martin, > > Thanks for the comments. A few notes below, which probably need more > thought. > > Addison > > > -----Original Message----- > > From: public-i18n-ws-request@w3.org > > [mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst > > Sent: Wednesday, January 15, 2003 2:33 PM > > To: Addison Phillips [wM]; public-i18n-ws@w3.org > > Cc: debasish@us.ibm.com > > Subject: Re: [WSTF] My view... [long] > > > > > > > >I consider "locale" to be an environment setting or settings that govern > > >how data is (to be) processed. Language preference may either be > > an aspect > > >of "locale" or may be a separate environment setting. > > > > And 'locale' also has several different aspects. That lets me > > generally favor a hierarchy of > > > > i18n context > > - language > > - date format > > - number format > > - sort order > > - ... > > > > rather than > > > > language > > locale > > - date format > > - number format > > - sort order > > - ... > > > > Yes, exactly so. But I have some concerns with "dissolving" locales into > "i18n context" soup, as it were. The foremost is that most operating > environments (the language the actual service is written in, as opposed to > the container) want a native locale object to enable those various settings. > So those items are "subsettings" of a locale object, and not necessarily a > full-fledged item themselves. > > I would put language as a sub-item of locale, but for the fact that there > are existing systems that deal with language. The question is whether to > coexist, thus: > > i18n context > - language > - locale > -language > -region etc. > > Or even: > > language > i18n-context > - locale > - numberfmt, etc. > > > > > >In traditional software, the developer doesn't need to > > explicitly obtain a > > >Locale object or setting in order to write internationalized code. The > > >default behavior of locale-aware functions obtain and use the > > locale from > > >the system environment. Yes, the developer must call the correct > > functions > > >or methods, but s/he must only obtain a locale when the desire is to > > >override the user's preference (the default). > > > > I guess one important point may be that developers should not have to > > care about language/locale issues when these are marginal (e.g. language > > of error messages), but that there may be an advantage to force them > > to have to think about such issues explicitly if they are central > > to the application (e.g. currency conversions in some cases). > > > Yes, exactly. It also means that well-written code running on multi-locale > capable Web service containers gets multi-locale operation for free. This > may include a great deal of code that is merely well-written, not explicitly > internationalized. This is, to my mind, the best indicator "we've done > something right". > > > > > > >--- > > >Aside: These last two scenarios Deb described in an off-list email last > > >week. These are "runAsClient", "runAsSpecified" and "runAsServer" > > >scenarios, Deb. > > >--- > > > > Deb or Addison, could you send these scenarios to the list or to Kentaro > > for inclusion in the document? > > The email thread is a bit chaotic. We'd probably have to forward the whole > series. > > > > > > >I'm on record as favoring tags over an XML structure, but I'm willing to > > >be persuaded (and try persuasion ;-)). > > > > I think the question of whether to use a tag or a (simple) structure > > is a syntax question, and thus could work both ways. A more important > > question in my view is whether we want the solution to effectively > > have a limited number of values (e.g. language x country (x variant)), > > or whether we want to be open-ended, and how the tag/simple structure > > gets associated with the actual data for the language/locale. > > > > "Tag vs. structure" might then better be described as "tag" vs. "data". If > the specific setting transfer is not our goal, then whether we use a URN or > an XML structure is merely syntax, as you say. > > Having played with ULocale tags for the better part of a year, I have a > better feel for where limited vs. open work than I did originally. ULocale > tags are "open-ended" insofar as they allow an arbitrary number of > additional, optional, or vendor-defined fields. The problem is getting > agreement on what those fields and their values mean. So, in practice, they > are closed. > > What I've found is that there is a distinct set of "values" that have > meaning and may actually be necessary for total interoperability. These are: > > - language > - region > - charset (UNIX processes need it, can't get away from it) > - collation > - script (writing system, as with Trad and Simplified, or Japanese kana-only > locale) > - orthography (as Bokmal/Nynorsk) > - currency (euro?) > - other variant (all one-time-events, e.g. EURO) > > This historical problem with variant is that it is a catch-all field that > does too many (different) things. In my own work, I've balled the last three > up into "variant" and used script only on occasion. The remaining four > (lang, region, charset, collation) maintain their importance to me, even in > an all Unicode environment. > > For me it's because I sometimes need to do Java->POSIX. Although, as Deb > points out, you probably wouldn't write an ML Web services container in > XPG4-style C, you might use a Java container to invoke services or connect > to resources (via JCA for example) that are written in such a style. Since > the service invocation may be within the context of a native interface exec > call, calling setlocale() on the service itself isn't quite the nasty > problem that it would otherwise be and a complete POSIX locale is needed to > invoke the service in a locale sensitive way. > > That is, my Solaris box has a locale called "ja_JP" and it is quite > different than "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP" > isn't enough information to know which locale to initialize and I can't even > enumerate the locales in some environments (like XPG4) to do it via > inspection. A C program that is smart might try to guess its way through the > locales: this is like the plug-in idea that Deb suggested. > > In any case, in modern programming this is less of an issue. But I still > think that the variations I've listed are important > in diverse cases, too many to just ignore them. > > > > >I have use cases galore from webMethods that relate to the > > various aspects > > >of the above. > > > > Please send them in so that we can integrate them in our doc. > > As I get to them. The problem is always time....::sigh:: > > > > > > >Some things to think about: > > > > > >--RFC3066 language tags (that is xx-YY) are sufficient for identifying > > >Java locales and possibly C# CultureInfos, but not for POSIX, > > C/C++, Mac, > > >and other native programming environments. > > > > Can you tell us what's missing in the later cases? For Posix/C/C++, > > clearly the encoding is missing, but with XML, this falls out of the > > equation. Anything else? > > POSIX/C++: charset/encoding. Have to have them even though XML files are > Unicode. > Mac (non OS/X): script or script code > Win32: collation (you can live without, of course) > Databases: encoding or collation? > Host systems: CCSID or code page? > // ... // too many more to really list > Note that 3066 tags don't cover Java variants (for example). As long as you > can live with "loss-of-precision" like that, it's okay, but I don't think we > can live with it. > > > > > > > Regards, Martin. > > > > > >
Received on Thursday, 30 January 2003 22:21:30 UTC