Re: [WSTF] My view... [long] from Mark Davis on 2003-01-31 (public-i18n-ws@w3.org from January 2003)

From: Mark Davis <mark.davis@jtcsv.com>
Date: Thu, 30 Jan 2003 19:19:38 -0800
To: "Addison Phillips [wM]" <aphillips@webmethods.com>, "Martin Duerst" <duerst@w3.org>, <public-i18n-ws@w3.org>
Cc: <debasish@us.ibm.com>
Message-ID: <012901c2c8d7$97306410$7300a8c0@DAVIS1>
(I'm catching up on some mail here).

I'd like to find out more about why you think the charset needs to be
included in the locale specification. I believe that it is orthogonal to the
whole notion. It may be necessary in the short term to have some kind of
mechanism on a POSIX-style system to look up the "best" locale + charset
combination given a locale, but that does not seem worth burdening the
general concept of a locale with it.
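One way to keep the charset out of the locale identifier is to resolve it only when a POSIX-style system needs a concrete name. The minimal sketch below assumes the caller already has a list of installed locale names (for example from `locale -a`). `best_posix_locale` and its codeset preference order are hypothetical; this is not an existing API.

```python
# Hypothetical helper: the locale identifier ("ja-JP") carries no charset;
# only this POSIX-specific lookup decides on "language_REGION.codeset".
def best_posix_locale(tag, installed, preferred_codesets=("UTF-8", "utf8")):
    """Return the installed POSIX locale name that best matches a tag such
    as "ja-JP", or None if nothing matches.

    tag                -- language tag with no charset in it, e.g. "ja-JP"
    installed          -- POSIX locale names, e.g. as listed by `locale -a`
    preferred_codesets -- codesets to try first, in order of preference
    """
    # Only the language and a two-letter region are used; other subtags
    # are ignored in this sketch.
    parts = tag.split("-")
    base = parts[0].lower()
    if len(parts) > 1 and len(parts[1]) == 2:
        base += "_" + parts[1].upper()
    installed = list(installed)

    # 1. Base name plus a preferred codeset, e.g. "ja_JP.UTF-8".
    for codeset in preferred_codesets:
        candidate = f"{base}.{codeset}"
        if candidate in installed:
            return candidate
    # 2. The bare name, e.g. "ja_JP" (codeset implied by the system).
    if base in installed:
        return base
    # 3. Any other codeset of the same base, skipping @modifier locales
    #    such as "ja_JP@kana", which the tag did not ask for.
    for name in installed:
        if name.startswith(base + ".") and "@" not in name:
            return name
    return None


installed = ["C", "ja_JP", "ja_JP.eucJP", "ja_JP.UTF-8", "ja_JP@kana"]
print(best_posix_locale("ja-JP", installed))  # ja_JP.UTF-8
```

The charset choice lives entirely inside the lookup, so the same "ja-JP" identifier can be passed around unchanged and only becomes "ja_JP.UTF-8" or "ja_JP.eucJP" at the boundary where `setlocale()` is called.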

Mark
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Addison Phillips [wM]" <aphillips@webmethods.com>
To: "Martin Duerst" <duerst@w3.org>; <public-i18n-ws@w3.org>
Cc: <debasish@us.ibm.com>
Sent: Wednesday, January 15, 2003 17:58
Subject: RE: [WSTF] My view... [long]


>
> Hi Martin,
>
> Thanks for the comments. A few notes below, which probably need more
> thought.
>
> Addison
>
> > -----Original Message-----
> > From: public-i18n-ws-request@w3.org
> > [mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst
> > Sent: Wednesday, January 15, 2003 2:33 PM
> > To: Addison Phillips [wM]; public-i18n-ws@w3.org
> > Cc: debasish@us.ibm.com
> > Subject: Re: [WSTF] My view... [long]
> >
> > >
> > >I consider "locale" to be an environment setting or settings that govern
> > >how data is (to be) processed. Language preference may either be an
> > >aspect of "locale" or may be a separate environment setting.
> >
> > And 'locale' also has several different aspects. That leads me to
> > generally favor a hierarchy of
> >
> >    i18n context
> >       - language
> >       - date format
> >       - number format
> >       - sort order
> >       - ...
> >
> > rather than
> >
> >    language
> >    locale
> >      - date format
> >      - number format
> >      - sort order
> >      - ...
> >
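The first (flat) hierarchy can be sketched as a minimal data structure in which language and each formatting facet are peers, so any one of them can be overridden on its own. The class, its field names, and the use of plain tag strings as facet values are hypothetical illustrations, not a proposed schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class I18nContext:
    """Hypothetical 'i18n context': language and each formatting facet are
    independent fields rather than parts of one monolithic locale."""
    language: str       # language for messages, e.g. "de"
    date_format: str    # whose date conventions to use, e.g. "de-DE"
    number_format: str  # whose number conventions to use, e.g. "de-DE"
    sort_order: str     # whose collation rules to use, e.g. "de"


# Start with every facet following one locale ...
ctx = I18nContext("de", "de-DE", "de-DE", "de")
# ... then override a single facet: English messages, German formats.
ctx_en_messages = replace(ctx, language="en")
print(ctx_en_messages)
```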
>
> Yes, exactly so. But I have some concerns with "dissolving" locales into
> "i18n context" soup, as it were. The foremost is that most operating
> environments (the language the actual service is written in, as opposed to
> the container) want a native locale object to enable those various
> settings. So those items are "subsettings" of a locale object, and not
> necessarily full-fledged items in their own right.
>
> I would put language as a sub-item of locale, but for the fact that there
> are existing systems that deal with language. The question is whether to
> coexist, thus:
>
>   i18n context
>     - language
>     - locale
>         - language
>         - region etc.
>
> Or even:
>
>   language
>   i18n-context
>      - locale
>         - numberfmt, etc.
>
> >
> > >In traditional software, the developer doesn't need to explicitly obtain
> > >a Locale object or setting in order to write internationalized code. By
> > >default, locale-aware functions obtain and use the locale from the system
> > >environment. Yes, the developer must call the correct functions or
> > >methods, but s/he only needs to obtain a locale when the goal is to
> > >override the user's preference (the default).
> >
> > I guess one important point may be that developers should not have to
> > care about language/locale issues when these are marginal (e.g. language
> > of error messages), but that there may be an advantage in forcing them
> > to think about such issues explicitly if they are central to the
> > application (e.g. currency conversions in some cases).
>
>
> Yes, exactly. It also means that well-written code running on
> multi-locale-capable Web service containers gets multi-locale operation for
> free. This may include a great deal of code that is merely well-written,
> not explicitly internationalized. This is, to my mind, the best indicator
> that "we've done something right".
> >
> >
> > >---
> > >Aside: Deb described these last two scenarios in an off-list email last
> > >week. These are the "runAsClient", "runAsSpecified" and "runAsServer"
> > >scenarios, Deb.
> > >---
> >
> > Deb or Addison, could you send these scenarios to the list or to Kentaro
> > for inclusion in the document?
>
> The email thread is a bit chaotic. We'd probably have to forward the whole
> series.
> >
> >
> > >I'm on record as favoring tags over an XML structure, but I'm willing to
> > >be persuaded (and try persuasion ;-)).
> >
> > I think the question of whether to use a tag or a (simple) structure
> > is a syntax question, and thus could work both ways. A more important
> > question in my view is whether we want the solution to effectively
> > have a limited number of values (e.g. language x country (x variant)),
> > or whether we want to be open-ended, and how the tag/simple structure
> > gets associated with the actual data for the language/locale.
> >
>
> "Tag vs. structure" might then better be described as "tag" vs. "data". If
> transferring the specific settings is not our goal, then whether we use a
> URN or an XML structure is merely syntax, as you say.
>
> Having played with ULocale tags for the better part of a year, I have a
> better feel than I did originally for where limited vs. open-ended sets
> work. ULocale tags are "open-ended" insofar as they allow an arbitrary
> number of additional, optional, or vendor-defined fields. The problem is
> getting agreement on what those fields and their values mean. So, in
> practice, they are closed.
>
> What I've found is that there is a distinct set of "values" that have
> meaning and may actually be necessary for total interoperability. These
> are:
>
> - language
> - region
> - charset (UNIX processes need it, can't get away from it)
> - collation
> - script (writing system, as with Traditional and Simplified Chinese, or a
>   Japanese kana-only locale)
> - orthography (as with Bokmål/Nynorsk)
> - currency (euro?)
> - other variant (all one-time events, e.g. EURO)
>
> The historical problem with variant is that it is a catch-all field that
> does too many (different) things. In my own work, I've bundled the last
> three into "variant" and used script only on occasion. The remaining four
> (lang, region, charset, collation) maintain their importance to me, even
> in an all-Unicode environment.
>
> For me it's because I sometimes need to do Java->POSIX. Although, as Deb
> points out, you probably wouldn't write an ML Web services container in
> XPG4-style C, you might use a Java container to invoke services or connect
> to resources (via JCA, for example) that are written in such a style.
> Since the service invocation may be within the context of a native
> interface exec call, calling setlocale() on the service itself isn't the
> nasty problem it would otherwise be, and a complete POSIX locale is needed
> to invoke the service in a locale-sensitive way.
>
> That is, my Solaris box has a locale called "ja_JP" and it is quite
> different from "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP"
> isn't enough information to know which locale to initialize, and I can't
> even enumerate the locales in some environments (like XPG4) to do it via
> inspection. A smart C program might try to guess its way through the
> locales: this is like the plug-in idea that Deb suggested.
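The four Solaris names differ only in the fields that follow the language and region. A minimal parser for the X/Open naming convention `language[_territory][.codeset][@modifier]` makes that split explicit. `parse_posix_locale` is a hypothetical helper, and this is a sketch rather than a full validator.

```python
import re

# X/Open-style locale names: language[_territory][.codeset][@modifier]
_POSIX_LOCALE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(name):
    """Split a POSIX locale name into (language, territory, codeset,
    modifier); absent fields come back as None."""
    m = _POSIX_LOCALE.match(name)
    if m is None:
        raise ValueError(f"not a POSIX locale name: {name!r}")
    return m.group("language", "territory", "codeset", "modifier")


for name in ("ja_JP", "ja_JP.UTF-8", "ja_JP.eucJP", "ja_JP@kana"):
    print(name, "->", parse_posix_locale(name))
# ja_JP -> ('ja', 'JP', None, None)
# ja_JP.UTF-8 -> ('ja', 'JP', 'UTF-8', None)
# ja_JP.eucJP -> ('ja', 'JP', 'eucJP', None)
# ja_JP@kana -> ('ja', 'JP', None, 'kana')
```

Only the first two fields have a counterpart in a tag such as "ja-JP". The codeset and modifier have to come from somewhere else.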
>
> In any case, in modern programming this is less of an issue. But I still
> think that the variations I've listed matter in too many diverse cases to
> just ignore them.
> >
> > >I have use cases galore from webMethods that relate to the various
> > >aspects of the above.
> >
> > Please send them in so that we can integrate them in our doc.
>
> As I get to them. The problem is always time....::sigh::
> >
> >
> > >Some things to think about:
> > >
> > >--RFC 3066 language tags (that is, xx-YY) are sufficient for identifying
> > >Java locales and possibly C# CultureInfos, but not for POSIX, C/C++, Mac,
> > >and other native programming environments.
> >
> > Can you tell us what's missing in the latter cases? For POSIX/C/C++,
> > clearly the encoding is missing, but with XML, this falls out of the
> > equation. Anything else?
>
> POSIX/C++: charset/encoding. Have to have them even though XML files are
> Unicode.
> Mac (non OS/X): script or script code
> Win32: collation (you can live without, of course)
> Databases: encoding or collation?
> Host systems: CCSID or code page?
> // ... // too many more to really list
> Note that 3066 tags don't cover Java variants (for example). As long as you
> can live with "loss-of-precision" like that, it's okay, but I don't think we
> can live with it.
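The "loss of precision" can be illustrated with two hypothetical conversion helpers. The first maps POSIX names to RFC 3066-style tags and necessarily drops the codeset and modifier. The second maps a Java-style (language, country, variant) triple and rejects variants that are not syntactically valid 3066 subtags. Both check syntax only and are sketches, not existing APIs. "Traditional_WIN" is used simply as an example variant string.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, followed by any number
# of "-"-separated subtags of 1-8 letters or digits.
RFC3066_TAG = re.compile(r"^[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*$")


def posix_to_rfc3066(name):
    """Hypothetical lossy mapping: keep language and territory, drop the
    .codeset and @modifier fields that a 3066 tag has no place for."""
    base = name.split("@", 1)[0].split(".", 1)[0]
    return base.replace("_", "-")


def java_locale_to_rfc3066(language, country="", variant=""):
    """Hypothetical mapping of a Java-style (language, country, variant)
    triple; fails when the variant is not a legal 3066 subtag."""
    tag = "-".join(part for part in (language, country, variant) if part)
    if not RFC3066_TAG.match(tag):
        raise ValueError(f"cannot express {tag!r} as an RFC 3066 tag")
    return tag


# Four distinct Solaris locales collapse into a single tag.
names = ["ja_JP", "ja_JP.UTF-8", "ja_JP.eucJP", "ja_JP@kana"]
print({posix_to_rfc3066(n) for n in names})  # {'ja-JP'}

print(java_locale_to_rfc3066("ja", "JP"))    # ja-JP
try:
    java_locale_to_rfc3066("es", "ES", "Traditional_WIN")
except ValueError as err:
    print(err)  # the underscore and length make it an illegal subtag
```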
>
> >
> >
> > Regards,     Martin.
> >
> >
>
>
Received on Thursday, 30 January 2003 22:21:30 UTC