RE: [WSTF] My view... [long]

Hi Martin,

Thanks for the comments. A few notes below, which probably need more
thought.

Addison

> -----Original Message-----
> From: public-i18n-ws-request@w3.org
> [mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst
> Sent: Wednesday, January 15, 2003 2:33 PM
> To: Addison Phillips [wM]; public-i18n-ws@w3.org
> Cc: debasish@us.ibm.com
> Subject: Re: [WSTF] My view... [long]
>
> >
> >I consider "locale" to be an environment setting or settings that govern
> >how data is (to be) processed. Language preference may either be
> an aspect
> >of "locale" or may be a separate environment setting.
>
> And 'locale' also has several different aspects. That lets me
> generally favor a hierarchy of
>
>    i18n context
>       - language
>       - date format
>       - number format
>       - sort order
>       - ...
>
> rather than
>
>    language
>    locale
>      - date format
>      - number format
>      - sort order
>      - ...
>

Yes, exactly so. But I have some concerns with "dissolving" locales into
"i18n context" soup, as it were. The foremost is that most operating
environments (the language the actual service is written in, as opposed to
the container) want a native locale object to enable those various settings.
So those items are "subsettings" of a locale object, and not necessarily a
full-fledged item themselves.

I would put language as a sub-item of locale, but for the fact that there
are existing systems that deal with language. The question is whether to
coexist, thus:

  i18n context
    - language
    - locale
        - language
        - region, etc.

Or even:

  language
  i18n-context
     - locale
        - numberfmt, etc.

>
> >In traditional software, the developer doesn't need to
> explicitly obtain a
> >Locale object or setting in order to write internationalized code. The
> >default behavior of locale-aware functions obtain and use the
> locale from
> >the system environment. Yes, the developer must call the correct
> functions
> >or methods, but s/he must only obtain a locale when the desire is to
> >override the user's preference (the default).
>
> I guess one important point may be that developers should not have to
> care about language/locale issues when these are marginal (e.g. language
> of error messages), but that there may be an advantage to force them
> to have to think about such issues explicitly if they are central
> to the application (e.g. currency conversions in some cases).


Yes, exactly. It also means that well-written code running on a
multi-locale-capable Web service container gets multi-locale operation for
free. This may include a great deal of code that is merely well written, not
explicitly internationalized. That is, to my mind, the best indicator that
we've done something right.
>
>
> >---
> >Aside: These last two scenarios Deb described in an off-list email last
> >week. These are "runAsClient", "runAsSpecified" and "runAsServer"
> >scenarios, Deb.
> >---
>
> Deb or Addison, could you send these scenarios to the list or to Kentaro
> for inclusion in the document?

The email thread is a bit chaotic. We'd probably have to forward the whole
series.
>
>
> >I'm on record as favoring tags over an XML structure, but I'm willing to
> >be persuaded (and try persuasion ;-)).
>
> I think the question of whether to use a tag or a (simple) structure
> is a syntax question, and thus could work both ways. A more important
> question in my view is whether we want the solution to effectively
> have a limited number of values (e.g. language x country (x variant)),
> or whether we want to be open-ended, and how the tag/simple structure
> gets associated with the actual data for the language/locale.
>

"Tag vs. structure" might then better be described as "tag" vs. "data". If
transferring the specific settings is not our goal, then whether we use a URN
or an XML structure is merely syntax, as you say.

Having played with ULocale tags for the better part of a year, I have a
better feel for where limited vs. open-ended approaches work than I did
originally. ULocale tags are "open-ended" insofar as they allow an arbitrary
number of additional, optional, or vendor-defined fields. The problem is
getting agreement on what those fields and their values mean. So, in
practice, they are closed.

What I've found is that there is a distinct set of "values" that have
meaning and may actually be necessary for total interoperability. These are:

- language
- region
- charset (UNIX processes need it, can't get away from it)
- collation
- script (writing system, as with Traditional and Simplified Chinese, or a
  Japanese kana-only locale)
- orthography (as with Bokmål/Nynorsk)
- currency (euro?)
- other variant (all one-time events, e.g. EURO)

The historical problem with variant is that it is a catch-all field that
does too many (different) things. In my own work, I've balled the last three
up into "variant" and used script only on occasion. The remaining four
(lang, region, charset, collation) maintain their importance to me, even in
an all-Unicode environment.
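
As a rough illustration of that field set, here is a Python sketch of a
locale identifier with the four core fields, an occasional script, and the
catch-all variant. The class name `LocaleId` and both rendering methods are
hypothetical. The POSIX-style rendering assumes the common
language_REGION.charset@modifier layout and simply places the variant in the
modifier slot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocaleId:
    """Hypothetical container for the fields listed above."""
    language: str
    region: Optional[str] = None
    charset: Optional[str] = None
    collation: Optional[str] = None
    script: Optional[str] = None
    variant: Optional[str] = None  # catch-all: orthography, currency, one-offs

    def language_tag(self) -> str:
        """RFC 3066-style xx-YY tag: only language and region survive."""
        return "-".join(p for p in (self.language, self.region) if p)

    def posix_name(self) -> str:
        """POSIX-style name; collation and script have no slot here."""
        name = self.language
        if self.region:
            name += "_" + self.region
        if self.charset:
            name += "." + self.charset
        if self.variant:
            name += "@" + self.variant
        return name
```

For example, `LocaleId("ja", "JP", charset="eucJP")` renders as "ja_JP.eucJP"
in POSIX style but only as "ja-JP" as a language tag.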

For me it's because I sometimes need to go from Java to POSIX. Although, as
Deb points out, you probably wouldn't write an ML Web services container in
XPG4-style C, you might use a Java container to invoke services or connect
to resources (via JCA, for example) that are written in such a style. Since
the service invocation may happen within a native-interface exec call,
calling setlocale() in the service itself isn't quite the nasty problem that
it would otherwise be. A complete POSIX locale is then needed to invoke the
service in a locale-sensitive way.

That is, my Solaris box has a locale called "ja_JP" and it is quite
different from "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP"
isn't enough information to know which locale to initialize, and I can't even
enumerate the locales in some environments (like XPG4) to do it via
inspection. A smart C program might try to guess its way through the
locales: this is like the plug-in idea that Deb suggested.
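
A sketch of that guessing approach, written in Python rather than C for
brevity. It relies on the standard `locale.setlocale()` call, which raises
`locale.Error` when the platform does not accept a name. The helper names,
the candidate ordering, and the charset list are assumptions made for
illustration, not a recommended policy.

```python
import locale
from typing import Callable, Iterable, List, Optional


def candidate_names(language: str, region: str,
                    charsets: Iterable[str]) -> List[str]:
    """Build POSIX-style guesses: charset-qualified names first, bare name last."""
    base = f"{language}_{region}"
    return [f"{base}.{cs}" for cs in charsets] + [base]


def can_set(name: str) -> bool:
    """Probe a name by calling setlocale(), then restore the old setting."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def first_available(names: Iterable[str],
                    probe: Callable[[str], bool] = can_set) -> Optional[str]:
    """Return the first name the probe accepts, or None if none is accepted."""
    for name in names:
        if probe(name):
            return name
    return None
```

Calling `first_available(candidate_names("ja", "JP", ["UTF-8", "eucJP"]))`
returns whichever of the three names the host accepts first, or None. Which
one that is depends entirely on the locales installed on the machine, which
is exactly why the tag alone is not enough.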

In any case, in modern programming this is less of an issue. But I still
think the variations I've listed matter in too many diverse cases to just
ignore them.
>
> >I have use cases galore from webMethods that relate to the
> various aspects
> >of the above.
>
> Please send them in so that we can integrate them in our doc.

As I get to them. The problem is always time....::sigh::
>
>
> >Some things to think about:
> >
> >--RFC3066 language tags (that is xx-YY) are sufficient for identifying
> >Java locales and possibly C# CultureInfos, but not for POSIX,
> C/C++, Mac,
> >and other native programming environments.
>
> Can you tell us what's missing in the later cases? For Posix/C/C++,
> clearly the encoding is missing, but with XML, this falls out of the
> equation. Anything else?

POSIX/C++: charset/encoding. You have to have them even though XML files are
Unicode.
Mac (non-OS X): script or script code
Win32: collation (you can live without it, of course)
Databases: encoding or collation?
Host systems: CCSID or code page?
... and too many more to really list.
Note that RFC 3066 tags don't cover Java variants (for example). As long as
you can live with "loss of precision" like that, it's okay, but I don't think
we can live with it.
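
To show the loss of precision concretely, here is a small Python sketch that
reduces POSIX-style locale names to xx-YY tags. The regular expression
assumes the common language_REGION.charset@modifier layout, and the function
name is hypothetical.

```python
import re

# Assumed layout: language[_REGION][.charset][@modifier]
_POSIX_NAME = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<region>[A-Za-z0-9]+))?"
    r"(?:\.(?P<charset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def to_language_tag(posix_name: str) -> str:
    """Keep only language and region; charset and modifier are dropped."""
    match = _POSIX_NAME.match(posix_name)
    if match is None:
        raise ValueError(f"not a POSIX-style locale name: {posix_name!r}")
    parts = [match.group("language")]
    if match.group("region"):
        parts.append(match.group("region"))
    return "-".join(parts)


solaris_names = ["ja_JP", "ja_JP.UTF-8", "ja_JP.eucJP", "ja_JP@kana"]
tags = {to_language_tag(name) for name in solaris_names}
```

All four distinct locale names collapse into the single tag "ja-JP", so the
mapping cannot be reversed without extra information.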

>
>
> Regards,     Martin.
>
>

Received on Wednesday, 15 January 2003 20:58:52 UTC