RE: [WSTF] My view... [long] from Debasish Banerjee on 2003-01-16 (public-i18n-ws@w3.org from January 2003)

From: Debasish Banerjee <debasish@us.ibm.com>
Date: Thu, 16 Jan 2003 12:06:29 -0600
To: "Addison Phillips [wM]" <aphillips@webmethods.com>
Cc: "Martin Duerst" <duerst@w3.org>, public-i18n-ws@w3.org
Message-ID: <OF71905A74.99738BD9-ON86256CB0.005E23EA@us.ibm.com>
Addison and Martin,

Is there a W3C group working on the concept of standardizing locale?

The problems that Addison pointed out are indeed thorny, and all of us have
probably faced some of them while designing internationalized portable
applications . In this context, I have a couple of questions for the table.

1.  Shall we try to define i18nContext data structure covering all the
imaginable locale-affected categories (Addison's 'langauge'-->'EURO' list
can be a starting point), or shall we restrict ourselves to a few most
important fields (language, and country for sure, make code set too where
applicable) and push the rest to that 'catch-all' variant filed, or shall
we go for an open-ended structure something like ULocale (again as Addison
pointed out, there can be problems with interpreting the not-so-common
fields)?

2.  Shall we prescribe the architecture of a pluggable locale adapter for
Web service containers? The adapters for specific platforms can perform all
the necessary conversions. The conversion will not be architected; at best
we may provide some non-prescriptive conversion suggestions.

Thanks,

Deb


                                                                                                                                                    
                      "Addison Phillips                                                                                                             
                      [wM]"                    To:       "Martin Duerst" <duerst@w3.org>, <public-i18n-ws@w3.org>                                   
                      <aphillips@webmet        cc:       Debasish Banerjee/Rochester/IBM@IBMUS                                                      
                      hods.com>                Subject:  RE: [WSTF] My view... [long]                                                               
                                                                                                                                                    
                      01/15/2003 07:58                                                                                                              
                      PM                                                                                                                            
                                                                                                                                                    
                                                                                                                                                    




Hi Martin,

Thanks for the comments. A few notes below, which probably need more
thought.

Addison

> -----Original Message-----
> From: public-i18n-ws-request@w3.org
> [mailto:public-i18n-ws-request@w3.org]On Behalf Of Martin Duerst
> Sent: Wednesday, January 15, 2003 2:33 PM
> To: Addison Phillips [wM]; public-i18n-ws@w3.org
> Cc: debasish@us.ibm.com
> Subject: Re: [WSTF] My view... [long]
>
> >
> >I consider "locale" to be an environment setting or settings that govern
> >how data is (to be) processed. Language preference may either be
> an aspect
> >of "locale" or may be a separate environment setting.
>
> And 'locale' also has several different aspects. That lets me
> generally favor a hierarchy of
>
>    i18n context
>       - language
>       - date format
>       - number format
>       - sort order
>       - ...
>
> rather than
>
>    language
>    locale
>      - date format
>      - number format
>      - sort order
>      - ...
>

Yes, exactly so. But I have some concerns with "dissolving" locales into
"i18n context" soup, as it were. The foremost is that most operating
environments (the language the actual service is written in, as opposed to
the container) want a native locale object to enable those various
settings.
So those items are "subsettings" of a locale object, and not necessarily a
full-fledged item themselves.

I would put language as a sub-item of locale, but for the fact that there
are existing systems that deal with language. The question is whether to
coexist, thus:

  i18n context
    - language
    - locale
        -language
        -region etc.

Or even:

  language
  i18n-context
     - locale
        - numberfmt, etc.

>
> >In traditional software, the developer doesn't need to
> explicitly obtain a
> >Locale object or setting in order to write internationalized code. The
> >default behavior of locale-aware functions obtain and use the
> locale from
> >the system environment. Yes, the developer must call the correct
> functions
> >or methods, but s/he must only obtain a locale when the desire is to
> >override the user's preference (the default).
>
> I guess one important point may be that developers should not have to
> care about language/locale issues when these are marginal (e.g. language
> of error messages), but that there may be an advantage to force them
> to have to think about such issues explicitly if they are central
> to the application (e.g. currency conversions in some cases).


Yes, exactly. It also means that well-written code running on multi-locale
capable Web service containers gets multi-locale operation for free. This
may include a great deal of code that is merely well-written, not
explicitly
internationalized. This is, to my mind, the best indicator "we've done
something right".
>
>
> >---
> >Aside: These last two scenarios Deb described in an off-list email last
> >week. These are "runAsClient", "runAsSpecified" and "runAsServer"
> >scenarios, Deb.
> >---
>
> Deb or Addison, could you send these scenarios to the list or to Kentaro
> for inclusion in the document?

The email thread is a bit chaotic. We'd probably have to forward the whole
series.
>
>
> >I'm on record as favoring tags over an XML structure, but I'm willing to
> >be persuaded (and try persuasion ;-)).
>
> I think the question of whether to use a tag or a (simple) structure
> is a syntax question, and thus could work both ways. A more important
> question in my view is whether we want the solution to effectively
> have a limited number of values (e.g. language x country (x variant)),
> or whether we want to be open-ended, and how the tag/simple structure
> gets associated with the actual data for the language/locale.
>

"Tag vs. structure" might then better be described as "tag" vs. "data". If
the specific setting transfer is not our goal, then whether we use a URN or
an XML structure is merely syntax, as you say.

Having played with ULocale tags for the better part of a year, I have a
better feel for where limited vs. open work than I did originally. ULocale
tags are "open-ended" insofar as they allow an arbitrary number of
additional, optional, or vendor-defined fields. The problem is getting
agreement on what those fields and their values mean. So, in practice, they
are closed.

What I've found is that there is a distinct set of "values" that have
meaning and may actually be necessary for total interoperability. These
are:

- language
- region
- charset (UNIX processes need it, can't get away from it)
- collation
- script (writing system, as with Trad and Simplified, or Japanese
kana-only
locale)
- orthography (as Bokmal/Nynorsk)
- currency (euro?)
- other variant (all one-time-events, e.g. EURO)

This historical problem with variant is that it is a catch-all field that
does too many (different) things. In my own work, I've balled the last
three
up into "variant" and used script only on occasion. The remaining four
(lang, region, charset, collation) maintain their importance to me, even in
an all Unicode environment.

For me it's because I sometimes need to do Java->POSIX. Although, as Deb
points out, you probably wouldn't write an ML Web services container in
XPG4-style C, you might use a Java container to invoke services or connect
to resources (via JCA for example) that are written in such a style. Since
the service invocation may be within the context of a native interface exec
call, calling setlocale() on the service itself isn't quite the nasty
problem that it would otherwise be and a complete POSIX locale is needed to
invoke the service in a locale sensitive way.

That is, my Solaris box has a locale called "ja_JP" and it is quite
different than "ja_JP.UTF-8" or "ja_JP.eucJP" or even "ja_JP@kana". "ja-JP"
isn't enough information to know which locale to initialize and I can't
even
enumerate the locales in some environments (like XPG4) to do it via
inspection. A C program that is smart might try to guess its way through
the
locales: this is like the plug-in idea that Deb suggested.

In any case, in modern programming this is less of an issue. But I still
think that the variations I've listed are important
in diverse cases, too many to just ignore them.
>
> >I have use cases galore from webMethods that relate to the
> various aspects
> >of the above.
>
> Please send them in so that we can integrate them in our doc.

As I get to them. The problem is always time....::sigh::
>
>
> >Some things to think about:
> >
> >--RFC3066 language tags (that is xx-YY) are sufficient for identifying
> >Java locales and possibly C# CultureInfos, but not for POSIX,
> C/C++, Mac,
> >and other native programming environments.
>
> Can you tell us what's missing in the later cases? For Posix/C/C++,
> clearly the encoding is missing, but with XML, this falls out of the
> equation. Anything else?

POSIX/C++: charset/encoding. Have to have them even though XML files are
Unicode.
Mac (non OS/X): script or script code
Win32: collation (you can live without, of course)
Databases: encoding or collation?
Host systems: CCSID or code page?
// ... // too many more to really list
Note that 3066 tags don't cover Java variants (for example). As long as you
can live with "loss-of-precision" like that, it's okay, but I don't think
we
can live with it.

>
>
> Regards,     Martin.
>
>
Received on Thursday, 16 January 2003 13:11:32 UTC