RE: FW: [Distributed services] What is i18n on web services based webapplications

Hi Andrea,

Thanks for the response.

I think we're more or less in agreement here. We need both locale and
language. We need them in both directions (request and response).

I think W3C-I18N should try to make as much headway as possible, as quickly
as possible, to prevent over-loading (such as is sometimes the case now with
Accept-Language).

I have some (a great deal, actually) specific inter-linear responses below
(I've also deleted some of the text that isn't germane to my responses).

I would like to add an observation that there seems to be some desire among
folks to ditch the POSIX style locale format (lang_region). I admit that it
is an inexact mapping (will we ever get Chinese sorted out? what about
bilingual countries like Belgium and Canada, where the region should
probably come first?). But it is generally useful and matches the internal
locale structures of most of our platforms (even Microsoft's). I'm concerned
about the acceptance of something new (hence "odd") that might be
incompatible with existing APIs... this is a topic that should be considered
carefully.

In any event, if I had my druthers I would vote that we have these items
turned into functional RFCs as quickly as possible, so that other standards
groups can be coerced into adopting them ;-).

Best Regards,

Addison

Addison P. Phillips
Globalization Architect
webMethods, Inc.
432 Lakeside Drive
Sunnyvale, California, USA
+1 408.962.5487 (phone)
+1 408.210.3569 (mobile)
----------------------------------------
Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: A. Vine [mailto:andrea.vine@sun.com]
> Sent: Monday, April 08, 2002 11:43 AM
> To: Addison Phillips [wM] (by way of Martin Duerst <duerst@w3.org>)
> Cc: www-i18n-workshop@w3.org
> Subject: Re: FW: [Distributed services] What is i18n on web services
> based webapplications
>
>
> >
> > I fear that an infrastructure might not be accepted because it would not be
> > simple enough (we would understand it in this forum, but the authors of SOAP
> > et al aren't thinking all that much about it, as near as I can tell. There
> > is an expectation that xml:lang already does this, I believe.).
>
> But we all know it doesn't (I assume that's what you're implying here).

Yes. xml:lang is supposed to be a language attribute, after all, not a
locale tag.

>
> > ...... Correctly structured data is nearly always locale neutral
> > (the data model in SOAP for example sees to that).
>
> It's that "nearly" which will bite us.

Yes, but it is important to point out that this case applies very little of
the time. If one finds oneself tagging some data with a locale attribute, it
is probably a good idea to look at the data again for refactoring. There is
data that is locale affected (and let's distinguish "in a particular
language" from "locale affected" here), but in most cases well-formed data
structures are surprisingly locale-independent. (It's display time that
makes the difference.)

There is a tendency to make locales do too much, in my opinion. Country
codes or language tags are often what is actually needed when a locale is
called into play---or some other value may be important (like currency
code). These are not locales or even locale components, but specific data
elements or field values. Their relationship to locale is often a loose one.

But I digress.

The "nearly" does bite. But locales will not prevent it from biting. In
fact, I suspect that locale structures will lead to as much silliness as
having nothing. ("Well I tagged the record with a locale! What do you mean
you can't read the date?")

In the most recent issue of XML Journal, for example, I count at least four
articles that make fundamental Java coding errors with regard to character
encoding---every XML file is tagged with a hardcoded
"text/xml;charset=UTF-8" or <?xml version="1." encoding="utf-8"?> and nearly
every time you get something like:

   OutputStreamWriter osw = new OutputStreamWriter(myStream);

Gee, that's not going to work unless you just happen to be running a UNIX
UTF-8 locale. But we've successfully foisted Unicode off on folks, eh? I
believe the situation is analogous. Oh bother, I'm preaching to the choir
again!
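
For what it's worth, here is a minimal sketch of the fix, assuming the intent
is simply that the bytes on the wire match the encoding the XML declaration
claims. The class and method names are invented for illustration, and
StandardCharsets is a later Java addition (Java 7); on older runtimes you
would pass the string "UTF-8" as the second constructor argument instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8XmlWriter {
    // Hypothetical helper: writes a tiny XML document whose bytes really are
    // UTF-8, regardless of the platform default encoding.
    // (XML escaping of the text is omitted to keep the sketch short.)
    static byte[] writeXml(String text) {
        ByteArrayOutputStream myStream = new ByteArrayOutputStream();
        // The explicit charset is the whole point: without it the writer
        // silently uses whatever the JVM's default encoding happens to be.
        try (Writer osw = new OutputStreamWriter(myStream, StandardCharsets.UTF_8)) {
            osw.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
            osw.write("<greeting>" + text + "</greeting>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return myStream.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeXml("caf\u00e9");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The one-argument constructor is not wrong in itself; it is wrong next to a
hardcoded encoding declaration, because the two can disagree.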

>
> >
> > That said, there is a role for locale in XML and distributed systems. For
> > example, the sort order of a collated result set (series of records) in an
> > XML file should probably match the expected order of the client...
>
> Where possible, of course.  The client can request anything, but it's the
> server which must decide what it can and should serve.

Hence request and response being separate... and the need for fall-back
definitions.

>
> Of course the above is language-based, not (usually) locale-based.

Yes, but implementations are generally locale-based. See locale
overloading complaints above ;-)
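
To illustrate what I mean by locale-based implementations, here is a small
sketch using Java's own collation API, where the sort order is selected by
passing a Locale even though collation is mostly a matter of language. The
class name, method name, and sample strings are made up for the example, and
the exact ordering you get depends on the collation data your JRE ships.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleSort {
    // Hypothetical helper: returns a sorted copy of the names, ordered by
    // the collation rules the runtime associates with the given locale.
    static List<String> sortFor(Locale locale, List<String> names) {
        List<String> copy = new ArrayList<>(names);
        // Collator is a Comparator, so it can drive the sort directly.
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Zebra", "\u00c4rger", "Apfel");
        System.out.println(sortFor(Locale.GERMAN, names));
    }
}
```

A plain String.compareTo sort would put the A-umlaut entry after "Zebra",
which is the kind of result a client expecting German order would not want.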

>
> > ......(can we formally co-opt
> > Accept-Language, seeing as almost everyone uses it as both "locale" and
> > "language choice"?)
>
> Prefer not.  This is an opportunity to separate language and locale.  Please
> let's take advantage of it, and maybe other protocols and apps will follow.

So would I. The question is whether too much prior art exists not to co-opt
it. I saw two presentations at IUC20 that used Accept-Language for locale
negotiation (and one that used it for both language and locale and
specifically distinguished the two).

> I'm not saying you can't have a language fr-CA, what I'm saying is you can
> have a locale US (or en_US just to keep the current naming convention) and
> have a language of fr-CA.  Servers now may not be able to accommodate fr-CA,
> but little by little they will *if* we provide the mechanism.  No mechanism,
> no chance for provision.

Also, there is the question of browser rotation. If we have to wait for IE 7
(or 8) to predominate, it could be a long time before we can use the
results... which will encourage misuse of existing things further.

>
> > and especially in Web Services (possibly via an
> > "xml:locale" tag). This is not the same thing as indicating the
> > locale/format of returned data. These are two separate things.
>
> If, as you say below, xml:lang is to be regarded strictly as an attribute
> (which I think is a good idea), then so goes xml:locale.  I believe requests
> for a language and for a locale require a separate *tag*, not an attribute.
> Accept-language is separated from Content-language.  What we want and what
> we get are 2 different things ;-}

Utterly agree.

>
> >
> >    1a. SOAP in particular needs to have a locale passing mechanism. There
> > needs to be thought given to how multiple hops on the way to the destination
> > handle locale, for example. I realize, from several long chats with our WS
> > folks, that SOAP has gone out of their way to avoid protocols, but I firmly
> > believe that the locale passing belongs in the envelope and I think it
> > should be defined at the most abstract level so that the most
> > implementations inherit good i18n behavior. Call me crazy.
>
....
>
> I don't see the relevance of hops.  A client requests a locale - the request.
> Regardless of the number of hops, doesn't the request remain the same?
> The server serves a locale - the content locale.  Again, why should hops
> change this?  Are you suggesting that there are other services which will
> take the data and format it according to the locale, without a separate SOAP
> envelope with a separate request-locale and content-locale?  Or am I missing
> something?

My point wasn't clear. If you put the locale in the transport (e.g.
Accept-Language) then it tends to get stripped off by the first server in
the chain. This is not very useful. Putting it in the SOAP envelope
guarantees that it will get to the destination.
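
As a rough sketch of what carrying the locale in the envelope could look like,
the following builds a SOAP 1.1 envelope with a header entry holding the
requested locale list. Nothing like this is standardized: the "urn:example:i18n"
namespace, the "requestLocale" element name, and the class and method names are
all hypothetical, chosen only to show the shape of the idea.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LocaleInEnvelope {
    // SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical namespace for an i18n header; not defined by any spec.
    static final String I18N_NS = "urn:example:i18n";

    // Builds an envelope whose Header carries the client's locale list,
    // so the value travels with the message rather than the transport.
    static String buildEnvelope(String requestedLocales) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
            doc.appendChild(envelope);

            Element header = doc.createElementNS(SOAP_NS, "soap:Header");
            envelope.appendChild(header);

            // Hypothetical header entry holding an Accept-Language-style list.
            Element locale = doc.createElementNS(I18N_NS, "i18n:requestLocale");
            locale.setTextContent(requestedLocales);
            header.appendChild(locale);

            envelope.appendChild(doc.createElementNS(SOAP_NS, "soap:Body"));

            // Serialize the DOM tree to a string with an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build envelope", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildEnvelope("fr-CA, en;q=0.5"));
    }
}
```

An intermediary that rewrites the HTTP headers would leave this header entry
alone unless it deliberately processes it.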

In addition, there is the problem of doing housekeeping (such as logging) in
the server that does the passing. It may need to embellish the locale list
for the purpose of local processing (whereas the payload still has the
original client locale on it). I'm only theorizing here, and not very
seriously. My main point is in the previous paragraph.

>
> I vote for a tag.  Its use can be optional, but when it is used, it should
> always be via that particular tag and its attributes.
>

I agree. Actually, a structured tag similar to Accept-Language might be most
useful. I get a lot out of multi-valued sets like A-L.
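
Here is a small sketch of why the multi-valued form is handy, assuming the
value uses Accept-Language syntax with q-weights. It leans on
Locale.LanguageRange and Locale.lookup, which arrived in Java 8; the class
name, the negotiate method, and the sample lists are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleNegotiation {
    // Hypothetical helper: picks the best supported locale for an
    // Accept-Language-style list, or the server default when the request
    // is empty or nothing matches.
    static Locale negotiate(String requested, List<Locale> supported, Locale fallback) {
        if (requested == null || requested.trim().isEmpty()) {
            return fallback;
        }
        // parse() orders the ranges by descending q-weight.
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(requested);
        // lookup() tries each range in turn, truncating it from the right
        // (fr-CA, then fr) until it equals one of the supported tags.
        Locale match = Locale.lookup(ranges, supported);
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        List<Locale> supported = Arrays.asList(
            Locale.forLanguageTag("en"),
            Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("de"));
        Locale chosen = negotiate("fr-CA, fr;q=0.8, en;q=0.5", supported, Locale.ENGLISH);
        System.out.println(chosen.toLanguageTag());
    }
}
```

Note that lookup only truncates the requested range; a server that supports
fr-FR but not plain fr would fall through to the default here, which is
exactly the sort of fallback rule that needs to be written down somewhere.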

> >
> > 3. The rules for handling these requests (chaining, negotiation,
> > defaulting/fallbacks, what to do if no locale is requested, what to do if
> > multiple locales are requested, etc.) would also be a useful adjunct
> > (perhaps this is part of the locales group's project already??)
>
> Yes, absolutely.
>
> >
....
>
> Andrea Vine
> iPlanet i18n architect
>

Received on Monday, 8 April 2002 17:20:40 UTC