RE: [Comment on WS-I18N WD] from Phillips, Addison on 2008-06-16 (www-international@w3.org from April to June 2008)

From: Phillips, Addison <addison@amazon.com>
Date: Mon, 16 Jun 2008 12:33:58 -0700
To: Dan Chiba <dan.chiba@oracle.com>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA013B11C6CC@EX-SEA5-D.ant.amazon.com>
> > Can you better enumerate why these should be promoted to full-fledged
> > elements?
> >
> Explicitly specifying how to indicate the common details will help
> achieve interoperability and promote use of this mechanism. Because LDML
> is the way to describe and exchange locale definitions and is not
> designed to indicate locale settings, it is complex and becomes ambiguous
> when it is used under the <i18n:preferences> element.

I tend to agree about the problems of using LDML: we might reevaluate its use in this context. However, I do tend to think that making a lot of top-level elements might not be the way to go. The needs of one application are not necessarily the needs of another. Down this path lie a lot of top-level preferences that might interfere with one another. I like the simplicity of the "big locale knob" and then having a bunch of smaller "preferences" knobs that interact with it.

>
> For example, how to identify a collation is unclear. In the current
> draft, German phonebook collation is represented as follows:
>
>    <i18n:preferences>
>      <ldml:collation>
>        <ldml:alias source="de_DE" type="phonebook"/>
>      </ldml:collation>
>    </i18n:preferences>
>
> A corresponding example in LDML looks like this:
>
>    <collation type="phonebook">
>      <alias source="de_DE"/>
>    </collation>

Collation is a notoriously problematic area for identification. I don't disagree with your analysis here.

>
> > 2. I've seen requests for a UI language separate from locale before,
> > but I'm not sure that they make a lot of sense. Which takes
> > precedence? What does it mean to have a German locale but French UI
> > messages? Other than writing I18N demos, what use case do you have for
> > this?
> >
> Internationalized software often supports different sets of languages
> and locales. A software project typically finds support for many locales
> in the technology stack (e.g. date formatting), while the project may
> not have the resources to support as many languages. (Support for
> locales is usually free or cheap; language support varies and can be
> very expensive.) So quite often a product supports a greater number of
> locales than translations. Then, each user is usually served in their
> preferred locale. However, the preferred language may not be supported,
> and then an alternate language would have to be chosen. The language
> item in my #3 is to indicate this language.

API support for locales is not the same thing as full localization, as you note. My JDK has about 150 locales baked into it, but, of course, my Java-based product doesn't have 150 localizations available to it.

On the other hand, does it make sense to advertise that a Web service supports a locale that it has no messages for? If the service normally has no user interface ("formatDate", "addInts", "sortStrings"), then the list of available locales might very well match the complete set available in the API. At the other end of the spectrum are AJAX interactions that build the UI in real time. Then only the messages that you actually have available are useful to advertise.

My concern here is that many services fall into a sort of middle category: they can service many locales, but only have a limited set of localizations. Messages from the services are necessarily constrained to the smaller set, while the service might actually be useful for a larger set.

My tendency is still to think that this is "locale" and not "language". It looks like a bug to get a message like "There were « 1 234 » entries sorted on 14 juin.", in which the formatting clearly follows one locale and the message text is in another language.

What the current version does not provide:

- a way to enumerate the available items
- a way to specify the complete set of preferences
- a reference to RFC 4647 Lookup (that is, locale-based resource negotiation)
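
For reference, the Lookup scheme of RFC 4647 (section 3.4) can be sketched roughly as follows in Python. This is only an illustration and not part of WS-I18N: the function name `lookup`, its parameters, and the sample tag lists are hypothetical, and it ignores details such as wildcards inside a range.

```python
def lookup(priority_list, available, default=None):
    """Return the first available tag matched by RFC 4647 Lookup, else default.

    priority_list: language ranges in order of preference (hypothetical input).
    available: language tags the service can actually serve (hypothetical input).
    """
    # Matching is case-insensitive; remember the original spelling of each tag.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # a bare wildcard carries no information for Lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            # Truncate from the end, one subtag at a time.
            subtags.pop()
            # A single-character subtag (e.g. "x") is removed together with
            # the subtag that followed it.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# A request asking for Canadian French, then German, against a small
# set of localizations:
print(lookup(["fr-CA", "de"], ["en", "fr", "de-DE"], default="en"))
```

Note that the range "de" does not match the tag "de-DE" here: Lookup only truncates the range, it never extends it. That is why the request above falls to "fr" and why a default is needed when nothing matches.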

>
> > My concern is that it will be very difficult for people to understand
> > the separate elements' uses, especially since each of them is then
> > exposed to the BCP 47 Lookup negotiation mechanism. If we were to make
> > some changes here it would be to make <i18n:locale> a language
> > priority list for requests and a single item for responses.
> >
> >
> Yes, actually, I think it can be very difficult to orchestrate services
> to use appropriate locales for each service and produce the desired
> behavior as a whole web service application, even after WS-I18N is
> completed and becomes available for developers. My understanding is that
> this version of the WS-I18N specification does not define locale
> negotiation.

Yes: it needs to (provide for it). That is what it is for. It is intended to indicate what a service is capable of ("this one: always in French") and to specify what is preferred when it is optional ("please do this service for me in Japanese").

> >
> > 4. Charset, IMO, is a bad idea. I am not sure of a use case for it.
> > Would it imply that the response should use a specific encoding for
> > attachments or for the SOAP message? Isn't this the job of
> > Content-Type? I'm sure we can think of some very specific cases that
> > imply it, but it strikes me that the best way to discourage Bad
> > Behavior for this sort of thing is to make people create their own,
> > separate policy item for encoding management when they need it. (We've
> > spent years getting people used to the idea that Unicode is a Good
> > Thing, especially on the wire and that if you need some other encoding
> > you should transcode to/from it on your end.)
> >
> I think there are use cases because there are data in native encodings.

Yes, there is data in native encodings. But if you put it into a SOAP message body, you can (really must) transform it. Or you attach it as a binary thing with a Content-Type.
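
A minimal sketch of that transformation in Python, assuming the sender knows the legacy encoding of its own data. The function name `to_utf8` and the Shift_JIS sample are illustrative assumptions, not anything defined by WS-I18N or SOAP.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Transcode legacy-encoded bytes to UTF-8 before they go on the wire."""
    # Decode with the known legacy encoding; this raises UnicodeDecodeError
    # if the bytes are not valid in that encoding.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


# Data held locally in Shift_JIS, converted for use in a message body.
legacy = "日本語".encode("shift_jis")
payload = to_utf8(legacy, "shift_jis")
print(payload.decode("utf-8"))
```

The receiving end does the reverse if it needs its own native encoding, so the encoding choice stays a local matter on each side.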

> We promote Unicode at every chance, but in some cases it does seem a
> better idea not to force them to Unicode. For example, both the consumer
> and the provider may have a native encoding; then forcing the service
> communication to Unicode may sound irrational. I agree it is the job of
> Content-Type to indicate the charset of a content. This might be used to
> indicate preferred charsets (reminiscent of Accept-Charset).

I don't say that Unicode is forced upon people (although using SOAP is mighty close to forcing UTF-8). What I'm saying is that, as a parameter, it usually doesn't make a lot of sense. The data often has to be transcoded for the benefit of (for example) the XML processor anyway. The fact that data exists in some legacy encoding affects the results or operation of the service itself (you still can't store Japanese character data in a WE8ISO8859P1 database even if the Web service layer permits you to send it some). But it's not necessarily something that one can usefully specify at the service layer.
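
To illustrate the database point: ISO 8859-1 (the character set that Oracle's WE8ISO8859P1 corresponds to) has no code points for Japanese, so the limitation sits in the back end no matter what the service layer accepted. A small Python check, using the hypothetical helper name `fits_in`:

```python
def fits_in(text: str, encoding: str) -> bool:
    """Report whether every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Western European text fits in ISO 8859-1; Japanese text does not.
print(fits_in("café", "iso-8859-1"))
print(fits_in("日本語", "iso-8859-1"))
```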

Anyway, I don't want to sound completely absolutist here. I know what kinds of cases you're thinking of and think they have merit.

Addison
Received on Monday, 16 June 2008 19:34:36 UTC