- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 18 Jun 2008 10:03:15 +0900
- To: Dan Chiba <dan.chiba@oracle.com>
- CC: "Phillips, Addison" <addison@amazon.com>, "www-international@w3.org" <www-international@w3.org>
Dan Chiba wrote:

> Phillips, Addison wrote:

>>>> On the other hand, does it make sense to advertise that a Web service supports a locale that it has no messages for? If the service normally has no user interface ("formatDate", "addInts", "sortStrings"), then the list of available locales might very well match the complete set available in the API. At the other end of the spectrum are AJAX interactions that build the UI in real time. Then only the messages that you actually have available are useful to advertise.

>>> I think it makes sense to advertise the set of supported locales.

>> That would tend to be the point of this work: we provide a way to say any of the following:
>>
>> - this service is locale-neutral; you may specify a locale, but it doesn't do anything to the service
>> - this service has a specific default locale that it uses ("it is always in German"); the user can specify whatever they want, but the service always uses this one
>> - this service has some specific (and specified) list of available locales (and by inference some default); the user may specify the locale to use and the service will do its best to match it from the specified list
>> - this service is locale sensitive; the user may specify the locale to use and the service will do its best to match it, noting that a list is not provided

> I think it is very desirable to provide a way to discover supported locales as well. Then it would be possible for the service consumer to specify the desired locale, knowing the locale will be used for the service operation. Generally, the locale should be determined based on the policy

The WS-Policy framework does not provide a policy negotiation mechanism. I would be very reluctant to spend time on developing such a mechanism. Although I understand your desire, I don't think that we should spend time on this (see my remark on timing below).
> defined by the consumer, not by the provider. Otherwise the resulting behavior would become unpredictable; likely to result in user experiences with mixed languages.

>>> It may be the list of available translation languages, formatting locales, those locales for which linguistic sorting behavior is supported, or something alike.

>> Yes, and we need to support the service implementer making the decision about which pattern to advertise and/or use. You and I might choose entirely different criteria for choosing how we advertise locale support for a given service.

>>> Because a service cannot determine the appropriate locale for the locale sensitive service operation, it needs to be made possible for the service consumer to discover what locale is supported, in order for the application to produce the desired UI behavior.

>> I agree, with a nit:
>>
>> - sometimes it doesn't make sense to list everything that is available. Sometimes it is better (consumes less bandwidth, processing, etc.) to say: "I'll do my best to match your request". This can even make sense when the list is quite short.

> I agree it is sometimes unnecessary for consumers to know what locales are supported. In other cases, as mentioned, people may dislike mixed languages on UI and an application needs to control the locales in which the service operates. Suppose an application UI had three sections each presenting text information from different services, the user experience may be better if their language is the same. If the information is dated, the date format would be expected to be consistent.

>>>> My concern here is that many services fall into a sort of middle category: they can service many locales, but only have a limited set of localizations. Messages from the services are necessarily constrained to the smaller set, while the service might actually be useful for a larger set.
>>> Yes that is why we think the translation locale and the locale for other purposes should be identified separately.

>> I understand that's your intent. However I think this will confuse the vast preponderance of developers who have only a very rough idea what a "locale" or a "language" is. There are also different ways that services can be provisioned. It may not be possible to enumerate one list or the other easily. Having two things that do roughly the same thing doesn't seem that useful to me. How often do you actually set LC_MESSAGES separately from LC_ALL?

> Whenever a user's preferred locale is supported but preferred language is not, he or she would set LC_MESSAGES explicitly. This is often needed because the set of supported translation languages is usually small. I wonder if you also mean LC_MESSAGES is confusing or not needed because it won't be set often and does not seem so useful.
>
> Because the sets of supported languages and locales are usually different, they are practically different. I do agree they are conceptually roughly the same. However, in reality, service consumers are usually interested in serving the user in their most preferred available language and locale, but this is hard to achieve without specifying the locale and language separately. Both of users' preferred locale and language should be honored, but too often language resources are not available and an alternative language different from the language deduced from the preferred locale ought to be used instead. This alternative language needs to be identified and this is why #3 language (and LC_MESSAGES) is needed.

>>>> My tendency is still to think that this is "locale" and not "language". It looks like a bug to get a message like: "There were « 1 234 » entries sorted on 14 juin." Where the locale was clearly one thing and the messages in another language.
>>> Having both #1 locale and #3 language does not mean that would produce the odd message. If using the same locale for the message formatting is a requirement, the component can use #3 language alone to make the message locale consistent.

>> But this is inconsistent with the design of WS-I18N, where "locale" is the "big knob". I tend to think that relatively few people would know how to write an application like this.

> WS-I18N needs a little knob to deal with the fact that translation resources are missing in many use cases. Usually "locale" identifies the user's preferred locale, which is usually supported. "language" may be deduced from "locale", however, support for the preferred language is often not available, so the alternative language must be identified.

>> A better solution might be: if we provide a list of available locales, we can provide an additional attribute to indicate which ones have been provisioned with messages. For example:
>>
>> <i18n:locale>
>>   <i18n:option default="true" localized="true">en-GB</i18n:option>
>>   <i18n:option localized="true">de</i18n:option>
>>   <i18n:option>fr</i18n:option>
>> </i18n:locale>
>>
>> Here the default locale is "en-GB". German ("de") is also available, with localizations, as is French ("fr"), sans localization.
>>
>> A request could come in as something like:
>>
>> <i18n:locale>en-US,de-CH-1994,fr</i18n:locale> <!-- in this case, it matches "de" -->
>>
>> Or perhaps:
>>
>> <i18n:locale>en,zh-yue,ja-JP</i18n:locale> <!-- in this case you get en-GB as the default -->
>>
>> And finally:
>>
>> <i18n:locale>fr-FR</i18n:locale> <!-- you get French locale behavior, but probably en-GB messages; no "fr" is available -->

>>>> What is missing in the current version is that we don't provide:
>>>>
>>>> - a way to enumerate the available items
>>>> - a way to specify the complete set of preferences
>>>> - a reference to RFC 4647 Lookup (that is, locale-based resource negotiation)

>>> I agree. Again my understanding is that these are to be provided as a separate document or a future revision.

>> Note that WS-I18N in its current incarnation is exactly the second draft. W3C's first version (2005-09-14) was taken from a trial balloon I wrote. At that time there was no Lookup algorithm, no LTLI (okay, there still isn't an LTLI, but that's something to fix), not much in the way of LDML, and RFC 4646 was still an Internet-Draft (with several to follow). With these items available to us, we should do the work to get WS-I18N right (it's actually a fairly minor set of revisions required, IMO).

> I thought that may be months of work. If a comprehensive solution could be included in the next version, that would be great.

I don't see a comprehensive solution yet, although there seems to be some rough consensus coming up in this thread. So it's hard to make time planning at the moment.

Felix

>>>> I don't say that Unicode is forced upon people (although using SOAP is mighty close to forcing UTF-8). What I'm saying is that, as a parameter, it usually doesn't make a lot of sense. The data often has to be transcoded for the benefit of (for example) the XML processor anyway.
>>>> The fact that data exists as some legacy encoding affects the results or operation of the service itself (you still can't store Japanese character data in a WE8ISO8859P1 database even if the Web service layer permits you to send it some). But it's not necessarily something that one can usefully specify at the service layer.
>>>>
>>>> Anyway, I don't want to sound completely absolutist here. I know what kinds of cases you're thinking of and think they have merit.

>>> I do agree character set is generally not so useful as other elements and not encouraged to use. I just think a character set is considered as one of the elements of a locale and some people may find it useful if WS-I18N defines how to indicate it.

>> Character set is considered one of the elements of *some* locale systems. The question is: what does this parameter do or mean? If I have a <i18n:charset>ISO8859_1</i18n:charset> in my service's WS-Policy, does that mean I should transcode my SOAP request to Latin-1? Am I limited to Latin-1 characters in my request? Will I only receive Latin-1 characters in the response? The charset limitation may occur on several different levels of the system or it may simply be an assertion about the data.
>>
>> Since most developers wouldn't know what an encoding was if it grew legs and bit them, that makes me wary. If nothing else, we need to put a big Health Warning sticker on it :-).

>>> If a requester is only interested in getting responses in a specific native character set (e.g. the response will be processed in a component that can only process a native encoding, or it will be stored in a database that can only store a specific native character set), the service could filter the response based on this information.

>> Degrading the data early is usually a bad option :-).
>> Converting the data from the UTF-8 used in the transport layer to the local encoding is usually an effective enough filter---and YOUR code did it, not my beautiful, pristine service <g>. This, for example, is true when you find out that "ISO 8859-1" sometimes means "windows-1252"... but sometimes it doesn't.
>>
>> Anyway, I digress. We can probably find a way to accommodate 'charset'. All I'm saying is: how we do it is important.

> All right, I think how i18n:charset can be useful needs to be examined and I am not sure whether it is truly valuable. Due to this lack of support, users who need to deal with a native encoding may decide to migrate to Unicode and that can be a good thing. :-) Please let me withdraw the idea of <charset> element so we can better focus on the other items.
>
> Regards,
> -Dan

>> Addison
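
Editorial illustration: the three request examples quoted earlier in this message (a service advertising en-GB, de and fr, with en-GB as the default) can be reproduced with a small RFC 4647 Lookup-style fallback. The sketch below is not part of WS-I18N or of any message in this thread. The function names `lookup` and `negotiate` are hypothetical, the `localized` set mirrors the attribute Addison proposes, and the rule "fall back to the default locale's messages when the matched locale has none" is an assumption based on his "probably en-GB messages" remark.

```python
from typing import List, Tuple

# Service-side configuration taken from the quoted <i18n:locale> example.
AVAILABLE = ["en-GB", "de", "fr"]   # locales the service can operate in
LOCALIZED = {"en-GB", "de"}         # locales provisioned with messages (proposed attribute)
DEFAULT = "en-GB"


def lookup(priority_list: List[str], available: List[str], default: str) -> str:
    """Return the first available tag matched by RFC 4647 Lookup-style truncation."""
    by_lower = {tag.lower(): tag for tag in available}  # tags compare case-insensitively
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard carries no information for Lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # progressively drop the last subtag
            # A single-character subtag (such as "x") is removed together
            # with the subtag that followed it.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def negotiate(request: str) -> Tuple[str, str]:
    """Map a comma-separated request to (locale, message locale)."""
    ranges = [r.strip() for r in request.split(",") if r.strip()]
    locale = lookup(ranges, AVAILABLE, DEFAULT)
    # Assumption: messages fall back to the default when the locale has none.
    messages = locale if locale in LOCALIZED else DEFAULT
    return locale, messages


if __name__ == "__main__":
    for request in ("en-US,de-CH-1994,fr", "en,zh-yue,ja-JP", "fr-FR"):
        print(request, "->", negotiate(request))
```

Under these assumptions the three requests resolve to de, en-GB (as the default) and fr with en-GB messages, which agrees with the comments in the quoted examples. Note that a bare "en" request does not match en-GB under Lookup, because the range is only ever shortened and never extended.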
Received on Wednesday, 18 June 2008 01:04:09 UTC