- From: Tex Texin <tex@xencraft.com>
- Date: Tue, 11 May 2004 20:55:49 -0400
- To: Mark Davis <mark.davis@jtcsv.com>
- Cc: "Addison Phillips [wM]" <aphillips@webmethods.com>, Web Services <public-i18n-ws@w3.org>
Mark Davis wrote:
> > 1) ok on "suggestive". I had in mind that someone might guess if the
> > locale was Spain vs. Latin America, whether to use Modern or Trad., so
> > it was indicative of collation in the same "loose" way PRC vs ROC is
> > indicative of Simplified vs Trad. Chinese. It is not only that it is
> > not specified but it can't (or shouldn't) be inferred.
>
> hmmm. Still uneasy. What happens in fact is that whenever we (or anyone
> else) gets an underspecified request, we *have* to do something. So for
> example, if we get a naked 'zh' on a resource lookup, then we have to
> pick either 'zh-Hant' or 'zh-Hans' (logically, if not physically). If we
> get es-BO on a collation request, then we have to choose *some* sorting.
> And the default sort may be different for different locales, that is
> true. But "suggestive" is not indicative of the true process.
>
> Of course, where there is a handshake, one can communicate back: "Don't
> know what you mean by "es-BO": pick either
> "es-BO-x-collationtraditional" or "es-BO-x-collationmodern". But
> normally that option n'exist pas.

ok. It's like infer and imply. The identifier is not really suggestive,
but people with a need to make a decision infer what they hope is
implied. I'll fix it in the next version.

> > Mark, You make an interesting distinction that strength makes a
> > feature more or less important not necessarily ignorable.
> > There probably should be a separate name for that dimension of
> > collation (ignoring characters, or indeterminacy) to distinguish it
> > from strength.
> > Most of the sorts I have worked with don't implement strength fully
> > and leave the sort indeterminate.
>
> And most should. A fully determinate sort is not generally worth the
> cost (see tn9). However, when sorting, at least the case and accent
> level, plus punctuation (if applicable), they should not be ignored in
> sorting, although may be ignored in searching.
>
> The "ignore punctuation" option really means "make punctuation a really
> weak strength".

I need to think about this. I don't see the harm to users in simply
ignoring some characters, if you never intend to distinguish them in
queries, and your users do not depend on their being ordered.
I do know indeterminacy makes life difficult from a testing and support
perspective (harder to do regression tests and difficult to reproduce
certain problems) but that is a separate issue.
And there is a cost in effort and performance to making the values weakly
distinctive. I can see the performance cost is minimal given that the
number of values is small as I postulated, but I still wonder if it's
worth the trouble.

What I can see, is that if you have already created and tested a module
that sorts fully and with determinacy, then it is silly not to use it
everywhere rather than create a new one that is more limited. But many
folks are in the position of needing a sort routine for a drop-down and
the like and don't have the full one handy.
I guess I should just recommend ICU. ;-)

> > 5) on abbreviations, I am not sure how often expansion is used, I do
> > know it is done. If by limited environment you mean with respect to
> > certain fields rather than applied to all fields, yes of course.
>
> It is done, but what I mean by "limited environment" is that "replace
> all acronyms" makes no sense unless you have a very limited vocabulary,
> in a very restricted domain, like "Dental Supplies". How are you
> supposed to sort "ICU" when it could have the following expansions???
>
> http://www.acronymfinder.com/af-query.asp?String=exact&Acronym=ICU

ok, yes.

> > on abbreviations in a multilingual context: well yes, collation is
> > language sensitive and I was looking to indicate there are problems
> > due to using narrow language-sensitive operations in a multilingual
> > space. Maybe you can suggest some better examples?
> > I think this is an area where Web Services may get tripped up.
>
> I think it just needs to be removed. Mixing operations from different
> languages will always get you into trouble; this has nothing really to
> do with the example you want to make.

Is there never a requirement to search for data in a multilingual
database and have multiple language rules operating rather than a single
user preference?

If I look for addresses of tex texin, I want "ma" to expand to match
massachusetts and for my French chateau to have its address with "za"
expand to Zone d'activité, so I can look for a match of
"tex and (zone or massachusetts)".
(Don't ask why my chateau is in a ZA... ;-) )

I would think a database of multinational and multilingual addresses
must use different dictionaries of abbreviations according to the country
the address is for.

tex
Received on Tuesday, 11 May 2004 20:57:18 UTC