- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Tue, 11 May 2004 18:46:37 -0700
- To: "Tex Texin" <tex@xencraft.com>
- Cc: "Addison Phillips [wM]" <aphillips@webmethods.com>, "Web Services" <public-i18n-ws@w3.org>
> If I look for addresses of tex texin, I want "ma" to expand to match
> massachusetts and for my French chateau to have its address with "za" expand to
> Zone d'activité, so I can look for a match of "tex and (zone or
> massachusetts)".
>
> (Don't ask why my chateau is in a ZA... ;-) )
>
> I would think a database of multinational and multilingual addresses must use
> different dictionaries of abbreviations according to the country the address is
> for.

*must* is a strong word, and not warranted here. Remember, all of the 17
different abbreviations for ICU I pointed to are valid in en-US. You can't
distinguish them by locale at all.

If you wanted to have a system whereby for each locale and datatype there was a
mapping from acronyms to expanded text, that I would say was possible. I'd also
say it was unnecessary, and the likelihood of anyone spending all the money to do
that is pretty close to nil.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message -----
From: "Tex Texin" <tex@xencraft.com>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Addison Phillips [wM]" <aphillips@webmethods.com>; "Web Services" <public-i18n-ws@w3.org>
Sent: Tue, 2004 May 11 17:55
Subject: Re: sec 4.11

>
> Mark Davis wrote:
> > > 1) ok on "suggestive". I had in mind that someone might guess if the locale
> > > was Spain vs. Latin America, whether to use Modern or Trad., so it was
> > > indicative of collation in the same "loose" way PRC vs ROC is indicative of
> > > Simplified vs Trad. Chinese. It is not only that it is not specified but it
> > > can't (or shouldn't) be inferred.
> >
> > hmmm. Still uneasy. What happens in fact is that whenever we (or anyone else)
> > gets an underspecified request, we *have* to do something. So for example, if we
> > get a naked 'zh' on a resource lookup, then we have to pick either 'zh-Hant' or
> > 'zh-Hans' (logically, if not physically).
> > If we get es-BO on a collation request, then we have to choose *some* sorting.
> > And the default sort may be different for different locales, that is true. But
> > "suggestive" is not indicative of the true process.
> >
> > Of course, where there is a handshake, one can communicate back: "Don't know
> > what you mean by "es-BO": pick either "es-BO-x-collationtraditional" or
> > "es-BO-x-collationmodern". But normally that option n'exist pas.
>
> ok. It's like infer and imply. The identifier is not really suggestive, but
> people with a need to make a decision infer what they hope is implied.
>
> I'll fix it in the next version.
>
> > > Mark, You make an interesting distinction that strength makes a feature more
> > > or less important not necessarily ignorable.
> > > There probably should be a separate name for that dimension of collation
> > > (ignoring characters, or indeterminacy) to distinguish it from strength.
> > > Most of the sorts I have worked with don't implement strength fully and leave
> > > the sort indeterminate.
> >
> > And most should. A fully determinate sort is not generally worth the cost (see
> > tn9). However, when sorting, at least the case and accent level, plus
> > punctuation (if applicable), they should not be ignored in sorting, although may
> > be ignored in searching.
> >
> > The "ignore punctuation" option really means "make punctuation a really weak
> > strength".
>
> I need to think about this. I don't see the harm to users in simply ignoring
> some characters, if you never intend to distinguish them in queries, and your
> users do not depend on their being ordered. I do know indeterminacy makes life
> difficult from a testing and support perspective (harder to do regression tests
> and difficult to reproduce certain problems) but that is a separate issue.
>
> And there is a cost in effort and performance to making the values weakly
> distinctive.
> I can see the performance cost is minimal given that the number of values is
> small as I postulated, but I still wonder if it's worth the trouble.
>
> What I can see, is that if you have already created and tested a module that
> sorts fully and with determinacy, then it is silly not to use it everywhere
> rather than create a new one that is more limited.
>
> But many folks are in the position of needing a sort routine for a drop-down
> and the like and don't have the full one handy.
> I guess I should just recommend ICU. ;-)
>
> > > 5) on abbreviations, I am not sure how often expansion is used, I do know it
> > > is done. If by limited environment you mean with respect to certain fields
> > > rather than applied to all fields, yes of course.
> >
> > It is done, but what I mean by "limited environment" is that "replace all
> > acronyms" makes no sense unless you have a very limited vocabulary, in a very
> > restricted domain, like "Dental Supplies". How are you supposed to sort "ICU"
> > when it could have the following expansions???
> >
> > http://www.acronymfinder.com/af-query.asp?String=exact&Acronym=ICU
>
> ok, yes.
>
> > > on abbreviations in a multilingual context: well yes, collation is language
> > > sensitive and I was looking to indicate there are problems due to using narrow
> > > language-sensitive operations in a multilingual space. Maybe you can suggest
> > > some better examples?
> > > I think this is an area where Web Services may get tripped up.
> >
> > I think it just needs to be removed. Mixing operations from different languages
> > will always get you into trouble; this has nothing really to do with the example
> > you want to make.
>
> Is there never a requirement to search for data in a multilingual database and
> have multiple language rules operating rather than a single user preference?
>
> If I look for addresses of tex texin, I want "ma" to expand to match
> massachusetts and for my French chateau to have its address with "za" expand to
> Zone d'activité, so I can look for a match of "tex and (zone or
> massachusetts)".
>
> (Don't ask why my chateau is in a ZA... ;-) )
>
> I would think a database of multinational and multilingual addresses must use
> different dictionaries of abbreviations according to the country the address is
> for.
> tex
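
For illustration only, the per-country abbreviation dictionaries Tex describes above might be sketched roughly as follows in Python. Everything here is an assumption made for the example: the table contents, the country codes used as keys, and the helper names `expand`, `expanded_words` and `matches` are hypothetical, not taken from any system discussed in the thread. Note that `expand` returns a list, since (as Mark points out with ICU) one abbreviation can have several valid expansions even within a single locale.

```python
# Hypothetical per-country abbreviation tables (illustrative data only).
ABBREVIATIONS = {
    "US": {"ma": ["massachusetts"]},
    "FR": {"za": ["zone d'activité"]},
}


def expand(token, country):
    """Return the possible expansions of token for the given country.

    A list is returned because an abbreviation may be ambiguous;
    tokens with no entry for that country come back unchanged.
    """
    table = ABBREVIATIONS.get(country, {})
    return table.get(token.lower(), [token.lower()])


def expanded_words(address, country):
    """Collect every word of the address after abbreviation expansion."""
    words = set()
    for token in address.split():
        for expansion in expand(token, country):
            words.update(expansion.split())
    return words


def matches(address, country):
    """Evaluate the query 'tex and (zone or massachusetts)'."""
    words = expanded_words(address, country)
    return "tex" in words and ("zone" in words or "massachusetts" in words)


print(matches("Tex Texin Boston MA", "US"))
print(matches("Tex Texin ZA Nord", "FR"))
print(matches("Tex Texin ZA Nord", "US"))
```

The sketch picks the dictionary by the country of the address rather than by the user's locale, which is the distinction Tex is drawing; it does nothing to resolve the ambiguity problem Mark raises, it only carries all candidate expansions forward.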
Received on Tuesday, 11 May 2004 21:46:35 UTC