
Re: sec 4.11

From: Tex Texin <tex@xencraft.com>
Date: Tue, 11 May 2004 20:55:49 -0400
Message-ID: <40A17615.459C758C@xencraft.com>
To: Mark Davis <mark.davis@jtcsv.com>
Cc: "Addison Phillips [wM]" <aphillips@webmethods.com>, Web Services <public-i18n-ws@w3.org>



Mark Davis wrote:
> > 1) ok on "suggestive". I had in mind that someone might guess from whether
> > the locale was Spain or Latin America whether to use Modern or Traditional
> > sorting, so it was indicative of collation in the same "loose" way that PRC
> > vs. ROC is indicative of Simplified vs. Traditional Chinese. It is not only
> > that it is not specified but that it can't (or shouldn't) be inferred.
> 
> hmmm. Still uneasy. What happens in fact is that whenever we (or anyone else)
> get an underspecified request, we *have* to do something. So for example, if we
> get a naked 'zh' on a resource lookup, then we have to pick either 'zh-Hant' or
> 'zh-Hans' (logically, if not physically). If we get es-BO on a collation
> request, then we have to choose *some* sorting. And the default sort may be
> different for different locales, that is true. But "suggestive" is not
> indicative of the true process.
> 
> Of course, where there is a handshake, one can communicate back: 'Don't know
> what you mean by "es-BO"; pick either "es-BO-x-collationtraditional" or
> "es-BO-x-collationmodern".' But normally that option doesn't exist.

ok. It's like infer and imply. The identifier is not really suggestive, but
people with a need to make a decision infer what they hope is implied.

I'll fix it in the next version.
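Mark's point that an underspecified request still forces *some* choice can be sketched as a small resolver. This is a minimal sketch under stated assumptions: the "-x-collation<name>" convention is taken from the example tags above, while `resolve_collation`, `DEFAULT_COLLATION`, and the "root" fallback are hypothetical names and values chosen for illustration, not any product's actual tables.

```python
# Hypothetical defaults: which collation variant to assume when a request
# does not say. Illustrative only; real systems define their own tables.
DEFAULT_COLLATION = {"es": "modern"}
FALLBACK_COLLATION = "root"  # hypothetical generic fallback


def resolve_collation(tag: str) -> str:
    """Pick a collation variant for a language tag such as 'es-BO'.

    An explicit private-use subtag such as 'x-collationtraditional'
    (the convention used in this thread's examples) wins; otherwise the
    language's default is used; otherwise the generic fallback.
    """
    subtags = tag.lower().split("-")
    if "x" in subtags:
        # Everything after the 'x' singleton is private use.
        for subtag in subtags[subtags.index("x") + 1:]:
            if subtag.startswith("collation") and len(subtag) > len("collation"):
                return subtag[len("collation"):]
    # No explicit request: something must still be chosen.
    return DEFAULT_COLLATION.get(subtags[0], FALLBACK_COLLATION)


print(resolve_collation("es-BO-x-collationtraditional"))  # traditional
print(resolve_collation("es-BO"))                         # modern
print(resolve_collation("fr-CA"))                         # root
```

BCP 47 limits each private-use subtag to eight characters, so tags like these are best read as shorthand from the discussion rather than as valid identifiers.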

> > Mark, you make an interesting distinction: strength makes a feature more or
> > less important, not necessarily ignorable.
> > There probably should be a separate name for that dimension of collation
> > (ignoring characters, or indeterminacy) to distinguish it from strength.
> > Most of the sorts I have worked with don't implement strength fully and leave
> > the sort indeterminate.
> 
> And most should. A fully determinate sort is not generally worth the cost (see
> Unicode Technical Note #9). However, at least the case and accent levels, plus
> punctuation (if applicable), should not be ignored in sorting, although they
> may be ignored in searching.
> 
> The "ignore punctuation" option really means "make punctuation a really weak
> strength".

I need to think about this. I don't see the harm to users in simply ignoring
some characters, if you never intend to distinguish them in queries, and your
users do not depend on their being ordered. I do know indeterminacy makes life
difficult from a testing and support perspective (harder to do regression tests
and difficult to reproduce certain problems) but that is a separate issue.
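The testing problem is easy to reproduce: if a sort key ignores some characters, strings that differ only in those characters tie, and a stable sort then returns them in whatever order they arrived. A minimal sketch using Python's standard `unicodedata` module; the function names are hypothetical, and the code-point tie-breaker is just one simple way to restore determinacy, loosely analogous to the identical level of the Unicode Collation Algorithm rather than an implementation of it.

```python
import unicodedata


def primary_key(s: str) -> str:
    """Ignore accents, case, and punctuation entirely (an indeterminate key)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        ch.lower()
        for ch in decomposed
        if not unicodedata.combining(ch)                  # drop accents
        and not unicodedata.category(ch).startswith("P")  # drop punctuation
    )


def determinate_key(s: str) -> tuple:
    """Same primary order, but ties are broken by the raw code points."""
    return (primary_key(s), s)


batch_a = ["co-op", "Coop", "coop"]
batch_b = list(reversed(batch_a))  # same data, different arrival order

# Ties keep their input order, so the result depends on the input.
print(sorted(batch_a, key=primary_key))  # ['co-op', 'Coop', 'coop']
print(sorted(batch_b, key=primary_key))  # ['coop', 'Coop', 'co-op']

# With a tie-breaker, both batches sort identically.
print(sorted(batch_a, key=determinate_key))  # ['Coop', 'co-op', 'coop']
print(sorted(batch_b, key=determinate_key))  # ['Coop', 'co-op', 'coop']
```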

And there is a cost in effort and performance to making the values weakly
distinctive. I can see that the performance cost is minimal, given that the
number of values is small as I postulated, but I still wonder if it's worth the
trouble.
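For comparison, here is roughly what making such characters "weakly distinctive" involves, following Mark's remark that "ignore punctuation" really means a very weak strength: a key with one level each for base letters, accents, case, and punctuation, so a weaker level matters only when all stronger levels tie. This is a toy sketch assuming only Python's `unicodedata` module; it is not the Unicode Collation Algorithm or ICU, which use tailored weight tables, and `sort_key` and `search_key` are hypothetical names. Searching compares only the first level, so accents, case, and punctuation are ignored there.

```python
import unicodedata


def is_ignorable(ch: str) -> bool:
    """Punctuation (categories P*) and spaces (Zs) get only the weakest level."""
    cat = unicodedata.category(ch)
    return cat.startswith("P") or cat == "Zs"


def sort_key(s: str) -> tuple:
    """Four-level toy key: base letters, then accents, then case, then punctuation."""
    base, accents, case, punct = [], [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if is_ignorable(ch):
            # Remember the punctuation and where it fell among the letters.
            punct.append((len(base), ch))
        elif unicodedata.combining(ch):
            # Attach the accent to the preceding base letter (if any).
            if accents:
                accents[-1] += ch
        else:
            base.append(ch.lower())
            accents.append("")
            case.append(ch.isupper())  # False sorts before True: lowercase first
    return ("".join(base), tuple(accents), tuple(case), tuple(punct))


def search_key(s: str) -> str:
    """Matching compares base letters only."""
    return sort_key(s)[0]


words = ["co-op", "Coop", "coop", "c\u00f4te", "cote"]
print(sorted(words, key=sort_key))  # ['coop', 'co-op', 'Coop', 'cote', 'côte']
print(search_key("C\u00f4te") == search_key("cote"))  # True
```

The extra levels cost little here because they are only consulted on ties, which fits the point above that the performance cost is small when the number of such values is small.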

What I can see, is that if you have already created and tested a module that
sorts fully and with determinacy, then it is silly not to use it everywhere
rather than create a new one that is more limited.

But many folks are in the position of needing a sort routine for a drop-down
and the like and don't have the full one handy.
I guess I should just recommend ICU. ;-)

> > 5) On abbreviations, I am not sure how often expansion is used; I do know it
> > is done. If by "limited environment" you mean with respect to certain fields
> > rather than applied to all fields, yes, of course.
> 
> It is done, but what I mean by "limited environment" is that "replace all
> acronyms" makes no sense unless you have a very limited vocabulary, in a very
> restricted domain, like "Dental Supplies". How are you supposed to sort "ICU"
> when it could have the following expansions???
> 
> http://www.acronymfinder.com/af-query.asp?String=exact&Acronym=ICU

ok, yes.
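The "limited environment" point can be made concrete: expansion is only safe when a domain-specific table maps an abbreviation to exactly one expansion. A minimal sketch; the table contents and the `expand_for_sorting` name are hypothetical, and leaving ambiguous acronyms unexpanded is one possible policy, not something the thread prescribes.

```python
from typing import Dict, List


def expand_for_sorting(term: str, table: Dict[str, List[str]]) -> str:
    """Return the form of `term` to sort by, using a domain-specific table.

    Only unambiguous entries are expanded; an abbreviation with several
    candidate expansions (or none) is sorted as written.
    """
    candidates = table.get(term.upper(), [])
    return candidates[0] if len(candidates) == 1 else term


# Hypothetical table for one restricted vocabulary.
table = {
    "UTN": ["Unicode Technical Note"],
    "ICU": ["International Components for Unicode", "Intensive Care Unit"],
}

print(expand_for_sorting("utn", table))  # Unicode Technical Note
print(expand_for_sorting("ICU", table))  # ICU (ambiguous, left alone)
# Sort keys are "Unicode Technical Note", "ICU", "Appendix".
print(sorted(["UTN", "ICU", "Appendix"], key=lambda t: expand_for_sorting(t, table)))
# ['Appendix', 'ICU', 'UTN']
```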

> 
> >
> > On abbreviations in a multilingual context: well, yes, collation is
> > language-sensitive, and I was looking to indicate that there are problems
> > due to using narrow language-sensitive operations in a multilingual space.
> > Maybe you can suggest some better examples?
> > I think this is an area where Web Services may get tripped up.
> 
> I think it just needs to be removed. Mixing operations from different languages
> will always get you into trouble; this has nothing really to do with the example
> you want to make.

Is there never a requirement to search for data in a multilingual database and
have multiple language rules operating rather than a single user preference? 

If I look for addresses of Tex Texin, I want "ma" to expand to match
"Massachusetts", and for my French chateau I want "za" in its address to expand
to "Zone d'activité" (business park), so I can look for a match of
"tex and (zone or massachusetts)".

(Don't ask why my chateau is in a ZA... ;-) )

I would think a database of multinational and multilingual addresses must use
different dictionaries of abbreviations according to the country the address is
for.
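The address example suggests choosing the abbreviation dictionary by each address's country rather than by the searcher's language. A minimal sketch under that assumption; the dictionaries, sample addresses, and function names are made up for illustration, and real address data would need far more careful tokenization.

```python
from typing import Dict, List

# Hypothetical per-country abbreviation dictionaries (illustrative entries).
ABBREVIATIONS: Dict[str, Dict[str, str]] = {
    "US": {"ma": "massachusetts"},
    "FR": {"za": "zone d'activité"},
}


def expand_address(text: str, country: str) -> List[str]:
    """Tokenize an address and expand abbreviations using only the
    dictionary for that address's own country."""
    table = ABBREVIATIONS.get(country, {})
    words: List[str] = []
    for token in text.lower().replace(",", " ").split():
        words.extend(table.get(token, token).split())
    return words


def matches(words: List[str], required: str, any_of: List[str]) -> bool:
    """Evaluate a query of the form 'required and (a or b ...)'."""
    present = set(words)
    return required in present and any(term in present for term in any_of)


# Made-up sample data.
addresses = [
    ("Tex Texin, 12 Main St, Boston MA", "US"),
    ("Tex Texin, Château, ZA des Pins", "FR"),
    ("Tex Texin, Ma Maison, Lyon", "FR"),  # French "ma" is not Massachusetts
]

# tex and (zone or massachusetts)
for text, country in addresses:
    print(matches(expand_address(text, country), "tex", ["zone", "massachusetts"]), text)
```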
tex
Received on Tuesday, 11 May 2004 20:57:18 GMT
