RE: [IndexedDB] Languages for collation from Pablo Castro on 2010-08-13 (public-webapps@w3.org from July to September 2010)

From: Pablo Castro <Pablo.Castro@microsoft.com>
Date: Fri, 13 Aug 2010 00:31:19 +0000
To: Jeremy Orlow <jorlow@chromium.org>
CC: Mikeal Rogers <mikeal.rogers@gmail.com>, public-webapps WG <public-webapps@w3.org>
Message-ID: <F753B2C401114141B426DB383C8885E05901C8B8@TK5EX14MBXC126.redmond.corp.microsoft.>

From: jorlow@google.com [mailto:jorlow@google.com] On Behalf Of Jeremy Orlow
Sent: Thursday, August 12, 2010 2:18 AM

>> I think we should first break down the use cases and look at how many of them just need _a_ sort order, how many of them a per-database sort order is ok, and how many of them would need something finer grained (like a per-key ordering).

That's reasonable. What I was thinking is that in any case where you use the order of items in a store/index to display things to the user (e.g. a list of contacts), you'd want the items to be in the proper order for the user's language. That will match not only users' expectations but also other applications (or even other parts of the UA) that display data based on the current OS language or the user's choice of language.

That covers a very broad spectrum of scenarios that need language-specific sort order. 
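
To make the contact-list case concrete, here is a minimal sketch of how the same three names end up in a different order under two languages' rules. The alphabets and the helper names (swedish_key, german_key) are simplified assumptions for illustration only; they are not the full Swedish or German tailorings and not part of any IndexedDB proposal.

```python
# Toy collation keys. Both tables are simplified assumptions that cover
# only the letters used below; real tailorings are far more detailed.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"      # å, ä, ö are separate letters after z
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u"}  # umlauts sort with their base letter


def swedish_key(word):
    """Map each letter to its position in the (toy) Swedish alphabet."""
    return [SWEDISH.index(ch) for ch in word.lower()]


def german_key(word):
    """Fold umlauts onto their base letters, then compare the folded text."""
    folded = "".join(GERMAN_FOLD.get(ch, ch) for ch in word.lower())
    return (folded, word)  # original spelling breaks ties deterministically


contacts = ["Zorn", "Ärlig", "Adam"]
by_swedish = sorted(contacts, key=swedish_key)
by_german = sorted(contacts, key=german_key)

print(by_swedish)  # ['Adam', 'Zorn', 'Ärlig']
print(by_german)   # ['Adam', 'Ärlig', 'Zorn']
```

A store or index that fixes one of these orders would look wrong to users of the other language, which is the situation described above.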

I find it unlikely that a single web app will need more than one language per database (or even per origin/OS account), given that most applications operate in a single language at any one point in time. 

>> Are there work-arounds for getting a UCA-ordered data structure to hold data in another language's order?  For example, I could imagine it'd be possible to do some sort of encode step on the data before insertion (and decode on removal) that would make UCA work.  I have no idea, but if such algorithms existed and were well understood, then it'd definitely make me lean towards punting language specification to v2.

I'm not sure I understand this paragraph. "UCA ordered" may not mean much more than ordering with a binary collation if the language is not specified. While this is typically not an issue in English, in other languages it introduces varying degrees of deviation from users' expectations. Given that different languages have conflicting collation rules, I'm not sure how this can be generalized independently of the language. Even the UCA specification [1] names the input language as the most important aspect of collation.
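
As a rough illustration of what a binary collation does, the sketch below sorts a few made-up names by code point, which is how Python compares strings by default. The sample names are assumptions chosen to show the effect; the point is only that uppercase sorts before lowercase and accented letters land after the whole unaccented alphabet.

```python
# Binary (code point) ordering: no language rules are applied at all.
names = ["Zoe", "adam", "Émile", "Eve"]

binary_order = sorted(names)  # str comparison in Python is by code point

# Sorting the UTF-8 encoded bytes gives the same result, since UTF-8
# preserves code point order.
utf8_order = sorted(names, key=lambda s: s.encode("utf-8"))

print(binary_order)  # ['Eve', 'Zoe', 'adam', 'Émile']
```

A reader of almost any language would expect "adam" first and "Émile" next to "Eve", so even this small list deviates from expectations.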

[1] http://www.unicode.org/reports/tr10/

Received on Friday, 13 August 2010 00:31:54 UTC