Re: [IndexedDB] Languages for collation from Jeremy Orlow on 2010-08-13 (public-webapps@w3.org from July to September 2010)

From: Jeremy Orlow <jorlow@chromium.org>
Date: Fri, 13 Aug 2010 09:56:23 +0100
To: Pablo Castro <Pablo.Castro@microsoft.com>
Cc: Mikeal Rogers <mikeal.rogers@gmail.com>, public-webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTi=VCJqwts0sd-LM7cRkCcbS6FttDzNymG+BJFV9@mail.gmail.com>

On Fri, Aug 13, 2010 at 1:31 AM, Pablo Castro <Pablo.Castro@microsoft.com> wrote:

>
> From: jorlow@google.com [mailto:jorlow@google.com] On Behalf Of Jeremy
> Orlow
> Sent: Thursday, August 12, 2010 2:18 AM
>
> >> I think we should first break down the use cases and look at how many of
> them just need _a_ sort order, for how many a per-database sort order is
> ok, and how many would need something finer grained (like a per-key
> ordering).
>
> That's reasonable. What I was thinking is that any case where you'll use
> the order of items in a store/index to display things to the user (e.g. a
> list of contacts) you'd want the items to be in proper order for the user's
> language. That will not only match users' expectations but also match other
> applications (or even other parts of the UA) that display data based on the
> current OS language or the users' choice of language.
>
> That covers a very broad spectrum of scenarios that need language-specific
> sort order.
>
> I find it unlikely that a single web app will need more than one language
> per database (or even per origin/OS account), given that most applications
> operate in a single language at any one point in time.
>

A lot of people are multi-lingual and I'm sure there will be at least some
apps that need different data sorted in different ways for each language
used.  It's quite likely that such apps could use multiple databases as a
work-around though.  (As long as they don't need to execute transactions
between them.)

> >> Are there work-arounds for getting a UCA-ordered data structure to hold
> data in another language's order?  For example, I could imagine it'd be possible
> to do some sort of encode step on the data before insertion (and decode on
> removal) that would make UCA work.  I have no idea, but if such algorithms
> existed and were well understood, then it'd definitely make me lean towards
> punting language specification to v2.
>
> I'm not sure I understand this paragraph. "UCA ordered" may not mean much
> more than just ordering using a binary collation if the language is not
> specified. While this is typically not an issue in English, in other
> languages this introduces a varying level of deviation from users'
> expectations. Given that different languages have conflicting rules for
> collation, I'm not sure how this can be generalized independently of the
> language. Even in the UCA specification [1] the aspect of input language is
> mentioned as the most important feature of collation.
>

I understand that.  What I was asking is whether there are hacks to make it
work anyway, i.e. ways to encode/decode the data going in/out.  In other
words, what's stored as the key would not be exactly the word you put in,
but you'd know how to undo the process on the way out.  After thinking about
it for a couple of minutes, I've got some ideas on how to do it, but they're
not terribly lightweight.
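
To make the encode/decode idea concrete, here is a minimal sketch in Python
(not IndexedDB code).  Everything in it is an assumption for illustration:
the alphabet table is a toy stand-in for real Swedish collation rules, and
encode_key/decode_key are hypothetical helper names.  The stored key is a
sort key followed by a separator and the original word, so plain code point
comparison of stored keys gives the language's order, and the original word
can be recovered on the way out.

```python
# Toy tailoring (an assumption, not real collation data):
# Swedish sorts å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# U+0000 never appears in a sort key (ranks start at U+0001), so it marks
# the end of the sort key and makes a prefix sort before its extensions.
SEPARATOR = "\u0000"


def encode_key(word: str) -> str:
    """Return the value to store: sort key + separator + original word."""
    parts = []
    for ch in word.lower():
        rank = RANK.get(ch)
        if rank is None:
            # Characters outside the table sort after all known letters.
            rank = len(SWEDISH_ALPHABET) + ord(ch)
        parts.append(chr(rank + 1))
    return "".join(parts) + SEPARATOR + word


def decode_key(stored: str) -> str:
    """Undo encode_key: everything after the first separator is the word."""
    return stored.split(SEPARATOR, 1)[1]


words = ["zebra", "ärlig", "apa", "öga", "åka"]
stored = sorted(encode_key(w) for w in words)  # plain code point ordering
in_language_order = [decode_key(s) for s in stored]
```

The cost is visible even in the toy version: every key roughly doubles in
size, and the application has to carry its own collation tables, which is
the "not terribly lightweight" part.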

Btw, my intuition is also that a database level control is the right way to
go here, but I just want to make sure we've properly considered the pros and
cons of the other possibilities.

J

Received on Friday, 13 August 2010 08:57:13 UTC