RE: [IndexedDB] Spec changes for international language support from Pablo Castro on 2011-03-18 (public-webapps@w3.org from January to March 2011)

From: Pablo Castro <Pablo.Castro@microsoft.com>
Date: Fri, 18 Mar 2011 19:29:15 +0000
To: Keean Schupke <keean@fry-it.com>, Jonas Sicking <jonas@sicking.cc>
CC: Jungshik Shin <jshin@chromium.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, public-webapps WG <public-webapps@w3.org>
Message-ID: <F108E2F6BA743C4696146F0B7111C2610B611D@TK5EX14MBXC245.redmond.corp.microsoft.co>
From: keean.schupke@googlemail.com [mailto:keean.schupke@googlemail.com] On Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 1:53 AM

>> See my proposal in another thread. The basic idea is to copy BDB. Have a primary index that is based on an integer, something primitive and fast. Allow secondary indexes which use a callback to generate a binary index key. IDB shifts the complexity out into a library. Common use cases can be provided (a hash of all fields in the object, internationalised bidirectional lexicographic etc...), but the user is free to write their own for less usual cases (for example indexing by the last word in a name string to order by surname).
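A minimal sketch of the BDB-style design described in the quote, assuming plain auto-incremented integer primary keys and secondary indexes driven by a user-supplied callback that returns a binary key. `Store`, `create_index`, `put`, `scan` and `surname_key` are hypothetical names for illustration only. This is not the IndexedDB API. The surname index mirrors the "order by the last word in a name" example:

```python
import bisect
from typing import Callable, Dict, List, Tuple

# A secondary-index callback maps a stored object to a binary key (hypothetical signature).
KeyFn = Callable[[dict], bytes]


class Store:
    """Toy BDB-style store: integer primary keys plus callback-driven secondary indexes."""

    def __init__(self) -> None:
        self._next_pk = 1
        self._records: Dict[int, dict] = {}
        # index name -> (key callback, sorted list of (index key, primary key))
        self._indexes: Dict[str, Tuple[KeyFn, List[Tuple[bytes, int]]]] = {}

    def create_index(self, name: str, key_fn: KeyFn) -> None:
        # Build the index over existing records, ordered by the binary key.
        entries = sorted((key_fn(obj), pk) for pk, obj in self._records.items())
        self._indexes[name] = (key_fn, entries)

    def put(self, obj: dict) -> int:
        # Primary keys are plain auto-incremented integers.
        pk = self._next_pk
        self._next_pk += 1
        self._records[pk] = obj
        # Keep every secondary index in byte order as records arrive.
        for key_fn, entries in self._indexes.values():
            bisect.insort(entries, (key_fn(obj), pk))
        return pk

    def scan(self, index: str) -> List[dict]:
        # Return records in the order defined by the index's binary keys.
        _, entries = self._indexes[index]
        return [self._records[pk] for _, pk in entries]


def surname_key(obj: dict) -> bytes:
    # Index by the last word of the name; UTF-8 byte order equals code point order.
    return obj["name"].split()[-1].encode("utf-8")


people = Store()
people.create_index("surname", surname_key)
for name in ["Ada Lovelace", "Alan Turing", "Grace Hopper"]:
    people.put({"name": name})
by_surname = [p["name"] for p in people.scan("surname")]
```

The store itself stays simple and fast. All knowledge of how to order a field lives in the callback, which is where a library could supply internationalised key generators.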

I agree with Jeremy's comments on the other thread about this. The callback mechanism definitely sounds interesting, but there are a ton of common cases that we can solve just by taking a language identifier, and I'm not sure we want to make people work hard to get something that's already supported in most systems. A callback to compute the index value feels incremental to this, so we could take it on later without disrupting the explicit international collation stuff.
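To illustrate why a collation choice matters for the common cases (e.g. sorting names in an address book), here is a rough sketch contrasting binary (code point) order with a looser case- and accent-insensitive order. The looser key is an assumption built only from Python's `unicodedata`: it decomposes, drops combining marks and case-folds. It approximates but is NOT a real Unicode Collation Algorithm implementation. Per-language tailorings, which are what a language identifier would select, need a proper collation library. `binary_key` and `loose_key` are hypothetical names.

```python
import unicodedata
from typing import List


def binary_key(s: str) -> str:
    # Binary collation: compare raw code points (what a default string sort does).
    return s


def loose_key(s: str) -> str:
    # Rough case- and accent-insensitive key: decompose (NFD), drop combining
    # marks, then case-fold. An approximation only, not a UCA/ICU collation.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


names: List[str] = ["zoe", "\u00c9mile", "adam", "Bob"]
binary_order = sorted(names, key=binary_key)  # uppercase first, accented letters last
loose_order = sorted(names, key=loose_key)    # the order a user would expect
```

Under binary order, "Bob" sorts before "adam" and "Émile" sorts after "zoe", which is rarely what a user wants in a name list.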

>> On 18 March 2011 02:19, Jonas Sicking <jonas@sicking.cc> wrote:
>> 2011/3/17 Pablo Castro <Pablo.Castro@microsoft.com>:
>> >
>> > From: Jonas Sicking [mailto:jonas@sicking.cc]
>> > Sent: Tuesday, March 08, 2011 1:11 PM
>> >
>> >>> All in all, is there anything preventing adding the API Pablo suggests
>> >>> in this thread to the IndexedDB spec drafts?
>> >
>> > I wanted to propose a couple of specific tweaks to the initial proposal and then unless I hear pushback start editing this into the spec.
>> >
>> > From reading the details on this thread I'm starting to realize that per-database collations won't do it. What did it for me was the example that has a fuzzier matching mode (case/accent insensitive). This is exactly the kind of index I would want to sort people's names in my address book, but most likely not the index I'll want to use for my primary key.
>> >
>> > Refactoring the API to accommodate this would mean moving the setCollation() method and the collation property to the object store and index objects. If we were willing to live without the ability to change them, we could take collation as one of the optional parameters to createObjectStore()/createIndex() and reduce the surface area a bit...
>> Unfortunately I think you bring up good use cases for
>> per-objectStore/index collations. It's definitely tempting to just add
>> it as an optional parameter to createObjectStore/createIndex. The
>> downside is obviously pushing more complexity onto web developers.
>> Complexity which will be duplicated across sites.
>>
>> However, there is another problem to consider here. Can switching the
>> collation on an objectStore or a unique index affect its validity?
>> I.e. if you switch from a case sensitive to a case insensitive
>> collation, does that mean that if you have two entries with the
>> primary keys "Sweden" and "sweden" they collide and thus the change of
>> collation must result in an error (or aborted transaction)?
>>
>> I do seem to recall that there are ways to do at least case
>> sensitivity such that you generally don't take case into account when
>> sorting, unless two entries are exactly the same, in which case you do
>> look at casing to differentiate them. However I don't really know a
>> whole lot about this and so defer to people that know
>> internationalization better.

This is a good point. It makes me lean toward not allowing changing the collation of an index or store. That means we could just have an optional parameter (in the generic parameter object thingy we have now) on createObjectStore and createIndex that indicates the collation name. It seems minimally disruptive, it doesn't tax people that don't care about it, and since there is no setCollation we don't have the problem of not being able to re-index the data.
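A minimal sketch of why fixing the collation at creation time avoids the validity problem. It assumes a collation can be modelled as a function producing a comparison key. The index takes its collation once, in the constructor, and exposes it read-only, in the spirit of the optional creation parameter. Uniqueness is judged under that collation. `UniqueIndex`, `ConstraintError`, `collisions_under`, `binary` and `case_insensitive` are hypothetical names. They are not IndexedDB API.

```python
from typing import Callable, Dict, List, Tuple

# A collation is modelled here as a function producing a comparison key.
Collation = Callable[[str], str]


def binary(s: str) -> str:
    return s


def case_insensitive(s: str) -> str:
    return s.casefold()


class ConstraintError(Exception):
    """Raised when a unique index would hold two keys that collate equal."""


class UniqueIndex:
    def __init__(self, collation: Collation = binary) -> None:
        # Collation is fixed at creation time; there is deliberately no setter.
        self._collation = collation
        self._entries: Dict[str, str] = {}

    @property
    def collation(self) -> Collation:
        return self._collation

    def add(self, key: str) -> None:
        folded = self._collation(key)
        if folded in self._entries:
            raise ConstraintError(f"{key!r} collides with {self._entries[folded]!r}")
        self._entries[folded] = key


def collisions_under(keys: List[str], collation: Collation) -> List[Tuple[str, str]]:
    # The check a setCollation() would need before re-indexing existing data.
    seen: Dict[str, str] = {}
    clashes: List[Tuple[str, str]] = []
    for key in keys:
        folded = collation(key)
        if folded in seen:
            clashes.append((seen[folded], key))
        else:
            seen[folded] = key
    return clashes


strict = UniqueIndex()            # binary by default
strict.add("Sweden")
strict.add("sweden")              # fine: distinct under binary collation
loose = UniqueIndex(case_insensitive)
loose.add("Sweden")               # a later loose.add("sweden") raises ConstraintError
```

Without a setter, the only way to get a different collation is to create a new index, which populates and validates itself from scratch.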

>> > Another piece of feedback I heard consistently as I discussed this with various folks at Microsoft is the need to be able to pick up the collation that the UA would consider most appropriate for the user's environment (derived from settings, page language or whatever). We could support this by introducing a special value that you can pass to setCollation that indicates "pick whatever is right for the environment's language right now". Given that there is no other way for people to discover the user preference on this, I think this is pretty important.
>> I would be fine with this as long as it's an explicit opt-in. There is
>> definitely a risk that people will do this and then only do testing in
>> one language, but it seems to me like a useful use case to support,
>> and I don't see a way of supporting this while completely avoiding the
>> risk of internationalization bugs.

I agree, it should be opt-in. I still assume we'll default to binary collation (the same applies if you specify the collation value as null). I was reading BCP 47 [1], and in section 4.1, "Choice of Language Tag", item #7 seems to describe what we're looking for. The value "i-default" seems to match our needs closely enough, so callers could use that value. Discoverability is not great, but we avoid having to specify something new, and arguably callers will need to read somewhere that this argument is a BCP 47-compatible value, so we could put a comment about "i-default" right there.
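A sketch of how an implementation might resolve the proposed collation option. It assumes three cases: null/absent means binary, "i-default" means the UA's preferred language (explicit opt-in), and any other value is taken as a BCP 47 tag. `resolve_collation` and the `ua_languages` input are hypothetical stand-ins for UA settings or page language. Falling back to binary when the UA reports no language is also an assumption. Tags are compared case-insensitively because BCP 47 tags are case-insensitive.

```python
from typing import Optional, Sequence, Tuple

I_DEFAULT = "i-default"  # the BCP 47 tag suggested in this thread


def resolve_collation(option: Optional[str],
                      ua_languages: Sequence[str]) -> Tuple[str, Optional[str]]:
    # No value (or null): binary collation, the default.
    if option is None:
        return ("binary", None)
    # Explicit opt-in: adopt whatever the UA considers the user's language.
    # ua_languages is a hypothetical stand-in for UA settings/page language.
    if option.lower() == I_DEFAULT:
        if ua_languages:
            return ("locale", ua_languages[0])
        return ("binary", None)  # assumed fallback when nothing is known
    # Anything else is treated as an explicit BCP 47 language tag.
    return ("locale", option)
```

Resolving "i-default" at index creation would give stored data a fixed collation. Resolving it on every open would reintroduce the re-indexing problem discussed above.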

Thanks
-pablo
Received on Friday, 18 March 2011 19:29:51 UTC