RE: [IndexedDB] Languages for collation from Pablo Castro on 2010-08-12 (public-webapps@w3.org from July to September 2010)

From: Pablo Castro <Pablo.Castro@microsoft.com>
Date: Thu, 12 Aug 2010 07:28:59 +0000
To: Mikeal Rogers <mikeal.rogers@gmail.com>
CC: public-webapps WG <public-webapps@w3.org>
Message-ID: <F753B2C401114141B426DB383C8885E05901A06D@TK5EX14MBXC126.redmond.corp.microsoft.>

From: Mikeal Rogers [mailto:mikeal.rogers@gmail.com] 
Sent: Wednesday, August 11, 2010 11:35 PM

>> Why not just use the unicode collation algorithm?
>>
>> Then you won't have to hint the locale.

Unless I'm missing something, the UCA defines the general algorithm for collating strings, but you still need to know the language in order to sort strings properly in that language. For example, in Spanish the letters "c" and "h" together (as in "chau", meaning "bye") sort as a single letter, so the expected sort order differs from English, where they are always two independent letters: "chau" comes before "cuando" ("when") when sorted in English, but after it when sorted in Spanish.
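
To make the "chau"/"cuando" example concrete, here is a minimal sketch. It is not the UCA and not any real collation library: it is a toy comparator over lowercase a-z words, and the names toLetters, rank and compareWords are made up for illustration. The only thing it models is whether "ch" is treated as one letter that sorts after "c".

```typescript
// Toy illustration only (assumption: lowercase a-z input, no accents).
// This is NOT the Unicode Collation Algorithm.

// Split a word into collation "letters". With the digraph rule enabled,
// "ch" becomes a single element instead of "c" followed by "h".
function toLetters(word: string, chIsOneLetter: boolean): string[] {
  const letters: string[] = [];
  let i = 0;
  while (i < word.length) {
    if (chIsOneLetter && word.startsWith("ch", i)) {
      letters.push("ch");
      i += 2;
    } else {
      letters.push(word.charAt(i));
      i += 1;
    }
  }
  return letters;
}

// Give every letter a numeric rank. Plain letters get even ranks, which
// leaves room to slot "ch" directly after "c" and before "d".
function rank(letter: string): number {
  if (letter === "ch") {
    return rank("c") + 1;
  }
  return (letter.charCodeAt(0) - "a".charCodeAt(0)) * 2;
}

// Compare two words letter by letter; the shorter word wins on a tie.
function compareWords(a: string, b: string, chIsOneLetter: boolean): number {
  const ra = toLetters(a, chIsOneLetter).map(rank);
  const rb = toLetters(b, chIsOneLetter).map(rank);
  const n = Math.min(ra.length, rb.length);
  for (let i = 0; i < n; i++) {
    const diff = (ra[i] ?? 0) - (rb[i] ?? 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return ra.length - rb.length;
}

const words = ["cuando", "chau"];
// "c" and "h" as two independent letters, as in English.
const englishStyle = [...words].sort((a, b) => compareWords(a, b, false));
// "ch" as a single letter after "c", as in the Spanish example.
const spanishStyle = [...words].sort((a, b) => compareWords(a, b, true));
console.log(englishStyle, spanishStyle);
```

The same two strings come out in opposite orders depending on the rule, which is why the store needs to know the language up front.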

>>
>> http://en.wikipedia.org/wiki/Unicode_collation_algorithm
>>
>> CouchDB uses some definitions around sorting complex types like arrays and objects, but when it comes down to sorting strings it just defaults to the Unicode collation algorithm and all the locales are happy.
>>
>> -Mikeal
>>
>> On Wed, Aug 11, 2010 at 11:28 PM, Pablo Castro <Pablo.Castro@microsoft.com> wrote:
>> We had some discussions about collation algorithms and such in the past, but I don't think we have settled on the language aspect of it. In order to have stores and indexes sort character-based keys in a way that is consistent with users' expectations, we'll have to take an indication in the API of which language we should use to collate strings.
>>
>> Trying to take a minimalist approach, we could add an optional parameter on the database open call that indicates the language to use (e.g. "en" or "en-GB", etc.). If the language is not specified and the database does not exist, then we can use the current browser/OS language to create the database. If it is not specified and the database already exists, then we use the one that's already there (this accommodates the fact that a user may be able to change their default language in the browser/OS after the database has been created using the default). If the language is specified, the database already exists, and the specified language is not the one the database has, then we'll throw an exception (same behavior as with "description", although we have that one in flight right now as well).
>>
>> We should probably also add a read-only attribute to the database object that exposes the language.
>>
>> If this works for folks I can write a proposal for the specific changes to the spec.
>>
>> Thanks
>> -pablo
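
The open-call rules in the quoted proposal can be sketched roughly as follows. This is only an illustration of the decision logic, not spec text: resolveLanguage and its parameter names are hypothetical, and comparing language tags by exact string match is an assumption.

```typescript
// Hypothetical helper, not part of any IndexedDB draft: decides which
// collation language a database open call would end up using.
function resolveLanguage(
  requested: string | undefined, // optional language passed to the open call
  existing: string | undefined, // language stored with the database, if it exists
  browserDefault: string, // current browser/OS language
): string {
  if (existing === undefined) {
    // The database does not exist yet: create it with the requested
    // language, or fall back to the current browser/OS language.
    return requested ?? browserDefault;
  }
  if (requested === undefined || requested === existing) {
    // The database exists: keep the language it was created with, even
    // if the browser/OS default has changed since then.
    return existing;
  }
  // The database exists with a different language than the one requested.
  // Assumption: tags are compared as exact strings.
  throw new Error(
    `Database language is "${existing}", but "${requested}" was requested`,
  );
}

// Created under a Spanish default, reopened after the user switched to English.
console.log(resolveLanguage(undefined, "es", "en"));
```

In the reopen case the call still resolves to the language the database was created with, which matches the behavior described for a user who changes their default language later.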

Received on Thursday, 12 August 2010 07:29:36 UTC