Re: Invariant language in collations and sorting etc in general

Abel, you posted a question to the XSL WG list that arose from Tuesday's joint meeting; I'm responding to both lists.

I think that what you are asking for here, at least as far as collation is concerned, is sorting according to DUCET without any tailoring. DUCET is the Default Unicode Collation Element Table, a dataset referenced normatively from Unicode TR10, which defines the Unicode Collation Algorithm (UCA). Language-based tailorings of UCA generally refine the ordering of characters commonly used in the language in question, but use the DUCET ordering for all other characters (e.g., the Swedish tailoring leaves the DUCET order for Kanji and Arabic unchanged).

This raises the question of whether you can explicitly request untailored DUCET ordering using our UCA collation URI syntax. I think the answer is that you can specify "lang=root". LDML (defined in Unicode TR35) uses the locale name "root" for this purpose; it is a valid xs:language code, and we define the meanings of the values for the "lang" keyword by reference to LDML.

There's a slight gap in this story, which is that we don't formally link our "lang" keyword to the LDML "language_id" parameter.
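To make the "lang=root" suggestion concrete, here is a minimal sketch that assembles such a collation URI as a string. It assumes the UCA collation base URI and semicolon-separated keyword syntax from XPath and XQuery Functions and Operators 3.1. The `fallback=no` keyword is my addition, not part of the suggestion above: as I understand the spec, it asks the processor to reject the collation rather than silently substitute a different one when it cannot honour the request. The helper name `uca_collation_uri` is hypothetical.

```python
# Base URI for UCA-based collations (XPath/XQuery F&O 3.1).
UCA_BASE = "http://www.w3.org/2013/collation/UCA"


def uca_collation_uri(**params: str) -> str:
    """Hypothetical helper: join keyword=value pairs with ';' after '?'.

    Keyword order follows the order in which the arguments are passed.
    """
    if not params:
        return UCA_BASE
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{UCA_BASE}?{query}"


# Untailored DUCET ordering, requested via the LDML "root" locale.
# fallback=no (assumed here) asks for an error instead of a silent substitute.
ROOT_COLLATION = uca_collation_uri(lang="root", fallback="no")
print(ROOT_COLLATION)
```

The resulting string could then be supplied wherever a collation URI is accepted, for example as the third argument of fn:compare or in the collation attribute of xsl:sort. Whether a given processor actually honours "root" is still subject to the gap noted above.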

Michael Kay
Saxonica


> On 27 Sep 2016, at 17:53, Abel Braaksma <abel.braaksma@xs4all.nl> wrote:
> 
> The question I raised today during the telcon was whether we have a notion of an invariant language. That is, a language setting or language code that makes your stylesheet, XQuery or XPath expression behave invariantly, i.e. the same regardless of the host's language settings.
>  
> This matters, for instance, if you write a (compiled) stylesheet on, say, a Dutch computer and then run it on a server hosted in the US. As a programmer you may or may not know that your processor defaults to the language of its host environment; either way, you'd like to make sure your stylesheet behaves the same regardless of the locale (assuming, of course, that you do not want a specific locale).
>  
> Do we have a language code or other means for this? Something similar to xml:lang="invariant"?
>  
> Cheers,
> Abel

Received on Wednesday, 28 September 2016 19:23:00 UTC