Re: Comment on ORDER BY and language tags

Thanks for responding, Andy.

> My summary:
> --------
> 1/ Language tags - suggestion to add to the SPARQL language spec  
> that there be a defined ordering on language tags.

Not necessarily, IMO -- only that situations such as comparing "xyz"  
and "xyz"@en have their outcome described explicitly in the standard  
(whether that's "language tags are used as tiebreaks" or "the outcome  
is undefined"). I'm perfectly fine with the commandment "use STR(),  
that's what it's there for", but I don't think it's helpful for  
implementers to have to dig through the standard, piecing together  
wordings, to figure out what's supposed to happen.

> 2/ Query about extensibility
> Do extensions to the operator table apply to ORDER BY?

Yes, that's the question. Additionally, if the answer is yes (which you
imply below), an extended implementation will probably produce results
in a different order from an unextended implementation, even outside
the regions of the result sequence where ordering is undefined. This
is because of the "try the next clause" behavior of ORDER BY. I'd like
someone else to consider that possibility, and for the outcome to be
addressed in the standard, as it's an important consequence of this
extensibility.

> 3/ A request for a test case.
> --------
>
> Note that this is tied to the operator table in sec 11 and what  
> happens with "<" for literals with a language tag.
>
> My understanding (Eric - can you confirm?) is that "<" on two  
> literals with language tags is undefined and leads to an error.  As  
> it's an error, an implementation is free to add "<".

... except that in an unextended implementation,

ORDER BY ?x ?y

will fall back to ordering by ?y whenever the ?x values being compared
are literals with language tags, because "<" raises an error there. In
an extended implementation, the order will be defined by ?x.
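
To make the difference concrete, here is a rough Python sketch of the
behavior as I read it. It is only a model, not anyone's implementation:
literals are (lexical form, language tag) pairs, and the names
lt_unextended, lt_extended and order_by are made up for illustration.
The "extended" comparison (lexical form, then lowercased tag) is just
one ordering an implementation might choose to add.

```python
from functools import cmp_to_key

# A literal is modelled as (lexical_form, language_tag_or_None).

def lt_unextended(a, b):
    """'<' that is undefined (raises) when either operand has a language tag."""
    if a[1] is not None or b[1] is not None:
        raise TypeError("'<' is undefined for language-tagged literals")
    return a[0] < b[0]

def lt_extended(a, b):
    """Hypothetical extension: compare lexical form, then lowercased tag."""
    return (a[0], (a[1] or "").lower()) < (b[0], (b[1] or "").lower())

def order_by(rows, keys, lt):
    """Sort rows by each key in turn; an erroring comparison defers to the next key."""
    def cmp(r1, r2):
        for k in keys:
            try:
                if lt(r1[k], r2[k]):
                    return -1
                if lt(r2[k], r1[k]):
                    return 1
            except TypeError:
                continue  # "try the next clause"
        return 0
    return sorted(rows, key=cmp_to_key(cmp))

rows = [
    {"x": ("a", "en"), "y": ("2", None)},
    {"x": ("b", "en"), "y": ("1", None)},
]

# Models ORDER BY ?x ?y under each comparison.
unextended = order_by(rows, ["x", "y"], lt_unextended)
extended = order_by(rows, ["x", "y"], lt_extended)

print([r["x"][0] for r in unextended])  # ordered by ?y
print([r["x"][0] for r in extended])    # ordered by ?x
```

With these two rows, the unextended comparison ends up sorting on ?y
and the extended one on ?x, so the same query returns the rows in
opposite orders even though neither result falls in an "undefined"
region from the implementation's own point of view.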

> Is it then legal to add "<" on unlike language tags so ordering  
> works out?
>
> Hopefully, that's a "yes"

As long as the consequences of such are written down somewhere :)

> As we don't require SPARQL engines to handle language tags, we  
> can't add them to the ORDER BY section without adding them into the  
> operator table, then the 'use "<"' rule catches them.
>
> Stephane refers to ARQ but there I preferred to give a total orders  
> results which is deterministic always but is implementation  
> dependent - for language tags it treats language tags as some sort  
> of value space, named by lowercased language tag.  If all else is  
> equals, it's lexical order of same language tag i.e. "x"@EN <  
> "x"@en.  Similar for unknown datatype URIs and blank nodes there  
> are sorting rules to put everything into a fixed order (which may  
> change, but is very unlikely to, if you re-read the graphs with  
> FROM - that's blank nodes for you).

That's the approach I've taken with twinql, but, for complex ordering  
expressions which rely on type errors, it's important to know what  
the specified behavior should be.
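
For reference, the kind of implementation-dependent total order we are
both describing could be sketched like this in Python. This is my own
guess at the shape of such a rule, not ARQ's or twinql's actual code:
the key function total_order_key is hypothetical, and the choices to
compare lexical form first and to put untagged literals before tagged
ones are assumptions.

```python
# A literal is modelled as (lexical_form, language_tag_or_None).

def total_order_key(literal):
    """Sort key: lexical form, then lowercased tag, then the tag as written."""
    lexical, tag = literal
    tag = tag or ""  # assumption: untagged literals sort before tagged ones
    return (lexical, tag.lower(), tag)

literals = [("x", "en"), ("x", None), ("x", "EN"), ("x", "de")]
ordered = sorted(literals, key=total_order_key)
print(ordered)
```

The last component of the key is what makes "x"@EN sort before "x"@en
when everything else is equal, since uppercase letters precede
lowercase ones in code point order.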

> For the test case: I found:
> http://www.w3.org/2001/sw/DataAccess/tests/data/OpenWorld/open-eq-07.rq
> on
> http://www.w3.org/2001/sw/DataAccess/tests/data/OpenWorld/data-2.ttl
> which is close but not quite right

Indeed. It would be helpful to have one which uses a single binding,  
but features every possible kind of literal, such that a complete  
example of ordering is provided. (As a corollary to this, perhaps the  
regions for which no ordering is defined should be demarcated...)
Here's one attached, though I don't know if the ordering is correct  
according to the standard!


Thanks again for your response.

-R

Received on Friday, 23 February 2007 17:58:27 UTC