Re: lang-case-sensitivity from Lee Feigenbaum on 2007-06-19 (public-rdf-dawg@w3.org from April to June 2007)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Mon, 18 Jun 2007 23:42:38 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <OFCF493FAC.554E234C-ON852572FF.000255FF-852572FF.0014602A@us.ibm.com>
"Seaborne, Andy" <andy.seaborne@hp.com> wrote on 06/18/2007 05:18:24 PM:

> Lee Feigenbaum wrote (Agenda for 19/June/2007)
> ...
> > 3. Test progress
> ...
> > I'd also like to look at the language tag case sensitivity tests that we
> > didn't approve last week:
> > 
> > data-r2/open-world/manifest-lang-case-sensitivity.ttl
> > 
> > Given time, we'll find other tests to work through and approve.
> 
> ARQ fails the first test "lang-case-sensitivity" because lang tags are 
> compared in filters in a case insensitive way.

(We're talking about 
http://www.w3.org/2001/sw/DataAccess/tests/data-r2/open-world/lang-case-sensitivity-eq.rq 
right?)

I agree; I think the test is incorrect: there should be 4 solutions. 

We've discussed this before, but for the record (so we have a URI), here's 
my reading of the specs:

We're testing plain literals with language tags with =. So we look through 
the table in 11.3. The = entry for simple literals doesn't apply since the 
literals have language tags. So we drop down to the = entry for RDF terms, 
which defers to RDFterm-equal ( 
http://www.w3.org/TR/rdf-sparql-query/#func-RDFterm-equal ). RDFterm-equal 
passes the buck to 6.5.1 Literal Equality from RDF Concepts ( 
http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality ), where we 
find, among other things, "The language tags, if any, compare equal." 
Looking directly above 6.5.1 (in the intro to 6.5 RDF Literals), we see 
that "Plain literals have a lexical form and optionally a language tag as 
defined by [RFC-3066], normalized to lowercase." And finally, most to the 
point:

"""
Note: The case normalization of language tags is part of the description 
of the abstract syntax, and consequently the abstract behaviour of RDF 
applications. It does not constrain an RDF implementation to actually 
normalize the case. Crucially, the result of comparing two language tags 
should not be sensitive to the case of the original input.
"""

...from which I reach my conclusion that the objects of the two triples in 
http://www.w3.org/2001/sw/DataAccess/tests/data-r2/open-world/lang-case-sensitivity.ttl 
are RDFterm-equal.

So the query should give a result for each pair of bindings from the two 
triples - i.e., 4 results.
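
To make that reading concrete, here is a rough Python sketch (illustration
only, not any implementation's code). It assumes the data file holds just the
two triples shown in Andy's table further down; TRIPLES, rdfterm_equal and
run_filter are made-up names, and rdfterm_equal only models language-tagged
plain literals.

```python
from itertools import product

# Assumed stand-in for lang-case-sensitivity.ttl:
# (subject, lexical form, language tag), taken from the two rows quoted below.
TRIPLES = [(":x2", "xyz", "en"), (":x3", "xyz", "EN")]


def rdfterm_equal(a, b):
    """Model of literal equality for language-tagged plain literals.

    Lexical forms must match exactly; language tags are compared after
    lowercasing, so the case of the original input does not matter.
    """
    (lex_a, lang_a), (lex_b, lang_b) = a, b
    return lex_a == lex_b and lang_a.lower() == lang_b.lower()


def run_filter(keep):
    """Cross product of the triples with itself, filtered by keep(v1, v2)."""
    rows = []
    for (x1, lex1, lang1), (x2, lex2, lang2) in product(TRIPLES, repeat=2):
        if keep((lex1, lang1), (lex2, lang2)):
            rows.append((x1, x2))
    return rows


eq_rows = run_filter(rdfterm_equal)
print(len(eq_rows))  # 4: every pair of bindings passes FILTER (?v1 = ?v2)
```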

lang-case-insensitive-eq seems to be the correct version of the test.

> 
> I think the last test "lang-case-insensitive-ne" is not making the point the
> naming suggests.
> 
> lang-case-insensitive-ne.srx
>    and
> lang-case-sensitive-ne.srx
> 
> are the same (no rows).  The tests form the cross product of the triples and
> then filter:
> 
> SELECT *
> {
>      ?x1 :p ?v1 .
>      ?x2 :p ?v2 .
>      FILTER ( ?v1 != ?v2 )
> }
> 
> 
> I'd expect lang-case-insensitive-ne.srx to record the cases of
> 'xyz'@en != 'xyz'@EN and 'xyz'@EN != 'xyz'@en
> 
> -----------------------------------
> | x1  | v1       | x2  | v2       |
> ===================================
> | :x3 | "xyz"@EN | :x2 | "xyz"@en |
> | :x2 | "xyz"@en | :x3 | "xyz"@EN |
> -----------------------------------

Hmm? I'd expect that to be the case of lang-case-*sensitive* - when using 
case insensitivity, all of the pairs compare equal, so the result should 
be the empty set.
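
As a sanity check, here is a small Python sketch of the != filter under both
interpretations. It is only a model: the two triples are assumed from the
table quoted above, and ne_rows is a made-up helper, not anything from ARQ or
the test suite.

```python
from itertools import product

# Assumed test data: (subject, lexical form, language tag).
TRIPLES = [(":x2", "xyz", "en"), (":x3", "xyz", "EN")]


def ne_rows(case_sensitive):
    """Rows of the cross product that survive FILTER (?v1 != ?v2)."""
    rows = []
    for (x1, lex1, lang1), (x2, lex2, lang2) in product(TRIPLES, repeat=2):
        if not case_sensitive:
            # Case-insensitive reading: compare the tags after lowercasing.
            lang1, lang2 = lang1.lower(), lang2.lower()
        if (lex1, lang1) != (lex2, lang2):
            rows.append((x1, x2))
    return rows


sensitive_rows = ne_rows(case_sensitive=True)
insensitive_rows = ne_rows(case_sensitive=False)

print(sensitive_rows)    # [(':x2', ':x3'), (':x3', ':x2')]
print(insensitive_rows)  # []
```

The two-row table is what the case-sensitive reading produces; the
case-insensitive reading leaves nothing for != to keep.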

In any case, as far as the spec and the approved tests go, I think that 
we should be approving both of the *insensitive tests.

Lee

>    Andy
> 
> 
> -- 
>   Hewlett-Packard Limited
>   Registered Office: Cain Road, Bracknell, Berks RG12 1HN
>   Registered No: 690597 England
Received on Tuesday, 19 June 2007 03:42:51 UTC