Re: an interesting non-entailment from Andy Seaborne on 2023-07-28 (public-rdf-star-wg@w3.org from July 2023)

From: Andy Seaborne <andy@apache.org>
Date: Fri, 28 Jul 2023 16:22:16 +0100
To: public-rdf-star-wg@w3.org
Message-ID: <d5fca672-9815-a58c-191b-21647eb10459@apache.org>
On 28/07/2023 15:06, Pierre-Antoine Champin wrote:
> 
> On 27/07/2023 13:15, Andy Seaborne wrote:
>> On 27/07/2023 10:37, Pierre-Antoine Champin wrote:
>>>
>>> On 21/07/2023 21:59, Peter F. Patel-Schneider wrote:
>>>> As far as I can tell,
>>>>
>>>> :a :h "x"@EN {| :accordingTo :e |} .
>>>>
>>>> does not entail
>>>>
>>>> :a :h "x"@en {| :accordingTo :e |} .
>>>>
>>>> in the community group semantics, even if the underlying semantics 
>>>> is the RDFS semantics.
>>
>> This is covered in D-entailment.
>>
>> https://www.w3.org/TR/rdf12-semantics/#D_interpretations
>>
>> so it is similar to the case of "5"^^xsd:integer and "05"^^xsd:integer.
> 
> Since literals in quoted triples are opaque in the CG report, 
> D-entailment does not "fix", as illustrated in Example 38:
> 
> https://www.w3.org/2021/12/rdf-star.html#ref-opacity-annotation

Right - it is the same situation as the numeric one previously mentioned 
in the CG. I'm not claiming it fixes it (it doesn't!).

>> The difficulty I have is why deal with language tags one way and XSD 
>> numbers another way. RDF Concepts, which mentions "Core types" 
>> xsd:decimal and xsd:integer.
...

> Note that this does not prevent implementation to preserve the original 
> case (e.g. "en-US") to respect users preferences.

When retrieving from storage you don't know whether the data will appear 
in the results, be tested (e.g. in a FILTER), or both. Examples below.

>>> Gregg's PR #48 on rdf12-concepts fixes this [1] by making the 
>>> conversion to lower case part of the comparison for term equality.
>>
>> A change here is one that will affect existing stored data. But if we 
>> could solve this once and for all, that would be good.
> 
> Will it, though? I ran the following SPARQL queries on a number of 
> implementations :
> 
>    SELECT (sameTerm("a"@en, "a"@EN) as ?test) {}
> 
> and all of them (Jena, RDFlib (python), Ruby RDF, GraphDB, Oxigraph, 
> Comunica) returned true, except one (Virtuoso).

There's a SPARQL test for that: strlang03
Jena isn't consistent in all possible cases.

That query doesn't touch storage, which may make a difference.


Other tests (also not storage) are:

SELECT (count(DISTINCT *) AS ?C) { VALUES ?x {"a"@EN  "a"@en } }

SELECT ?lang_x ?lang_y {
    VALUES (?x ?y) {("a"@EN  "a"@en)}
    BIND(LANG(?x) AS ?lang_x)
    BIND(LANG(?y) AS ?lang_y)
    ## maybe tests on ?lang_x and ?lang_y
}

And loading data:

<x:s1>  <x:p1> "abc"@en .
<x:s2>  <x:p1> "abc"@EN .

SELECT ?s { ?s ?p ?z . FILTER(sameTerm(?z, "abc"@en)) }
# Passed into the FILTER:
SELECT ?s {
     ?s ?p ?z .
     VALUES ?C {"abc"@en}
     FILTER(sameTerm(?z, ?C))
     }

SELECT ?s { ?s ?p ?z . FILTER( ?z = "abc"@en ) }

Jena returns 1 row for the first two because it is working on exact 
presentation in storage (it is a bug in the optimizer) and 2 rows for 
the third (because "=" includes "sameTerm").
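
To make the difference concrete, here is a rough Python sketch (not Jena code; LangLiteral, same_term_exact, same_term_folded and subjects_matching are names made up for this illustration). It assumes one matcher compares the language tag exactly as stored, and the other lower-cases the tag before comparing, along the lines of the proposed change to term equality.

```python
from typing import NamedTuple


class LangLiteral(NamedTuple):
    """Hypothetical model of a language-tagged literal."""
    lexical: str
    lang: str  # language tag exactly as written in the data


def same_term_exact(a: LangLiteral, b: LangLiteral) -> bool:
    # Compares the stored presentation: "en" and "EN" are different tags.
    return a == b


def same_term_folded(a: LangLiteral, b: LangLiteral) -> bool:
    # Lower-cases the language tag as part of the comparison.
    return a.lexical == b.lexical and a.lang.lower() == b.lang.lower()


# The two triples loaded above.
DATA = [
    ("x:s1", "x:p1", LangLiteral("abc", "en")),
    ("x:s2", "x:p1", LangLiteral("abc", "EN")),
]


def subjects_matching(target, same_term):
    """Subjects whose object matches target under the given comparison."""
    return [s for (s, _p, o) in DATA if same_term(o, target)]


exact_rows = subjects_matching(LangLiteral("abc", "en"), same_term_exact)
folded_rows = subjects_matching(LangLiteral("abc", "en"), same_term_folded)
print(len(exact_rows), len(folded_rows))
```

Under these assumptions the exact comparison finds only x:s1, while the case-folded comparison finds both subjects.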

I would like this all to go away :-) but there are lots of deployments 
nowadays and so any change can have an impact.

The charter has something to say here (as did RDF 1.1 and SPARQL 1.1 
charters) [*]

>> We could produce a "best practice" note/document/...
>> We could conduct a community survey.
> +1

For both XSD and language tags (XSD cases being more common), I've been 
recommending canonicalizing on input to get consistent and explainable 
behaviour.
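
A minimal sketch of what canonicalizing on input could look like, in Python. The function and constant names are hypothetical and this is not how any particular store does it. It assumes language tags are folded to lower case, that xsd:integer lexical forms are rewritten without a leading "+" or leading zeros, and that anything not matching the plain integer pattern is left untouched.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Optional sign followed by ASCII digits only; other forms are left as-is.
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def canonicalize(lexical, datatype=None, lang=None):
    """Return (lexical, datatype, lang) in a canonical form for storage.

    Hypothetical helper: only language tags and xsd:integer are handled.
    """
    if lang is not None:
        # Language tags are ASCII, so lower() is a plain case fold here.
        return (lexical, datatype, lang.lower())
    if datatype == XSD_INTEGER and _INTEGER_RE.fullmatch(lexical):
        # int() drops "+" and leading zeros; "-0" becomes "0".
        return (str(int(lexical)), datatype, None)
    return (lexical, datatype, lang)


print(canonicalize("abc", lang="EN"))
print(canonicalize("05", datatype=XSD_INTEGER))
```

With the data canonicalized once at load time, storage lookups and expression evaluation see the same term, whichever form the user originally wrote.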

     Andy

[*]
"Compatibility means deliberately repeating other people's mistakes."

https://quotepark.com/quotes/2112031-david-wheeler-computer-scientist-compatibility-means-deliberately-repeating-other-p/
Received on Friday, 28 July 2023 15:22:24 UTC