Re: an interesting non-entailment from Pierre-Antoine Champin on 2023-07-28 (public-rdf-star-wg@w3.org from July 2023)

From: Pierre-Antoine Champin <pierre-antoine@w3.org>
Date: Fri, 28 Jul 2023 16:06:55 +0200
To: Andy Seaborne <andy@apache.org>, public-rdf-star-wg@w3.org
Message-ID: <3b577ffa-6923-1cc1-d606-835973dfaedb@w3.org>


On 27/07/2023 13:15, Andy Seaborne wrote:
> On 27/07/2023 10:37, Pierre-Antoine Champin wrote:
>>
>> On 21/07/2023 21:59, Peter F. Patel-Schneider wrote:
>>> As far as I can tell,
>>>
>>> :a :h "x"@EN {| :accordingTo :e |} .
>>>
>>> does not entail
>>>
>>> :a :h "x"@en {| :accordingTo :e |} .
>>>
>>> in the community group semantics, even if the underlying semantics 
>>> is the RDFS semantics.
>
> This is covered in D-entailment.
>
> https://www.w3.org/TR/rdf12-semantics/#D_interpretations
>
> so it is similar to the case of "5"^^xsd:integer and "05"^^xsd:integer.

Since literals in quoted triples are opaque in the CG report, 
D-entailment does not "fix" this, as illustrated in Example 38:

https://www.w3.org/2021/12/rdf-star.html#ref-opacity-annotation
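
To make the distinction concrete, here is a minimal Python sketch. It is 
not taken from the CG report or from any implementation; the Literal 
class and the two helper functions are hypothetical names used only for 
illustration. It contrasts syntactic (term) equality with a value-based 
comparison for xsd:integer. In an opaque position only the first kind of 
comparison would apply.

```python
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Hypothetical, simplified RDF literal (illustration only)."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def same_term(a: Literal, b: Literal) -> bool:
    # Syntactic equality: every component is compared as written.
    return (a.lexical, a.datatype, a.lang) == (b.lexical, b.datatype, b.lang)


def same_value(a: Literal, b: Literal) -> bool:
    # Value-based comparison, sketched for xsd:integer only.
    # Any other datatype falls back to syntactic equality here.
    if a.datatype == XSD_INTEGER and b.datatype == XSD_INTEGER:
        return int(a.lexical) == int(b.lexical)
    return same_term(a, b)


five = Literal("5", XSD_INTEGER)
zero_five = Literal("05", XSD_INTEGER)

print(same_term(five, zero_five))   # different terms
print(same_value(five, zero_five))  # same integer value
```

Under these assumptions the two literals denote the same value but 
remain different terms, which is why a purely semantic mechanism does 
not help where terms are compared syntactically.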


>
> The difficulty I have is why deal with language tags one way and XSD 
> numbers another way. RDF Concepts mentions "Core types" 
> xsd:decimal and xsd:integer.
>
>> It boils down to deciding whether "x"@EN and "x"@en are the /same/ 
>> literal (syntactically) or not.
>
> For some users, the appearance of language tags is important. They 
> would want the preferred language tag formatting (from my 
> discussions, lowercase is probably better than the status quo, but 
> following the RFC is preferable).
>
> "en-US" not "en-us"
>
> https://datatracker.ietf.org/doc/html/rfc5646#section-2.1.1
>
> has a format that does not require registry access.
>
> The RFC does say:
> https://datatracker.ietf.org/doc/html/rfc5646#section-4.5
> [[
>    All comparisons MUST be performed in a case-insensitive manner.
> ]]
> which is reflected in RDF 1.1 concepts:
> [[
> The value space of language tags is always in lower case.
> ]]
>
>> I agree with you that a strict reading of RDF 1.1 leads to the 
>> conclusion that they are not:
>> literals are considered the same "if the two lexical forms, the two 
>> datatype IRIs, and the two language tags (if any) compare equal, 
>> character by character." However, the definition of literal says: 
>> "Lexical representations of language tags /MAY/ be converted to lower 
>> case."  Meaning that the literals are not the same, but people are 
>> allowed to arbitrarily replace one by the other at any time... In my 
>> view, this is a bug.
>
> Which part do you consider the bug? The "MAY" transformation? It is 
> odd that this one case is called out but other D-interpretations are not.

I consider it a bug that literals differing only in the case of their 
language tag are not explicitly considered the same term (i.e. 
syntactical equality, not just semantic equality under D-entailment).

Note that this does not prevent implementations from preserving the 
original case (e.g. "en-US") to respect users' preferences.
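
As a rough illustration of that idea (a sketch under my own assumptions, 
not the wording of any PR or the behaviour of any particular 
implementation; the LangLiteral class is a hypothetical name), term 
equality can ignore the case of the language tag while the tag is still 
stored exactly as the user wrote it:

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class LangLiteral:
    """Hypothetical language-tagged literal (illustration only)."""
    lexical: str
    lang: str  # kept exactly as written, e.g. "en-US"

    def _key(self):
        # The comparison key lower-cases the language tag only;
        # the lexical form is still compared character by character.
        return (self.lexical, self.lang.lower())

    def __eq__(self, other):
        if not isinstance(other, LangLiteral):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__ so that sets and dicts treat
        # "x"@EN and "x"@en as one term.
        return hash(self._key())


upper = LangLiteral("x", "EN")
lower = LangLiteral("x", "en")

print(upper == lower)        # compared as the same term
print(upper.lang)            # original case is still available
print(len({upper, lower}))   # a set keeps a single entry
```

The design choice here is to normalise only inside the comparison key, 
so nothing about the stored or serialised form has to change.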

>
>> Gregg's PR #48 on rdf12-concepts fixes this [1] by making the 
>> conversion to lower case part of the comparison for term equality.
>
> A change here is one that will affect existing stored data. But if we 
> could solve this once and for all, that would be good.

Will it, though? I ran the following SPARQL query on a number of 
implementations:

   SELECT (sameTerm("a"@en, "a"@EN) as ?test) {}

and all of them (Jena, RDFLib (Python), Ruby RDF, GraphDB, Oxigraph, 
Comunica) returned true, except one (Virtuoso).

>
> We could produce a "best practice" note/document/...
> We could conduct a community survey.
+1
>
>>
>>    pa
>>
>> [1] 
>> https://github.com/w3c/rdf-concepts/pull/48/files#diff-97efdc0e30285fae4d5cb7cc4a510dbaa7c33f4e56d4216dcde109ad2cdef15fL735
>>
>>>
>>>
>>> peter
>>>
>

Received on Friday, 28 July 2023 14:07:25 UTC