- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Fri, 28 Jul 2023 16:06:55 +0200
- To: Andy Seaborne <andy@apache.org>, public-rdf-star-wg@w3.org
- Message-ID: <3b577ffa-6923-1cc1-d606-835973dfaedb@w3.org>
On 27/07/2023 13:15, Andy Seaborne wrote:
> On 27/07/2023 10:37, Pierre-Antoine Champin wrote:
>>
>> On 21/07/2023 21:59, Peter F. Patel-Schneider wrote:
>>> As far as I can tell,
>>>
>>> :a :h "x"@EN {| :accordingTo :e |} .
>>>
>>> does not entail
>>>
>>> :a :h "x"@en {| :accordingTo :e |} .
>>>
>>> in the community group semantics, even if the underlying semantics
>>> is the RDFS semantics.
>
> This is covered in D-entailment.
>
> https://www.w3.org/TR/rdf12-semantics/#D_interpretations
>
> so it is similar to the case of "5"^^xsd:integer and "05"^^xsd:integer.
Since literals in quoted triples are opaque in the CG report,
D-entailment does not "fix" this, as illustrated in Example 38:
https://www.w3.org/2021/12/rdf-star.html#ref-opacity-annotation
>
> The difficulty I have is why we would deal with language tags one way
> and XSD numbers another way. RDF Concepts mentions the "Core types"
> xsd:decimal and xsd:integer.
>
>> It boils down to deciding whether "x"@EN and "x"@en are the /same/
>> literal (syntactically) or not.
>
> For some users, the appearance of language tags is important. They
> would want the preferred language tag formatting (from my
> discussions, lowercase is probably better than the status quo, but
> following the RFC is preferable).
>
> "en-US" not "en-us"
>
> https://datatracker.ietf.org/doc/html/rfc5646#section-2.1.1
>
> has a format that does not require registry access.
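A minimal sketch of that registry-free formatting, in Python, based on my
reading of RFC 5646 section 2.1.1: all subtags are lower case, except that
two-letter subtags are upper-cased and four-letter subtags are title-cased
when they are neither the first subtag nor located after a singleton. The
function name format_language_tag is hypothetical, and the tag is assumed
to be well-formed already (no validation is attempted).

```python
def format_language_tag(tag: str) -> str:
    """Apply the case conventions of RFC 5646 section 2.1.1 (no registry needed).

    Hypothetical helper for illustration; assumes `tag` is well-formed.
    """
    formatted = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index > 0 and not after_singleton and len(subtag) == 2:
            # Two-letter subtag that is not first: region-style, upper case.
            formatted.append(lowered.upper())
        elif index > 0 and not after_singleton and len(subtag) == 4:
            # Four-letter subtag that is not first: script-style, title case.
            formatted.append(lowered.capitalize())
        else:
            formatted.append(lowered)
        if len(subtag) == 1:
            # Extension and private-use subtags after a singleton stay lower case.
            after_singleton = True
    return "-".join(formatted)


print(format_language_tag("EN-us"))
```

This only changes how a tag is displayed; comparing tags would still be
done case-insensitively.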
>
> The RFC does say:
> https://datatracker.ietf.org/doc/html/rfc5646#section-4.5
> [[
> All comparisons MUST be performed in a case-insensitive manner.
> ]]
> which is reflected in RDF 1.1 concepts:
> [[
> The value space of language tags is always in lower case.
> ]]
>
>> I agree with you that a strict reading of RDF 1.1 leads to the
>> conclusion that they are not:
>> literals are considered the same "if the two lexical forms, the two
>> datatype IRIs, and the two language tags (if any) compare equal,
>> character by character." However, the definition of literal says:
>> "Lexical representations of language tags /MAY/ be converted to lower
>> case." This means that the literals are not the same, but people are
>> allowed to arbitrarily replace one by the other at any time... In my
>> view, this is a bug.
>
> Which part do you consider the bug? The "MAY" transformation? It is
> odd that this one case is called out but other D-interpretations are not.
I consider it a bug that literals differing only in the case of their
language tag are not explicitly considered the same term (i.e.
syntactic equality, not just semantic equality under D-entailment).
Note that this does not prevent implementations from preserving the
original case (e.g. "en-US") to respect users' preferences.
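To make this concrete, here is a rough Python sketch (not taken from any
existing implementation) of a literal whose term equality lower-cases the
language tag for comparison while the tag itself is stored as written. The
class name Literal and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True, eq=False)
class Literal:
    """Hypothetical RDF literal; the language tag keeps its original case."""

    lexical_form: str
    datatype: str
    language: Optional[str] = None

    def _term_key(self):
        # Lower-casing happens only here, for comparison purposes.
        language = self.language.lower() if self.language is not None else None
        return (self.lexical_form, self.datatype, language)

    def __eq__(self, other):
        if not isinstance(other, Literal):
            return NotImplemented
        return self._term_key() == other._term_key()

    def __hash__(self):
        return hash(self._term_key())


lower = Literal("x", RDF_LANGSTRING, "en-us")
mixed = Literal("x", RDF_LANGSTRING, "en-US")
print(lower == mixed, mixed.language)
```

With this design the two literals are the same term (and collapse to one
entry in a set), yet mixed.language still gives back "en-US" as entered.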
>
>> Gregg's PR #48 on rdf12-concepts fixes this [1] by making the
>> conversion to lower case part of the comparison for term equality.
>
> A change here is one that will affect existing stored data. But if we
> could solve this once and for all, that would be good.
Will it, though? I ran the following SPARQL query on a number of
implementations:
SELECT (sameTerm("a"@en, "a"@EN) as ?test) {}
and all of them (Jena, RDFLib (Python), Ruby RDF, GraphDB, Oxigraph,
Comunica) returned true, except one (Virtuoso).
>
> We could produce a "best practice" note/document/...
> We could conduct a community survey.
+1
>
>>
>> pa
>>
>> [1]
>> https://github.com/w3c/rdf-concepts/pull/48/files#diff-97efdc0e30285fae4d5cb7cc4a510dbaa7c33f4e56d4216dcde109ad2cdef15fL735
>>
>>>
>>>
>>> peter
>>>
>
Received on Friday, 28 July 2023 14:07:25 UTC