- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Fri, 28 Jul 2023 16:06:55 +0200
- To: Andy Seaborne <andy@apache.org>, public-rdf-star-wg@w3.org
- Message-ID: <3b577ffa-6923-1cc1-d606-835973dfaedb@w3.org>
On 27/07/2023 13:15, Andy Seaborne wrote:
> On 27/07/2023 10:37, Pierre-Antoine Champin wrote:
>>
>> On 21/07/2023 21:59, Peter F. Patel-Schneider wrote:
>>> As far as I can tell,
>>>
>>> :a :h "x"@EN {| :accordingTo :e |} .
>>>
>>> does not entail
>>>
>>> :a :h "x"@en {| :accordingTo :e |} .
>>>
>>> in the community group semantics, even if the underlying semantics
>>> is the RDFS semantics.
>
> This is covered in D-entailment.
>
> https://www.w3.org/TR/rdf12-semantics/#D_interpretations
>
> so it is similar to the case of "5"^^xsd:integer and "05"^^xsd:integer.

Since literals in quoted triples are opaque in the CG report, D-entailment does not "fix" this case, as illustrated in Example 38:
https://www.w3.org/2021/12/rdf-star.html#ref-opacity-annotation

>
> The difficulty I have is why deal with language tags one way and XSD
> numbers another way. RDF Concepts, which mentions "Core types"
> xsd:decimal and xsd:integer.
>
>> It boils down to deciding whether "x"@EN and "x"@en are the /same/
>> literal (syntactically) or not.
>
> For some users, the appearance of language tags is important. They
> would want the preferred language tag formatting although (from my
> discussions, lowercase is probably better than the status quo but
> following the RFC is preferable).
>
> "en-US" not "en-us"
>
> https://datatracker.ietf.org/doc/html/rfc5646#section-2.1.1
>
> has a format that does not require registry access.
>
> The RFC does say:
> https://datatracker.ietf.org/doc/html/rfc5646#section-4.5
> [[
> All comparisons MUST be performed in a case-insensitive manner.
> ]]
> which is reflected in RDF 1.1 concepts:
> [[
> The value space of language tags is always in lower case.
> ]]
>
>> I agree with you that a strict reading of RDF 1.1 leads to the
>> conclusion that they are not:
>> literals are considered the same "if the two lexical forms, the two
>> datatype IRIs, and the two language tags (if any) compare equal,
>> character by character."
>> However, the definition of literal says:
>> "Lexical representations of language tags /MAY/ be converted to lower
>> case." Meaning that the literals are not the same, but people are
>> allowed to arbitrarily replace one by the other at any time... In my
>> view, this is a bug.
>
> Which part do you consider the bug? The "MAY" transformation? It is
> odd that this one case is called out but other D-interpretations are not.

I consider it a bug that literals differing only in the case of their language tag are not explicitly considered the same term (i.e. syntactic equality, not just semantic equality under D-entailment). Note that this does not prevent implementations from preserving the original case (e.g. "en-US") to respect users' preferences.

>
>> Gregg's PR #48 on rdf12-concepts fixes this [1] by making the
>> conversion to lower case part of the comparison for term equality.
>
> A change here is one that will affect existing stored data. But if we
> could solve this once and for all, that would be good.

Will it, though? I ran the following SPARQL query on a number of implementations:

    SELECT (sameTerm("a"@en, "a"@EN) as ?test) {}

All of them (Jena, RDFLib (Python), Ruby RDF, GraphDB, Oxigraph, Comunica) returned true, except one (Virtuoso).

>
> We could produce a "best practice" note/document/...
> We could conduct a community survey.

+1

>
>>
>> pa
>
>
>> [1]
>> https://github.com/w3c/rdf-concepts/pull/48/files#diff-97efdc0e30285fae4d5cb7cc4a510dbaa7c33f4e56d4216dcde109ad2cdef15fL735
>>
>>>
>>>
>>> peter
>>>
>
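To make the two comparison rules discussed in this thread concrete, here is a minimal Python sketch. The `Literal` class and every function name are hypothetical. `lowercase_tag_equal` is only an assumed reading of what PR #48 proposes, not its actual text. `preferred_case` follows the casing conventions of RFC 5646 section 2.1.1, to show how a tag could be compared case-insensitively while still being displayed as "en-US".

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal literal model (not any library's API)."""
    lexical: str
    datatype: str
    lang: Optional[str] = None  # kept exactly as written, e.g. "en-US"


def strict_equal(a: Literal, b: Literal) -> bool:
    # RDF 1.1 reading: lexical form, datatype IRI and language tag
    # must all compare equal character by character.
    return (a.lexical, a.datatype, a.lang) == (b.lexical, b.datatype, b.lang)


def _lower(tag: Optional[str]) -> Optional[str]:
    return tag.lower() if tag is not None else None


def lowercase_tag_equal(a: Literal, b: Literal) -> bool:
    # Assumed reading of PR #48: same as above, except that language
    # tags are converted to lower case before being compared.
    return ((a.lexical, a.datatype, _lower(a.lang))
            == (b.lexical, b.datatype, _lower(b.lang)))


def preferred_case(tag: str) -> str:
    # Casing conventions of RFC 5646, section 2.1.1: everything lowercase,
    # except 2-letter subtags (uppercase) and 4-letter subtags (titlecase)
    # that are neither the first subtag nor after a singleton such as "x".
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        low = sub.lower()
        if i == 0 or after_singleton:
            out.append(low)
        elif len(sub) == 2:
            out.append(low.upper())
        elif len(sub) == 4:
            out.append(low.capitalize())
        else:
            out.append(low)
        if len(sub) == 1:
            after_singleton = True
    return "-".join(out)


a = Literal("x", RDF_LANG_STRING, "EN")
b = Literal("x", RDF_LANG_STRING, "en")
print(strict_equal(a, b))         # False
print(lowercase_tag_equal(a, b))  # True

# Different lexical forms stay different terms under both rules,
# even though they denote the same integer (a D-entailment matter).
five = Literal("5", XSD_INTEGER)
zero_five = Literal("05", XSD_INTEGER)
print(lowercase_tag_equal(five, zero_five))  # False

# The stored tag can keep the user's spelling; display can follow the RFC.
print(preferred_case("en-us"))           # en-US
print(preferred_case("AZ-LATN-X-LATN"))  # az-Latn-x-latn
```

Keeping the tag as written and normalizing only at comparison time is one possible way to reconcile term equality with users' preferred formatting; it is a sketch, not a description of how any of the implementations mentioned actually store tags.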
Received on Friday, 28 July 2023 14:07:25 UTC