Re: ISSUE-12: xs:string VS plain literals: proposed resolution from Antoine Zimmermann on 2011-05-04 (public-rdf-wg@w3.org from May 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Wed, 04 May 2011 17:00:42 +0200
To: public-rdf-wg@w3.org
Message-ID: <4DC16A1A.1060800@insa-lyon.fr>
My understanding of this decision is that it improves interoperability 
by asking data providers to produce character strings in a uniform way. 
The XSD semantics may or may not be implemented in RDF systems. So some 
systems "understand" that "foo"^^xs:string is the same as "foo", but 
others would see "foo"^^xs:string as being as meaningful as 
"foo"^^ex:mydatatype. Yet, whether it's a plain literal or an xs:string 
typed literal, the goal is to talk about the character string "foo". So
asking for all xs:string literals to be converted to plain literals allows
systems that do not implement the XSD semantics to recognise character
strings as such, independently of the datatype. However, this decision
should not change the semantics: if an xsd:string literal is not converted
to a plain literal, it should remain a distinct entity in terms of literal
equality. It should also remain distinct with respect to simple entailment.
More comments below.
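
To make the distinction concrete, here is a minimal sketch in Python. It
is only an illustration under my own assumptions: the Literal class and
the normalize function are hypothetical names, not part of any RDF
library or of the spec text. It shows that converting xs:string literals
is a separate step from literal equality, which stays purely syntactic.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: lexical form, optional datatype IRI, optional language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def normalize(lit: Literal) -> Literal:
    """Convert an xs:string typed literal to a plain literal without language tag.

    Every other literal, including one typed with an unknown datatype,
    is returned unchanged.
    """
    if lit.datatype == XSD_STRING:
        return Literal(lit.lexical)
    return lit


typed = Literal("foo", XSD_STRING)
plain = Literal("foo")
custom = Literal("foo", "http://example.org/mydatatype")

# Literal equality is syntactic: without conversion the two stay distinct.
print(typed == plain)             # False
# After conversion they are the same literal.
print(normalize(typed) == plain)  # True
# Literals with other datatypes are not touched.
print(normalize(custom) == custom)  # True
```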

On 04/05/2011 16:08, Lee Feigenbaum wrote:
> I'd like to understand if the proposed resolution of this issue is
> ("merely") a recommendation, or is a change to RDF syntactic equality.
> In particular, will we be changing
> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality such that
> "foo" and "foo"^^xsd:string are equal literals?

Literal equality is a purely syntactic equality. I see no reason this 
should change.

> Looking at this through SPARQL's eyes (as I am wont to do), one of the
> goals of this change is so that I can write:
>
> SELECT ... { ?s :p "foo" }
>
> and have that match whether the data that was loaded into the store was
> "foo" or "foo"^^xsd:string.

The idea of the decision was simply to "make archaic" things like
"foo"^^xsd:string, so that they slowly disappear from datasets (but it
was improperly said that xs:string should be made archaic, hence the
reformulation). As with other things made archaic, this does not change
the semantics at all. "foo"^^xsd:string would still be a typed literal, and
"foo" an untyped literal.
My understanding is that triple stores would simply convert
"foo"^^xs:string to "foo", which would simply make the query

SELECT ...{?s :p "foo"^^xs:string}

return no result under simple entailment. Under XSD entailment, "foo" 
and "foo"^^xs:string are the same so both would match.
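
The behaviour just described can be sketched with a toy store in Python.
Everything here is an assumption made for illustration: Lit, to_plain,
load and match are hypothetical names, language tags are left out, and
the xsd_aware flag is only a crude stand-in for XSD entailment that
treats xs:string literals and plain literals as the same value.

```python
from typing import List, NamedTuple, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Lit(NamedTuple):
    lexical: str
    datatype: Optional[str] = None  # None stands for a plain literal


def to_plain(lit: Lit) -> Lit:
    """Map an xs:string typed literal to the corresponding plain literal."""
    return Lit(lit.lexical) if lit.datatype == XSD_STRING else lit


def load(data: List[Tuple[str, Lit]]) -> List[Tuple[str, Lit]]:
    """Load (subject, object) pairs for a fixed predicate :p, converting silently."""
    return [(s, to_plain(o)) for s, o in data]


def match(store: List[Tuple[str, Lit]], pattern: Lit, xsd_aware: bool = False) -> List[str]:
    """Return the subjects whose object matches the pattern literal."""
    if xsd_aware:
        # Compare values: xs:string and plain literals denote the same string.
        pattern = to_plain(pattern)
        return [s for s, o in store if to_plain(o) == pattern]
    # Simple matching: literals must be syntactically equal.
    return [s for s, o in store if o == pattern]


store = load([(":a", Lit("foo")), (":b", Lit("foo", XSD_STRING))])

print(match(store, Lit("foo")))                              # [':a', ':b']
print(match(store, Lit("foo", XSD_STRING)))                  # []
print(match(store, Lit("foo", XSD_STRING), xsd_aware=True))  # [':a', ':b']
```

In this sketch the typed query finds nothing under simple matching
because the store no longer contains any xs:string literal, while the
plain query finds both subjects.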

>
> Recommending that stores canonicalize to "foo" would be one way to
> accomplish this, but only for new data. (And even then, is only a
> recommendation.) If we changed (or made a SHOULD-style change) literal
> equality, then the above query would match against :s :p
> "foo"^^xsd:string as well as :s :p "foo", which -- for me -- is the goal
> of this issue.

This should not affect literal equality, which is really about things 
written equal, not about semantic equivalence. "foo"^^xs:string and 
"foo" are the same (same interpretation) under XSD entailment, but are 
not equal in terms of literal equality (they don't have the same datatype).

> (SPARQL defines matching based on subgraphs, which in turn is based on
> RDF graph equivalence.)
>
> I'm not an expert on the RDF standards documents, admittedly, so I might
> be missing something.
>
> thanks,
> Lee
>
> On 5/4/2011 6:04 AM, Antoine Zimmermann wrote:
>> Hi,
>>
>>
>> With respect to ISSUE-12, I propose that we reformulate the resolution
>> as follows:
>>
>> "PROPOSED: Recommend that data publishers use plain literals instead of
>> xs:string typed literals and tell systems to silently convert xs:string
>> literals to plain literals without language tag."
>>
>> In the text of the spec, we may want to add some more details, saying:
>>
>> "In XSD-interpretations, any xs:string-typed literal "aaa"^^xs:string is
>> interpreted as the character string "aaa", that is, it is the same as
>> the plain literal "aaa". Thus, to ensure a canonical form of character
>> strings and better interoperability, we recommend that data publishers
>> always use plain literals instead of xs:string typed literals and tell
>> systems to silently convert xs:string literals to plain literals without
>> language tag whenever they occur in an RDF graph."
>>
>>
>>
>> Regards,
>


-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 4 May 2011 15:01:10 UTC