Re: ISSUE-12: xs:string VS plain literals: proposed resolution from Alex Hall on 2011-05-04 (public-rdf-wg@w3.org from May 2011)

From: Alex Hall <alexhall@revelytix.com>
Date: Wed, 4 May 2011 14:08:14 -0400
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: Pat Hayes <phayes@ihmc.us>, Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>, public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <BANLkTimjeV_vbF7Xq2jziU-SjiWqRFdkYg@mail.gmail.com>
On Wed, May 4, 2011 at 1:36 PM, Lee Feigenbaum <lee@thefigtrees.net> wrote:

> On 5/4/2011 1:17 PM, Pat Hayes wrote:
>
>>
>> On May 4, 2011, at 9:08 AM, Lee Feigenbaum wrote:
>>
>>  I'd like to understand if the proposed resolution of this issue is
>>> ("merely") a recommendation, or is a change to RDF syntactic equality. In
>>> particular, will we be changing
>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality such that
>>> "foo" and "foo"^^xsd:string are equal literals?
>>>
>>> Looking at this through SPARQL's eyes (as I am wont to do), one of the
>>> goals of this change is so that I can write:
>>>
>>> SELECT ... { ?s :p "foo" }
>>>
>>> and have that match whether the data that was loaded into the store was
>>> "foo" or "foo"^^xsd:string.
>>>
>>> Recommending that stores canonicalize to "foo" would be one way to
>>> accomplish this, but only for new data. (And even then, it is only a
>>> recommendation.) If we changed (or made a SHOULD-style change) literal
>>> equality, then the above query would match against :s :p "foo"^^xsd:string
>>> as well as :s :p "foo", which -- for me -- is the goal of this issue.
>>>
>>
>> Well, have SPARQL decide that the appropriate entailment is
>> {xsd:string}-entailment (that is, D-entailment where D={xsd:string}), and
>> that fixes the necessary matching. Seems to me that this is not RDF
>> business, in fact. RDF already provides the machinery for doing this; all
>> SPARQL has to do is use the existing RDF specs appropriately.
>>
>
> Then maybe I don't understand the original motivation behind ISSUE-12 in
> this working group at all.
>
> *shrug*
>
>
From what I can tell based on looking at the charter, the original
motivation was exactly what you stated: to make querying for string data
simpler in SPARQL.

Unfortunately, the only ways I can see of making that work transparently in
SPARQL are:
1. Follow Pat's suggestion and define SPARQL BGP matching in terms of
{xsd:string}-entailment.
2. Modify the abstract syntax specified in RDF Concepts so that there's only
one way of expressing string data in an RDF literal, which seems to be what
you're asking for.
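To make the difference between today's literal equality and option 1 concrete, here is a minimal Python sketch. It assumes literals are modeled as (lexical form, datatype, language tag) values; the `Literal` class, `match_object`, and the `:s`/`:p` names are hypothetical, not any RDF library's API, and the check covers only the string case of {xsd:string}-entailment, not D-entailment in general.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical toy model of an RDF literal, not an RDF library's API.
@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None for a plain literal
    lang: Optional[str] = None      # language tag (plain literals only)

def term_equal(a: Literal, b: Literal) -> bool:
    # Current literal equality: lexical form, datatype and language tag must all match.
    return a == b

def string_entailment_equal(a: Literal, b: Literal) -> bool:
    # Under {xsd:string}-entailment, "foo" and "foo"^^xsd:string denote the same string,
    # so map an xsd:string literal to the tagless plain literal before comparing.
    def norm(x: Literal) -> Literal:
        return Literal(x.lexical) if x.datatype == XSD_STRING else x
    return norm(a) == norm(b)

def match_object(graph, subject, predicate, obj, equal):
    # Return the triples with the given subject and predicate whose object
    # matches `obj` under the supplied equality test.
    return [t for t in graph
            if t[0] == subject and t[1] == predicate and equal(t[2], obj)]

graph = [
    (":s", ":p", Literal("foo")),
    (":s", ":p", Literal("foo", XSD_STRING)),
    (":s", ":p", Literal("foo", lang="en")),
]
pattern = Literal("foo")
print(len(match_object(graph, ":s", ":p", pattern, term_equal)))               # 1
print(len(match_object(graph, ":s", ":p", pattern, string_entailment_equal)))  # 2
```

With plain term equality the pattern "foo" finds only the plain literal; with the normalization it also finds "foo"^^xsd:string, while "foo"@en stays distinct because a language-tagged literal does not denote the bare string.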

I'm not fundamentally opposed to either of those approaches, but they both
would require significant changes to deployed code.  Given a choice, I would
go with the second one because I don't think the problem is confined to
SPARQL.  I personally think that making a breaking change to the abstract
syntax would be worthwhile in this case because string data is so pervasive,
but I wouldn't be surprised if there's backlash from the community over
that.

The proposed resolution for ISSUE-12 appears to me to avoid making any
breaking changes by recommending that data producers prefer one syntactic
form over another.  I share your skepticism over how well that will work in
the long run.
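The proposed resolution normalizes data as it arrives rather than changing equality. A minimal sketch, assuming a toy in-memory store (the `Store` class and its `canonicalize_on_load` flag are hypothetical), shows why this only helps data loaded after a store adopts the recommendation, the concern Lee raised earlier in the thread:

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical toy model of an RDF literal, not an RDF library's API.
@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

def canonicalize(lit: Literal) -> Literal:
    # Proposed resolution: rewrite "aaa"^^xs:string as the tagless plain literal "aaa".
    return Literal(lit.lexical) if lit.datatype == XSD_STRING else lit

class Store:
    # Hypothetical in-memory triple store that matches objects by plain term equality.
    def __init__(self, canonicalize_on_load: bool):
        self.canonicalize_on_load = canonicalize_on_load
        self.triples = []

    def load(self, s: str, p: str, o: Literal) -> None:
        if self.canonicalize_on_load:
            o = canonicalize(o)
        self.triples.append((s, p, o))

    def subjects(self, p: str, o: Literal) -> list:
        return [s for s, p2, o2 in self.triples if p2 == p and o2 == o]

store = Store(canonicalize_on_load=False)
store.load(":old", ":p", Literal("foo", XSD_STRING))  # loaded before adopting the recommendation
store.canonicalize_on_load = True
store.load(":new", ":p", Literal("foo", XSD_STRING))  # loaded after adopting it
print(store.subjects(":p", Literal("foo")))           # [':new']
```

Unless already-loaded data is reprocessed, a query for "foo" still misses `:old`.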

-Alex



> Lee
>
>
>
>> Pat
>>
>>
>>> (SPARQL defines matching based on subgraphs, which in turn is based on
>>> RDF graph equivalence.)
>>>
>>> I'm not an expert on the RDF standards documents, admittedly, so I might
>>> be missing something.
>>>
>>> thanks,
>>> Lee
>>>
>>> On 5/4/2011 6:04 AM, Antoine Zimmermann wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> With respect to ISSUE-12, I propose that we reformulate the resolution
>>>> as follows:
>>>>
>>>> "PROPOSED: Recommend that data publishers use plain literals instead of
>>>> xs:string typed literals and tell systems to silently convert xs:string
>>>> literals to plain literals without language tag."
>>>>
>>>> In the text of the spec, we may want to add some more details, saying:
>>>>
>>>> "In XSD-interpretations, any xs:string-typed literal "aaa"^^xs:string is
>>>> interpreted as the character string "aaa", that is, it is the same as
>>>> the plain literal "aaa". Thus, to ensure a canonical form of character
>>>> strings and better interoperability, we recommend that data publishers
>>>> always use plain literals instead of xs:string typed literals and tell
>>>> systems to silently convert xs:string literals to plain literals without
>>>> language tag whenever they occur in an RDF graph."
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>
>>>
>>>
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 4 May 2011 18:08:43 UTC