Re: ISSUE-12: xs:string VS plain literals: proposed resolution from Eric Prud'hommeaux on 2011-05-04 (public-rdf-wg@w3.org from May 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 4 May 2011 14:29:46 -0400
To: Alex Hall <alexhall@revelytix.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Pat Hayes <phayes@ihmc.us>, Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>, public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <20110504182943.GB31645@w3.org>
* Alex Hall <alexhall@revelytix.com> [2011-05-04 14:08-0400]
> On Wed, May 4, 2011 at 1:36 PM, Lee Feigenbaum <lee@thefigtrees.net> wrote:
> 
> > On 5/4/2011 1:17 PM, Pat Hayes wrote:
> >
> >>
> >> On May 4, 2011, at 9:08 AM, Lee Feigenbaum wrote:
> >>
> >>  I'd like to understand if the proposed resolution of this issue is
> >>> ("merely") a recommendation, or is a change to RDF syntactic equality. In
> >>> particular, will we be changing
> >>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality such that
> >>> "foo" and "foo"^^xsd:string are equal literals?
> >>>
> >>> Looking at this through SPARQL's eyes (as I am wont to do), one of the
> >>> goals of this change is so that I can write:
> >>>
> >>> SELECT ... { ?s :p "foo" }
> >>>
> >>> and have that match whether the data that was loaded into the store was
> >>> "foo" or "foo"^^xsd:string.
> >>>
> >>> Recommending that stores canonicalize to "foo" would be one way to
> >>> accomplish this, but only for new data. (And even then, is only a
> >>> recommendation.) If we changed (or made a SHOULD-style change) literal
> >>> equality, then the above query would match against :s :p "foo"^^xsd:string
> >>> as well as :s :p "foo", which -- for me -- is the goal of this issue.
> >>>
> >>
> >> Well, have SPARQL decide that the appropriate entailment is
> >> {xsd:string}-entailment (that is, D-entailment where D={xsd:string}), and
> >> that fixes the necessary matching. Seems to me that this is not RDF
> >> business, in fact. RDF already provides the machinery for doing this, all
> >> SPARQL has to do is use the existing RDF specs appropriately.
> >>
> >
> > Then maybe I don't understand the original motivation behind ISSUE-12 in
> > this working group at all.
> >
> > *shrug*
> >
> >
> From what I can tell based on looking at the charter, the original
> motivation was exactly what you stated: to make querying for string data
> simpler in SPARQL.
> 
> Unfortunately, the only ways I can see of making that work transparently in
> SPARQL are:
> 1. Follow Pat's suggestion and define SPARQL BGP matching in terms of
> {xsd:string}-entailment.
> 2. Modify the abstract syntax specified in RDF Concepts so that there's only
> one way of expressing string data in an RDF literal, which seems to be what
> you're asking for.

3. Add a little text saying that plain literals are preferred to
literals of type xsd:string.

The RDB2RDF WG faced this in defining the Direct Mapping of relational
databases to RDF. The ISO SQL committee provides a mapping of SQL
types to XSD types, and naturally SQL's string types (STRING, CHAR(n),
VARCHAR(n)) map to xsd:string. Because we didn't want to needlessly
encumber users with a typed literal when a plain literal would do, we
overrode the mapping for strings (ints, etc. still map per ISO). A
little guidance text could encourage others to do the same, and
unification will get that much easier.
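
To make the override concrete, here is a rough Python sketch. It is
not text from the Direct Mapping or the ISO mapping: the Literal
type, the abbreviated SQL_TO_XSD table, and the function names are
all made up for illustration. It shows strings being emitted as plain
literals while other types keep their XSD datatype, plus the "silently
convert" step from Antoine's proposal for data that already carries
xsd:string.

```python
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """Illustrative RDF literal; datatype=None and lang=None means plain."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


# Hypothetical, abbreviated type table (an assumption for this sketch,
# not the actual ISO SQL-to-XSD mapping).
SQL_TO_XSD = {
    "INTEGER": XSD + "integer",
    "STRING": XSD + "string",
    "CHAR": XSD + "string",
    "VARCHAR": XSD + "string",
}


def literal_for(sql_type: str, value: str) -> Literal:
    """Map a SQL value to a literal, overriding strings to plain literals."""
    datatype = SQL_TO_XSD[sql_type]
    if datatype == XSD + "string":
        return Literal(value)          # the override: no datatype attached
    return Literal(value, datatype)    # everything else keeps its datatype


def canonicalize(lit: Literal) -> Literal:
    """Convert an xsd:string typed literal to a plain literal, no language tag."""
    if lit.datatype == XSD + "string" and lit.lang is None:
        return Literal(lit.lexical)
    return lit


typed = Literal("foo", XSD + "string")
plain = literal_for("VARCHAR", "foo")
print(typed == plain)                  # syntactically different literals
print(canonicalize(typed) == plain)    # equal once canonicalized
```

With data produced this way, Lee's { ?s :p "foo" } pattern matches by
plain syntactic equality; canonicalize() only matters for data that
was loaded with "foo"^^xsd:string before anyone followed the guidance.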


> I'm not fundamentally opposed to either of those approaches, but they both
> would require significant changes to deployed code.  Given a choice, I would
> go with the second one because I don't think the problem is confined to
> SPARQL.  I personally think that making a breaking change to the abstract
> syntax would be worthwhile in this case because string data is so pervasive,
> but I wouldn't be surprised if there's backlash from the community over
> that.
> 
> The proposed resolution for ISSUE-12 appears to me to be avoiding making any
> breaking changes by recommending that data producers prefer one
> syntactic form over another.  I share your skepticism over how well that
> will work in the long run.
> 
> -Alex
> 
> 
> 
> > Lee
> >
> >
> >
> >> Pat
> >>
> >>
> >>> (SPARQL defines matching based on subgraphs, which in turn is based on
> >>> RDF graph equivalence.)
> >>>
> >>> I'm not an expert on the RDF standards documents, admittedly, so I might
> >>> be missing something.
> >>>
> >>> thanks,
> >>> Lee
> >>>
> >>> On 5/4/2011 6:04 AM, Antoine Zimmermann wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> With respect to ISSUE-12, I propose that we reformulate the resolution
> >>>> as follows:
> >>>>
> >>>> "PROPOSED: Recommend that data publishers use plain literals instead of
> >>>> xs:string typed literals and tell systems to silently convert xs:string
> >>>> literals to plain literals without language tag."
> >>>>
> >>>> In the text of the spec, we may want to add some more details, saying:
> >>>>
> >>>> "In XSD-interpretations, any xs:string-typed literal "aaa"^^xs:string is
> >>>> interpreted as the character string "aaa", that is, it is the same as
> >>>> the plain literal "aaa". Thus, to ensure a canonical form of character
> >>>> strings and better interoperability, we recommend that data publishers
> >>>> always use plain literals instead of xs:string typed literals and tell
> >>>> systems to silently convert xs:string literals to plain literals without
> >>>> language tag whenever they occur in an RDF graph."
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>
> >>>
> >>>
> >> ------------------------------------------------------------
> >> IHMC                                     (850)434 8903 or (650)494 3973
> >> 40 South Alcaniz St.           (850)202 4416   office
> >> Pensacola                            (850)202 4440   fax
> >> FL 32502                              (850)291 0667   mobile
> >> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >

-- 
-ericP
Received on Wednesday, 4 May 2011 18:30:19 UTC