
Re: ISSUE-12: xs:string VS plain literals: proposed resolution

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 4 May 2011 16:01:26 -0400
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: Alex Hall <alexhall@revelytix.com>, Pat Hayes <phayes@ihmc.us>, Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>, public-rdf-wg <public-rdf-wg@w3.org>
Message-ID: <20110504200125.GC31645@w3.org>
* Lee Feigenbaum <lee@thefigtrees.net> [2011-05-04 14:43-0400]
> On 5/4/2011 2:29 PM, Eric Prud'hommeaux wrote:
> >* Alex Hall<alexhall@revelytix.com>  [2011-05-04 14:08-0400]
> >>On Wed, May 4, 2011 at 1:36 PM, Lee Feigenbaum<lee@thefigtrees.net>  wrote:
> >>
> >>>On 5/4/2011 1:17 PM, Pat Hayes wrote:
> >>>
> >>>>
> >>>>On May 4, 2011, at 9:08 AM, Lee Feigenbaum wrote:
> >>>>
> >>>>>I'd like to understand if the proposed resolution of this issue is
> >>>>>("merely") a recommendation, or is a change to RDF syntactic equality. In
> >>>>>particular, will we be changing
> >>>>>http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality such that
> >>>>>"foo" and "foo"^^xsd:string are equal literals?
> >>>>>
> >>>>>Looking at this through SPARQL's eyes (as I am wont to do), one of the
> >>>>>goals of this change is so that I can write:
> >>>>>
> >>>>>SELECT ... { ?s :p "foo" }
> >>>>>
> >>>>>and have that match whether the data that was loaded into the store was
> >>>>>"foo" or "foo"^^xsd:string.
> >>>>>
> >>>>>Recommending that stores canonicalize to "foo" would be one way to
> >>>>>accomplish this, but only for new data. (And even then, is only a
> >>>>>recommendation.) If we changed (or made a SHOULD-style change) literal
> >>>>>equality, then the above query would match against :s :p "foo"^^xsd:string
> >>>>>as well as :s :p "foo", which -- for me -- is the goal of this issue.
> >>>>>
> >>>>
> >>>>Well, have SPARQL decide that the appropriate entailment is
> >>>>{xsd:string}-entailment (that is, D-entailment where D={xsd:string}), and
> >>>>that fixes the necessary matching. Seems to me that this is not RDF
> >>>>business, in fact. RDF already provides the machinery for doing this, all
> >>>>SPARQL has to do is use the existing RDF specs appropriately.
> >>>>
> >>>
> >>>Then maybe I don't understand the original motivation behind ISSUE-12 in
> >>>this working group at all.
> >>>
> >>>*shrug*
> >>>
> >>>
> >>From what I can tell based on looking at the charter, the original
> >>motivation was exactly what you stated: to make querying for string data
> >>simpler in SPARQL.
> >>
> >>Unfortunately, the only ways I can see of making that work transparently in
> >>SPARQL are:
> >>1. Follow Pat's suggestion and define SPARQL BGP matching in terms of
> >>{xsd:string}-entailment.
> >>2. Modify the abstract syntax specified in RDF Concepts so that there's only
> >>one way of expressing string data in an RDF literal, which seems to be what
> >>you're asking for.
> >
> >3. Add a little text saying that plain literals are preferred to
> >literals of type xsd:string.
> >
> >The RDB2RDF WG faced this in defining the Direct Mapping of relational
> >databases to RDF. The ISO SQL committee provides a mapping of SQL
> >types to XSD types, and naturally SQL's string types (STRING, CHAR(n),
> >VARCHAR(n)) map to xsd:string. Because we didn't want to needlessly
> >encumber users with a typed literal when a plain literal would do, we
> >overrode the mapping for strings (ints, etc. still map per ISO). A
> >little guidance text could encourage others to do the same and
> >unification will get that much easier.
> 
> This isn't a new suggestion; this is apparently what this WG is
> already doing. It's also what Alex and I are saying does not seem
> very effective, and what I'm saying is potentially not worth the
> time.

I'm not sure how you're measuring effectiveness, but if gentle
steering is insufficient, you apparently want something mandatory. I
read all entailments as optional so I guess you want to say something
like:

[[
  A literal in an RDF graph contains one or two named components.
  
  All literals have a lexical form being a Unicode [UNICODE] string,
  which SHOULD be in Normal Form C [NFC].
  
  Plain literals have a lexical form and optionally a language tag as
  defined by [RFC-3066], normalized to lowercase.
  
  Typed literals have a lexical form and a datatype URI being an RDF
  URI reference.
  
+ Note: There are no typed literals with the datatype
  <http://www.w3.org/2001/XMLSchema#string>; any strings should be
  represented as plain literals.
]] — http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-literal

I think that if we allow both forms to co-exist but be considered
equivalent, implementations are going to have a hard time not
surprising users by swapping representations.
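
To make the difference concrete, here is a minimal sketch in Python of
what "only one form exists" would mean at load time. It is not taken
from any spec or existing store: the Literal class and the
canonicalize, load and match_object helpers are hypothetical names
used only for illustration, and literal equality is modelled as plain
component-wise comparison, roughly as in the 2004 RDF Concepts text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: lexical form plus optional datatype or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


Triple = Tuple[str, str, Literal]


def canonicalize(lit: Literal) -> Literal:
    """Rewrite an xsd:string typed literal as a plain literal; leave others alone."""
    if lit.datatype == XSD_STRING:
        return Literal(lit.lexical)
    return lit


def load(triples: List[Triple], normalize: bool) -> List[Triple]:
    """Load triples, optionally canonicalizing every object literal."""
    if not normalize:
        return list(triples)
    return [(s, p, canonicalize(o)) for (s, p, o) in triples]


def match_object(store: List[Triple], p: str, o: Literal) -> List[str]:
    """Return subjects matching { ?s p o } under component-wise literal equality."""
    return [s for (s, pred, obj) in store if pred == p and obj == o]


data = [
    (":a", ":p", Literal("foo")),
    (":b", ":p", Literal("foo", datatype=XSD_STRING)),
]

# Without normalization, the plain-literal pattern misses :b.
raw_hits = match_object(load(data, normalize=False), ":p", Literal("foo"))

# With normalization at load time, both subjects match.
norm_hits = match_object(load(data, normalize=True), ":p", Literal("foo"))
```

In this sketch the store never holds the typed form after a
normalizing load, so there is nothing left to swap back and forth;
data loaded without normalization still needs the query-side
treatment discussed earlier in the thread.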


> Lee
> 
> >
> >>I'm not fundamentally opposed to either of those approaches, but they both
> >>would require significant changes to deployed code.  Given a choice, I would
> >>go with the second one because I don't think the problem is confined to
> >>SPARQL.  I personally think that making a breaking change to the abstract
> >>syntax would be worthwhile in this case because string data is so pervasive,
> >>but I wouldn't be surprised if there's backlash from the community over
> >>that.
> >>
> >>The proposed resolution for ISSUE-12 appears to me to be avoiding making any
> >>breaking changes by recommending that data producers prefer one
> >>syntactic form over another.  I share your skepticism over how well that
> >>will work in the long run.
> >>
> >>-Alex
> >>
> >>
> >>
> >>>Lee
> >>>
> >>>
> >>>
> >>>>Pat
> >>>>
> >>>>
> >>>>>(SPARQL defines matching based on subgraphs, which in turn is based on
> >>>>>RDF graph equivalence.)
> >>>>>
> >>>>>I'm not an expert on the RDF standards documents, admittedly, so I might
> >>>>>be missing something.
> >>>>>
> >>>>>thanks,
> >>>>>Lee
> >>>>>
> >>>>>On 5/4/2011 6:04 AM, Antoine Zimmermann wrote:
> >>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>
> >>>>>>With respect to ISSUE-12, I propose that we reformulate the resolution
> >>>>>>as follows:
> >>>>>>
> >>>>>>"PROPOSED: Recommend that data publishers use plain literals instead of
> >>>>>>xs:string typed literals and tell systems to silently convert xs:string
> >>>>>>literals to plain literals without language tag."
> >>>>>>
> >>>>>>In the text of the spec, we may want to add some more details, saying:
> >>>>>>
> >>>>>>"In XSD-interpretations, any xs:string-typed literal "aaa"^^xs:string is
> >>>>>>interpreted as the character string "aaa", that is, it is the same as
> >>>>>>the plain literal "aaa". Thus, to ensure a canonical form of character
> >>>>>>strings and better interoperability, we recommend that data publishers
> >>>>>>always use plain literals instead of xs:string typed literals and tell
> >>>>>>systems to silently convert xs:string literals to plain literals without
> >>>>>>language tag whenever they occur in an RDF graph."
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Regards,
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>------------------------------------------------------------
> >>>>IHMC                                     (850)434 8903 or (650)494 3973
> >>>>40 South Alcaniz St.           (850)202 4416   office
> >>>>Pensacola                            (850)202 4440   fax
> >>>>FL 32502                              (850)291 0667   mobile
> >>>>phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >

-- 
-ericP
Received on Wednesday, 4 May 2011 20:01:56 GMT
