Re: ISSUE-12: xs:string VS plain literals: proposed resolution

I am confused. There now seems to be a consensus view that plain, untyped literals are a Good Thing, to be preferred to clunky typed literals. But the last time I encountered this whole issue of plain literals in RDF, there was a very strong consensus that plainness was a problem, and everything would be better if - in fact, for some, life would be possible only if - all literals had a type. Which is why the rdf:PlainLiteral type was invented, to be the type of these anomalous entities that had no type, in order that every literal would have a type.

So, can anyone enlighten me? Are typed literals good or bad? Is plainness beautiful, or a dire problem? And are there any actual arguments either way, or is this all based on intuition and aesthetics?

Pat



On May 4, 2011, at 1:29 PM, Eric Prud'hommeaux wrote:

> * Alex Hall <alexhall@revelytix.com> [2011-05-04 14:08-0400]
>> On Wed, May 4, 2011 at 1:36 PM, Lee Feigenbaum <lee@thefigtrees.net> wrote:
>> 
>>> On 5/4/2011 1:17 PM, Pat Hayes wrote:
>>> 
>>>> 
>>>> On May 4, 2011, at 9:08 AM, Lee Feigenbaum wrote:
>>>> 
>>>>> I'd like to understand if the proposed resolution of this issue is
>>>>> ("merely") a recommendation, or is a change to RDF syntactic equality. In
>>>>> particular, will we be changing
>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality such that
>>>>> "foo" and "foo"^^xsd:string are equal literals?
>>>>> 
>>>>> Looking at this through SPARQL's eyes (as I am wont to do), one of the
>>>>> goals of this change is so that I can write:
>>>>> 
>>>>> SELECT ... { ?s :p "foo" }
>>>>> 
>>>>> and have that match whether the data that was loaded into the store was
>>>>> "foo" or "foo"^^xsd:string.
>>>>> 
>>>>> Recommending that stores canonicalize to "foo" would be one way to
>>>>> accomplish this, but only for new data. (And even then, is only a
>>>>> recommendation.) If we changed literal equality (or made a SHOULD-style
>>>>> change to it), then the above query would match against
>>>>> :s :p "foo"^^xsd:string as well as :s :p "foo", which -- for me -- is the
>>>>> goal of this issue.
>>>>> 
>>>> 
>>>> Well, have SPARQL decide that the appropriate entailment is
>>>> {xsd:string}-entailment (that is, D-entailment where D={xsd:string}), and
>>>> that fixes the necessary matching. Seems to me that this is not RDF
>>>> business, in fact. RDF already provides the machinery for doing this; all
>>>> SPARQL has to do is use the existing RDF specs appropriately.
>>>> 
>>> 
>>> Then maybe I don't understand the original motivation behind ISSUE-12 in
>>> this working group at all.
>>> 
>>> *shrug*
>>> 
>>> 
>> From what I can tell based on looking at the charter, the original
>> motivation was exactly what you stated: to make querying for string data
>> simpler in SPARQL.
>> 
>> Unfortunately, the only ways I can see of making that work transparently in
>> SPARQL are:
>> 1. Follow Pat's suggestion and define SPARQL BGP matching in terms of
>> {xsd:string}-entailment.
>> 2. Modify the abstract syntax specified in RDF Concepts so that there's only
>> one way of expressing string data in an RDF literal, which seems to be what
>> you're asking for.
> 
> 3. Add a little text saying that plain literals are preferred to
> literals of type xsd:string.
> 
> The RDB2RDF WG faced this in defining the Direct Mapping of relational
> databases to RDF. The ISO SQL committee provides a mapping of SQL
> types to XSD types, and naturally SQL's string types (STRING, CHAR(n),
> VARCHAR(n)) map to xsd:string. Because we didn't want to needlessly
> encumber users with a typed literal when a plain literal would do, we
> overrode the mapping for strings (ints, etc. still map per ISO). A
> little guidance text could encourage others to do the same and
> unification will get that much easier.
> 
> 
>> I'm not fundamentally opposed to either of those approaches, but they both
>> would require significant changes to deployed code.  Given a choice, I would
>> go with the second one because I don't think the problem is confined to
>> SPARQL.  I personally think that making a breaking change to the abstract
>> syntax would be worthwhile in this case because string data is so pervasive,
>> but I wouldn't be surprised if there's backlash from the community over
>> that.
>> 
>> The proposed resolution for ISSUE-12 appears to me to be avoiding making any
>> breaking changes by recommending that data producers prefer one
>> syntactic form over another.  I share your skepticism over how well that
>> will work in the long run.
>> 
>> -Alex
>> 
>> 
>> 
>>> Lee
>>> 
>>> 
>>> 
>>>> Pat
>>>> 
>>>> 
>>>>> (SPARQL defines matching based on subgraphs, which in turn is based on
>>>>> RDF graph equivalence.)
>>>>> 
>>>>> I'm not an expert on the RDF standards documents, admittedly, so I might
>>>>> be missing something.
>>>>> 
>>>>> thanks,
>>>>> Lee
>>>>> 
>>>>> On 5/4/2011 6:04 AM, Antoine Zimmermann wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> With respect to ISSUE-12, I propose that we reformulate the resolution
>>>>>> as follows:
>>>>>> 
>>>>>> "PROPOSED: Recommend that data publishers use plain literals instead of
>>>>>> xs:string typed literals and tell systems to silently convert xs:string
>>>>>> literals to plain literals without language tag."
>>>>>> 
>>>>>> In the text of the spec, we may want to add some more details, saying:
>>>>>> 
>>>>>> "In XSD-interpretations, any xs:string-typed literal "aaa"^^xs:string is
>>>>>> interpreted as the character string "aaa", that is, it is the same as
>>>>>> the plain literal "aaa". Thus, to ensure a canonical form of character
>>>>>> strings and better interoperability, we recommend that data publishers
>>>>>> always use plain literals instead of xs:string typed literals and tell
>>>>>> systems to silently convert xs:string literals to plain literals without
>>>>>> language tag whenever they occur in an RDF graph."
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> ------------------------------------------------------------
>>>> IHMC                                     (850)434 8903 or (650)494 3973
>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>> Pensacola                            (850)202 4440   fax
>>>> FL 32502                              (850)291 0667   mobile
>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
> 
> -- 
> -ericP
> 

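To make the options in the quoted thread concrete, here is a minimal Python sketch of the difference between RDF 2004 syntactic literal equality and matching after the "silent conversion" in the proposed resolution. The Literal class and the canonicalize and matches helpers are hypothetical names invented for this illustration; they come from no RDF library or spec, and the sketch covers only the xsd:string case.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal (illustration only)."""
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, or None for a plain literal
    lang: Optional[str] = None      # language tag, used by plain literals only


def canonicalize(lit: Literal) -> Literal:
    """Rewrite an xsd:string typed literal as a plain literal without a language tag."""
    if lit.datatype == XSD_STRING:
        return Literal(lit.lexical)
    return lit


def matches(pattern: Literal, data: Literal) -> bool:
    """Compare two literals after canonicalizing both sides."""
    return canonicalize(pattern) == canonicalize(data)


plain = Literal("foo")
typed = Literal("foo", datatype=XSD_STRING)
tagged = Literal("foo", lang="en")

print(plain == typed)          # False: syntactic equality keeps them distinct
print(matches(plain, typed))   # True: equal once xsd:string is canonicalized
print(matches(plain, tagged))  # False: a language-tagged literal stays distinct
```

Under these assumptions, the query pattern "foo" matches stored "foo"^^xsd:string data only if the comparison (or the store, at load time) applies the conversion. A recommendation to publishers alone leaves plain == typed false for data that already exists.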

Received on Wednesday, 4 May 2011 19:13:40 UTC