Re: Proposal for ISSUE-12, string literals from Pat Hayes on 2011-05-14 (public-rdf-wg@w3.org from May 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 13 May 2011 21:08:17 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-Id: <63CAD862-1BB6-4918-AC77-D421965A33C8@ihmc.us>
On May 13, 2011, at 11:00 AM, Andy Seaborne wrote:

> 
> 
> On 13/05/11 16:43, Lee Feigenbaum wrote:
>> On 5/13/2011 11:00 AM, Richard Cyganiak wrote:
>>> This feels weird. Ok, "foo" is of type string, even though the type is
>>> implicit, I can understand that. But why is it no longer a string if I
>>> tag it as English? Shouldn't it still have an implicit type of string?
>>> So you have replaced one weird thing (multiple ways of representing a
>>> string) with another weird thing (a notion of string datatypes that
>>> doesn't make sense).
>>> 
>>> I think the sensible way would be:
>>> 1) every literal has *both* a datatype and a (possibly empty) language
>>> tag;
>>> 2) of the built-in datatypes, only xsd:string can have non-empty
>>> language tags;
>>> 3) plain literals and rdf:PlainLiterals don't exist;
>>> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
>>> 5) "foo"@en in concrete syntaxes is syntactic sugar for
>>> "foo"^^xsd:string@en.
>> 
>> I would love this, if it were workable. I just didn't think that that
>> sort of change to the model was feasible to warrant consideration.
>> 
>> Lee
> 
> Agreed.
> 
> It would be good to understand how this all came about.  There may be something in the reasoning last time (or the time before that) that still needs to be factored in.

There were several last times. No doubt Dan Brickley, the RDF historian, can document all this better, but my recollection is as follows.

The last RDF WG inherited plain literals and language tags from the very first ad-hoc RDF group. They seemed fairly harmless at the time. Still, there was soon some feeling that they were a structural anomaly and rather a nuisance. The WG gave some thought to removing the tags altogether, and possibly encoding the language information in a separate triple. Intense pressure from what was affectionately called 'i18n' ensured that we could not simply get rid of lang tags. The decision to prohibit literals in subject position (under strong pressure from the members responsible for RDF/XML syntax) made the 'separate triples' ideas impractical, and they were abandoned. As with many other aspects of RDF, the WG was eventually forced back into adopting the first, simple, design, in spite of the awkwardness of having tags as a syntactically separate item. (In retrospect, I think if we had foreseen the problems this would produce, we would have come up with something like the rdf:PlainLiteral trick to embed the lang tag into the string.)

Datatypes came later in the WG activity, and were a huge can of worms. The WG took longer to decide how to write a number than it takes to make a baby. Many of the issues had to do with the fact that datatyping in many systems is seen as a way to catch errors, which is not a natural way to think about RDF datatyping. Partly as a result of this extended debate, *many* different designs were contemplated for RDF datatyping. Like, maybe 20 or more. There were issues of how datatyping interacts with RDFS class reasoning. (For example, if the range of a property P is defined to be a datatype class, should this mean that this datatype is automatically applied to values of P, so that if :P rdfs:range xsd:integer . then :x :P "234" . is taken to mean :x :P "234"^^xsd:integer ?) None of this proposed machinery was finally adopted, much to the disappointment of some WG members. But after all this debate, the WG was kind of burned out about datatyping, to be honest. And it was obvious that various different datatypes would support all kinds of special entailments unique to them, many of them not at all obvious, especially when one factors OWL expressivity into the mix. (For example, xsd:boolean can have only two values, so if you know that you have a class of such values and there seem to be three items in the class, then you know two of them must be the same. And you can express all this in OWL.) So, the fact that xsd:string and plain literals were equal just seemed like a kind of minor matter, almost a footnote. Such equivalences were coming out of the woodwork at the time, and it was assumed that all such equivalences would be handled by inferences somehow. We did put in a remark to the effect that normalization might be a good idea when it can be done; but at that time, there was a general feeling that such matters were best left to implementors of inference engines. And we all wanted to go home and spend time with our families. So we left it at that.
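
To make the rdfs:range example above concrete, here is a minimal Python sketch of that range-driven datatyping idea, which the WG considered and did not adopt. The tuple representation of literals and the name apply_range_datatypes are assumptions made purely for illustration; they are not taken from any RDF spec or library.

```python
# Hypothetical sketch of a proposal that was NOT adopted in RDF.
# Assumption: a literal is a (lexical_form, datatype) tuple, and a
# datatype of None stands for a plain (untyped) literal.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def apply_range_datatypes(triples, ranges):
    """Attach the property's declared range datatype to plain-literal objects.

    triples: list of (subject, predicate, object); a literal object is a
             (lexical_form, datatype_or_None) tuple, anything else is a name.
    ranges:  dict mapping a predicate to a datatype IRI.
    """
    result = []
    for s, p, o in triples:
        is_plain_literal = isinstance(o, tuple) and o[1] is None
        if is_plain_literal and p in ranges:
            # Reinterpret the plain literal as typed by the range.
            o = (o[0], ranges[p])
        result.append((s, p, o))
    return result


# :P rdfs:range xsd:integer .   and   :x :P "234" .
ranges = {":P": XSD_INTEGER}
triples = [(":x", ":P", ("234", None))]
typed = apply_range_datatypes(triples, ranges)
```

Under these assumptions, the plain literal "234" comes out paired with xsd:integer, which is the reading the parenthetical question asks about. Literals that already carry a datatype, and properties with no declared range, pass through unchanged.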

Later, the idea that plain literals had no type was seen by many as a problem, and the rdf:PlainLiteral device was invented as a kind of after-market semantic plug-in to rectify this. Unfortunately, it was obliged to be written so as to conform to the existing specs, which required that any RDF datatype be a mapping from strings to values. And tagged literals aren't strings, so they had to be retrospectively made into strings, hence the trailing @ business.
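
The "trailing @ business" can be sketched in a few lines of Python. This is only an illustration of the encoding idea (the text, then "@", then a possibly empty language tag, all inside one string); the function names are hypothetical and the sketch does no validation of language tags.

```python
def to_plain_literal_lexical(text, lang=""):
    """Fold a (text, language tag) pair into a single lexical string.

    A literal with no language tag still gets the trailing "@".
    """
    return f"{text}@{lang}"


def from_plain_literal_lexical(lexical):
    """Split such a lexical string back into (text, language tag).

    Splits on the LAST "@", since a language tag cannot contain "@"
    but the text itself may.
    """
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("lexical form must contain '@'")
    return text, lang


tagged = to_plain_literal_lexical("foo", "en")   # text plus "@en"
untagged = to_plain_literal_lexical("foo")       # text plus bare "@"
```

Splitting on the last "@" is the design choice that lets text such as an email address survive the round trip.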

Pat

> 	Andy
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 14 May 2011 02:09:20 UTC