- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 20 Jun 2011 13:28:17 +0100
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
Summary: A design to use RDF is easy - it's the backward compatibility
that is messy.

1/ If the RDF-WG changes the terminology, I think we're blocked from
going to REC until they do. We may be blocked anyway, to be sure the
decision on abstract syntax remains as it is.

2/ Some functions need some work where simple literals are used for lang
tags and lexical forms.

On 20/06/11 05:50, Lee Feigenbaum wrote:
> [all Tos/CCs removed except for SPARQL WG and David Wood]
>
> We need to discuss over email and tomorrow on our call how to approach
> this. Here are some miscellaneous thoughts -- I haven't studied this
> closely -- I guess that maybe Andy has thought about it significantly
> more than I have.

Not that much - some analysis below.

From experience and handling Jena support questions:

A1/ The simple literal/xsd:string confusion does come up in SPARQL
sometimes but not often. It is explainable and users get the
explanation. Users have also got used to adding str() occasionally, for
URIs etc., so it does not create surprises. (In ARQ, regex on
xsd:strings just works.)

A2/ I do not see it occurring in practice in RDF. All the arguments seem
to be an abstract concern driven by explaining RDF and by formal OWL/RIF
work, not RDF data in practice. Either simple literals occur or
xsd:strings occur in data -- I can't remember seeing a mixture and
anyone being caught out by it.

A3/ In Jena, memory models store simple literals and xsd:strings
separately but they compare equal. In persistent storage models, they
do not compare equal. This is rarely a source of confusion because
mixtures rarely occur in practice.

> 0) As a WG, are we OK with this decision by the RDF WG? Do we have any
> feedback to send to the other WG?

If other people can corroborate, see (A2)

> 1) We ought to do whatever we can to make "foo" and "foo"^^xs:string
> parse as the same abstract thing (a typed literal with xs:string as its
> type) in SPARQL queries.
> I see two general approaches here, depending on
> _how_ the RDF WG implements their proposal.
>
> A) If they implement it by removing any mention of "plain literal" from
> their spec, then we need to do the same. This would be a pretty big
> change to SPARQL 1.1 Query, I think, and would also mean that SPARQL 1.1
> Query probably could not advance to Rec until the RDF WG documents do?

Agreed - we need to do the same. It is a 2nd LC at least, and SPARQL is
blocked from proceeding to REC until it is certain that the RDF-WG will
follow that decision, which in practice means when RDF proceeds to REC.

"plain literal" means all literals without a datatype and includes
literals with language tags.

http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
"""
Plain literals have a lexical form and optionally a language tag
"""

There is discussion involving removing the terminology "plain literal"
from the abstract syntax, but that is beyond what the RDF-WG has decided
currently, which only applies to what SPARQL calls simple literals. Even
the wiki page has better language: "Abolish plain literals without
language tag from the abstract syntax".

http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/AbolishUntaggedPlain

I have suggested using the terminology "simple literals" in RDF, and
"RDF term", which covers IRIs, bNodes and literals, but got no response
from the editors.

> B) If they implement the proposal by redefining "plain literal" such
> that it means the same thing as "literal typed as xs:string", then we
> may not need to make much change to SPARQL at all. In that case, SPARQL
> 1.1 Query could proceed forward as normal, and once the new RDF
> documents were Recs, SPARQL's meaning would shift along with RDF's
> meaning. (This is my preferred method forward, but I don't know what the
> RDF editors intend, nor what's possible.)

Anywhere "simple literal" is used, we need to check. We use simple
literals for lexical forms and for the lang tags themselves.
The literal-with-language thing needs settling. One proposal is that
literals with a language tag are of a class, not a datatype,
rdf:LangTagString or some such name. That leads to a mess around
DATATYPE because, while technically it need not change, it is going to
be confusing if we split hairs on class vs datatype for some set of
literals.

> 2) We need to do something about the SPARQL Results XML Format.
> Specifically, we need to give guidance about how to serialize literals
> with type xs:string, since that's what all plain literals will now be.
> Perhaps the best path forward is to change this section:
>
> """
> The value of a query variable binding, which is an RDF Term, is included
> as the content of the binding as follows:
>
> RDF URI Reference U
> <binding><uri>U</uri></binding>
> RDF Literal S
> <binding><literal>S</literal></binding>
> RDF Literal S with language L
> <binding><literal xml:lang="L">S</literal></binding>
> RDF Typed Literal S with datatype URI D
> <binding><literal datatype="D">S</literal></binding>
> Blank Node label I
> <binding><bnode>I</bnode></binding>
> """
>
> in 2.3.1 (http://www.w3.org/TR/rdf-sparql-XMLres/#results)
>
> (That section is already a bit less rigorous then it could be, since it
> refers to "RDF Literal S" rather than "RDF Plain Literal S without a
> language".)
>
> My personal preference would be that the XML results format suggest that
> implementations SHOULD serialize an RDF literal with type xs:string as:
>
> <binding><literal>S</literal></binding>
>
> ...excluding datatype="...string".
>
> I'd prefer that this be a SHOULD and not a MUST.

I agree we need to do something and I agree with the suggestion.

I am strongly in favour of SHOULD, not MUST, language. I see no reason
why existing systems should be made non-conformant on a spec detail, and
it also recognises that there is a transition to be made and software is
going to have to cope.
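To make the SHOULD behaviour above concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the function
name literal_binding and the flag omit_xsd_string are hypothetical and
come from no spec or from Jena/ARQ. With the flag on, an xs:string-typed
literal is written without the datatype attribute; with it off, a system
keeps its existing output, which SHOULD (rather than MUST) still permits.

```python
import xml.etree.ElementTree as ET

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# ElementTree spells xml:lang using the XML namespace in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def literal_binding(lexical, lang=None, datatype=None, omit_xsd_string=True):
    """Serialize one literal as a <binding> element (hypothetical helper).

    omit_xsd_string=True follows the SHOULD suggestion: an xs:string
    literal is written like a literal with no datatype attribute.
    """
    binding = ET.Element("binding")
    literal = ET.SubElement(binding, "literal")
    literal.text = lexical
    if lang is not None:
        # Literal with a language tag: never carries a datatype attribute.
        literal.set(XML_LANG, lang)
    elif datatype is not None:
        if not (omit_xsd_string and datatype == XSD_STRING):
            literal.set("datatype", datatype)
    return ET.tostring(binding, encoding="unicode")


if __name__ == "__main__":
    print(literal_binding("S", datatype=XSD_STRING))
    print(literal_binding("S", datatype=XSD_STRING, omit_xsd_string=False))
```

A consumer going through the same transition would need to accept both
forms and treat them as the same term.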
>
> We'll need a volunteer to make whatever change we decide here and to
> help with the publication process. We'll publish this as both a FPWD and
> LCWD in our next publication cycle.
>
> Lee

Here is a run through of effects on the spec: I may well have missed
something. It is leaving the possibility open for systems that need to
cope with the way things are today.

Simply changing SPARQL to uniformly work on xsd:strings is, IMHO,
unworkable. Application writers don't write their own SPARQL engines -
deployed applications have to be checked and potentially updated. Maybe
a few changes - but significant costs to requalify apps.

B1/ Content checking

A pass over the document looking for "plain literal" doesn't show
anything too bad.

B2/ Functions

The functions are the most affected - the operator dispatch table needs
revising but that seems to have a natural update.

Quite a few functions take arguments that are simple literals or
xsd:strings - these just work. Things that return simple literals need
more thought.

Ones to pay attention to:

B2.1/ Use of simple literals for the lexical form:
STR, STRDT, STRLANG

STRLANG is particularly tricky here:

  PlainLiteralWithLang STRLANG(lexicalForm, langTag)

It's workable to make the args xsd:strings - it's explaining why
STRLANG(literalWithDatatype, X) removes the datatype that seems messy.

B2.2/ DATATYPE

  DATATYPE("foo"@en) = ?????

B2.3/ LANG

What kind of thing is the lang tag itself? Currently, a simple literal.

B3/ BGP Matching

This is mainly a matter of compatibility. The RDF abstract syntax will
only have xsd:strings. All simple literals will have been "upgraded" to
xsd:strings.

Assuming
  { ?x :p "x"^^xsd:string }
is legal (but the not-preferred form) then there needs to be words about
matching data
  :x :p "x"
where it used not to, and the other way round.

And there is SPARQL Update. Just parsing "x" to "x"^^xsd:string is going
to break when updating existing data with SPARQL 1.1 + xsd:string.
At least some words about what this change means will be needed.

  DELETE DATA { ?x :p "x" }

Andy
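As an illustration of the B3 compatibility point, here is a rough sketch
in Python. It rests on assumptions: literals are modelled as plain
(lexical, lang, datatype) tuples, and the names normalize and
literal_matches are hypothetical, not taken from any engine. It shows how
"upgrading" simple literals to xsd:string on both sides makes "x" match
"x"^^xsd:string where exact term comparison did not, while leaving
language-tagged literals alone.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize(literal):
    """Upgrade a simple literal (no lang, no datatype) to xsd:string."""
    lexical, lang, datatype = literal
    if lang is None and datatype is None:
        return (lexical, None, XSD_STRING)
    return literal


def literal_matches(pattern, data, upgrade=True):
    """Compare a pattern literal with a data literal.

    upgrade=False is the old exact-term behaviour; upgrade=True is the
    behaviour once simple literals are treated as xsd:strings.
    """
    if upgrade:
        return normalize(pattern) == normalize(data)
    return pattern == data


if __name__ == "__main__":
    simple = ("x", None, None)          # "x"
    typed = ("x", None, XSD_STRING)     # "x"^^xsd:string
    print(literal_matches(typed, simple, upgrade=False))
    print(literal_matches(typed, simple, upgrade=True))
```

The same question applies to updates: a store still holding the simple
literal form would need an equivalent normalization step for a delete of
one form to remove data stored in the other.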
Received on Monday, 20 June 2011 12:28:51 UTC