A review of rdf:text from a SPARQL perspective (1st draft)

(This is my first cut at a review to expose it to the WG as early as possible).

== Short version

* Impacts SPARQL because SPARQL queries the graph, not the serialized form.
* May be able to accommodate through modified BGP matching extensions but it is a change.
* Various functions (STR, DATATYPE, LANG) may change if rdf:text is exposed.
* SPARQL Query Results XML Format needs further consideration.

== Background

(For the SPARQL-WG)

Publication:
http://www.w3.org/TR/rdf-text/

Editors' version:
http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec


References to locations in a document refer to the wiki version of 14/April which I understand to be the LC text.

This is a review of the specification.  There have been discussions about intent but this is about the LC text.

rdf:text is a REC track document from RIF and OWL2.  It aims to give a uniform way of handling plain RDF literals (with and without language tags) as a value space comprising the union of

  (lexical form)
  (lexical form, langTag) 

where langTag can't be the empty string.  Done this way, the value space of xsd:string is a subset of rdf:text.  The lexical space of xsd:string is not a subset of the lexical space of rdf:text.

To do this, the design is to introduce a datatype for plain literals, rdf:text, in the RDF namespace.  (An alternative would be to map from the existing serializations direct to the value space model.)

xsd:string "foo" is "foo@"^^rdf:text
Simple literal "foo" is "foo@"^^rdf:text
Plain literal with language tag "foo"@en is "foo@en"^^rdf:text
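
As an illustration of the mapping above, here is a minimal Python sketch.  The function names are hypothetical and come from neither specification; the only rule assumed is the one shown in the three examples (lexical form, then "@", then a possibly empty language tag).

```python
from typing import Optional, Tuple


def to_rdf_text_lexical(lexical_form: str, lang_tag: Optional[str] = None) -> str:
    """Build the rdf:text lexical form: the string, '@', then the tag (empty if none)."""
    return f"{lexical_form}@{lang_tag or ''}"


def from_rdf_text_lexical(text: str) -> Tuple[str, Optional[str]]:
    """Split an rdf:text lexical form back into (lexical form, language tag or None).

    The split is on the LAST '@' so that '@' characters inside the string survive.
    """
    if "@" not in text:
        raise ValueError("an rdf:text lexical form must contain '@'")
    lexical_form, _, lang_tag = text.rpartition("@")
    return lexical_form, (lang_tag or None)


print(to_rdf_text_lexical("foo"))          # simple literal / xsd:string case
print(to_rdf_text_lexical("foo", "en"))    # plain literal with language tag
print(from_rdf_text_lexical("foo@en"))
```

Note that the simple literal and the xsd:string literal both map to the same rdf:text form, which is why the distinction between them is lost once rdf:text is exposed.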


The LC document explicitly states that rdf:text must not appear in RDF graph serializations (sec 1, end of para 2; last para of section 4).  This helps compatibility and means that existing tools do not have to be upgraded, but it does not address the case of SPARQL.

== Overview

There are some SPARQL-specific issues that are not addressed in the document.  The rdf:text document only refers to "graph exchange" when saying that rdf:text must not appear in RDF graph serializations, and that does not apply to SPARQL directly.

The rdf:text document says nothing about SPARQL operations, and it's not clear to me whether changes to existing SPARQL queries are being assumed.  At one time, they were.

Since SPARQL is defined over simple entailment, NOT datatype entailment, the notion of "semantic equivalence" (mentioned but not defined in the rdf:text document) does not make sense, and this spec appears to require changes to SPARQL behaviour.  This would be undesirable.

1/ SPARQL Query Results XML Format
2/ Interactions with simple entailment matching of BGPs, 
   and extension of SPARQL via BGPs.
3/ Effects on DATATYPE, LANG and STR

Note: In RDF, a literal has either a language tag or a datatype but not both. rdf:text changes this assumption so deployed code or SPARQL implementations that rely on this invariant may break.

== SPARQL XML Results Format

This is not "graph exchange" so the prohibition on the use of rdf:text in a serialization does not apply.  It could be applied, but that might not help systems that do want to see rdf:text literals, for example SPARQL/OWL2.

For compatibility, I suggest that the rdf:text document be changed to prohibit the use of rdf:text in the SPARQL Query Results XML Format (and the JSON form as well), because this is in line with the approach to graph exchange (which does apply to CONSTRUCT and DESCRIBE).  Any rdf:text processor is required to convert RDF-style literals into the rdf:text form.

== BGP matching

SPARQL is not acting on the serialization of an RDF graph.  It acts on the value space with respect to the entailment regime being used for BGP matching.  SPARQL is strictly defined only for simple entailment.

rdf:text talks about "semantic equivalence".  This term is not defined; it seems to imply that some of the RDF-MT entailment rules apply, especially xsd1a and xsd1b, which are the rules for plain literals without language tag being the same value as XSD strings.  These rules are not required of a SPARQL processor using simple entailment.
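
To make the simple-entailment point concrete, here is a minimal Python sketch.  It assumes that matching a literal in a BGP under simple entailment comes down to comparing RDF terms for identity; the Literal class and simple_match function are hypothetical names for illustration, datatypes are written as prefixed names rather than full IRIs, and language tag case is ignored.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """An RDF literal as a term: lexical form plus optional tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def simple_match(pattern_term: Literal, graph_term: Literal) -> bool:
    """Simple entailment: terms match only if they are the same term."""
    return pattern_term == graph_term


plain = Literal("foo")                            # "foo"
typed = Literal("foo", datatype="xsd:string")     # "foo"^^xsd:string
text = Literal("foo@", datatype="rdf:text")       # "foo@"^^rdf:text

# All three denote the same value under rdf:text, but they are three
# different terms, so none of them matches another here.
print(simple_match(plain, typed))
print(simple_match(plain, text))
print(simple_match(typed, text))
```

Treating these three as equal would need xsd1a/xsd1b-style rules or datatype entailment, which is beyond what SPARQL/2008 requires.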

What is the binding of a variable when the value is some literal of datatype rdf:text?  The examples below for functions show where differences may appear.  STR() is the most complex case and represents a tension between rdf:text and RDF over lexical forms.

What happens with extensions that might support rdf:text directly is unclear.

A particular consequence is that SPARQL and (some future) SPARQL/OWL will either give different answers, or the SPARQL answers will differ between SPARQL/2008 and any revision to incorporate rdf:text.  See the examples of built-in functions below.

Fix: through the BGP extension mechanism, note that a BGP extension may expose rdf:text.  

Caveats: 
A/ Changes the result of functions as below
B/ Potential interoperability issues due to SPARQL Query Results XML Format
C/ Federated query (potential for mixed generation of systems answering one query) makes it complicated.
D/ SPARQL and SPARQL/OWL2 may end up with different literals in the answers on the same data for the same query.

== Effects on DATATYPE, LANG and STR

Note that the SPARQL-WG should maintain compatibility with SPARQL as published in January 2008.

(For these examples I have written out rdf:text in a serialized form, although in an RDF graph it exists as a value and when the graph is serialised rdf:text does not appear.  But SPARQL is not defined to work on the serialization of a graph.)

rdf:text does define some functions on rdf:text.

DATATYPE, LANG and STR are the literal accessors in SPARQL, accessing the three parts of an RDF literal.

DATATYPE is defined so that the type of a plain literal without language tag is xsd:string.  There is no datatype for a literal with a language tag.

SPARQL has the concept of a "simple literal" for a plain literal without language tag.

http://www.w3.org/TR/rdf-sparql-query/#func-datatype
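
For reference, the SPARQL/2008 behaviour of the three accessors as described above can be sketched in Python.  This is an illustrative model only, not an implementation of any SPARQL engine: the Literal class and the sparql_* function names are hypothetical, datatypes are written as prefixed names, and a SPARQL type error is modelled as a Python TypeError.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "xsd:string"


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None       # set only for plain literals with a tag
    datatype: Optional[str] = None   # set only for typed literals


def sparql_datatype(lit: Literal) -> str:
    """DATATYPE: the datatype of a typed literal, xsd:string for a simple literal."""
    if lit.datatype is not None:
        return lit.datatype
    if lit.lang is None:
        return XSD_STRING
    raise TypeError("DATATYPE of a literal with a language tag is an error")


def sparql_lang(lit: Literal) -> str:
    """LANG: the language tag, or "" when the literal has none."""
    return lit.lang or ""


def sparql_str(lit: Literal) -> str:
    """STR: the lexical form, unchanged."""
    return lit.lexical


tagged = Literal("Padre de familia", lang="es")
as_text = Literal("Padre de familia@es", datatype="rdf:text")

print(sparql_lang(tagged), repr(sparql_lang(as_text)))
print(sparql_str(tagged), "|", sparql_str(as_text))
print(sparql_datatype(as_text))
```

Under these rules the typed rdf:text form and the plain literal give different results for all three accessors, which is the basis of the cases that follow.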


== DATATYPE of a literal with language tag

SPARQL/2008:
  DATATYPE ("Padre de familia"@es) ==> error

Is this true or assumed when rdf:text is active?
  DATATYPE("Padre de familia@es"^^rdf:text) ==> rdf:text
  DATATYPE("Padre de familia"@es) ==> rdf:text

== DATATYPE of a literal without language tag

SPARQL/2008 defines:
  DATATYPE ("Padre de familia") ==> xsd:string

I don't know what rdf:text says here.  There are two possibilities:

  DATATYPE ("Padre de familia") ==> rdf:text ?? xsd:string ??

because one value space is a subset of the other.

The reason for rdf:text is the uniform treatment of literals, so the results of a query to find all the untyped literals ("untyped" meaning as per the current SPARQL REC: without type, i.e. a simple literal or a literal with language tag) might change.

== LANG

In RDF, a literal has either a language tag or a datatype but not both.  So:

SPARQL/2008:
  Lang("Padre de familia"@es) ==> "es"
but
  Lang("Padre de familia@es"^^rdf:text) ==> ""

rdf:text:
  Lang("Padre de familia@es"^^rdf:text) ==> ??

c.f.
rtfn:lang-from-text("Padre de familia@es"^^rdf:text) ==> "es"

== STR

rdf:text is a datatype whose lexical space includes the language tag.

SPARQL/2008 defines:
  STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia@es"
  STR("Padre de familia"@es) ==> "Padre de familia"

rdf:text:
  STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia" ??

because STR returns the lexical form.

The lexical space of literals with language tags is changed by rdf:text.


----

See also:

http://lists.w3.org/Archives/Public/public-rdf-text/2008OctDec/0036.html


but changes to be SPARQL-compatible have not been made.

 Andy


Received on Wednesday, 22 April 2009 16:22:04 UTC