SPARQL WG comments on rdf:text

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 06 May 2009 16:35:22 -0400
Message-ID: <4A01F48A.5040104@thefigtrees.net>
To: public-owl-comments@w3.org
CC: SPARQL Working Group <public-rdf-dawg@w3.org>, Eric Prud'hommeaux <eric@w3.org>
Hello OWL and RIF working groups,

The SPARQL WG has reviewed the rdf:text Last Call document on our 
mailing list[1], in a teleconference [2], and today at our face-to-face 
meeting [3].

The group resolved to send the following comments. At this time, we do 
not have proposed spec text to resolve these comments, but would be glad 
to consult on possibilities.

The comment is at 
and is reproduced here for your convenience.


SPARQL queries act on the graph, not on the serialized form. We 
therefore suggest that the editors state the interactions with SPARQL 
with respect to the following:

    1. The restriction on rdf:text appearing in RDF graphs should be 
extended such that rdf:text MUST NOT appear in SPARQL XML results. This 
extends the existing coverage of RDF graph exchange to include SPARQL 
results from SELECT, in the same way that CONSTRUCT and DESCRIBE queries 
are already covered.
    2. The use of "semantic equivalence" should be clarified, and it 
should be noted that rdf:text is a D-entailment and is accessed by 
SPARQL via a BGP entailment regime extension.
    3. The functions STR/DATATYPE/LANG act on the lexical 
representations and will be affected depending on the way an 
rdf:text-aware entailment regime manifests its results.

In addition it should be noted that rdf:text relates to the assumption 
in RDF that a literal has a datatype or a language tag but not both. 
Existing, deployed code relies on this invariant.

Overview

Some SPARQL-specific issues arise that are not addressed in the 
document. The rdf:text document only refers to "graph exchange" when 
saying that rdf:text must not appear in RDF graph serializations, but 
that does not apply to SPARQL directly.

The rdf:text document says nothing about SPARQL operations, and it is 
not clear to me whether changes to existing SPARQL queries are being 
assumed. At one time, they were.

Since SPARQL is defined over simple entailment, NOT datatype entailment, 
the notion of "semantic equivalence" (mentioned but not defined in the 
rdf:text document) does not make sense and this spec appears to require 
changes to SPARQL behaviour. This would be undesirable since it affects:

1. SPARQL Query Result XML Format

2. Interactions with simple entailment matching of BGPs, and extension 
of SPARQL via BGPs.

3. Effects on DATATYPE, LANG and STR

Note: In RDF, a literal has either a language tag or a datatype but not 
both. rdf:text changes this assumption so deployed code or SPARQL 
implementations that rely on this invariant may break.

We believe that these concerns can be remedied, if rdf:text talks about 
D-entailment specifically, instead of "semantic equivalence" (and thus 
not affecting simple entailment as well) in general.

SPARQL XML Results Format

This is not "graph exchange", so the prohibition on the use of rdf:text 
in a serialization does not apply. It could be applied, but that might 
not help systems that do want to see rdf:text literals, for example, 
SPARQL/OWL2.

The problem here, again, is that the semantic implications of rdf:text 
are not forward-compatible with existing RDF. This concern would be 
remedied by defining the semantic implications of rdf:text in terms of 
D-entailment only, as suggested above. In fact, we think that this fix 
makes the restrictions of the usage of rdf:text in RDF graphs redundant.
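
To make the concern concrete, the following Python sketch builds one 
result binding in the style of the SPARQL Query Results XML Format in 
two ways: as a language-tagged plain literal, and as an rdf:text typed 
literal. The rdf:text IRI, the helper name binding_xml and the variable 
name "title" are assumptions made for illustration, and the 
sparql-results namespace is omitted for brevity. This is not proposed 
spec text.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# Assumed IRI for the rdf:text datatype (rdf: namespace + "text").
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


def binding_xml(name, lexical, lang=None, datatype=None):
    """Build a <binding> element holding one literal (hypothetical helper)."""
    binding = ET.Element("binding", {"name": name})
    literal = ET.SubElement(binding, "literal")
    if lang is not None:
        literal.set("{%s}lang" % XML_NS, lang)
    if datatype is not None:
        literal.set("datatype", datatype)
    literal.text = lexical
    return ET.tostring(binding, encoding="unicode")


# What existing SPARQL/2008 consumers expect for a literal with a language tag.
plain = binding_xml("title", "Padre de familia", lang="es")
# What a consumer would see if rdf:text were allowed to leak into results.
typed = binding_xml("title", "Padre de familia@es", datatype=RDF_TEXT)

print(plain)
print(typed)
```

A client written against the first form finds neither an xml:lang 
attribute nor the expected lexical form in the second.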

Datatype Property

What happens if a datatype property is restricted to rdf:text? What 
does the RDF serialization look like? Does it include rdf:text?

BGP matching

The SPARQL standard defines SPARQL with respect to simple entailment and 
provides a mechanism for extension to other entailment regimes. See the 
section "12.6 Extending SPARQL Basic Graph Matching".

Since SPARQL is defined over simple entailment, NOT datatype entailment, 
the notion of "semantic equivalence" (mentioned but not defined in the 
rdf:text document) does not make sense. SPARQL is not acting on the 
serialization of an RDF graph. It acts on the value space of literals.

Simple entailment does not cover the RDF-MT entailments xsd1a and xsd1b, 
which are the rules for plain literals without language tag being the 
same value as XSD strings. So these are not required of a SPARQL 
processor using simple entailment.
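
The difference can be sketched in Python. The Literal tuple and the 
function names below are hypothetical; the sketch only contrasts 
matching on identical terms (simple entailment) with matching after 
applying the xsd1a/xsd1b equivalence between plain literals without 
language tag and XSD strings.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical model of an RDF literal term."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def matches_simple(a, b):
    """Simple entailment: literals match only as identical RDF terms."""
    return a == b


def normalize_xsd1(lit):
    """Treat a plain literal without language tag as an xsd:string."""
    if lit.lang is None and lit.datatype is None:
        return lit._replace(datatype=XSD_STRING)
    return lit


def matches_with_xsd1(a, b):
    """Match after applying the xsd1a/xsd1b equivalence to both terms."""
    return normalize_xsd1(a) == normalize_xsd1(b)


plain = Literal("abc")
typed = Literal("abc", datatype=XSD_STRING)

print(matches_simple(plain, typed))     # False
print(matches_with_xsd1(plain, typed))  # True
```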

Additional semantic equivalences implied by rdf:text should only affect 
D-entailment (where rdf:text is part of the datatype map D following 
[1]) but not simple entailment. Thus, the document should not talk about 
"semantic equivalence" in general terms but just in terms of 
D-entailment. This should fix the main problem raised and would only 
affect SPARQL engines that follow a (yet to be defined) entailment 
regime.

We suggest that it is explicitly noted that access to rdf:text aware 
entailment regimes by a SPARQL query is via the extension mechanism.

Effects on DATATYPE, LANG and STR

Note that this SPARQL WG should maintain compatibility with SPARQL as 
published in January 2008.

These functions are accessors to the components of a literal term. 
Different ways of manifesting a value from BGP matching will lead to 
different results from these functions.

For these examples, the serialized form using rdf:text is used, although 
in an RDF graph it exists as a value, and when the graph is serialised 
rdf:text does not appear. The examples relate to a variable bound to 
such a value and how the literal accessor functions (DATATYPE, LANG and 
STR) of SPARQL can be impacted.

rdf:text does define some functions on rdf:text.

DATATYPE is defined so that the type of a plain literal without language 
tag is xsd:string. There is no datatype for a literal with a language 
tag.

SPARQL has the concept of a "simple literal" for a plain literal without 
language tag.

These functions are applied as part of the algebra, not as part of BGP 
matching - the entailment extension mechanism does not modify these 
functions. There may be different entailment regimes, maybe on different 
graphs, in the same query.

DATATYPE of a literal with language tag


  DATATYPE ("Padre de familia"@es) ==> error

When a literal is bound to a variable and subsequently used in a call to 
DATATYPE, what return value is expected? Is it true that if instead it 
is presented as below, a different result is obtained?

  DATATYPE("Padre de familia@es"^^rdf:text) ==> rdf:text


SPARQL/2008 defines:

  DATATYPE ("Padre de familia") ==> xs:string

but what is:

  DATATYPE ("Padre de familia") ==> rdf:text ?? xs:string ??

because one value space is a subset of the other.

The reason for rdf:text is the uniform treatment of literals so the 
query to find all the untyped literals ("untyped" meaning as per the 
current SPARQL REC - without type - simple literal or literal with 
language tag) might be changed.
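
A minimal Python sketch of the SPARQL/2008 behaviour of DATATYPE, as 
described above, shows why the manifested form matters. The function 
signature and the rdf:text IRI are assumptions for illustration.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Assumed IRI for the rdf:text datatype.
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


def datatype(lexical, lang=None, dt=None):
    """Sketch of SPARQL/2008 DATATYPE over a literal's components."""
    if lang is not None:
        # A literal with a language tag has no datatype: type error.
        raise TypeError("DATATYPE of a literal with language tag is an error")
    # A simple literal is reported as xsd:string; a typed literal
    # reports its own datatype.
    return dt if dt is not None else XSD_STRING


print(datatype("Padre de familia"))                  # xsd:string IRI
print(datatype("Padre de familia@es", dt=RDF_TEXT))  # rdf:text IRI
```

The same value therefore yields an error or rdf:text depending only on 
how the entailment regime presents it to the algebra.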

LANG

In RDF, a literal has either a language tag or a datatype but not both. So:


  Lang("Padre de familia"@es) ==> "es"


  Lang("Padre de familia@es"^^rdf:text) ==> ""


  Lang("Padre de familia@es"^^rdf:text) ==> ??

cf. rtfn:lang-from-text("Padre de familia@es"^^rdf:text) ==> "es"
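
The two accessors can be contrasted in a short Python sketch. The 
function lang models SPARQL/2008 LANG; lang_from_text is only a rough 
analogue of rtfn:lang-from-text and assumes that the language tag is 
whatever follows the last "@" in the rdf:text lexical form. Both 
signatures are hypothetical.

```python
def lang(lexical, lang_tag=None, dt=None):
    """Sketch of SPARQL/2008 LANG: the language tag if present, else ""."""
    return lang_tag if lang_tag is not None else ""


def lang_from_text(lexical):
    """Rough analogue of rtfn:lang-from-text (assumed lexical form: text@tag)."""
    if "@" not in lexical:
        raise ValueError("not an rdf:text lexical form")
    return lexical.rpartition("@")[2]


RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"  # assumed IRI

print(repr(lang("Padre de familia", lang_tag="es")))    # 'es'
print(repr(lang("Padre de familia@es", dt=RDF_TEXT)))   # ''
print(repr(lang_from_text("Padre de familia@es")))      # 'es'
```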

STR

rdf:text is a datatype whose lexical space includes the language tag.

SPARQL/2008 defines:

  STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia@es"
  STR("Padre de familia"@es) ==> "Padre de familia"


  STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia" ??

because STR returns the lexical form.

The lexical space of literals with language tags is changed by rdf:text.
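
The two candidate behaviours can be written side by side in Python. 
Here str_2008 follows the SPARQL/2008 rule that STR returns the lexical 
form; str_text_aware is a hypothetical alternative that strips the 
language tag from an rdf:text lexical form. Neither name nor the 
rdf:text IRI comes from a specification.

```python
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"  # assumed IRI


def str_2008(lexical, lang=None, dt=None):
    """SPARQL/2008 STR: return the lexical form unchanged."""
    return lexical


def str_text_aware(lexical, lang=None, dt=None):
    """Hypothetical variant: drop the '@tag' suffix of an rdf:text literal."""
    if dt == RDF_TEXT:
        return lexical.rpartition("@")[0]
    return lexical


print(str_2008("Padre de familia@es", dt=RDF_TEXT))        # Padre de familia@es
print(str_2008("Padre de familia", lang="es"))             # Padre de familia
print(str_text_aware("Padre de familia@es", dt=RDF_TEXT))  # Padre de familia
```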

FILTERs

SPARQL FILTERs evaluate to an effective boolean value (defined in XQuery 
"2.4.3 Effective Boolean Value" and referenced by SPARQL "11.2.2 
Effective Boolean Value (EBV)").

The EBV of a string is false if the string is of length zero else true.

Do any rdf:text literals have an EBV of false?
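
The question can be illustrated, though not answered, with a small 
Python sketch. It assumes that an rdf:text lexical form always has the 
shape text@tag, so the empty string in Spanish would be written "@es"; 
the function name is hypothetical.

```python
def ebv_of_string(s):
    """EBV of a string: false if it has zero length, true otherwise."""
    return len(s) > 0


lexical = "@es"                         # assumed lexical form of empty Spanish text
text_part = lexical.rpartition("@")[0]  # the string without its language tag

print(ebv_of_string(lexical))    # True: the lexical form is never empty
print(ebv_of_string(text_part))  # False: the text itself is empty
```

Whether the EBV is computed over the lexical form or over the text 
component decides the outcome.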


Intra-spec Compatibility

IRIs vs. URIs

"This specification uses Uniform Resource Identifiers (URIs) for naming 
datatypes and their components" indicates that language tags in RDF are 
URIs, whereas SPARQL Query interpreted them as IRIs. Using URIs would 
imply that

<X> <p> 
<http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D> .

would be matched by the SPARQL graph pattern

<X> <p> <http://伝言.example/?user=أكرم&channel=R&D> .
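
How the second form maps onto the first can be sketched in Python with 
the standard library. The sketch assumes the query values are "أكرم" 
and "R&D", encodes the host name with the built-in IDNA codec, and 
percent-encodes each query value as UTF-8; the helper name 
percent_encode is hypothetical.

```python
from urllib.parse import quote


def percent_encode(component):
    """Percent-encode a component as UTF-8, escaping reserved characters too."""
    return quote(component, safe="")


# Internationalized host name converted to its ASCII (IDNA) form.
host = "伝言.example".encode("idna").decode("ascii")

# Each query value is encoded separately so that "&" inside a value
# is not confused with the "&" that separates parameters.
query = "user=" + percent_encode("أكرم") + "&channel=" + percent_encode("R&D")

uri = "http://" + host + "/?" + query
print(uri)
```

Under simple entailment the two forms are different terms unless the 
specification says how IRIs and URIs are compared.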


References

1. http://www.w3.org/TR/rdf-mt/#dtype_interp

2. http://www.w3.org/TR/rdf-sparql-query/#sparqlBGPExtend

3. http://lists.w3.org/Archives/Public/public-rdf-text/2008OctDec/0036.html

on behalf of the SPARQL WG

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2009AprJun/0107.html
[2] http://www.w3.org/2009/sparql/meeting/2009-04-28#rdf__3a_text
[3] raw IRC log: http://www.w3.org/2009/05/06-sparql-irc
