A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time?

Hello,

Andy has asked if I could summarize the proposal for resolving the issues
surrounding rdf:text, so here it is.

rdf:text (as opposed to rdfs:Literal) complies with the definition of a datatype
in both the XML Schema and the RDF sense. That is, it has a clearly defined
-  lexical space,
-  value space, and
-  a lexical-to-value mapping.

It seems that we can resolve most (all?) of the issues from the LC comment by
the SPARQL WG if we change rdf:text to just a normal datatype whose value space
"coincidentally" overlaps with the value space of xs:string, and whose typed
literals are equivalent to plain RDF literals under *D-entailment*. In this way,

1) we do not affect SPARQL implementations that rely on simple entailment, and
2) we do not affect SPARQL implementations that rely on D-entailment but that do
not know of rdf:text (i.e., that do not have rdf:text in their datatype map).

Several observations are important here:

* It is true that the lexical spaces of rdf:text and xs:string (partially)
overlap, and that the two datatypes typically map the same lexical form to
different values.
Consider, for example, the following literals and their associated data values:

(1) "Hello@"^^xs:string  ==> the string "Hello@"
(2) "Hello@"^^rdf:text   ==> the string "Hello"

Thus, despite the fact that the lexical forms are the same, the literals are
mapped to different data values. Note, however, that such a situation already
exists in existing datatypes. For example, "a"^^xs:hexBinary and
"a"^^xs:base64Binary are mapped to different data values, and so is the case for
"1"^^xs:float and "1"^^xs:integer.

* It is true that different lexical forms may be assigned the same data values.
Consider, for example, the following literals and their associated data values:

(3) "Hello"^^xs:string   ==> the string "Hello"
(4) "Hello@"^^rdf:text   ==> the string "Hello"

This is important for applications that want to use SPARQL with D-entailment.
This situation, however, is not unique to rdf:text; it arises with other
datatypes as well.
For example, the following literals have distinct lexical forms, but are
assigned the same data value:

(5) "1.0"^^xs:decimal    ==> the integer 1
(6) "1"^^xs:integer      ==> the integer 1

If this causes problems for the definition of SPARQL's built-in functions, such
problems are not caused by rdf:text. Rather, they are caused by the fact
that the behavior of these functions might be unclear when used in a
D-entailment regime. Such problems should be dealt with by the SPARQL WG (or any
subset of it, such as SPARQL/OWL) that is interested in defining a proper
D-entailment regime for SPARQL.

As an rdf:text editor, however, I do not believe that rdf:text causes any
additional problems; that is, if any problems are due to rdf:text, they are
equally due to other XML Schema datatypes.
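
For readers who prefer to see examples (1) to (6) executed, here is a minimal
Python sketch of the lexical-to-value mappings involved. The function names are
made up for illustration, and the treatment of a non-empty language tag (a
(string, tag) pair as the value) is my reading of the rdf:text draft, not
something this e-mail defines.

```python
from decimal import Decimal


def string_value(lexical):
    """xs:string: the value is the lexical form itself."""
    return lexical


def text_value(lexical):
    """Hypothetical rdf:text mapping: split at the LAST '@'.

    An empty tag yields a plain string; a non-empty tag yields a
    (string, tag) pair (assumption, see the lead-in).
    """
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("an rdf:text lexical form must contain '@'")
    return text if lang == "" else (text, lang)


# (1) vs (2): same lexical form, different values.
print(string_value("Hello@"))   # Hello@
print(text_value("Hello@"))     # Hello

# (3) vs (4): different lexical forms, same value.
print(string_value("Hello") == text_value("Hello@"))   # True

# (5) vs (6): the same effect with existing XML Schema datatypes.
print(Decimal("1.0") == int("1"))   # True
```

Nothing in the sketch is specific to rdf:text: the last comparison shows the
same "different lexical forms, same value" effect for xs:decimal and xs:integer.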

* The built-in functions STR, DATATYPE, and LANG operate on the lexical forms
only, so there is no problem: they should be evaluated as specified in the
present specification. Hence, we have the following behavior, which is a
consequence of the fact that rdf:text is just another datatype.

STR("Hello@"^^xs:string)= STR("Hello@"^^rdf:text) = "Hello@"
STR("Hello@en")=
STR("Hello@en"^^rdf:text)=
STR("Hello@en"^^xs:string)= "Hello@en"

DATATYPE("Hello@en"^^xs:string)= xs:string
DATATYPE("Hello@en"^^rdf:text)= rdf:text
DATATYPE("Hello@en")= xs:string
DATATYPE("Hello"@en)= error

LANG("Hello"@en) = "en"
LANG("Hello@en") =
LANG("Hello@en"^^rdf:text) =
LANG("Hello@en"^^xs:string)= ""
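
The behavior listed above can be reproduced with a small Python model in which
the three functions look only at the syntactic parts of a literal and never at
its value. The Literal tuple and the Python function names are illustrative
assumptions, not part of any SPARQL implementation.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str] = None   # None for plain literals
    lang: Optional[str] = None       # set only for "..."@tag literals


def STR(lit):
    # The lexical form, untouched, whatever the datatype.
    return lit.lexical


def DATATYPE(lit):
    if lit.lang is not None:
        raise TypeError("DATATYPE of a language-tagged literal is an error")
    # A plain literal without a tag is reported as xs:string.
    return lit.datatype or "xs:string"


def LANG(lit):
    # Only the syntactic tag counts; an '@en' inside the lexical form does not.
    return lit.lang or ""


typed = Literal("Hello@en", datatype="rdf:text")
print(STR(typed))                        # Hello@en
print(DATATYPE(typed))                   # rdf:text
print(repr(LANG(typed)))                 # ''
print(LANG(Literal("Hello", lang="en"))) # en
```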

* In D-entailment respecting rdf:text, any triple containing a literal
"Hello"@en would also entail a triple containing "Hello@en"^^rdf:text, and vice
versa. Similarly, any triple containing a literal "Hello@en" would also
entail a triple containing "Hello@en@"^^rdf:text and a triple containing a
literal "Hello@en"^^xs:string.

This is completely analogous to the case of, say, xs:integer and xs:decimal: any
triple containing "1"^^xs:integer should entail a triple containing the literal
"1.0"^^xs:decimal, and so on. Again, rdf:text introduces no additional problems;
furthermore, if there are any actual problems, these should be resolved by
defining a proper D-entailment regime for SPARQL.


Without wishing to prejudge the definition of a D-entailment regime for SPARQL,
a possible D-entailment regime could work such that the scoping graph for a BGP
includes all forms of the relevant literals. For example, the scoping graph for
a BGP

    :s :p "Hello".

would be

    :s :p "Hello@"^^rdf:text.
    :s :p "Hello".

Please note, however, that this is again independent from rdf:text per se, and
that similar problems arise with other XML Schema datatypes, as outlined earlier
in this e-mail.
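
To make the scoping-graph idea concrete, here is a rough Python sketch of the
expansion. It is only one possible reading: the tuple encoding of literals and
the function names are invented for illustration, only plain literals are
expanded (the reverse direction, from rdf:text literals to plain ones, is left
out), and the xs:string form is omitted to match the two-triple example above.

```python
def rdf_text_form(lexical, lang=None):
    """The rdf:text spelling of a plain literal: text + '@' + tag (maybe empty)."""
    return (lexical + "@" + (lang or ""), "rdf:text", None)


def expand(graph):
    """Add the rdf:text form of every plain literal object.

    Literals are (lexical, datatype, lang) tuples; datatype None means plain.
    """
    result = set()
    for s, p, obj in graph:
        result.add((s, p, obj))
        lexical, datatype, lang = obj
        if datatype is None:
            result.add((s, p, rdf_text_form(lexical, lang)))
    return result


graph = {(":s", ":p", ("Hello", None, None))}
for triple in sorted(expand(graph), key=str):
    print(triple)
```

For the one-triple input this yields exactly two triples, one with the plain
literal "Hello" and one with "Hello@"^^rdf:text.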


As a consequence, I believe that the LC comment of the SPARQL WG should be
addressed by simply removing any mention of literal replacement during graph
exchange. This makes it clear that rdf:text is just another, regular datatype
that is in no way different from the other XML Schema or user-defined datatypes.

Regards,

	Boris

Received on Saturday, 16 May 2009 16:03:29 UTC