Re: RDF WG Resolution Regarding Various Forms of String Literals

Summary:

A design to use RDF is easy - it's the backward compatibility that is messy.

1/ If RDF-WG change the terminology, I think we're blocked from going to 
REC until they do.  We may be blocked anyway to be sure the decision on 
abstract syntax remains as it is.

2/ Some functions need some work where simple literals are used for lang 
tags and lexical forms.


On 20/06/11 05:50, Lee Feigenbaum wrote:
> [all Tos/CCs removed except for SPARQL WG and David Wood]
>
> We need to discuss over email and tomorrow on our call how to approach
> this. Here are some miscellaneous thoughts -- I haven't studied this
> closely -- I guess that maybe Andy has thought about it significantly
> more than I have.

Not that much - some analysis below.

From experience and handling Jena support questions:

A1/ The simple literal/xsd:string confusion does come up in SPARQL 
sometimes but not often.  It is explainable and users get the explanation.

Users have also got used to adding str() occasionally, for URIs etc., so 
it does not create surprises.  (In ARQ, regex on xsd:strings just works.)

A2/ I do not see it occurring in practice in RDF.  All the arguments 
seem to be an abstract concern driven by explaining RDF and by formal 
OWL/RIF work, not RDF data in practice.  Either simple literals occur or 
xsd:strings occur in data -- I can't remember seeing a mixture and 
anyone being caught out by it.

A3/ In Jena, memory models store simple literals and xsd:strings 
separately but they compare equal.  In persistent storage models, they 
don't.  This is rarely a source of confusion because mixtures rarely 
occur in practice.

> 0) As a WG, are we OK with this decision by the RDF WG? Do we have any
> feedback to send to the other WG?

If other people can corroborate, see (A2)

> 1) We ought to do whatever we can to make "foo" and "foo"^^xs:string
> parse as the same abstract thing (a typed literal with xs:string as its
> type) in SPARQL queries. I see two general approaches here, depending on
> _how_ the RDF WG implements their proposal.
>
> A) If they implement it by removing any mention of "plain literal" from
> their spec, then we need to do the same. This would be a pretty big
> change to SPARQL 1.1 Query, I think, and would also mean that SPARQL 1.1
> Query probably could not advance to Rec until the RDF WG documents do?

Agreed - we need to do the same.

It is a 2nd LC at least and SPARQL is blocked from proceeding to REC 
until it is certain that the RDF-WG will follow that decision, which in 
practice means when RDF proceeds to REC.

"plain literal" means all literals without a datatype and includes 
language tags.

http://www.w3.org/TR/rdf-concepts/#dfn-plain-literal
"""
Plain literals have a lexical form and optionally a language tag
"""

There is discussion involving removing the terminology "plain literal" 
from the abstract syntax but that is beyond what the RDF-WG has decided 
currently which only applies to what SPARQL calls simple literals.  Even 
the wiki page has better language "Abolish plain literals without 
language tag from the abstract syntax".

http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/AbolishUntaggedPlain

I have suggested using the terminology "simple literal" in RDF, and 
"RDF term" to cover IRIs, bNodes and literals, but got no response from 
the editors.

> B) If they implement the proposal by redefining "plain literal" such
> that it means the same thing as "literal typed as xs:string", then we
> may not need to make much change to SPARQL at all. In that case, SPARQL
> 1.1 Query could proceed forward as normal, and once the new RDF
> documents were Recs, SPARQL's meaning would shift along with RDF's
> meaning. (This is my preferred method forward, but I don't know what the
> RDF editors intend, nor what's possible.)

Anywhere "simple literal" is used, we need to check.  We use simple 
literals for lexical forms and for the lang tags themselves.

The literal-with-language thing needs settling.

One proposal is that literals with a language tag are of a class, not a 
datatype, rdf:LangTagString or some such name.  That leads to a mess 
around DATATYPE because, while technically it need not change, it is 
going to be confusing if we split hairs on class vs datatype for some 
set of literals.

> 2) We need to do something about the SPARQL Results XML Format.
> Specifically, we need to give guidance about how to serialize literals
> with type xs:string, since that's what all plain literals will now be.
> Perhaps the best path forward is to change this section:
>
> """
> The value of a query variable binding, which is an RDF Term, is included
> as the content of the binding as follows:
>
> RDF URI Reference U
> <binding><uri>U</uri></binding>
> RDF Literal S
> <binding><literal>S</literal></binding>
> RDF Literal S with language L
> <binding><literal xml:lang="L">S</literal></binding>
> RDF Typed Literal S with datatype URI D
> <binding><literal datatype="D">S</literal></binding>
> Blank Node label I
> <binding><bnode>I</bnode></binding>
> """
>
> in 2.3.1 (http://www.w3.org/TR/rdf-sparql-XMLres/#results)
>
> (That section is already a bit less rigorous than it could be, since it
> refers to "RDF Literal S" rather than "RDF Plain Literal S without a
> language".)
>
> My personal preference would be that the XML results format suggest that
> implementations SHOULD serialize an RDF literal with type xs:string as:
>
> <binding><literal>S</literal></binding>
>
> ...excluding datatype="...string".
>
> I'd prefer that this be a SHOULD and not a MUST.

I agree we need to do something and I agree with the suggestion.

I am strongly in favour of SHOULD, not MUST, language.  I see no reason 
why existing systems should be made non-conformant on a spec detail, and 
it also recognises that there is a transition to be made and software is 
going to have to cope.
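
To make the SHOULD concrete, here is a minimal Python sketch of a results writer that leaves datatype="...string" off xsd:string literals.  It is only an illustration under assumptions: the helper names literal_binding and to_xml are made up for this example and are not taken from any existing implementation, and the name attribute on <binding> follows the results format rather than the abbreviated snippet quoted above.

```python
import xml.etree.ElementTree as ET

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def literal_binding(name, lexical, lang=None, datatype=None):
    """Build a <binding> element for one RDF literal (hypothetical helper)."""
    binding = ET.Element("binding", {"name": name})
    literal = ET.SubElement(binding, "literal")
    literal.text = lexical
    if lang is not None:
        # Literal with a language tag: keep xml:lang, no datatype attribute.
        literal.set("xml:lang", lang)
    elif datatype is not None and datatype != XSD_STRING:
        # Any datatype other than xsd:string is written out as before.
        literal.set("datatype", datatype)
    # Simple literals and xsd:string literals both fall through to here
    # and are written as a bare <literal>S</literal>.
    return binding


def to_xml(element):
    """Serialize an element to a str (hypothetical helper)."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(literal_binding("x", "foo")))
    print(to_xml(literal_binding("x", "foo", datatype=XSD_STRING)))
    print(to_xml(literal_binding("x", "foo", lang="en")))
```

A reader written in the same spirit would accept both forms on input, which is what SHOULD rather than MUST leaves room for during the transition.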

>
> We'll need a volunteer to make whatever change we decide here and to
> help with the publication process. We'll publish this as both a FPWD and
> LCWD in our next publication cycle.
>
> Lee

Here is a run through of effects on the spec: I may well have missed 
something.

It leaves the possibility open for systems that need to cope with the 
way things are today.  Simply changing SPARQL to work uniformly on 
xsd:strings is, IMHO, unworkable.  Application writers don't write their 
own SPARQL engines - deployed applications have to be checked and 
potentially updated.  Maybe only a few changes - but significant costs 
to requalify apps.


B1/ Content checking

A pass over the document looking for "plain literal" doesn't show 
anything too bad.

B2/ Functions

The functions are the most affected - operator dispatch table needs 
revising but that seems to have a natural update.

Quite a few functions take arguments that are simple literals or 
xsd:strings - these just work. Things that return simple literals need 
more thought.

Ones to pay attention to:

B2.1/ Use of simple literals for the lexical form:
   STR, STRDT, STRLANG

STRLANG is particularly tricky here

     PlainLiteralWithLang STRLANG(lexicalForm, langTag)

It's workable to make the args xsd:strings - it's explaining why 
STRLANG(literalWithDatatype, X) removes the datatype that seems messy.
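
A rough Python sketch of the STRLANG point, assuming both arguments may be either simple literals or xsd:strings.  The Literal tuple and the function names are made up for illustration; this is not how any particular engine does it.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Toy RDF literal: lexical form plus optional lang tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def is_string_like(lit):
    """True for a simple literal or an xsd:string literal."""
    return lit.lang is None and lit.datatype in (None, XSD_STRING)


def strlang(lexical_form, lang_tag):
    """Sketch of STRLANG that accepts simple literals or xsd:strings."""
    if not (is_string_like(lexical_form) and is_string_like(lang_tag)):
        raise TypeError("STRLANG expects simple literals or xsd:strings")
    # The awkward part: an xsd:string datatype on the first argument is
    # silently dropped, because the result carries a language tag instead.
    return Literal(lexical_form.lexical, lang=lang_tag.lexical)


if __name__ == "__main__":
    print(strlang(Literal("chat", datatype=XSD_STRING), Literal("fr")))
```

The code is short; it is the comment in the middle that the spec text would have to explain.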

B2.2/ DATATYPE

DATATYPE("foo"@en) = ?????

B2.3/ LANG

What kind of thing is the lang tag itself?  Currently, a simple literal.


B3/ BGP Matching

This is mainly a matter of compatibility.

The RDF abstract syntax will only have xsd:strings.  All simple literals 
will have been "upgraded" to xsd:strings.

Assuming { ?x :p "x"^^xsd:string } is legal (but the not-preferred form) 
then there needs to be words about matching data :x :p "x" where it used 
not to, and the other way round.
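
As an illustration of the matching change, a small Python sketch, assuming literals are modelled as (lexical form, lang tag, datatype) and that simple literals are "upgraded" to xsd:string before comparison.  The Literal tuple and the function names are invented for this example.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Toy RDF literal: lexical form plus optional lang tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def upgrade(lit):
    """Turn a simple literal into an xsd:string; leave others alone."""
    if lit.lang is None and lit.datatype is None:
        return lit._replace(datatype=XSD_STRING)
    return lit


def matches_today(pattern, data):
    """Term matching as it is now: "x" and "x"^^xsd:string differ."""
    return pattern == data


def matches_upgraded(pattern, data):
    """Term matching once both sides are upgraded to xsd:string."""
    return upgrade(pattern) == upgrade(data)


if __name__ == "__main__":
    pattern = Literal("x", datatype=XSD_STRING)  # ?x :p "x"^^xsd:string
    data = Literal("x")                          # :x :p "x"
    print(matches_today(pattern, data))
    print(matches_upgraded(pattern, data))
```

Literals with a language tag are untouched by the upgrade, so "x"@en still matches neither form.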



And there is SPARQL Update.  Just parsing "x" as "x"^^xsd:string is going 
to break when updating existing data with SPARQL 1.1 + xsd:string.  At 
least some words about what this change means will be needed.

DELETE DATA { :x :p "x" }

	Andy

Received on Monday, 20 June 2011 12:28:51 UTC