CONSTRUCT operator from Sergio Tessaris on 2006-01-25 (public-rdf-dawg@w3.org from January to March 2006)

From: Sergio Tessaris <tessaris@inf.unibz.it>
Date: Wed, 25 Jan 2006 12:57:24 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <A5B34B35-AC27-4A71-A467-A028C9B6EB4D@inf.unibz.it>
Looking at the formal definition of the CONSTRUCT operator in <http:// 
www.w3.org/2001/sw/DataAccess/rq23/#construct> I think that it  
doesn't correspond to the actual intended behaviour. In particular,  
it doesn't maintain the co-references of bnodes from the Dataset  
(because of the naive use of RDFmerge operator, see below for details).

BTW I looked at both the Use Cases and Testcases and I wasn't able to  
find any reference to the CONSTRUCT operator. Therefore I'm proposing  
two queries, to show the problem with the current CONSTRUCT operator  
definition, which can be used both as use cases or testcases.

Q1) Build a graph which is graph-equivalent to the Dataset (identity):

     CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }

Q2) Build a reification of the Dataset

     CONSTRUCT { _:s rdf:subject ?s .
                 _:s rdf:predicate ?p .
                 _:s rdf:object ?o } WHERE { ?s ?p ?o }

Assume the following Dataset:

_:a foaf:name "Alice" .
_:a foaf:mbox <mailto:alice@example.org> .

The BGP { ?s ?p ?o } returns the following answer set (I'm not  
keeping told bnodes):

S1 = [ ?s/_:g1, ?p/foaf:name, ?o/"Alice" ]
S1 = [ ?s/_:g1, ?p/foaf:mbox, ?o/<mailto:alice@example.org> ]

With Q1 we have that S1(T) = { _:g1 foaf:name "Alice" } and S2(T) =  
{ _:g1 foaf:mbox <mailto:alice@example.org> }; so, according to the  
semantics:

   S1(T) RDFmerge S2(T) = { _:g1 foaf:name "Alice" .
                            _:g24 foaf:mbox <mailto:alice@example.org> }

which is not graph-equivalent to the original Dataset.

In this case unioning rather than merging would solve the problem,  
but in this way you'd break Q2 (or the example in 10.3.1). As you see  
here we are in a situation which is similar to the problems we had  
with the BGP, bnodes in templates should behave as bnodes and the  
ones in answer sets as constants.

I see two possible solutions, and I'd like to hear from the group  
which can be the better. In any case we can assume that an answer set  
is an ordered sequence of pattern solutions S1...Sn (here there's a  
technical detail on whether we allow for infinite answers, see below).

The two solutions are:

1) recursive definition using something similar to the ordered merge  
(or the latest G'/BGP' trick)

	C_1 = S1(T)
	C_k = Sk( C_k-1 RDFmerge T)

and CONSTRUCT T with answer set {S1...Sn} would be C_n. This solution  
has the drawback that it works with finite sequences only, and we  
need to define the RDFmerge for templates (the same as the one for BGP).

2) non-recursive definition, by defining a sequence of templates. Let  
be T_1...T_n a sequence of templates such that each T_i is graph  
equivalent to T and T_1...T_n don't share any bnode among them and  
with the Dataset (G'). Then CONSTRUCT T would be the

     U_i=1..n Si(T_i)

this one works with infinite answersets as well.

I just sketched the two solutions, the details are not so difficult  
to spell out and I can do it if the WG agrees on one of the two.

Moreover, I think that we should adopt a different notation for the  
variable substitution in graph templates, since it looks the same as  
the one for BGP but it behaves differently. In particular wrt  
variables not present in the domain of a pattern solution.

Cheers,
--sergio
Received on Wednesday, 25 January 2006 11:57:34 UTC