Re: CONSTRUCT operator from Pat Hayes on 2006-01-25 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 25 Jan 2006 12:36:22 -0600
To: Sergio Tessaris <tessaris@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230902bffd6f8afece@[10.100.0.23]>
>Looking at the formal definition of the CONSTRUCT operator in 
><http://www.w3.org/2001/sw/DataAccess/rq23/#construct> I think that 
>it doesn't correspond to the actual intended behaviour. In 
>particular, it doesn't maintain the co-references of bnodes from the 
>Dataset (because of the naive use of RDFmerge operator, see below 
>for details).

Good catch. I had thought of this but decided it didn't affect the 
spec, because I forgot all about CONSTRUCT :-(

>
>BTW I looked at both the Use Cases and Testcases and I wasn't able 
>to find any reference to the CONSTRUCT operator. Therefore I'm 
>proposing two queries, to show the problem with the current 
>CONSTRUCT operator definition, which can be used both as use cases 
>or testcases.
>
>Q1) Build a graph which is graph-equivalent to the Dataset (identity):
>
>     CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }
>
>Q2) Build a reification of the Dataset
>
>     CONSTRUCT { _:s rdf:subject ?s .
>                 _:s rdf:predicate ?p .
>                 _:s rdf:object ?o } WHERE { ?s ?p ?o }
>
>Assume the following Dataset:
>
>_:a foaf:name "Alice" .
>_:a foaf:mbox <mailto:alice@example.org> .
>
>The BGP { ?s ?p ?o } returns the following answer set (I'm not 
>keeping told bnodes):
>
>S1 = [ ?s/_:g1, ?p/foaf:name, ?o/"Alice" ]
>S1 = [ ?s/_:g1, ?p/foaf:mbox, ?o/<mailto:alice@example.org> ]
>
>With Q1 we have that S1(T) = { _:g1 foaf:name "Alice" } and S2(T) = 
>{ _:g1 foaf:mbox <mailto:alice@example.org> }; so, according to the 
>semantics:
>
>   S1(T) RDFmerge S2(T) = { _:g1 foaf:name "Alice" .
>                            _:g24 foaf:mbox <mailto:alice@example.org> }
>
>which is not graph-equivalent to the original Dataset.
>
>In this case unioning rather than merging would solve the problem, 
>but in this way you'd break Q2 (or the example in 10.3.1). As you 
>see here we are in a situation which is similar to the problems we 
>had with the BGP, bnodes in templates should behave as bnodes and 
>the ones in answer sets as constants.

Right, there is a scope clash. What we need is a distinct 'copy' of 
the BGP, one for each answer binding, with their pattern bnodes 
separated from each other, since the scope of the pattern bnodes is 
the local pattern, but that of answer bnodes is the whole set. We 
need to merge the former and union the latter.

>I see two possible solutions, and I'd like to hear from the group 
>which can be the better. In any case we can assume that an answer 
>set is an ordered sequence of pattern solutions S1...Sn (here 
>there's a technical detail on whether we allow for infinite answers, 
>see below).
>
>The two solutions are:
>
>1) recursive definition using something similar to the ordered merge 
>(or the latest G'/BGP' trick)
>
>	C_1 = S1(T)
>	C_k = Sk( C_k-1 RDFmerge T)
>
>and CONSTRUCT T with answer set {S1...Sn} would be C_n. This 
>solution has the drawback that it works with finite sequences only, 
>and we need to define the RDFmerge for templates (the same as the 
>one for BGP).

It could be defined using transfinite induction if it were ever 
necessary for theoretical purposes. But this seems like overkill.

>2) non-recursive definition, by defining a sequence of templates. 
>Let be T_1...T_n a sequence of templates such that each T_i is graph 
>equivalent to T and T_1...T_n don't share any bnode among them and 
>with the Dataset (G').

Good trick. I got stuck because I was trying to extend the definition 
of graph merge to a set of templates rather than to a sequence.

>Then CONSTRUCT T would be the
>
>     U_i=1..n Si(T_i)
>
>this one works with infinite answersets as well.

I like this second one much better, it seems more direct and understandable.

>I just sketched the two solutions, the details are not so difficult 
>to spell out and I can do it if the WG agrees on one of the two.
>
>Moreover, I think that we should adopt a different notation for the 
>variable substitution in graph templates, since it looks the same as 
>the one for BGP but it behaves differently. In particular wrt 
>variables not present in the domain of a pattern solution.

There once was a tweak in the definition of substitution mapping 
which I think handles this, where S(v)=v if v is outside the domain 
of S, so S maps V to (RDF_T union V). This makes all S's total on 
templates and maps templates to templates (graphs being the special 
case of templates with no variables). Would this handle the issue 
that you are referring to? (If not, I'd like to see an example, 
because in that case Im not following you.)

Pat


>
>Cheers,
>--sergio


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 25 January 2006 18:36:36 UTC