Re: CONSTRUCT operator from Lee Feigenbaum on 2006-01-25 (public-rdf-dawg@w3.org from January to March 2006)

From: Lee Feigenbaum <feigenbl@us.ibm.com>
Date: Wed, 25 Jan 2006 10:05:29 -0500
To: Sergio Tessaris <tessaris@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>, public-rdf-dawg-request@w3.org
Message-ID: <OFACBD3926.E40ADD15-ON85257101.0051CD62-85257101.0052E3F8@us.ibm.com>
A small part of the problem is a disparity between the formal definition 
(in the blue box) and the text. The formal definition uses RDF Merge while 
the text in 10.3 says:

"""
The result is an RDF graph formed by taking each query solution in the 
solution sequence, substituting for the variables into the graph template 
and combining the triples into a single RDF graph by set union.
"""

Regardless of that dispartiy though, there is still the problem that 
Sergio points out that using RDF Merge fails to scope bnodes from the 
dataset to the entire solution while using union loses the ability to 
scope bnodes from the graph template to individual solutions. 

I prefer Sergio's second solution since it more closely matches my 
intuition for what CONSTRUCT is doing in these cases.

Lee


public-rdf-dawg-request@w3.org wrote on 01/25/2006 06:57:24 AM:

> 
> Looking at the formal definition of the CONSTRUCT operator in <http:// 
> www.w3.org/2001/sw/DataAccess/rq23/#construct> I think that it 
> doesn't correspond to the actual intended behaviour. In particular, 
> it doesn't maintain the co-references of bnodes from the Dataset 
> (because of the naive use of RDFmerge operator, see below for details).
> 
> BTW I looked at both the Use Cases and Testcases and I wasn't able to 
> find any reference to the CONSTRUCT operator. Therefore I'm proposing 
> two queries, to show the problem with the current CONSTRUCT operator 
> definition, which can be used both as use cases or testcases.
> 
> Q1) Build a graph which is graph-equivalent to the Dataset (identity):
> 
>      CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }
> 
> Q2) Build a reification of the Dataset
> 
>      CONSTRUCT { _:s rdf:subject ?s .
>                  _:s rdf:predicate ?p .
>                  _:s rdf:object ?o } WHERE { ?s ?p ?o }
> 
> Assume the following Dataset:
> 
> _:a foaf:name "Alice" .
> _:a foaf:mbox <mailto:alice@example.org> .
> 
> The BGP { ?s ?p ?o } returns the following answer set (I'm not 
> keeping told bnodes):
> 
> S1 = [ ?s/_:g1, ?p/foaf:name, ?o/"Alice" ]
> S1 = [ ?s/_:g1, ?p/foaf:mbox, ?o/<mailto:alice@example.org> ]
> 
> With Q1 we have that S1(T) = { _:g1 foaf:name "Alice" } and S2(T) = 
> { _:g1 foaf:mbox <mailto:alice@example.org> }; so, according to the 
> semantics:
> 
>    S1(T) RDFmerge S2(T) = { _:g1 foaf:name "Alice" .
>                             _:g24 foaf:mbox <mailto:alice@example.org> }
> 
> which is not graph-equivalent to the original Dataset.
> 
> In this case unioning rather than merging would solve the problem, 
> but in this way you'd break Q2 (or the example in 10.3.1). As you see 
> here we are in a situation which is similar to the problems we had 
> with the BGP, bnodes in templates should behave as bnodes and the 
> ones in answer sets as constants.
> 
> I see two possible solutions, and I'd like to hear from the group 
> which can be the better. In any case we can assume that an answer set 
> is an ordered sequence of pattern solutions S1...Sn (here there's a 
> technical detail on whether we allow for infinite answers, see below).
> 
> The two solutions are:
> 
> 1) recursive definition using something similar to the ordered merge 
> (or the latest G'/BGP' trick)
> 
>    C_1 = S1(T)
>    C_k = Sk( C_k-1 RDFmerge T)
> 
> and CONSTRUCT T with answer set {S1...Sn} would be C_n. This solution 
> has the drawback that it works with finite sequences only, and we 
> need to define the RDFmerge for templates (the same as the one for BGP).
> 
> 2) non-recursive definition, by defining a sequence of templates. Let 
> be T_1...T_n a sequence of templates such that each T_i is graph 
> equivalent to T and T_1...T_n don't share any bnode among them and 
> with the Dataset (G'). Then CONSTRUCT T would be the
> 
>      U_i=1..n Si(T_i)
> 
> this one works with infinite answersets as well.
> 
> I just sketched the two solutions, the details are not so difficult 
> to spell out and I can do it if the WG agrees on one of the two.
> 
> Moreover, I think that we should adopt a different notation for the 
> variable substitution in graph templates, since it looks the same as 
> the one for BGP but it behaves differently. In particular wrt 
> variables not present in the domain of a pattern solution.
> 
> Cheers,
> --sergio
> 
> 
> 
>
Received on Wednesday, 25 January 2006 15:08:01 UTC