Re: SELECT-ing and then DESCRIBE-ing from Aidan Hogan on 2020-10-31 (public-sparql-dev@w3.org from October to December 2020)

From: Aidan Hogan <aidhog@gmail.com>
Date: Sat, 31 Oct 2020 16:09:03 -0300
To: Martynas Jusevičius <martynas@atomgraph.com>, public-sparql-dev@w3.org
Cc: Claus Stadler <cstadler@informatik.uni-leipzig.de>, james anderson <james@dydra.com>
Message-ID: <8d998a93-857f-0854-0ce4-050e8ac46e9b@gmail.com>

Hi Martynas,

So long as your query is deterministic, there is no value in adding 
ORDER BY (because DESCRIBE returns RDF graphs, which are unordered), so 
A.2 could, strictly speaking, simply be:

DESCRIBE ?resource
{
     ?resource ?p ?o .
}

... unless perhaps the endpoint is enforcing a non-standard result limit 
that the query exceeds, in which case the ORDER BY might be necessary. 
In that case it would be simpler to write:

DESCRIBE ?resource
{
     ?resource ?p ?o .
}
ORDER BY ?resource

Essentially for DESCRIBE you typically care about the set* of solutions: 
you do not care about order or duplicates. The only reason why the set 
of solutions would change under ORDER BY is if there is also an OFFSET 
or LIMIT associated with the query, or the endpoint. If this is the 
case, then yes, adding ORDER BY would be needed (though the engine *may* 
return the same solutions under LIMIT or OFFSET per some natural order 
even if ORDER BY is not given).

* There is a small caveat in that duplicates might matter if distinct 
blank nodes are returned for the graph generated by each duplicate; I 
don't think the standard defines what should happen here, but the graph 
would just end up being "non-lean". I think in practice you would prefer 
not to describe duplicates.

For B.2), if you have only IRIs, it would be better to simply write:

DESCRIBE <http://result/resource1> <http://result/resource2> ... 
<http://result/resourceN>

If you have other types of terms, you could instead write:

DESCRIBE ?resource
{
  VALUES ?resource { <http://result/resource1> <http://result/resource2> 
... <http://result/resourceN> }
}

Since these are the results of the first query, you do not need to 
"check" the basic graph pattern again, and it would introduce 
unnecessary costs to include it.
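
A minimal sketch of building the B.2 query on the client side, assuming 
the results of the first query are already available as a list of IRI 
strings. The function name build_describe is hypothetical and not part 
of any library. Duplicates are dropped, since there is no benefit in 
describing the same resource twice:

```python
def build_describe(iris):
    """Build 'DESCRIBE <iri1> <iri2> ...' from an iterable of IRI strings."""
    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(iris))
    if not unique:
        raise ValueError("DESCRIBE needs at least one resource")

    # Characters the SPARQL grammar does not allow inside <...>.
    forbidden = set('<>"{}|^`\\')
    for iri in unique:
        if any(ch in forbidden or ord(ch) <= 0x20 for ch in iri):
            raise ValueError(f"not a valid SPARQL IRI reference: {iri!r}")

    return "DESCRIBE " + " ".join(f"<{iri}>" for iri in unique)


if __name__ == "__main__":
    print(build_describe([
        "http://result/resource1",
        "http://result/resource2",
        "http://result/resource1",  # duplicate, listed only once
    ]))
```

The validation step rejects strings that cannot appear inside an IRI 
reference, so that values taken from the SELECT results cannot break 
out of the angle brackets.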

Whether A) or B) is better depends on the nature of the query. If you 
have a simple SELECT query that generates lots of results very quickly 
then A) would be better. If you have a complex SELECT query that returns 
few results and takes a long time to run then B) would be better. In A) 
you have the cost of running the query again, but in B) you have the 
potential issue of generating a huge query (an option might be to 
describe the resources in batches). It becomes a very similar problem to 
federated query optimization, except that in this case you are calling 
the same engine each time.
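
The batching option could be sketched as follows, assuming the IRIs from 
the SELECT results are held in a list and that the client sends each 
generated query separately and merges the returned graphs itself. The 
function name describe_batches and the default batch size of 100 are 
illustrative assumptions, not recommended values:

```python
def describe_batches(iris, batch_size=100):
    """Split the IRIs into DESCRIBE queries of at most batch_size resources."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(iris))

    queries = []
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        queries.append("DESCRIBE " + " ".join(f"<{iri}>" for iri in chunk))
    return queries


if __name__ == "__main__":
    resources = [f"http://result/resource{i}" for i in range(1, 6)]
    for query in describe_batches(resources, batch_size=2):
        print(query)
```

A suitable batch size depends on the endpoint (for example, on any limit 
it places on query length), so it would need to be tuned per deployment.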

I'm not aware of a better way to do this, other than perhaps using one 
DESCRIBE or CONSTRUCT query and extracting the solutions for the SELECT 
locally from the resulting RDF graph.
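
A rough sketch of that local extraction, assuming the returned graph has 
already been parsed into (subject, predicate, object) tuples of plain 
strings; the function name select_resources is hypothetical. It mimics 
SELECT DISTINCT ?resource { ?resource ?p ?o } ORDER BY ?resource, under 
the assumption that plain string sorting is an acceptable stand-in for 
the engine's ordering of IRIs:

```python
def select_resources(triples):
    """Return the distinct subjects of the graph, sorted as strings."""
    # Every triple matches ?resource ?p ?o, so each subject is a solution.
    subjects = {s for (s, _p, _o) in triples}
    return sorted(subjects)


if __name__ == "__main__":
    graph = [
        ("http://result/resource2", "http://example.org/p", "a"),
        ("http://result/resource1", "http://example.org/p", "b"),
        ("http://result/resource2", "http://example.org/q", "c"),
    ]
    for resource in select_resources(graph):
        print(resource)
```

Note that what a DESCRIBE returns is up to the engine, so the graph may 
contain subjects other than the described resources (blank nodes, for 
instance). A CONSTRUCT query gives more control over which triples come 
back, which makes the local extraction more predictable.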

Best,
Aidan

On 2020-10-31 11:10, Martynas Jusevičius wrote:
> Hi,
> 
> How does one consistently retrieve a result table (SELECT result) with
> some resources *and* the graph of those same resources (DESCRIBE
> result)?
> 
> I see two options:
> 
> A) This is what we currently use
> 
> 1. Executing SELECT, e.g.
> 
> SELECT ?resource
> {
>      ?resource ?p ?o .
> }
> ORDER BY ?resource
> 
> 2. Wrapping the SELECT into DESCRIBE and executing it
> 
> DESCRIBE *
> {
>      {
>          SELECT ?resource
>          {
>              ?resource ?p ?o .
>          }
>          ORDER BY ?resource
>      }
> }
> 
> B)
> 
> 1. Executing SELECT
> 
> SELECT ?resource
> {
>      ?resource ?p ?o .
> }
> 
> 2. Using the resource URIs from the result to form a DESCRIBE and executing it
> 
> DESCRIBE ?resource
> WHERE
>    { ?resource  ?p  ?o }
> VALUES ?resource { <http://result/resource1> <http://result/resource2>
> ... <http://result/resourceN> }
> 
> 
> Here are the questions I have:
> 1. Are these approaches equivalent?
> 2. Is it correct that using approach A) we can expect the A.2 graph to
> be about those resources that were in the A.1 table only if ORDER BY
> is specified? I.e. explicit ordering is required to make it stable.
> 3. Are there other standard approaches?
> 
> Thanks.
> 
> Martynas
> atomgraph.com
>

Received on Saturday, 31 October 2020 19:09:19 UTC