- From: Aidan Hogan <aidhog@gmail.com>
- Date: Sat, 31 Oct 2020 16:09:03 -0300
- To: Martynas Jusevičius <martynas@atomgraph.com>, public-sparql-dev@w3.org
- Cc: Claus Stadler <cstadler@informatik.uni-leipzig.de>, james anderson <james@dydra.com>
Hi Martynas,
So long as your query is deterministic, there is no value in adding
ORDER BY (because DESCRIBE returns RDF graphs, which are unordered), so
A.2 could, strictly speaking, simply be:
DESCRIBE ?resource
{
?resource ?p ?o .
}
... unless perhaps the endpoint applies a non-standard result limit that
the query exceeds, in which case the ORDER BY might be necessary. In that
case it would be simpler to write:
DESCRIBE ?resource
{
?resource ?p ?o .
}
ORDER BY ?resource
Essentially, for DESCRIBE you typically care about the set* of solutions:
you do not care about order or duplicates. The only reason the set of
solutions would change under ORDER BY is if an OFFSET or LIMIT is also
applied, either in the query or by the endpoint. If that is the case,
then yes, adding ORDER BY would be needed (though the engine *may*
return the same solutions under LIMIT or OFFSET per some natural order
even if ORDER BY is not given).
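To make the LIMIT/OFFSET point concrete, here is a small Python simulation. It is not how any particular engine works internally: it assumes solutions are plain IRI strings, that string sorting stands in for ORDER BY ?resource, and that an engine may hand back the same solutions in a different order on different runs.

```python
# Hypothetical simulation (not a real SPARQL engine): the same solutions can
# come back in any order, and ORDER BY only changes *which* solutions are kept
# once a LIMIT/OFFSET, from the query or the endpoint, cuts the sequence.

def page(solutions, limit, offset=0, order_by=False):
    """Return the set of solutions kept after optional ORDER BY, then OFFSET/LIMIT."""
    seq = sorted(solutions) if order_by else list(solutions)
    return set(seq[offset:offset + limit])

# Two runs returning the same solutions in different (engine-chosen) orders.
run1 = ["http://result/resource1", "http://result/resource2", "http://result/resource3"]
run2 = ["http://result/resource3", "http://result/resource1", "http://result/resource2"]

# With no LIMIT/OFFSET, only the set matters, and it is identical.
print(set(run1) == set(run2))  # True

# LIMIT 2 without ORDER BY: the kept subset depends on the engine's order.
print(page(run1, 2) == page(run2, 2))  # False

# LIMIT 2 with ORDER BY: the kept subset is the same on every run.
print(page(run1, 2, order_by=True) == page(run2, 2, order_by=True))  # True
```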
* There is a small caveat in that duplicates might matter if distinct
blank nodes are returned for the graph generated by each duplicate; I
don't think the standard defines what should happen here, but the
resulting graph would just end up being "non-lean". I think in practice
you would prefer not to describe duplicates.
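A toy Python illustration of that caveat, assuming a hypothetical DESCRIBE implementation that mints a fresh blank node every time it describes a resource (the http://ex/ predicate IRIs are made up): describing the same IRI twice produces a larger graph that only repeats the same information under new blank node labels, i.e. a non-lean graph.

```python
import itertools

_fresh = itertools.count()  # source of fresh blank node labels

def describe_one(iri):
    """Hypothetical DESCRIBE output for one resource. Its description goes
    through a blank node, and each call mints a new blank node label."""
    b = f"_:b{next(_fresh)}"
    return {(iri, "http://ex/address", b), (b, "http://ex/city", '"Example"')}

def describe_all(solutions):
    """Union of the per-solution descriptions, as a set of triples."""
    graph = set()
    for iri in solutions:
        graph |= describe_one(iri)
    return graph

same = ["http://result/resource1", "http://result/resource1"]
dup = describe_all(same)                   # duplicate solution described twice
dedup = describe_all(dict.fromkeys(same))  # duplicates removed first
print(len(dup), len(dedup))  # 4 2
```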
For B.2, if you have only IRIs, it would be better to simply write:
DESCRIBE <http://result/resource1> <http://result/resource2> ...
<http://result/resourceN>
If you have other types of terms, you could rather write:
DESCRIBE ?resource
{
VALUES ?resource { <http://result/resource1> <http://result/resource2>
... <http://result/resourceN> }
}
Since these are the results of the first query, you do not need to
"check" the basic graph pattern again; including it would only add
unnecessary cost.
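A minimal Python sketch of building the B.2 query from the SELECT results, assuming each term has already been serialized in SPARQL syntax (e.g. '<http://result/resource1>' or '"some literal"'); describe_query is a hypothetical helper, not part of any library. The DESCRIBE clause itself only accepts IRIs and variables, so other terms go through VALUES; note that blank nodes cannot be written in a VALUES block at all.

```python
def describe_query(terms):
    """Build the B.2 query from SELECT results (hypothetical helper).

    Assumes each term is already serialized in SPARQL syntax. If every term
    is a full IRI, list them directly after DESCRIBE; otherwise bind them
    through VALUES.
    """
    unique = list(dict.fromkeys(terms))  # drop duplicate solutions, keep order
    if not unique:
        raise ValueError("nothing to describe")
    if all(t.startswith("<") and t.endswith(">") for t in unique):
        return "DESCRIBE " + " ".join(unique)
    return "DESCRIBE ?resource\n{\n  VALUES ?resource { " + " ".join(unique) + " }\n}"


print(describe_query(["<http://result/resource1>", "<http://result/resource2>"]))
# DESCRIBE <http://result/resource1> <http://result/resource2>
```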
Whether A) or B) is better depends on the nature of the query. If you
have a simple SELECT query that generates lots of results very quickly,
then A) would be better. If you have a complex SELECT query that returns
few results but takes a long time, then B) would be better. In A) you pay
the cost of running the query again, but in B) you risk generating a huge
query (one option might be to describe the results in batches). It becomes
very similar to federated query optimization, except that in this case
you are calling the same engine each time.
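The batching option for B) could look roughly like this Python sketch. It assumes the SELECT results are plain IRI strings and that the endpoint describes each resource independently of the others in the same request, so the union of the per-batch graphs equals one big DESCRIBE; describe_batches and its default batch size of 100 are hypothetical choices to tune per endpoint.

```python
from typing import Iterator, List

def describe_batches(iris: List[str], batch_size: int = 100) -> Iterator[str]:
    """Yield one DESCRIBE query per batch of IRIs so that no single query
    grows too large (hypothetical helper; batch_size is arbitrary)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    unique = list(dict.fromkeys(iris))  # describing duplicates adds nothing
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        yield "DESCRIBE " + " ".join(f"<{iri}>" for iri in batch)


results = ["http://result/resource1", "http://result/resource2", "http://result/resource3"]
for query in describe_batches(results, batch_size=2):
    print(query)
# DESCRIBE <http://result/resource1> <http://result/resource2>
# DESCRIBE <http://result/resource3>
```

Each query would then be sent to the endpoint and the returned graphs merged on the client.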
I'm not aware of a better way to do this, other than perhaps running a
single DESCRIBE or CONSTRUCT query and extracting the solutions for the
SELECT locally from the resulting RDF graph.
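A minimal Python sketch of that last idea, assuming the graph returned by the single CONSTRUCT/DESCRIBE is already parsed into (subject, predicate, object) string tuples and that it contains every triple the SELECT pattern needs. It evaluates one triple pattern locally (terms starting with '?' are variables) and then applies DISTINCT and ORDER BY; the function names and the http://ex/ predicates are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

def match_pattern(graph: Iterable[Triple], pattern: Triple) -> List[Dict[str, str]]:
    """Evaluate one triple pattern over a set of triples. Terms starting with
    '?' are variables; any other term must match exactly."""
    solutions = []
    for triple in graph:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable (e.g. ?x ?p ?x) must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions

def select_distinct_ordered(graph: Iterable[Triple], pattern: Triple, var: str) -> List[str]:
    """SELECT DISTINCT var WHERE { pattern } ORDER BY var, computed locally."""
    return sorted({sol[var] for sol in match_pattern(graph, pattern)})


graph = {
    ("http://result/resource1", "http://ex/name", '"One"'),
    ("http://result/resource1", "http://ex/age", '"1"'),
    ("http://result/resource2", "http://ex/name", '"Two"'),
}
print(select_distinct_ordered(graph, ("?resource", "?p", "?o"), "?resource"))
# ['http://result/resource1', 'http://result/resource2']
```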
Best,
Aidan
On 2020-10-31 11:10, Martynas Jusevičius wrote:
> Hi,
>
> How does one consistently retrieve a result table (SELECT result) with
> some resources *and* the graph of those same resources (DESCRIBE
> result)?
>
> I see two options:
>
> A) This is what we currently use
>
> 1. Executing SELECT, e.g.
>
> SELECT ?resource
> {
> ?resource ?p ?o .
> }
> ORDER BY ?resource
>
> 2. Wrapping the SELECT into DESCRIBE and executing it
>
> DESCRIBE *
> {
> {
> SELECT ?resource
> {
> ?resource ?p ?o .
> }
> ORDER BY ?resource
> }
> }
>
> B)
>
> 1. Executing SELECT
>
> SELECT ?resource
> {
> ?resource ?p ?o .
> }
>
> 2. Using the resource URIs from the result to form a DESCRIBE and executing it
>
> DESCRIBE ?resource
> WHERE
> { ?resource ?p ?o }
> VALUES ?resource { <http://result/resource1> <http://result/resource2>
> ... <http://result/resourceN> }
>
>
> Here are the questions I have:
> 1. Are these approaches equivalent?
> 2. Is it correct that using approach A) we can expect the A.2 graph to
> be about those resources that were in the A.1 table only if ORDER BY
> is specified? I.e. explicit ordering is required to make it stable.
> 3. Are there other standard approaches?
>
> Thanks.
>
> Martynas
> atomgraph.com
>
Received on Saturday, 31 October 2020 19:09:19 UTC