- From: Aidan Hogan <aidhog@gmail.com>
- Date: Sat, 31 Oct 2020 16:09:03 -0300
- To: Martynas Jusevičius <martynas@atomgraph.com>, public-sparql-dev@w3.org
- Cc: Claus Stadler <cstadler@informatik.uni-leipzig.de>, james anderson <james@dydra.com>
Hi Martynas,

So long as your query is deterministic, there is no value to adding order-by (because DESCRIBE returns RDF graphs that are unordered), so A.2 could, strictly speaking, simply be:

  DESCRIBE ?resource { ?resource ?p ?o . }

... unless perhaps the endpoint is using a non-standard limit that the query exceeds, in which case the ORDER BY might be necessary. In that case it would be more simple to write:

  DESCRIBE ?resource { ?resource ?p ?o . } ORDER BY ?resource

Essentially for DESCRIBE you typically care about the set* of solutions: you do not care about order or duplicates. The only reason why the set of solutions would change under ORDER BY is if there is also an OFFSET or LIMIT associated with the query, or the endpoint. If this is the case, then yes, adding ORDER BY would be needed (though the engine *may* return the same solutions under LIMIT or OFFSET per some natural order even if ORDER BY is not given).

* There is a small caveat in that duplicates might matter if distinct blank nodes are returned for the graph generated by each duplicate; I don't think the standard defines what should happen in this case, but the graph would just end up being "non-lean" in that case. I think in practice you would prefer not to describe duplicates.

For B.2), if you have only IRIs, it would be better to simply write:

  DESCRIBE <http://result/resource1> <http://result/resource2> ... <http://result/resourceN>

If you have other types of terms, you could rather write:

  DESCRIBE ?resource { VALUES ?resource { <http://result/resource1> <http://result/resource2> ... <http://result/resourceN> } }

Since these are the results of the first query, you do not need to "check" the basic graph pattern again, and it would introduce unnecessary costs to include it.

Whether A) or B) is better depends on the nature of the query. If you have a simple SELECT query that generates lots of results very quickly then A) would be better.
If you have a complex SELECT query that returns few results and takes quite long then B) would be better. In A) you have the cost of running the query again, but in B) you have the potential issue of generating a huge query (an option might be to describe batches). It becomes a very similar problem to federated query optimization, just that in this case you are calling the same engine each time.

I'm not aware of a better way to do this, other than perhaps using one DESCRIBE or CONSTRUCT query and from the resulting RDF graph extract the solutions for the SELECT locally.

Best,
Aidan

On 2020-10-31 11:10, Martynas Jusevičius wrote:
> Hi,
>
> How does one consistently retrieve a result table (SELECT result) with
> some resources *and* the graph of those same resources (DESCRIBE
> result)?
>
> I see two options:
>
> A) This is what we currently use
>
> 1. Executing SELECT, e.g.
>
> SELECT ?resource
> {
>   ?resource ?p ?o .
> }
> ORDER BY ?resource
>
> 2. Wrapping the SELECT into DESCRIBE and executing it
>
> DESCRIBE *
> {
>   {
>     SELECT ?resource
>     {
>       ?resource ?p ?o .
>     }
>     ORDER BY ?resource
>   }
> }
>
> B)
>
> 1. Executing SELECT
>
> SELECT ?resource
> {
>   ?resource ?p ?o .
> }
>
> 2. Using the resource URIs from the result to form a DESCRIBE and executing it
>
> DESCRIBE ?resource
> WHERE
> { ?resource ?p ?o }
> VALUES ?resource { <http://result/resource1> <http://result/resource1>
> ... <http://result/resourceN> }
>
>
> Here are the questions I have:
> 1. Are these approaches equivalent?
> 2. Is it correct that using approach A) we can expect the A.2 graph to
> be about those resources that were in the A.1 table only if ORDER BY
> is specified? I.e. explicit ordering is required to make it stable.
> 3. Are there other standard approaches?
>
> Thanks.
>
> Martynas
> atomgraph.com
>
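
The "describe batches" option mentioned in the reply could look roughly like the Python sketch below. It is only an illustration under assumptions not taken from the thread: the function names describe_batches and _iri_ref are hypothetical, the batch size of 100 is arbitrary, and the input is assumed to contain only IRIs (so the plain "DESCRIBE <iri> ..." form suggested for B.2 is used). Sending each query and merging the returned graphs is left to the client.

```python
from typing import Iterable, Iterator

# Characters that may not appear between < and > in a SPARQL IRIREF.
_FORBIDDEN = set('<>"{}|^`\\')


def _iri_ref(iri: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters SPARQL disallows."""
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not usable as a SPARQL IRI: {iri!r}")
    return f"<{iri}>"


def describe_batches(iris: Iterable[str], batch_size: int = 100) -> Iterator[str]:
    """Yield one DESCRIBE query per batch of distinct IRIs (hypothetical helper).

    Duplicates are dropped because DESCRIBE only cares about the set of
    resources; first-seen order is kept so batches are reproducible.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    distinct = list(dict.fromkeys(iris))  # de-duplicate, preserve order
    for start in range(0, len(distinct), batch_size):
        batch = distinct[start:start + batch_size]
        yield "DESCRIBE " + " ".join(_iri_ref(iri) for iri in batch)


if __name__ == "__main__":
    # IRIs as they might come back from the SELECT in step B.1.
    selected = [
        "http://result/resource1",
        "http://result/resource2",
        "http://result/resource1",
        "http://result/resource3",
    ]
    for query in describe_batches(selected, batch_size=2):
        print(query)
```

Smaller batches keep each request short at the cost of more round trips; a suitable size depends on the endpoint and is not something the thread specifies.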
Received on Saturday, 31 October 2020 19:09:19 UTC