Re: SELECT-ing and then DESCRIBE-ing

Thanks for the response, Aidan.

My bad for failing to mention that indeed we use this approach in
combination with pagination, so yes, OFFSET and LIMIT are in the
picture, and therefore ORDER BY is necessary. A more realistic example
would be:

DESCRIBE *
{
    {
        SELECT ?resource
        {
            ?resource ?p ?o .
        }
        ORDER BY ?resource
        OFFSET 20
        LIMIT 20
    }
}
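Here is a minimal Python sketch (not tied to any triplestore) of why ORDER BY matters once OFFSET and LIMIT are involved. It models an engine that returns the same solution set in an arbitrary order on each execution. The names `run_query` and `page` are hypothetical, and the seeded shuffle is only a stand-in for whatever internal order a real engine happens to use.

```python
import random
from typing import List

# Hypothetical solution set for ?resource: the same *set* on every execution.
SOLUTIONS = [f"http://result/resource{i}" for i in range(100)]


def run_query(seed: int) -> List[str]:
    """Model an engine returning the solution set in an implementation-defined
    order (simulated here with a seeded shuffle)."""
    rows = SOLUTIONS[:]
    random.Random(seed).shuffle(rows)
    return rows


def page(rows: List[str], offset: int, limit: int, order_by: bool) -> List[str]:
    """Apply an optional ORDER BY, then OFFSET and LIMIT."""
    if order_by:
        rows = sorted(rows)
    return rows[offset:offset + limit]


# Two separate executions: the SELECT and the DESCRIBE-wrapped SELECT.
first, second = run_query(seed=1), run_query(seed=2)

# Without ORDER BY, the two executions can pick different resources for the page.
unordered_same = set(page(first, 20, 20, False)) == set(page(second, 20, 20, False))

# With ORDER BY, both executions pick exactly the same page of resources.
ordered_same = page(first, 20, 20, True) == page(second, 20, 20, True)

# Without OFFSET/LIMIT, order is irrelevant: the full solution set is identical.
full_same = set(first) == set(second)

print(unordered_same, ordered_same, full_same)
```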

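I also realized that B) would probably not work with blank nodes, since
blank node labels in a SELECT result cannot be used to refer to the same
nodes in a subsequent query.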

DESCRIBE <http://result/resource1> <http://result/resource2> ...
<http://result/resourceN> could be problematic, as IIRC it won't work
in some triplestores if there are no bindings from a WHERE clause.
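A minimal Python sketch of the VALUES-based alternative, which keeps a WHERE clause that binds ?resource instead of listing IRIs directly after DESCRIBE. The helper name `describe_with_values` is hypothetical, and it assumes terms arrive as plain IRI strings, with blank nodes passed as "_:label" strings so they can be rejected.

```python
from typing import List


def describe_with_values(iris: List[str], var: str = "resource") -> str:
    """Build a DESCRIBE query whose WHERE clause binds ?var via VALUES.

    Hypothetical helper: assumes every input is an absolute IRI string and
    that blank nodes, if any, are given as "_:label" strings.
    """
    terms = []
    for iri in iris:
        # Blank node labels are scoped to the result they came from,
        # so they cannot identify the same nodes in a new query.
        if iri.startswith("_:"):
            raise ValueError(f"cannot DESCRIBE a blank node by label: {iri}")
        # Minimal sanity check only; a real client should validate IRIs properly.
        if any(c in iri for c in '<>" {}|\\^`'):
            raise ValueError(f"not a valid IRI: {iri!r}")
        terms.append(f"<{iri}>")
    values = " ".join(terms)
    return f"DESCRIBE ?{var}\nWHERE {{\n  VALUES ?{var} {{ {values} }}\n}}"


print(describe_with_values(["http://result/resource1", "http://result/resource2"]))
```

The string building is deliberately naive; the point is only that the endpoint always receives a WHERE clause producing bindings.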

Re. extracting the SELECT solutions locally from a DESCRIBE/CONSTRUCT
result: we've considered this, but it would have to happen in the
browser, which means there has to be a client-side SPARQL engine.
That might now be possible with Quadstore:
https://github.com/beautifulinteractions/node-quadstore
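A rough Python sketch of what "extracting the SELECT solutions locally" amounts to, assuming the DESCRIBE/CONSTRUCT result has already been parsed into (subject, predicate, object) tuples with terms as plain strings. It evaluates only the single pattern ?resource ?p ?o with ORDER BY, OFFSET and LIMIT; it is not Quadstore's API, and the example triples are made up. Note that a DESCRIBE graph may also contain triples about resources that were not in the original SELECT (e.g. blank nodes included by the endpoint's description strategy), and a local query would match those too.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def select_resources(graph: Set[Triple], offset: int = 0,
                     limit: Optional[int] = None, distinct: bool = False) -> List[str]:
    """Evaluate SELECT ?resource { ?resource ?p ?o } ORDER BY ?resource
    with OFFSET/LIMIT over an in-memory set of triples."""
    # One solution per matching triple (bag semantics), so duplicates appear.
    rows = [s for (s, _p, _o) in graph]
    if distinct:
        rows = list(dict.fromkeys(rows))
    rows.sort()  # ORDER BY ?resource, simplified to plain string order
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Hypothetical graph as it might come back from a DESCRIBE.
g = {
    ("http://result/resource1", "http://purl.org/dc/terms/title", "One"),
    ("http://result/resource1", "http://purl.org/dc/terms/creator", "Alice"),
    ("http://result/resource2", "http://purl.org/dc/terms/title", "Two"),
}
print(select_resources(g, distinct=True))
```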

I think Claus has raised a somewhat similar SPARQL 1.2 issue, although
I'm not 100% sure: https://github.com/w3c/sparql-12/issues/128

On Sat, Oct 31, 2020 at 8:09 PM Aidan Hogan <aidhog@gmail.com> wrote:
>
> Hi Martynas,
>
> So long as your query is deterministic, there is no value to adding
> order-by (because DESCRIBE returns RDF graphs that are unordered), so
> A.2 could, strictly speaking, simply be:
>
> DESCRIBE ?resource
> {
>      ?resource ?p ?o .
> }
>
> ... unless perhaps the endpoint is using a non-standard limit that the
> query exceeds, in which case the ORDER BY might be necessary. In that
> case it would be simpler to write:
>
> DESCRIBE ?resource
> {
>      ?resource ?p ?o .
> }
> ORDER BY ?resource
>
> Essentially for DESCRIBE you typically care about the set* of solutions:
> you do not care about order or duplicates. The only reason why the set
> of solutions would change under ORDER BY is if there is also an OFFSET
> or LIMIT associated with the query, or the endpoint. If this is the
> case, then yes, adding ORDER BY would be needed (though the engine *may*
> return the same solutions under LIMIT or OFFSET per some natural order
> even if ORDER BY is not given).
>
> * There is a small caveat in that duplicates might matter if distinct
> blank nodes are returned for the graph generated by each duplicate; I
> don't think the standard defines what should happen in this case, but
> the graph would just end up being "non-lean" in that case. I think in
> practice you would prefer not to describe duplicates.
>
>
>
> For B.2), if you have only IRIs, it would be better to simply write:
>
> DESCRIBE <http://result/resource1> <http://result/resource2> ...
> <http://result/resourceN>
>
> If you have other types of terms, you could rather write:
>
> DESCRIBE ?resource
> {
>   VALUES ?resource { <http://result/resource1> <http://result/resource2>
> ... <http://result/resourceN> }
> }
>
> Since these are the results of the first query, you do not need to
> "check" the basic graph pattern again, and it would introduce
> unnecessary costs to include it.
>
>
> Whether A) or B) is better depends on the nature of the query. If you
> have a simple SELECT query that generates lots of results very quickly,
> then A) would be better. If you have a complex SELECT query that returns
> few results and takes quite long, then B) would be better. In A) you have
> the cost of running the query again, but in B) you have the potential
> issue of generating a huge query (an option might be to describe
> batches). It becomes a very similar problem to federated query
> optimization, just that in this case you are calling the same engine
> each time.
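The batching option could look roughly like this Python sketch: it splits the IRIs from the first query's results into fixed-size batches and builds one DESCRIBE per batch, so no single query grows without bound. The helper name `describe_batches` and the batch size are arbitrary assumptions; the graphs returned for each batch would then be merged on the client.

```python
from typing import Iterator, List


def describe_batches(iris: List[str], batch_size: int = 50) -> Iterator[str]:
    """Yield one DESCRIBE query per batch of at most batch_size IRIs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(iris), batch_size):
        batch = iris[start:start + batch_size]
        # Plain IRI-list form as suggested for B.2; swap in a VALUES-based
        # WHERE clause if the endpoint requires one.
        terms = " ".join(f"<{iri}>" for iri in batch)
        yield f"DESCRIBE {terms}"


iris = [f"http://result/resource{i}" for i in range(1, 6)]
for query in describe_batches(iris, batch_size=2):
    print(query)
```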
>
> I'm not aware of a better way to do this, other than perhaps using a
> single DESCRIBE or CONSTRUCT query and extracting the solutions for the
> SELECT locally from the resulting RDF graph.
>
> Best,
> Aidan
>
> On 2020-10-31 11:10, Martynas Jusevičius wrote:
> > Hi,
> >
> > How does one consistently retrieve a result table (SELECT result) with
> > some resources *and* the graph of those same resources (DESCRIBE
> > result)?
> >
> > I see two options:
> >
> > A) This is what we currently use
> >
> > 1. Executing SELECT, e.g.
> >
> > SELECT ?resource
> > {
> >      ?resource ?p ?o .
> > }
> > ORDER BY ?resource
> >
> > 2. Wrapping the SELECT into DESCRIBE and executing it
> >
> > DESCRIBE *
> > {
> >      {
> >          SELECT ?resource
> >          {
> >              ?resource ?p ?o .
> >          }
> >          ORDER BY ?resource
> >      }
> > }
> >
> > B)
> >
> > 1. Executing SELECT
> >
> > SELECT ?resource
> > {
> >      ?resource ?p ?o .
> > }
> >
> > 2. Using the resource URIs from the result to form a DESCRIBE and executing it
> >
> > DESCRIBE ?resource
> > WHERE
> >    { ?resource  ?p  ?o }
> > VALUES ?resource { <http://result/resource1> <http://result/resource2>
> > ... <http://result/resourceN> }
> >
> >
> > Here are the questions I have:
> > 1. Are these approaches equivalent?
> > 2. Is it correct that using approach A) we can expect the A.2 graph to
> > be about those resources that were in the A.1 table only if ORDER BY
> > is specified? I.e. explicit ordering is required to make it stable.
> > 3. Are there other standard approaches?
> >
> > Thanks.
> >
> > Martynas
> > atomgraph.com
> >

Received on Saturday, 31 October 2020 19:20:32 UTC