Re: SPARQL WG Soliciting Early Reviews of Working Drafts from Peter Ansell on 2010-07-08 (public-rdf-dawg-comments@w3.org from July 2010)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Thu, 8 Jul 2010 11:22:10 +1000
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group Comments <public-rdf-dawg-comments@w3.org>
Message-ID: <AANLkTilxKlQ5eNGWI_6iWRjxCdJa_afXFCe0e0kLrEw9@mail.gmail.com>
On 2 July 2010 14:56, Eric Prud'hommeaux <eric@w3.org> wrote:
> * Peter Ansell <ansell.peter@gmail.com> [2010-07-02 12:05+1000]
>> On 2 July 2010 11:30, Lee Feigenbaum <lee@thefigtrees.net> wrote:
>> > On 7/1/2010 6:44 PM, Peter Ansell wrote:
>> >>
>> >> The June 1 SPARQL Federation draft [1] doesn't make it clear how
>> >> GRAPH and FROM/FROM NAMED etc. map to, or are omitted from, Federated
>> >> queries. It does say "with a query Q and no default-graph-uri or
>> >> named-graph-uri" in section 4.1, but it doesn't make it clear in the
>> >> examples. Is the idea that you can't use GRAPH/FROM/FROM NAMED when
>> >> you are using SERVICE?
>> >>
>> >> Personally, I would find it much more useful if Federation wasn't
>> >> restricted to the default graph, as any number of endpoints may not
>> >> have any data at all in the default graph which would make them immune
>> >> to federated queries. I wouldn't like to see Federation introduced at
>> >> the expense of graphs.
>> >
>> > Hi Peter,
>> >
>> > The SERVICE keyword is a way to effectively embed an invocation of the
>> > SPARQL protocol within a query. The text in 4.1 specifies that the remote
>> > service should be invoked without any default-graph-uri or named-graph-uri
>> > parameters. The effect of this is that the remote endpoint will use its
>> > default RDF dataset -- this default dataset consists of a default graph
>> > (potentially empty) and zero or more named graphs.
>> >
>> > You can indeed use a GRAPH clause within SERVICE, and the graph pattern
>> > within the GRAPH clause will be matched against the remote endpoint's named
>> > graphs.
>> >
>> > Does this explain the situation? If so, does it address your concerns?
>>
>> That does explain the note about default-graph-uri etc..
>>
>> Is it also allowable to put the GRAPH outside the SERVICE pattern? The
>> current syntax seems to put them on the same level as both are part of
>> [49] GraphPatternNotTriples in the syntax, so the following may be
>> legal?
>>
>> GRAPH <http://example.com/mygraph>
>> {
>>   {
>>     SERVICE <http://example.com/sparql>
>>     {
>>       ?s ?p ?o .
>>     }
>>   }
>>   UNION
>>   {
>>     SERVICE <http://example.com/sparql2>
>>     {
>>       ?s ?p2 ?o2 .
>>     }
>>   }
>> }
>>
>> Adding an example to show how GRAPH and FROM relate to the new
>> SERVICE pattern would be useful. It may be useful to change the syntax
>> to make sure that SERVICE will always be a top level pattern, or never
>> be inside a GRAPH pattern, if that is the intention.
>
> Two factors make this tricky:
>  1. Does the implicit query have a FROM, or just a GRAPH constraint?
>  2. Is GRAPH <G1> { GRAPH <G2> { ?s ?p ?o } } == GRAPH <G2> { ?s ?p ?o }?
>
> For 1, my temptation is to say that FROM <G1> is not implied by
>  GRAPH <G1> { SERVICE <S1> { … } }
> ; that instead the federated query should just be
>  SELECT … { GRAPH <G1> { … } } # no FROM <G1>
> , and let query crafters add the FROM to the service
> description à la:
>  GRAPH <G1> { SERVICE <S1?named-graph-uri=G1> { … } }

I would prefer that embedded graphs, and SERVICE under a GRAPH, not be
allowed, so that users are forced to specify the graph that should be
used on the remote service from within the SERVICE pattern.
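
For example, the query from my earlier message could be written with
the graph named inside each SERVICE pattern. This is only a sketch; it
reuses the example URIs from above and assumes that the same graph name
is wanted on both endpoints:

    SELECT *
    WHERE
    {
      {
        SERVICE <http://example.com/sparql>
        {
          GRAPH <http://example.com/mygraph> { ?s ?p ?o . }
        }
      }
      UNION
      {
        SERVICE <http://example.com/sparql2>
        {
          GRAPH <http://example.com/mygraph> { ?s ?p2 ?o2 . }
        }
      }
    }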

> For 2, it's perhaps acceptable to add a transformation rule for
> SERVICE queries nested inside a single GRAPH and say that in general,
> doubly-nested GRAPH constraints are not defined.
>
> GraphGraphPattern URIorVar { ... ServiceGraphPattern } =>
> Service(IRI, Transform(GraphGraphPattern(URIorVar, GroupGraphPattern)))

It would be simpler to enforce the idea that GRAPH graph patterns
should not be parents of SERVICE patterns, but this may also work if
it turns out to be a common case.
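
If I read the rule correctly (this is my reading, not text from the
draft), a query fragment such as

    GRAPH <G1> { SERVICE <S1> { ?s ?p ?o . } }

would be sent to <S1> as if it had been written

    SERVICE <S1> { GRAPH <G1> { ?s ?p ?o . } }

where <G1> and <S1> are the placeholder names from your message.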

>
>> >> In the BINDINGS syntax, is it 'UNDEF' or 'UNBOUND'? Currently both are
>> >> used but they seem to have the same meaning.
>>
>> Just out of curiosity, which keyword is currently preferred here?
>
> I've implemented UNDEF. I vaguely recall that I originally lobbied for
> UNBOUND (to match some terminology in the SPARQL specification), but
> that folks preferred UNDEF. I'm pretty ambivalent; what's your pref?

UNBOUND seems to match the BOUND function, although it has the opposite
meaning and it is a variable rather than a function. It might be better
to avoid people assuming there are common characteristics and just
stick with UNDEF from the first version.
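
To show the reading I have in mind, here is a sketch of a bindings
block using UNDEF. It assumes the draft's BINDINGS syntax of a variable
list followed by one parenthesised row per solution; the prefix and the
data values are made up for illustration:

    PREFIX ex: <http://example.com/ns#>
    SELECT ?book ?title
    WHERE
    {
      ?book ex:title ?title .
    }
    BINDINGS ?book ?title
    {
      (ex:book1 UNDEF)
      (UNDEF "SPARQL Tutorial")
    }

In the first row ?title is left unbound, and in the second row ?book
is left unbound.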

>
>> >> In the section 3 example there is the variable ?human in the bindings
>> >> section, and ?species in the main part of the query. Is ?human
>> >> supposed to provide values for ?species, as it is not used apart from
>> >> that? Seems like a typo where ?human needs to be changed to ?species.
>> >
>> > I'll leave these two to be fixed up by the editor in due course.
>
> I believe the editor's draft has this fixed; the change log
>  http://www.w3.org/2009/sparql/docs/fed/service#sec-cvsLog
> indicates this at R 1.10.
>
>> > thanks for the review,
>
> indeed.
>
>> > Lee
>>
>> Another query... How does one control the paging of results from
>> particular SERVICE calls? A use case would be if you know that results
>> from an endpoint need to be retrieved in offsets of a particular
>> number, but that restriction does not apply to other endpoints.
>>
>> The reasoning for this is that some SPARQL endpoints, notably DBpedia,
>> may legitimately reject queries if the results are going to be too
>> large, so you can't rely on all results coming down in one call, or
>> even coming down at all if you set the default too high. The following
>> example roughly demonstrates the idea:
>>
>>     SERVICE <http://dbpedia.org/sparql>
>>     {
>>       ?s ?p ?o .
>>     } ORDER BY ?s ?p ?o PAGEVALUE 500
>>     SERVICE <http://ontologies.localnetwork/sparql>
>>     {
>>       ?p ?p2 ?o2 .
>>     } ORDER BY ?p ?p2 ?o2 PAGEVALUE 10000
>>
>> If it is easier, it may be useful to extend the Service Description
>> specification to include details about preferred OFFSET values and the
>> maximum results ever returned by the endpoint even if you repeatedly
>> page through OFFSET and LIMIT.
>
> Hmm, do-able.
> [52] ServiceGraphPattern ::= 'SERVICE' VarOrIRIref GroupGraphPattern SolutionModifier
>
> I guess the OrderClause in the SolutionModifier could be practical for
> streaming query engines (working with the beginning of the result set
> returned from SERVICE while those results are still coming back from
> the remote service).
>
> I'll try to implement it in the next few days to see if anything
> surprises me.

Let me know how this goes. It would be nice to have a way to send more
than just pattern information to the SPARQL compiler if you know it.
If it turns out that it is better to let the compiler determine the
ordering variables and the paging information then it is not a big
loss.
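
In the meantime, a single page can probably be requested by hand with
a subquery inside the SERVICE pattern. This is a different technique
from the PAGEVALUE idea above: it relies on the subquery support in the
SPARQL 1.1 Query draft, the page size of 500 is just the value from my
example, and the engine does not step through the pages for you:

    SELECT ?s ?p ?o
    WHERE
    {
      SERVICE <http://dbpedia.org/sparql>
      {
        SELECT ?s ?p ?o
        WHERE { ?s ?p ?o . }
        ORDER BY ?s ?p ?o
        LIMIT 500
        OFFSET 0
      }
    }

Asking for the next page means sending the query again with OFFSET
500, then OFFSET 1000, and so on.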

Cheers,

Peter
Received on Thursday, 8 July 2010 01:22:45 UTC