FW: Comments on latest SPARQL 1.1 Working Drafts from Rob Vesse on 2010-03-23 (public-rdf-dawg-comments@w3.org from March 2010)

From: Rob Vesse <rav08r@ecs.soton.ac.uk>
Date: Tue, 23 Mar 2010 21:26:15 -0000
To: <public-rdf-dawg-comments@w3.org>
Message-ID: <EMEW3|0c912dc1ff4820bb91367dc5cbdab01em2MLQH06rav08r|ecs.soton.ac.uk|000601caca>
Just to add to my previous email: the comments on paths are purely
personal opinion and should not really have been a formal comment to the
working group, so I'm happy for the working group not to respond to that
part of my previous email.

What I'm looking for clarification on is how the group intends DISTINCT
to apply: to distinct terms or to distinct values? The two are quite
different, in my mind at least, and would require different implementation
approaches.

Thanks,

Rob Vesse

-----Original Message-----
From: public-rdf-dawg-comments-request@w3.org
[mailto:public-rdf-dawg-comments-request@w3.org] On Behalf Of Rob Vesse
Sent: 23 March 2010 10:27
To: 'Gregory Williams'
Cc: public-rdf-dawg-comments@w3.org
Subject: RE: Comments on latest SPARQL 1.1 Working Drafts

Hi Gregory

That answers most of my comments, but I just want to pick up on a couple of
points you raised:

> The group has decided to allow DISTINCT as a flag to all aggregates, as
> per SQL.

I'm happy with this, but I'm wondering how exactly it applies to numeric
aggregates. Say I'm doing a SUM over a variable that has the following
values bound to it:

"1"^^xsd:integer
"1"^^xsd:decimal
"1"^^xsd:double

Should the result be 3, since each value is distinct in terms of term
equality, or should the result be 1, since the values are not distinct in
terms of value equality? The latter strikes me as potentially more
computationally expensive.
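To make the two readings concrete, here is a minimal Python sketch. It assumes literals are modelled as hypothetical (lexical form, datatype IRI) pairs and that only xsd:integer, xsd:decimal and xsd:double need handling. It does not reflect how the draft or any particular engine defines DISTINCT; it only contrasts term equality with value equality on the three values above.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

def numeric_value(term):
    """Map a (lexical form, datatype IRI) pair to a Python number.

    Simplifying assumption: only xsd:integer, xsd:decimal and xsd:double
    are supported, and SPARQL's full numeric type promotion is ignored.
    """
    lexical, datatype = term
    if datatype in (XSD + "integer", XSD + "decimal"):
        return Decimal(lexical)
    if datatype == XSD + "double":
        return float(lexical)
    raise ValueError(f"unsupported datatype: {datatype}")

def sum_distinct_terms(terms):
    # Term equality: duplicates only if lexical form AND datatype both match.
    unique_terms = set(terms)
    return sum(float(numeric_value(t)) for t in unique_terms)

def sum_distinct_values(terms):
    # Value equality: duplicates if the numeric values compare equal.
    # Python hashes equal Decimal and float values identically, so a set
    # works here; an engine would need its own canonical form per value.
    unique_values = {numeric_value(t) for t in terms}
    return sum(float(v) for v in unique_values)

values = [("1", XSD + "integer"), ("1", XSD + "decimal"), ("1", XSD + "double")]
print(sum_distinct_terms(values))   # 3.0
print(sum_distinct_values(values))  # 1.0
```

The value-equality version is cheap here only because Python already reconciles numeric equality and hashing across types. In a store that works with RDF terms, it means normalising every value before comparison, which is the extra cost suggested above.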

Also, I assume that SAMPLE(DISTINCT ?x) is functionally equivalent to
SAMPLE(?x), since the DISTINCT modifier doesn't make obvious sense for
SAMPLE.

> Providing lengths is not currently planned. This does weaken the
> usefulness of property paths but, as a time permitting feature, the WG is
> inclined to leave analysis and specification of including lengths to a
> later group when more deployed experience is available. The WG believes it
> has not designed out the possibility; for example, potential syntax forms
> have been considered to make sure the syntax is not a barrier to a future
> WG.

I previously objected to path lengths as I thought they'd be a pain to
implement, but having now implemented paths in my engine I realised I got
path lengths for free, so I've added a syntax extension like so:

SELECT * WHERE { ?x foaf:knows+ LENGTH ?distance ?y}

This evaluates the path ?x foaf:knows+ ?y and binds the length of the path
to the variable ?distance. While it's not the prettiest syntax, it seemed
the easiest way to shoehorn the feature in.
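One way to see why lengths can come "for free" is that a breadth-first evaluation of a one-or-more path already knows how many steps it took to reach each node. The Python sketch below assumes a hypothetical in-memory graph: a dict mapping each node to the nodes it foaf:knows. It binds the shortest length for each reachable ?y. Treating LENGTH as the shortest path is an assumption; the syntax above does not pin that down.

```python
from collections import deque

def knows_plus_with_length(graph, start):
    """Evaluate `start foaf:knows+ ?y`, returning {y: shortest path length}.

    graph: hypothetical adjacency dict mapping a node to the nodes it knows.
    """
    lengths = {}
    # Seed with one-step neighbours: knows+ requires at least one edge.
    queue = deque((n, 1) for n in graph.get(start, []))
    while queue:
        node, distance = queue.popleft()
        if node in lengths:
            # Already reached by a path at least as short; this also stops cycles.
            continue
        lengths[node] = distance
        for neighbour in graph.get(node, []):
            if neighbour not in lengths:
                queue.append((neighbour, distance + 1))
    return lengths

graph = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": ["alice"]}
print(knows_plus_with_length(graph, "alice"))
# {'bob': 1, 'carol': 1, 'alice': 2}
```

Because the queue is processed in order of distance, the first time a node is dequeued is along a shortest path. The `alice` entry appears because knows+ can return to its starting node through the cycle via `carol`.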

Thanks again,

Rob Vesse

-----Original Message-----
From: Gregory Williams [mailto:greg@evilfunhouse.com] 
Sent: 22 March 2010 22:33
To: Rob Vesse
Cc: public-rdf-dawg-comments@w3.org
Subject: Re: Comments on latest SPARQL 1.1 Working Drafts

On Feb 9, 2010, at 9:30 AM, Rob Vesse wrote:

> Hi All
> 
> Here are my comments on the latest drafts, specifically the Query and the
> Service Description drafts.

Rob,

Thanks for the comments.


==Aggregates==

There is currently no plan to reduce the set of aggregates, so SAMPLE is
highly likely to be included in subsequent drafts. It is necessary because
SPARQL, unlike SQL, has no implicit sampling behaviour.

GROUP_CONCAT is named that way so that there can also be a scalar CONCAT
function. GROUP_CONCAT should probably not take a conventional second
argument, as that causes some confusion with the semantics of aggregates.
A syntax such as GROUP_CONCAT(?x SEPARATOR "\t") is possible, and the
intention is to add this before the next WD.
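As an illustration of the proposed separator form, here is a minimal Python sketch of GROUP_CONCAT(?o SEPARATOR sep) over solutions grouped by ?s. Rows are modelled as plain dicts, and the function name is hypothetical. The single-space default matches what the final SPARQL 1.1 Recommendation eventually chose, but it was not settled at the time of this thread. SPARQL does not guarantee the order of values within a group; this sketch simply follows input order.

```python
from collections import defaultdict

def group_concat(rows, group_var, value_var, separator=" "):
    """Join the string forms of value_var within each group_var group.

    Assumes every row binds both variables (hypothetical simplification).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(str(row[value_var]))
    return {key: separator.join(vals) for key, vals in groups.items()}

rows = [{"s": "a", "o": "x"}, {"s": "a", "o": "y"}, {"s": "b", "o": "z"}]
print(group_concat(rows, "s", "o", separator="\t"))
```

Keeping the separator as a keyword rather than a second positional argument mirrors the reasoning above: the aggregate still ranges over one expression, and the separator is a fixed scalar setting.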

==Property Paths==

> I agree with some of the previous comments on the list that some of the
> features in property paths seem overly complex, e.g. alternatives. If you
> really need to do alternatives, isn't it best just to use UNIONs?

Property path features can be combined into complex paths. Allowing
alternatives in property paths makes for more compact expressions.

> Returning results of a path expression in an ordered way (with regard to
> RDF lists) seems at odds with the general evaluation model of SPARQL,
> which, as I understood it, is that results are an unordered multiset until
> you start applying solution modifiers, and only become ordered if an
> OrderBy is applied.

Order of results from property paths is not guaranteed.

> Providing lengths of paths would complicate things and I don't think it
> should be in the 1.1 spec.

Providing lengths is not currently planned. This does weaken the usefulness
of property paths but, as a time permitting feature, the WG is inclined to
leave analysis and specification of including lengths to a later group when
more deployed experience is available. The WG believes it has not designed
out the possibility; for example, potential syntax forms have been
considered to make sure the syntax is not a barrier to a future WG.

> Limiting results of path expressions to being distinct seems logical and
> would aid implementation: you can build a list of valid paths as you
> evaluate the expression, and by checking that you haven't already found a
> specific path you get cycle detection very easily. (I may be wrong here;
> I'm just thinking off the top of my head about how I might implement
> paths.)

Thank you for the observation.

== Open Issues ==

> 5: Surely there is nothing you can express in an ASK that you can't with
> an EXISTS?

Yes - EXISTS behaves like a nested ASK.

> 14: I think the aggregates defined now are sufficient, and people can
> provide extensions as per my comments on issue 15.

It is proposed that extension aggregates would be described by a URI as
functions are.

> 15: Extension aggregates should be defined by URIs, just as with extension
> functions, and the individual implementations can then generate appropriate
> structures depending on whether the URI indicates an aggregate/expression.
> For example, I've already defined a few in the function library for my
> engine [2], e.g.
>
> PREFIX lfn:<http://www.dotnetrdf.org/leviathan#>
> 
> SELECT ?s lfn:all(IsUri(?o)) AS ?AllObjectsAreUris
> WHERE
> {
>    ?s ?p ?o
> } GROUP BY ?s

This is what the group intends to do. Extension aggregates will also be able
to take the DISTINCT flag.
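A minimal Python sketch of that design follows. It assumes a hypothetical registry that maps aggregate URIs to functions over one group's list of values, with DISTINCT applied uniformly before the function runs. The lfn:all entry mirrors the example above, and the `http://example.org/agg#count` URI is made up for illustration. DISTINCT here uses Python equality, which sidesteps the term-versus-value question.

```python
LFN = "http://www.dotnetrdf.org/leviathan#"
EX = "http://example.org/agg#"  # hypothetical namespace for this sketch

# Hypothetical registry: aggregate URI -> function over one group's values.
AGGREGATES = {
    LFN + "all": lambda values: all(values),
    EX + "count": lambda values: len(values),
}

def apply_aggregate(uri, values, distinct=False):
    """Look up an extension aggregate by URI and apply it to one group."""
    try:
        func = AGGREGATES[uri]
    except KeyError:
        raise ValueError(f"unknown aggregate: {uri}") from None
    values = list(values)
    if distinct:
        # DISTINCT is handled generically, so every registered aggregate
        # can accept the flag without implementing it itself.
        values = list(dict.fromkeys(values))
    return func(values)

print(apply_aggregate(LFN + "all", [True, True, False]))        # False
print(apply_aggregate(EX + "count", [1, 1, 2], distinct=True))  # 2
```

Applying DISTINCT outside the aggregate function is one way to let extension aggregates "take the DISTINCT flag" without each one reimplementing duplicate removal.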

> 35: I think that, with the aggregates currently proposed, the only ones
> for which DISTINCT makes sense are COUNT and possibly GROUP_CONCAT, though
> I'd rather have it valid only for COUNT.

The group has decided to allow DISTINCT as a flag to all aggregates, as per
SQL.

> 36: I think this should be rejected at the parsing stage; you shouldn't
> be able to project an expression to an existing variable.

It can't be enforced by the grammar, but the current text supports this.

> 39: I don't see too much of an issue with this, though it may require
> some queries to be rewritten so that projection expressions are evaluated
> in an order where each expression is evaluated before its value is
> used.

The WG has discussed this and does plan to allow a variable to be used later
in a SELECT expression list, with clear rules on scoping.
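The following Python sketch shows one reading of issues 36 and 39 together. SELECT expressions are evaluated left to right, each one can use variables bound by earlier ones, and projecting onto an already-bound variable is rejected. Representing expressions as hypothetical (variable, function) pairs is an assumption for illustration, not the draft's formal semantics.

```python
def project(solution, select_expressions):
    """Extend one solution with (variable, expression) pairs in SELECT order."""
    row = dict(solution)  # copy, so the input solution is left untouched
    for var, expr in select_expressions:
        if var in row:
            # Issue 36: assigning to an already-bound variable is an error.
            raise ValueError(f"?{var} is already bound")
        # Issue 39: expr sees the pattern's bindings plus earlier projections.
        row[var] = expr(row)
    return row

solution = {"x": 3}
exprs = [
    ("y", lambda r: r["x"] * 2),
    ("z", lambda r: r["y"] + 1),  # uses ?y from the previous expression
]
print(project(solution, exprs))  # {'x': 3, 'y': 6, 'z': 7}
```

Left-to-right evaluation is what makes the scoping rule simple: an expression can only see variables that are already bound when it runs, so no reordering of the select list is needed.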

> 41: GROUP BY expressions should be permitted.

The current text supports this.

==Service Description==

> Looking at the Service Description draft, my main concern is that it allows
> you to specify that you support some extension functions but not to say
> anything about the arguments of those functions. For example, there's no
> way to express that an extension function takes 2 arguments, both of which
> must be xsd:string, and returns an xsd:string. This may be too complicated
> for the service description to express easily, and I guess you run into
> issues when you have functions like fn:concat() which can take a
> variable/unlimited number of arguments. Is the assumption that a user or
> their agent will be able to retrieve the description of that function from
> somewhere else?

The intention of the Service Description vocabulary is to provide a minimal
set of terms that can allow a simple description of a SPARQL endpoint, its
dataset(s), and supported features. Importantly, we're not trying to provide
a vocabulary with which to describe *all* possible aspects of an endpoint,
including specifics of the supported functions (such as argument and return
types) or dataset descriptions.

Our expectation is that with the infrastructure of service descriptions
shared between endpoints, implementers can start to use/develop vocabularies
for describing services in more detail, with consensus hopefully developing
around specific features. For example, voiD[1] is likely to be a good way to
describe datasets, while SPIN[2] might provide the sort of extension
function descriptions you are talking about.

==Grammar==

> As a general point on the query draft, the EBNF in the grammar section is
> still the 1.0 EBNF and does not contain the new rules which 1.1 introduces,
> though I guess this may be in part due to the rules not being finalised?
> Some of the new EBNF is embedded in the course of the text, but some of it
> seems to have disappeared at the moment.

The WG intends to produce a single grammar for both query and update
languages because they share many grammar rules. The EBNF in the new
feature sections is included as a helpful indication of the changes that
will be made to the final SPARQL 1.1 grammar.


We hope this message addresses your comments. If it does, could you please
help our comment tracking by replying to this message stating that you are
satisfied with this response.

thanks,
Gregory Williams, Steve Harris, Andy Seaborne
on behalf of the SPARQL working group.

[1] http://rdfs.org/ns/void-guide
[2] http://spinrdf.org/sp.html
Received on Tuesday, 23 March 2010 21:26:49 UTC