Views on the outcomes of F2F from Andy Seaborne on 2009-11-10 (public-rdf-dawg@w3.org from October to December 2009)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 10 Nov 2009 17:20:04 +0000
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <4AF9A0C4.9090205@talis.com>
Here is my initial take on what appears to have been a successful 
face-to-face meeting.  A lot has been moved forward.

Lee asked for specific issues to be raised one per email thread so 
please change the subject if you reply to anything specifically.

This message is my pass over all the points.  I'll pull out specific 
issues if needed later.

I reserve the right to change my mind :-)

 Andy

 From notes
http://www.w3.org/2009/sparql/wiki/F2F2_Issue_Discussions
and resolutions
http://www.w3.org/2009/sparql/meeting/2009-11-03

Day 1:

 > ISSUE-11: Implicit grouping
 >
 > Consensus on prohibiting projecting variables/functions on
 > variables that are not included in the group by clause.

Agreed.

 > **  ISSUE-12: HAVING vs. FILTER as keyword for limiting
 > aggregate results
 >
 > General consensus in favor of using "FILTER" as the keyword,
 > with bglimm preferring "HAVING".

I prefer HAVING because of familiarity with SQL.

Having both is acceptable.

 > **  DISTINCT in aggregate functions
 >
 > Consensus on allowing DISTINCT with multiple arguments to aggregate
 > functions. DISTINCT in this case passes just the DISTINCT tuples
 > into the aggregate function (for each group).

I'm unclear why it should be allowed in SUM or AVG.  Is there a use case?

We are already handling * differently by aggregate and DISTINCT seems 
to only really mean anything there.  Are there specific motivating use cases?

Is DISTINCT allowed in custom aggregates?  If so, they have a different 
syntax.

I propose that DISTINCT is not allowed for custom aggregates.  An 
aggregate can choose to do that operation as part of its definition but 
DISTINCT and not-DISTINCT forms are two different URIs to name the aggregate.
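
To make the DISTINCT question concrete, here is a minimal sketch in Python (not SPARQL) of an aggregate that is fed either all argument tuples of a group or only the distinct ones.  The names aggregate and sum_first, and the sample rows, are hypothetical and only for illustration; this is not a proposed definition.

```python
from collections import defaultdict


def aggregate(rows, key, args, agg, distinct=False):
    """Group rows by `key`, then pass each group's argument tuples to `agg`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(tuple(row[a] for a in args))
    result = {}
    for k, tuples in groups.items():
        if distinct:
            # Keep only the first occurrence of each tuple, preserving order.
            tuples = list(dict.fromkeys(tuples))
        result[k] = agg(tuples)
    return result


def sum_first(tuples):
    """Hypothetical SUM over the first argument of each tuple."""
    return sum(t[0] for t in tuples)


# Assumed sample data: group "a" contains the value 1 twice.
rows = [
    {"g": "a", "x": 1},
    {"g": "a", "x": 1},
    {"g": "a", "x": 2},
    {"g": "b", "x": 5},
]

plain = aggregate(rows, "g", ["x"], sum_first)
dedup = aggregate(rows, "g", ["x"], sum_first, distinct=True)
```

In this sketch the two forms give different totals for group "a", which is why a motivating use case for DISTINCT in SUM or AVG matters.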

 > **  Star (asterisk) in aggregate functions
 >
 > Consensus around only allowing asterisk as an argument to COUNT.

Agreed - applies to anything that talks about rows, which is only COUNT.

A custom aggregate can do this:

   agg:count()

if a custom aggregate is passed the solution, not the evaluated 
arguments for each element of a group.  No args means the solution would 
work. The document needs to be clear on this.

I don't have a use case but it would seem strange if you can't implement 
COUNT as a custom aggregate.
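
A rough sketch of the two calling conventions, in Python and under the assumption that a solution can be modelled as a dict from variable name to value.  Both function names are hypothetical.

```python
def count_solutions(group):
    """Hypothetical custom aggregate that receives the solutions themselves."""
    return len(group)


def count_values(values):
    """Hypothetical custom aggregate that receives one evaluated argument
    per solution; None stands in for an unbound variable or an error."""
    return sum(1 for v in values if v is not None)


# Assumed group of three solutions; ?x is bound in only two of them.
group = [{"x": 1, "y": 2}, {"x": 3}, {"y": 4}]

row_count = count_solutions(group)                    # in the spirit of COUNT(*)
bound_x = count_values([s.get("x") for s in group])   # in the spirit of COUNT(?x)
```

Counting rows needs no argument expression at all, which is the case agg:count() would have to cover.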

 > **  ISSUE-15: Syntax for custom aggregates
 >
 > Mild opinion in favor of having no keyword or special syntax
 > for custom aggregate functions

Neutral, currently.

 > **  ISSUE-16: Mixed datatypes with built-in aggregates
 >
 > Consensus that MIN/MAX should use same semantics as ORDER BY,
 > with parts (e.g. ordering xsd:string and xsd:dateTime) being
 > undefined/implementation defined.

I think this will get confusing with mixed data "1", "9", 1, 2, 3 but 
may be acceptable.  (Multivaluespace handling is still my preferred design.)

If eval failures are "not in group", casting is OK, but the document 
must talk about this.

 > Consensus that SUM/AVG should use same semantics as +

Clarification: treating errors as not in the group means that we would have

1 + error + 2 => 3

which is not the same as +
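
The difference can be sketched in Python.  ERROR, EvalError, plus and sum_skipping_errors are hypothetical names for illustration; plus models an operator that fails if either operand is an error, and the aggregate models "errors are not in the group".

```python
class EvalError(Exception):
    """Stand-in for a SPARQL expression evaluation error."""


ERROR = object()  # marker for a value whose evaluation failed


def plus(a, b):
    """Model of '+': an error in either operand makes the whole sum an error."""
    if a is ERROR or b is ERROR:
        raise EvalError("type error")
    return a + b


def sum_skipping_errors(values):
    """Model of SUM when errors are treated as not being in the group."""
    total = 0
    for v in values:
        if v is ERROR:
            continue  # the failed element is simply left out
        total = plus(total, v)
    return total


values = [1, ERROR, 2]
aggregate_result = sum_skipping_errors(values)

try:
    chained = plus(plus(values[0], values[1]), values[2])
except EvalError:
    chained = ERROR
```

Here the aggregate yields 3 while the chained use of plus fails, so "same semantics as +" needs to say which behaviour is meant.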

 > **  What happens with type errors that are projected?
 >
 > Consensus that type errors that are projected should result in that
 > solution being discarded.

Agreed.

 > **  Trapping type errors?
 >
 > Consensus that COALESCE is a good way to trap errors.

Agreed but the choice of word is now obscure at best.

You can do a form of default values with this.
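
A minimal Python sketch of the default-value idiom, assuming COALESCE returns its first argument that evaluates without error.  The helper names and sample data are hypothetical; arguments are passed as zero-argument functions so that each one is evaluated only when reached.

```python
class EvalError(Exception):
    """Stand-in for a SPARQL expression evaluation error."""


def coalesce(*thunks):
    """Return the first argument that evaluates without error and is bound."""
    for thunk in thunks:
        try:
            value = thunk()
        except EvalError:
            continue  # trap the error and try the next argument
        if value is not None:  # None models an unbound variable
            return value
    raise EvalError("no argument evaluated without error")


def to_int(text):
    """Hypothetical cast that raises EvalError on bad or missing input."""
    try:
        return int(text)
    except (TypeError, ValueError):
        raise EvalError(f"cannot cast {text!r}") from None


# Assumed sample data: one good value, one bad value, one missing value.
prices = ["10", "n/a", None]
result = [coalesce(lambda p=p: to_int(p), lambda: 0) for p in prices]
```

The trailing constant 0 plays the role of the default.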

 > **  Do expressions always need to be aliased to a named variable?
 > Mild consensus that aliases should be required.

Disagree mildly.  Prefer to allow the engine to invent them.  For 
results-on-wire, the name must be a legal variable.  For API, who cares?

 > **  Syntax for expressions in SELECT list
 >
 > General lack of satisfaction with either:
 >
 >     * Requiring commas if a projection uses at least one expression
 >     * Wrapping expressions and aliases with parentheses (brackets)

Would like to allow optional commas everywhere (SELECT, GROUP BY, ORDER BY).

Prefer (?x + ?y AS ?z) because some level of () is necessary for any 
expression in SPARQL to keep it parsable by a wide variety of 
approaches.  So might as well include the AS.  This is now the leading 
approach out there.

 > **  Sub-asks and sub-selects in FILTER
 >
 > General consensus (kasei, axel, steveh, leef) to avoid
 > the complexity of any subqueries in FILTERs.

Agreed - the meaning of patterns (scoping of free variables) would need 
join-like semantics and is complex.  The lack of scalar subSELECTs will 
be a potential problem area but is mitigated by having named variables 
in SPARQL.

You can place the scalar select just before the FILTER and AS the result 
into a variable.  This is not an equivalence, the query pattern may be 
slightly different, but you can get the effect as far as I can determine.

Sub-Ask is not the same as (NOT) EXISTS because EXISTS isn't joined 
with other results.

 > **  Sub-constructs in FROM and FROM NAMED
 >
 > General consensus (kasei, steveh, leef) to avoid the complexity of
 > sub-constructs in FROM. Axel is in favor, but willing to cede the point.

Agreed.

 > **  ISSUE-13: Subqueries in HAVING
 >
 > Consensus that this can be done as is with subqueries; no need to add
 > here. (kasei, axel, steveh, leef)

Mildly agree with mild worries as for subqueries in FILTERs.

 > **  ISSUE-39: Variable scope of alias variables
 >
 > Consensus that variables on the right-hand side of "AS" (alias
 > variables) are not in scope for the rest of the query (including
 > projected expressions), but not including outer queries of course.

Disagree - this is an unnecessary restriction and results in needing 
additional nesting of SELECTs just to reuse an expression.



Day 2:

 > 1. To close ISSUE-47 by noting consensus on keeping MODIFY in the
 > Update language, modulo any concerns expressed by Update editors, no
 > objections or abstentions

Agree.

 > 4. we'll have one update statement, DELETE ... INSERT ... WHERE ...,
 > where one of DELETE or INSERT may be omitted, and WHERE is optional,
 > and multiple of these may be combined in a string using ";" as the
 > separator.

I now prefer DELETE WHERE {}, that is, the pattern becomes the template.

This also means ";" is unnecessary.  If a syntax requires the use of ";" 
to distinguish two different forms, then I would be very worried (it's 
going to be error prone).

Optional ";" is tolerable for convenience but it's already used in 
Turtle, where it has an abbreviation meaning.

 > 5. SPARQL Update WHERE clauses will be at least SPARQL 1.0 QUERY,
 > with each feature 1.1 adds to SPARQL Query being AT RISK for this. This
 > closes ISSUE-27.

I think I know what you mean but this wording is not OK.

I prefer a framing of "SPARQL 1.1 Update uses SPARQL 1.1 Query; but, if 
feedback is significant, the WG will define a profile using SPARQL 1.0 
Query", i.e. default to SPARQL 1.1.  Conformance would explicitly note 
1.1 vs 1.0.  Supporting just 1.0 patterns is not fully compliant 
"SPARQL 1.1 Update", IMHO.

There are always going to be engines that are incomplete.
Received on Tuesday, 10 November 2009 17:20:22 UTC