Lee's feature proposal

Hi everyone,

I've posted on the wiki my proposal for which features the Working Group 
should work on:
http://www.w3.org/2009/sparql/wiki/User:Lee_Feigenbaum#Lee.27s_feature_proposal

I've copied it in at the end of this note; it contains my reasoning 
behind my suggestions.

Regarding Rec. track vs. WG Notes: I do _not_ think that we should 
distinguish between these when choosing what features we're going to 
work on at this point, and here's why. (I'm not a process expert; this 
is purely my understanding.)

1) A document can be developed quite far before the group needs to 
decide whether it should be Rec. track or not.

2) WG Notes are first published as First Public Working Drafts, at which 
point they carry the same IP requirements and exclusion opportunities as 
any document intended for Rec. track. Given our need to re-charter with 
a specific list of deliverables, we should therefore include in our 
decision any material we plan to work on, whether it ends up a Note or 
a Rec. track document.

3) I don't think we should view Notes as things to be churned out 
quickly and without much consideration or review. Rather, I think Notes 
are best used for material that may not be core to the language or 
protocol, that may represent a best or common practice (as in the JSON 
results format), or that is a common but difficult-to-implement 
extension such that the group feels that a Note would document 
interoperable semantics without requiring multiple implementations to 
move through the Rec. track.

Please note that I wrote some of this proposal before some of the more 
recent survey responses. I've taken a look at those responses, which 
don't change the meat of my proposal (I made one small change to a 
prioritization), but note that the exact numbers for some of my 
reasoning (when I refer to the survey results) may no longer be fully 
accurate.


Lee

Lee's feature proposal

The following is a proposal for the features that the SPARQL WG should 
adopt. It is an attempt to reach consensus by balancing previously 
stated goals including

     * group preference
     * group energy
     * implementation experience
     * utility to developers
     * utility to end users
     * extensibility
     * conservatism

Constraints

On April 28 the group resolved to accept aggregates, subqueries, and 
update as deliverables.

Proposal

Required Features

     * Aggregate functions
     * Subqueries
     * Update
     * Project expressions
     * Service description

Time-permitting Features

(Roughly in this order.)

     * SPARQL/OWL
     * Property paths
     * Function library
     * Basic federated query
     * Surface syntax

Commentary

This proposal has 5 mandatory features and 5 time/energy-permitting 
features. This is more than I think is desirable, but I have a hard 
time making the proposal narrower.

The required features include the three features identified early on 
as having the highest level of consensus.

I've also included project expressions, the ability to include arbitrary 
expressions in a SELECT clause, as a required feature. The aggregate 
feature already requires the group to find a way to include values not 
explicitly mentioned in the RDF dataset in a query's results (i.e. the 
computed value of aggregate functions), and it seems confusing and 
unnecessarily limiting not to allow the same or a similar (syntactic) 
mechanism to introduce new scalar values into query result sets. In 
addition, project expressions in conjunction with the other required 
features enables the same capabilities as various other proposed 
features, including assignment and scalar expressions in construct. 
Project expressions received significant but not overwhelming WG support 
in our survey, with five organizations ranking it amongst their top four 
features, and no organizations explicitly objecting to it. Project 
expressions is widely implemented in existing SPARQL engines.
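
To make the idea of introducing new scalar values into a result set 
concrete, here is a minimal sketch in Python rather than in any 
particular SPARQL syntax. Everything in it is an assumption made for 
illustration: the project() helper, the dictionary representation of 
solutions, and the sample bindings are hypothetical and are not drawn 
from any implementation or from the wiki proposal.

```python
# Hypothetical sketch: each solution is a dict mapping variable names to values.
def project(solutions, expressions):
    """Return one row per solution; each output column is computed by
    applying the corresponding expression (a callable) to that solution."""
    return [
        {name: expr(row) for name, expr in expressions.items()}
        for row in solutions
    ]

# Made-up bindings, as if they had been matched against an RDF dataset.
solutions = [
    {"item": "widget", "price": 4, "qty": 3},
    {"item": "gadget", "price": 10, "qty": 2},
]

totals = project(
    solutions,
    {
        # A plain variable, passed through unchanged.
        "item": lambda row: row["item"],
        # A new scalar value that appears nowhere in the dataset.
        "total": lambda row: row["price"] * row["qty"],
    },
)
```

The "total" column plays the same role as the computed value of an 
aggregate: it is part of the results without being a term in the data.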

Finally, I suggest that service description be a required deliverable of 
the Working Group. While there are various design pieces to draw on, 
service description will require the Working Group to do a 
fair bit of design work. However, I believe that this sort of 
leading-edge-of-the-curve design work is appropriate for the SPARQL WG 
in the case of a feature such as service description that is an 
extensibility point and an enabler for future standardization efforts. 
Service description provides a standard way for extended SPARQL 
implementations to advertise their capabilities, and in doing so 
encourages similar implementations to coalesce around common syntax and 
semantics of extensions. It can be used to advertise entailment regimes, 
extended surface syntax, data set information (including optimization 
hints for federation), supported functions, and much more. Service 
description received moderate WG support in the survey (5 organizations 
including it in their top 10), and no organizations explicitly objected 
to it. With Condorcet, service description is preferred to everything 
except the top 3 features and negation. (See below for more on negation.)

I've included five time-permitting features in this proposal, ranked 
roughly in the order in which I believe the group should pursue them. I 
acknowledge at the same time that some of these efforts can reasonably 
go on in parallel, either with other time-permitting features or with 
development of required features.

I believe that SPARQL/OWL is an important deliverable for this WG. The 
SPARQL community sees somewhat of a divide between those using SPARQL 
purely to query RDF graphs, and those using SPARQL in conjunction with 
richer semantics. The original SPARQL effort acknowledged this by 
providing a mechanism to define extensions that would define basic graph 
pattern matching for entailment regimes other than simple entailment. 
This extension mechanism is key to enabling groups other than the SPARQL 
working group (whether formal or informal groups) to define how SPARQL 
queries behave in the presence of other semantic regimes. But the 
extension mechanism has never been formally tested, and it seems to be 
prudent to test it (a) under the auspices of the SPARQL WG, so that the 
results may feed back into the SPARQL BGP extension specification itself 
and (b) in the context of OWL semantics, probably the most popular 
richer entailment regime that currently exists. There are numerous 
implementations that implement SPARQL/OWL already, though likely not in 
an interoperable fashion. And in the person of Bijan Parsia, the 
SPARQL WG has the expertise and energy necessary to properly specify the 
SPARQL/OWL basic graph pattern matching extension. SPARQL/OWL received 
minimal support in the survey, but seemed to have a somewhat warmer 
reception in the discussion on the April 28 teleconference.

I believe that property paths is an important deliverable for the WG as 
it enables variable-length path queries for SPARQL developers. It has 
significant support within the WG, and it also enables most cases of the 
proposed feature for accessing RDF lists.
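
As a rough illustration of why variable-length paths cover most RDF list 
access, the sketch below follows rdf:rest zero or more times from a list 
head and then reads rdf:first from each cell reached. It is written in 
Python over plain tuples; the follow_star() and list_members() helpers 
and the ex: identifiers are hypothetical, and no particular path syntax 
is assumed.

```python
from collections import deque

def follow_star(triples, start, predicate):
    """Return `start` plus every node reachable from it by following
    `predicate` zero or more times, in breadth-first order."""
    order = [start]
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in visited:
                visited.add(o)
                order.append(o)
                queue.append(o)
    return order

def list_members(triples, head):
    """Collect the rdf:first value of every cell reachable via rdf:rest."""
    cells = follow_star(triples, head, "rdf:rest")
    return [
        o
        for cell in cells
        for s, p, o in triples
        if s == cell and p == "rdf:first"
    ]

# A three-element RDF list, written as (subject, predicate, object) tuples.
triples = [
    ("ex:l1", "rdf:first", "a"), ("ex:l1", "rdf:rest", "ex:l2"),
    ("ex:l2", "rdf:first", "b"), ("ex:l2", "rdf:rest", "ex:l3"),
    ("ex:l3", "rdf:first", "c"), ("ex:l3", "rdf:rest", "rdf:nil"),
]

members = list_members(triples, "ex:l1")
```

The traversal here happens to visit cells in list order; a query result 
set would not necessarily preserve that order, which is one reason paths 
cover most, rather than all, cases of list access.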

I believe that Surface syntax and Function library represent reasonable 
maintenance tasks for the WG to examine, time-permitting. Accepting 
surface syntax as a time-permitting feature gives the WG an opportunity 
to examine capabilities of the SPARQL language that are particularly 
onerous to use and to consider specialized syntax for these features. 
Accepting function library allows the WG to consider extending the core 
set of functions available when moving between SPARQL implementations to 
include things like basic string or mathematical operations.

Finally, I believe the WG should deliver a specification for basic 
federated query, time-permitting. Federated query is implemented in a 
variety of forms in several implementations, and the feature received 
significant support in the survey (6 organizations including it amongst 
their top six choices). I believe that looking at a design for basic 
federated query is important for the growing Linked Data community, and 
the time is ripe to standardize on basic federated query as a way to 
encourage implementations to explore more and more sophisticated 
approaches to federated query.

This proposal leaves out many good features, and I'd be remiss not to 
address several specific ones.

     * Negation. The survey indicated strong support for providing a 
simpler form of asking negative queries than the current OPTIONAL/!bound 
construct. I've excluded this from my proposal under the hope that the 
design for subqueries may obviate the need for this feature.
     * Full text. The survey indicated strong support for standardizing 
the syntax and semantics for full text search in SPARQL. While I believe 
that this is one of the top interoperability stumbling blocks for 
SPARQL, the wide-open design space (both for syntax and semantics) of 
the problem worries me.
     * Parameterized inference. The survey indicated support from a 
small number of organizations for parameterized inference. The 
discussion during the April 28 teleconference made clear to me that some 
members of the WG see a need both to define what it means to query other 
entailment regimes (a la SPARQL/OWL) and also how to go about doing that 
on a query-by-query basis. The latter is what parameterized inference is 
about. I have omitted parameterized inference from my proposal because 
of the lack of existing implementations/designs to draw on, coupled with 
the fact that service descriptions provide an out-of-band way for 
endpoints to indicate the entailment regime or rulesets that they 
service. I recognize that this does not fully address the use case of 
on-demand rulesets, but I believe that this would be better served via a 
SPARQL protocol feature, and I do not see any mature designs yet in this 
space to draw upon. I believe that (1) standardizing on the semantics of 
SPARQL/OWL and (2) the increasing maturity and deployment of RIF will 
encourage SPARQL implementations to begin to explore this space more and 
make this an appropriate feature for a future round of standardization.
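
For readers less familiar with the OPTIONAL/!bound construct mentioned 
under Negation above, the following Python sketch mimics its logic: 
perform an OPTIONAL-style left join, then keep only the rows in which 
the optional variable stayed unbound. The left_join() helper and the 
sample data are hypothetical, and the query in the comment is only one 
way the idiom is commonly written.

```python
# The idiom being mimicked, roughly:
#   SELECT ?person WHERE {
#     ?person a foaf:Person .
#     OPTIONAL { ?person foaf:mbox ?email }
#     FILTER (!bound(?email))
#   }

def left_join(left, right, key):
    """OPTIONAL-like join: extend each left row with every right row that
    agrees on `key`; keep the left row unextended when nothing matches."""
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)
        else:
            out.append(dict(lrow))
    return out

# Made-up solutions for the required and the optional pattern.
people = [{"person": "alice"}, {"person": "bob"}, {"person": "carol"}]
emails = [{"person": "alice", "email": "alice@example.org"}]

joined = left_join(people, emails, "person")

# FILTER (!bound(?email)): keep rows where the optional part did not match.
no_email = [row["person"] for row in joined if "email" not in row]
```

The negative question ("who has no email?") is expressed only 
indirectly, through a join followed by a test for unboundness, which is 
the awkwardness a simpler negation form would address.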

Received on Friday, 1 May 2009 04:28:36 UTC