Re: RDF Data Access Working Group : first working draft of SPARQL from Graham Klyne on 2004-10-13 (public-rdf-dawg-comments@w3.org from October 2004)

From: Graham Klyne <GK@ninebynine.org>
Date: Wed, 13 Oct 2004 16:41:47 +0100
To: public-rdf-dawg-comments@w3.org
Message-Id: <5.1.0.14.2.20041013154106.020fdf30@127.0.0.1>
At 15:08 13/10/04 +0100, Seaborne, Andy wrote:


>The RDF Data Access Working Group is happy to announce the first working
>draft of the query language part of its work:
>
>    SPARQL RDF query language
>    http://www.w3.org/TR/rdf-sparql-query/
>
>The Working Group is soliciting feedback on this early draft ...

On first glance, it's looking good to me.  Here are some random thoughts:

...

1. Is the SELECT clause really useful?  My implementations return all 
variable bindings from the query, and I simply ignore those I don't want.
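A minimal sketch of this approach, assuming a solution row is a Python dict from variable name to bound value (a representation chosen for illustration, not taken from the draft). The engine returns every binding and the caller keeps only the variables it wants, which is all that SELECT's projection does.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]  # illustrative representation: variable name -> bound value


def project(rows: Iterable[Row], wanted: List[str]) -> List[Row]:
    """Keep only the requested variables from each row (what SELECT ?a ?c does).

    Variables that are unbound in a row are simply omitted from the projected row.
    """
    return [{v: row[v] for v in wanted if v in row} for row in rows]


all_bindings = [
    {"a": ":s1", "b": ":p1", "c": ":o1"},
    {"a": ":s1", "b": ":p2", "c": ":o1"},
]
print(project(all_bindings, ["a", "c"]))
# [{'a': ':s1', 'c': ':o1'}, {'a': ':s1', 'c': ':o1'}]
```

Note that projecting away ?b leaves two identical rows, which is the question raised in point 6.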

...

2. In section 2.2: "Not every binding needs to exist in every row of the 
table."  I think this is an important feature whose presence should be 
very clear.  Currently, it seems a bit buried.

...

3.  I think the terminology around "Definition: Triple Pattern Matching" is 
a bit muddled.  Is a "binding" a substitution for a *single* variable, or a 
tuple of such substitutions?  (I think you mean the former.)  I think it's 
important to be very clear about this, and to have clear terms corresponding to:
   (a) a single name->value binding (a "cell")
   (b) a tuple of name->value bindings, with no name repeated (a "row")
   (c) a set of tuples of name->value bindings, (with no tuple repeated 
under permutation?) (a "table")

These are distinctions I've found to be important to keep clear in my 
implementation work.
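One way to keep the three notions distinct, sketched in Python under the assumption that variable names and values are plain strings (the type names Cell, Row and Table are illustrative, not from the draft). Following the question in (c), rows are compared without regard to the order of their cells; the example also includes a row in which one variable is unbound, as section 2.2 allows.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# (a) a "cell": a single name -> value binding
Cell = Tuple[str, str]

# (b) a "row": cells with no variable name repeated;
#     a dict enforces the "no name repeated" rule by construction
Row = Dict[str, str]

# (c) a "table": a set of rows. Storing each row as a frozenset of cells
#     makes cell order irrelevant, so two rows that differ only by a
#     permutation of their cells compare equal.
Table = FrozenSet[FrozenSet[Cell]]


def make_table(rows: Iterable[Row]) -> Table:
    """Build a table from rows; rows need not all bind the same variables."""
    return frozenset(frozenset(row.items()) for row in rows)


table = make_table([
    {"x": ":a", "y": ":b"},
    {"y": ":b", "x": ":a"},   # same row with cells permuted: collapses into one
    {"x": ":c"},              # ?y is unbound in this row
])
print(len(table))  # 2
```

Modelling the table as a set rules out duplicate rows; if duplicates are to be allowed, a multiset (for example a list or a Counter of rows) would be needed instead.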

...

4. In section 2.2: "If the same variable name is used more than once in a 
pattern then, within each solution to the query, the variable has the same 
value."  This, too, I think is important to keep clearly stated.
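A minimal sketch of triple pattern matching that enforces this rule, assuming terms are plain strings and variables carry a leading "?" (conventions chosen for the sketch, not taken from the draft). The rule falls out of carrying a single row of bindings through the whole pattern: once a variable is bound, it can only be matched again by the same value.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, row: Row) -> Optional[Row]:
    """Try to extend `row` so that `pattern` matches `triple`.

    A variable already bound (earlier in this pattern or by a previous
    triple pattern) must be bound to the same value again; otherwise
    the match fails.
    """
    new_row = dict(row)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            bound = new_row.get(p_term)
            if bound is None:
                new_row[p_term] = t_term
            elif bound != t_term:
                return None  # same variable, different value: reject
        elif p_term != t_term:
            return None  # constant term does not match
    return new_row


def match_bgp(patterns: List[Triple], graph: List[Triple]) -> List[Row]:
    """Match a conjunction of triple patterns against a graph."""
    rows: List[Row] = [{}]
    for pattern in patterns:
        rows = [
            extended
            for row in rows
            for triple in graph
            if (extended := match_triple(pattern, triple, row)) is not None
        ]
    return rows


graph = [(":s1", ":knows", ":s1"), (":s1", ":knows", ":s2")]
print(match_bgp([("?x", ":knows", "?x")], graph))  # [{'?x': ':s1'}]
```

Because every position is treated alike, a variable in predicate position (point 5) needs no special handling in this sketch.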

...

5. I note that variables are allowed in predicate position.  If this 
doesn't present any problems, I'm all in favour of it, but I think the 
design decision could be highlighted more clearly.

...

6. Can the resulting variable bindings contain repeated 
binding-tuples;  e.g. in response to a query like:
    SELECT ?a ?c
    WHERE  ( ?a ?b ?c )
against the graph:
    :s1 :p1 :o1 .
    :s1 :p2 :o1 .
Later, you mention that a query result is a set, so I guess that means no 
duplicates, but I haven't yet seen this stated more explicitly.

Later, you introduce SELECT DISTINCT, so I guess that means a simple query 
result can have duplicate binding-tuples.  So it's not a set.
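The example above, worked through in Python: a sketch assuming projection keeps one output row per pattern match (bag semantics), which is the reading under which SELECT DISTINCT would make a difference. Rows are written as tuples purely for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

graph: List[Triple] = [
    (":s1", ":p1", ":o1"),
    (":s1", ":p2", ":o1"),
]

# WHERE ( ?a ?b ?c ) matches every triple: one row per triple
rows: List[Dict[str, str]] = [{"a": s, "b": p, "c": o} for (s, p, o) in graph]

# SELECT ?a ?c: projecting away ?b leaves two identical rows (bag semantics)
selected = [(row["a"], row["c"]) for row in rows]

# SELECT DISTINCT ?a ?c: remove duplicates while keeping first-seen order
distinct = list(dict.fromkeys(selected))

print(selected)  # [(':s1', ':o1'), (':s1', ':o1')]
print(distinct)  # [(':s1', ':o1')]
```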

...

7. Section 4

I note you've chosen to allow optional elements of graph patterns, but not 
alternatives.  In one of my implementations I provided alternative blocks, 
where the last alternative could be empty, hence also providing optional 
patterns.  Alternatives are permitted to bind the same variable, thus 
providing ways to match different (graph-syntactical) expressions of the 
same information.  I have sometimes found this to be useful, but it does 
somewhat mess up the clean semantics of the approach you have adopted.

Despite the semantic messiness, I do feel that having some capability to 
select one possible match over another, when dealing with possibly messy 
real-world data, could be useful enough to justify the consequent 
complication of query optimization when such a feature is used.
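One possible reading of such alternative blocks, sketched in Python. The ordered-choice semantics (use the first alternative that matches at all) is an assumption made for this sketch, chosen because it lets a query prefer one match over another; the predicate `:fullName` in the usage is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]
Pattern = List[Triple]  # a conjunction of triple patterns


def extend(row: Row, pattern: Triple, triple: Triple) -> Optional[Row]:
    """Extend row so that pattern matches triple; None if it cannot."""
    out = dict(row)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # binds p if unbound; otherwise checks the existing value
            if out.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return out


def match(patterns: Pattern, graph: List[Triple], row: Row) -> List[Row]:
    """All extensions of row that satisfy every triple pattern."""
    rows = [row]
    for pat in patterns:
        rows = [r2 for r in rows for t in graph
                if (r2 := extend(r, pat, t)) is not None]
    return rows


def alternatives(alts: List[Pattern], graph: List[Triple], row: Row) -> List[Row]:
    """Ordered choice: return the matches of the first alternative that matches.

    An empty alternative matches trivially (returning the row unchanged),
    so placing one last turns the block into an optional pattern.
    """
    for alt in alts:
        found = match(alt, graph, row)
        if found:
            return found
    return []


graph = [(":alice", "foaf:name", "Alice"), (":bob", ":fullName", "Bob")]
name_alts = [
    [("?who", "foaf:name", "?name")],
    [("?who", ":fullName", "?name")],  # hypothetical alternative vocabulary
    [],                                # empty last alternative: ?name stays unbound
]
print(alternatives(name_alts, graph, {"?who": ":alice"}))  # [{'?who': ':alice', '?name': 'Alice'}]
print(alternatives(name_alts, graph, {"?who": ":bob"}))    # [{'?who': ':bob', '?name': 'Bob'}]
print(alternatives(name_alts, graph, {"?who": ":carol"}))  # [{'?who': ':carol'}]
```

Both alternatives bind the same variable ?name, which is what allows different graph-syntactic expressions of the same information to be matched; it is also why results now depend on the order of the alternatives.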

...

8. Section 8

The current position seems about right to me.  Complicating the basic query 
mechanism to handle "accessing direct subclass relationship" seems 
undesirable and unnecessary:  presenting a graph with (notional) explicit 
types (etc.) where implied by subclass relationships seems to me to be 
sufficient.
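A sketch of what presenting a graph with explicit types might look like, assuming only two RDFS rules are applied (transitivity of rdfs:subClassOf and inheritance of rdf:type along it); a full RDFS reasoner applies more rules than this. The query engine stays unchanged and simply runs against the expanded graph.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def add_implied_types(graph: Set[Triple]) -> Set[Triple]:
    """Return a copy of graph with rdf:type triples implied by rdfs:subClassOf made explicit.

    Applies the two rules repeatedly until no new triple is produced.
    """
    closed = set(graph)
    while True:
        subclass = {(s, o) for (s, p, o) in closed if p == SUBCLASS}
        new = set()
        # transitivity: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for (a, b) in subclass:
            for (b2, c) in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # inheritance: x type A, A subClassOf B  =>  x type B
        for (x, p, a) in closed:
            if p == RDF_TYPE:
                for (a2, b) in subclass:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= closed:
            return closed
        closed |= new


g = {
    (":Rex", RDF_TYPE, ":Dog"),
    (":Dog", SUBCLASS, ":Mammal"),
    (":Mammal", SUBCLASS, ":Animal"),
}
closed = add_implied_types(g)
print((":Rex", RDF_TYPE, ":Animal") in closed)  # True
```

The original graph is left untouched, so a query that needs only the direct subclass links can still be run against it.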

...

Section 9.

Constraining the source of a pattern seems to be only a (small) part of the 
provenance story.  Is it not also desirable to query the source?

Oops!  I now see that <source> can be a variable.  OK, that's neat, and 
works cleanly at the natural unit of provenance, viz the statement.

Is it fair to assume that support for SOURCE may be optional?  Ah yes, if 
unsupported, bind source variables to NULL.  If a statement occurs in more 
than one source with a source variable pattern, does that result in 
multiple variable-binding-tuples?  (I think it should.)

e.g. the pattern:
   SOURCE ?ppd ( ?whom foaf:age ?age )
might return
   :source1 :Jenny foaf:age "10"
   :source2 :Jenny foaf:age "10"
   :source3 :Jenny foaf:age "11"
etc.
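The behaviour suggested above, sketched in Python under the assumption that the store records each statement together with the source it came from (a quad). The data are the illustrative values from the example, and the pattern's predicate is the only constant, to keep the sketch short.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (source, subject, predicate, object)

store: List[Quad] = [
    (":source1", ":Jenny", "foaf:age", "10"),
    (":source2", ":Jenny", "foaf:age", "10"),
    (":source3", ":Jenny", "foaf:age", "11"),
]


def match_source(pred: str, store: List[Quad]) -> List[Dict[str, str]]:
    """SOURCE ?ppd ( ?whom <pred> ?age ): one row per (statement, source) pair.

    A statement held by two sources yields two rows that differ only in ?ppd.
    """
    return [
        {"?ppd": src, "?whom": s, "?age": o}
        for (src, s, p, o) in store
        if p == pred
    ]


rows = match_source("foaf:age", store)
for row in rows:
    print(row)
print(len(rows))  # 3
```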

...

Section 11

I think this might better be titled "result forms".

Is it intended that every SPARQL implementation must support every result 
form?  I think that could add unnecessary implementation complexity.  I think 
there should be one form supported by all implementations, and SELECT seems a 
reasonable choice.  I don't really see a compelling case for requiring the 
others to be universally available.

I think the ASK result form is also reasonable.

Thought:  if a query pattern has no variables, is there a distinction in the 
SELECT * result between the query being matched and not matched?  I think 
there should be:

     {}    query not matched.
     {<>}  query matched, empty variable binding tuple.
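The distinction can be made concrete by representing a result as a list of rows, sketched in Python: an empty list corresponds to {} and a list holding one empty dict corresponds to {<>}. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def select_star_ground(pattern: List[Triple], graph: Set[Triple]) -> List[Dict[str, str]]:
    """SELECT * for a pattern containing no variables.

    Returns [] when the pattern does not match (no solutions) and [{}]
    when it does (exactly one solution, containing no bindings).
    """
    if all(triple in graph for triple in pattern):
        return [{}]
    return []


graph = {(":s1", ":p1", ":o1")}
print(select_star_ground([(":s1", ":p1", ":o1")], graph))  # [{}]
print(select_star_ground([(":s1", ":p1", ":o2")], graph))  # []
```

With this representation, ASK is simply whether the result list is non-empty.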

...

Section 11.3

I'm uneasy about the DESCRIBE feature.  It seems to be going rather beyond 
the basic idea of RDF graph query, and doesn't seem to have well-defined or 
clear semantics.

I think the effort here might be better applied to query language 
extensions that permit some kind of recursively-defined pattern, so that 
various kinds of sub-graph neighbourhoods can be described according to an 
application's requirements.  A simple use-case would be to describe the 
entire content of an RDF collection from just its head element.
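What such a recursively-defined pattern would need to compute can be sketched procedurally in Python, assuming a well-formed list (exactly one rdf:first and one rdf:rest per list node, terminated by rdf:nil); the blank-node labels are illustrative. A fixed set of triple patterns can only reach a fixed depth, so lists of arbitrary length need recursion of this kind.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def collection_members(head: str, graph: Set[Triple]) -> List[str]:
    """Follow rdf:rest links from head to rdf:nil, collecting rdf:first values.

    Assumes each list node has exactly one rdf:first and one rdf:rest;
    a missing link raises KeyError and a cycle raises ValueError.
    """
    index: Dict[Tuple[str, str], str] = {(s, p): o for (s, p, o) in graph}
    members: List[str] = []
    node, seen = head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cyclic rdf:rest chain")
        seen.add(node)
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


graph = {
    ("_:l1", RDF_FIRST, ":a"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, ":b"), ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, ":c"), ("_:l3", RDF_REST, RDF_NIL),
}
print(collection_members("_:l1", graph))  # [':a', ':b', ':c']
```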

...

Section 12.

Testing values.  Is there a way to combine tests with non-strict 
evaluation, so that something like:

    AND isBound ?x AND ?x < 20

can be reliably processed?
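A sketch of the left-to-right, short-circuit ("non-strict") evaluation being asked for, in Python, assuming rows are dicts in which an unbound variable is simply absent; the helper names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]


def is_bound(row: Row, var: str) -> bool:
    return var in row


def filter_rows(rows: List[Row]) -> List[Row]:
    """Keep rows satisfying: isBound ?x AND ?x < 20.

    Python's `and` evaluates left to right and stops at the first false
    operand, so row["?x"] is never looked up when ?x is unbound. Evaluating
    both operands strictly would raise KeyError on such rows instead.
    """
    return [row for row in rows if is_bound(row, "?x") and row["?x"] < 20]


rows = [{"?x": 5}, {"?x": 25}, {"?y": 1}]
print(filter_rows(rows))  # [{'?x': 5}]
```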

...

Section 12, "Are tests syntax for RDF predicates or separate concepts?"

This makes me uneasy.  I feel that there may be tests that are not easily 
or naturally presented as RDF syntax.  Probably with enough contortion it 
can be managed, but is it helpful?  How does a test like "isBound ?x" play 
here?

Part of my viewpoint here is that there should be, as far as possible, a 
clear separation between structure within RDF literal values and structure 
that is expressed within the RDF graph.  (For this reason, I'm not 
enthusiastic about using XML schema structured datatypes as RDF literals, 
when the structure over the component values could be quite naturally 
expressed using RDF statements.  This leads me to think that the query 
language tests here should really be trying to capture those things that 
aren't comfortably captured as RDF properties.)

...

That's it, for now.

#g



------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Wednesday, 13 October 2004 16:03:01 UTC