RE: RDF Data Access Working Group : first working draft of SPARQL

Graham,

Thank you for the comments:  some of the things you raised are longer
term discussions in the working group.  

-------- Original Message --------
> From: Graham Klyne <>
> Date: 13 October 2004 16:42
> 
> At 15:08 13/10/04 +0100, Seaborne, Andy wrote:
> 
> 
> > The RDF Data Access Working Group is happy to announce the first
> > working draft of the query language part of its work:
> > 
> >    SPARQL RDF query language
> >    http://www.w3.org/TR/rdf-sparql-query/
> > 
> > The Working Group is soliciting feedback on this early draft ...
> 
> On first glance, it's looking good to me.  Here are some random
> thoughts: 
> 
> ...
> 
> 1. Is the SELECT clause really useful?  My implementations return all
> variable bindings from the query, and I simply ignore those I don't
> want. 

When results are encoded to be sent over the network, reducing the
number of variables in each query solution can reduce the number of
bytes that need to be sent.

Presentation of query results may also be informed by the SELECT clause,
such as choosing columns in an HTML table, or removing variables which
will be bNodes (e.g. FOAF data) and other variables introduced solely
for path creation in the query itself.
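
As a rough illustration of the projection step (Python, not anything
from the draft; the function name `project` and the sample data are
made up for this sketch), a solution can be treated as a dictionary of
variable name to value, and SELECT as keeping only the named keys:

```python
def project(solutions, selected):
    """Keep only the selected variables in each solution.

    A solution is modelled as a dict of variable name -> value; a
    variable that is unset in a solution is simply left out.
    """
    return [
        {var: solution[var] for var in selected if var in solution}
        for solution in solutions
    ]


# Hypothetical FOAF-style results: ?person is a bNode used only to
# join the name and mbox patterns, so a client rarely wants it back.
solutions = [
    {"person": "_:b0", "name": "Alice", "mbox": "mailto:alice@example.org"},
    {"person": "_:b1", "name": "Bob"},
]

projected = project(solutions, ["name", "mbox"])
print(projected)
```

The second solution has no value for `mbox`, so the projected row
carries only `name`.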

> 
> ...
> 
> 2. In section 2.2: "Not every binding needs to exist in every row of
> the table.".  I think this is an important feature whose presence
> should be very clear.  Currently, it seems a bit buried.

I have extended this text and mentioned that optionals can cause
variables to be unset in solutions.  This is an editorial dilemma:
not mentioning everything at once, to keep the text clear, against
making it complete.

> 
> ...
> 
> 3.  I think the terminology around "Definition: Triple Pattern
> Matching" is a bit muddled.  Is a "binding" a substitution for a
> *single* variable, or a tuple of variables?  (I think you mean the
> former)  I think it's important to be very clear about this, and have
> clear terms corresponding to:
>    (a) a single name->value binding (a "cell")
>    (b) a tuple of name->value bindings, with no name repeated (a "row")
>    (c) a set of tuples of name->value bindings, (with no tuple repeated
>        under permutation?) (a "table")
> 
> These are distinctions I've found to be important to keep clear in my
> implementation work.

Agreed - it's muddled.  The terminology here is important and we are in
the process of revisiting it.  The next working draft will have this
cleared up.
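
One possible reading of the three levels you list, written as Python
types.  This is only a sketch of your (a)/(b)/(c) distinction, not the
terminology the working group will adopt; the names `Binding`, `Row`,
`Table` and `as_table` are invented for this example.

```python
from typing import Dict, FrozenSet, Tuple

# (a) a single name -> value binding, the "cell"
Binding = Tuple[str, str]

# (b) a tuple of bindings with no name repeated, the "row";
#     a dict rules out repeated names by construction
Row = Dict[str, str]

# (c) a set of rows, the "table"
Table = FrozenSet[FrozenSet[Binding]]


def as_table(rows) -> Table:
    """Freeze each row so that equal rows collapse, whatever their key order."""
    return frozenset(frozenset(row.items()) for row in rows)


row1: Row = {"a": ":s1", "c": ":o1"}
row2: Row = {"c": ":o1", "a": ":s1"}  # same bindings, listed in another order
table = as_table([row1, row2])
print(len(table))  # prints 1: the two rows are the same under permutation
```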

> 
> ...
> 
> 4. In section 2.2: "If the same variable name is used more than once
> in a pattern then, within each solution to the query, the variable has
> the same value."  This, too, I think is important to keep clearly
> stated.
> 
> ...
> 
> 5. I note that variables are allowed in predicate position.  If this
> doesn't present any problems, I'm all in favour of this, but I think
> the design decision could be highlighted more clearly.

I'm not aware of any issues arising.  That feature is available in
several existing query languages.

> 
> ...
> 
> 6. Can the resulting variable bindings contain repeated
> binding-tuples;  e.g. in response to a query like:
>     SELECT ?a ?c
>     WHERE  ( ?a ?b ?c )
> against the graph:
>     :s1 :p1 :o1 .
>     :s1 :p2 :o1 .
> Later, you mention that a query result is a set, so I guess that means
> no duplicates, but I haven't yet seen this stated more explicitly.
>
> Later, you introduce SELECT DISTINCT, so I guess that means a simple
> query result can have duplicate binding-tuples.  So it's not a set.

This is an area of debate within the working group and is not yet fully
resolved.
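
To make the two readings concrete, here is a small Python sketch of
your example.  The matcher `match_triple` is a toy written for this
message, not the draft's matching definition; it only shows what a
bag reading and a set reading would each return.

```python
def match_triple(pattern, triple):
    """Return a dict of bindings if the triple matches the pattern, else None."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable used twice must take the same value both times.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


graph = [(":s1", ":p1", ":o1"), (":s1", ":p2", ":o1")]
pattern = ("?a", "?b", "?c")
selected = ("?a", "?c")

# Bag reading: one projected row per match of the pattern.
rows = []
for triple in graph:
    binding = match_triple(pattern, triple)
    if binding is not None:
        rows.append(tuple(binding[var] for var in selected))

# Set reading (what DISTINCT would give): drop repeated rows.
distinct_rows = list(dict.fromkeys(rows))

print(rows)
print(distinct_rows)
```

Both triples match with different values of ?b, so projecting ?b away
leaves two identical rows in `rows` and one in `distinct_rows`.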

> 
> ...
> 
> 7. Section 4
> 
> I note you've chosen to allow optional elements of graph patterns, but
> not alternatives.  In one of my implementations I provided alternative
> blocks, where the last alternative could be empty, hence also
> providing optional patterns.  Alternatives are permitted to bind the
> same variable, thus providing ways to match different
> (graph-syntactical) expressions of the same information.  I have
> sometimes found this to be useful, but it does somewhat mess up the
> clean semantics of the approach you have adopted.
>
> Despite the semantic messiness, I do feel that having some capability
> to select one possible match over another, when dealing with possibly
> messy real-world data, could be useful enough to justify the
> consequent complication of query optimization when such a feature is
> used.



> 
> ...
> 
> 8. Section 8
> 
> The current position seems about right to me.  Complicating the basic
> query mechanism to handle "accessing direct subclass relationship"
> seems undesirable and unnecessary:  presenting a graph with (notional)
> explicit types (etc.) where implied by subclass relationships seems to
> me to be sufficient.
> 
> ...
> 
> Section 9.
> 
> Constraining the source of a pattern seems to be only a (small) part
> of the provenance story.  Is it not also desirable to query the source?
>
> Oops!  I now see that <source> can be a variable.  OK, that's neat,
> and works cleanly at the natural unit of provenance, viz the statement.

Our unit of provenance could also be viewed as a collection of triples
(a subgraph of the overall graph) because graphs are the unit of
exchange. 

> 
> Is it fair to assume that support for SOURCE may be optional?  Ah yes,
> if unsupported, bind source variables to NULL.  If a statement occurs
> in more than one source with a source variable pattern, does that
> result in multiple variable-binding-tuples?  (I think it should.)
> 
> e.g. the pattern:
>    SOURCE ?ppd ( ?whom foaf:age ?age )
> might return
>    :source1 :Jenny foaf:age "10"
>    :source2 :Jenny foaf:age "10"
>    :source3 :Jenny foaf:age "11"
> etc.
> 
> ...
> 
> Section 11
> 
> I think this might better be titled "result forms".

Good suggestion. Done. v1.122

> 
> Is it intended that every SPARQL implementation must support every
> result form?  I think that could add unnecessary implementation
> complexity.  I think there should be one form supported by all
> implementations, and SELECT seems a reasonable choice.  I don't really
> see a compelling case for requiring the others to be universally
> available.
> 
> I think the ASK result form is also reasonable.
> 
> Thought:  if a query pattern has no variables, is there a distinction
> for SELECT * result when the query is matched or not matched?  I think
> there should be:
> 
>      {}    query not matched.
>      {<>}  query matched, empty variable binding tuple.
> 

Those answers would be the right ones.
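
A short Python sketch of that distinction, using a toy matcher written
for this message (the names `match_triple` and `solutions` are not from
the draft): a pattern with no variables yields a list holding one empty
binding when it matches, and an empty list when it does not.

```python
def match_triple(pattern, triple):
    """Return a dict of bindings if the triple matches the pattern, else None."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def solutions(pattern, graph):
    """Collect one binding dict per triple in the graph that matches."""
    found = []
    for triple in graph:
        binding = match_triple(pattern, triple)
        if binding is not None:
            found.append(binding)
    return found


graph = [(":s1", ":p1", ":o1")]

matched = solutions((":s1", ":p1", ":o1"), graph)    # your {<>}
unmatched = solutions((":s1", ":p1", ":o2"), graph)  # your {}

print(matched, unmatched)
```

Here `matched` is `[{}]` (one solution with no bindings) and
`unmatched` is `[]` (no solutions), so the two cases stay apart.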

> ...
> 
> Section 11.3
> 
> I'm uneasy about the DESCRIBE feature.  It seems to be going rather
> beyond the basic idea of RDF graph query, and doesn't seem to have
> well or clearly defined semantics.
>
> I think the effort here might be better applied to query language
> extensions that permit some kind of recursively-defined pattern, so
> that various kinds of sub-graph neighbourhoods can be described
> according to an application's requirements.  A simple use-case would
> be to describe the entire content of an rdf:collection from just its
> head element.

The DESCRIBE form means that the client does not set the shape of
the query result graph - it may not know it, and will analyse the
graph returned.

An example in the doc should help as would test cases and a fuller text.

We have to address whether there needs to be any support for returning
collections and containers even in SELECT.

> 
> ...
> 
> Section 12.
> 
> Testing values.  Is there a way to combine tests with non-strict
> evaluation, so that something like:
> 
>     AND isBound ?x AND ?x < 20
> 
> can be reliably processed?

This is covered in newer drafts in sec 12.  If ?x is unbound, isBound is
false so the result is false.  The evaluation of ?x < 20 if ?x is a
bNode leads to an error and hence the solution is rejected.
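
The behaviour described above, as a Python sketch.  The names `BNode`,
`EvalError`, `is_bound`, `less_than` and `accept` are invented for this
example and are not taken from the draft; Python's `and` short-circuits,
which is an assumption about evaluation order, although the outcome for
an unbound ?x is rejection either way.

```python
class BNode:
    """Stand-in for a blank node value."""

    def __init__(self, label):
        self.label = label


class EvalError(Exception):
    """Raised when a test cannot be evaluated on the given value."""


def is_bound(solution, var):
    return var in solution


def less_than(solution, var, limit):
    value = solution.get(var)
    # Only plain numbers can be compared; a bNode or an unset value is an error.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise EvalError(f"cannot compare {value!r} with {limit!r}")
    return value < limit


def accept(solution):
    """AND isBound ?x AND ?x < 20: an evaluation error rejects the solution."""
    try:
        return is_bound(solution, "x") and less_than(solution, "x", 20)
    except EvalError:
        return False


print(accept({}))                  # ?x unbound
print(accept({"x": 10}))           # ?x bound to a small number
print(accept({"x": BNode("b0")}))  # ?x bound to a bNode
```

Only the middle solution is accepted; the unbound case fails the
isBound test and the bNode case is rejected through the error.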

> 
> ...
> 
> Section 12, "Are tests syntax for RDF predicates or separate
> concepts?"
>
> This makes me uneasy.  I feel that there may be tests that are not
> easily or naturally presented as RDF syntax.  Probably with enough
> contortion it can be managed, but is it helpful?  How does a test like
> "isBound ?x" play here?
> 
> Part of my viewpoint here is that there should be, as far as possible,
> a clear separation between structure within RDF literal values and
> structure that is expressed within the RDF graph.

XML schema structured datatypes will probably be accessible only via 
extensions.  They are not in the basic set of functions and operators.

>  (For this reason, I'm not enthusiastic about using XML schema
> structured datatypes as RDF literals, when the structure over the
> component values could be quite naturally expressed using RDF
> statements.  This leads me to think that the query language tests here
> should really be trying to capture those things that aren't
> comfortably captured as RDF properties.)
> 
> ...
> 
> That's it, for now.

Thanks for the comments.

	Andy

> 
> #g
> 
> 
> 
> ------------
> Graham Klyne
> For email:
> http://www.ninebynine.org/#Contact

Received on Monday, 25 October 2004 12:46:10 UTC