Comments on SPARQL draft from Geoff Chappell on 2004-10-17 (public-rdf-dawg-comments@w3.org from October 2004)

From: Geoff Chappell <geoff@sover.net>
Date: Sun, 17 Oct 2004 16:51:00 -0400
To: <public-rdf-dawg-comments@w3.org>
Message-ID: <006201c4b48b$02fd1210$d8b272d8@gsclaptop>

Here are a few comments on the SPARQL draft - hope they're helpful.

- Aggregate Graphs
I like that the query language allows for multiple sources and that the
query is effectively a query against the union of the sources rather than a
union of the results of the query run against each source. I assume that the
definition of an aggregate graph as "...the RDF-merge of a number of
subgraphs" doesn't imply anything about rdf lean-ness of the resulting
merged graph - is that correct?
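
A minimal sketch of the distinction drawn above, assuming a toy graph model of Python sets of triples (this ignores blank nodes, so a plain set union stands in for the RDF-merge). The `match` helper, the sample data, and the two source names are hypothetical and only for illustration:

```python
# Toy conjunctive pattern matcher over a set of (s, p, o) triples.
# Terms starting with "?" are variables; everything else must match exactly.
def match(graph, patterns, binding=None):
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(graph, rest, candidate))
    return results

# Hypothetical sources: each holds half of the data the query needs.
source_a = {("item1", "rss:link", "http://example.org/a")}
source_b = {("item1", "rss:title", "Example")}
patterns = [("?x", "rss:link", "?url"), ("?x", "rss:title", "?title")]

# Query against the union of the sources: the join succeeds.
query_over_union = match(source_a | source_b, patterns)

# Union of the results of running the query per source: no source matches alone.
union_of_results = match(source_a, patterns) + match(source_b, patterns)
```

Here `query_over_union` holds one solution while `union_of_results` is empty, which is the behavior difference the comment refers to.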

- Graph Patterns, Constraining Values
It's not clear to me why the triple patterns and the value constraints are
segregated. They just seem like different flavors of logical factors that
must be true in order for the query to be true. Is this distinction just a
historical artifact? It seems to me that this will only make things more
difficult when negation and disjunction are added (whether or not that
happens in this version, it seems inevitable that they will be).

- Missing Value Assignment
Why no ability to do value assignment? I use this feature regularly when
writing RDF queries (in RDF Gateway's query language). When useful functions
are added to SPARQL, I think the lack of this feature will be even more
bothersome. For example, wouldn't you want to be able to do something like
this?:
	SELECT ?domain 
	WHERE  ( ?x rss:link  ?url ) and ?domain=regexp(?url, ....)
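
A rough sketch of what value assignment does to a solution sequence, assuming solutions are modeled as Python dicts. The `assign` helper and the sample rows are hypothetical, and `urlparse` is used only as a stand-in for the elided regexp call above:

```python
from urllib.parse import urlparse

# Hypothetical solutions produced by the triple pattern ( ?x rss:link ?url ).
rows = [
    {"?x": "item1", "?url": "http://example.org/feed/1"},
    {"?x": "item2", "?url": "http://www.w3.org/News"},
]

def assign(rows, var, fn):
    """Extend every solution with a new variable computed from existing ones."""
    return [{**row, var: fn(row)} for row in rows]

# Stand-in for ?domain=regexp(?url, ....): take the host part of the URL.
with_domain = assign(rows, "?domain", lambda r: urlparse(r["?url"]).netloc)

# SELECT ?domain
domains = [row["?domain"] for row in with_domain]
```

Without assignment, the same computation has to happen in application code after the query returns.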

- Negation (Unsaid)
I think it would be a mistake not to include some form of negation
(especially since you're already paying the complexity price of OPTIONAL -
arguably a back-door form of negation). I'll make a suggestion in this
regard - we have a switch/case construct in RDF Gateway's query language
that serves somewhat the same purpose as OPTIONAL, plus under the hood
provides a mechanism for negation. It works like this:

Select ?x ?title where {[rdf:type] ?x [rdfs:Class]}
	and switch (?x)
	(
		case {[rdfs:label] ?x ?l}:
			?title=?l
		case {[rdfs:comment] ?x ?c}:
			?title=?c
		default:
			?title=''
	);

Throw in the functions succeed() and fail() and you can do negation - a la:

select ?x where {[rdf:type] ?x [rdfs:Class]}
	and switch(?x)(
		case {[rdfs:subClassOf] ?x ?a}:
			fail()
		default:
			succeed()
	);

is the same as:
	 {[rdf:type] ?c [rdfs:Class]} and not {[rdfs:subClassOf] ?c ?a} 

I suggest looking to see if the OPTIONAL construct could be expanded in a
similar manner - so you could support exclusive alternatives as well as
negation.
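
One possible reading of the switch/case semantics described above, sketched in Python: cases are tried in order, the first one that matches supplies the bindings, and the default applies only when none match. This is an interpretation for illustration, not RDF Gateway's actual evaluation strategy; the graph data and helper names are hypothetical:

```python
# Hypothetical data: three classes, with different annotations on each.
graph = {
    ("ex:A", "rdf:type", "rdfs:Class"),
    ("ex:B", "rdf:type", "rdfs:Class"),
    ("ex:C", "rdf:type", "rdfs:Class"),
    ("ex:A", "rdfs:label", "Class A"),
    ("ex:B", "rdfs:comment", "About B"),
    ("ex:B", "rdfs:subClassOf", "ex:A"),
}

def objects(subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

classes = sorted(s for s, p, o in graph if p == "rdf:type" and o == "rdfs:Class")

def switch_title(x):
    # Exclusive alternatives: the first case with a match wins.
    for predicate in ("rdfs:label", "rdfs:comment"):
        values = objects(x, predicate)
        if values:
            return [(x, value) for value in values]
    # default: ?title=''
    return [(x, "")]

titles = [row for x in classes for row in switch_title(x)]

# Negation: a matching case calls fail() (drop the row),
# the default calls succeed() (keep the row).
no_superclass = [x for x in classes if not objects(x, "rdfs:subClassOf")]
```

With this data, `titles` pairs ex:A with its label, ex:B with its comment, and ex:C with the empty string, while `no_superclass` keeps only the classes that have no rdfs:subClassOf triple.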

- Disjunction
You can usually work around the absence of disjunction in the query
language, but it puts more of a burden on the query author/programmer. Why
pay that price thousands/millions/? of times down the road just so a small
number of SPARQL implementers can avoid a little work now? I suggest that if
you don't manage to include it in the first SPARQL version, you at least
give some thought to how it would be included in a later version to avoid
creating an OR-unfriendly syntax.

- Distinct
I think it would be a mistake for the query language to take a position on
whether or not query result sets could contain duplicate rows (or if it did
take a position, I'd want it to be that they couldn't!). From a selfish
perspective, I worry that we'll have to de-tune RDF Gateway's query
evaluation in order to allow duplicate rows to exist in a result set (after
all if a user wants duplicate rows, they can merely select out the
variable(s) that make those rows distinguishable). Perhaps the issue of
duplicate rows could be implementation specific?
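
A small sketch of the set-based view described above, assuming solutions are held as a Python set of tuples so that duplicates cannot occur. The `project` helper and the sample rows are hypothetical:

```python
# Variable order for the tuples below.
VARIABLES = ["?x", "?y"]

# Set semantics: a solution set cannot hold the same row twice.
solutions = {("a", 1), ("a", 2), ("b", 1)}

def project(rows, variables, keep):
    """Project each row onto the kept variables; the result is again a set."""
    indexes = [variables.index(v) for v in keep]
    return {tuple(row[i] for i in indexes) for row in rows}

# Selecting only ?x collapses the two "a" rows into one.
only_x = project(solutions, VARIABLES, ["?x"])

# Selecting the distinguishing variable ?y as well keeps all three rows.
x_and_y = project(solutions, VARIABLES, ["?x", "?y"])
```

Under this model the number of rows a user sees depends only on which variables are selected, never on how the engine happened to find the matches.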

- Typing
I may be jumping the gun here since there's not yet much specified about
typing and value comparisons, but please keep query performance against
large triple stores in mind when specifying the behavior of comparison
operators such as > and <. If those operators are too type lenient (e.g. if
they're allowed to operate on plain literals), it makes it very difficult to
do an efficient indexed query.
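
To illustrate the indexing concern, here is a sketch that assumes a store keeps sorted indexes over literal values. The data and the `numeric_gt` helper are hypothetical. Typed numbers can be answered with a binary search, whereas plain literals sort lexically, so a lenient numeric ">" over them falls back to scanning and coercing every value:

```python
import bisect

# Typed values: one sorted index, answered with a binary search.
typed_index = sorted([3, 9, 10, 25])
typed_gt_9 = typed_index[bisect.bisect_right(typed_index, 9):]

# Plain literals: the natural index order is lexical, not numeric.
plain_index = sorted(["10", "25", "3", "9", "abc"])
lexical_gt_9 = plain_index[bisect.bisect_right(plain_index, "9"):]

def numeric_gt(literals, bound):
    """Lenient '>' over plain literals: try to coerce each value, skip failures."""
    matches = []
    for literal in literals:
        try:
            value = float(literal)
        except ValueError:
            continue  # not a number, so it cannot satisfy a numeric comparison
        if value > bound:
            matches.append(literal)
    return matches

# Full scan: every literal is visited because the index order is of no help.
scanned_gt_9 = numeric_gt(plain_index, 9)
```

The binary search over the lexical index returns only "abc", which is not the numeric answer, so the engine has no shortcut and must visit every literal.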

- Query Syntax
I imagine you're past this decision point, but thought I'd add my two cents
anyway. Please consider using something other than <> to delimit URIs - it's
painful having to always encode these chars in HTML and XML. We use [] in
RDF Gateway for exactly this reason. On a similar note, why use parens
around triples? It seems like it just confuses things when you also use parens
for grouping. Again, we use {} for this reason in Gateway's query language.
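
A tiny illustration of the encoding burden, using Python's standard `html.escape`; the example URI is hypothetical:

```python
import html

# A URI delimited with <> must be entity-encoded before embedding in HTML/XML.
angle_form = html.escape("<http://example.org/>")

# The same URI delimited with [] passes through unchanged.
bracket_form = html.escape("[http://example.org/]")
```

The angle-bracket form comes out as `&lt;http://example.org/&gt;`, while the square-bracket form is unchanged.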


Please let me know if anything here is unclear or if you'd like me to go
into more detail on anything.

Thanks,

Geoff Chappell
Received on Sunday, 17 October 2004 20:51:22 UTC