- From: Geoff Chappell <geoff@sover.net>
- Date: Sun, 17 Oct 2004 16:51:00 -0400
- To: <public-rdf-dawg-comments@w3.org>
Here are a few comments on the SPARQL draft - hope they're helpful.

- Aggregate Graphs

I like that the query language allows for multiple sources and that the query is effectively a query against the union of the sources rather than a union of the results of the query run against each source. I assume that the definition of an aggregate graph as "...the RDF-merge of a number of subgraphs" doesn't imply anything about RDF lean-ness of the resulting merged graph - is that correct?

- Graph Patterns, Constraining Values

It's not clear to me why the triple patterns and the value constraints are segregated. They just seem like different flavors of logical factors that must be true in order for the query to be true. Is this distinction just a historical artifact? It seems to me that this will only make things more difficult when negation and disjunction are added (whether or not that happens in this version, it seems inevitable that they will be).

- Missing Value Assignment

Why no ability to do value assignment? I use this feature regularly when writing RDF queries (in RDF Gateway's query language). When useful functions are added to SPARQL, I think the lack of this feature will be even more bothersome. For example, wouldn't you want to be able to do something like this?:

    SELECT ?domain
    WHERE ( ?x rss:link ?url )
    and ?domain=regexp(?url, ....)

- Negation (Unsaid)

I think it would be a mistake not to include some form of negation (especially since you're already paying the complexity price of OPTIONAL - arguably a back-door form of negation). I'll make a suggestion in this regard - we have a switch/case construct in RDF Gateway's query language that serves somewhat the same purpose as OPTIONAL, plus under the hood provides a mechanism for negation. It works like this:

    Select ?x ?title where {[rdf:type] ?x [rdfs:Class]} and
    switch (?x) (
      case {[rdfs:label] ?x ?l}: ?title=?l
      case {[rdfs:comment] ?x ?c}: ?title=?c
      default: ?title=''
    );

Throw in the functions succeed() and fail() and you can do negation - a la:

    select ?x where {[rdf:type] ?x [rdfs:Class]} and
    switch(?x)(
      case {[rdfs:subClassOf] ?x ?a}: fail()
      default: succeed()
    );

is the same as:

    {[rdf:type] ?c [rdfs:Class]} and not {[rdfs:subClassOf] ?c ?a}

I suggest looking to see if the OPTIONAL construct could be expanded in a similar manner - so you could support exclusive alternatives as well as negation.

- Disjunction

You can usually work around the absence of disjunction in the query language, but it puts more of a burden on the query author/programmer. Why pay that price thousands/millions/? of times down the road just so a small number of SPARQL implementers can avoid a little work now? I suggest that if you don't manage to include it in the first SPARQL version, you at least give some thought to how it would be included in a later version, to avoid creating an OR-unfriendly syntax.

- Distinct

I think it would be a mistake for the query language to take a position on whether or not query result sets could contain duplicate rows (or if it did take a position, I'd want it to be that they couldn't!). From a selfish perspective, I worry that we'll have to de-tune RDF Gateway's query evaluation in order to allow duplicate rows to exist in a result set (after all, if a user wants duplicate rows, they can merely select out the variable(s) that make those rows distinguishable). Perhaps the issue of duplicate rows could be implementation specific?

- Typing

I may be jumping the gun here since there's not yet much specified about typing and value comparisons, but please keep query performance against large triple stores in mind when specifying the behavior of comparison operators such as > and <. If those operators are too type lenient (e.g. if they're allowed to operate on plain literals), it makes it very difficult to do an efficient indexed query.

- Query Syntax

I imagine you're past this decision point, but thought I'd add my two cents anyway. Please consider using something other than <> to delimit URIs - it's painful having to always encode these chars in HTML and XML. We use [] in RDF Gateway for exactly this reason. On a similar note, why use parens around triples? It seems like it just confuses things when you also use parens for grouping. Again, we use {} for this reason in Gateway's query language.

Please let me know if anything here is unclear or if you'd like me to go into more detail on anything.

Thanks,
Geoff Chappell
Received on Sunday, 17 October 2004 20:51:22 UTC