- From: Andrew Newman <andrew@tucanatech.com>
- Date: Wed, 20 Oct 2004 09:36:21 +1000
- To: public-rdf-dawg-comments@w3.org
I wish to offer feedback in relation to SPARQL. It seems to me that there are many frequently occurring use cases not covered by the current standard.

One piece of missing functionality is counting and sorting results: things like "How many items of a certain type are in a graph?" or "Give me the 10 highest priority items". The lack of these operations seems to be an extremely negative aspect of the standard and one that I believe will hamper wide user acceptance.

One expectation might be that users implement their own sorting and counting once they receive results back from a query. The most obvious problem with this is the expense of post-processing results. The performance of RDF stores in comparison to others is generally considered poor, and these kinds of operations will make it much worse. I also think it's unlikely that users will implement their own post-processing; more likely they will choose something that does offer this functionality.

So, to fill this gap in the query language, implementors will have to support their own syntax and semantics for counting and sorting. This seems to negate one of the benefits of standardization. If certain implementations feel that it's infeasible to support counting and sorting, then maybe it should be an optional feature of SPARQL, so that implementors can either offer a correct solution or leave it unimplemented.

The other issue with SPARQL is the lack of an implicit DISTINCT. As I understand SQL, DISTINCT is optional because if your queries work on normalized data and joins are based on distinct keys, then the returned results cannot be duplicated. If your query works on rows with repeated values in the same column, then you apply DISTINCT. In RDF's data model there isn't really this problem of duplicated data and normalization. SPARQL has the idea of matching statements in the graph.
From my understanding, RDF's data model doesn't support the idea of multiple subjects, predicates and/or objects with the same values. In other words, it only seems valid that if a query matches one result in the graph, it should return that one unique result rather than repeating it multiple times. While I can see many use cases for distinct versus non-distinct results, I am not aware of a reason to return non-distinct results over distinct results. Have I missed something?

I work with a member of the DAWG and follow the mailing list archives from time to time. I have asked him why these features are not in the standard, without getting an answer that I would consider adequate. I know that a user-centric view is applied to the development of this standard. However, with the above functionality in mind, it seems to me that it has been left out because it is difficult to implement, rather than because users don't require it.
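
As a rough illustration of the post-processing concern raised above, here is a minimal Python sketch of what a client would have to do to answer "how many items of a certain type?" and "give me the N highest priority items" when the query language offers neither counting nor sorting. The row layout and all names (`rows`, `count_by_type`, `top_n_by_priority`, the `ex:` identifiers) are made up for illustration and do not come from any particular RDF store or API.

```python
import heapq

# Hypothetical result rows, as a client might hold them after fetching
# every match from a query engine with no counting or sorting support.
rows = [
    {"item": "ex:a", "type": "ex:Bug", "priority": 3},
    {"item": "ex:b", "type": "ex:Bug", "priority": 9},
    {"item": "ex:c", "type": "ex:Task", "priority": 5},
    {"item": "ex:d", "type": "ex:Bug", "priority": 1},
    {"item": "ex:e", "type": "ex:Task", "priority": 7},
]


def count_by_type(rows, wanted_type):
    """Count rows of one type; every row must already be on the client."""
    return sum(1 for row in rows if row["type"] == wanted_type)


def top_n_by_priority(rows, n):
    """Return the n rows with the highest priority, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["priority"])


if __name__ == "__main__":
    print(count_by_type(rows, "ex:Bug"))
    print([row["item"] for row in top_n_by_priority(rows, 2)])
```

Both helpers need the complete result set transferred to the client before they can produce a single number or a short list, which is the cost I am concerned about.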
Received on Tuesday, 19 October 2004 23:37:01 UTC