General comments on RDF API (W3C Editor's Draft 11 May 2011)

We are working on a Linked Open Data initiative within Elsevier and have been exploring options for querying our Linked Data store. One of our current implementations initially bypasses the creation of a triple store (and the use of SPARQL) and instead uses a simple store of statements (subject, property, and object), with additional metadata associated with each statement. These statements are then grouped into graphs. Logically, a graph is associated with a journal article and contains many statements. This store is used for low-expressivity queries. A second, future implementation will use an underlying triple store and SPARQL for higher-expressivity queries.

We came across the RDF API W3C Editor's Draft and considered it as a possible option for querying our simple Linked Data store (in particular, section 3, Data Environment). While we realize the specification is intended for the browser (and an RDFa marked-up page), we were curious whether consideration had been given to making this a more generic query interface, one that could be used in a middle-tier service to query a Linked Data store as well as in the browser. As stated previously, our focus is on querying a very large Linked Data store rather than an in-memory data structure held in a browser. We thought it would be helpful to have one general API that could be used for both scenarios.

With that in mind, we have the following comments.

For getProperties, getSubjects, and getValues:

  *   It would be nice to have the ability to specify additional metadata fields that could be used to filter the query.
  *   It would be nice to have the ability to know how many results will be returned for the query.
  *   It would be nice to specify an 'offset' and 'number of results to return' for the query.
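
As a rough illustration of the three suggestions above, here is a minimal TypeScript sketch over an in-memory list of statements. Everything in it is an assumption made for the example: the `Statement`, `PageOptions` and `Page` shapes, the `metadata`, `offset` and `limit` option names, and the extra `store` argument are hypothetical and are not taken from the draft.

```typescript
// Hypothetical shapes for illustration; none of these names come from the RDF API draft.
interface Statement {
  subject: string;
  property: string;
  object: string;
  graph: string;                          // graph (for example, an article) holding the statement
  metadata: { [field: string]: string };  // additional per-statement metadata
}

interface PageOptions {
  metadata?: { [field: string]: string }; // every listed field must match exactly
  offset?: number;                        // index of the first result to return
  limit?: number;                         // maximum number of results to return
}

interface Page<T> {
  total: number; // number of matches before paging is applied
  items: T[];
}

// Distinct subjects of statements matching the optional property, value and metadata filter.
function getSubjects(
  store: Statement[],
  property?: string,
  value?: string,
  opts: PageOptions = {},
): Page<string> {
  const filter = opts.metadata ?? {};
  const fields = Object.keys(filter);
  const subjects: string[] = [];
  for (const s of store) {
    if (property !== undefined && s.property !== property) continue;
    if (value !== undefined && s.object !== value) continue;
    if (!fields.every((f) => s.metadata[f] === filter[f])) continue;
    if (subjects.indexOf(s.subject) === -1) subjects.push(s.subject);
  }
  const offset = opts.offset ?? 0;
  const end = opts.limit === undefined ? subjects.length : offset + opts.limit;
  return { total: subjects.length, items: subjects.slice(offset, end) };
}

// Made-up sample data.
const sample: Statement[] = [
  { subject: "a1", property: "dc:title", object: "T1", graph: "g1", metadata: { source: "journal" } },
  { subject: "a2", property: "dc:title", object: "T2", graph: "g2", metadata: { source: "journal" } },
  { subject: "a3", property: "dc:title", object: "T3", graph: "g3", metadata: { source: "book" } },
];

const page = getSubjects(sample, "dc:title", undefined, { metadata: { source: "journal" }, limit: 1 });
console.log(page.total, page.items);
```

Returning the total alongside the page lets a caller show "1 of 2" without issuing a second query; getProperties and getValues could take the same options object.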

For getProjection, getProjections:

  *   It would be nice to have the ability to specify additional metadata fields that could be used to filter the query.
  *   It would be nice to have the ability to know how many results will be returned for the query.
  *   It would be nice to specify an 'offset' and 'number of results to return' for the query.
  *   It would be nice to specify an 'order by' clause.
  *   It would be nice to specify a 'group by' clause.  For example, group by statement or graph.
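
To make the 'order by' and 'group by' suggestions more concrete, the following TypeScript sketch builds simplified projections from a list of statements, sorts them by one property, and optionally buckets them by graph. The `Projection` shape here is a deliberately reduced stand-in and not the draft's Projection interface; the `orderBy` and `groupBy` option names are hypothetical, and a subject is assumed to belong to the graph of its first statement.

```typescript
// Hypothetical shapes for illustration; not the Projection interface from the draft.
interface Statement {
  subject: string;
  property: string;
  object: string;
  graph: string;
}

// A simplified projection: one subject with its property values gathered together.
interface Projection {
  subject: string;
  graph: string; // assumption: graph of the first statement seen for this subject
  values: { [property: string]: string[] | undefined };
}

interface ProjectionOptions {
  orderBy?: string;  // property whose first value is used as the sort key
  groupBy?: "graph"; // when set, projections are bucketed by their graph
}

function getProjections(
  store: Statement[],
  opts: ProjectionOptions = {},
): { [group: string]: Projection[] } {
  // Prototype-free maps, so subjects such as "constructor" cannot collide.
  const bySubject: { [subject: string]: Projection | undefined } = Object.create(null);
  const projections: Projection[] = [];
  for (const s of store) {
    let p = bySubject[s.subject];
    if (p === undefined) {
      p = { subject: s.subject, graph: s.graph, values: Object.create(null) };
      bySubject[s.subject] = p;
      projections.push(p);
    }
    const list = p.values[s.property] ?? [];
    list.push(s.object);
    p.values[s.property] = list;
  }

  const key = opts.orderBy;
  if (key !== undefined) {
    // Projections lacking the property sort first, with an empty key.
    const first = (p: Projection): string => (p.values[key] ?? [""])[0];
    projections.sort((a, b) => first(a).localeCompare(first(b)));
  }

  // Without groupBy, everything lands in a single bucket named "all".
  const groups: { [group: string]: Projection[] } = Object.create(null);
  for (const p of projections) {
    const g = opts.groupBy === "graph" ? p.graph : "all";
    const bucket = groups[g] ?? [];
    bucket.push(p);
    groups[g] = bucket;
  }
  return groups;
}

// Made-up sample data.
const statements: Statement[] = [
  { subject: "a1", property: "dc:title", object: "Zeta", graph: "g1" },
  { subject: "a2", property: "dc:title", object: "Alpha", graph: "g1" },
  { subject: "a3", property: "dc:title", object: "Mu", graph: "g2" },
];

const grouped = getProjections(statements, { orderBy: "dc:title", groupBy: "graph" });
console.log(Object.keys(grouped), grouped["g1"].map((p) => p.subject));
```

Sorting before grouping keeps each bucket in the requested order, which is one plausible reading of how the two clauses would combine.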

For query:

  *   It would be nice to have the ability to know how many results will be returned for the query.
  *   It would be nice if certain keys such as 'orderBy', 'count', 'groupBy', 'startIndex', and 'resultsToReturn' could have special meaning.  These special keys would map to the enhancements suggested above.
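
One way the special keys could be handled is to separate them from the ordinary match criteria before the query runs. The TypeScript sketch below shows only that separation step. It assumes the query is passed as a plain object of key/value pairs; the `splitQuery` helper, the `ParsedQuery` shape and the sample `rdf:type` value are hypothetical.

```typescript
// Key names are the ones listed above; treating them as reserved is an assumption.
const RESERVED_KEYS = ["orderBy", "count", "groupBy", "startIndex", "resultsToReturn"];

type QueryValue = string | number | boolean;

interface ParsedQuery {
  criteria: { [property: string]: string }; // ordinary property/value match criteria
  options: { [key: string]: QueryValue };   // reserved keys controlling the result set
}

// Split a query object into match criteria and reserved option keys.
function splitQuery(query: { [key: string]: QueryValue }): ParsedQuery {
  const result: ParsedQuery = { criteria: {}, options: {} };
  for (const key of Object.keys(query)) {
    const value = query[key];
    if (RESERVED_KEYS.indexOf(key) !== -1) {
      result.options[key] = value;
    } else {
      result.criteria[key] = String(value);
    }
  }
  return result;
}

// Made-up example query.
const example = splitQuery({
  "rdf:type": "bibo:Article",
  orderBy: "dc:title",
  startIndex: 20,
  resultsToReturn: 10,
});
console.log(example.criteria, example.options);
```

Because property keys written as CURIEs contain a colon, bare words such as 'orderBy' are unlikely to collide with them, although a store that allows unprefixed property names would need a different convention.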

We have implemented the above APIs (with the suggested enhancements) in our Linked Data store (containing 439,000 graphs and 18,742,175 statements).  If you have any questions or need further clarification, please feel free to drop me a note.

Thanks.

Darin.

Received on Monday, 25 July 2011 17:49:31 UTC