Re: General comments on RDF API (W3C Editor's Draft 11 May 2011)

Hi Darin, All,

Some personal notes / questions:

McBeath, Darin W (ELS-STL) wrote:
> We came across the RDF API W3C Editor's Draft and considered this as a possible option for querying our simple Linked Data store (in particular, section 3 Data Environment).  While we realize the intent of this specification was for the Browser (and a RDFa marked up page), we were curious if consideration had been given to make this a more generic query interface that could perhaps be used in a middle-tier service to query a Linked Data Store as well as the browser.  As stated previously, our focus is on querying a very large linked data store as opposed to some sort of in-memory data structure contained in a browser.  We thought it would be helpful to have one general API that could perhaps be used for both scenarios.

Yes :) We have discussed this many times in the past, generally with the
hope that some interfaces can be moved further up the layered
specifications. One simple issue is that defining a query mechanism in
the RDF Interfaces layer may be out of place, since other familiar query
mechanisms will quite possibly also be defined, and a query interface
generic enough to cover all styles of querying is just too generic to be
worth specifying! Thus, our notion was to make the interfaces and APIs
as modular as possible, so that this particular query mechanism could be
used as an optional extension, implementable in general,
non-RDFa-specific libraries. This is still an open issue, although I'm
unsure whether there is actually a specific ISSUE open for it!

> With that in mind, we have the following comments.
> 
> For getProperties, getSubjects, and getValues:
> 
>   *   It would be nice to have the ability to specify additional metadata fields that could be used to filter the query.

As in specifying other attributes / values, like @class?

>   *   It would be nice to have the ability to know how many results will be returned for the query.

Agreed, although I'm unsure whether there would be any cost benefit to
this approach. Is there any significant disadvantage to doing a simple
.length or count on the results?

>   *   It would be nice to specify an 'offset' and 'number of results to return' for the query.

Ahh paging, certainly worth considering!
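
To make the count and paging requests concrete, here is a minimal sketch
over an in-memory result list. The names `Page` and `page` are
hypothetical and do not appear in the RDF API draft. For an array, the
total is just `.length`; a large store would presumably need to report
the total without materialising every result, which is where a separate
count might pay off.

```typescript
// Hypothetical types: none of these names come from the RDF API draft.
interface Page<T> {
  items: T[];    // the requested slice of results
  total: number; // number of matches before slicing
}

// Return up to `limit` results starting at `offset`, plus the total count.
// Negative offsets and limits are clamped to zero.
function page<T>(results: readonly T[], offset = 0, limit = results.length): Page<T> {
  const start = Math.max(0, offset);
  const end = start + Math.max(0, limit);
  return { items: results.slice(start, end), total: results.length };
}

// Usage: five subjects, fetch two results starting at index 2.
const subjects = ["_:a", "_:b", "_:c", "_:d", "_:e"];
const secondPage = page(subjects, 2, 2);
console.log(secondPage.items, secondPage.total);
```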

> For getProjection, getProjections:
> 
>   *   It would be nice to have the ability to specify additional metadata fields that could be used to filter the query.
>   *   It would be nice to have the ability to know how many results will be returned for the query.
>   *   It would be nice to specify an 'offset' and 'number of results to return' for the query.

As above.

>   *   It would be nice to specify an 'order by' clause.
>   *   It would be nice to specify a 'group by' clause.  For example, group by statement or graph.

We can and will discuss this, but my immediate question is "why not
use SPARQL?". I'm quite wary of re-inventing the wheel here, and had
always seen the query mechanism in these APIs as more of a selector
than a full query language. Interested to hear others' opinions on this.
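
For comparison, ordering and paging already have SPARQL solution
modifiers (ORDER BY, LIMIT, OFFSET), and SPARQL 1.1 adds GROUP BY with
aggregates such as COUNT. A rough sketch of mapping such options onto a
query string follows; `buildSelect` and its option names are
hypothetical, and the plain string concatenation is for illustration
only, not for untrusted input.

```typescript
// Hypothetical helper; the option names are illustrative only.
interface SelectOptions {
  orderBy?: string; // variable name without the leading '?'
  limit?: number;
  offset?: number;
}

// Build a SELECT query from a graph pattern and a list of variable names.
function buildSelect(pattern: string, vars: string[], opts: SelectOptions = {}): string {
  const parts = [
    "SELECT " + vars.map((v) => "?" + v).join(" "),
    "WHERE { " + pattern + " }",
  ];
  if (opts.orderBy !== undefined) parts.push("ORDER BY ?" + opts.orderBy);
  if (opts.limit !== undefined) parts.push("LIMIT " + opts.limit);
  if (opts.offset !== undefined) parts.push("OFFSET " + opts.offset);
  return parts.join("\n");
}

// Usage: third page of ten names, sorted by name.
const sparql = buildSelect(
  "?s <http://xmlns.com/foaf/0.1/name> ?label",
  ["s", "label"],
  { orderBy: "label", limit: 10, offset: 20 }
);
console.log(sparql);
```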

> For query:
> 
>   *   It would be nice to have the ability to know how many results will be returned for the query.
>   *   It would be nice if certain 'key's like 'orderBy', 'count', 'groupBy', 'startIndex', 'resultsToReturn' could have special meaning.  These special key values map to enhancements made in the earlier statements.

As above.
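
One way to read the "special keys" suggestion is that an implementation
would separate the reserved keys from the ordinary property filters
before matching. A minimal sketch, assuming the filter is a plain
object: the reserved key names are the ones proposed above, while
`splitFilter` and the types are hypothetical and not part of the draft.

```typescript
// Reserved key names come from the suggestion quoted above; everything
// else here is hypothetical and not part of the RDF API draft.
const RESERVED = ["orderBy", "count", "groupBy", "startIndex", "resultsToReturn"];

type Filter = { [key: string]: unknown };

// Split a query filter into property matches and query options.
function splitFilter(filter: Filter): { match: Filter; options: Filter } {
  const match: Filter = {};
  const options: Filter = {};
  for (const key of Object.keys(filter)) {
    if (RESERVED.indexOf(key) !== -1) {
      options[key] = filter[key]; // reserved: controls the query itself
    } else {
      match[key] = filter[key];   // ordinary: property/value to match
    }
  }
  return { match, options };
}

// Usage: one property filter plus three reserved keys.
const parsed = splitFilter({
  "foaf:name": "Bob",
  orderBy: "foaf:name",
  startIndex: 0,
  resultsToReturn: 25,
});
console.log(parsed.match, parsed.options);
```

A drawback of this design is that a reserved key can no longer be used
as an ordinary filter key, so a term that happened to be mapped to
`count` would need some escaping convention.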

> We have implemented the above APIs (with the suggested enhancements) in our Linked Data store (containing 439,000 graphs and 18,742,175 statements).  If you have any questions or need further clarification, please feel free to drop me a note.

Would be very interested to see the store/code if possible - and 
congratulations on your achievement :D

Best,

Nathan

Received on Tuesday, 26 July 2011 11:45:06 UTC