RDF Queries in HTTP "Range" Headers

[+BCC to www-talk]

Since RDF graphs and their serializations can be large,
client-requested server-side queries could save a lot of time and
bandwidth. For example, if you want to query a 10MB RDF file via HTTP,
at the moment you have to download the entire file and then perform
the query. Why not just send the query to the server, have it perform
the query, and send back only those triples that match? It could even
cache queries and their results.

HTTP 1.1 includes a "Range" header that was created, it appears, for
just this sort of process. RFC 2616 only defines an operation to
get byte ranges at the moment, but it kinda leaves the door open for
all sorts of things, including text based matching (grep via Range,
anybody?), and RDF queries.

I say "kinda" since there are a couple of oddities. Firstly, the
grammar for the Range header in RFC 2616 14.35 restricts itself to
only allowing "bytes" as a range request type. Yet in section 3.12
ibid., it clearly indicates that other range units are to be allowed
by defining the "other-range-unit" production. The production, as part
of range-unit, is even used again in the production for the
Accept-Ranges header.

Secondly, even if the range request types are extensible, it isn't
clear how one should go about registering them. If URIs could be used
to indicate the type of the content, then that wouldn't be a problem.
Unfortunately, however, HTTP 1.1 doesn't allow for using URIs as
range-units, since colons and slashes and all sorts of characters
aren't permitted:-

[[[
      range-unit       = bytes-unit | other-range-unit
      bytes-unit       = "bytes"
      other-range-unit = token
]]] - RFC 2616, 3.12

[[[
       token          = 1*<any CHAR except CTLs or separators>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT
]]] - RFC 2616, 2.2
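
To make the restriction concrete, here is a rough Python sketch of a
check against the token production quoted above. The function name
is_token and the example.org URI are made up for illustration; only
the character rules come from the RFC text.

```python
# Separators as listed in RFC 2616, section 2.2 (SP and HT included).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value: str) -> bool:
    """Return True if value matches the RFC 2616 'token' production.

    A token is one or more CHARs (US-ASCII 0-127), excluding control
    characters (0-31 and 127) and the separators above.
    """
    if not value:
        return False
    for ch in value:
        code = ord(ch)
        if code > 126 or code < 32 or ch in SEPARATORS:
            return False
    return True


if __name__ == "__main__":
    # "bytes" is fine; a URI fails because of ":" and "/".
    for candidate in ("bytes", "rdf-query", "http://example.org/rdf-query"):
        print(candidate, is_token(candidate))
```

So a short name like "rdf-query" would pass as an other-range-unit,
but anything URI-shaped would not.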

One could hack around this by using "uri" as the (other-)range-unit,
and then using the first token of its value as the URI for the type of
the rest of the contents.
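
As a sketch of what that hack might look like on the receiving end,
assuming a header of the form "Range: uri=<type URI> <rest>" (this
layout, the function name parse_uri_range, and the example.org URI are
all my own guesses, not anything RFC 2616 defines):

```python
def parse_uri_range(header_value: str):
    """Split a hypothetical 'uri=<type-uri> <rest>' Range value.

    Returns (type_uri, rest), or None if the range unit is not "uri"
    or no URI follows it.
    """
    unit, sep, value = header_value.partition("=")
    # Assumption: the unit name is compared case-insensitively.
    if not sep or unit.strip().lower() != "uri":
        return None
    parts = value.strip().split(None, 1)
    if not parts:
        return None
    type_uri = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return type_uri, rest


if __name__ == "__main__":
    print(parse_uri_range("uri=http://example.org/rdf-query { ?x a ?y }"))
    print(parse_uri_range("bytes=0-499"))
```

A server that doesn't know the "uri" unit would simply ignore the
header and return the whole file, which is the graceful failure one
wants.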

But ignoring those issues for now, there is also the issue of which
query language(s) to support. It doesn't make sense to use any query
language that indicates which URIs to get the content from, of course,
so that rules out the SQL-ish variants unless the "FROM" field were
ignored. That makes the sub-graph/formula type queries seem more
attractive, perhaps with a separate constraints field.

It wouldn't be (too) difficult a thing to implement, either; anyone
with an Apache server could use a standard CGI handler, and set a two
line .htaccess file to handle *.rdf with it. The worst parts, of
course, are caching and deciding which queries would be too processor
intensive to run. Perhaps one could support various levels of
querying: for
example, queries without variables are usually less intensive than
queries with variables.
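
A minimal sketch of the "levels" idea, assuming triples are plain
3-tuples of strings and that variables are written with a leading "?"
(both are conventions I've picked for the example, as are the function
names and the ex: terms). The server counts the variables in a pattern,
refuses anything above its limit, and otherwise sends back only the
matching triples:

```python
def is_variable(term: str) -> bool:
    """Assumption: a term starting with '?' is a query variable."""
    return term.startswith("?")


def count_variables(pattern) -> int:
    """Number of distinct variables in a (subject, predicate, object) pattern."""
    return len({term for term in pattern if is_variable(term)})


def match_pattern(pattern, triples):
    """Return the triples that match a single pattern.

    A variable matches any term, but must match the same term
    everywhere it appears in the pattern.
    """
    results = []
    for triple in triples:
        bindings = {}
        matched = True
        for wanted, actual in zip(pattern, triple):
            if is_variable(wanted):
                if bindings.setdefault(wanted, actual) != actual:
                    matched = False
                    break
            elif wanted != actual:
                matched = False
                break
        if matched:
            results.append(triple)
    return results


def run_query(pattern, triples, max_variables):
    """Refuse patterns above the server's variable limit (returns None)."""
    if count_variables(pattern) > max_variables:
        return None
    return match_pattern(pattern, triples)


if __name__ == "__main__":
    graph = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:name", "Bob"),
    ]
    print(run_query(("?s", "ex:knows", "ex:bob"), graph, max_variables=1))
    print(run_query(("?s", "?p", "?o"), graph, max_variables=1))
```

A server under load could drop max_variables to 0 and still answer
plain "is this triple in the graph?" requests.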

Sidenote: since queries have so much potential metadata attached to
them, it almost makes sense for them to be modelled in RDF themselves,
but for the fact that one can't represent universally quantified
variables in XML/RDF.

I wonder what WebServices people would make of this sort of deal?
Because basically what's going on here is a WebService for the
Semantic Web. I think that using a header (with the advantages of
graceful failure, etc.) is a better idea than cramming it into a POST
request. Are there any other ways in which this could be implemented?
Does it need to be implemented at all?

Cheers,

P.S. This might be on topic for www-rdf-rules as well... oh well. It's
difficult deciding how many people to annoy by cross-posting when
you're writing about something that touches on so many different
subjects.

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/

Received on Sunday, 1 June 2003 21:50:22 UTC