- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Mon, 2 Jun 2003 02:50:16 +0100
- To: <www-rdf-interest@w3.org>
[+BCC to www-talk]

Since RDF graphs and their serializations can be large, client-requested server-side queries could be a lot more time and bandwidth efficient. For example, if you want to query a 10MB RDF file via HTTP, at the moment you have to download the entire file and then perform the query. Why not just send the query to the server, have it perform the query, and send back only those triples that match? It could even cache queries and their results.

HTTP 1.1 includes a "Range" header that was created, it appears, for just this sort of process. RFC 2616 only defines an operation to get byte ranges at the moment, but it kinda leaves the door open for all sorts of things, including text-based matching (grep via Range, anybody?), and RDF queries.

I say "kinda" since there are a couple of oddities. Firstly, the grammar for the Range header in RFC 2616 14.35 restricts itself to only allowing "bytes" as a range request type. Yet section 3.12 ibid. clearly indicates that other range units are to be allowed, by defining the "other-range-unit" production. That production, as part of range-unit, is even used again in the production for the Accept-Ranges header.

Secondly, even if the range request types are extensible, it isn't clear how one should go about registering them. If URIs could be used to indicate the type of the content, then that wouldn't be a problem. Unfortunately, however, HTTP 1.1 doesn't allow for using URIs as range-units, since colons, slashes, and all sorts of other characters aren't permitted:-

[[[
range-unit       = bytes-unit | other-range-unit
bytes-unit       = "bytes"
other-range-unit = token
]]] - RFC 2616, 3.12

[[[
token      = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
           | "," | ";" | ":" | "\" | <">
           | "/" | "[" | "]" | "?" | "="
           | "{" | "}" | SP | HT
]]] - RFC 2616, 2.2

One could hack around this by using "uri" as the (other-)range-unit, and then using the first token of its value as the URI for the type of the rest of the contents.

But ignoring those issues for now, there is also the issue of which query language(s) to support. It doesn't make sense to use any query language that indicates which URIs to get the content from, of course, so that rules out the SQL-ish variants unless the "FROM" field is ignored. It makes the sub-graph/formula type queries seem more attractive, perhaps with a separate constraints field.

It wouldn't be (too) difficult a thing to implement, either; anyone with an Apache server could use a standard CGI handler, and set a two-line .htaccess file to handle *.rdf with it. The worst parts, of course, are the caching and deciding which queries would be too processor-intensive. Perhaps one could support various levels of querying: for example, queries without variables are usually less intensive than queries with variables.

Sidenote: since queries have so much potential metadata attached to them, it almost makes sense for them to be modelled in RDF themselves, but for the fact that one can't represent universally quantified variables in XML/RDF.

I wonder what WebServices people would make of this sort of deal? Because basically what's going on here is a WebService for the Semantic Web. I think that using a header (with the advantages of graceful failure, etc.) is a better idea than cramming it into a POST request.

Are there any other ways in which this could be implemented? Does it need to be implemented at all?

Cheers,

P.S. This might be on topic for www-rdf-rules as well... oh well. It's difficult deciding how many people to annoy by cross-posting when you're writing about something that touches on so many different subjects.

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/
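
As a rough illustration of the token restriction and the "uri" workaround described in the message, here is a minimal Python sketch. The separator set is transcribed from the RFC 2616 section 2.2 grammar quoted above. Everything else is an assumption made for illustration: the function names is_token and parse_uri_range, the "unit=value" header layout, the choice to split the value on the first space, and the example.org type URI are all hypothetical and not defined by the RFC or by the message.

```python
# Separators from the RFC 2616, section 2.2 grammar quoted in the message.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s):
    """Return True if s matches the RFC 2616 'token' production:
    one or more US-ASCII characters that are neither control
    characters nor separators."""
    return bool(s) and all(
        32 < ord(c) < 127 and c not in SEPARATORS for c in s
    )


def parse_uri_range(header_value):
    """Split a hypothetical Range value of the form
    'uri=<type-URI> <query>' into (unit, type_uri, query).

    Assumption: the range-unit is followed by '=', and for the 'uri'
    unit the first space-delimited word of the value names the type
    of the rest. For any other unit, type_uri is None."""
    unit, sep, rest = header_value.partition("=")
    unit = unit.strip()
    if not sep or not is_token(unit):
        raise ValueError("range-unit must be an RFC 2616 token")
    rest = rest.strip()
    if unit != "uri":
        return unit, None, rest
    type_uri, _, query = rest.partition(" ")
    return unit, type_uri, query.strip()


if __name__ == "__main__":
    # A URI contains ':' and '/', so it cannot itself be a range-unit.
    print(is_token("bytes"), is_token("http://example.org/ql"))
    print(parse_uri_range("uri=http://example.org/ql ?s ?p ?o"))
```

The first check shows why a URI cannot appear directly as a range-unit, while the second shows how the literal unit "uri" could carry the type URI inside the value instead.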
Received on Sunday, 1 June 2003 21:50:22 UTC