RE: RDF Queries in HTTP "Range" Headers from Seaborne, Andy on 2003-06-02 (www-rdf-interest@w3.org from June 2003)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Mon, 2 Jun 2003 19:47:51 +0100
To: "'Sean B. Palmer'" <sean@mysterylights.com>
Cc: www-rdf-interest@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F064D3CD4@0-mail-1.hpl.hp.com>

Sean,

An alternative to using HTTP "Range" is to use the query string of a GET.

In Joseki [1], the query string identifies the RDF query language (so it is
not tied to a fixed language or a predetermined set of languages) and carries
a serialization of the query to be asked.  As in your description below, the
result of a query is triples: in this case the returned RDF is a subgraph
that would yield the same answers as the query does on the whole graph.
Additional parameters to the operation request may change this behaviour, but
the basic building block is returning a single RDF graph, and the normal
return is a subgraph of the original, large target.
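
As a rough illustration of the query-string approach, here is a minimal
Python sketch. The parameter names "lang" and "query", the example.org
model URL and the sample query text are assumptions made for this example,
not a statement of the exact Joseki interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(model_url, lang, query):
    """Put the query language name and the serialized query in the query string."""
    # "lang" and "query" are hypothetical parameter names.
    return model_url + "?" + urlencode({"lang": lang, "query": query})

def read_query(url):
    """Recover (lang, query) from the URL, as a server-side handler might."""
    params = parse_qs(urlsplit(url).query)
    return params["lang"][0], params["query"][0]

url = build_query_url(
    "http://example.org/model",  # hypothetical server and model
    "RDQL",
    "SELECT ?x WHERE (?x, <http://example.org/p>, ?y)",
)
print(url)
print(read_query(url))
```

Because the language name travels with the request, a server can dispatch
on it and reject languages it does not support, while a plain GET on the
model URL still returns the whole graph.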

	Andy

[1] http://www.joseki.org/


> -----Original Message-----
> From: Sean B. Palmer [mailto:sean@mysterylights.com] 
> Sent: 2 June 2003 02:50
> To: www-rdf-interest@w3.org
> Subject: RDF Queries in HTTP "Range" Headers
> 
> 
> 
> [+BCC to www-talk]
> 
> Since RDF graphs and their serializations can be large, 
> client requested server side queries could be a lot more time 
> and bandwidth efficient. For example, if you want to query a 
> 10MB RDF file via HTTP, at the moment you have to download 
> the entire file and then perform the query. Why not just send 
> the query to the server, have it perform the query, and send 
> back only those triples that match? It could even cache 
> queries and their results.
> 
> HTTP 1.1 includes a "Range" header that was created, it 
> appears, for just this sort of process. RFC 2616 only 
> defines an operation to get byte ranges at the moment, but it 
> kinda leaves the door open for all sorts of things, including 
> text based matching (grep via Range, anybody?), and RDF queries.
> 
> I say "kinda" since there are a couple of oddities. Firstly, 
> the grammar for the Range header in RFC 2616 14.35 restricts 
> itself to only allowing "bytes" as a range request type. Yet 
> in section 3.12 ibid., it clearly indicates that other range 
> units are to be allowed by defining the "other-range-unit" 
> production. The production, as part of range-unit, is even 
> used again in the production for the Accept-Ranges header.
> 
> Secondly, even if the range request types are extensible, it 
> isn't clear how one should go about registering them. If URIs 
> could be used to indicate the type of the content, then that 
> wouldn't be a problem. Unfortunately, however, HTTP 1.1 
> doesn't allow for using URIs as range-units, since colons and 
> slashes and all sorts of characters aren't permitted:-
> 
> [[[
>       range-unit       = bytes-unit | other-range-unit
>       bytes-unit       = "bytes"
>       other-range-unit = token
> ]]] - RFC 2616, 3.12
> 
> [[[
>        token          = 1*<any CHAR except CTLs or separators>
>        separators     = "(" | ")" | "<" | ">" | "@"
>                       | "," | ";" | ":" | "\" | <">
>                       | "/" | "[" | "]" | "?" | "="
>                       | "{" | "}" | SP | HT
> ]]] - RFC 2616, 2.2
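
The two productions quoted above can be checked mechanically. The following
is a small Python sketch of the "token" rule; the function name is_token is
made up for this example, and the character classes follow RFC 2616 section
2.2 (CHAR is US-ASCII 0-127, CTL is 0-31 plus 127).

```python
# Separators from RFC 2616, section 2.2 (the last two are SP and HT).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

def is_token(s):
    """True if s matches token = 1*<any CHAR except CTLs or separators>."""
    return len(s) > 0 and all(
        31 < ord(c) < 127 and c not in SEPARATORS
        for c in s
    )

print(is_token("bytes"))                     # a legal range-unit
print(is_token("http://purl.org/net/sbp/"))  # rejected: contains ":" and "/"
```

Any URI with a scheme fails the check because of the colon alone, which is
why a URI cannot be used directly as an other-range-unit.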
> 
> One could hack around this by using "uri" as the 
> (other-)range-unit, and then using the first token of its 
> value as the URI for the type of the rest of the contents.
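
A sketch of how a server might take such a header value apart, in Python.
The header layout ("uri=", then a type URI, then a space, then the query),
the function name parse_uri_range and the example.org URI are all
assumptions for illustration; nothing like this is defined by RFC 2616,
whose Range grammar only admits byte ranges.

```python
def parse_uri_range(header_value):
    """Split 'uri=<type-URI> <rest>' into (type_uri, rest)."""
    unit, sep, value = header_value.partition("=")
    if sep != "=" or unit.strip() != "uri":
        raise ValueError("not a 'uri' range request")
    # The first whitespace-delimited word names the type of what follows.
    type_uri, _, rest = value.strip().partition(" ")
    return type_uri, rest.strip()

# Hypothetical header value; the type URI and query syntax are made up.
print(parse_uri_range("uri=http://example.org/query-lang (?x ?p ?o)"))
```
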
> 
> But ignoring those issues for now, there is also the issue of 
> which query language(s) to support. It doesn't make sense to 
> use any query language that indicates which URIs to get the 
> content from, of course, so that rules out the SQL-ish 
> variants unless the "FROM" field was ignored. It makes the 
> sub-graph/formula type queries seem more attractive, perhaps 
> with a separate constraints field.
> 
> It wouldn't be (too) difficult a thing to implement, either; 
> anyone with an Apache server could use a standard CGI 
> handler, and set a two line .htaccess file to handle *.rdf 
> with it. The hardest parts are the caching and deciding which 
> queries would be too processor intensive, of course. Perhaps 
> one could support various levels of querying: for example, 
> queries without variables are usually less intensive than 
> queries with variables.
> 
> Sidenote: since queries have so much potential metadata 
> attached to them, it almost makes sense for them to be 
> modelled in RDF themselves, but for the fact that one can't 
> represent universally quantified variables in XML/RDF.
> 
> I wonder what WebServices people would make of this sort of 
> deal? Because basically what's going on here is a WebService 
> for the Semantic Web. I think that using a header (with the 
> advantages of graceful failure, etc.) is a better idea than 
> cramming it into a POST request. Are there any other ways in 
> which this could be implemented? Does it need to be 
> implemented at all?
> 
> Cheers,
> 
> P.S. This might be on topic for www-rdf-rules as well... oh 
> well. It's difficult deciding how many people to annoy by 
> cross-posting when you're writing about something that 
> touches on so many different subjects.
> 
> --
> Sean B. Palmer, <http://purl.org/net/sbp/>
> "phenomicity by the bucketful" - http://miscoranda.com/
>
Received on Tuesday, 3 June 2003 04:11:14 UTC