Re: Vocabulary for HTTP headers

From: Richard Smith <richard@ex-parrot.com>
Date: Wed, 12 Feb 2014 22:45:05 +0000 (GMT)
To: Semantic Web <semantic-web@w3.org>
Message-ID: <alpine.LRH.2.02.1402122132500.9417@sphinx.mythic-beasts.com>

Martynas Jusevičius wrote:

> I'm not sure what you intend to use the vocabulary for,

Perhaps I should have given a bit more background.  I have a 
tool for managing a collection of files (or links to them). 
The main part is a SPARQL engine that at the moment allows 
me to answer queries like

   "find all pictures of John Smith in France"

by writing queries like

   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   PREFIX schema: <http://schema.org/>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   SELECT ?file
   WHERE {
     ?file a foaf:Image ;
       foaf:depicts [
         a foaf:Person ;
         foaf:name "John Smith" ] ;
       dcterms:spatial [
         a schema:Place ;
         schema:name "France" ] .
   }

That's simplified quite a bit (and if I've introduced errors 
in the process, my apologies), but basically it's all 
working well enough for my purposes.  However, ideally I 
also need to be able to handle queries based on the 
modification date, media type, file length and (for somewhat 
obscure reasons) the ETag header it gets served with.

At the moment I change Last-Modified headers to 
dcterms:modified (and reformat the date to be an 
xsd:dateTime), but Content-Type, Content-Length and ETag 
just get converted into my own custom properties.  If 
something more formal exists, I would much rather use it.
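
As an illustration of the Last-Modified conversion described 
above, here is a minimal sketch in Python.  It is not the 
tool's actual code: the function names are hypothetical, and 
it assumes the dcterms: and xsd: prefixes are declared 
wherever the resulting Turtle line ends up.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def last_modified_to_xsd(header_value: str) -> str:
    """Convert an HTTP-date (e.g. a Last-Modified value) to xsd:dateTime."""
    dt = parsedate_to_datetime(header_value)
    # HTTP dates are defined to be in GMT; treat a missing zone as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def modified_triple(file_iri: str, header_value: str) -> str:
    """Build one Turtle statement using dcterms:modified (hypothetical helper)."""
    stamp = last_modified_to_xsd(header_value)
    return f'<{file_iri}> dcterms:modified "{stamp}"^^xsd:dateTime .'


if __name__ == "__main__":
    print(modified_triple("foo.jpg", "Wed, 12 Feb 2014 22:45:05 GMT"))
```

The other three headers have no such obvious target, which is 
the gap discussed below.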

Your suggestion of <http://purl.org/NET/mediatypes> and 
dcterms:format for Content-Type is a good one, and solves 
that problem.  The trouble with the two obvious choices for 
Content-Length, dcterms:extent and schema:contentSize, is 
that they're underspecified.  In the former case, the 
property's range is 
a dcterms:SizeOrDuration: its value is *not* a literal, 
despite how almost all examples (including the RSS 1.0 spec) 
seem to use it.  I could use it with rdf:value, e.g.:

   <foo.jpg> dcterms:extent [ rdf:value 514090 ] .

But in that example, or as with schema:contentSize, there's 
no indication of what units are in use.  Are they bytes? 
The schema.org spec suggests kB or MB are preferred.

As for the ETag header, I doubt anything suitable exists 
unless there's already a vocabulary of HTTP headers, hence 
my question.

> but for logging there are templates:
> Turtle access log formatter for Apache:
> http://www.ebremer.com/paladin/pipelogger/2013-04-08
> Turtle access log formatter for Tomcat:
> https://gist.github.com/pumba-lt/5656373

They make it easier to use the W3C's HTTP-in-RDF vocabulary, 
but that vocabulary is not suitable, so they don't help here. 
That's because the set of RDF statements produced does not 
reference the URL of the file being fetched, so I cannot add 
it to my SPARQL.  (Yes, I could compose the name in SPARQL 
with a CONCAT of a literal "http:", and the 
http:connectionAuthority and http:absolutePath properties. 
But that's very, very messy, and inefficient in the parsers 
I've tried.)
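
To make that composition concrete, here is a minimal sketch 
in Python of joining the two values outside SPARQL, for 
instance before the log data is loaded.  The function name 
and the default-port handling are assumptions for 
illustration only; in particular it assumes the 
http:connectionAuthority value may carry an explicit port 
such as ":80", which would have to be dropped for the result 
to match the file's usual URL.

```python
# Default ports that are normally omitted from a URL.
DEFAULT_PORTS = {"http": "80", "https": "443"}


def compose_request_url(connection_authority: str,
                        absolute_path: str,
                        scheme: str = "http") -> str:
    """Join an authority and an absolute path into a single URL string."""
    host, sep, port = connection_authority.rpartition(":")
    # Drop the port only when it is the default for the scheme.
    if sep and port == DEFAULT_PORTS.get(scheme):
        connection_authority = host
    # Tolerate a path recorded without its leading slash.
    if not absolute_path.startswith("/"):
        absolute_path = "/" + absolute_path
    return f"{scheme}://{connection_authority}{absolute_path}"


if __name__ == "__main__":
    print(compose_request_url("www.example.org:80", "/foo.jpg"))
```

Doing this once up front avoids repeating the CONCAT in every 
query, at the cost of an extra preprocessing step.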

Received on Wednesday, 12 February 2014 22:45:35 UTC