Re: Hosting linked data with an apache web server

On Sunday 23. September 2012 08.42.29 Sebastian Hellmann wrote:
> Dear lists,
> We are currently looking to deploy a medium sized data set as linked
> data: http://thedatahub.org/dataset/masc

Great!


> Here are my questions:
> 1. Is this feasible or best practice? I am not sure how many files can
> be handled efficiently by Apache.

I'm pretty sure the number of files wouldn't be a problem; Apache would 
basically just serve the files straight from the file system. Indeed, it 
would be very cheap to do it this way, and you would get the benefit of 
correct ETag and Last-Modified headers right out of the box. However, you 
would probably need to handle the 303 redirects with mod_rewrite, and 
mod_rewrite is a piece of black magic I prefer to stay away from.
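
That said, for completeness, a minimal mod_rewrite sketch (the /resource/ 
and /page/ paths are made up for the example, so adjust them to your own 
URI scheme; this goes in the relevant VirtualHost):

  # /resource/foo is the thing itself, /page/foo.rdf the file describing it
  RewriteEngine On
  RewriteRule ^/resource/(.+)$ /page/$1.rdf [R=303,L]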

> 2. Is there a conversion script, somewhere, that produces one RDF/XML
> file per subject URI?

You may want to have a look at my Perl module RDF::LinkedData:
https://metacpan.org/module/RDF::LinkedData

If you run Debian or Ubuntu, you can install it using 
apt-get install librdf-linkeddata-perl
Don't be alarmed by the number of dependencies: they are all very small 
modules, and well managed by the Debian ecosystem. The module is available 
in the latest Ubuntu and in Debian testing. The Ubuntu package is not the 
latest version, which is the one I recommend, so you may want to do the 
.deb install first (to pull in the dependencies) and then upgrade the few 
modules that need it, as sketched below.
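
For example, something along these lines (assuming you have cpanminus 
around; the second step is only needed if the packaged version is too old):

  # pull in the packaged module and all its dependencies
  sudo apt-get install librdf-linkeddata-perl
  # then, if needed, upgrade to the latest release from CPAN
  sudo cpanm RDF::LinkedData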

It is not a conversion script, as my personal opinion is that static files 
are sort of a dead end. Instead of static files on the backend, the 
important thing is to have a caching proxy, such as Varnish, in front. 
Such a setup is somewhat more work, but affords you a lot more flexibility.
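
As a rough sketch of the Varnish side (Varnish 3 syntax; the backend port 
is just an assumption about where the Perl application happens to listen):

  backend default {
      .host = "127.0.0.1";
      .port = "5000";
  }

  # cache what the backend produces for an hour; tune to taste
  sub vcl_fetch {
      set beresp.ttl = 1h;
  }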

What the module does is set up a server for your subject URIs: when you 
dereference one, you get a 303 redirect and content negotiation to many 
different serializations, including HTML with RDFa. Optionally, you can 
get a SPARQL endpoint to the data, a VoID description, and so on. I run it 
at http://data.lenka.no/ where you will find the VoID description of my 
small dataset; you can explore from there. Moreover, it supports the 
read-only hypermedia from my ESWC paper: 
http://folk.uio.no/kjekje/2012/lapis2012.xhtml
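
To see the 303 dance and the content negotiation in action, you can do 
something like this (the resource URI is hypothetical, substitute one of 
your own):

  # ask for Turtle; the server should answer 303 See Other and point
  # the Location header at the corresponding data document
  curl -I -H "Accept: text/turtle" http://example.org/resource/foo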


In fact, I haven't bothered to set up a proxy at all, because my site 
gets very little traffic, so you might not need it either. The lack of a 
reverse proxy is the reason why the VoID page is pretty slow: it is 
computed afresh for every client. The setup I run is basically Apache with 
FastCGI and my Perl module RDF::LinkedData, as well as RDF::Endpoint 
(giving a SPARQL 1.1 endpoint using RDF::Query, the working group's 
reference implementation by Gregory Todd Williams), RDF::Generator::Void 
(giving the VoID description) and some other auxiliary modules, providing 
such things as full CORS support. (To get full CORS, you would need a 
dynamic server; static files would not give you all you need as of today, 
I believe.) The script can also run under more modern setups than Apache.
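
For the curious, the FastCGI glue can be as simple as something like this 
(the paths, the socket name and the app.psgi file name are made up for the 
example; it uses mod_fastcgi's external server mode):

  # start the PSGI application as a FastCGI daemon
  plackup -s FCGI --listen /tmp/linkeddata.sock app.psgi

  # and in the Apache configuration:
  FastCgiExternalServer /var/www/linkeddata.fcgi -socket /tmp/linkeddata.sock
  Alias / /var/www/linkeddata.fcgi/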

> It would be nice if we were able to just give data owners the data we
> converted for them as a zip file and say: please unzip this in your
> /var/www to have linked data.

Yeah, I can see how that is attractive, but I think my solution is even 
easier, as it is "here's an RDF file, point the config towards it and 
reload". :-) However, I acknowledge that this works mainly for small 
installs; for larger installs, you would need to run a database server.
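
To illustrate, the configuration pointing the module at a file could look 
roughly like this (I am writing the keys from memory, so they may well 
differ; consult the RDF::LinkedData documentation for the real thing):

  # keys from memory -- check the module's docs for the authoritative list
  {
     "base_uri" : "http://example.org",
     "store" : {
        "storetype" : "Memory",
        "sources" : [ {
           "file" : "/var/lib/data/mydata.ttl",
           "syntax" : "turtle"
        } ]
     }
  }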

Best,

Kjetil

Received on Monday, 24 September 2012 01:26:21 UTC