- From: Kjetil Kjernsmo <kjetil@kjernsmo.net>
- Date: Mon, 24 Sep 2012 03:25:22 +0200
- To: semantic-web@w3.org
- Cc: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, public-lod <public-lod@w3.org>
On Sunday 23. September 2012 08.42.29 Sebastian Hellmann wrote:
> Dear lists,
> We are currently looking to deploy a medium sized data set as linked
> data: http://thedatahub.org/dataset/masc

Great!

> Here are my questions:
> 1. Is this feasible or best practice? I am not sure, how many files can
> be handled, efficiently by Apache.

I'm pretty sure the number of files wouldn't be a problem. Apache would basically just serve the files from the file system. Indeed, it would be very cheap to do it this way, and you would get the benefit of correct ETag and Last-Modified headers right out of the box. However, you would probably need to handle the 303 redirects with mod_rewrite, and mod_rewrite is a piece of black magic I prefer to stay away from.

> 2. Is there a conversion script, somewhere, that produces one RDF/XML
> file per subject URI?

You may want to have a look at my Perl module RDF::LinkedData:
https://metacpan.org/module/RDF::LinkedData

If you run Debian or Ubuntu, you can install it with

    apt-get install librdf-linkeddata-perl

Don't be alarmed by the number of dependencies; they are all very small modules, and well managed by the Debian ecosystem. The module is available in the latest Ubuntu and in Debian testing. The Ubuntu package is not the latest version, which is the one I recommend, so you may want to install the .deb first and then upgrade the modules that need it.

It is not a conversion script, as my personal opinion is that static files are something of a dead end. Rather than static files on the backend, the important thing is to have a caching proxy such as Varnish in front. That setup is somewhat more work, but it affords you a lot more flexibility.

What the module does is set up a server that takes subject URIs; when you dereference one, you get a 303 redirect and content negotiation to many different serializations, including HTML with RDFa. Optionally, you can also get a SPARQL endpoint to the data, a VoID description, and so on. I run it at http://data.lenka.no/ where you will find the VoID description of my small dataset; you can explore from there. Moreover, it supports the read-only hypermedia from my ESWC paper:
http://folk.uio.no/kjekje/2012/lapis2012.xhtml

In fact, I haven't bothered to set up a proxy at all, because my site gets very little traffic, so you might not need one either. The lack of a reverse proxy is the reason why the VoID page is pretty slow: it is regenerated for every client.

The setup I run is basically Apache with FastCGI and my Perl module RDF::LinkedData, along with RDF::Endpoint (giving a SPARQL 1.1 endpoint using the working group's reference implementation, RDF::Query by Gregory Todd Williams), RDF::Generator::Void (giving the VoID description), and some other auxiliary modules providing such things as full CORS support. To get full CORS you would need a dynamic server; static files would not give you everything you need as of today, I believe. The script can also run under more modern setups than Apache.

> It would be nice, if we were able to just give data owners the data, we
> converted for them, as a zip file and say: please unzip in your
> /var/www to have linked data.

Yeah, I can see why that is attractive, but I think my solution is even easier, as it is "here's an RDF file, point the config at it and reload". :-) However, I acknowledge that this works mainly for small installs; larger installs would need to run a database server.

Best,

Kjetil
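
P.S. For completeness, here is the sort of mod_rewrite configuration I have in mind for the static-file route. This is an untested sketch; the /resource and /data paths, and the one-file-per-subject layout, are only placeholder assumptions, not anything taken from the MASC data set:

    # Serve static RDF/XML documents and 303-redirect subject URIs to them.
    # The URI layout below is only an example.
    AddType application/rdf+xml .rdf

    RewriteEngine On
    # A request for a subject URI like /resource/foo gets a
    # 303 See Other pointing at the data document /data/foo.rdf
    RewriteRule ^/resource/(.+)$ /data/$1.rdf [R=303,L]

With something along those lines Apache serves the documents straight from disk, but you can also see why I would rather let the module handle the redirects and content negotiation for me.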
Received on Monday, 24 September 2012 01:26:21 UTC