Re: Hosting linked data with an apache web server from Hugh Glaser on 2012-09-25 (public-lod@w3.org from September 2012)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Tue, 25 Sep 2012 22:44:04 +0000
To: Norman Gray <norman@astro.gla.ac.uk>
CC: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, public-lod <public-lod@w3.org>, Kjetil Kjernsmo <kjetil@kjernsmo.net>
Message-ID: <387E72E216DF1247A2F8ED4819C93BA71E166E7C@UOS-MSG00041-SI.soton.ac.uk>
Hi.
I would have thought that what you are asking should be easy, and it actually is; there is certainly no need for all that 303 stuff.
Just use Solution 1 to generate the RDF files, but give each file the same name as its URI, and put them in a directory (or whatever directory structure you want).
Certainly if it is all you can manage, then we (the community) should encourage you to do so.
If you do that, Apache will simply deliver the file with a 200 response containing your RDF. Most consumers will handle this quite well. It isn't the recommended way of publishing Linked Data, but quite a few sites do it, and any consumer would be well-advised to accept it.
The alternative that does fit the recommendation is to use "hash" URIs (see Tim Berners-Lee's original document http://www.w3.org/DesignIssues/LinkedData.html for a description, although many more have been written since).
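Hash URIs work with plain static files because an HTTP client strips the fragment (everything after "#") before it sends the request, so every "...#name" URI in a file resolves to that one document. A small Python illustration, using hypothetical example.org URIs:

```python
from urllib.parse import urldefrag


def document_for(uri: str) -> str:
    """Return the URL an HTTP client actually requests for a (hash) URI.

    The fragment is never sent to the server, so Apache only sees the
    document part and can serve an ordinary static file with a 200.
    """
    return urldefrag(uri).url


# Hypothetical URIs: two resources described in one static RDF file.
for uri in ["http://example.org/data/stars.rdf#vega",
            "http://example.org/data/stars.rdf#sirius"]:
    print(uri, "->", document_for(uri))
```

Both URIs map to the same request, which is why no redirects or rewriting are needed for this pattern.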

One important point here is that you seem to have no intention of publishing HTML: you just want to stuff the RDF onto the web. So you really have no need for 303 redirects and the like. And of course there are now quite a lot of services that will format your RDF as HTML if people want to see it that way. See http://browse.semanticweb.org

Apache on Linux will easily cope with directories holding hundreds of thousands of files (I have quite a lot of those), and I probably have some directories with millions. There was a time when directories that big were a bit of a problem on Linux, but not any more. You may need to learn some Linux skills to do anything with them, because ordinary shell commands will complain that the argument list is too long when a wildcard expands to that many names. But "find" and "xargs" are your friends.

So the best thing is to create RDF files that use hash URIs, so that your users can just unzip them, and to give them an index that describes what you have given them and also points them at http://browse.semanticweb.org
This should be easy, and we want to make it easy for you (right?!).

Sorry, I don't have a script that will run over an RDF file or SPARQL store and realise all the RDF files for the subject URIs, but I can probably knock one up if you tell me exactly what you want.
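One minimal way such a script might look (an illustration, not a script from this thread), assuming the data is available as N-Triples, one triple per line, and that the subject URIs all start with a hypothetical base http://example.org/id/. It groups triples by subject and writes each group to a file whose path mirrors the URI:

```python
import os
from collections import defaultdict

# Hypothetical base URI of the published data; adjust to your own.
BASE = "http://example.org/id/"


def split_by_subject(ntriples_lines, out_dir):
    """Write one N-Triples file per subject URI into out_dir.

    Each file's path mirrors its subject URI with BASE stripped, so the
    triples about <http://example.org/id/star/vega> go to out_dir/star/vega.
    Blank-node subjects, URIs outside BASE and URIs ending in '/' are
    skipped in this sketch. Returns the sorted relative paths written.
    """
    groups = defaultdict(list)
    prefix = "<" + BASE
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or N-Triples comment
        subject = line.split(None, 1)[0]  # IRIs in N-Triples contain no spaces
        if not (subject.startswith(prefix) and subject.endswith(">")):
            continue  # blank node or URI outside BASE
        rel_path = subject[len(prefix):-1]
        if rel_path and not rel_path.endswith("/"):
            groups[rel_path].append(line)

    for rel_path, triples in groups.items():
        path = os.path.join(out_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(triples) + "\n")
    return sorted(groups)
```

It could be run as `split_by_subject(open("data.nt", encoding="utf-8"), "htdocs/id")` (both names hypothetical). Two caveats: Apache still has to be told which media type to send for these extensionless files (the ForceType directive can set one for a whole directory), and a URI that is also a prefix of others (.../star and .../star/vega) cannot be both a file and a directory, which is one reason the hash-URI layout is simpler.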

Best
Hugh

On 24 Sep 2012, at 09:40, Norman Gray <norman@astro.gla.ac.uk> wrote:

> 
> Sebastian, hello.
> 
> (...and pruning semantic-web from the cc list)
> 
> On 24 Sep 2012, at 02:25, Kjetil Kjernsmo <kjetil@kjernsmo.net> wrote:
> 
>>> Here are my questions:
>>> 1. Is this feasible or best practice? I am not sure how many files can
>>> be handled efficiently by Apache.
>> 
>> I'm pretty sure the number of files wouldn't be a problem. Apache would 
>> basically just serve the files in the file system. Indeed, it would be very 
>> cheap to do it this way, and you would get the benefit of correct etags and 
>> last-modified headers right out of the box. However, you would probably need 
>> to handle 303 redirects with mod_rewrite. mod_rewrite is a piece of black 
>> magic I prefer to stay away from.
> 
> Without wishing to disagree with Kjetil's overall point (which I intend to study closely), I wouldn't rule out mod_rewrite, if you still find a static-files-plus-apache solution to be attractive.  For me, mod_rewrite is legerdemain rather than black magic: it needs practice, but it's unlikely to imperil your immortal soul.
> 
> I've included below a template .htaccess file which I've used to (statically) publish a set of SKOS vocabularies (see <http://www.ivoa.net/rdf/Vocabularies/AAkeys>).  This maps <.../foo> to <.../release-n.m/foo/foo.html> or .rdf or .ttl as appropriate.  The vocabularies in this case are single files rather than collections of lots of files, but this should give you the pattern.
> 
> Incidentally, I'm also attaching the check-uris.sh.in script which is configured at the same time as this .htaccess file, and which checks that all of the files are being served as they should be.  The .htaccess file has a nasty syntax, and it's hard to debug, so having a mechanical check like this can be very reassuring, especially after software changes, for example.
> 
> Best wishes,
> 
> Norman
> 
> 
> 
> ## Template Apache .htaccess file for serving vocabularies.
> ## This file is configured at make time, with the following
> ## substitutions:
> ##
> ##    @BASE@ should be the path component of the base URI of the
> ##    vocabulary distribution, that is, without the protocol and
> ##    authority components.
> ##
> ##    @SUBDIR@ is a versioned subdirectory, where the public URIs
> ##    redirect to.  This is not an advertised directory, but it needs
> ##    to be generally accessible.
> ##
> ##    @FILEOPTS@ is a list of the supported vocabulary names,
> ##    separated by vertical bars.  For example 'AOIM|AAkeys'.  This
> ##    lists the vocabularies which this file will actually serve.
> ##
> ## This file is patterned after Recipe 3 in the W3C document 'Best
> ## Practice Recipes for Publishing RDF Vocabularies', at
> ## <http://www.w3.org/TR/swbp-vocab-pub/>
> 
> # Apache .htaccess file
> 
> AddType application/rdf+xml .rdf
> # The MIME type for .n3 should be text/rdf+n3, not application/n3:
> # see MIME notes at <http://www.w3.org/2000/10/swap/doc/changes.html>
> #
> # The MIME type for Turtle is text/turtle, though this has not
> # completed its registration: see
> # <http://www.w3.org/TeamSubmission/turtle/#sec-mediaReg>
> AddType text/rdf+n3 .n3
> AddType text/turtle .ttl
> # For Charset types, see <http://www.iana.org/assignments/character-sets>
> AddCharset UTF-8 .n3
> AddCharset UTF-8 .ttl
> AddCharset UTF-8 .html
> 
> RewriteEngine On
> # This base must match the directory where this file is located.
> RewriteBase @BASE@
> 
> RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
> RewriteRule ^(@FILEOPTS@)$ @SUBDIR@/$1/$1.rdf [R=303,L]
> 
> RewriteCond %{HTTP_ACCEPT} text/rdf\+n3 [OR]
> RewriteCond %{HTTP_ACCEPT} application/n3 [OR]
> RewriteCond %{HTTP_ACCEPT} text/turtle
> RewriteRule ^(@FILEOPTS@)$ @SUBDIR@/$1/$1.ttl [R=303,L]
> 
> # No accept conditions: make the .html version the default
> RewriteRule ^(@FILEOPTS@)$ @SUBDIR@/$1/$1.html [R=303,L]
> 
> 
> 
> -- 
> Norman Gray  :  http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
> <check-uris.sh.in>
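The redirect logic of the template .htaccess quoted above can be checked offline by mirroring it in a few lines of Python. This is a sketch, not the attached check-uris.sh.in (which is not reproduced here): FILEOPTS and SUBDIR are illustrative values standing in for the @FILEOPTS@ and @SUBDIR@ substitutions, and the function only computes the expected 303 target for a request path (relative to RewriteBase) and an Accept header.

```python
import re

# Illustrative stand-ins for the @FILEOPTS@ and @SUBDIR@ substitutions.
FILEOPTS = "AOIM|AAkeys"
SUBDIR = "release-1.0"


def redirect_target(path, accept):
    """Return the expected 303 target, or None if no RewriteRule matches.

    The RewriteRule pattern is anchored (^...$), so use fullmatch; the
    RewriteCond patterns are unanchored, so use search. Rules are tried
    in file order and, like the .htaccess, q-values are ignored.
    """
    m = re.fullmatch(f"({FILEOPTS})", path)
    if m is None:
        return None
    name = m.group(1)
    if re.search(r"application/rdf\+xml", accept):
        ext = "rdf"
    elif re.search(r"text/rdf\+n3|application/n3|text/turtle", accept):
        ext = "ttl"
    else:
        ext = "html"  # no RDF type requested: HTML is the default
    return f"{SUBDIR}/{name}/{name}.{ext}"


print(redirect_target("AAkeys", "text/turtle;q=0.9, */*;q=0.1"))
```

Because the conditions are plain pattern matches tested in rule order, a client sending `text/turtle, application/rdf+xml` is redirected to the RDF/XML file regardless of its preferences. A table of expected targets like this is the sort of thing a checker along the lines of check-uris.sh.in could compare against the Location headers the live server returns.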
Received on Tuesday, 25 September 2012 22:44:59 UTC