
Re: Problems and Opportunities at purl.org

From: Norman Gray <norman@astro.gla.ac.uk>
Date: Mon, 30 Nov 2015 15:24:30 +0000
To: public-perma-id <public-perma-id@w3.org>
Cc: "Stian Soiland-Reyes" <soiland-reyes@cs.manchester.ac.uk>
Message-ID: <0D882FC6-E51F-49D0-9A54-53C3425E261F@astro.gla.ac.uk>


[apologies for the delay here: it was Stian's recent message that 
reminded me I wanted to reply to his earlier one]

On 23 Nov 2015, at 11:45, Stian Soiland-Reyes wrote:

> What I see a danger with proposing some new $shinyServerSoftware is that we can
> easily bind ourself into the same trap as purl.org - becoming high maintenance
> sysadmin-wise, and potentially relying on abandoned technology.
> Apache HTTP server also scales very well, and you can't say it's proprietary
> or at immediate risk of being abandoned. :)

I think we should remember that the _short term_ here is one or two 
decades, and that in this context the 'long term' implies preservation 
'beyond one technology generation', and thus axiomatically dealing with 
Apache httpd's successor, rather than merely what mod_rewrite's manual 
looks like in 2025.

I don't think we have to worry about the transition to URLs' successor, 
since PURL++ will surely be swept up in whatever web-wide transition 
path that requires.

Thus I believe that at this stage we should not be thinking of 
$shinyServerSoftware at all, but of what the preservation data format is 
(.csv files?) and how the schema is documented (.txt files).  Turning 
that into an actual service (doubtless using httpd and .htaccess files 
to begin with) is Just A Matter Of Code.
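To make the "build stage" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical CSV layout with one redirect per row and columns path, target and status; this is not an agreed schema, and the example.org/example.net targets are placeholders. It compiles the rows into mod_rewrite RewriteRule lines for a per-directory .htaccess file, treating each path as a literal prefix.

```python
import csv
import io
import re

# Hypothetical preservation format (an assumption, not an agreed schema):
# one redirect per row, columns path, target, status.
SAMPLE_CSV = """path,target,status
vocab/0.1/,https://example.org/vocab/0.1/,302
/project/,https://example.net/project/,301
"""


def compile_htaccess(csv_text: str) -> str:
    """Turn redirect rows into mod_rewrite rules for an .htaccess file."""
    lines = ["RewriteEngine on"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # In per-directory (.htaccess) context the matched path has no
        # leading slash, so strip any that the record carries.
        path = row["path"].lstrip("/")
        status = int(row["status"])
        # Escape regex metacharacters so the path is matched as a literal
        # prefix; whatever follows it is carried over to the target as $1.
        pattern = "^" + re.escape(path) + "(.*)$"
        lines.append(f"RewriteRule {pattern} {row['target']}$1 [R={status},L]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compile_htaccess(SAMPLE_CSV), end="")
```

Because the CSV is the record of truth, the generated .htaccess can be discarded and regenerated, or replaced by some other server's configuration, without touching the preserved data.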


> What I like is the ideas that have been proposed to have a kind of "build"
> stage with more managable CSV files or something, that then "compile"
> into .htaccess or XML or whatever you fancy using a
> straight-forward Python/Ruby/nodejs script.
> [...]
> This would also mean also that libraries and researchers could use & archive
> the w3id "database" without having to parse .htaccess or do thousands of HTTP
> request.  (We might want to clarify the license on that database!)

...I think I'm agreeing with Stian here, but possibly being more 
emphatic about it.

So, a concrete question: what is the format of the purl.org data?  I 
imagine a rather simple db schema.  For the reasons above, I think that 
we should not regard a tree of .htaccess files as anything other than 
disposable, or intermediate, implementation technology.
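As a rough illustration of using such a table directly, without parsing .htaccess or making thousands of HTTP requests, the following sketch resolves an identifier against a hypothetical path/target/status CSV. It assumes longest-prefix matching, which is a choice made here rather than anything purl.org is known to do; mod_rewrite applies rules in file order, so anything generating rewrite rules from the same table would have to emit longer prefixes first to agree with it.

```python
import csv
import io
from typing import Dict, Optional

# Hypothetical table in the same path,target,status layout (an assumption).
SAMPLE_CSV = """path,target,status
vocab/,https://example.org/vocab/,302
vocab/0.1/,https://example.org/archive/vocab-0.1/,301
"""


def load_table(csv_text: str) -> Dict[str, str]:
    """Map each registered path prefix to its redirect target."""
    return {row["path"]: row["target"] for row in csv.DictReader(io.StringIO(csv_text))}


def resolve(table: Dict[str, str], requested: str) -> Optional[str]:
    """Resolve an identifier offline by its longest registered prefix."""
    requested = requested.lstrip("/")
    best = None
    for prefix in table:
        if requested.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return None  # not registered
    # Append whatever follows the matched prefix to the target.
    return table[best] + requested[len(best):]


if __name__ == "__main__":
    table = load_table(SAMPLE_CSV)
    print(resolve(table, "/vocab/0.1/term"))
```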

>>> but we would need to migrate the existing w3id.org PURLs forward, I think.
>> In the same spirit, is that _really_ the case?
> Not migrating would undermine the whole reason for having w3id.org - how would
> anyone trust to use us if suddenly we wipe the existing identifiers?

First: I doubt it would in fact be necessary to abandon anything at 
w3id.org.  That said, I think it would be good to retain the option in 
principle.  If there's a robust model for purl++ which happens to 
undermine one or two of the more creative current .htaccess 
redirections, then long-term preservability (i.e., 2 to 10 decades) is 
arguably more important than preserving a redirection that has been in 
existence for only a fraction of that.  This is an argument about 
priorities, not a proposal for deletion.

That would count as a decent rationale for 'deaccessioning' those w3ids 
(and deaccessioning is something archivists are forced to do from time 
to time).

> The current collection should be quite managable to convert manually in a
> couple of days - so I don't see this as a big issue.


> Being able to support pretty much of all of the existing purl.org redirects is
> however much more important.  They should all be rewritable to .htaccess

...or to something which is mechanically implementable as .htaccess.
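One way to keep the data "mechanically implementable as .htaccess" is to check every record against a deliberately small vocabulary before accepting it. The sketch below, again assuming hypothetical path/target/status columns, flags paths that use regex syntax, targets that are not absolute http(s) URLs, and status codes outside the HTTP redirect codes 301, 302, 303, 307 and 308. Records it reports are candidates for manual review, not automatic deletion.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlsplit

# The small, assumed vocabulary of redirects a record may express.
ALLOWED_STATUS = {301, 302, 303, 307, 308}
REGEX_CHARS = "^$*+?()[]{}|\\"


def problems(row: Dict[str, str]) -> List[str]:
    """List the reasons a record is not a plain prefix redirect."""
    issues = []
    path, target, status = row["path"], row["target"], row["status"]
    if not path or any(c.isspace() for c in path):
        issues.append("path empty or contains whitespace")
    if any(c in path for c in REGEX_CHARS):
        issues.append("path uses regex syntax")
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        issues.append("target is not an absolute http(s) URL")
    if not status.isdigit() or int(status) not in ALLOWED_STATUS:
        issues.append("unsupported status " + status)
    return issues


def check(csv_text: str) -> Dict[str, List[str]]:
    """Report every record that needs a human decision."""
    report = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        issues = problems(row)
        if issues:
            report[row["path"]] = issues
    return report


SAMPLE_CSV = """path,target,status
vocab/,https://example.org/vocab/,302
legacy/(.*),https://example.org/data/$1,302
partial/,/relative/target,303
"""

if __name__ == "__main__":
    for path, issues in check(SAMPLE_CSV).items():
        print(path, "->", "; ".join(issues))
```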

All the best,


Norman Gray  :  https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Received on Monday, 30 November 2015 17:04:49 UTC
