Re: Problems and Opportunities at purl.org

From: Kerri Lemoie <kerri@achievery.com>
Date: Tue, 1 Dec 2015 08:47:31 -0500
Cc: public-perma-id <public-perma-id@w3.org>, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Message-Id: <149E8041-D902-43BC-AA9E-A5757F3DD189@achievery.com>
To: Norman Gray <norman@astro.gla.ac.uk>

Hi All,

I’m a little late to this thread, but I came across this today and thought it might be worth considering:

http://ipfs.io/

A peer-to-peer distributed file system.

Kerri



> On Nov 30, 2015, at 10:24 AM, Norman Gray <norman@astro.gla.ac.uk> wrote:
> 
> 
> Greetings.
> 
> [apologies for the delay here: it was Stian's recent message that reminded me I wanted to reply to his earlier one]
> 
> On 23 Nov 2015, at 11:45, Stian Soiland-Reyes wrote:
> 
>> The danger I see in proposing some new $shinyServerSoftware is that we can
>> easily bind ourselves into the same trap as purl.org - becoming high maintenance
>> sysadmin-wise, and potentially relying on abandoned technology.
>> Apache HTTP server also scales very well, and you can't say it's proprietary
>> or at immediate risk of being abandoned. :)
> 
> I think we should remember that the _short term_ here is one or two decades, and that in this context the 'long term' implies preservation 'beyond one technology generation', and thus axiomatically dealing with Apache httpd's successor, rather than merely what mod_rewrite's manual looks like in 2025.
> 
> I don't think we have to worry about the transition to URLs' successor, since PURL++ will surely be swept up in whatever web-wide transition path that requires.
> 
> Thus I believe that at this stage we should not be thinking of $shinyServerSoftware at all, but of what the preservation data format is (.csv files?) and how the schema is documented (.txt files).  Turning that into an actual service (doubtless using httpd and .htaccess files to begin with) is Just A Matter Of Code.
> 
> So...
> 
>> What I like are the ideas that have been proposed to have a kind of "build"
>> stage with more manageable CSV files or something, that then "compile"
>> into .htaccess or XML or whatever you fancy using a
>> straightforward Python/Ruby/nodejs script.
>> 
>> [...]
>> 
>> This would also mean that libraries and researchers could use & archive
>> the w3id "database" without having to parse .htaccess or do thousands of HTTP
>> requests.  (We might want to clarify the license on that database!)
> 
> ...I think I'm agreeing with Stian here, but possibly being more emphatic about it.
> 
> So, a concrete question: what is the format of the purl.org data?  I imagine a rather simple db schema.  For the reasons above, I think that we should not regard a tree of .htaccess files as anything other than disposable, or intermediate, implementation technology.
> 
>>>> but we would need to migrate the existing w3id.org <http://w3id.org/>
>>>> PURLs forward, I think.
>>> In the same spirit, is that _really_ the case?
>> 
>> Not migrating would undermine the whole reason for having w3id.org - how would
>> anyone trust to use us if suddenly we wipe the existing
>> identifiers?
> 
> First: I doubt it would be necessary in fact to abandon anything at w3id.org.  That said, I think it would be good to retain the option in principle.  If there's a robust model for purl++ which happens to undermine one or two of the more creative current .htaccess redirections, then the long-term preservability (ie, 2--10 decades) is arguably more important than preserving a redirection that's been in existence for only a fraction of that.  This is an argument about priorities, not a proposal for deletion.
> 
> That would count as a decent rationale for 'deaccessioning' those w3ids (and deaccessioning is something archivists are forced to do from time to time).
> 
>> The current collection should be quite manageable to convert manually in a
>> couple of days - so I don't see this as a big issue.
> 
> Ditto.
> 
>> Being able to support pretty much all of the existing purl.org redirects is
>> however much more important.  They should all be rewritable to .htaccess
> 
> ...or to something which is mechanically implementable as .htaccess.
> 
> All the best,
> 
> Norman
> 
> 
> -- 
> Norman Gray  :  https://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
> 
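
The "build" stage discussed in the quoted message (a plain CSV file as the preserved data, compiled into .htaccess as disposable output) could be sketched roughly as follows. This is only an illustration: the column names `path`, `target` and `status`, the function name `compile_htaccess`, and the default of 302 are all assumptions made here, not the actual purl.org or w3id.org schema, which the thread leaves open. It also assumes targets contain no characters that mod_rewrite treats specially (such as `$` or `%`).

```python
import csv
import io
import re


def compile_htaccess(csv_text):
    """Compile CSV rows into mod_rewrite rules for a .htaccess file.

    Assumed (hypothetical) columns: path, target, status.
    """
    lines = ["RewriteEngine On"]
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # In .htaccess context the pattern is matched without a leading slash.
        path = row["path"].strip().lstrip("/")
        target = row["target"].strip()
        # Fall back to a temporary redirect when no status is given (assumption).
        status = (row.get("status") or "").strip() or "302"
        # Escape the path so it is matched literally, not as a regex.
        pattern = re.escape(path)
        lines.append(f"RewriteRule ^{pattern}$ {target} [R={status},L]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        "path,target,status\n"
        "myvocab,https://example.org/vocab,303\n"
        "foo,https://example.org/foo,\n"
    )
    print(compile_htaccess(sample), end="")
```

Under these assumptions the CSV stays the archivable record, and the same rows could later be compiled for whatever replaces httpd without touching the data.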
Received on Tuesday, 1 December 2015 14:29:48 UTC
