Re: Problems and Opportunities at purl.org from Stian Soiland-Reyes on 2015-12-01 (public-perma-id@w3.org from December 2015)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Tue, 1 Dec 2015 20:15:49 +0000
To: Norman Gray <norman@astro.gla.ac.uk>
Cc: Permanent Identifier CG <public-perma-id@w3.org>
Message-ID: <CAPRnXtmvf0GzON9ETh92A2VrqT7K9JpMePPVUnf9O_2a=qOSSw@mail.gmail.com>
Thank you,

I agree on all points :))

A simple file format should be sufficient for most cases; from it we can
generate configuration for $currentMainstream technology, or translate it to
$otherFormat.
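To make the "translate to $otherFormat" step concrete, here is a rough sketch
using only the Python standard library. The `path` and `target` column names
are made up for illustration; nothing about the schema has been agreed. It
turns a CSV redirect table into JSON.

```python
import csv
import io
import json

# Hypothetical layout (an assumption, not an agreed schema):
# a header row, then one redirect per row with "path" and "target".
SAMPLE_CSV = """path,target
/example/,https://example.org/docs/
/example/ontology,https://example.org/ontology.ttl
"""

def csv_to_json(text: str) -> str:
    """Translate the redirect table into a JSON list of objects."""
    # DictReader uses the header row as the keys of each row dict.
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)

if __name__ == "__main__":
    print(csv_to_json(SAMPLE_CSV))
```

The same reader loop is all anyone would need to archive or reuse the data,
with no .htaccess parsing involved.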

What I like about CSV, awkwardly underspecified as it is, is that it is
easy for non-techies to understand, so we could keep the current GitHub
pull-request model for a while.

YAML could be another candidate, but then there's indentation to worry
about. JSON and XML are hard to hand-edit.

Content negotiation is one "fancy" feature that has been mentioned on the
w3id list (purl.org doesn't do this). If we agree to support it, it should
be easy enough to add later as an additional mediaType column or similar.
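As a sketch of how an optional mediaType column might compile down to
mod_rewrite: a row with a media type gets a `RewriteCond` on the Accept
header in front of its `RewriteRule`. The function name, the 303 status and
the plain substring-style match on Accept (ignoring q-values) are all my
assumptions for illustration, not a proposal for the final behaviour.

```python
import re

def rewrite_block(path: str, target: str, media_type: str = "") -> str:
    """Emit mod_rewrite lines for one redirect row.

    media_type is the hypothetical optional mediaType column; an empty
    string means "redirect regardless of the Accept header".
    """
    # In .htaccess context the matched path has no leading slash.
    # re.escape turns characters like "." into literals for the regex.
    pattern = "^" + re.escape(path.lstrip("/")) + "$"
    lines = []
    if media_type:
        # Crude check: fire only if the Accept header mentions the type.
        # re.escape matters here, e.g. the "+" in application/ld+json.
        lines.append("RewriteCond %{HTTP_ACCEPT} " + re.escape(media_type))
    lines.append(f"RewriteRule {pattern} {target} [R=303,L]")
    return "\n".join(lines)

if __name__ == "__main__":
    # Negotiated rows must come before the unconditional fallback row,
    # because [L] stops processing at the first matching rule.
    print(rewrite_block("/example/ontology",
                        "https://example.org/ontology.ttl", "text/turtle"))
    print(rewrite_block("/example/ontology",
                        "https://example.org/ontology.html"))
```

Ordering is the main thing the generator would have to get right: all
mediaType-specific rows for a path first, then the plain one.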

OK, so let's get a quick code repository up and running that can do
csv->htaccess or similar! Python or Ruby? :) (deliberately not proposing
Node.js here...)
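To get the ball rolling, a minimal csv->htaccess sketch in Python, standard
library only. Again the `path`/`target` columns are hypothetical, and so is
the choice of 302 as the default status. It only handles exact-match
redirects; purl.org's partial (prefix) PURLs, or targets containing spaces
or `$`, would need more care.

```python
import csv
import io
import re

# Hypothetical input: a header row plus one redirect per row.
SAMPLE_CSV = """path,target
/example/,https://example.org/docs/
/example/ontology,https://example.org/ontology.ttl
"""

def csv_to_htaccess(text: str, status: int = 302) -> str:
    """Compile a path,target CSV into a minimal mod_rewrite .htaccess."""
    out = ["RewriteEngine on"]
    for row in csv.DictReader(io.StringIO(text)):
        # In .htaccess context the matched path has no leading slash;
        # escape it so "." and friends are matched literally.
        pattern = "^" + re.escape(row["path"].lstrip("/")) + "$"
        out.append(f"RewriteRule {pattern} {row['target']} [R={status},L]")
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    print(csv_to_htaccess(SAMPLE_CSV), end="")
```

The point being that the .htaccess output is disposable: the CSV stays the
thing we preserve, and the script can be rewritten for whatever replaces
httpd.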

Former OCLC/purl guys, what is the current schema, or in what form could we
get the data? I can deduce most of it from the UI, and I have seen the
documentation for batch updates (which I never got to work myself).
On 30 Nov 2015 17:04, "Norman Gray" <norman@astro.gla.ac.uk> wrote:

>
> Greetings.
>
> [apologies for the delay here: it was Stian's recent message that reminded
> me I wanted to reply to his earlier one]
>
> On 23 Nov 2015, at 11:45, Stian Soiland-Reyes wrote:
>
>> The danger I see with proposing some new $shinyServerSoftware is that we
>> can easily fall into the same trap as purl.org: becoming high-maintenance
>> sysadmin-wise, and potentially relying on abandoned technology.
>> Apache HTTP server also scales very well, and you can't say it's
>> proprietary or at immediate risk of being abandoned. :)
>>
>
> I think we should remember that the _short term_ here is one or two
> decades, and that in this context the 'long term' implies preservation
> 'beyond one technology generation', and thus axiomatically dealing with
> Apache httpd's successor, rather than merely what mod_rewrite's manual
> looks like in 2025.
>
> I don't think we have to worry about the transition to URLs' successor,
> since PURL++ will surely be swept up in whatever web-wide transition path
> that requires.
>
> Thus I believe that at this stage we should not be thinking of
> $shinyServerSoftware at all, but of what the preservation data format is
> (.csv files?) and how the schema is documented (.txt files).  Turning that
> into an actual service (doubtless using httpd and .htaccess files to begin
> with) is Just A Matter Of Code.
>
> So...
>
>> What I like are the ideas that have been proposed to have a kind of "build"
>> stage with more manageable CSV files or something, which then "compile"
>> into .htaccess or XML or whatever you fancy using a
>> straightforward Python/Ruby/nodejs script.
>>
>> [...]
>>
>> This would also mean that libraries and researchers could use & archive
>> the w3id "database" without having to parse .htaccess or make thousands of
>> HTTP requests.  (We might want to clarify the license on that database!)
>>
>
> ...I think I'm agreeing with Stian here, but possibly being more emphatic
> about it.
>
> So, a concrete question: what is the format of the purl.org data?  I
> imagine a rather simple db schema.  For the reasons above, I think that we
> should not regard a tree of .htaccess files as anything other than
> disposable, or intermediate, implementation technology.
>
>>>> but we would need to migrate the existing w3id.org
>>>> PURLs forward, I think.
>>>>
>>> In the same spirit, is that _really_ the case?
>>>
>>
>> Not migrating would undermine the whole reason for having w3id.org: how
>> would anyone trust us if we suddenly wiped the existing identifiers?
>>
>
> First: I doubt it would be necessary in fact to abandon anything at
> w3id.org.  That said, I think it would be good to retain the option in
> principle.  If there's a robust model for purl++ which happens to undermine
> one or two of the more creative current .htaccess redirections, then the
> long-term preservability (ie, 2--10 decades) is arguably more important
> than preserving a redirection that's been in existence for only a fraction
> of that.  This is an argument about priorities, not a proposal for deletion.
>
> That would count as a decent rationale for 'deaccessioning' those w3ids
> (and deaccessioning is something archivists are forced to do from time to
> time).
>
>> The current collection should be quite manageable to convert manually in a
>> couple of days, so I don't see this as a big issue.
>>
>
> Ditto.
>
>> Being able to support pretty much all of the existing purl.org redirects
>> is, however, much more important.  They should all be rewritable to
>> .htaccess.
>>
>
> ...or to something which is mechanically implementable as .htaccess.
>
> All the best,
>
> Norman
>
>
> --
> Norman Gray  :  https://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
>
Received on Tuesday, 1 December 2015 20:16:24 UTC