
Re: Problems and Opportunities at purl.org

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 29 Feb 2016 16:04:37 +0000
Message-ID: <CAPRnXtk8A7Zsf2NsueN9ZB0aW3mNcEZBNt3huWPfjXzrnG8FSw@mail.gmail.com>
To: Norman Gray <norman@astro.gla.ac.uk>
Cc: Permanent Identifier CG <public-perma-id@w3.org>, David Wood <david@3roundstones.com>
I started
https://github.com/stain/w3id-csv

It's quite a simple start, but it reads a CSV file like
https://github.com/stain/w3id-csv/blob/master/purl_example.csv
(which matches the schema David Wood mentioned) and then generates a
bunch of .htaccess files.
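To make the CSV-to-.htaccess step concrete, here is a minimal Python sketch of the current folder-per-entry layout. The column names path, type and target are assumptions for illustration and may not match the real purl_example.csv, and the sample rows are made up.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions, not the
# real purl_example.csv header.
SAMPLE_CSV = """path,type,target
/fred/soup.html,302,http://example.org/soup.html
/pav/,partial,http://example.org/pav/
"""


def read_purls(text):
    """Parse the CSV dump into a list of dicts, one per purl entry."""
    return list(csv.DictReader(io.StringIO(text)))


def naive_htaccess_path(purl_path):
    """One folder per purl entry: /fred/soup.html -> fred/soup.html/.htaccess."""
    return purl_path.strip("/") + "/.htaccess"


if __name__ == "__main__":
    for row in read_purls(SAMPLE_CSV):
        print(row["path"], "->", naive_htaccess_path(row["path"]))
```

Running it prints each purl path next to the .htaccess location the naive approach would create, which shows the soup.html problem described below.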

You can test it on a dummy install of Apache httpd with Docker - see the README.


Obviously the script is currently quite naive in that it makes a folder
for every purl.org entry. Besides creating loads of files, this would be
a bit wrong: e.g. the purl /fred/soup.html would produce
fred/soup.html/.htaccess, which means an intermediate HTTP redirect from
soup.html to soup.html/ (Apache adds the trailing slash when a directory
is requested without one). I've also not yet gone through the different
purl types to do subtree matching or to pick the correct HTTP redirect
status code.
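For the status codes, one possible mapping from purl type to Apache mod_alias directives is sketched below. The type values are assumed to follow PURLZ (301, 302, 303, 307, 410, partial, ...) and should be checked against the purl.org source; using 302 for partial purls is also an assumption, and the directive() name is hypothetical. Exact purls use an anchored RedirectMatch so that /fred/soup.html does not also catch /fred/soup.html/extra, while partial purls use Redirect, which matches by path prefix and appends the rest of the path to the target, giving subtree matching.

```python
import re

# Assumed exact-match purl types (modelled on PURLZ); verify against purl.org.
EXACT_STATUSES = {"301", "302", "303", "307", "410"}


def directive(path, purl_type, target):
    """Return one mod_alias line for a purl, or None if the type is not handled."""
    if purl_type == "partial":
        # Redirect is a prefix match and appends the remaining path to the
        # target, so it gives subtree behaviour (the 302 is an assumption).
        return f"Redirect 302 {path} {target}"
    if purl_type not in EXACT_STATUSES:
        return None  # e.g. "clone", "chain": not handled in this sketch
    # Anchored regex so that only the exact path matches.
    regex = "^" + re.escape(path) + "$"
    if purl_type == "410":
        # Non-3xx statuses take no target URL.
        return f"RedirectMatch 410 {regex}"
    return f"RedirectMatch {purl_type} {regex} {target}"
```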

So one simple improvement would be to check whether the purl.org path
ends with a / or not, and then group the entries without a trailing /
within their parent path, so that folder gets a bigger .htaccess.
However, I think we want to avoid a single large top-level .htaccess for
registrations like http://purl.org/pav that have no trailing /?
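A sketch of that grouping, assuming purl paths are plain strings (the function names are hypothetical): paths ending in / keep their own folder, and everything else goes into the parent's .htaccess. A top-level purl like /pav ends up under the empty key, i.e. the root .htaccess, which is exactly the case the question above is about; this sketch does not resolve it.

```python
import posixpath
from collections import defaultdict


def htaccess_dir(purl_path):
    """Folder whose .htaccess should hold the rule for this purl.

    Paths ending in "/" keep their own folder; others are grouped into
    their parent folder, so /fred/soup.html lands in fred/.htaccess.
    """
    if purl_path.endswith("/"):
        return purl_path.strip("/")
    return posixpath.dirname(purl_path).strip("/")


def group_by_dir(paths):
    """Map each target folder ("" means the top level) to its purl paths."""
    groups = defaultdict(list)
    for p in paths:
        groups[htaccess_dir(p)].append(p)
    return dict(groups)
```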


As for conflicts, the script should be modified to only replace its
"own" files, by having a magic "#header" line.
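The magic-header idea could look roughly like this: the generator only overwrites a .htaccess whose first line is its marker, and leaves any other existing file alone. The marker text and the write_if_ours name are hypothetical.

```python
from pathlib import Path

# Hypothetical marker line; the actual "#header" text is still to be decided.
MAGIC_HEADER = "# Generated by w3id-csv; do not edit by hand"


def write_if_ours(path, rules):
    """Write rules to path unless an existing file lacks the magic header.

    Returns True if the file was written, False if it was left untouched.
    """
    p = Path(path)
    if p.exists():
        with p.open(encoding="utf-8") as f:
            first = f.readline().rstrip("\n")
        if first != MAGIC_HEADER:
            return False  # hand-written or someone else's .htaccess
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(MAGIC_HEADER + "\n" + "\n".join(rules) + "\n", encoding="utf-8")
    return True
```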

We also talked about having a "native" CSV file approach for w3id.org,
so this could then be modified to use a better file format that we can
convert the purl.org dump into.




On 29 February 2016 at 12:29, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:
> Yeah, let's get this going.
>
> So looking at the purl database schema we don't really need the group
> and user stuff to start with (although that could be added to the
> README).
>
> The purls table itself should be sufficient to start. We can find the
> different "type" values in the purl.org source code, I think?
>
>
>
> On 29 February 2016 at 11:58, Norman Gray <norman@astro.gla.ac.uk> wrote:
>>
>> Greetings, all.
>>
>> A little while ago (and this message is a reply to
>> <https://lists.w3.org/Archives/Public/public-perma-id/2015Dec/0001.html>, to
>> resuscitate the thread), there was some interest expressed in a purl.org
>> successor.  That thread ended on a positive note, with David Wood and some
>> others having access to the schema, and OCLC apparently keen on passing
>> forward the current repository.
>>
>> I was asked about purl.org by a colleague today, and this reminded me about
>> last November/December's thread: is there any news about purl.org or the
>> broader preservation plan, that can be passed on?  Or is there any way that
>> I or others could help with this?
>>
>>
>> All the best,
>>
>> Norman
>>
>>
>> --
>> Norman Gray  :  https://nxg.me.uk
>> SUPA School of Physics and Astronomy, University of Glasgow, UK
>>
>
>
>
> --
> Stian Soiland-Reyes, eScience Lab
> School of Computer Science
> The University of Manchester
> http://soiland-reyes.com/stian/work/    http://orcid.org/0000-0001-9842-9718



-- 
Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/    http://orcid.org/0000-0001-9842-9718
Received on Monday, 29 February 2016 16:05:28 UTC