- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Mon, 29 Feb 2016 17:32:40 +0000
- To: Shane McCarron <shane@aptest.com>
- Cc: Permanent Identifier CG <public-perma-id@w3.org>
On 29 February 2016 at 16:35, Shane McCarron <shane@aptest.com> wrote:
> The only downside to a huge top level .htaccess is the difficulty of editing
> / maintaining it. Otherwise I am not concerned. Apache .htaccess
> processing is efficient enough for these purposes imho.

I guess you meant to reply to the list, so I've CCed it in.

Another issue then is that if we are to allow editing a CSV file to
re-generate .htaccess (rather than a one-off move), then we have to be
extra careful that there aren't any other modifications to the
top-level .htaccess.

I was picturing we could move to a model where you have a folder, like
let's look at

https://github.com/perma-id/w3id.org/blob/master/cwl/

then instead of the current .htaccess there, you could have a CSV file like

https://gist.github.com/stain/c2d668b11b66948b5991

It should be quite easy to generate the corresponding .htaccess from
such files - they can have some headers to warn you:

## DO NOT EDIT
RewriteEngine On
## END DO NOT EDIT

I think we can still do regular expressions (if they start with ^ -
which I think is fair enough), and the src paths are relative to the
folder you are in, so in that example the one with "context" in src
basically means https://w3id.org/cwl/context

The special case then is the folder itself, so either . or empty string.

The Very Advanced Edition can allow full paths like /cwl/context -
where the prefix from the current directory MUST match (or we can even
say this is the required format). This does however not work on the
regular expression side - as RewriteRules in a folder are relative to
their location (naturally).

It's probably better to have a limited number of options, so it's easy
to validate the CSV files before trying to generate the .htaccess.

> On Mon, Feb 29, 2016 at 10:04 AM, Stian Soiland-Reyes
> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>
>> I started
>> https://github.com/stain/w3id-csv
>>
>> it's quite a simple start.. but it uses a CSV file like
>>
>> https://github.com/stain/w3id-csv/blob/master/purl_example.csv
>> which matches the schema David Wood mentioned.
>>
>> and then generates a bunch of .htaccess files.
>>
>> You can test it on a dummy install of Apache httpd with Docker - see the
>> README.
>>
>>
>> Obviously now this script is quite naive in that it makes a folder for
>> every purl.org entry, which (in addition to making loads of files)
>> would be a bit wrong (e.g. the purl /fred/soup.html would make the
>> fred/soup.html/.htaccess which would mean an intermediate HTTP
>> redirect from soup.html to soup.html/) -- and I've not gone through
>> the different types yet to do subtree matching or the correct HTTP
>> redirection status code.
>>
>> So one simple improvement would be to check if the path ends with a /
>> in purl.org or not - and then group those entries within the parent
>> path so there would be a bigger .htaccess. However I think we want to
>> avoid a single large top-level .htaccess for registrations like
>> http://purl.org/pav without a trailing / ?
>>
>>
>> As for conflicts this should be modified to only replace its "own"
>> files by having a magic "#header".
>>
>> We also talked about having a "native" CSV file approach for w3id.org
>> - so this could be modified then to have a better file format that we
>> can convert the purl.org dump into.
>>
>>
>>
>>
>> On 29 February 2016 at 12:29, Stian Soiland-Reyes
>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>> > Yeah, let's get this going.
>> >
>> > So looking at the purl database schema we don't really need the group
>> > and user stuff to start with (although that could be added to the
>> > README).
>> >
>> > the purls table itself should be sufficient to start. We can find the
>> > different "type" values in the purl.org source code I think?
>> >
>> >
>> >
>> > On 29 February 2016 at 11:58, Norman Gray <norman@astro.gla.ac.uk>
>> > wrote:
>> >>
>> >> Greetings, all.
>> >>
>> >> A little while ago (and this message is a reply to
>> >> <https://lists.w3.org/Archives/Public/public-perma-id/2015Dec/0001.html>, to
>> >> resuscitate the thread), there was some interest expressed in a
>> >> purl.org
>> >> successor. That thread ended on a positive note, with David Wood and
>> >> some
>> >> others having access to the schema, and OCLC apparently keen on passing
>> >> forward the current repository.
>> >>
>> >> I was asked about purl.org by a colleague today, and this reminded me
>> >> about
>> >> last November/December's thread: is there any news about purl.org or
>> >> the
>> >> broader preservation plan, that can be passed on? Or is there any way
>> >> that
>> >> I or others could help with this?
>> >>
>> >>
>> >> All the best,
>> >>
>> >> Norman
>> >>
>> >>
>> >> --
>> >> Norman Gray : https://nxg.me.uk
>> >> SUPA School of Physics and Astronomy, University of Glasgow, UK
>> >>
>> >
>> >
>> >
>> > --
>> > Stian Soiland-Reyes, eScience Lab
>> > School of Computer Science
>> > The University of Manchester
>> > http://soiland-reyes.com/stian/work/
>> > http://orcid.org/0000-0001-9842-9718
>>
>>
>>
>> --
>> Stian Soiland-Reyes, eScience Lab
>> School of Computer Science
>> The University of Manchester
>> http://soiland-reyes.com/stian/work/
>> http://orcid.org/0000-0001-9842-9718
>
>
>
> --
> Shane McCarron
> Managing Director, Applied Testing and Technology, Inc.

--
Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718
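
For illustration only, here is a minimal Python sketch of the per-folder CSV to .htaccess generation this message describes. It is not the w3id-csv script. The column names (src, target, code), the allowed status codes, the 302 default and the function names are all assumptions made for the example; the actual gist may use a different layout. It follows the rules proposed above: a src starting with ^ is kept as a regular expression, any other src is a literal path relative to the folder, and . or an empty string means the folder itself.

```python
import csv
import io
import re

# Marker lines proposed in the message; everything between them is generated.
HEADER = "## DO NOT EDIT"
FOOTER = "## END DO NOT EDIT"

# Assumption: a deliberately small set of redirect codes, so validation stays easy.
ALLOWED_CODES = {"301", "302", "303", "307"}


def rule_pattern(src):
    """Turn a CSV src value into a per-directory RewriteRule pattern."""
    src = src.strip()
    if src in ("", "."):
        # The folder itself: in .htaccess context the remaining path is empty.
        return "^$"
    if src.startswith("^"):
        # Already a regular expression, relative to the folder; pass it through.
        return src
    # Literal relative path: escape regex metacharacters and anchor both ends.
    return "^" + re.escape(src) + "$"


def generate_htaccess(csv_text):
    """Validate the CSV rows and return the generated .htaccess text.

    Assumed (hypothetical) columns: src, target, code.
    """
    lines = [HEADER, "RewriteEngine On"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        src = row["src"] or ""
        target = (row["target"] or "").strip()
        code = (row.get("code") or "302").strip()
        if not target:
            raise ValueError("missing target for src %r" % src)
        if any(ch.isspace() for ch in src.strip() + target):
            raise ValueError("whitespace is not allowed in src or target")
        if code not in ALLOWED_CODES:
            raise ValueError("unsupported redirect code %r" % code)
        lines.append("RewriteRule %s %s [R=%s,L]" % (rule_pattern(src), target, code))
    lines.append(FOOTER)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "\n".join([
        "src,target,code",
        "context,https://example.org/cwl/context.json,303",
        ".,https://example.org/cwl/,302",
    ])
    print(generate_htaccess(example), end="")
```

Validating every row before emitting anything keeps the option set small, as suggested above, and the marker lines give a later run a way to recognise which part of a file it is allowed to replace.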
Received on Monday, 29 February 2016 17:33:29 UTC