- From: Norman Gray <norman@astro.gla.ac.uk>
- Date: Thu, 03 Mar 2016 13:00:41 +0000
- To: "Stian Soiland-Reyes" <soiland-reyes@cs.manchester.ac.uk>
- Cc: "Daniel Garijo" <dgarijo@fi.upm.es>, "Ian Dunlop" <ianwdunlop@gmail.com>, "Pemanent Identifier CG" <public-perma-id@w3.org>
Stian and all, hello.

On 3 Mar 2016, at 11:09, Stian Soiland-Reyes wrote:

> One thing that comes up when I'm looking at the purl.org redirections
> is that there's often the with and without slash variants.. e.g.
> http://purl.org/pav and http://purl.org/pav/hasVersion -- it would be
> good if these could be represented within /pav/rules.csv rather than
> also have a line in /rules.csv -- perhaps the special values should
> be empty string for "folder" and "." for "folder/".

That could be represented by a magic value in the "src" column -- say "<" or "<>" (which can't appear in a URI path segment). Presuming that the implementation would not parse each rules.csv file dynamically, but would ingest them in a preprocessing step, the fact that this applies to the 'parent' path need not be a problem.

> As for the existing w3id.org htaccess rules, I think any non-slash
> folder usage now is indirect through Apache's own directory matching,
> e.g. https://w3id.org/bundle works as it should by a 301 Moved
> Permanently to https://w3id.org/bundle/ which then does 302 Found to
> its final destination.

These and others could potentially be handled by a couple of lines which are implicitly appended to each rules.csv file. If the logic is that each input URI is handled by the first row in rules.csv which has a match in column 1, then these can be overridden easily, but still provide consistent behaviour.

The (usual apache) adding-slash behaviour would be

    "^(.*)$","$1/",301,

and error behaviour would be the catch-all

    "^.*$","http://purl.org/admin/error.html",404,

It might also be worth specifying that the beginning and end of column 1 are implicitly anchored with "^...$", where the beginning matches the beginning of the part of the URI path which starts at the current path component.
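The first-match logic, the implicit anchoring and the implicitly appended catch-all row described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the function name `resolve`, the tuple layout of a row, and the use of `None` to stand for "the context path itself, without a trailing slash" are all made up here, and only the catch-all row is appended (the add-slash row would be appended in the same way).

```python
import re

# Catch-all row from the message, implicitly appended after the user's rows.
IMPLICIT_TAIL = [(".*", "http://purl.org/admin/error.html", 404)]

# Magic "src" value meaning the context path itself, with no trailing slash.
PARENT = "<>"


def resolve(rules, rel_path):
    """Return (target, status) from the first row whose src matches.

    rel_path is the part of the URI path after "<context>/", or None when
    the request is for the context path itself (hypothetical convention).
    Returns None if no row applies.
    """
    for src, dst, status in list(rules) + IMPLICIT_TAIL:
        if src == PARENT:
            if rel_path is None:
                return dst, status
            continue
        if rel_path is None:
            # Regex rows only apply to paths below the context.
            continue
        # fullmatch supplies the implicit "^...$" anchoring of column 1.
        m = re.fullmatch(src, rel_path)
        if m:
            # Translate "$1"-style references into Python's "\g<1>" form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", dst)
            return m.expand(template), status
    return None


# Illustrative rows only, for a hypothetical context such as "foo".
EXAMPLE_RULES = [
    ("<>", "http://example.com/home", 302),
    ("", "http://example.com/home-dir", 302),
    ("blog/(.*)/", "http://example.com/blog/post/$1/", 302),
]

print(resolve(EXAMPLE_RULES, "blog/wibble/"))
print(resolve(EXAMPLE_RULES, "myblog/stuff"))
```

Because the user's rows come first, any of them can override the appended row simply by matching earlier.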
Anchoring both ends would be useful since (a) it would probably be good practice to anchor patterns explicitly in any case, and (b) it fits in with the most natural/naive reading of the list which would have the first column match on path elements (ie, principle of least surprise).

Thus adapting your example, we might have

    "<>","http://example.com/home",302,
    "","http://example.com/home-dir","302",""
    "sub/folders/allowed","http://example.com/flat.html","302",""
    "sub/folders/allowed.*","http://example.com/flatter.html","302",""
    "blog/(.*)/","http://example.com/blog/post/$1/","302",""

Since 302 Found would be the most typical status code for this service, perhaps that could be the default if the "statuscode" column is empty.

If this was in a directory "foo" (or, more abstractly, if this were being interpreted in a 'context' 'foo'), then we'd have mappings

    .../foo -> http://example.com/home
    .../foo/ -> http://example.com/home-dir
    .../foo/sub/folders/allowed -> http://example.com/flat.html
    .../foo/sub/folders/allowed/bar -> http://example.com/flatter.html
    .../foo/blog/wibble/ -> http://example.com/blog/post/wibble/
    .../foo/blog/wibble/woot/ -> http://example.com/blog/post/wibble/woot/
    .../foo/blog/wibble/woot -> http://purl.org/admin/error.html
    .../foo/myblog/stuff -> http://purl.org/admin/error.html
        (and not http://example.com/blog/post/stuff)

All the best,

Norman

-- 
Norman Gray : https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
Received on Thursday, 3 March 2016 13:01:13 UTC