- From: Shane McCarron <shane@aptest.com>
- Date: Tue, 1 Dec 2015 16:21:23 -0600
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Cc: Norman Gray <norman@astro.gla.ac.uk>, Permanent Identifier CG <public-perma-id@w3.org>
- Message-ID: <CAOk_reFi7z37g8ZjfgAipxZ=VMwCvSzHpecm0QOFniQ_FY7v0w@mail.gmail.com>
Good news - the W3C (almost) has a Recommendation for CSV. Finally standardizing the format.

http://www.w3.org/TR/2015/PR-tabular-data-model-20151117/

On Tue, Dec 1, 2015 at 2:15 PM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:

> Thank you,
>
> I agree on all points :))
>
> Some simple file format should be sufficient for most cases, then generate
> to $currentMainstream technology, or translate to $otherFormat.
>
> What I like about CSV - as awkwardly unspecified as it is - is that it is
> easy for non-techies to understand, so we could keep the current GitHub
> pull request model for a while.
>
> YAML could be another candidate, but then there's indentation to worry
> about. JSON and XML are hard to hand-edit.
>
> Content negotiation is one "fancy" thing that has been mentioned on the w3id
> list (purl.org doesn't do this), but if we were to agree to support it,
> it should be enough to add that later as an additional mediaType column or
> similar.
>
> OK, so let's get some quick code repository up and running that can do
> csv->htaccess or similar! Python or Ruby? :) (deliberately not proposing
> Node.js here..)
>
> Former OCLC/purl guys, what is the current schema, or in what form could
> we get the data? I can kind of deduce most of it from the UI and have seen the
> documentation for batch updates (which I never got to work myself).
>
> On 30 Nov 2015 17:04, "Norman Gray" <norman@astro.gla.ac.uk> wrote:
>
>> Greetings.
>>
>> [apologies for the delay here: it was Stian's recent message that
>> reminded me I wanted to reply to his earlier one]
>>
>> On 23 Nov 2015, at 11:45, Stian Soiland-Reyes wrote:
>>
>>> The danger I see with proposing some new $shinyServerSoftware is that we can
>>> easily bind ourselves into the same trap as purl.org - becoming high maintenance
>>> sysadmin-wise, and potentially relying on abandoned technology.
>>> Apache HTTP server also scales very well, and you can't say it's proprietary
>>> or at immediate risk of being abandoned. :)
>>
>> I think we should remember that the _short term_ here is one or two
>> decades, and that in this context the 'long term' implies preservation
>> 'beyond one technology generation', and thus axiomatically dealing with
>> Apache httpd's successor, rather than merely what mod_rewrite's manual
>> looks like in 2025.
>>
>> I don't think we have to worry about the transition to URLs' successor,
>> since PURL++ will surely be swept up in whatever web-wide transition path
>> that requires.
>>
>> Thus I believe that at this stage we should not be thinking of
>> $shinyServerSoftware at all, but of what the preservation data format is
>> (.csv files?) and how the schema is documented (.txt files). Turning that
>> into an actual service (doubtless using httpd and .htaccess files to begin
>> with) is Just A Matter Of Code.
>>
>> So...
>>
>>> What I like are the ideas that have been proposed to have a kind of "build"
>>> stage with more manageable CSV files or something, that then "compile"
>>> into .htaccess or XML or whatever you fancy using a
>>> straightforward Python/Ruby/nodejs script.
>>>
>>> [...]
>>>
>>> This would also mean that libraries and researchers could use & archive
>>> the w3id "database" without having to parse .htaccess or do thousands of
>>> HTTP requests. (We might want to clarify the license on that database!)
>>
>> ...I think I'm agreeing with Stian here, but possibly being more emphatic
>> about it.
>>
>> So, a concrete question: what is the format of the purl.org data? I
>> imagine a rather simple db schema. For the reasons above, I think that we
>> should not regard a tree of .htaccess files as anything other than
>> disposable, or intermediate, implementation technology.
>>
>>>>> but we would need to migrate the existing w3id.org <http://w3id.org/>
>>>>> PURLs forward, I think.
>>>>
>>>> In the same spirit, is that _really_ the case?
>>>
>>> Not migrating would undermine the whole reason for having w3id.org -
>>> how would anyone trust us if we suddenly wiped the existing
>>> identifiers?
>>
>> First: I doubt it would be necessary in fact to abandon anything at
>> w3id.org. That said, I think it would be good to retain the option in
>> principle. If there's a robust model for purl++ which happens to undermine
>> one or two of the more creative current .htaccess redirections, then the
>> long-term preservability (ie, 2--10 decades) is arguably more important
>> than preserving a redirection that's been in existence for only a fraction
>> of that. This is an argument about priorities, not a proposal for deletion.
>>
>> That would count as a decent rationale for 'deaccessioning' those w3ids
>> (and deaccessioning is something archivists are forced to do from time to
>> time).
>>
>>> The current collection should be quite manageable to convert manually in a
>>> couple of days - so I don't see this as a big issue.
>>
>> Ditto.
>>
>>> Being able to support pretty much all of the existing purl.org redirects is
>>> however much more important. They should all be rewritable to .htaccess
>>
>> ...or to something which is mechanically implementable as .htaccess.
>>
>> All the best,
>>
>> Norman
>>
>> --
>> Norman Gray : https://nxg.me.uk
>> SUPA School of Physics and Astronomy, University of Glasgow, UK

--
Shane McCarron
Managing Director, Applied Testing and Technology, Inc.
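
[Editorial illustration, not part of the original message.] The csv->htaccess "build" step discussed in this thread could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the column names `path`, `target` and `status`, the default of 302, and the example URLs are hypothetical, since the thread never settles on a schema. It also assumes paths contain no whitespace and targets contain no characters that mod_rewrite treats as backreferences (`$`, `%`).

```python
import csv
import io
import re

# Redirect codes this sketch accepts (an assumption, not a w3id rule).
ALLOWED_STATUS = {"301", "302", "303", "307"}


def csv_to_htaccess(csv_text: str) -> str:
    """Compile redirect rows from CSV text into mod_rewrite directives."""
    lines = ["RewriteEngine On"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # In .htaccess context the pattern is matched without a leading slash.
        path = (row.get("path") or "").strip().lstrip("/")
        target = (row.get("target") or "").strip()
        status = (row.get("status") or "").strip() or "302"
        if not path or not target:
            continue  # skip incomplete rows rather than emit a broken rule
        if status not in ALLOWED_STATUS:
            raise ValueError(f"unsupported status {status!r} for {path!r}")
        # Escape the path so it is matched literally, not as a regex.
        lines.append(f"RewriteRule ^{re.escape(path)}$ {target} [R={status},L]")
    return "\n".join(lines) + "\n"


# Hypothetical input: two redirects, the second relying on the default status.
EXAMPLE = """path,target,status
people/alice,https://example.org/~alice,302
vocab/v1,https://example.org/ns/v1,
"""

if __name__ == "__main__":
    print(csv_to_htaccess(EXAMPLE), end="")
```

Keeping the CSV as the source of truth means the generated .htaccess stays disposable, as Norman argues: a later target (another server's config, or an extra mediaType column for content negotiation) would only need a different emitter over the same rows.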
Received on Tuesday, 1 December 2015 22:22:02 UTC