Re: Request for comment on "digital preservation of government records" use case from Adam Retter on 2014-03-12 (public-csv-wg@w3.org from March 2014)

From: Adam Retter <adam@exist-db.org>
Date: Wed, 12 Mar 2014 18:33:59 +0000
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <CAJKLP9aZdG0F9sq1wgXGx8XZ3xZTDXmyP=QZJYKGP1Uep260Tw@mail.gmail.com>

Hi Jeremy,

I have just seen this and will try to address it in the next few
days. I think some of the issues described stem from how my use-cases
were interpreted, perhaps because I did not explain them well enough.
I just want to clarify some of your issues with you, to make sure I
understand what you are asking:

* Reference to Data Definition Resource (DDR) for a transcription of
records is required.
Are you looking for the CSV Schemas that we use for validating our CSV
files? If so there are quite a few different ones for different
metadata collections, do you just want one or two of the more
interesting ones?

* Provide reference to TNA Records discovery catalogue.
I am not sure what value this would provide. The Discovery system is
entirely separate from the project I am working on; a subset of our
metadata (that originally came in CSV) is exported to Discovery in XML
format.

* Use case also requires a real example to refer to; both the CSV
metadata and a reference into the catalogue. Please can this show how
the row type leads to different validation being applied?
It is not possible for me to provide real data, as the records that I
am working with for CSV validation are closed records and are not
publicly available.
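
To illustrate the idea of the row type selecting the validation that is
applied, here is a minimal Python sketch using made-up data. It is not
the CSV Validator and not one of the real CSV Schemas; the column names
(type, identifier, checksum), the row types (folder, file) and the
RULES table are all assumptions invented for the example.

```python
import csv
import io

# Hypothetical rules: each row type has its own list of required columns.
RULES = {
    "folder": ["identifier"],
    "file": ["identifier", "checksum"],
}


def validate(csv_text):
    """Return (line_number, message) pairs for rows that break their row-type rules."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        row_type = row.get("type") or ""
        required = RULES.get(row_type)
        if required is None:
            errors.append((line_no, f"unknown row type: {row_type!r}"))
            continue
        for column in required:
            # Short rows yield None for missing cells, so fall back to "".
            if not (row.get(column) or "").strip():
                errors.append((line_no, f"{row_type} row is missing {column}"))
    return errors


# Invented sample: a folder row needs no checksum, a file row does.
SAMPLE = """type,identifier,checksum
folder,a/,
file,a/1.txt,abc123
file,a/2.txt,
"""

print(validate(SAMPLE))
```

Here the folder row passes without a checksum, while the second file
row is reported because file rows require one.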

* Resulting RDF transformed from the example CSV is required.
Again, I cannot really provide this, as this RDF forms part of our
larger graph, which contains sensitive non-public information. Also, I
am not sure this RDF would help anyone, as we are not directly
converting CSV into RDF; rather, we post-process the CSV and eventually
produce RDF which adds properties to our existing graph resources. As
such, this RDF really just represents our own internal weird Digital
Archiving data model, of which properties from multiple CSV files form
a part.

* Please indicate what software language and/or tools/libraries are
used to work with the CSV data - both validation and transformation.
Are you looking for something like:
Validation is done using the CSV Validator open source tool that we
published on GitHub -
https://github.com/digital-preservation/csv-validator - which is written
in Scala. However, this is already mentioned and linked in the body of
the use-case. Regarding transformation, our CSV becomes XML using XSLT
(https://github.com/digital-preservation/csv-tools/blob/master/csv-to-xml_v3.xsl)
and from there we further process it using a mix of XSLT, Java and
Scala; eventually we use XSLT to convert some data model expressed in
XML into a different data model expressed in RDF/XML.
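
As a rough illustration of the first step only (CSV becoming XML), here
is a minimal Python sketch. It is not a port of csv-to-xml_v3.xsl, and
the element and attribute names (csv, row, cell, name) are assumptions
chosen for the example rather than the vocabulary that stylesheet emits.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Convert CSV text with a header line into a simple XML string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    root = ET.Element("csv")  # hypothetical root element name
    for record in reader:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(header, record):
            # Keep the column name in an attribute, so headers that are
            # not valid XML element names still work.
            cell = ET.SubElement(row_el, "cell", {"name": name})
            cell.text = value
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("id,title\n1,Report\n"))
```

The resulting XML could then be handed to later XSLT steps; those
steps, and the final RDF/XML data model, are not shown here.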


I am not sure whether the translation of my use-cases into this new
document is accurate. Once I hear from you about the above, I will try
to do some work on this, and then perhaps you could re-read it and tell
me whether it makes more sense and whether there is a new set of issues.

On 10 March 2014 18:18, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> Hi Adam – I don’t know whether you saw Use Case #1 - Digital preservation of
> government records? There are a number of issues to resolve that I hope you
> can help with. Many thanks, Jeremy
>
>
>
> From: Tandy, Jeremy
> Sent: 21 February 2014 12:58
> To: 'public-csv-wg@w3.org'
> Subject: RE: updated use case document
>
>
>
> I have now added Adam Retter’s digital preservation of government records
> use case to the document. I’ve merged use cases 1 and 2 from the wiki page.
>
>
>
> Adam: I have listed a number of issues with the use case that need more
> input – mainly supporting _real_ examples. I’m looking to you to provide the
> necessary and would also appreciate your feedback on whether I’ve captured
> your issues & concerns.
>
>
>
> As ever feedback welcome from everyone.
>
>
>
> Jeremy
>
>
>
> From: Tandy, Jeremy
> Sent: 20 February 2014 15:42
> To: public-csv-wg@w3.org
> Subject: updated use case document
>
>
>
> All … I’ve begun adding use cases and requirements into the use case and
> requirements document (source is on github(gh-pages) @
> csvw/use-cases-and-requirements/index.html) starting with the surface
> temperatures databank use case. I’ll keep going in this style until told
> otherwise; feedback welcome. Regarding the requirements I’ve synthesised,
> this is only the very beginning – and I’m going to need help to extract,
> disambiguate and validate all the requirements we’ve been inferring so far.
>
>
>
> Jeremy



-- 
Adam Retter

eXist Developer
{ United Kingdom }
adam@exist-db.org
irc://irc.freenode.net/existdb
Received on Wednesday, 12 March 2014 18:34:26 UTC