RE: Request for comment on "digital preservation of government records" use case from Tandy, Jeremy on 2014-03-14 (public-csv-wg@w3.org from March 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 14 Mar 2014 15:25:06 +0000
To: Adam Retter <adam@exist-db.org>
CC: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE2B4AC50@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>
Hi Adam

Apologies if my translation of your use case was inaccurate; always a risk when shaping content supplied by others. 

You indicate that you are unable to provide CSVs of the "Transcriptions of Records" instances as these are closed records. That's a shame as I think the tangible examples enable readers to better understand the issues.

Regarding the RDF, what I had hoped to see was how the columns in the Transcriptions of Records mapped onto RDF. This would provide insight regarding the semantics you were trying to capture.

* Reference to Data Definition Resource (DDR) [...]

Given that we can't include a real example instance, can you provide a small number of "noteworthy" CSV schemas and, perhaps, a couple of test files that are conformant?

* Provide reference to TNA Records discovery catalogue

Without a specific "Transcriptions of Records" instance in the example, I agree that this adds no value. I had hoped to "close the circle" from supplied CSV to final exposure in the catalogue.

* Please indicate what software language and/or tools/libraries [...]

The information you added about XML/XSLT processing is useful; a good addition to the use case.

...

I've quickly amended the use case to pare back the issue to only indicate the anticipated inclusion of CSV schema examples and add information about the transformation. See: http://w3c.github.io/csvw/use-cases-and-requirements/#UC-DigitalPreservationOfGovernmentRecords 

I'm happy to incorporate additional changes that you feel are necessary.

However, please note that we're aiming for a cutoff for FPWD on Monday (well, Sunday really, as I am travelling to the US on Monday), so there is almost no time to include amendments. That said, I anticipate making amendments as the comments from FPWD come in, so I can include further amendments at that time.

Jeremy

-----Original Message-----
From: Adam Retter [mailto:adam@exist-db.org] 
Sent: 12 March 2014 18:34
To: Tandy, Jeremy
Cc: W3C CSV on the Web Working Group
Subject: Re: Request for comment on "digital preservation of government records" use case

Hi Jeremy,

I have just seen this and will try and address it in the next few days. I think some of the described issues have come about from the interpretation of my use-cases. I think perhaps these were not well enough explained by me. I just wanted to clarify some of your issues with you, to make sure I understand what you are asking:

* Reference to Data Definition Resource (DDR) for a transcription of records is required.
Are you looking for the CSV Schemas that we use for validating our CSV files? If so, there are quite a few different ones for different metadata collections; do you just want one or two of the more interesting ones?

* Provide reference to TNA Records discovery catalogue.
I am not sure what value this would provide? The Discovery system is entirely separate to the project I am working on; a sub-set of our metadata (that originally came in CSV) is exported to Discovery in XML format.

* Use case also requires a real example to refer to; both the CSV metadata and a reference into the catalogue. Please can this show how the row type leads to different validation being applied?
It is not possible for me to provide real data, as the records that I am working with in respect to CSV Validation are closed records and are not publicly available.

* Resulting RDF transformed from the example CSV is required.
Again, I cannot really provide this, as this RDF forms part of our larger graph, which contains sensitive non-public information. Also, I am not sure this RDF would help anyone, as we are not directly converting CSV into RDF; rather, we post-process the CSV and eventually produce RDF which adds properties to our existing graph resources. As such, this RDF really just represents our own internal weird Digital Archiving data model, of which properties from multiple CSV files form a part.

* Please indicate what software language and/or tools/libraries are used to work with the CSV data - both validation and transformation.
Are you looking for something like:
Validation is done using the CSV Validator Open Source tool that we published on GitHub - https://github.com/digital-preservation/csv-validator - which is written in Scala. However, this is already mentioned and linked in the body of the use-case. Regarding transformation, our CSV becomes XML using XSLT
(https://github.com/digital-preservation/csv-tools/blob/master/csv-to-xml_v3.xsl)
and from there we further process it using a mix of XSLT, Java and Scala; eventually we use XSLT to convert some data model expressed in XML into a different data model expressed in RDF/XML.


I am not sure if the translation of my use-cases into this new document is accurate. Once I hear from you about the above, I will try and do some work on this, and then perhaps you could re-read and tell me if it makes more sense and if there are a new set of issues.

On 10 March 2014 18:18, Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> Hi Adam – I don’t know whether you saw Use Case #1 - Digital 
> preservation of government records? There are a number of issues to 
> resolve that I hope you can help with. Many thanks, Jeremy
>
>
>
> From: Tandy, Jeremy
> Sent: 21 February 2014 12:58
> To: 'public-csv-wg@w3.org'
> Subject: RE: updated use case document
>
>
>
> I have now added Adam Retter’s digital preservation of government 
> records use case to the document. I’ve merged use cases 1 and 2 from the wiki page.
>
>
>
> Adam: I have listed a number of issues with the use case that need 
> more input – mainly supporting _real_ examples. I’m looking to you to 
> provide the necessary and would also appreciate your feedback on 
> whether I’ve captured your issues & concerns.
>
>
>
> As ever feedback welcome from everyone.
>
>
>
> Jeremy
>
>
>
> From: Tandy, Jeremy
> Sent: 20 February 2014 15:42
> To: public-csv-wg@w3.org
> Subject: updated use case document
>
>
>
> All … I’ve begun adding use cases and requirements into the use case 
> and requirements document (source is on github(gh-pages) @
> csvw/use-cases-and-requirements/index.html) starting with the surface 
> temperatures databank use case. I’ll keep going in this style until 
> told otherwise; feedback welcome. Regarding the requirements I’ve 
> synthesised, this is only the very beginning – and I’m going to need 
> help to extract, disambiguate and validate all the requirements we’ve been inferring so far.
>
>
>
> Jeremy



--
Adam Retter

eXist Developer
{ United Kingdom }
adam@exist-db.org
irc://irc.freenode.net/existdb
Received on Friday, 14 March 2014 15:25:41 UTC