Re: Convenience suggestion: Allow metadata in a CSV file from Gregg Kellogg on 2015-04-30 (public-csv-wg@w3.org from April 2015)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 30 Apr 2015 08:38:07 -0700
To: David Booth <david@dbooth.org>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <191D14A7-5491-43AD-8D77-831AD7900B08@greggkellogg.net>

Hi David, this is certainly an interesting use case. The CSVW model is fully abstracted from the metadata formats, with minimal references to the metadata document. The purpose of metadata processing is to create the annotated data model, which is then used for validation or conversion. The model's section on “Tabular Data Embedding Annotations” [1] also suggests that there may be publication-specific mechanisms for embedding metadata within a tabular data file itself, similar to what you propose in your presentation, although it provides no guidance on how such formats might be described.

Given a hypothetical CSV-based metadata format providing much of the expressivity of our JSON-based format, this could work using the process described in “Creating Annotated Tables” [2]. In this case, you start with the CSV containing your metadata and extract the embedded metadata using the rules defined for your CSV-based metadata variation. I think those rules should transform it into a JSON representation, so that it is compatible with the rest of the steps defined there. You then process, starting with that metadata, to extract data much as you describe.
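
As a rough illustration of that transformation step, here is a minimal Python sketch. The CSV layout (one row per column description, with headers name, titles, datatype, required) and the function name csv_metadata_to_json are purely hypothetical, since no CSV-based metadata format has been defined; only the JSON property names (@context, url, tableSchema, columns, name, titles, datatype, required) come from the JSON-based metadata format.

```python
import csv
import io
import json


def csv_metadata_to_json(csv_text, data_url):
    """Turn a hypothetical CSV-based metadata table into a JSON-style dict.

    Assumed (not standardized) layout: one row per described column,
    with the headers name, titles, datatype, required.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = []
    for row in reader:
        column = {
            "name": row["name"],
            "titles": row["titles"],
            "datatype": row["datatype"],
        }
        # Only emit "required" when the CSV cell says true.
        if row["required"].strip().lower() == "true":
            column["required"] = True
        columns.append(column)
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": data_url,
        "tableSchema": {"columns": columns},
    }


# Hypothetical metadata CSV describing two columns of data.csv.
EXAMPLE = (
    "name,titles,datatype,required\n"
    "id,Identifier,integer,true\n"
    "label,Label,string,false\n"
)

metadata = csv_metadata_to_json(EXAMPLE, "data.csv")
print(json.dumps(metadata, indent=2))
```

Once the metadata is in this JSON shape, the remaining steps in [2] could in principle apply unchanged, which is the point of normalizing to JSON first.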

If your CSV-based metadata standard has the general expressivity of the JSON-based standard, it can describe Table Groups to reference other CSVs. The CSV metadata file itself might be considered a table, with dialect information that leads to the embedded fields, allowing you access to the test data contained within the file; or it can be ignored if you define that Table to have “suppressOutput” set to true. You might even define a variable that sets this value based on a runtime parameter, giving you a test mode inherently.
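
To make the “suppressOutput” idea concrete, here is a small Python sketch that builds a table group description in the JSON-based format. The function name build_table_group, the test_mode parameter, and the file names are assumptions for illustration only; the metadata format itself defines no such runtime switch, so the toggle here lives in whatever tool generates the metadata.

```python
import json


def build_table_group(metadata_csv_url, data_csv_url, test_mode=False):
    """Describe a table group holding the metadata CSV and a data CSV.

    Hypothetical helper: the metadata CSV is treated as a table whose
    output is suppressed unless test_mode is set, in which case the
    test data embedded in it is processed as well.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "tables": [
            # The CSV that carries the metadata (and embedded test data).
            {"url": metadata_csv_url, "suppressOutput": not test_mode},
            # The CSV that the metadata describes.
            {"url": data_csv_url},
        ],
    }


normal = build_table_group("metadata.csv", "data.csv")
testing = build_table_group("metadata.csv", "data.csv", test_mode=True)
print(json.dumps(normal, indent=2))
```

In normal mode the metadata table is skipped during conversion; with test_mode=True the same description lets the embedded test data through.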

While I think defining such a format is probably beyond the scope of the group at this time, it would be an interesting exercise. As Jeni suggested, this could be something that a future group might standardize. IMO, it could also be a Note or Member Submission that the current group might publish.

Gregg Kellogg
gregg@greggkellogg.net

[1] http://www.w3.org/TR/tabular-data-model/#tabular-data-embedding-annotations
[2] http://www.w3.org/TR/tabular-data-model/#creating-annotated-tables

> On Apr 29, 2015, at 7:43 PM, David Booth <david@dbooth.org> wrote:
> 
> I don't know if the working group has already considered this, but I'd like to suggest allowing CSV metadata to be specified in another CSV file, as an alternative to JSON.  I have found this approach to be quite convenient in a tool that I've been developing, and I think it could increase uptake of a CSV metadata standard.
> 
> Here is a very short mockup video (2 minutes 59 seconds) that illustrates this approach:
> https://www.youtube.com/watch?v=LmQWHdaN8_w
> 
> I realize that some CSV metadata authors may prefer JSON syntax.  But as simple as JSON is, spreadsheet competence is far more widespread.  Also I would not blame anyone for being disinclined to consider this approach given the late date.  But this approach only involves different syntax -- not semantics -- and if it does indeed lower the adoption barrier then it seems to me that it would be worth considering.
> 
> What do others think?
> 
> Thanks,
> David Booth
> 
>

Received on Thursday, 30 April 2015 15:38:38 UTC