Finding Metadata for CSV Files

Hi,

It feels to me like the ‘Model for Tabular Data and Metadata on the Web’ is getting close to something publishable. The gaps that I’d like to fill are around indicating how an application might discover annotations to create an annotated data model, or might discover groups of tables to create a grouped data model.

In other words:

  How does an application find annotations on tables, columns, rows and fields?
  How does an application find groups of tables and common metadata about them?

I can think of four possible answers:

  1. Publish a CSV file with a Link rel=describedby header pointing to a file
     that provides the annotations, which might also describe other CSV files.

  2. Publish a package of CSV file(s) and a file that provides the annotations
     (the Simple Data Format / DSPL model).

  3. Include a comment line (or something) in a CSV file that points to a file
     that provides the annotations, which might also describe other CSV files.

  4. Embed annotations within a CSV file, including pointers to other descriptive
     documents and CSV files (the Linked CSV / CSV-LD model).

My current thinking is that we should specify all of the above, because each option's strengths cover the others' weaknesses:

  1. is good because the CSV file can remain untouched, but bad because it
     relies on the publisher having access to and control over HTTP headers,
     which is hard in practice

  2. is good because you get everything in one bundle, but bad because it
     means duplicating CSV files that belong to multiple packages, making
     them hard to keep up to date, and limits linking to individual CSV 
     files (given we lack a good fragment identifier scheme for packages)

  3. is good because it’s a simple addition to a CSV file, but bad because
     it means changing existing CSV files and might cause parsing problems
     for legacy parsers (depending on how the commenting is done)

  4. is good because it enables embedding of metadata within a file (which
     means it’s less likely to get out of date) but bad because it means
     changing CSV files and might cause parsing/processing problems for
     legacy parsers (depending on how the embedding is done)

(3 could be considered a subset of or related to 4.)
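To illustrate the legacy-parser risk with options 3 and 4, here's a sketch of what a metadata-aware consumer might do with a comment line. The "#describedby:" syntax is invented purely for illustration — CSV as commonly deployed (per RFC 4180) has no comment convention at all, which is exactly why an unaware parser would treat such a line as a data row:

```python
import csv
import io

def split_comment(csv_text):
    """Return (metadata_url_or_None, rows) for a CSV that may begin with
    a hypothetical '#describedby: <url>' comment line."""
    lines = csv_text.splitlines()
    url = None
    if lines and lines[0].startswith('#describedby:'):
        url = lines[0][len('#describedby:'):].strip()
        lines = lines[1:]
    rows = list(csv.reader(io.StringIO('\n'.join(lines))))
    return url, rows

data = "#describedby: http://example.org/meta.json\nname,age\nAlice,42\n"
url, rows = split_comment(data)
# url -> 'http://example.org/meta.json'
# rows -> [['name', 'age'], ['Alice', '42']]
```

A legacy parser fed the same input would yield [['#describedby: http://example.org/meta.json'], ['name', 'age'], ['Alice', '42']], i.e. a spurious one-cell first row — tolerable for some consumers, breaking for others.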

What do you all think? Any other methods that I’ve missed?

Jeni
--  
Jeni Tennison
http://www.jenitennison.com/

Received on Saturday, 8 March 2014 09:26:08 UTC