Standard Path in "Model for Tabular Data and Metadata on the Web" from David Booth on 2014-04-07 (public-csv-wg@w3.org from April 2014)

From: David Booth <david@dbooth.org>
Date: Sun, 06 Apr 2014 22:12:22 -0400
To: "public-csv-wg@w3.org" <public-csv-wg@w3.org>, Jeni Tennison <jeni@jenitennison.com>, 'Gregg Kellogg' <gregg@greggkellogg.com>
Message-ID: <53420986.9070101@dbooth.org>

Regarding sec 3.5 Standard Path,
http://w3c.github.io/csvw/syntax/#standard-path

1. I am very happy to see mention of standard path ideas, as I think in 
most cases a standard path will provide the easiest way for people to 
associate metadata with CSV documents.  A big +1 from me on that!

2. I'm also happy to see the condition that "if the metadata file does 
not explicitly point to the relevant CSV file then it MUST be ignored", 
as this neatly avoids the problem of URI squatting.  Again, +1 from me!

3. However, I think the phrasing of "if the metadata file does not
*explicitly* point to the relevant CSV file then it MUST be ignored" may
slightly overstate the requirement.  To avoid URI squatting, the
important thing is merely that the metadata document explicitly identify
itself *as* a CSV metadata document -- not that it name any specific
data document.  For example, a metadata document placed in a directory
may be intended to apply to all CSV files in that directory.  So I think
the question of whether such a directory-level metadata document should
be required to explicitly list all affected CSV documents should be
viewed only as a trade-off between:

  (a) the convenience of not needing to modify the directory-level 
metadata file each time a new CSV file is added to that directory; and

  (b) the risk that a publisher mistakenly places a CSV file into a
directory whose directory-level metadata file was not intended to apply
to that CSV file.

At this point I think it would be substantially better to lean toward 
convenience -- option (a).  Personally, I hate having to make 
coordinated changes in two different places.  It violates the Don't 
Repeat Yourself (DRY) principle.  And if a directory-level metadata file 
were required to explicitly list all of the data files to which it 
applies, I actually think the chances of someone forgetting to add an 
entry to it when adding another data file would be substantial.

I guess one possible middle ground approach would be to require a 
directory-level metadata file to include a filename pattern (actually a 
relative URI pattern), to indicate which files in the directory should 
be governed by that metadata.
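
To make the middle-ground idea concrete, here is a rough sketch of how a
consumer might test whether a CSV file is governed by a directory-level
metadata file that carries a relative URI pattern.  Everything specific
here is an assumption on my part: the metadata file name
(metadata.csvm), the shell-style glob syntax for the pattern, and the
rule that only files directly in the same directory are covered.  None
of this comes from the draft.

```python
from fnmatch import fnmatchcase
from urllib.parse import urljoin


def governed_by(metadata_url, pattern, csv_url):
    """Return True if csv_url sits directly in the metadata file's
    directory and its relative URI matches the shell-style pattern."""
    # Resolving "." against the metadata URL yields its directory URL.
    base = urljoin(metadata_url, ".")
    if not csv_url.startswith(base):
        return False
    relative = csv_url[len(base):]
    # Assumption: a directory-level file governs only its own directory,
    # so anything in a subdirectory is excluded.
    if "/" in relative:
        return False
    return fnmatchcase(relative, pattern)


meta = "http://example.org/data/metadata.csvm"  # hypothetical name
print(governed_by(meta, "sales-*.csv", "http://example.org/data/sales-2014.csv"))
print(governed_by(meta, "sales-*.csv", "http://example.org/data/notes.csv"))
```

With a pattern like this, adding sales-2015.csv to the directory needs
no change to the metadata file, while notes.csv stays out of scope.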

4. The current draft (I think) suggests standard locations:

   CSV file:      filename.csv
   Metadata file: filename.csvm

If the metadata itself is encoded as a CSV file, then another 
possibility to consider would be:

   CSV file:      filename.csv
   Metadata file: filename.csv.metadata.csv

This would have the benefit of using the established .csv extension.
It's also less cryptic than .csvm.
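
For illustration, the two naming conventions above could be derived
from a CSV file's URL roughly as follows.  This is only a sketch: the
function name is made up, and dropping the query and fragment is my own
assumption rather than anything the draft says.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_candidates(csv_url):
    """Return candidate metadata URLs for csv_url under the two naming
    conventions discussed above, .csvm first."""
    parts = urlsplit(csv_url)
    paths = []
    if parts.path.endswith(".csv"):
        # filename.csv -> filename.csvm
        paths.append(parts.path[: -len(".csv")] + ".csvm")
    # filename.csv -> filename.csv.metadata.csv
    paths.append(parts.path + ".metadata.csv")
    # Assumption: query string and fragment are dropped.
    return [urlunsplit((parts.scheme, parts.netloc, p, "", "")) for p in paths]


for url in metadata_candidates("http://example.org/data/filename.csv"):
    print(url)
```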

5. In theory it would be fine to offer data publishers multiple Standard
Path options for publishing a CSV document's metadata, as long as a
standard prioritization among them is defined.  However, the more
options there are, the more hassle it is to implement software in a
standards-compliant way.  So from this perspective, I think the fewer
options there are the better, as long as the standard makes it easy
*enough* for data publishers.
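
As a rough sketch of what a defined prioritization means for an
implementer: the consumer walks an ordered list of candidate locations
and takes the first one that exists.  The function name and the
`exists` callback are hypothetical; in practice `exists` would probably
be an HTTP lookup, which is where the cost of each extra option shows
up.

```python
def first_available(candidates, exists):
    """Return the first candidate URL for which exists(url) is true,
    or None.  List order is the priority order."""
    for url in candidates:
        if exists(url):
            return url
    return None


# Stand-in for the set of URLs a server actually publishes.
published = {"http://example.org/data/filename.csv.metadata.csv"}
candidates = [
    "http://example.org/data/filename.csvm",
    "http://example.org/data/filename.csv.metadata.csv",
]
print(first_available(candidates, published.__contains__))
```

In the worst case this makes one lookup per candidate, so every
additional Standard Path option adds a round trip for files that have
no metadata at all.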

Thanks,
David

Received on Monday, 7 April 2014 02:12:50 UTC