Re: Standard Path in "Model for Tabular Data and Metadata on the Web" from Andy Seaborne on 2014-04-07 (public-csv-wg@w3.org from April 2014)

From: Andy Seaborne <andy@apache.org>
Date: Mon, 07 Apr 2014 11:32:38 +0100
To: public-csv-wg@w3.org
Message-ID: <53427EC6.20709@apache.org>

On 07/04/14 03:12, David Booth wrote:
> Regarding sec 3.5 Standard Path,
> http://w3c.github.io/csvw/syntax/#standard-path
>
> 1. I am very happy to see mention of standard path ideas,
...
> 3. However, I think the phrasing of "if the metadata file does not
> *explicitly* point to the relevant CSV file then it MUST be ignored" may
> be slightly overstating the requirement.  To avoid URI squatting, the
> important thing is merely that the metadata document explicitly identify
> itself *as* a CSV metadata document -- not the fact that it is
> associated with any specific data document.   For example, a metadata
> document placed in a directory may be intended to apply to all CSV files
> in that directory.  So I think the question of whether such a
> directory-level metadata document should be required to explicitly list
> all affected CSV documents should be viewed only as a trade-off between:
>
>   (a) the convenience of not needing to modify the directory-level
> metadata file each time a new CSV file is added to that directory; and
>
>   (b) a potential mistake that a publisher might make, in placing a CSV
> file into a directory containing a directory-level metadata file that
> was not intended to apply to that CSV file.
>
> At this point I think it would be substantially better to lean toward
> convenience -- option (a).  Personally, I hate having to make
> coordinated changes in two different places.  It violates the Don't
> Repeat Yourself (DRY) principle.  And if a directory-level metadata file
> were required to explicitly list all of the data files to which it
> applies, I actually think the chances of someone forgetting to add an
> entry to it when adding another data file would be substantial.
>
> I guess one possible middle ground approach would be to require a
> directory-level metadata file to include a filename pattern (actually a
> relative URI pattern), to indicate which files in the directory should
> be governed by that metadata.

Convenience is important in getting reusable data published, but unless
(a) is restricted to CSV files, not other file types, it can be tricky.
Common files like README.md, LICENCE, instructions.html, slides.pdf
will appear at the top level and aren't being described by a CSV-centric
description.  (Requiring this info to be put in metadata is wrong because
such files are existing practice, provided in the hope that people read them.)

So a pattern "*.csv" seems better.  There might be a default.
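
As a rough sketch of what that could mean in practice (Python, with
shell-style matching standing in for whatever relative URI pattern syntax
might be chosen; the function name, the default value, and the file listing
are all made up for illustration, not taken from the draft):

```python
from fnmatch import fnmatchcase

# Hypothetical default pattern; the draft does not define one.
DEFAULT_PATTERN = "*.csv"


def files_governed(filenames, pattern=DEFAULT_PATTERN):
    """Return the names in a directory listing that match the pattern.

    Matching is case-sensitive shell-style globbing, used here only as a
    stand-in for a relative URI pattern.
    """
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Example directory listing (invented for illustration).
listing = [
    "README.md",
    "LICENCE",
    "instructions.html",
    "slides.pdf",
    "a.csv",
    "b.csv",
]

print(files_governed(listing))  # ['a.csv', 'b.csv']
```

With the default, README.md and friends are left alone, and a new CSV file
dropped into the directory is picked up without touching the metadata file.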

Digressing a little:

There are already proposals for this sort of thing (packaging) - what I 
haven't found are assessments of how successful (or not) they have 
been.  I'd like to see that to gauge the cost/benefit of asking the data 
publishers to provide metadata.

	Andy

Received on Monday, 7 April 2014 10:33:09 UTC