Comment on /TR/tabular-data-model concerning standard file/directory metadata

Dear CSV WG,

This is a comment on your draft, “Model for Tabular Data and Metadata on the Web”.
http://www.w3.org/TR/2015/WD-tabular-data-model-20150108/

Let me first say that the document is great, and I expect it to serve as a solid foundation for future work around CSV.

There is, however, one issue that I think should be reconsidered.

It concerns sections 3.4 and 3.5, “Standard File Metadata” and “Standard Directory Metadata”.
http://www.w3.org/TR/2015/WD-tabular-data-model-20150108/#standard-file-metadata
http://www.w3.org/TR/2015/WD-tabular-data-model-20150108/#standard-directory-metadata

The mechanism described there lacks a realistic use case, and is bad for all sorts of reasons, including:

- It means that conforming processors must make three requests to retrieve a single CSV file, two of which will almost always fail.

- It does not include any protocol by which client and server can work out in advance that the metadata requests would be futile and hence should be avoided.

- It violates the axiom of URI opacity.

- It hobbles the ability of publishers who would like to deploy a different URI design, restricting their ability to manage their URI space the way they like, or to evolve it in the future.

- It makes setups where data and metadata are published from separate systems (e.g., data on an FTP server, metadata in a CKAN-style data catalogue) unnecessarily complicated and awkward.

- It gets even worse if a format other than JSON becomes somewhat popular in the future, as processors will then have to make even more requests in search of a file that probably isn’t there.

- It only addresses an unrealistic use case where the publisher is so untechnical that they can only publish static files from the file system, but is also so technical that they can write JSON by hand.

- It is equivalent to a proposal to discover the landing page for some-image.gif by going to some-image.gif-landing-page.html. No one should implement such a ridiculous thing.

If everyone designed protocols like this, the web would be a firework of 404s where clients poke blindly at servers…
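To make the request count and the URI opacity concern concrete, here is a rough sketch of the URLs a conforming processor would end up requesting for one CSV file. The function name and the exact strings "-metadata.json" and "metadata.json" are assumptions based on my reading of sections 3.4 and 3.5 of the draft; treat them as placeholders rather than as the normative values.

```python
from urllib.parse import urljoin


def candidate_requests(csv_url,
                       file_suffix="-metadata.json",     # assumed file-level suffix
                       directory_name="metadata.json"):  # assumed directory-level name
    """Return the URLs a processor would request for a single CSV file.

    Hypothetical helper: it only builds the URLs, it performs no network I/O.
    """
    return [
        csv_url,                           # the data itself
        csv_url + file_suffix,             # file metadata: string appended to the URL
        urljoin(csv_url, directory_name),  # directory metadata: resolved against the URL
    ]


# A static file on a plain web server: three requests, two of them guesses.
print(candidate_requests("http://example.org/data/tree-ops.csv"))

# A URL with a query string: the appended suffix ends up inside the query value.
print(candidate_requests("http://example.org/export?format=csv"))
```

In the second call the file-level guess becomes http://example.org/export?format=csv-metadata.json, which changes the meaning of the query parameter instead of naming a sibling resource. That is the kind of breakage that follows from assuming things about the structure of someone else's URLs.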

The solution, I think, is simple: When metadata is published in a separate file, instead of sending around the URL of the CSV file, one should send around the URL of the metadata file, which contains a pointer to the CSV file.
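A minimal sketch of that alternative, assuming the metadata document is JSON and carries the location of the CSV file under a key called "url" (the key name, the function name, and the example URLs are illustrative assumptions, not taken from the draft):

```python
import json
from urllib.parse import urljoin


def resolve_csv_url(metadata_url, metadata_text, pointer_key="url"):
    """Return the absolute URL of the CSV file named in a metadata document.

    Hypothetical helper: `metadata_text` is the already-fetched body of the
    metadata file, and `pointer_key` is an assumed name for the pointer.
    """
    metadata = json.loads(metadata_text)
    # Relative pointers resolve against the metadata URL; absolute ones pass through.
    return urljoin(metadata_url, metadata[pointer_key])


# Metadata and data published side by side, using a relative pointer.
print(resolve_csv_url(
    "http://example.org/data/tree-ops.json",
    '{"url": "tree-ops.csv"}',
))

# Metadata in a catalogue, data on an FTP server, using an absolute pointer.
print(resolve_csv_url(
    "http://catalogue.example.org/datasets/42.json",
    '{"url": "ftp://files.example.org/pub/tree-ops.csv"}',
))
```

With this arrangement the client makes one request for the metadata and one for the data, neither of which is a guess, and the publisher remains free to put either file at any URL.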

Best,
Richard


(This is my personal opinion and I do not speak for my employer.)

Received on Monday, 9 March 2015 22:29:46 UTC