Re: Use of .well-known for CSV metadata: More harm than good from David Booth on 2015-06-19 (www-tag@w3.org from June 2015)

From: David Booth <david@dbooth.org>
Date: Fri, 19 Jun 2015 17:42:31 -0400
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Mark Nottingham <mnot@mnot.net>
CC: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <55848CC7.9010500@dbooth.org>

Hi Martin,

On 06/19/2015 04:55 AM, Martin J. Dürst wrote:
> Hello David, Mark, others,
>
> On 2015/06/19 16:32, David Booth wrote:
>
>> On 06/19/2015 12:29 AM, Mark Nottingham wrote:
>
>>> And,
>>> since the Web is so big, I certainly wouldn't rule out a collision
>>> where it *is* misinterpreted as metadata.
>>
>> It certainly is possible in theory that someone with a CSV resource at a
>> particular URI could completely coincidentally and unintentionally
>> create a JSON file with the exact name and exact contents -- including
>> the URI of the CSV resource -- required to cause that JSON to be
>> misinterpreted as metadata for the CSV file.  But it seems so unlikely
>> that virtually any non-zero cost to prevent it would be a waste.
>
> In general, I'm not as concerned with this issue as Mark, in particular
> for conventions that are local to a subdirectory. Also, when I read
> Mark's mail, I felt I agreed with David that this isn't really an issue,
> even if the Web is big.
>
> But reading the "exact name" above, I have to say that it's very usual
> for a server to publish the same data with the same name (except maybe
> extension) in different forms (html, xml, json, rdf, csv,...).
> Frameworks such as Ruby on Rails have this capability built in.
>
> From that, it's not such a big step anymore to have cases where the
> JSON can be misinterpreted as metadata for the CSV, or where the use
> case of JSON for publishing the same data and the use case of JSON for
> metadata about CSV conflict.

That risk can also be mitigated to whatever extent we think necessary 
by: (a) making the standard metadata URI path more unique; and (b) 
requiring the content of the metadata file to more uniquely identify 
itself as a CSVW metadata file, perhaps by incorporating a particular 
string (traditionally called a 'magic number').

Regarding point (a), the two standard URI path components currently 
specified for CSVW are:

  {+url}-metadata.json
  metadata.json

The phrase "metadata.json" is not particularly unique.  A Google search
shows 55,800 hits for it.  If we want to make it more unique, we could
change it to something like "csv-metadata.json" (22 hits) or "csvm.json"
(4 hits).  AFAICT all but maybe one of those (22 or 4) hits are
(correctly) about the CSVW work anyway.
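
To make the two path conventions concrete, here is a minimal Python
sketch of how a client might derive both candidate metadata URLs from a
CSV URL.  The function name candidate_metadata_urls and the example.org
URL are hypothetical, and the sketch assumes that "metadata.json" is
resolved relative to the CSV's own URL.

```python
from urllib.parse import urljoin


def candidate_metadata_urls(csv_url):
    """Return the two default metadata locations for a CSV URL.

    Assumption: the second template is resolved relative to csv_url,
    i.e. it names a file in the same directory as the CSV.
    """
    return [
        csv_url + "-metadata.json",         # {+url}-metadata.json
        urljoin(csv_url, "metadata.json"),  # metadata.json
    ]


urls = candidate_metadata_urls("http://example.org/data/cities.csv")
# urls[0] == "http://example.org/data/cities.csv-metadata.json"
# urls[1] == "http://example.org/data/metadata.json"
```

Changing the file name to something like "csv-metadata.json" would only
alter the two string constants in such a client.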

Regarding point (b), I've just learned that the CSVW spec already 
requires the metadata JSON to contain an @context property containing 
"http://www.w3.org/ns/csvw".  See
http://w3c.github.io/csvw/metadata/#h-top-level-properties
Since URIs are what we fundamentally use for unique identification on 
the web, I think that pretty well lays to rest any concern that a URI 
collision could cause a non-CSVW file to be *accidentally* interpreted 
as a CSVW file.
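
As a rough illustration of that check, a client could refuse to treat a
JSON document as CSVW metadata unless its @context mentions the CSVW
namespace URI.  This is only a sketch: the function name
looks_like_csvw_metadata is hypothetical, and accepting either a bare
string or a list for @context is my assumption about the allowed shapes,
not a quotation from the spec.

```python
import json

CSVW_NS = "http://www.w3.org/ns/csvw"


def looks_like_csvw_metadata(text):
    """Return True only if text is a JSON object whose @context
    names the CSVW namespace URI."""
    try:
        doc = json.loads(text)
    except ValueError:
        # Not JSON at all, so it cannot be CSVW metadata.
        return False
    if not isinstance(doc, dict):
        # E.g. a JSON array publishing the same data as the CSV.
        return False
    context = doc.get("@context")
    if isinstance(context, str):
        return context == CSVW_NS
    if isinstance(context, list):
        # Assumed shape: the namespace URI appears as one list item.
        return CSVW_NS in context
    return False
```

With a check like this, a JSON rendition of the same data that merely
shares the file name (Martin's Rails case) is ignored, because it has no
reason to carry that @context value.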

I think my biggest concern about using .well-known in this circumstance 
is that it would add unnecessary cruft, and that is never good, because 
it adds burden and risk.  (A rarely used feature invariably gets tested 
less.)

Thanks,
David Booth

Received on Friday, 19 June 2015 21:43:00 UTC