- From: David Booth <david@dbooth.org>
- Date: Fri, 19 Jun 2015 17:42:31 -0400
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Mark Nottingham <mnot@mnot.net>
- CC: "www-tag@w3.org List" <www-tag@w3.org>
Hi Martin,
On 06/19/2015 04:55 AM, Martin J. Dürst wrote:
> Hello David, Mark, others,
>
> On 2015/06/19 16:32, David Booth wrote:
>
>> On 06/19/2015 12:29 AM, Mark Nottingham wrote:
>
>>> And,
>>> since the Web is so big, I certainly wouldn't rule out a collision
>>> where it *is* misinterpreted as metadata.
>>
>> It certainly is possible in theory that someone with a CSV resource at a
>> particular URI could completely coincidentally and unintentionally
>> create a JSON file with the exact name and exact contents -- including
>> the URI of the CSV resource -- required to cause that JSON to be
>> misinterpreted as metadata for the CSV file. But it seems so unlikely
>> that virtually any non-zero cost to prevent it would be a waste.
>
> In general, I'm not as concerned with this issue as Mark, in particular
> for conventions that are local to a subdirectory. Also, when I read
> Mark's mail, I found myself agreeing with David that this isn't really an issue,
> even if the Web is big.
>
> But reading the "exact name" above, I have to say that it's very usual
> for a server to publish the same data with the same name (except maybe
> extension) in different forms (html, xml, json, rdf, csv,...).
> Frameworks such as Ruby on Rails have this capability built in.
>
> From that, it's not such a big step anymore to have cases where the
> JSON can be misinterpreted as metadata for the CSV, or where the use
> case of JSON for publishing the same data and the use case of JSON for
> metadata about CSV conflict.
That risk can also be mitigated to whatever extent we think necessary
by: (a) making the standard metadata URI path more unique; and (b)
requiring the content of the metadata file to more uniquely identify
itself as a CSVW metadata file, perhaps by incorporating a particular
string (traditionally called a 'magic number').
Regarding point (a), the two standard URI path components currently
specified for CSVW are:

    {+url}-metadata.json
    metadata.json

The name "metadata.json" is not particularly unique. A Google search
shows 55,800 hits for it. If we want to make it more distinctive we could
change it to something like "csv-metadata.json" (22 hits) or "csvm.json"
(4 hits). AFAICT all but maybe one of those (22 or 4) hits are
(correctly) about the CSVW work anyway.
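
To make point (a) concrete, here is a minimal Python sketch of how a client might derive the two candidate metadata locations from a CSV file's URL. It assumes that {+url} expands to the CSV URL itself and that "metadata.json" is resolved relative to that URL; the function name and the example.org URL are hypothetical, and query strings and fragments are ignored for simplicity.

```python
from urllib.parse import urljoin


def candidate_metadata_urls(csv_url):
    """Return the two candidate metadata URLs for a CSV file.

    Assumption: "{+url}-metadata.json" is the CSV URL with a suffix
    appended, and "metadata.json" sits in the same directory as the CSV.
    """
    return [
        csv_url + "-metadata.json",          # {+url}-metadata.json
        urljoin(csv_url, "metadata.json"),   # metadata.json
    ]


if __name__ == "__main__":
    for url in candidate_metadata_urls("http://example.org/data/table.csv"):
        print(url)
```

Renaming the second path to something like "csv-metadata.json" would only change the string passed to urljoin, which suggests the cost of a more distinctive name is small.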
Regarding point (b), I've just learned that the CSVW spec already
requires the metadata JSON to contain an @context property containing
"http://www.w3.org/ns/csvw". See
http://w3c.github.io/csvw/metadata/#h-top-level-properties
Since URIs are what we fundamentally use for unique identification on
the web, I think that pretty well lays to rest any concern that a URI
collision could cause a non-CSVW file to be *accidentally* interpreted
as a CSVW file.
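
As a rough illustration of point (b), the following Python sketch shows how a client could refuse to treat a JSON document as CSVW metadata unless its @context names the CSVW namespace. The function name is hypothetical, and accepting either a bare string or a list that contains the namespace URI is my assumption about the allowed shapes of @context, not a quotation of the spec.

```python
import json

CSVW_NS = "http://www.w3.org/ns/csvw"


def looks_like_csvw_metadata(text):
    """Return True only if `text` is a JSON object whose @context
    names the CSVW namespace (assumed: a string or a list of values)."""
    try:
        doc = json.loads(text)
    except ValueError:
        # Not JSON at all, so it cannot be CSVW metadata.
        return False
    if not isinstance(doc, dict):
        return False
    context = doc.get("@context")
    if isinstance(context, str):
        return context == CSVW_NS
    if isinstance(context, list):
        return CSVW_NS in context
    return False
```

With a check like this, a JSON rendering of the same data that happens to sit at the metadata path would be ignored unless it also declared the CSVW namespace.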
I think my biggest concern about using .well-known in this circumstance
is that it would add unnecessary cruft, and that is never good, because
it adds burden and risk. (A rarely used feature invariably gets tested
less.)
Thanks,
David Booth
Received on Friday, 19 June 2015 21:43:00 UTC