- From: David Booth <david@dbooth.org>
- Date: Thu, 18 Jun 2015 15:15:07 -0400
- To: "www-tag@w3.org List" <www-tag@w3.org>
The CSVW working group recently sought the TAG's advice on locating
metadata associated with a tabular data document (typically CSV)
retrieved from a given URI:
https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
Among other mechanisms, the CSVW WG proposed that metadata could be
retrieved from two standard locations (one per file and one per
directory) relative to the original tabular data document URI:
http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/#standard-file-metadata
{+url}-metadata.json
metadata.json
where {+url} is the URL of the CSV document. For example, given a
tabular data URL http://example/foo.csv , a CSVW processor would
automatically look for its associated metadata at the following URLs:
http://example/foo.csv-metadata.json
http://example/metadata.json
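
As a rough illustration of that lookup, here is a minimal Python sketch. The function name and the plain string substitution for {+url} are my own assumptions for illustration, not text from the draft, and a real processor would need proper URI template expansion.

```python
from urllib.parse import urljoin

# Patterns quoted from the draft; "{+url}" stands for the data document's URL.
DEFAULT_PATTERNS = ["{+url}-metadata.json", "metadata.json"]


def candidate_metadata_urls(data_url, patterns=DEFAULT_PATTERNS):
    """Expand each pattern and resolve it relative to the data URL.

    Hypothetical helper: substitutes the data URL for "{+url}" with a
    simple string replace, then resolves the result against data_url.
    """
    candidates = []
    for pattern in patterns:
        expanded = pattern.replace("{+url}", data_url)
        candidates.append(urljoin(data_url, expanded))
    return candidates


print(candidate_metadata_urls("http://example/foo.csv"))
```

For http://example/foo.csv this yields the two URLs listed above, in the same order.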
Presumably out of a concern that these standard locations would amount
to URI squatting and violate RFC7320
http://tools.ietf.org/html/rfc7320#section-3
the TAG's guidance was to use the RFC5785 .well-known mechanism to
enable sites to specify custom metadata URIs based on templates, rather
than relying on those standard relative locations.
Although URI squatting is an important issue to guard against, I do not
believe it actually applies in this case, and use of .well-known would
cause more harm than good.
What distinguishes this case is that a tabular metadata file must
*explicitly* reference the associated data document in order for it to
be used as a CSVW metadata document. This is a critical point, which
IMO changes the balance of the situation. It means that: (a) the URI
owner has clearly indicated the intent to use that metadata URI for that
purpose; and (b) it does *not* prevent that URI from instead being used
for other purposes. It *does* prevent that URI from simultaneously
being used for the tabular metadata and for some other purpose, and
hence it does force the URI owner to choose between using it for tabular
metadata or for something else. But even in that case, if the URI owner
really wants to use that URI for another purpose while *still* providing
tabular metadata, then the URI owner still has the option of publishing
the metadata at an arbitrary custom URI, and publicizing that location,
because the metadata file will explicitly reference the data file
anyway. (In other words, although the most common case may be that a
user would first know the URL of the tabular *data* file, and from that
seek the associated metadata, it is perfectly acceptable -- and in some
ways better -- for the user to start with the URL of the metadata file,
and use that to find the desired data file URL.) For example, the URI
owner could publish the metadata at http://example/my-foo-metadata.json
(which in turn would point to http://example/foo.csv ) and then
advertise that URL.
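
To make the "explicit reference" point concrete, a processor could check a candidate metadata document along these lines. This is only a sketch: it assumes the parsed metadata names its data file in a "url" property (or lists several tables under "tables"), and the function name is hypothetical.

```python
from urllib.parse import urljoin


def references_data_file(metadata, metadata_url, data_url):
    """Return True if the parsed metadata document names data_url.

    Assumes a single table description carries a "url" property and a
    group of tables is listed under "tables"; relative URLs are resolved
    against the metadata document's own location.
    """
    tables = metadata.get("tables", [metadata])
    for table in tables:
        url = table.get("url")
        if url is not None and urljoin(metadata_url, url) == data_url:
            return True
    return False


# Metadata published at a custom location that points back at foo.csv.
custom = {"url": "foo.csv"}
print(references_data_file(custom,
                           "http://example/my-foo-metadata.json",
                           "http://example/foo.csv"))
```

A file at one of the standard locations that fails this check would simply be ignored as tabular metadata, which is why the URI remains free for other uses.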
Harms that would be caused by requiring the use of .well-known in this
case include:
- A *required* extra web access, nearly *every* time a conforming CSVW
processor is given a tabular data URL and wishes to find the associated
metadata -- because surely http://example/.well-known/csvm will be 404
(and not cacheable) in the vast majority of cases.
- Greater complexity in all conforming CSVW implementations.
- Reduced security, because a change to .well-known/csvm could
completely change the interpretation of a given tabular data file, and
that change would be far afield from the directory containing the data
file, and thus may go completely unnoticed by the owner of the data file.
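
The first and third points can be seen in a small sketch of the lookup as I understand it: one URI pattern per line in the .well-known file, falling back to the standard patterns when the file is missing. The `fetch` callable and the function name are hypothetical, and the string substitution again stands in for real URI template expansion.

```python
from urllib.parse import urljoin

DEFAULT_PATTERNS = ["{+url}-metadata.json", "metadata.json"]


def metadata_lookup_order(data_url, fetch):
    """Return candidate metadata URLs in the order a client would try them.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    response body as text, or None when the resource is missing (404).
    """
    well_known_url = urljoin(data_url, "/.well-known/csvm")
    # This request happens before any metadata lookup can start.
    body = fetch(well_known_url)
    if body is None:
        patterns = DEFAULT_PATTERNS
    else:
        # A site-wide file silently replaces the default patterns.
        patterns = [line.strip() for line in body.splitlines() if line.strip()]
    return [urljoin(data_url, p.replace("{+url}", data_url)) for p in patterns]


requests_made = []


def fake_fetch(url):
    """Stand-in for an HTTP GET that always reports 404."""
    requests_made.append(url)
    return None


print(metadata_lookup_order("http://example/foo.csv", fake_fetch))
print(requests_made)
```

Even in the common 404 case the client has already made one extra request, and when the file does exist its contents decide where metadata for every data file on the site is sought.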
In short, I think the benefits of .well-known in this case are dubious,
and far outweighed by the harms. I think the TAG's guidance to the
CSVW group should be amended.
Thanks,
David Booth
-------- Forwarded Message --------
Subject: Re: .well-known
Resent-Date: Thu, 18 Jun 2015 16:56:48 +0000
Resent-From: public-csv-wg@w3.org
Date: Thu, 18 Jun 2015 09:56:15 -0700
From: Gregg Kellogg <gregg@greggkellogg.net>
To: David Booth <david@dbooth.org>
CC: Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group
<public-csv-wg@w3.org>
> On Jun 17, 2015, at 7:43 PM, David Booth <david@dbooth.org> wrote:
>
> On 06/17/2015 02:29 AM, Ivan Herman wrote:
>> David,
>>
>> the .well-known mechanism is the result of a long discussion with the
>> TAG that had difficulties with the principle of baking in URI-schemes
>> like "-metadata.json".
>
> Is there a pointer to that discussion? It sounds like the TAG concern is URI squatting. URI squatting is an important concern, but I don't think it applies in this case, because -- if I've understood correctly -- a metadata file *explicitly* references the relevant data file, which in effect means that the URI owner has clearly indicated an intent to use that URI for that purpose.
Hi David, I found a link to the minutes here:
https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
(already added to the issue).
The minutes aren’t particularly illuminating, but the issue raised by
mnot was definitely concern over squatting. At this point, it seems to
be settled. I’ve implemented it, and it was quite straightforward.
Although it requires an extra GET, the result can be cached for some
time (subject to policies, of course).
> HOWEVER, I no longer see any mention of .well-known in the current editor's draft, so maybe my concern is moot:
> http://w3c.github.io/csvw/syntax/#locating-metadata
It’s still in a PR that hasn’t yet been pulled:
https://github.com/w3c/csvw/pull/605. You likely saw a page based on
that branch, rather than the gh-pages branch where the ED is available.
It’s awaiting resolution of some minor wording on what “no such file is
located” means, precisely.
Gregg
> Has the .well-known mechanism now been removed from the algorithm for finding metadata?
>
> Thanks,
> David Booth
>
>> Note that the agreement is to have a default
>> fall-back, ie, if the .well-known file does not exist then the client
>> can fall back to a default value which, actually, reproduces the
>> previous patterns. I think we should go ahead with this approach to
>> cover all points of view.
>>
>> Ivan
>>
>>
>>
>>> On 17 Jun 2015, at 05:20 , David Booth <david@dbooth.org> wrote:
>>>
>>> I'm sorry to ask this question at this point, but is .well-known
>>> *really* needed for this?
>>>
>>> I am concerned that it is just adding complexity and network
>>> accesses for dubious benefit. AFAICT -- but please correct me if
>>> I've overlooked something -- the only "benefit" that .well-known
>>> adds here is to allow users to use non-standard names for their
>>> metadata files. And what *real* benefit is that? It seems to me
>>> to be adding pointless variability. Are there really cases where
>>> users *cannot* name their metadata files to end with
>>> "-metadata.json"? If so what are they?
>>>
>>> David Booth
>>>
>>> On 06/16/2015 09:20 PM, Yakov Shafranovich wrote:
>>>> Hmm. I am wondering if we can use the host-meta file instead,
>>>> skipping the registration, as per this:
>>>>
>>>> https://tools.ietf.org/html/rfc6415#section-4.2
>>>>
>>>> On Tue, Jun 16, 2015 at 4:01 PM, Gregg Kellogg
>>>> <gregg@greggkellogg.net> wrote:
>>>>> On Jun 16, 2015, at 12:55 PM, Yakov Shafranovich
>>>>> <yakov-ietf@shaftek.org> wrote:
>>>>>
>>>>> What's the proposed format?
>>>>>
>>>>> It's simply a file with one URI pattern per line. You can see
>>>>> the proposed text here:
>>>>> https://rawgit.com/w3c/csvw/98e728bcfef8d30e68c10f9cd798da0d39c7d172/syntax/index.html#site-wide-location-configuration
>>>>>
>>>>>
>>>>>
>>>>> Gregg
>>>>>
>>>>>
>>>>> On Jun 16, 2015 3:38 PM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>
>>>>>> Jeni, Gregg,
>>>>>>
>>>>>> I have just received the green light from our system people
>>>>>> to set up the .well-known csw file. Can you ping me when the
>>>>>> changes are added to the documents and the issue is closed? I
>>>>>> would also need to know if it should contain anything else
>>>>>> than the default.
>>>>>>
>>>>>> I will also take care of the registration when the document
>>>>>> is available.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> ---- Ivan Herman +31 641044153
>>>>>>
>>>>>> (Written on my mobile. Excuses for brevity and frequent
>>>>>> misspellings...)
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID:
>> http://orcid.org/0000-0003-0782-2704
>>
>>
>>
>>
>
Received on Thursday, 18 June 2015 19:15:37 UTC