Re: Use of .well-known for CSV metadata: More harm than good from Melvin Carvalho on 2015-06-21 (www-tag@w3.org from June 2015)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Sun, 21 Jun 2015 23:42:20 +0200
To: David Booth <david@dbooth.org>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <CAKaEYh+SKyXfvgLy3f2VnAS-n64VT5XXu=A_9wX0q9KoJzyu9Q@mail.gmail.com>
On 18 June 2015 at 21:15, David Booth <david@dbooth.org> wrote:

> The CSVW working group recently sought the TAG's advice on locating
> metadata associated with a tabular data document (typically CSV) retrieved
> from a given URI:
>
> https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
>
> Among other mechanisms, the CSVW WG proposed that metadata could be
> retrieved from two standard locations (one per file and one per directory)
> relative to the original tabular data document URI:
>
> http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/#standard-file-metadata
>
>   {+url}-metadata.json
>   metadata.json
>
> where {+url} is the URL of the CSV document.  For example, given a tabular
> data URL http://example/foo.csv , a CSVW processor would automatically
> look for its associated metadata at the following URLs:
>
>   http://example/foo.csv-metadata.json
>   http://example/metadata.json
>
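For illustration, the two default lookups above can be sketched in Python. This is a minimal sketch, not the spec's algorithm: the function name `candidate_metadata_urls` is hypothetical, only the two patterns quoted from the draft are handled, and `{+url}` is filled in by plain string substitution rather than by a full RFC 6570 URI Template expander.

```python
from urllib.parse import urljoin

# The two default location patterns quoted above (April 2015 working draft);
# later drafts may use different names.
DEFAULT_PATTERNS = ["{+url}-metadata.json", "metadata.json"]


def candidate_metadata_urls(csv_url: str) -> list[str]:
    """Return the per-file and per-directory metadata URLs for csv_url."""
    candidates = []
    for pattern in DEFAULT_PATTERNS:
        # Naive substitution of {+url}; not a full RFC 6570 expander.
        expanded = pattern.replace("{+url}", csv_url)
        # Resolve relative results (e.g. "metadata.json") against the CSV URL,
        # which places them in the same directory as the CSV file.
        candidates.append(urljoin(csv_url, expanded))
    return candidates
```

Resolving the relative `metadata.json` against the CSV URL is what makes it a per-directory location: for `http://example/data/bar.csv` it becomes `http://example/data/metadata.json`.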
> Presumably out of a concern that this would be URI squatting and violate
> RFC7320
> http://tools.ietf.org/html/rfc7320#section-3
> the TAG's guidance was to use the RFC5785 .well-known mechanism to enable
> sites to specify custom metadata URIs based on templates, rather than
> relying on those standard relative locations.
>
> Although URI squatting is an important issue to guard against, I do not
> believe it actually applies in this case, and use of .well-known would
> cause more harm than good.
>
> What distinguishes this case is that a tabular metadata file must
> *explicitly* reference the associated data document in order for it to be
> used as a CSVW metadata document.  This is a critical point, which IMO
> changes the balance of the situation.  It means that: (a) the URI owner has
> clearly indicated the intent to use that metadata URI for that purpose; and
> (b) it does *not* prevent that URI from instead being used for other
> purposes.   It *does* prevent that URI from simultaneously being used for
> the tabular metadata and for some other purpose, and hence it does force
> the URI owner to choose between using it for tabular metadata or for
> something else.  But even in that case, if the URI owner really wants to
> use that URI for another purpose while *still* providing tabular metadata,
> then the URI owner still has the option of publishing the metadata at an
> arbitrary custom URI, and publicizing that location, because the metadata
> file will explicitly reference the data file anyway.  (In other words,
> although the most common case may be that a user would first know the URL
> of the tabular *data* file, and from that seek the associated metadata, it
> is perfectly acceptable -- and in some ways better -- for the user to start
> with the URL of the metadata file, and use that to find the desired data
> file URL.)  For example, the URI owner could publish the metadata at
> http://example/my-foo-metadata.json (which in turn would point to
> http://example/foo.csv ) and then advertise that URL.
>
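A rough sketch of the explicit-reference check this argument relies on, assuming the CSVW metadata shape in which a single table description carries a `url` property and a table group lists its tables under `tables`. Handling of `@base` in `@context` and other details are deliberately omitted, and `references_data_file` is a hypothetical name.

```python
import json
from urllib.parse import urljoin


def references_data_file(metadata_json: str, metadata_url: str, csv_url: str) -> bool:
    """True if the metadata explicitly names csv_url as one of its tables.

    Assumes a single table has a "url" property and a table group lists
    tables under "tables"; @base in @context is ignored in this sketch.
    """
    doc = json.loads(metadata_json)
    # Treat a bare table description as a one-item group.
    tables = doc.get("tables", [doc])
    for table in tables:
        url = table.get("url")
        # Relative "url" values resolve against the metadata document's URL.
        if url is not None and urljoin(metadata_url, url) == csv_url:
            return True
    return False
```

Because the check compares resolved URLs, the same metadata would be accepted whether it is found at `http://example/foo.csv-metadata.json` or at a custom location such as `http://example/my-foo-metadata.json`.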
> Harms that would be caused by requiring the use of .well-known in this
> case include:
>
>  - A *required* extra web access, nearly *every* time a conforming CSVW
> processor is given a tabular data URL and wishes to find the associated
> metadata -- because surely http://example/.well-known/csvm will be 404
> (and not cacheable) in the vast majority of cases.
>
>  - Greater complexity in all conforming CSVW implementations.
>
>  - Reduced security, because a change to .well-known/csvm could completely
> change the interpretation of a given tabular data file, and that change
> would be far afield from the directory containing the data file, and thus
> may go completely unnoticed by the owner of the data file.
>
> In short, I think the benefits of .well-known in this case are dubious,
> and far outweighed by the harms.   I think the TAG's guidance to the CSVW
> group should be amended.
>

+1

I will probably be using this quite a bit.  I think there are going to be
quite a few cases where one has access to the data directory but not to
/.well-known/.  Dropbox, I think, is an example; perhaps many shared
folders are too.  Your argument makes a lot of sense.


>
> Thanks,
> David Booth
>
> -------- Forwarded Message --------
> Subject: Re: .well-known
> Resent-Date: Thu, 18 Jun 2015 16:56:48 +0000
> Resent-From: public-csv-wg@w3.org
> Date: Thu, 18 Jun 2015 09:56:15 -0700
> From: Gregg Kellogg <gregg@greggkellogg.net>
> To: David Booth <david@dbooth.org>
> CC: Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group <
> public-csv-wg@w3.org>
>
> On Jun 17, 2015, at 7:43 PM, David Booth <david@dbooth.org> wrote:
>>
>> On 06/17/2015 02:29 AM, Ivan Herman wrote:
>>
>>> David,
>>>
>>> the .well-known mechanism is the result of a long discussion with the
>>> TAG, which had difficulties with the principle of baking in URI patterns
>>> like "-metadata.json".
>>>
>>
>> Is there a pointer to that discussion?   It sounds like the TAG concern
>> is URI squatting.  URI squatting is an important concern, but I don't think
>> it applies in this case, because -- if I've understood correctly -- a
>> metadata file *explicitly* references the relevant data file, which in
>> effect means that the URI owner has clearly indicated an intent to use that
>> URI for that purpose.
>>
>
> Hi David, I found a link to the minutes here:
> https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
> (already added to the issue).
>
> The minutes aren’t particularly illuminating, but the issue raised by mnot
> was definitely a concern over squatting. At this point, it seems to be
> settled. I’ve implemented it, and it was quite straightforward: although
> it requires an extra GET, the result can be cached for some time
> (subject to policies, of course).
>
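The caching mentioned above might look like the following minimal sketch. Assumptions: the class name `WellKnownCache` and the injected `fetch` callable are hypothetical, and a fixed TTL stands in for real HTTP caching rules. Entries are kept per origin, so one `/.well-known/csvm` request can serve every CSV file on that site until the entry expires.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlsplit

# Hypothetical fetch(url) -> (status, body); injected so no network is needed.
Fetch = Callable[[str], tuple[int, str]]


class WellKnownCache:
    """Per-origin cache of /.well-known/csvm lookups with a fixed TTL.

    The fixed TTL is a simplifying assumption; a real client would follow
    the response's HTTP caching headers instead.
    """

    def __init__(self, fetch: Fetch, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, Optional[str]]] = {}

    def lookup(self, csv_url: str) -> Optional[str]:
        """Return the csvm body for csv_url's origin, or None if not 200."""
        parts = urlsplit(csv_url)
        origin = f"{parts.scheme}://{parts.netloc}"
        now = self._clock()
        cached = self._entries.get(origin)
        if cached is not None and cached[0] > now:
            return cached[1]  # fresh entry: no extra GET
        status, body = self._fetch(origin + "/.well-known/csvm")
        result = body if status == 200 else None
        # Negative results (e.g. 404) are cached too, so a site without the
        # file costs one request per origin per TTL period.
        self._entries[origin] = (now + self._ttl, result)
        return result
```

Whether a 404 may actually be cached depends on the HTTP caching rules in effect; the sketch simply assumes it can.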
>> HOWEVER, I no longer see any mention of .well-known in the current
>> editor's draft, so maybe my concern is moot:
>> http://w3c.github.io/csvw/syntax/#locating-metadata
>>
>
> It’s still in a PR that hasn’t yet been pulled:
> https://github.com/w3c/csvw/pull/605. You likely saw a page based on that
> branch, rather than the gh-pages branch where the ED is available.
>
> It’s awaiting resolution of some minor wording on what “no such file is
> located” means, precisely.
>
> Gregg
>
>> Has the .well-known mechanism now been removed from the algorithm for
>> finding metadata?
>>
>> Thanks,
>> David Booth
>>
>>> Note that the agreement is to have a default
>>> fall-back, i.e., if the .well-known file does not exist, the client
>>> can fall back to a default value which actually reproduces the
>>> previous patterns. I think we should go ahead with this approach to
>>> cover all points of view.
>>>
>>> Ivan
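A minimal sketch of the fallback described above, assuming the one-URI-template-per-line format of the proposed `/.well-known/csvm` text and the two default templates quoted earlier in this thread. `location_templates` is a hypothetical name, and the caller is assumed to have already fetched the file.

```python
# Defaults used when a site publishes no /.well-known/csvm file; they
# reproduce the earlier fixed locations (draft names, per this thread).
DEFAULT_TEMPLATES = ["{+url}-metadata.json", "metadata.json"]


def location_templates(status: int, body: str) -> list[str]:
    """Pick metadata URI templates from a /.well-known/csvm fetch result.

    A 200 response is read as one URI template per line, skipping blank
    lines; any other status falls back to the defaults.
    """
    if status != 200:
        return list(DEFAULT_TEMPLATES)
    return [line.strip() for line in body.splitlines() if line.strip()]
```

Returning a fresh copy of the defaults keeps callers from accidentally mutating the shared list.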
>>>
>>>
>>>
>>>> On 17 Jun 2015, at 05:20, David Booth <david@dbooth.org> wrote:
>>>>
>>>> I'm sorry to ask this question at this point, but is .well-known
>>>> *really* needed for this?
>>>>
>>>> I am concerned that it is just adding complexity and network
>>>> accesses for dubious benefit.  AFAICT -- but please correct me if
>>>> I've overlooked something -- the only "benefit" that .well-known
>>>> adds here is to allow users to use non-standard names for their
>>>> metadata files.  And what *real* benefit is that?  It seems to me
>>>> to be adding pointless variability.  Are there really cases where
>>>> users *cannot* name their metadata files to end with
>>>> "-metadata.json"?  If so what are they?
>>>>
>>>> David Booth
>>>>
>>>> On 06/16/2015 09:20 PM, Yakov Shafranovich wrote:
>>>>
>>>>> Hmm. I am wondering if we can use the host-meta file instead,
>>>>> skipping the registration, as per this:
>>>>>
>>>>> https://tools.ietf.org/html/rfc6415#section-4.2
>>>>>
>>>>> On Tue, Jun 16, 2015 at 4:01 PM, Gregg Kellogg
>>>>> <gregg@greggkellogg.net> wrote:
>>>>>
>>>>>> On Jun 16, 2015, at 12:55 PM, Yakov Shafranovich
>>>>>> <yakov-ietf@shaftek.org> wrote:
>>>>>>
>>>>>> What's the proposed format?
>>>>>>
>>>>>> It's simply a file with one URI pattern per line. You can see
>>>>>> the proposed text here:
>>>>>>
>>>>>> https://rawgit.com/w3c/csvw/98e728bcfef8d30e68c10f9cd798da0d39c7d172/syntax/index.html#site-wide-location-configuration
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Gregg
>>>>>>
>>>>>> On Jun 16, 2015 3:38 PM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>
>>>>>>>
>>>>>>> Jeni, Gregg,
>>>>>>>
>>>>>>> I have just received the green light from our system people
>>>>>>> to set up the .well-known csvm file. Can you ping me when the
>>>>>>> changes are added to the documents and the issue is closed? I
>>>>>>> would also need to know if it should contain anything other
>>>>>>> than the default.
>>>>>>>
>>>>>>> I will also take care of the registration when the document
>>>>>>> is available.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>> ---- Ivan Herman +31 641044153
>>>>>>>
>>>>>>> (Written on my mobile. Excuses for brevity and frequent
>>>>>>> misspellings...)
>>>
>>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID:
>>> http://orcid.org/0000-0003-0782-2704
>>>
Received on Sunday, 21 June 2015 21:42:49 UTC