- From: Melvin Carvalho <melvincarvalho@gmail.com>
- Date: Sun, 21 Jun 2015 23:42:20 +0200
- To: David Booth <david@dbooth.org>
- Cc: "www-tag@w3.org List" <www-tag@w3.org>
- Message-ID: <CAKaEYh+SKyXfvgLy3f2VnAS-n64VT5XXu=A_9wX0q9KoJzyu9Q@mail.gmail.com>
On 18 June 2015 at 21:15, David Booth <david@dbooth.org> wrote:

> The CSVW working group recently sought the TAG's advice on locating
> metadata associated with a tabular data document (typically CSV) retrieved
> from a given URI:
>
> https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
>
> Among other mechanisms, the CSVW WG proposed that metadata could be
> retrieved from two standard locations (one per file and one per directory)
> relative to the original tabular data document URI:
>
> http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/#standard-file-metadata
>
>   {+url}-metadata.json
>   metadata.json
>
> where {+url} is the URL of the CSV document. For example, given a tabular
> data URL http://example/foo.csv , a CSVW processor would automatically
> look for its associated metadata at the following URLs:
>
>   http://example/foo.csv-metadata.json
>   http://example/metadata.json
>
> Presumably out of a concern that this would be URI squatting and violate
> RFC7320
> http://tools.ietf.org/html/rfc7320#section-3
> the TAG's guidance was to use the RFC5785 .well-known mechanism to enable
> sites to specify custom metadata URIs based on templates, rather than
> relying on those standard relative locations.
>
> Although URI squatting is an important issue to guard against, I do not
> believe it actually applies in this case, and use of .well-known would
> cause more harm than good.
>
> What distinguishes this case is that a tabular metadata file must
> *explicitly* reference the associated data document in order for it to be
> used as a CSVW metadata document. This is a critical point, which IMO
> changes the balance of the situation. It means that: (a) the URI owner has
> clearly indicated the intent to use that metadata URI for that purpose; and
> (b) it does *not* prevent that URI from instead being used for other
> purposes.
> It *does* prevent that URI from simultaneously being used for
> the tabular metadata and for some other purpose, and hence it does force
> the URI owner to choose between using it for tabular metadata or for
> something else. But even in that case, if the URI owner really wants to
> use that URI for another purpose while *still* providing tabular metadata,
> then the URI owner still has the option of publishing the metadata at an
> arbitrary custom URI, and publicizing that location, because the metadata
> file will explicitly reference the data file anyway. (In other words,
> although the most common case may be that a user would first know the URL
> of the tabular *data* file, and from that seek the associated metadata, it
> is perfectly acceptable -- and in some ways better -- for the user to start
> with the URL of the metadata file, and use that to find the desired data
> file URL.) For example, the URI owner could publish the metadata at
> http://example/my-foo-metadata.json (which in turn would point to
> http://example/foo.csv ) and then advertise that URL.
>
> Harms that would be caused by requiring the use of .well-known in this
> case include:
>
> - A *required* extra web access, nearly *every* time a conforming CSVW
>   processor is given a tabular data URL and wishes to find the associated
>   metadata -- because surely http://example/.well-known/csvm will be 404
>   (and not cachable) in the vast majority of cases.
>
> - Greater complexity in all conforming CSVW implementations.
>
> - Reduced security, because a change to .well-known/csvm could completely
>   change the interpretation of a given tabular data file, and that change
>   would be far afield from the directory containing the data file, and thus
>   may go completely unnoticed by the owner of the data file.
>
> In short, I think the benefits of .well-known in this case are dubious,
> and far outweighed by the harms. I think the TAG's guidance to the CSVW
> group should be amended.
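
To make the default lookup described above concrete, here is a minimal Python sketch, not a conforming CSVW implementation. The function names default_metadata_urls and references_data are hypothetical, and the "url" and "tables" property names are an assumption taken from the CSVW metadata drafts rather than from this thread. Plain string substitution stands in for real URI template expansion.

```python
from urllib.parse import urljoin

# The two standard locations quoted in the message above.
DEFAULT_TEMPLATES = ("{+url}-metadata.json", "metadata.json")


def default_metadata_urls(data_url):
    """Return the candidate metadata URLs for a tabular data URL."""
    candidates = []
    for template in DEFAULT_TEMPLATES:
        # Simplification: substitute the data URL directly for {+url}.
        expanded = template.replace("{+url}", data_url)
        # Relative results (e.g. "metadata.json") resolve against the data URL.
        candidates.append(urljoin(data_url, expanded))
    return candidates


def references_data(metadata, metadata_url, data_url):
    """Check that a parsed metadata document explicitly names the data file.

    Assumption: the document either describes one table (top-level "url")
    or a group of tables listed under "tables".
    """
    tables = metadata.get("tables", [metadata])
    for table in tables:
        url = table.get("url")
        if url is not None and urljoin(metadata_url, url) == data_url:
            return True
    return False


print(default_metadata_urls("http://example/foo.csv"))
# ['http://example/foo.csv-metadata.json', 'http://example/metadata.json']
```

A processor following this sketch would ignore any candidate for which references_data returns False, which is the property the argument above relies on: a file at one of these URLs only counts as metadata if it points back at the data file.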
>

+1

I will probably be using this quite a bit. I think there are going to be quite a few cases of having access to the data directory, but not to /.well-known/. Dropbox, I think, is an example. Perhaps this is also true of many shared folders.

Your argument makes a lot of sense.

>
> Thanks,
> David Booth
>
> -------- Forwarded Message --------
> Subject: Re: .well-known
> Resent-Date: Thu, 18 Jun 2015 16:56:48 +0000
> Resent-From: public-csv-wg@w3.org
> Date: Thu, 18 Jun 2015 09:56:15 -0700
> From: Gregg Kellogg <gregg@greggkellogg.net>
> To: David Booth <david@dbooth.org>
> CC: Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
>
> On Jun 17, 2015, at 7:43 PM, David Booth <david@dbooth.org> wrote:
>>
>> On 06/17/2015 02:29 AM, Ivan Herman wrote:
>>
>>> David,
>>>
>>> the .well-known mechanism is the result of a long discussion with the
>>> TAG that had difficulties with the principle of baking in URI-schemes
>>> like "-metadata.json".
>>>
>>
>> Is there a pointer to that discussion? It sounds like the TAG concern
>> is URI squatting. URI squatting is an important concern, but I don't think
>> it applies in this case, because -- if I've understood correctly -- a
>> metadata file *explicitly* references the relevant data file, which in
>> effect means that the URI owner has clearly indicated an intent to use that
>> URI for that purpose.
>>
>
> Hi David, I found a link to the minutes here:
> https://github.com/w3ctag/meetings/blob/gh-pages/2015/telcons/06-03-csv-minutes.md
> (already added to the issue).
>
> The minutes aren’t particularly illuminating, but the issue raised by mnot
> was definitely concern over squatting. At this point, it seems to be
> settled. I’ve implemented it in my implementation, and it was quite
> straight-forward, although it requires an extra GET, the result of this can
> be cached for some time (subject to policies, of course).
>
>> HOWEVER, I no longer see any mention of .well-known in the current
>> editor's draft, so maybe my concern is moot:
>> http://w3c.github.io/csvw/syntax/#locating-metadata
>>
>
> It’s still in a PR that hasn’t yet been pulled:
> https://github.com/w3c/csvw/pull/605. You likely saw a page based on that
> branch, rather than the gh-pages branch where the ED is available.
>
> It’s awaiting resolution of some minor wording on what “no such file is
> located” means, precisely.
>
> Gregg
>
>> Has the .well-known mechanism now been removed from the algorithm for
>> finding metadata?
>>
>> Thanks,
>> David Booth
>>
>>> Note that the agreement is to have a default
>>> fall-back, ie, if the .well-known file does not exist then the client
>>> can fall back to a default value which, actually, reproduces the
>>> previous patterns. I think we should go ahead with this approach to
>>> cover all points of view.
>>>
>>> Ivan
>>>
>>> On 17 Jun 2015, at 05:20 , David Booth <david@dbooth.org> wrote:
>>>>
>>>> I'm sorry to ask this question at this point, but is .well-known
>>>> *really* needed for this?
>>>>
>>>> I am concerned that it is just adding complexity and network
>>>> accesses for dubious benefit. AFAICT -- but please correct me if
>>>> I've overlooked something -- the only "benefit" that .well-known
>>>> adds here is to allow users to use non-standard names for their
>>>> metadata files. And what *real* benefit is that? It seems to me
>>>> to be adding pointless variability. Are there really cases where
>>>> users *cannot* name their metadata files to end with
>>>> "-metadata.json"? If so what are they?
>>>>
>>>> David Booth
>>>>
>>>> On 06/16/2015 09:20 PM, Yakov Shafranovich wrote:
>>>>
>>>>> Hmm.
>>>>> I am wondering if we can use the host-meta file instead,
>>>>> skipping the registration, as per this:
>>>>>
>>>>> https://tools.ietf.org/html/rfc6415#section-4.2
>>>>>
>>>>> On Tue, Jun 16, 2015 at 4:01 PM, Gregg Kellogg
>>>>> <gregg@greggkellogg.net> wrote:
>>>>>
>>>>>> On Jun 16, 2015, at 12:55 PM, Yakov Shafranovich
>>>>>> <yakov-ietf@shaftek.org> wrote:
>>>>>>
>>>>>> What's the proposed format?
>>>>>>
>>>>>> It's simply a file with one URI pattern per line. You can see
>>>>>> the proposed text here:
>>>>>>
>>>>>> https://rawgit.com/w3c/csvw/98e728bcfef8d30e68c10f9cd798da0d39c7d172/syntax/index.html#site-wide-location-configuration
>>>>>>
>>>>>> Gregg
>>>>>>
>>>>>> On Jun 16, 2015 3:38 PM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>
>>>>>>> Jeni, Gregg,
>>>>>>>
>>>>>>> I have just received the green light from our system people
>>>>>>> to set up the .well-known csw file. Can you ping me when the
>>>>>>> changes are added to the documents and the issue is closed? I
>>>>>>> would also need to know if it should contain anything else
>>>>>>> than the default.
>>>>>>>
>>>>>>> I will also take care of the registration when the document
>>>>>>> is available.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>> ---- Ivan Herman +31 641044153
>>>>>>>
>>>>>>> (Written on my mobile. Excuses for brevity and frequent
>>>>>>> misspellings...)
>>>
>>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID:
>>> http://orcid.org/0000-0003-0782-2704
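
For comparison, the site-wide variant discussed in the quoted thread (a .well-known file with one URI pattern per line, falling back to the default patterns when that file does not exist) could be sketched as follows. This is only an illustration under stated assumptions: the function name candidate_urls is hypothetical, the caller is assumed to have already attempted the extra GET and to pass None when it failed (for example on a 404), and plain string substitution again stands in for real URI template expansion.

```python
from urllib.parse import urljoin

# Default patterns used when the site provides no configuration file.
FALLBACK_TEMPLATES = ["{+url}-metadata.json", "metadata.json"]


def candidate_urls(data_url, site_config_text=None):
    """Return metadata URLs to try for data_url, in order.

    site_config_text is the body of the site's .well-known file
    (one URI pattern per line), or None if it could not be retrieved.
    """
    templates = []
    if site_config_text is not None:
        # Keep non-empty lines, ignoring surrounding whitespace.
        templates = [line.strip() for line in site_config_text.splitlines()
                     if line.strip()]
    if not templates:
        # No usable site-wide configuration: fall back to the defaults.
        templates = FALLBACK_TEMPLATES
    return [urljoin(data_url, t.replace("{+url}", data_url)) for t in templates]


# Without a site-wide file, the result matches the default locations.
print(candidate_urls("http://example/foo.csv"))
# With a (made-up) site-wide file, the site's own patterns are used instead.
print(candidate_urls("http://example/foo.csv", "{+url}.meta\n/meta/all.json\n"))
```

The sketch also shows where the extra web access comes from: the caller has to try the .well-known URL before it can even compute the list, whereas the default locations can be derived from the data URL alone.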
Received on Sunday, 21 June 2015 21:42:49 UTC