- From: Mark Nottingham <mnot@mnot.net>
- Date: Mon, 22 Jun 2015 13:08:44 +1000
- To: David Booth <david@dbooth.org>
- Cc: "www-tag@w3.org List" <www-tag@w3.org>
> On 19 Jun 2015, at 5:32 pm, David Booth <david@dbooth.org> wrote:

>> And,
>> since the Web is so big, I certainly wouldn't rule out a collisions
>> where it *is* misinterpreted as metadata.
>
> It certainly is possible in theory that someone with a CSV resource at a particular URI could completely coincidentally and unintentionally create a JSON file with the exact name and exact contents -- including the URI of the CSV resource -- required to cause that JSON to be misinterpreted as metadata for the CSV file. But it seems so unlikely that virtually any non-zero cost to prevent it would be a waste.
>
> Furthermore, this is *exactly* the same risk that would *already* be present if the CSVW processor started with the JSON URI instead of the CSV URI: If the JSON *accidentally* looks like CSVW metadata and *accidentally* contains the URI of an existing CSV resource, then that CSV resource will be misinterpreted, regardless of the content of .well-known/csvm , because a CSVW processor must ignore .well-known/csvm if it is given CSVW metadata to start with, as described in section 6.1:
> http://w3c.github.io/csvw/syntax/#h-creating-annotated-tables

Right, and the way we prevent that on the Web is by giving something a distinctive media type.

AFAICT the audience you're designing this for is "CSV downloads that don't have any context (e.g., a direct link, rather than one from HTML) where the author has no ability to set Link headers." Is that correct?

>> Earlier, you talked about the downsides:
>>
>>> - A *required* extra web access, nearly *every* time a conforming
>>> CSVW processor is given a tabular data URL and wishes to find the
>>> associated metadata -- because surely
>>> http://example/.well-known/csvm will be 404 (and not cachable) in
>>> the vast majority of cases.
>>
>> Why is that bad? HTTP requests can be parallelised, so it's not
>> latency.
>> Is the extra request processing *really* that much of an
>> overhead (considering we're talking about a comma- or tab- delimited
>> file)?
>
> It's not a big cost, but it is an actual cost, and it's being weighed against a benefit that IMO is largely theoretical.

In isolation, I agree that's the right technical determination.

This isn't an isolated problem, however; there are lots of applications trying to stake a claim on various parts of URI space. The main reason that I wrote the BCP was because writing protocols on top of HTTP has become popular, and a lot of folks wanted to define "standard" URI paths.

As such, this is really a problem of the commons; your small encroachment might not make a big impact on its own, but in concert with others -- especially when the W3C as steward of the Web is seen doing this -- it starts to have impact.

In ten years, I really don't want to have a list of "filenames I can't use on my Web site" because you wanted to save the overhead of a single request in 2015 -- especially when HTTP/2 makes requests really, really cheap.

Is that "theoretical"? I don't know, but I do think it's important.

>> As I pointed out earlier, you can specify a default heuristic for 404
>> on that resource so that you avoid it being uncacheable.
>
> I doubt many server owners will bother to make that 404 cachable, given that they didn't bother to install a .well-known/csvm file.

You misunderstand. You can specify a heuristic for the 404 to be interpreted on the *client* side; it tells consumers that if there's a 404 without freshness information, they can assume a specified default.

>>> - Greater complexity in all conforming CSVW implementations.
>>
>> I don't find this convincing; if we were talking about some involved
>> scheme that involved lots of processing and tricky syntax, sure, but
>> this is extremely simple, and all of the code to support it
>> (libraries for HTTP, Link header parsing and URI Templates) is
>> already at hand in most cases.
>
> I agree that it's not a lot of additional complexity -- in fact it's quite simple -- but it *is* additional code.

And I find that really unconvincing. If the bar for doing the right thing is so small and still can't be overcome, we're in a really bad place.

Cheers,

--
Mark Nottingham   https://www.mnot.net/
Received on Monday, 22 June 2015 03:09:13 UTC