Re: Locating file- and directory-specific metadata (Was: Re: Spec review request: CSV on the Web)

Hey Jeni,

> On 20 May 2015, at 3:16 am, Jeni Tennison <jeni@jenitennison.com> wrote:
> 
> To be more explicit about the requirement, publishers need to be able to publish CSV files and their associated metadata such that the metadata can be found by reusers. If it is hard to publish they simply won’t bother, and we won’t get a good impact from the CSV on the Web work. In many cases the publishers of CSV files are not particularly tech-literate, and they are fairly likely to be using shared publishing infrastructure, which is usually not oriented specifically to publishing data.

I'd like to be explicit about the tradeoffs. I see three things, one of which needs to give:

1) The syntax of the CSV format (which currently can't convey this kind of metadata)
2) The ability of your target audience to set metadata on their resources (e.g., HTTP headers, .well-known files)
3) The prohibition on standards squatting in the URI namespace


> We have been employing a “Github test” to assess the difficulty of publication in a shared hosting environment. We could equally use a “GOV.UK test” or a “Wordpress test”; the issues are similar.

So these are appeals in favour of #2. I don't see how gov.uk comes into it, as they presumably have complete control over their site. Likewise, a Wordpress install can do pretty much anything with a plugin, which is simple enough for anyone to install (that's their main selling point).

If Github allowed setting Link headers in Pages, would that change the discussion? Because IME their engineers are pretty receptive...
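
To make the Link header option concrete, here is a rough sketch of what a consumer would do if the server could send one. It assumes the metadata is advertised with rel="describedby"; the function name and the example URLs are made up for illustration, and the parsing is deliberately simplified rather than a complete implementation of the Link header grammar.

```python
import re
from urllib.parse import urljoin


def find_metadata_link(csv_url, link_header, rel="describedby"):
    """Return the absolute URL of the first link carrying `rel`, or None.

    Hypothetical helper: simplified parsing, assumes well-formed input.
    """
    # Split only on commas that are followed by "<", i.e. between link-values.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        target, params = match.groups()
        rel_match = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        # A rel parameter may hold several space-separated relation types.
        if rel_match and rel in rel_match.group(1).lower().split():
            # Relative targets are resolved against the CSV file's own URL.
            return urljoin(csv_url, target)
    return None


if __name__ == "__main__":
    header = '<data.csv-metadata.json>; rel="describedby"'
    print(find_metadata_link("http://example.org/files/data.csv", header))
```

The point is that the client makes no guesses about URLs at all; it only follows what the response tells it.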


> There is a significant impact on ease of use for publishers if they have to put metadata files into /.well-known as opposed to the same directory as the CSV file(s):
> 
>   * they have to negotiate access to the /.well-known directory (eg for CSV files in w3c.github.com/csvw such as the test suite we would have to ask the W3C staff to create a new repo that we could have access to)

They'd have to create the directory in the w3c.github.io repo and give you access to it.


>   * they have to mirror a potentially changing directory structure within that space
>   * having the files so separate means they’re likely to go out of sync (eg in Github, /.well-known would be a completely different repo)

See alternative solution in separate thread.


> I’m not dismissing this as an approach, just spelling out the concerns about the usability impact which I think is behind the pushback on the suggestion from the Working Group.
> 
> I wondered if there might be another option, perhaps using a .well-known subdirectory within the directory holding the CSV files. However, that’s not supported by RFC 5785, which says:
> 
>    4. Why aren't per-directory well-known locations defined?
> 
>       Allowing every URI path segment to have a well-known location
>       (e.g., "/images/.well-known/") would increase the risks of
>       colliding with a pre-existing URI on a site, and generally these
>       solutions are found not to scale well, because they're too
>       "chatty".
> 
> I don’t really understand the scalability or chattiness arguments here; perhaps you can expand on them?

That was aimed more at the site-wide use cases that were being discussed at the time. We didn't consider the very specific use case you have (retrofitting metadata into a format that doesn't support it well).
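
As a rough illustration of the "chatty" concern, the sketch below lists the locations a client might have to probe if every path segment could carry its own well-known directory. The walk-up-the-tree lookup order is my assumption, not something any spec defines, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def per_directory_candidates(url, suffix=".well-known/"):
    """List hypothetical per-directory well-known locations for `url`,
    from the deepest directory up to the site root."""
    parts = urlsplit(url)
    # Directory segments only: drop the leading empty string and the file name.
    segments = parts.path.split("/")[1:-1]
    candidates = []
    for depth in range(len(segments), -1, -1):
        directory = "/" + "/".join(segments[:depth])
        if not directory.endswith("/"):
            directory += "/"
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, directory + suffix, "", ""))
        )
    return candidates


if __name__ == "__main__":
    for candidate in per_directory_candidates("http://example.org/a/b/data.csv"):
        print(candidate)
```

Under that assumption, a file N directories deep means up to N+1 extra requests before the client knows whether any metadata exists.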

My overall concern here is that there's always going to be pressure from various folks to squat on the URI namespace, and the W3C blessing it sends a strong message that it's OK to do, which would quickly get us into a mess, for all of the reasons outlined in the RFC. Yes, we could go to the trouble of defining a per-directory .well-known, but that would only encourage people to create their applications in a way that doesn't work well with the Web.

Cheers,


--
Mark Nottingham   https://www.mnot.net/

Received on Wednesday, 20 May 2015 00:35:50 UTC