- From: Phil Archer <phila@w3.org>
- Date: Thu, 11 Dec 2014 15:13:32 +0000
- To: Annette Greiner <amgreiner@lbl.gov>, Laufer <laufer@globo.com>
- CC: Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Hi everyone, some comments inline below.

On 09/12/2014 18:38, Annette Greiner wrote:
> Thanks for writing up a nice introduction to metadata. I really like that you addressed the issues of different granularity and different types. We may not even need to include the term as something readers need to be familiar with in advance.

+1

> In general, I like the idea of defining terms where they are first used in the text. I tend to think we should consider both technical people and their managers when determining what level of technicality to write to, so that someone charged with publishing data on the web can easily point a senior decision-maker to specific best practices in order to get buy-in.

+1

>
> Because we are really targeting publishers of data, I think the first few sentences are unnecessary.

-1

I don't agree that we're only targeting publishers. The charter includes this:

"Developers would like easy access to data that is 100% accurate, regularly updated and guaranteed to be available at all times. Data publishers are likely to take a different view. There are disparities between different developers too: for many, data means CSV files and APIs, for others it means linked data and the two sides are often disparaging of each other."

So it talks about developers as much as publishers. The data usage vocab is a clear example where users are in scope.

It's also often the case that data publishers are also data users and, actually, I rather like the term data broker (a much nicer word than the horrible mangling of the English language that the European Commission uses: 'Infomediary' - gah!). Data brokers seem particularly relevant if we're talking about data enrichment etc.

So personally, I like a lot of what Laufer has captured, modulo trivial editorial nit picks. Oh heck, the only way I can explain what I mean is to edit it...

[An hour passes]

Right.
I've edited the metadata section in my current fork of the doc at http://philarcher1.github.io/dwbp-1/bp.html#metadata

My edited version of Laufer's text presents two primary classes of stakeholder - publisher and consumer - but then goes on to say that there are many other roles including data brokers.

Incidentally, I had to look up the word 'subjacent.' I kicked off a short Twitter discussion and, after that, changed it to 'underlying.' https://twitter.com/philarcher1/status/543014767884259328

> You could start with the sentence, “Metadata is data about data.” That nicely clues the reader to the fact that this is an introduction that will explain what metadata is.
>
> I don’t understand why there is a paragraph about distribution formats included here. Not only is it out of scope, it seems largely off topic.

Yes and no. I don't think we can be completely silent about different data formats. What we can do, as I've tried to do in my edits, is to say that the intentions are normative (an instance of an RFC 2119 keyword is coming up shortly). But the implementation is a suggestion. More in a sec.

>
> I think we should have here some explicit best practices that are about metadata more generally than specific fields, like “metadata should be available in human readable and machine-readable forms”.

+1

Actually, the fact that you have to provide metadata at all is a BP as far as I'm concerned. So I wrote it out. That gave me a chance to write an actual best practice, which is not as easy as one might imagine, even for one as basic as "provide metadata."

I used RFC 2119 in the Intended Outcome section. My proposal is that each BP has such a keyword (MUST, SHOULD, MAY).

Two more of Laufer's paragraphs could also be turned into BPs:

1. Human and Machine Readable
2. Standard vocabularies
3. A BP on descriptive metadata - in more detail than in the BP already provided.
4. A BP on structural metadata - ditto.
5. Domain-specific (I'm sure Annette and Eric S can come up with examples), mine might be GTFS for transport data. That is a best practice in itself, so I think it should get more than just a mention in the introduction.

>
> The organization of the numbered sections is confusing to me. The last sentence of the intro suggests that the data licenses and other sections below are subsections of metadata, but the numbers indicate otherwise, and it’s not at all clear where the metadata section is meant to end. There is also an allusion to an introduction for a “data organization” subsection that seems to be between the metadata level and the examples of metadata.
>
> In a larger issue, probably not something we can address in the current draft, I’m not sure that the data lifecycle-based document structure is very helpful in terms of finding a specific best practice. I’m finding it difficult to guess where things are. In a way, everything should fit under the rubric of best practices for data publication.

Bernadette has answered these last two points.

HTH

Phil.

(goes off to write to GitHub guru Yaso to work out how the heck to force a merge...)

Phil.

>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Dec 5, 2014, at 9:38 AM, Laufer <laufer@globo.com> wrote:
>
>> Hello all,
>>
>> I wrote a description for the beginning of the metadata section and I want to ask the group to comment:
>>
>> http://w3c.github.io/dwbp/bp.html#metadata
>>
>> Thank you.
>>
>> Cheers,
>> Laufer
>>
>> --
>> . . . .. . .
>> . . . ..
>> . .. .
>
>

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/
http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Thursday, 11 December 2014 15:13:15 UTC