old issues we've been ignoring

Following up on the list of issues I had in my email of 6/18/15, I see 
we are just now getting around to addressing the things I marked as 
essential to address before the next publication. That made me wonder 
where we are on the items that weren't starred, so I went through the list.

The following are still issues:
-------------------------

Data Quality
The introduction says that quality “can affect the potentiality of the application that use data”. I don’t understand that phrase.

Provide descriptive metadata
Re the possible approach to implementation, the list of metadata fields 
to be included is not an implementation, so that should be moved up and 
listed under intended outcome. Spatial coverage and temporal period are 
irrelevant for lots of datasets, so they should be marked “if relevant”. 
Keywords and themes/categories are dependent on the context of a 
catalog, so I think we should leave them out of this list, or say that 
they are needed in that case only.
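
For illustration, here is a minimal sketch in Python of what such a record might look like with spatial and temporal fields treated as optional. The field names loosely follow DCAT but the values are made up; nothing here is normative.

    # Hypothetical sketch: a minimal descriptive-metadata record.
    # Field names loosely follow DCAT; the values are invented.
    dataset_metadata = {
        "title": "Example dataset",
        "description": "What the dataset contains and how it was produced",
        "publisher": "Example Agency",
        "issued": "2016-03-24",    # date of publication
        "modified": "2016-03-24",  # date of last modification
        # Include these only when they make sense for the dataset:
        "spatial": None,   # e.g. a GeoNames URI, if relevant
        "temporal": None,  # e.g. "2010-01-01/2015-12-31", if relevant
    }

    # Drop fields that don't apply rather than publishing empty values.
    record = {k: v for k, v in dataset_metadata.items() if v is not None}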

Use standardized terms
This should read “Standardized terms should be used to provide metadata whenever they are available.” In scientific domains, there are often no standard terms available yet.
(The test for this one should at least allow for some terms not to be standardized, because often there is no standard.)
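
As a rough sketch of what a more forgiving test could look like, in Python, with an invented vocabulary purely for illustration:

    # Hypothetical test sketch: accept a term if it comes from a standard
    # vocabulary, but don't fail terms for which no standard exists yet.
    STANDARD_VOCABULARY = {"temperature", "salinity", "pressure"}  # invented

    def term_is_acceptable(term, standard_exists=True):
        """Pass standardized terms; allow free text when no standard exists."""
        if term.lower() in STANDARD_VOCABULARY:
            return True
        return not standard_exists  # free text is fine if there is no standard

    assert term_is_acceptable("salinity")
    assert term_is_acceptable("frobnication index", standard_exists=False)
    assert not term_is_acceptable("temp.", standard_exists=True)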

Use a trusted serialization format for preserved data dumps
To the extent that this is in scope, it is covered under the BP about using standardized formats. We could add a note to that BP mentioning the value for preservation. I don’t think this needs to be a separate BP.

Update the status of identifiers
To the extent that this is in scope, it should be covered under versioning or unavailability. What are “preserved” datasets? Are they available on the web? If not, it is out of scope. If they are, then they are versions.

Feedback
We say “blogs and other publicly available feedback should be displayed in a human-readable form through the user interface.” That suggests that publishers should re-publish blog content, which is probably not what we want (copyright issues, for one thing). Publishers of data can’t control the format of other people’s publications.


The following are new issues related to issues in that same email:
-----------------------------------------------------------------
BP28, Assess dataset coverage, is still written in the context of archiving data, which we have agreed is out of scope. It is valuable for the point that datasets should have minimal dependencies on external entities that may not be preserved. It needs to be rewritten to focus on that point rather than on assessing a dataset’s value in an archive.

Sensitive Data: The introduction gives a lot of advice that sounds like it belongs in a BP, and it’s awkward to offer it in an introduction instead. If we want to say that sensitive data is out of scope, then we shouldn’t be offering all this advice in an introduction.

BP32, provide information about feedback
The possible approach to implementation is about assigning metadata about the feedback. I don't think this is a best practice, and in any case, it's not an implementation of providing *useful* information about feedback. The useful information is the actual feedback, not metadata about it. I would suggest implementation with an issue tracker; see the sketch below. The tests have the same problem: they are about testing metadata, not testing that the feedback itself can be read by other users.
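
To make the issue-tracker suggestion concrete, here is a minimal Python sketch that reads feedback from a public GitHub issue tracker and renders it in human-readable form. The repository name is made up; any tracker with a public API would do.

    # Minimal sketch: surface dataset feedback from an issue tracker so
    # other users can actually read it. "example-org/example-dataset" is
    # a made-up repository; the GitHub Issues API itself is real.
    import requests

    url = "https://api.github.com/repos/example-org/example-dataset/issues"
    response = requests.get(url, params={"state": "all"}, timeout=10)
    response.raise_for_status()

    for issue in response.json():
        print("[{0}] {1}".format(issue["state"], issue["title"]))
        print(issue.get("body") or "(no details provided)")
        print("-" * 40)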

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
