Re: old issues we've been ignoring

Hi Annette,

Thanks again for your review and your comments! I made some updates to the
doc [1] based on your last message, but I still have a few comments.

>
>
>>
>> Provide descriptive metadata
>> Re the possible approach to implementation, the list of metadata fields
>> to be included is not an implementation, so that should be moved up and
>> listed under intended outcome. Spatial coverage and temporal period are
>> irrelevant for lots of datasets, so they should be marked "if relevant".
>> Keywords and themes/categories are dependent on the context of a catalog,
>> so I think we should leave them out of this list, or say that they are
>> needed in that case only.
>>
>
> ---> Suggestion:
>
> The machine-readable version of the descriptive metadata can be provided
> using the vocabulary recommended by W3C to describe datasets, i.e. the
> Data Catalog Vocabulary [VOCAB-DCAT
> <http://w3c.github.io/dwbp/bp.html#bib-VOCAB-DCAT>]. This provides a
> framework in which datasets can be described as abstract entities.
>
> Descriptive metadata should include the following overall features of a
> dataset:
>
>    - The *title* and a *description* of the dataset.
>    - The *keywords* describing the dataset.
>    - The *date of publication* of the dataset.
>    - The *entity responsible (publisher)* for making the dataset
>    available.
>    - The *contact point* of the dataset.
>
> When relevant, the following metadata can also be included:
>
>    - The *spatial coverage* of the dataset.
>    - The *temporal period* that the dataset covers.
>    - The *themes/categories* covered by the dataset.
>
>
> I'm a little confused about this one. Are we saying that all the fields
> listed in the first group should be included in order to meet the criteria
> of the BP? If that's the case, I think that list belongs in the intended
> outcome rather than the implementation. The implementation section
> shouldn't be telling us what we *should* do, right? I think it would be
> okay if we just removed the "shoulds".
>

---> I replaced "should" with "can". This is just a suggestion of
information that can be provided as descriptive metadata, so I think it
belongs in the possible approach to implementation. Do you agree?
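
To make the suggestion more concrete, here is a sketch of what such
descriptive metadata could look like in DCAT, serialized as Turtle (the
URIs and values below are invented for the example):

  @prefix dcat:  <http://www.w3.org/ns/dcat#> .
  @prefix dct:   <http://purl.org/dc/terms/> .
  @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
  @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .

  <http://example.org/datasets/bus-stops> a dcat:Dataset ;
      dct:title "Bus stops of MyCity" ;
      dct:description "Locations of the public bus stops of MyCity." ;
      dcat:keyword "bus", "transport", "mobility" ;
      dct:issued "2016-04-05"^^xsd:date ;
      dct:publisher <http://example.org/mycity> ;
      dcat:contactPoint [ a vcard:Kind ;
          vcard:fn "MyCity Open Data Team" ;
          vcard:hasEmail <mailto:opendata@mycity.example.org> ] ;
      # included only when relevant for the dataset:
      dct:spatial <http://example.org/geo/mycity> ;
      dct:temporal <http://example.org/time/2015-2016> ;
      dcat:theme <http://example.org/themes/transport> .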

>
>
>>
>> Use a trusted serialization format for preserved data dumps
>> To the extent that this is in scope, it is covered under the BP about
>> using standardized formats. We could add a note to that mentioning the
>> value for preservation. I don’t think this needs to be a separate BP.
>>
>> Update the status of identifiers
>> To the extent that this is in scope, it should be covered under
>> versioning or unavailability. What are “preserved” datasets? Are they
>> available on the web? If not, it is out of scope. If they are, then they
>> are versions.
>>
>
> ---> I created an issue to discuss this with the group - ISSUE-251 [4]
>
>
---> As agreed in our last meeting, I included a note in the introduction
of the Data Preservation section [2].


>
>> Sensitive Data: The introduction gives a lot of advice that sounds like
>> it should be in a BP. I find it awkward that we offer it in this form
>> instead of a BP. If we want to say that it is out of scope, then we
>> shouldn't be offering all this advice in an introduction.
>>
>
> ---> I don't agree. I think it is out of the scope of the document to
> identify sensitive data and to tell how to protect it. But once the
> sensitive data has been identified and properly protected, the BP shows
> what should be done to tell consumers why the data is not available.
>
> I agree that it's out of scope to tell how to identify sensitive data and
> how to protect it. But the introduction still says to "identify all
> sensitive data, assess the exposure risk, determine the intended usage,
> data user audience and any related usage policies, obtain appropriate
> approval, and determine the appropriate security measures needed to be
> taken to protect the data" and to "preserve the privacy of individuals
> where the release of personal information would endanger safety
> (unintended accidents) or security (deliberate attack)." Those sound like
> BPs to me.
> I'd like to hear what other people in the group think, though.
>

---> We had a long discussion about this during the F2F in São Paulo [3].
We agreed to remove the BP on Preserve People's Right to Privacy and to
review the sensitive data section, so I think we shouldn't create a new BP.
The paragraph in the introduction was rewritten in light of the discussion
we had during the F2F.

> BP32, provide information about feedback
> The possible approach to implementation is about assigning metadata about
> the feedback. I don't think this is a best practice, and in any case, it's
> not an implementation of providing *useful* information about feedback. The
> useful information is the actual feedback, not metadata about it. I would
> suggest implementation with an issue tracker. The tests have the same
> problem, they are about testing metadata, not testing that the feedback
> itself can be read by other users.
>

---> I agree that we shouldn't mention metadata about feedback. Here is my
suggestion for rewriting this BP:

Best Practice 32: Make feedback available

Feedback should be available to both human users and computer applications.

Why

Making feedback about datasets and distributions publicly available allows
users to become aware of other data consumers, supports a collaborative
environment, and lets users see whether community experiences, concerns or
questions are being addressed. Providing feedback in a machine-readable
format allows computer applications to automatically collect and process
feedback about datasets.

Intended Outcome

It should be possible for humans to access feedback on a dataset or
distribution given by one or more data consumers.

It should be possible for machines to automatically process feedback about
a dataset or distribution.

Possible Approach to Implementation

Feedback can be made available as part of an HTML Web page, but it can also
be provided in a machine-readable format according to the vocabulary
recommended by W3C to describe dataset usage, i.e. the Dataset Usage
Vocabulary [DUV <http://w3c.github.io/dwbp/bp.html#bib-DUV>].
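
For example, a single piece of feedback could be represented along these
lines (just a sketch: the URIs are invented, and the exact DUV modeling,
which treats feedback as a Web Annotation, may still change):

  @prefix duv: <http://www.w3.org/ns/duv#> .
  @prefix oa:  <http://www.w3.org/ns/oa#> .
  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  <http://example.org/feedback/42> a duv:UserFeedback ;
      # the dataset (or distribution) the feedback refers to
      oa:hasTarget <http://example.org/datasets/bus-stops> ;
      # the feedback content, e.g. a comment posted on the catalog site
      oa:hasBody <http://example.org/comments/42> ;
      dct:creator <http://example.org/people/jdoe> ;
      dct:issued "2016-04-05"^^xsd:date .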

How to Test

Check if a human consumer can access the feedback about the dataset or
distribution, and check if a computer application can automatically process
the feedback.

Please let me know if you agree with my suggestions.


> I like this except for the requirement of having the feedback machine
> readable. I think it's a best practice to make it human readable, but I
> don't see a compelling reason to make the feedback machine readable. I
> have never done that. Do other people think that is a common practice? It
> seems to me one could get caught in an infinite loop of providing feedback
> as a dataset and getting feedback on the feedback dataset, etc.

---> One of the reasons for making feedback machine-readable is to make it
easier to collect feedback about datasets. It also makes it possible to
process the feedback automatically, and easier to share it with other
consumers. Does that make sense to you?

Thanks!
Berna

[1] http://w3c.github.io/dwbp/bp.html
[2] http://w3c.github.io/dwbp/bp.html#dataPreservation
[3] https://www.w3.org/2015/09/24-dwbp-minutes
[4] https://www.w3.org/2013/dwbp/track/issues/251





-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Tuesday, 5 April 2016 16:28:13 UTC