Re: RFC Words - Levels

Hi Annette,

Thanks for your message! Please find some comments below.

2015-09-14 22:03 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> Regarding the two questions, I don’t think we need to worry about whether
> a maturity level applies at the level of a dataset or a collection. It’s up
> to the publisher of the data to decide whether they will follow/claim a
> certain level for a dataset or a collection.
>

I agree with you that it's up to the publisher to decide whether they will
follow a certain level for a dataset or a collection. I asked this question
just to get a better idea of how to describe the "maturity model". For
example, is a publisher at level 1 only if all the datasets they publish
meet the requirements of level 1, or may a publisher have datasets
published at different levels? Will the categorization/level be given to
the publisher or to the datasets?



> For each BP, the relevant aspects will be different. Breaking BPs out into
> multiple levels is a matter of determining the aspects that are relevant to
> that BP or group of BPs. So, for metadata, you could say the lowest level
> is “provide structural metadata and provide localization metadata for
> locale-sensitive fields", because we’d rather have some incomplete
> metadata than none at all, but the data is meaningless without structural
> metadata, and locale-sensitive fields are meaningless without the
> localization info. The next level could be “provide descriptive metadata”,
> less crucial but still a huge help to have at least something. The third
> level could be “Provide complete descriptive metadata, including license
> information, provenance, quality information, and versioning information.”
>
> Some of the metadata BPs I mention above seem like they could still be
> separate BPs, just split into their own levels. “provide license
> information” could be satisfied at a low level by providing a custom
> description of licensing rules, and a higher level of maturity would be to
> use a standard license.
>
> This begs the question of how many levels we should have, and how we will
> assign them. If we go for three, how do we assign the groups of only two?
> We might be able to determine that by splitting up each BP or set of BPs
> however seems natural for that group, and seeing what turns out to be the
> highest number of levels for any such group. Then we can try and come up
> with some general rules to describe the levels. (I think the lowest and
> highest levels will be easy to generalize, but the middle ones will be
> hard.) Once we have generalized rules, it should be easy to assign the
> groups that have fewer levels.
>

I also agree with you that BPs on the same subject may belong to different
levels and that we should have a way to categorize the BPs. I was thinking
about this, and maybe we could use the expected characteristics of a
dataset to specify the different levels. This would also help to give more
meaning to the BPs, for example: "if these BPs are followed, then the
resulting datasets will be comprehensible and discoverable". In other
words, if a publisher follows BP X and BP Y to publish dataset Z, then
dataset Z will be at level 1, for example. So, instead of applying the
levels to each group of BPs, BPs from different groups would be combined
at the same level.

Consider the following possible list of expected characteristics of a
dataset:

comprehensible
accessible
reusable
trustworthy
discoverable
processable
interoperable
linkable

Then, based on these characteristics, we could propose the different
maturity levels. For example (a small sketch follows the list):

Level 1: Comprehensible
Best Practice 4: Provide structural metadata
Best Practice 3: Provide locale parameters metadata

Level 2: Discoverable
Best Practice 2: Provide descriptive metadata

Level 3: Trustworthy
Best Practice 6: Provide data provenance information
Best Practice 7: Provide data quality information
Best Practice 26: Provide data up to date

....
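
Just to make this more concrete, here is a rough sketch in Python of how
such a mapping could be represented. The level names and BP numbers are
the ones from the example above; the structure itself (keeping the BP
numbers in sets) is only an illustration of the idea, not something taken
from the BP document:

    # Hypothetical mapping: maturity level -> expected characteristic and
    # the BPs that need to be followed to reach that level.
    LEVELS = {
        1: {"characteristic": "comprehensible",
            "best_practices": {3, 4}},      # locale parameters, structural metadata
        2: {"characteristic": "discoverable",
            "best_practices": {2}},         # descriptive metadata
        3: {"characteristic": "trustworthy",
            "best_practices": {6, 7, 26}},  # provenance, quality, up-to-date data
        # further levels (accessible, reusable, processable, ...) still to be defined
    }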

I am also not sure how many levels to propose, but I think that some
characteristics may be combined at the same level to avoid a long list of
levels.
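
Continuing the sketch, the level of a single dataset could then be
computed from the set of BPs that the publisher followed for it. Here I am
assuming the levels are cumulative (a dataset only reaches level N if it
also satisfies every lower level), which is just one possible rule that we
would still need to agree on:

    def dataset_level(followed_bps):
        """Highest maturity level reached by a dataset, assuming level N
        requires the BPs of levels 1..N (LEVELS as in the sketch above)."""
        reached = 0
        for level in sorted(LEVELS):
            if LEVELS[level]["best_practices"] <= set(followed_bps):
                reached = level
            else:
                break
        return reached

    # Example: dataset Z was published following BPs 2, 3 and 4
    print(dataset_level({2, 3, 4}))  # -> 2 (comprehensible and discoverable)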


> I would like us to try and avoid the use of SHOULD and MUST altogether,
> since their use in a best practice recommendation cannot agree with their
> RFC2119 meanings. (The web will not break if you fail to provide metadata
> with your dataset.) Instead of saying “Datasets must have x, y, and z.” we
> can simply say “Provide x, y, and z.”
> -Annette
>

I agree!

Cheers,
Bernadette



>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Sep 14, 2015, at 5:00 PM, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> wrote:
>
>
> Hi Laufer,
>
> I agree with you that we should have more fine grained sets of best
> practices. It is also important to review the BP to make sure that SHOULD
> and MUST were used correctly.  IMO we should also discuss what type of
> classification we'd like to have with the maturity model. I have some
> questions about this:
>
> Will the maturity model be used to evaluate a single dataset or a set of
> datasets?
> Which main aspects should be considered for the evaluation?
>
> Thanks!
> Bernadette
>
> 2015-09-04 11:21 GMT-03:00 Laufer <laufer@globo.com>:
>
>> Hi All,
>>
>> After our discussions about whether or not to keep the RFC words and
>> whether to create a maturity model in conjunction with a set of BP
>> levels, I grouped the BPs by RFC words:
>>
>> MUST
>>     Best Practice  1: Provide metadata
>>     Best Practice  2: Provide descriptive metadata
>>     Best Practice  4: Provide structural metadata
>>     Best Practice 10: Use persistent URIs as identifiers
>>     Best Practice 12: Use machine-readable standardized data formats
>>     Best Practice 21: Preserve people's right to privacy
>>     Best Practice 26: Provide data up to date
>>     Best Practice 29: Use a trusted serialization format for preserved
>> data dumps
>>
>> SHOULD
>>     Best Practice  3: Provide locale parameters metadata
>>     Best Practice  5: Provide data license information
>>     Best Practice  6: Provide data provenance information
>>     Best Practice  7: Provide data quality information
>>     Best Practice  8: Provide versioning information
>>     Best Practice  9: Provide version history
>>     Best Practice 11: Assign URIs to dataset versions and series
>>     Best Practice 13: Use non-proprietary data formats
>>     Best Practice 14: Provide data in multiple formats
>>     Best Practice 15: Use standardized terms
>>     Best Practice 16: Document vocabularies
>>     Best Practice 17: Share vocabularies in an open way
>>     Best Practice 18: Vocabulary versioning
>>     Best Practice 19: Re-use vocabularies
>>     Best Practice 20: Choose the right formalization level
>>     Best Practice 22: Provide data unavailability reference
>>     Best Practice 23: Provide bulk download
>>     Best Practice 24: Follow REST principles when designing APIs
>>     Best Practice 25: Provide real-time access
>>     Best Practice 27: Maintain separate versions for a data API
>>     Best Practice 28: Assess dataset coverage
>>     Best Practice 30: Update the status of identifiers
>>     Best Practice 31: Gather feedback from data consumers
>>     Best Practice 32: Provide information about feedback
>>     Best Practice 33: Enrich data by generating new metadata.
>>
>> We currently have two groups of BPs to guide the publisher.
>>
>> Maybe we could, starting from these two groups, do an exercise to define
>> a more fine-grained set of groups that, in some sense, asserts some
>> "quality" (maturity) for a published dataset.
>>
>> What do you think about this?
>>
>> Cheers,
>> Laufer
>>
>> --
>> .  .  .  .. .  .
>> .        .   . ..
>> .     ..       .
>>
>
>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
>
> ----------------------------------------------------------------------------
>
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Tuesday, 15 September 2015 12:25:56 UTC