Re: RFC Words - Levels

Hi Annette,

Thanks for your feedback!

> I think it makes the most sense to say a dataset meets a certain level. A
> publisher may claim “we publish all our data at level 3” as opposed to “we
> are a level 3 publisher". The latter would require defining what it means
> to be a certain level of publisher, which doesn’t strike me as worth the
> effort or the politics. I think publishers should be free to decide for
> themselves which level they will address for each dataset and each group of
> BPs, based on their business constraints. We don’t need to define what the
> various combinations get called; we only need to define the levels and
> attempt to make everything at one level similar in importance.

I agree with you! It also makes more sense to me to say that a dataset
meets a certain level.

> It’s an interesting idea to use the characteristics you mentioned to
> define the levels. The only problem is that I don’t think the
> characteristics lend themselves well to ranking by maturity. Is
> trustworthiness more fundamental than accessibility? Different people would
> answer that question differently. For any single characteristic, there can
> be multiple levels of maturity in addressing it. I agree that we should
> split into the least number of levels that works for our BPs, preferably 5
> or fewer. I think some BPs will have only one suggestion, which we can
> still assign a level to.

Yes, it is not trivial to define which characteristic is more important.
Maybe there is no need to specify a relevance or priority between the
levels. In that case, instead of having maturity levels, we may have a
classification system for the datasets. The publisher may choose the
aspects/characteristics that she wants to cover and then apply the
corresponding BPs. We can also try to combine this with the idea of having
maturity levels.
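
A minimal Python sketch of such a classification, only to make the idea
concrete. It assumes the characteristic-to-BP grouping from the example
quoted further down in this message (comprehensible, discoverable,
trustworthy). The names BPS_BY_CHARACTERISTIC and bps_to_apply are
hypothetical and not part of any agreed model.

```python
# Hypothetical mapping from dataset characteristics to best practices (BPs).
# The grouping follows the example quoted later in this message; it is an
# illustration, not an agreed classification.
BPS_BY_CHARACTERISTIC = {
    "comprehensible": [
        "BP4: Provide structural metadata",
        "BP3: Provide locale parameters metadata",
    ],
    "discoverable": [
        "BP2: Provide descriptive metadata",
    ],
    "trustworthy": [
        "BP6: Provide data provenance information",
        "BP7: Provide data quality information",
        "BP26: Provide data up to date",
    ],
}


def bps_to_apply(chosen):
    """Return the BPs a publisher would apply for the chosen characteristics."""
    unknown = [c for c in chosen if c not in BPS_BY_CHARACTERISTIC]
    if unknown:
        raise ValueError(f"unknown characteristics: {unknown}")
    # No ordering between characteristics is implied: the publisher picks freely.
    return [bp for c in chosen for bp in BPS_BY_CHARACTERISTIC[c]]


print(bps_to_apply(["comprehensible", "discoverable"]))
```

Note that nothing here ranks one characteristic above another, which is
the point of a classification as opposed to maturity levels.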

However, it is not clear to me how we are going to associate BPs with
maturity levels. How can we determine that BPX for metadata is level 1 and
BPY for metadata is level 2? And when there is just one BP associated with
a challenge (Data Provenance, for example), how do we choose the maturity
level?

Maybe we can do this based on the characteristics, for example: if you
apply BPX then the dataset is "partially comprehensible", but if you apply
BPY then the dataset is "totally comprehensible". Something similar can be
done for the other characteristics. Then, based on this, we may try to
categorize the BPs into levels.
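
As a rough sketch of this, again in Python: BPX and BPY are the
placeholders used above, and the mapping of "partially" to level 1 and
"totally" to level 2 is only an assumption for illustration, not a
proposal for the actual levels.

```python
# BPX and BPY are placeholders from the text, not real BP identifiers.
# Assumption: BPX yields partial and BPY yields total comprehensibility.
DEGREE_BY_BP = {"BPX": "partially", "BPY": "totally"}

# Assumption: one possible way to turn degrees into maturity levels.
LEVEL_BY_DEGREE = {"partially": 1, "totally": 2}


def comprehensibility(applied_bps):
    """Return the highest degree reached by the applied BPs, or None."""
    degrees = [DEGREE_BY_BP[bp] for bp in applied_bps if bp in DEGREE_BY_BP]
    if not degrees:
        return None
    best = max(degrees, key=LEVEL_BY_DEGREE.get)
    return f"{best} comprehensible"


def bp_level(bp):
    """Derive a BP's level from the degree it provides (hypothetical rule)."""
    return LEVEL_BY_DEGREE[DEGREE_BY_BP[bp]]


print(comprehensibility(["BPX"]))
print(bp_level("BPY"))
```

A BP that is the only one for its challenge would simply have a single
entry in such a table, so the open question of which level to give it
remains.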


> -Annette
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
> On Sep 15, 2015, at 5:25 AM, Bernadette Farias Lóscio <>
> wrote:
> Hi Annette,
> Thanks for your message! Please find some comments below.
> 2015-09-14 22:03 GMT-03:00 Annette Greiner <>:
>> Regarding the two questions, I don’t think we need to worry about whether
>> a maturity level applies at the level of a dataset or a collection. It’s up
>> to the publisher of the data to decide whether they will follow/claim a
>> certain level for a dataset or a collection.
> I agree with you that it's up to the publisher to decide whether they
> will follow a certain level for a dataset or a collection. I asked this
> question just to have a better idea of how to describe the "maturity
> model". For example, is a publisher level 1 if all the datasets that he
> publishes meet the requirements of level 1, or may a publisher have
> datasets published at different levels? Will the categorization/level be
> given to the publisher or to the datasets?
>> For each BP, the relevant aspects will be different. Breaking BPs out
>> into multiple levels is a matter of determining the aspects that are
>> relevant to that BP or group of BPs. So, for metadata, you could say the
>> lowest level is “provide structural metadata and provide localization
>> metadata for locale-sensitive fields" , because we’d rather have some
>> incomplete metadata than none at all, but the data is meaningless without
>> structural metadata, and locale-sensitive fields are meaningless without
>> the localization info. The next level could be “provide descriptive
>> metadata”, less crucial but still a huge help to have at least something.
>> The third level could be “Provide complete descriptive metadata, including
>> license information, provenance, quality information, and versioning
>> information.”
>> Some of the metadata BPs I mention above seem like they could still be
>> separate BPs, just split into their own levels. “provide license
>> information” could be satisfied at a low level by providing a custom
>> description of licensing rules, and a higher level of maturity would be to
>> use a standard license.
>> This begs the question of how many levels we should have, and how we will
>> assign them. If we go for three, how do we assign the groups of only two?
>> We might be able to determine that by splitting up each BP or set of BPs
>> however seems natural for that group, and seeing what turns out to be the
>> highest number of levels for any such group. Then we can try and come up
>> with some general rules to describe the levels. (I think the lowest and
>> highest levels will be easy to generalize, but the middle ones will be
>> hard.) Once we have generalized rules, it should be easy to assign the
>> groups that have fewer levels.
> I also agree with you that BPs on the same subject may belong to different
> levels and that we should have a way to categorize the BPs. I was thinking
> about this and maybe we could use the expected characteristics of a dataset
> in order to specify the different levels. This also helps to give more
> meaning to the BPs, for example, "if these BPs are followed then the
> resulting datasets will be comprehensible and discoverable". In other
> words, if a publisher follows BPX and BPY to publish dataset Z then dataset
> Z will be level 1, for example. So, instead of applying the levels to each
> group of BPs, BPs from different groups will be combined at the same level.
> Consider a possible list of expected characteristics of a dataset given
> below:
> comprehensible
> accessible
> reusable
> trustworthy
> discoverable
> processable
> interoperable
> linkable
> Then, based on these aspects, we would propose the different levels of
> maturity. For example:
> Level 1: Comprehensible
>   Best Practice 4: Provide structural metadata
>   Best Practice 3: Provide locale parameters metadata
> Level 2: Discoverable
>   Best Practice 2: Provide descriptive metadata
> Level 3: Trustworthy
>   Best Practice 6: Provide data provenance information
>   Best Practice 7: Provide data quality information
>   Best Practice 26: Provide data up to date
> ....
> I am also not sure of how many levels to propose, but I think that some
> aspects may be combined at the same level to avoid a long list of levels.
>> I would like us to try and avoid the use of SHOULD and MUST altogether,
>> since their use in a best practice recommendation cannot agree with their
>> RFC2119 meanings. (The web will not break if you fail to provide metadata
>> with your dataset.) Instead of saying “Datasets must have x, y, and z.” we
>> can simply say “Provide x, y, and z.”
>> -Annette
> I agree!
> Cheers,
> Bernadette

Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil

Received on Wednesday, 16 September 2015 22:46:40 UTC