Re: RFC Words - Levels from Annette Greiner on 2015-09-16 (public-dwbp-wg@w3.org from September 2015)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Wed, 16 Sep 2015 13:44:14 -0700
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-Id: <B499EA7C-5AE2-432F-89F7-E2E31AAC2598@lbl.gov>

Hi Bernadette,
I think it makes the most sense to say a dataset meets a certain level. A publisher may claim “we publish all our data at level 3” as opposed to “we are a level 3 publisher”. The latter would require defining what it means to be a certain level of publisher, which doesn’t strike me as worth the effort or the politics. I think publishers should be free to decide for themselves which level they will address for each dataset and each group of BPs, based on their business constraints. We don’t need to define what the various combinations get called; we only need to define the levels and attempt to make everything at one level similar in importance.

It’s an interesting idea to use the characteristics you mentioned to define the levels. The only problem is that I don’t think the characteristics lend themselves well to ranking by maturity. Is trustworthiness more fundamental than accessibility? Different people would answer that question differently. For any single characteristic, there can be multiple levels of maturity in addressing it. I agree that we should split into the smallest number of levels that works for our BPs, preferably 5 or fewer. I think some BPs will have only one suggestion, which we can still assign a level to.
-Annette

--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Sep 15, 2015, at 5:25 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:

> Hi Annette,
> 
> Thanks for your message! Please find some comments below.
> 
> 2015-09-14 22:03 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
> Regarding the two questions, I don’t think we need to worry about whether a maturity level applies at the level of a dataset or a collection. It’s up to the publisher of the data to decide whether they will follow/claim a certain level for a dataset or a collection.
> 
> I agree with you that it's up to the publisher to decide whether they will follow a certain level for a dataset or a collection. I asked this question just to get a better idea of how to describe the "maturity model". For example, is a publisher level 1 if all the datasets they have published meet the requirements of level 1, or may a publisher have datasets published at different levels? Will the categorization/level be given to the publisher or to the datasets?
> 
>  
> For each BP, the relevant aspects will be different. Breaking BPs out into multiple levels is a matter of determining the aspects that are relevant to that BP or group of BPs. So, for metadata, you could say the lowest level is “provide structural metadata and provide localization metadata for locale-sensitive fields”, because we’d rather have some incomplete metadata than none at all, but the data is meaningless without structural metadata, and locale-sensitive fields are meaningless without the localization info. The next level could be “provide descriptive metadata”, less crucial but still a huge help to have at least something. The third level could be “Provide complete descriptive metadata, including license information, provenance, quality information, and versioning information.”
> 
> Some of the metadata BPs I mention above seem like they could still be separate BPs, just split into their own levels. “provide license information” could be satisfied at a low level by providing a custom description of licensing rules, and a higher level of maturity would be to use a standard license.
> 
> This begs the question of how many levels we should have, and how we will assign them. If we go for three, how do we assign the groups of only two? We might be able to determine that by splitting up each BP or set of BPs however seems natural for that group, and seeing what turns out to be the highest number of levels for any such group. Then we can try and come up with some general rules to describe the levels. (I think the lowest and highest levels will be easy to generalize, but the middle ones will be hard.) Once we have generalized rules, it should be easy to assign the groups that have fewer levels.
> 
> I also agree with you that BPs on the same subject may belong to different levels and that we should have a way to categorize the BPs. I was thinking about this, and maybe we could use the expected characteristics of a dataset to specify the different levels. This also helps to give more meaning to the BPs, for example, "if these BPs are followed then the resulting datasets will be comprehensible and discoverable". In other words, if a publisher follows BPX and BPY to publish dataset Z, then dataset Z will be level 1, for example. So, instead of applying the levels to each group of BPs, BPs from different groups would be combined at the same level.
> 
> Consider a possible list of expected characteristics of a dataset given below:
> 
> comprehensible
> accessible
> reusable
> trustworthy
> discoverable
> processable
> interoperable
> linkable
> 
> Then, based on these aspects, we would propose the different levels of maturity. For example: 
> 
> Level 1: Comprehensible 
> Best Practice 4: Provide structural metadata
> Best Practice 3: Provide locale parameters metadata
> 
> Level 2: Discoverable
> Best Practice 2: Provide descriptive metadata
> 
> Level 3: Trustworthy
> Best Practice 6: Provide data provenance information
> Best Practice 7: Provide data quality information
> Best Practice 26: Provide data up to date
> 
> ....
> 
> I am also not sure of how many levels to propose, but I think that some aspects may be combined at the same level to avoid a long list of levels.
> 
> 
> I would like us to try and avoid the use of SHOULD and MUST altogether, since their use in a best practice recommendation cannot agree with their RFC2119 meanings. (The web will not break if you fail to provide metadata with your dataset.) Instead of saying “Datasets must have x, y, and z.” we can simply say “Provide x, y, and z.”
> -Annette
> 
> I agree!
> 
> Cheers,
> Bernadette
> 
>
Received on Wednesday, 16 September 2015 20:44:57 UTC