- From: Laufer <laufer@globo.com>
- Date: Tue, 15 Sep 2015 13:02:06 -0300
- To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Cc: Annette Greiner <amgreiner@lbl.gov>, "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJih=84Zb=TmAcOaY99nuSd7Y83VF3URarYE23=W=7-Xzkw@mail.gmail.com>
Hi Bernadette, Annette,

Good starting points. The idea of characteristics (as Erik has also pointed
out) can, I think, clarify the purpose of the BPs. I am not sure whether a
separate set of levels for each characteristic would be confusing for
users; if we could define a single set of levels, I think it would be
simpler to comprehend.

I also agree on avoiding the use of MUST and SHOULD. Each BP has a title
that does not use RFC words, and a more descriptive subtitle that does use
them. For example:

Best Practice 4: Provide structural metadata
Information about the schema and internal structure of a distribution MUST
be described by metadata

What we can do is rephrase the subtitles, for example:

Best Practice 4: Provide structural metadata
Provide metadata with information about the schema and internal structure
of a distribution

But to drop these words in a new draft of the document we need to introduce
the ideas of characteristics and levels.

Cheers,
Laufer

2015-09-15 9:25 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:

> Hi Annette,
>
> Thanks for your message! Please find some comments below.
>
> 2015-09-14 22:03 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
>
>> Regarding the two questions, I don’t think we need to worry about
>> whether a maturity level applies at the level of a dataset or a
>> collection. It’s up to the publisher of the data to decide whether they
>> will follow/claim a certain level for a dataset or a collection.
>
> I agree with you that it's up to the publisher to decide whether they
> will follow a certain level for a dataset or a collection. I asked this
> question just to get a better idea of how to describe the "maturity
> model". For example, is a publisher at level 1 only if all the datasets
> they publish meet the requirements of level 1, or may a publisher have
> datasets published at different levels? Will the categorization/level be
> given to the publisher or to the datasets?
>> For each BP, the relevant aspects will be different. Breaking BPs out
>> into multiple levels is a matter of determining the aspects that are
>> relevant to that BP or group of BPs. So, for metadata, you could say the
>> lowest level is “provide structural metadata and provide localization
>> metadata for locale-sensitive fields”, because we’d rather have some
>> incomplete metadata than none at all, but the data is meaningless
>> without structural metadata, and locale-sensitive fields are meaningless
>> without the localization info. The next level could be “provide
>> descriptive metadata”: less crucial, but still a huge help to have at
>> least something. The third level could be “Provide complete descriptive
>> metadata, including license information, provenance, quality
>> information, and versioning information.”
>>
>> Some of the metadata BPs I mention above seem like they could still be
>> separate BPs, just split into their own levels. “Provide license
>> information” could be satisfied at a low level by providing a custom
>> description of licensing rules, and a higher level of maturity would be
>> to use a standard license.
>>
>> This raises the question of how many levels we should have, and how we
>> will assign them. If we go for three, how do we assign the groups that
>> have only two? We might be able to determine that by splitting up each
>> BP or set of BPs however seems natural for that group, and seeing what
>> turns out to be the highest number of levels for any such group. Then we
>> can try to come up with some general rules to describe the levels. (I
>> think the lowest and highest levels will be easy to generalize, but the
>> middle ones will be hard.) Once we have generalized rules, it should be
>> easy to assign the groups that have fewer levels.
>
> I also agree with you that BPs on the same subject may belong to
> different levels and that we should have a way to categorize the BPs. I
> was thinking about this, and maybe we could use the expected
> characteristics of a dataset to specify the different levels. This also
> helps to give more meaning to the BPs, for example: "if these BPs are
> followed, then the resulting datasets will be comprehensible and
> discoverable". In other words, if a publisher follows BPX and BPY to
> publish dataset Z, then dataset Z will be level 1, for example. So,
> instead of applying the levels to each group of BPs, BPs from different
> groups would be combined at the same level.
>
> Consider a possible list of expected characteristics of a dataset:
>
> comprehensible
> accessible
> reusable
> trustworthy
> discoverable
> processable
> interoperable
> linkable
>
> Then, based on these aspects, we would propose the different levels of
> maturity. For example:
>
> Level 1: Comprehensible
> Best Practice 4: Provide structural metadata
> Best Practice 3: Provide locale parameters metadata
>
> Level 2: Discoverable
> Best Practice 2: Provide descriptive metadata
>
> Level 3: Trustworthy
> Best Practice 6: Provide data provenance information
> Best Practice 7: Provide data quality information
> Best Practice 26: Provide data up to date
>
> ....
>
> I am also not sure how many levels to propose, but I think that some
> aspects may be combined at the same level to avoid a long list of levels.
>
>> I would like us to try to avoid the use of SHOULD and MUST altogether,
>> since their use in a best practice recommendation cannot agree with
>> their RFC 2119 meanings. (The web will not break if you fail to provide
>> metadata with your dataset.) Instead of saying “Datasets must have x, y,
>> and z.” we can simply say “Provide x, y, and z.”
>> -Annette
>
> I agree!
>
> Cheers,
> Bernadette
>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>> On Sep 14, 2015, at 5:00 PM, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>> wrote:
>>
>> Hi Laufer,
>>
>> I agree with you that we should have more fine-grained sets of best
>> practices. It is also important to review the BPs to make sure that
>> SHOULD and MUST were used correctly. IMO we should also discuss what
>> type of classification we'd like to have with the maturity model. I have
>> some questions about this:
>>
>> Will the maturity model be used to evaluate a single dataset or a set of
>> datasets?
>> Which main aspects should be considered for the evaluation?
>>
>> Thanks!
>> Bernadette
>>
>> 2015-09-04 11:21 GMT-03:00 Laufer <laufer@globo.com>:
>>
>>> Hi All,
>>>
>>> After our discussions about whether or not to keep the RFC words and
>>> whether or not to create a maturity model together with a set of BP
>>> levels, I grouped the BPs by RFC word:
>>>
>>> MUST
>>> Best Practice 1: Provide metadata
>>> Best Practice 2: Provide descriptive metadata
>>> Best Practice 4: Provide structural metadata
>>> Best Practice 10: Use persistent URIs as identifiers
>>> Best Practice 12: Use machine-readable standardized data formats
>>> Best Practice 21: Preserve people's right to privacy
>>> Best Practice 26: Provide data up to date
>>> Best Practice 29: Use a trusted serialization format for preserved
>>> data dumps
>>>
>>> SHOULD
>>> Best Practice 3: Provide locale parameters metadata
>>> Best Practice 5: Provide data license information
>>> Best Practice 6: Provide data provenance information
>>> Best Practice 7: Provide data quality information
>>> Best Practice 8: Provide versioning information
>>> Best Practice 9: Provide version history
>>> Best Practice 11: Assign URIs to dataset versions and series
>>> Best Practice 13: Use non-proprietary data formats
>>> Best Practice 14: Provide data in multiple formats
>>> Best Practice 15: Use standardized terms
>>> Best Practice 16: Document vocabularies
>>> Best Practice 17: Share vocabularies in an open way
>>> Best Practice 18: Vocabulary versioning
>>> Best Practice 19: Re-use vocabularies
>>> Best Practice 20: Choose the right formalization level
>>> Best Practice 22: Provide data unavailability reference
>>> Best Practice 23: Provide bulk download
>>> Best Practice 24: Follow REST principles when designing APIs
>>> Best Practice 25: Provide real-time access
>>> Best Practice 27: Maintain separate versions for a data API
>>> Best Practice 28: Assess dataset coverage
>>> Best Practice 30: Update the status of identifiers
>>> Best Practice 31: Gather feedback from data consumers
>>> Best Practice 32: Provide information about feedback
>>> Best Practice 33: Enrich data by generating new metadata.
>>>
>>> We currently have two groups of BPs to guide the publisher.
>>>
>>> Maybe we could, starting from these two groups, make an exercise of
>>> defining a more fine-grained set of groups to, in some sense, assert
>>> some "quality" (maturity) for a published dataset.
>>>
>>> What do you think about this?
>>>
>>> Cheers,
>>> Laufer
>>>
>>> --
>>> . . . .. . .
>>> . . . ..
>>> . .. .
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------

--
. . . .. . .
. . . ..
. .. .
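The characteristic-based levels discussed in the thread could be prototyped as a small script. This is only a minimal sketch: the level names and BP assignments follow Bernadette's example above, while the function name and the cumulative rule (a dataset reaches level N only if it also satisfies every BP of the lower levels) are assumptions for illustration, not anything the group agreed on.

```python
# Sketch of the proposed maturity levels. The level/BP mapping follows
# Bernadette's example; the cumulative rule (a dataset is at level N only
# if it meets all BPs of levels 1..N) is an assumption for illustration.

LEVELS = [
    # (level number, characteristic, best practices required at that level)
    (1, "comprehensible", {"BP4: Provide structural metadata",
                           "BP3: Provide locale parameters metadata"}),
    (2, "discoverable",   {"BP2: Provide descriptive metadata"}),
    (3, "trustworthy",    {"BP6: Provide data provenance information",
                           "BP7: Provide data quality information",
                           "BP26: Provide data up to date"}),
]

def maturity_level(followed_bps):
    """Return the highest level whose BPs, and those of all lower
    levels, are satisfied by the given set of followed best practices."""
    level = 0
    for lvl, _characteristic, required in LEVELS:
        if required <= followed_bps:   # all BPs of this level followed?
            level = lvl
        else:
            break  # levels are cumulative: stop at the first gap
    return level

# Publisher follows BPX and BPY for dataset Z, as in the example above.
dataset_z = {"BP4: Provide structural metadata",
             "BP3: Provide locale parameters metadata",
             "BP2: Provide descriptive metadata"}
print(maturity_level(dataset_z))  # → 2 (comprehensible and discoverable)
```

Under this cumulative reading, a dataset with descriptive metadata but no structural metadata would still be level 0, which is one concrete way to answer the "how are levels assigned" question in the thread.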
Received on Tuesday, 15 September 2015 16:02:36 UTC