- From: Laufer <laufer@globo.com>
- Date: Tue, 15 Sep 2015 13:02:06 -0300
- To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Cc: Annette Greiner <amgreiner@lbl.gov>, "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJih=84Zb=TmAcOaY99nuSd7Y83VF3URarYE23=W=7-Xzkw@mail.gmail.com>
Hi Bernadette, Annette,

Good starting points. The idea of characteristics (as Erik has also pointed
out) can, I think, clarify the purpose of the BPs. I am not sure whether a
separate set of levels for each characteristic would be confusing for
users; if we could define a single set of levels, I think it would be
simpler to comprehend.

I also agree on avoiding the use of MUST and SHOULD. Each BP has a title
that does not use RFC words, and a more descriptive subtitle that does use
them. For example:

Best Practice 4: Provide structural metadata
Information about the schema and internal structure of a distribution MUST
be described by metadata

What we can do is rephrase the subtitles, for example:

Best Practice 4: Provide structural metadata
Provide metadata with information about the schema and internal structure
of a distribution

But to drop these words in a new draft of the document we need to introduce
the ideas of characteristics and levels.

Cheers,
Laufer

2015-09-15 9:25 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:

> Hi Annette,
>
> Thanks for your message! Please find some comments below.
>
> 2015-09-14 22:03 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
>
>> Regarding the two questions, I don’t think we need to worry about
>> whether a maturity level applies at the level of a dataset or a
>> collection. It’s up to the publisher of the data to decide whether they
>> will follow/claim a certain level for a dataset or a collection.
>
> I agree with you that it's up to the publisher to decide whether they
> will follow a certain level for a dataset or a collection. I asked this
> question just to get a better idea of how to describe the "maturity
> model". For example, is a publisher at level 1 only if all the datasets
> they publish meet the requirements of level 1, or may a publisher have
> datasets published at different levels? Will the categorization/level be
> given to the publisher or to the datasets?
>> For each BP, the relevant aspects will be different. Breaking BPs out
>> into multiple levels is a matter of determining the aspects that are
>> relevant to that BP or group of BPs. So, for metadata, you could say the
>> lowest level is “provide structural metadata and provide localization
>> metadata for locale-sensitive fields”, because we’d rather have some
>> incomplete metadata than none at all, but the data is meaningless
>> without structural metadata, and locale-sensitive fields are meaningless
>> without the localization info. The next level could be “provide
>> descriptive metadata”: less crucial, but still a huge help to have at
>> least something. The third level could be “Provide complete descriptive
>> metadata, including license information, provenance, quality
>> information, and versioning information.”
>>
>> Some of the metadata BPs I mention above seem like they could still be
>> separate BPs, just split into their own levels. “Provide license
>> information” could be satisfied at a low level by providing a custom
>> description of licensing rules, and a higher level of maturity would be
>> to use a standard license.
>>
>> This raises the question of how many levels we should have, and how we
>> will assign them. If we go for three, how do we assign the groups that
>> have only two? We might be able to determine that by splitting up each
>> BP or set of BPs however seems natural for that group, and seeing what
>> turns out to be the highest number of levels for any such group. Then we
>> can try to come up with some general rules to describe the levels. (I
>> think the lowest and highest levels will be easy to generalize, but the
>> middle ones will be hard.) Once we have generalized rules, it should be
>> easy to assign the groups that have fewer levels.
>
> I also agree with you that BPs on the same subject may belong to
> different levels and that we should have a way to categorize the BPs. I
> was thinking about this, and maybe we could use the expected
> characteristics of a dataset to specify the different levels. This also
> helps to give more meaning to the BPs, for example: "if these BPs are
> followed, then the resulting datasets will be comprehensible and
> discoverable". In other words, if a publisher follows BPX and BPY to
> publish dataset Z, then dataset Z will be level 1, for example. So,
> instead of applying the levels to each group of BPs, BPs from different
> groups would be combined at the same level.
>
> Consider a possible list of expected characteristics of a dataset:
>
> comprehensible
> accessible
> reusable
> trustworthy
> discoverable
> processable
> interoperable
> linkable
>
> Then, based on these aspects, we would propose the different levels of
> maturity. For example:
>
> Level 1: Comprehensible
> Best Practice 4: Provide structural metadata
> Best Practice 3: Provide locale parameters metadata
>
> Level 2: Discoverable
> Best Practice 2: Provide descriptive metadata
>
> Level 3: Trustworthy
> Best Practice 6: Provide data provenance information
> Best Practice 7: Provide data quality information
> Best Practice 26: Provide data up to date
>
> ....
>
> I am also not sure how many levels to propose, but I think that some
> aspects may be combined at the same level to avoid a long list of levels.
>
>> I would like us to try to avoid the use of SHOULD and MUST altogether,
>> since their use in a best practice recommendation cannot agree with
>> their RFC 2119 meanings. (The web will not break if you fail to provide
>> metadata with your dataset.) Instead of saying “Datasets must have x, y,
>> and z.” we can simply say “Provide x, y, and z.”
>> -Annette
>
> I agree!
>
> Cheers,
> Bernadette
>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>> On Sep 14, 2015, at 5:00 PM, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>> wrote:
>>
>> Hi Laufer,
>>
>> I agree with you that we should have more fine-grained sets of best
>> practices. It is also important to review the BPs to make sure that
>> SHOULD and MUST were used correctly. IMO we should also discuss what
>> type of classification we'd like to have with the maturity model. I have
>> some questions about this:
>>
>> Will the maturity model be used to evaluate a single dataset or a set of
>> datasets?
>> Which main aspects should be considered for the evaluation?
>>
>> Thanks!
>> Bernadette
>>
>> 2015-09-04 11:21 GMT-03:00 Laufer <laufer@globo.com>:
>>
>>> Hi All,
>>>
>>> After our discussions about whether or not to keep the RFC words and
>>> whether or not to create a maturity model together with a set of BP
>>> levels, I grouped the BPs by RFC word:
>>>
>>> MUST
>>> Best Practice 1: Provide metadata
>>> Best Practice 2: Provide descriptive metadata
>>> Best Practice 4: Provide structural metadata
>>> Best Practice 10: Use persistent URIs as identifiers
>>> Best Practice 12: Use machine-readable standardized data formats
>>> Best Practice 21: Preserve people's right to privacy
>>> Best Practice 26: Provide data up to date
>>> Best Practice 29: Use a trusted serialization format for preserved
>>> data dumps
>>>
>>> SHOULD
>>> Best Practice 3: Provide locale parameters metadata
>>> Best Practice 5: Provide data license information
>>> Best Practice 6: Provide data provenance information
>>> Best Practice 7: Provide data quality information
>>> Best Practice 8: Provide versioning information
>>> Best Practice 9: Provide version history
>>> Best Practice 11: Assign URIs to dataset versions and series
>>> Best Practice 13: Use non-proprietary data formats
>>> Best Practice 14: Provide data in multiple formats
>>> Best Practice 15: Use standardized terms
>>> Best Practice 16: Document vocabularies
>>> Best Practice 17: Share vocabularies in an open way
>>> Best Practice 18: Vocabulary versioning
>>> Best Practice 19: Re-use vocabularies
>>> Best Practice 20: Choose the right formalization level
>>> Best Practice 22: Provide data unavailability reference
>>> Best Practice 23: Provide bulk download
>>> Best Practice 24: Follow REST principles when designing APIs
>>> Best Practice 25: Provide real-time access
>>> Best Practice 27: Maintain separate versions for a data API
>>> Best Practice 28: Assess dataset coverage
>>> Best Practice 30: Update the status of identifiers
>>> Best Practice 31: Gather feedback from data consumers
>>> Best Practice 32: Provide information about feedback
>>> Best Practice 33: Enrich data by generating new metadata.
>>>
>>> We currently have two groups of BPs to guide the publisher.
>>>
>>> Maybe we could, starting from these two groups, make an exercise of
>>> defining a more fine-grained set of groups to, in some sense, assert
>>> some "quality" (maturity) for a published dataset.
>>>
>>> What do you think about this?
>>>
>>> Cheers,
>>> Laufer
>>>
>>> --
>>> . . . .. . .
>>> . . . ..
>>> . .. .
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------

--
. . . .. . .
. . . ..
. .. .
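The characteristic-based levels discussed in the thread could be prototyped as a small script. This is only a minimal sketch: the level names and BP assignments follow Bernadette's example above, while the function name and the cumulative rule (a dataset reaches level N only if it also satisfies every BP of the lower levels) are assumptions for illustration, not anything the group agreed on.

```python
# Sketch of the proposed maturity levels. The level/BP mapping follows
# Bernadette's example; the cumulative rule (a dataset is at level N only
# if it meets all BPs of levels 1..N) is an assumption for illustration.

LEVELS = [
    # (level number, characteristic, best practices required at that level)
    (1, "comprehensible", {"BP4: Provide structural metadata",
                           "BP3: Provide locale parameters metadata"}),
    (2, "discoverable",   {"BP2: Provide descriptive metadata"}),
    (3, "trustworthy",    {"BP6: Provide data provenance information",
                           "BP7: Provide data quality information",
                           "BP26: Provide data up to date"}),
]

def maturity_level(followed_bps):
    """Return the highest level whose BPs, and those of all lower
    levels, are satisfied by the given set of followed best practices."""
    level = 0
    for lvl, _characteristic, required in LEVELS:
        if required <= followed_bps:   # all BPs of this level followed?
            level = lvl
        else:
            break  # levels are cumulative: stop at the first gap
    return level

# Publisher follows BPX and BPY for dataset Z, as in the example above.
dataset_z = {"BP4: Provide structural metadata",
             "BP3: Provide locale parameters metadata",
             "BP2: Provide descriptive metadata"}
print(maturity_level(dataset_z))  # → 2 (comprehensible and discoverable)
```

Under this cumulative reading, a dataset with descriptive metadata but no structural metadata would still be level 0, which is one concrete way to answer the "how are levels assigned" question in the thread.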
Received on Tuesday, 15 September 2015 16:02:36 UTC