Re: partial review from Bernadette Farias Lóscio on 2016-04-16 (public-dwbp-wg@w3.org from April 2016)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Sat, 16 Apr 2016 14:00:41 -0300
To: Annette Greiner <amgreiner@lbl.gov>
Cc: DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <CANx1PzxUkHammTF+cm082YiwtZ2QAXu5JCnE7s2=tPieBOq=sw@mail.gmail.com>
Hi Annette,

Thank you very much for your detailed review! After receiving more feedback
from the group, I think we're gonna be able to make some proposals for
updates. Please find some comments below.

2016-04-14 23:12 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> Whew, I've gotten through section 8.7. This is taking way too long, so I'm
> going to stop at this point and put this out. Following are things that I
> noticed in a partial read-through of the BP document.
> -Annette
>
>
> General issues
> --
>
> Possible approaches to implementation should not include the word
> "should". That implies normativeness. This is a general issue with
> implementation sections. We say in the Audience section that "The normative
> element of each best practice is the intended outcome."
>

I agree Annette! I think we should also remove this sentence "The normative
element of each best practice is the intended outcome".


>
> Subtitles should all be written in the same mode. (Mine were written in
> imperative -- "do this, don't do that", but most are declarative -- "this
> should be done".) I think imperative is better, because it gets away from
> RFC2119 keywords, which we voted not to use. It becomes a call to action,
> which is our goal, right?
>

"this should be done" reflects the use of the RFC2119 keywords. I'm ok with
using the imperative mode, but I think we should bring this discussion to
the group.


>
> 1. provide metadata
> --
>
> The intended outcome is "Human-readable metadata will enable humans to
> understand the metadata and machine-readable metadata will enable computer
> applications, notably user agents, to process the metadata."
> This is tautological. Metadata is necessary because, without it, the data
> will have no context or meaning.
>

The idea of the intended outcome is to show what will be possible if you
provide both human-readable and machine-readable metadata, rather than
metadata in general.
Please let us know if you have a better proposal for this intended outcome.



>
> Possible approach to implementation should not include the word "should".


I agree! We're gonna remove it.



> Also, I disagree that "If multiple formats are published separately, they
> should be served from the same URL using content negotiation." Publishing
> multiple files is also reasonable, and it's even what we used in all our
> examples about metadata. (In BP2, the machine-readable example gives the
> name of the distribution as bus-stops-2015-05-05.csv; in BP4, the entire
> URI is given, ending in .csv, etc.)
>

OK! What do you suggest? Should we remove "If multiple formats are published
separately, they should be served from the same URL using content
negotiation", or should we change the URIs?


>
>
> 2. descriptive metadata
> --
>
> There is an inconsistency between the suggestion that one should use
> content negotiation for different formats (csv vs. rdf) and the .
> :mobility and :themes are referred to as URIs, but they are not URIs. (I
> know DCAT did this, but I think it's a mistake; colons are not legal in the
> first segment of a relative URI.)
>
>
I'm sorry, but I'm not sure I understood. Why are :mobility and :themes
not URIs?




>
> 3. locale parameters
> --
>
> The human-readable example for the first three BPs is exactly the same.
> Can we make the examples more specific (maybe include them in the doc
> rather than link to one big external example)? The ttl in the
> machine-readable example could be trimmed to just the bold parts.
>

I don't agree with showing the human-readable metadata in the doc. The doc
is already very long. I prefer having an external example, but we can
discuss this with the group. Instead of splitting the example, maybe we can
link to the specific parts of the page that correspond to each BP.

I think we can make the machine-readable example shorter.


>
> 5. Licenses
> --
>
> We say "the license of a dataset can be specified within the data". I
> think we mean within the *metadata*.
>

I think it should be the dataset instead of metadata because metadata is
part of the dataset.



> The "Why" misuses the phrase "for example." User agent actions are not an
> example of data consumer actions.
>

OK! We're gonna remove it.


> We say "Data license information can be provided as a link to a
> human-readable license or as a link/embedded machine-readable license."
> Since licensing info is part of metadata, and we tell people to provide
> metadata for both humans and machines, we should also require licensing
> info for both humans and machines.
>

Do you propose to change the subtitle of the BP?



>
>
> 6. Provenance
> --
>
> The "Why" is pretty sparse and essentially says the same thing as the
> intended outcome. I think we could make it stronger. "Provenance is one
> means by which consumers of a dataset judge its quality. Understanding its
> origin and history helps one determine whether to trust the data and
> provides important interpretive context."
>

I agree that we should improve the Why section, but I think we shouldn't
mention quality because we already have a BP for data quality.


>
> The example links to the metadata example page. It would be more helpful
> to put the provenance-specific info into the BP doc itself.
>

I'm not sure. I think that if we present just parts of the human-readable
example, they will be out of context, but we're gonna see how it looks.

>
>
> 7. Quality
> --
>
> We say "Data quality information will enable humans to know the quality of
> the dataset and its distributions, and software agents to automatically
> process quality information about the dataset and its distributions."
> That's rather tautological. We could say something about enabling humans to
> determine whether the dataset is suitable for their purposes.
>

Yes, it sounds good! We're gonna make a proposal to change this intended
outcome. I think we should also discuss this with Antoine and Riccardo.


>
> We probably should refer to DQV as a finished thing, as it will be soon.
>
> The human-readable example links to the metadata one.
>

Ok! We're gonna fix it.


>
> 8. Versioning
> --
>
> Of the four implementation bullets, only the last is really a possible
> approach. The first three belong in the intended outcome.
>

I'm sorry, but again I don't think I understand. Why do the first three
belong in the intended outcome? If they are intended outcomes, then the whole
intended outcome section needs to be rewritten. In that case, would you
like to make a proposal?


>
> The human-readable example links to the metadata one. The version history
> there lists only 1.1, which is illogical. (1.0 must exist at least.)
>

OK! We're gonna fix it!

>
>
> 9. Version history
> --
>
> The human-readable example links to the metadata one. The version history
> there lists only 1.1, which is illogical. (1.0 must exist at least.) This
> example doesn't meet the requirements of the BP.
>
> Neither the ttl version nor the Memento example provides a full version
> history, only a list of versions released. This BP is intended to be about
> providing the details of what changed.
>

In the machine-readable example of this BP, there is an rdfs:comment
property that shows how the dataset was updated. If this is not enough,
could you please tell us what else we should present?



>
> Intro to Identifiers
> --
>
> Intro item 5 refers to an API which could be confusing, since we talk
> about APIs as web APIs elsewhere.
>
>
> 10. Persistent URIs as identifiers
> --
>
> We say "This requires a different mindset to that used when creating a Web
> site designed for humans to navigate their way through." When creating a
> web site for humans to navigate, one should also consider persistence, so
> that sentence is not strictly accurate.
>
> The example uses the city domain instead of the transport agency's domain,
> which is not realistic for a large city. The agency domain is likely to
> persist as long as the information it makes available is relevant. Try
> Googling "transit agency" and see what comes up for domain names. The issue
> depends on how stable the transit service is. For a small town, the transit
> function might not be given over to a separate agency, and the guidance
> would be right, but for a big city, where the transit function is run by an
> independent agency, it's not realistic.
>
> The example is rather redundant. It is data.mycity..., and yet /dataset
> also appears in the path. The path also contains /bus as well as
> /bus-stops. It's unlikely that the agency has so many transit modes that
> they need to be split between road and rail and water. The same info is
> conveyed as well by the much shorter
> http://data.mycitytransit.example.org/bus/stops
>
> We say "Ideally, the relevant Web site includes a description of the
> process..." I think we mean a controlled scheme.
>
>
> 11. Persistent URIs within datasets
> --
>
> The word "affordances" is misused. Affordances are how we know what
> something is intended to do, not what the thing does. Affordances do not
> act on things, they inform.
>
> The intended outcome should be a free-standing piece of text. Starting
> with "that one item" is confusing.
>
> Much of the implementation section is about minting new URIs, which is the
> subject of the previous BP. It is off topic here. Everything from "If you
> can't find an existing set of identifiers that meet your needs, you'll need
> to create your own" down to the end of the example doesn't belong in a BP
> that is about using other people's identifiers.
>
> The last paragraph of the example is almost exactly the same as the last
> paragraph before the example.
>
>
> 12. URIs for versions and series
> --
>
> This BP is confusing two issues. One is the use of a shorter URI for the
> latest version of a dataset while also assigning a version-specific URI for
> it. The other issue is making a landing page for a collection of datasets.
> The initial intent was the former.
>
> The examples in the Why aren't series or groups except for the first item,
> yet they are introduced as examples of series or groups.
>
> How to Test says to check "that logical groups of datasets are also
> identifiable." That is vague. It should say "that a URI is also provided
> for the latest version or most recent real-time value."
>
> I don't think this applies to time series. What we're talking about here
> is use of dates for version identifiers.
>
> The example is incomplete; it doesn't say what the latest version URI
> would be.
>


I'm gonna ask Phil to give feedback on your comments about the Data
Identifier section. I'm sure that he can give better feedback than I can
;)

Once again, thanks a lot for your review! We're looking forward to your
comments on the remaining sections.

Cheers,
Bernadette





>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
>
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Saturday, 16 April 2016 17:01:30 UTC