Re: partial review from Annette Greiner on 2016-04-21 (public-dwbp-wg@w3.org from April 2016)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Thu, 21 Apr 2016 14:53:21 -0700
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Cc: DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <57194BD1.2090600@lbl.gov>
Hi Bernadette, my responses are inline, as usual. I hope these are helpful.
-Annette

On 4/16/16 10:00 AM, Bernadette Farias Lóscio wrote:
> Hi Annette,
>
> Thank you very much for your detailed review! After receiving more 
> feedback from the group, I think we're gonna be able to make some 
> proposals for updates. Please find some comments below.
>
> 2016-04-14 23:12 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
>
>     Whew, I've gotten through section 8.7. This is taking way too
>     long, so I'm going to stop at this point and put this out.
>     Following are things that I noticed in a partial read-through of
>     the BP document.
>     -Annette
>
>
>     General issues
>     --
>
>     Possible approaches to implementation should not include the word
>     "should". That implies normativeness. This is a general issue with
>     implementation sections. We say in the Audience section that "The
>     normative element of each best practice is the intended outcome."
>
>
> I agree Annette! I think we should also remove this sentence "The 
> normative element of each best practice is the intended outcome".
I have no problem with the sentence stating that the intended outcome is 
normative. We just need to stick to that and keep the implementation 
sections from getting normative. Your comment on the wiki suggests you 
want to remove "shoulds" from the intended outcomes, but that is the one 
place where they belong. I want to remove them from the implementation.
>
>
>     Subtitles should all be written in the same mode. (Mine were
>     written in imperative -- "do this, don't do that", but most are
>     declarative -- "this should be done".) I think imperative is
>     better, because it gets away from RFC2119 keywords, which we voted
>     not to use. It becomes a call to action, which is our goal, right?
>
>
> "this should be done" reflects the use of the RFC2119 keywords. I'm ok 
> with using the imperative mode, but I think we should bring this 
> discussion to the group.
>
>
>     1. provide metadata
>     --
>
>     The intended outcome is "Human-readable metadata will enable
>     humans to understand the metadata and machine-readable metadata
>     will enable computer applications, notably user agents, to process
>     the metadata."
>     This is tautological. Metadata is necessary because, without it,
>     the data will have no context or meaning.
>
>
> The idea of the intended outcome is to show what will be possible 
> if you provide metadata that is both human-readable and machine-readable, 
> rather than metadata in general.
> Please let us know if you have a better proposal for this intended 
> outcome.
>
The best practice is telling people to provide metadata. Providing it 
for machines and humans is a secondary suggestion. The intended outcome 
should address the primary suggestion at least.
>
>
>     Possible approach to implementation should not include the word
>     "should". 
>
>
> I agree! We're gonna remove it.
Great, we just need to be clear that we are not specifically removing it 
from the intended outcomes.
>
>     Also, I disagree that "If multiple formats are published
>     separately, they should be served from the same URL using content
>     negotiation." publishing multiple files is also reasonable, and
>     it's even what we used in all our examples about metadata. (in
>     BP2, the machine readable example gives the name of the
>     distribution as bus-stops-2015-05-05.csv; in BP4, the entire URI
>     is given, ending in .csv, etc.)
>
>
> ok! What do you suggest? Should we remove "If multiple formats are published 
> separately, they should be served from the same URL using content 
> negotiation" or change the URIs?
I suggest we remove the sentence.
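For anyone weighing the two options, here is a minimal sketch of what serving both formats "from the same URL using content negotiation" involves: the server inspects the request's Accept header and picks one representation. It assumes a hypothetical Turtle file, bus-stops-2015-05-05.ttl, alongside the CSV named in the BP2 example, and its q-value handling is deliberately simplified rather than a full HTTP implementation.

```python
from typing import Dict, Optional


def negotiate(accept: str, available: Dict[str, str]) -> Optional[str]:
    """Return the file for the acceptable media type with the highest q-value.

    Simplified: ignores media-type parameters other than q, and on a tie
    the first match found wins.
    """
    best: Optional[str] = None
    best_q = 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        for media_type, filename in available.items():
            wildcard = media == "*/*" or (
                media.endswith("/*") and media_type.startswith(media[:-1])
            )
            if (media == media_type or wildcard) and q > best_q:
                best, best_q = filename, q
    return best


# Hypothetical representations of one dataset; only the CSV name comes from BP2.
available = {
    "text/csv": "bus-stops-2015-05-05.csv",
    "text/turtle": "bus-stops-2015-05-05.ttl",
}
print(negotiate("text/turtle, text/csv;q=0.5", available))
```

Publishing separate files, as the BP2 and BP4 examples do, skips this step entirely: each format simply gets its own URL.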
>
>
>
>     2. descriptive metadata
>     --
>
>     There is an inconsistency between the suggestion that one should
>     use content negotiation for different formats (csv vs. rdf) and the
>     examples, which name separate files.
>     :mobility and :themes are referred to as URIs, but they are not
>     URIs. (I know DCAT did this, but I think it's a mistake; colons
>     are not legal in the first segment of a relative URI.)
>
>
> I'm sorry, but I'm not sure I understood. Why are :mobility and :themes 
> not URIs?
>
>
The colons are not allowed.
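To make the point concrete: the RFC 3986 grammar forbids a colon in the first path segment of a relative-path reference (section 4.2), so ":mobility" is not a valid URI reference on its own. In Turtle, :mobility is instead a prefixed name that becomes a full IRI only once the empty prefix is declared. The sketch below checks that one rule and performs the expansion; the prefix IRI http://data.mycity.example.com/vocab# is a made-up placeholder.

```python
import re

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_valid_reference_start(ref: str) -> bool:
    """Rough check of a single RFC 3986 rule: with no scheme, the first
    segment of a relative-path reference may not contain ':'."""
    if ref == "" or SCHEME.match(ref) or ref.startswith(("/", "?", "#")):
        return True  # empty ref, absolute URI, absolute/network path, query, or fragment
    first_segment = re.split(r"[/?#]", ref, maxsplit=1)[0]
    return ":" not in first_segment


def expand_prefixed_name(name: str, prefixes: dict) -> str:
    """Expand a Turtle-style prefixed name such as ':mobility' to a full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# Hypothetical declaration, i.e. @prefix : <http://data.mycity.example.com/vocab#> .
prefixes = {"": "http://data.mycity.example.com/vocab#"}

print(is_valid_reference_start(":mobility"))    # False
print(is_valid_reference_start("./:mobility"))  # True
print(expand_prefixed_name(":mobility", prefixes))
```

So the document could either call these prefixed names or show the expanded IRIs.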
>
>
>     3. locale parameters
>     --
>
>     The human-readable example for the first three BPs is exactly the
>     same. Can we make the examples more specific (maybe include them
>     in the doc rather than link to one big external example)? The ttl
>     in the machine-readable example could be trimmed to just the bold
>     parts.
>
>
> I don't agree with showing the human-readable metadata in the doc. The 
> doc is very long already. I prefer having an external example, but we 
> can discuss this with group. Instead of splitting the example, maybe 
> we can link to specific parts of the page according to the BP.
>
> I think we can make the machine-readable example shorter.
I suggest we keep the external metadata doc for the one BP that just 
says to include metadata. It's great for that. For other BPs that just 
talk about a small part of the metadata, we can include only the lines 
that are relevant. Right now, many of them just link over to the 
external doc without even a fragment id, so the user has no idea which 
part is relevant.
>
>
>     5. Licenses
>     --
>
>     We say "the license of a dataset can be specified within the
>     data". I think we mean within the *metadata*.
>
>
> I think it should be the dataset instead of metadata because metadata 
> is part of the dataset.
>
Dataset or metadata is fine with me. "within the data" suggests that the 
license info would be somehow embedded into a data row, which is confusing.
>
>     The "Why" misuses the phrase "for example." User agent actions are
>     not an example of data consumer actions.
>
>
> ok! we're gonna remove it.
>
>     We say "Data license information can be provided as a link to a
>     human-readable license or as a link/embedded machine-readable
>     license." Since licensing info is part of metadata, and we tell
>     people to provide metadata for both humans and machines, we should
>     also require licensing info for both humans and machines.
>
>
> Do you propose to change the subtitle of the BP?
No, I would just change the "or" to "and" in the implementation. I do 
think, though, that "link/embedded machine readable license" is unclear 
(Is the solidus indicating another "or" or an "and"? What is being 
embedded into what?).
>
>
>
>
>     6. Provenance
>     --
>
>     The "Why" is pretty sparse and essentially says the same thing as
>     the intended outcome. I think we could make it stronger.
>     "Provenance is one means by which consumers of a dataset judge its
>     quality. Understanding its origin and history helps one determine
>     whether to trust the data and provides important interpretive
>     context."
>
>
> I agree to improve the Why section, but I think we shouldn't mention 
> quality because we already have a BP for data quality.
I don't see why having a BP about quality limits our ability to use the 
word elsewhere, but we could use another word instead, like "usefulness".
>
>
>     The example links to the metadata example page. It would be more
>     helpful to put the provenance-specific info into the BP doc itself.
>
>
> I'm not sure. I think if we present just parts of the human-readable 
> example it will be out of context, but we're gonna see if it looks nice.
>
>
okay, sounds good. At a minimum, we need to link to a fragment id, so 
readers can find it.
>
>
>     7. Quality
>     --
>
>     We say "Data quality information will enable humans to know the
>     quality of the dataset and its distributions, and software agents
>     to automatically process quality information about the dataset and
>     its distributions." That's rather tautological. We could say
>     something about enabling humans to determine whether the dataset
>     is suitable for their purposes.
>
>
> Yes, it sounds good! We're gonna make a proposal to change this 
> intended outcome. I think we should also discuss this with Antoine and 
> Riccardo.
>
>
>     We probably should refer to DQV as a finished thing, as it will be
>     soon.
>
>     The human-readable example links to the metadata one.
>
>
> Ok! We're gonna fix it.
>
>
>     8. Versioning
>     --
>
>     Of the four implementation bullets, only the last is really a
>     possible approach. The first three belong in the intended outcome.
>
>
> I'm sorry, but again I'm not sure I understand. Why do the first three 
> belong in the intended outcome? If they are intended outcomes, then 
> the whole intended outcome section needs to be rewritten. In this 
> case, would you like to make a proposal?
The bullets list basic guidelines. They use the word "should" and the 
imperative mode. Both indicate that they are guidance on what to 
do, not on how to go about doing it.
>
>
>     The human-readable example links to the metadata one. The version
>     history there lists only 1.1, which is illogical. (1.0 must exist
>     at least.)
>
>
> ok! we're gonna fix it!
>
>
>
>     9. Version history
>     --
>
>     The human-readable example links to the metadata one. The version
>     history there lists only 1.1, which is illogical. (1.0 must exist
>     at least.) This example doesn't meet the requirements of the BP.
>
>     Neither the ttl version nor the Memento example provides a full
>     version history, only a list of versions released. This BP is
>     intended to be about providing the details of what changed.
>
> In the machine-readable example of this BP there is a property 
> rdfs:comment to show how the dataset was updated. If this is not 
> enough, could you please tell us what else we should present.
>
A version history is something like this: 
http://itol.embl.de/version_history.cgi. I think we could show a few 
lines of example text, like

Version 1.1
     Added latitude and longitude fields to bus timetables.
     Added two new bus stops for line 52.
     Corrected spelling errors in street names along the Riverwalk subway line.
Version 1.0
     Initial release
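If the same history also needs to be machine-processable, one option is to keep it as structured data and render the human-readable text from it. A minimal sketch using the example entries above; the Version class and render_history function are hypothetical names, not part of any vocabulary the BP references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    number: str
    changes: List[str] = field(default_factory=list)


def render_history(versions: List[Version]) -> str:
    """Render versions in the order given (newest first here): the version
    number on its own line, then one indented line per change."""
    lines = []
    for version in versions:
        lines.append(f"Version {version.number}")
        lines.extend(f"    {change}" for change in version.changes)
    return "\n".join(lines)


history = [
    Version("1.1", [
        "Added latitude and longitude fields to bus timetables.",
        "Added two new bus stops for line 52.",
        "Corrected spelling errors in street names along the Riverwalk subway line.",
    ]),
    Version("1.0", ["Initial release"]),
]
print(render_history(history))
```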
>
>
>     Intro to Identifiers
>     --
>
>     Intro item 5 refers to an API which could be confusing, since we
>     talk about APIs as web APIs elsewhere.
>
>
>     10. Persistent URIs as identifiers
>     --
>
>     We say "This requires a different mindset to that used when
>     creating a Web site designed for humans to navigate their way
>     through." When creating a web site for humans to navigate, one
>     should also consider persistence, so that sentence is not strictly
>     accurate.
>
>     The example uses the city domain instead of the transport agency's
>     domain, which is not realistic for a large city. The agency domain
>     is likely to persist as long as the information it makes available
>     is relevant. Try Googling "transit agency" and see what comes up
>     for domain names. The issue depends on how stable the transit
>     service is. For a small town, the transit function might not be
>     given over to a separate agency, and the guidance would be right,
>     but for a big city, where the transit function is run by an
>     independent agency, it's not realistic.
>
>     The example is rather redundant. It is data.mycity..., and yet
>     /dataset also appears in the path. The path also contains /bus as
>     well as /bus-stops. It's unlikely that the agency has so many
>     transit modes that they need to be split between road and rail and
>     water. The same info is conveyed as well by the much shorter
>     http://data.mycitytransit.example.org/bus/stops
>
>     We say "Ideally, the relevant Web site includes a description of
>     the process..." I think we mean a controlled scheme.
>
>
>     11. Persistent URIs within datasets
>     --
>
>     The word "affordances" is misused. Affordances are how we know
>     what something is intended to do, not what the thing does.
>     Affordances do not act on things, they inform.
>
>     The intended outcome should be a free-standing piece of text.
>     Starting with "that one item" is confusing.
>
>     Much of the implementation section is about minting new URIs,
>     which is the subject of the previous BP. It is off topic here.
>     Everything from "If you can't find an existing set of identifiers
>     that meet your needs, you'll need to create your own" down to the
>     end of the example doesn't belong in a BP that is about using
>     other people's identifiers.
>
>     The last paragraph of the example is almost exactly the same as
>     the last paragraph before the example.
>
>
>     12.  URIs for versions and series
>     --
>
>     This BP is confusing two issues. One is the use of a shorter URI
>     for the latest version of a dataset while also assigning a
>     version-specific URI for it. The other issue is making a landing
>     page for a collection of datasets. The initial intent was the former.
>
>     The examples in the Why aren't series or groups except for the
>     first item, yet they are introduced as examples of series or groups.
>
>     How to Test says to check "that logical groups of datasets are
>     also identifiable." That is vague. It should say "that a URI is
>     also provided for the latest version or most recent real-time value."
>
>     I don't think this applies to time series. What we're talking
>     about here is use of dates for version identifiers.
>
>     The example is incomplete; it doesn't say what the latest version
>     URI would be.
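The missing piece could look something like this sketch: each version keeps its own date-stamped URI, and a shorter URI without a date always resolves to the most recent one. The base URI reuses the shortened form suggested under BP 10; the release dates and the date-as-last-segment scheme are assumptions for illustration only.

```python
from typing import Dict, List, Optional

# Shortened URI suggested in the comments on BP 10 (illustrative).
BASE = "http://data.mycitytransit.example.org/bus/stops"


def version_uris(dates: List[str]) -> Dict[str, str]:
    """Map each release date to a version-specific URI such as BASE/2015-05-05."""
    return {date: f"{BASE}/{date}" for date in dates}


def resolve(uri: str, dates: List[str]) -> Optional[str]:
    """Return the version-specific URI that `uri` identifies.

    The bare BASE URI stands for the latest version; ISO 8601 dates sort
    lexically, so max() picks the most recent release.
    """
    uris = version_uris(dates)
    if uri == BASE:
        return uris[max(dates)] if dates else None
    return uri if uri in uris.values() else None


releases = ["2015-05-05", "2015-06-01"]  # hypothetical release dates
print(resolve(BASE, releases))  # http://data.mycitytransit.example.org/bus/stops/2015-06-01
```

In a deployment, the bare URI would typically redirect to the resolved version-specific URI, so links to it always reach the latest release while the dated URIs stay stable.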
>
>
>
> I'm gonna ask Phil to give feedback on your comments on the Data 
> Identifier section. I'm sure that he can give better feedback than I 
> can ;)
>
> Once again, thanks a lot for your review! We're looking forward to your 
> comments on the remaining sections.
>
> Cheers,
> Bernadette
>
>
>
>
>     -- 
>     Annette Greiner
>     NERSC Data and Analytics Services
>     Lawrence Berkeley National Laboratory
>
>
>
>
>
> -- 
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Thursday, 21 April 2016 21:53:54 UTC