- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Thu, 14 Apr 2016 19:12:18 -0700
- To: DWBP Public List <public-dwbp-wg@w3.org>
Whew, I've gotten through section 8.7. This is taking way too long, so I'm going to stop at this point and put this out. Following are things that I noticed in a partial read-through of the BP document.

-Annette

General issues --

Possible approaches to implementation should not include the word "should". That implies normativeness. This is a general issue with implementation sections. We say in the Audience section that "The normative element of each best practice is the intended outcome."

Subtitles should all be written in the same mode. (Mine were written in imperative -- "do this, don't do that", but most are declarative -- "this should be done".) I think imperative is better, because it gets away from RFC2119 keywords, which we voted not to use. It becomes a call to action, which is our goal, right?

1. provide metadata -- The intended outcome is "Human-readable metadata will enable humans to understand the metadata and machine-readable metadata will enable computer applications, notably user agents, to process the metadata." This is tautological. Metadata is necessary because, without it, the data will have no context or meaning. Possible approach to implementation should not include the word "should". Also, I disagree that "If multiple formats are published separately, they should be served from the same URL using content negotiation." Publishing multiple files is also reasonable, and it's even what we used in all our examples about metadata. (In BP2, the machine-readable example gives the name of the distribution as bus-stops-2015-05-05.csv; in BP4, the entire URI is given, ending in .csv, etc.)

2. descriptive metadata -- There is an inconsistency between the suggestion that one should use content negotiation for different formats (csv vs. rdf) and the . :mobility and :themes are referred to as URIs, but they are not URIs. (I know DCAT did this, but I think it's a mistake; colons are not legal in the first segment of a relative URI.)

3. locale parameters -- The human-readable example for the first three BPs is exactly the same. Can we make the examples more specific (maybe include them in the doc rather than link to one big external example)? The ttl in the machine-readable example could be trimmed to just the bold parts.

5. Licenses -- We say "the license of a dataset can be specified within the data". I think we mean within the *metadata*. The "Why" misuses the phrase "for example." User agent actions are not an example of data consumer actions. We say "Data license information can be provided as a link to a human-readable license or as a link/embedded machine-readable license." Since licensing info is part of metadata, and we tell people to provide metadata for both humans and machines, we should also require licensing info for both humans and machines.

6. Provenance -- The "Why" is pretty sparse and essentially says the same thing as the intended outcome. I think we could make it stronger. "Provenance is one means by which consumers of a dataset judge its quality. Understanding its origin and history helps one determine whether to trust the data and provides important interpretive context." The example links to the metadata example page. It would be more helpful to put the provenance-specific info into the BP doc itself.

7. Quality -- We say "Data quality information will enable humans to know the quality of the dataset and its distributions, and software agents to automatically process quality information about the dataset and its distributions." That's rather tautological. We could say something about enabling humans to determine whether the dataset is suitable for their purposes. We probably should refer to DQV as a finished thing, as it will be soon. The human-readable example links to the metadata one.

8. Versioning -- Of the four implementation bullets, only the last is really a possible approach. The first three belong in the intended outcome.
The human-readable example links to the metadata one. The version history there lists only 1.1, which is illogical. (1.0 must exist at least.)

9. Version history -- The human-readable example links to the metadata one. The version history there lists only 1.1, which is illogical. (1.0 must exist at least.) This example doesn't meet the requirements of the BP. Neither the ttl version nor the Memento example provides a full version history, only a list of versions released. This BP is intended to be about providing the details of what changed.

Intro to Identifiers -- Intro item 5 refers to an API which could be confusing, since we talk about APIs as web APIs elsewhere.

10. Persistent URIs as identifiers -- We say "This requires a different mindset to that used when creating a Web site designed for humans to navigate their way through." When creating a web site for humans to navigate, one should also consider persistence, so that sentence is not strictly accurate.

The example uses the city domain instead of the transport agency's domain, which is not realistic for a large city. The agency domain is likely to persist as long as the information it makes available is relevant. Try Googling "transit agency" and see what comes up for domain names. The issue depends on how stable the transit service is. For a small town, the transit function might not be given over to a separate agency, and the guidance would be right, but for a big city, where the transit function is run by an independent agency, it's not realistic.

The example is rather redundant. It is data.mycity..., and yet /dataset also appears in the path. The path also contains /bus as well as /bus-stops. It's unlikely that the agency has so many transit modes that they need to be split between road and rail and water. The same info is conveyed as well by the much shorter http://data.mycitytransit.example.org/bus/stops

We say "Ideally, the relevant Web site includes a description of the process..." I think we mean a controlled scheme.

11. Persistent URIs within datasets -- The word "affordances" is misused. Affordances are how we know what something is intended to do, not what the thing does. Affordances do not act on things, they inform. The intended outcome should be a free-standing piece of text. Starting with "that one item" is confusing. Much of the implementation section is about minting new URIs, which is the subject of the previous BP. It is off topic here. Everything from "If you can't find an existing set of identifiers that meet your needs, you'll need to create your own" down to the end of the example doesn't belong in a BP that is about using other people's identifiers. The last paragraph of the example is almost exactly the same as the last paragraph before the example.

12. URIs for versions and series -- This BP is confusing two issues. One is the use of a shorter URI for the latest version of a dataset while also assigning a version-specific URI for it. The other issue is making a landing page for a collection of datasets. The initial intent was the former. The examples in the Why aren't series or groups except for the first item, yet they are introduced as examples of series or groups. How to Test says to check "that logical groups of datasets are also identifiable." That is vague. It should say "that a URI is also provided for the latest version or most recent real-time value." I don't think this applies to time series. What we're talking about here is use of dates for version identifiers. The example is incomplete; it doesn't say what the latest version URI would be.

--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Friday, 15 April 2016 02:12:39 UTC