Re: Comments and questions about Data Access BP from Annette Greiner on 2016-04-06 (public-dwbp-wg@w3.org from April 2016)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Wed, 6 Apr 2016 14:45:47 -0700
To: public-dwbp-wg@w3.org
Message-ID: <5705838B.8040503@lbl.gov>
Hi Bernadette,
Please see my comments inline.
Thanks for your diligence!
-Annette

On 4/5/16 8:40 PM, Bernadette Farias Lóscio wrote:
>
> Hi all,
>
> I am reviewing the DWBP document and I have some comments/questions 
> about the Data Access Section.
>
> @Annette, as you wrote a big part of this section, I'd like to kindly 
> ask for your help with the following comments.
>
> 1. Introduction
>
> I’m not sure if the following paragraph fits in this section:
>
> On a further note, it can be observed that data on the Web is 
> essentially about the description of entities identified by a unique, 
> Web-based, identifier (an URI). Once the data is dumped and sent to an 
> institute specialised in digital preservation the link with the Web is 
> broken (dereferencing) but the role of the URI as a unique identifier 
> still remains. In order to increase the usability of preserved dataset 
> dumps it is relevant to maintain a list of these identifiers.

I agree. I don't think that fits.
>
> 2. BP 19 Provide bulk download
>
> Should data or datasets be available for bulk download? I think the BP 
> should refer to datasets instead of data. I think the meaning of bulk 
> download should be clearer.
I think "datasets" is fine, as you suggest.

>
> I don’t understand this phrase: “When Web data is distributed across 
> many URIs but might logically be organized as one container, accessing 
> the data in bulk can be useful." Again, I think the BP should consider 
> datasets instead of data.
As I understand it, the idea is this: if data that would logically be 
organized as one dataset is spread over multiple endpoints (for example, 
it's available piecewise through an API or through subsets for 
download), then getting a copy of the entire dataset requires multiple 
requests, and the pieces are a pain in the neck to reassemble into the 
complete dataset. Since the sentence refers to the dataset being broken 
up, "data" makes more sense there. Does it help to 
s/container/dataset/?
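
To make the reassembly burden concrete, here is a minimal sketch of what a consumer has to do when a dataset arrives in pieces. The CSV contents and the function name merge_csv_parts are hypothetical and only stand in for responses fetched from separate URIs; a bulk download would spare the consumer this step.

```python
import csv
import io


def merge_csv_parts(parts):
    """Concatenate CSV parts that share one header into a single CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in parts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty responses
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("parts do not share a header")
        writer.writerows(rows[1:])  # drop the repeated header line
    return out.getvalue()


# Hypothetical pieces, standing in for responses from separate URIs.
parts = [
    "stop_id,name\n1,Main St\n2,Oak Ave\n",
    "stop_id,name\n3,Pine Rd\n",
]
bulk = merge_csv_parts(parts)
```

Even this simple case needs header handling and a consistency check, which is the kind of work a single bulk file avoids.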

>
> I’m not sure if I understood the example. Is it one dataset with 
> multiple CSV files, or multiple datasets, each one with a CSV 
> distribution? Does the bulk download contain one dataset or multiple 
> datasets?
It's probably best to think of it as one dataset with multiple CSV 
files. The bulk download contains one dataset. But the definition of a 
dataset is pretty flexible, and one person's dataset is another person's 
collection or subset, so the term "dataset" can be confusing in this 
context.
>
> 3. Best Practice 20: Provide Subsets for Large Datasets
>
> In the example, can we use CSV format instead of PDF format?
I was trying to keep it realistic, thinking of what transit agencies 
really do. I suppose we could use CSV, but it would be less realistic. I 
think PDF is fine in addition to having an API.
>
> Is R-Citable evidence for this BP?
Having a separate URI for the subset makes the subset citable.

>
> 4. BP 23 Provide data up to date
>
> The description of BP 23 says: “Data must be available in an 
> up-to-date manner and the update frequency made explicit." But the BP 
> doesn’t mention how to make the update frequency available. I suggest 
> removing “and the update frequency made explicit" from the description.
Yeah, the update frequency often is not predictable. I do like the idea 
of reporting the frequency when it is known. If we don't have a 
recommendation about how to do that, I think we can still suggest that 
people do it.
It looks like DCAT found a way of doing that in machine-readable form 
[1], though the link resolves to a page that doesn't look very official. 
If nothing else, one can include a textual statement in the documentation.
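
As a rough illustration of the machine-readable option, the sketch below builds a JSON-LD description that states the update frequency with dct:accrualPeriodicity, the property DCAT uses for this. The dataset URI and title are hypothetical, and the weekly value from the SDMX frequency code list is just one possible choice.

```python
import json

# Hypothetical dataset description; only the property name and the
# frequency code URI come from existing vocabularies.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://data.mycitytransit.example.org/datasets/bus-stops",
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops",
    # Weekly updates, expressed with an SDMX frequency code.
    "dct:accrualPeriodicity": {
        "@id": "http://purl.org/linked-data/sdmx/2009/code#freq-W"
    },
}

print(json.dumps(dataset, indent=2))
```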
>
> 5. BP 25: Use Web Standards as the foundation of your API
> Is it possible to rewrite the description of the BP to make the text 
> shorter? In general, BP descriptions are one or two lines.
>
I agree it's awfully long. I'd suggest
"When designing APIs, use an architectural style that is founded on the 
technologies of the Web itself."

If some people insist that we need to list the technologies, we could say
"When designing APIs, use an architectural style that is founded on the 
technologies of the Web itself, such as URIs, HTTP verbs, HTTP response 
codes, MIME types, typed HTTP Links, and content negotiation."


> I’m not sure if the example is suitable for this BP. Maybe the example 
> needs a better explanation or the BP needs a better example :)
That example shows what makes a hypermedia API a hypermedia API. I would 
want to keep that but maybe add an example for REST more generally. It's 
difficult for me to think of a way to show an example of a REST API, 
though, other than linking to one (possibly 
https://w3c.github.io/w3c-api/). Or do we want to build and host a 
little example REST API for the transit agency?
>
> The same goes for the How to test section: “Check that the service 
> avoids using http as a tunnel for calls to custom methods, and check 
> that URIs do not contain method names”. I don’t see how this is a test 
> about using Web standards.
The way to implement a nonstandard architecture on the web is to hide it 
within standard calls. Using HTTP as a tunnel for custom methods rather 
than using HTTP itself is symptomatic of not using HTTP for anything 
other than a transport mechanism. URIs that contain method names are a 
dead giveaway that one is inventing new methods rather than using HTTP 
verbs and URIs.
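
As a rough idea of how that check could be automated, here is a heuristic sketch that flags path segments that look like method names. The verb list, the function name method_like_segments, and the getArrivals URI are assumptions for illustration only; a real test would need a list suited to the API in question and would still produce some false positives.

```python
import re
from urllib.parse import urlparse

# A verb at the start of a segment, followed by a capital letter, an
# underscore, a hyphen, a dot, or the end of the segment. The verb list
# is an assumption, not an exhaustive catalogue.
METHOD_LIKE = re.compile(
    r"(?i:get|set|create|update|delete|remove|add|list|fetch)(?=[A-Z_\-.]|$)"
)


def method_like_segments(uri):
    """Return the path segments of a URI that look like method names."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    return [s for s in segments if METHOD_LIKE.match(s)]


# Hypothetical RPC-style URI: the method name is embedded in the path.
rpc_style = "http://api.mycitytransit.example.org/getArrivals?bus=53"
# Resource-style URI: the path names resources only.
resource_style = "http://api.mycitytransit.example.org/arrivals/buses/53/stop/12"

flagged = method_like_segments(rpc_style)
clean = method_like_segments(resource_style)
```

Ordinary nouns such as "settings" or "listing" pass, because the verb must be followed by a word boundary marker rather than more lowercase letters.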
>
> 6. BP 26: Provide complete documentation for your API
>
> It would be better if the example of this BP were related to the 
> bus stops example.

I agree. Maybe we need to implement an example transit API doc site in 
Swagger or something. If we want an equally nice example as the pet 
store one, that's not trivial.
>
> I think the following phrases should be in the approach to 
> implementation and not in the how to test section: “The quality of 
> documentation is also related to usage and feedback from developers. 
> Try to get constant feedback from your users about the documentation."
I agree.
>
> 7. BP 27 Avoid Breaking Changes to Your API
>
> The how to test section seems more like an approach to implementation 
> than a test. Is it possible to rewrite it?
I disagree. The bit about testing shows how to test that changes to the 
API do not break it, which is not the same as showing how to implement 
changes to the API. It is literally how to test it.
>
> It would be great to have an example that also uses the bus stop 
> dataset. Maybe the example of BP 27 can be related to the example of 
> BP 26.
Maybe we could add something like this:

Suppose the MyCity transit agency's API answers requests for a certain 
bus's arrival time at a single stop at 
http://api.mycitytransit.example.org/arrivals/buses/53/stop/12, but the 
agency decides it wants to make it possible to query for a range of 
stops at once. Rather than change the form of the request to require a 
range, like 
http://api.mycitytransit.example.org/arrivals/buses/53/stop/12-12, the 
agency can keep the old API call and add a new one for multiple 
arrivals, like 
http://api.mycitytransit.example.org/arrivals/buses/53/stops/1-12.
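
A minimal sketch of that idea on the server side might look like the following. The route table and the resolve function are hypothetical, not part of any real transit API; the point is only that the original pattern stays untouched while the range pattern is added next to it.

```python
import re

ROUTES = [
    # Original call, kept unchanged so existing clients keep working.
    (re.compile(r"/arrivals/buses/(\d+)/stop/(\d+)"), "single"),
    # New call added alongside it for a range of stops.
    (re.compile(r"/arrivals/buses/(\d+)/stops/(\d+)-(\d+)"), "range"),
]


def resolve(path):
    """Map a request path to a bus number and the list of stops asked for."""
    for pattern, kind in ROUTES:
        match = pattern.fullmatch(path)
        if match is None:
            continue
        numbers = [int(group) for group in match.groups()]
        if kind == "single":
            return {"bus": numbers[0], "stops": [numbers[1]]}
        first, last = numbers[1], numbers[2]
        return {"bus": numbers[0], "stops": list(range(first, last + 1))}
    return None  # no route matched


old_call = resolve("/arrivals/buses/53/stop/12")
new_call = resolve("/arrivals/buses/53/stops/1-3")
```

Clients written against the single-stop call never see a difference, and the breaking form (stop/12-12) is simply never introduced.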
>
> Thanks a lot!
> Bernadette
>
>
> -- 
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
[1] https://www.w3.org/TR/vocab-dcat/ "In order to express frequency of 
update in the example above, we chose to use an instance from the 
Content-Oriented Guidelines 
(http://www.w3.org/TR/vocab-data-cube/#dsd-cog) developed as part of the 
W3C Data Cube Vocabulary efforts."

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Wednesday, 6 April 2016 21:46:18 UTC