Re: Comments and questions about Data Access BP from Annette Greiner on 2016-04-08 (public-dwbp-wg@w3.org from April 2016)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Fri, 8 Apr 2016 09:35:43 -0700
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Phil Archer <phila@w3.org>
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-Id: <FA80340A-8ECE-4AFB-8FBE-53DB8727FD19@lbl.gov>
Okay,
So I think we should have a different example for the subsetting, since I don’t think it makes sense to distribute bus schedules to the general public via CSV.
Let me think about a new one. 
I agree that we could improve the text of the data-up-to-date BP.
It would be fun to build a little example API and document it with Swagger, but that will take some time. We also need to have a place to host it. What are the options? Phil?
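
As a very rough starting point, here is a minimal sketch of how the two arrivals calls from the BP 27 example quoted below might be routed side by side. It uses only the Python standard library. The function name, the pattern names, and the shape of the returned dictionary are assumptions made for illustration, not anything the group has agreed on; only the URI paths come from the example.

```python
import re

# Hypothetical route patterns. The paths mirror the BP 27 example;
# the names and return values are assumed for illustration only.
SINGLE_STOP = re.compile(r"/arrivals/buses/(\d+)/stop/(\d+)")
STOP_RANGE = re.compile(r"/arrivals/buses/(\d+)/stops/(\d+)-(\d+)")


def route_arrivals(path):
    """Map a GET request path to a description of the query, or None if unknown."""
    match = SINGLE_STOP.fullmatch(path)
    if match:
        # Original call, left unchanged so existing clients keep working.
        return {"bus": int(match.group(1)), "stops": [int(match.group(2))]}

    match = STOP_RANGE.fullmatch(path)
    if match:
        # New call, added alongside the old one, for a range of stops.
        first, last = int(match.group(2)), int(match.group(3))
        if first > last:
            return None
        return {"bus": int(match.group(1)), "stops": list(range(first, last + 1))}

    return None


if __name__ == "__main__":
    print(route_arrivals("/arrivals/buses/53/stop/12"))
    print(route_arrivals("/arrivals/buses/53/stops/1-12"))
```

The URIs name resources (buses, stops) rather than methods, and the single-stop path keeps its original form, so adding the range query does not break anyone relying on the old call.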
-Annette

> On Apr 8, 2016, at 5:33 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
> 
> Hi Annette,
> 
> Thank you for your helpful review and comments! I already made some updates, but I still have some comments.
> 
>> 1. Introduction 
>> 
>> I’m not sure if the following paragraph fits in this section:
>> 
>> On a further note, it can be observed that data on the Web is essentially about the description of entities identified by a unique, Web-based, identifier (an URI). Once the data is dumped and sent to an institute specialised in digital preservation the link with the Web is broken (dereferencing) but the role of the URI as a unique identifier still remains. In order to increase the usability of preserved dataset dumps it is relevant to maintain a list of these identifiers. 
> 
> I agree. I don't think that fits.
> 
> 
> ---> Removed! 
>> 
>> 2. BP 19 Provide bulk download
>> 
>> Data or datasets should be available for bulk download? I think the BP should refer to datasets instead of data. I think the meaning of bulk download should be more clear.
> I think "datasets" is fine, as you suggest.
> 
> ---> done! 
> 
>> 
>> I don’t understand this phrase: “When Web data is distributed across many URIs but might logically be organized as one container, accessing the data in bulk can be useful." Again, I think the BP should consider datasets instead of data.
> As I understand it, the idea is that, if you have data that would logically be organized as a dataset but it is spread over multiple endpoints (for example, it's available piecewise through an API or through subsets for download), so that getting a copy of the entire dataset would require multiple requests, that would be a pain in the neck to reassemble as the complete dataset. Since it's referring to the dataset being broken up, "data" makes more sense. Does it help to s/container/dataset/?
> 
> ---> I see Annette! I think dataset is better than container.  
> 
>> 
>> I’m not sure if I understood the example. Is it one dataset with multiple CSV files, or multiple datasets, each one with a CSV distribution? Does the bulk download contain one dataset or multiple datasets?
> It's probably best to think of it as one dataset with multiple CSV files. The bulk download contains one dataset. But the definition of a dataset is pretty flexible, and one person's dataset is another person's collection or subset, so the term "dataset" can be confusing in this context.
> 
> ---> I understand, but I'm not sure if this is clear for the public. Let's keep it like this and let's try to get some feedback from the community. 
>> 
>> 3. Best Practice 20: Provide Subsets for Large Datasets
>> 
>> In the example, can we use CSV format instead of PDF format?
> I was trying to keep it realistic, thinking of what transit agencies really do. I suppose we could use CSV, but it would be less realistic. I think PDF is fine in addition to having an API.
> 
> ---> I think it would be better to use CSV than PDF because we are always talking about machines being able to process the data. In this case, CSV is better, no? 
>> 
>> Is R-Citable evidence for this BP?
> Having a separate URI for the subset makes the subset citable.
> 
> ---> Ok! I agree! 
> 
>> 
>> 4. BP 23 Provide data up to date 
>> 
>> The description of BP 23 says: “Data must be available in an up-to-date manner and the update frequency made explicit.” But the BP doesn’t mention how to make the update frequency available. I suggest removing “and the update frequency made explicit” from the description.
> Yeah, the update frequency often is not predictable. I do like the idea of reporting the frequency when it is known. If we don't have a recommendation about how to do that, I think we can still suggest that people do it.
> It looks like DCAT found a way of doing that in machine-readable form [1], though the link resolves to a page that doesn't look very official. If nothing else, one can include a textual statement in the documentation.
> 
> ---> I think we should rewrite this BP to make this more explicit.   
>> 
>> 5. BP 25: Use Web Standards as the foundation of your API
>> Is it possible to rewrite the description of the BP to make the text shorter? In general, BP descriptions are one or two lines. 
>> 
> I agree it's awfully long. I'd suggest
> "When designing APIs, use an architectural style that is founded on the technologies of the Web itself."
> 
> If some people insist that we need to list the technologies, we could say
> "When designing APIs, use an architectural style that is founded on the technologies of the Web itself, such as URIs, HTTP verbs, HTTP response codes, MIME types, typed HTTP Links, and content negotiation."
> 
> ---> I used the first one! 
>> I’m not sure if the example is suitable for this BP. Maybe the example needs a better explanation or the BP needs a better example :)
> That example shows what makes a hypermedia API a hypermedia API. I would want to keep that but maybe add an example for REST more generally. It's difficult for me to think of a way to show an example of a REST API, though, other than linking to one (possibly https://w3c.github.io/w3c-api/). Or do we want to build and host a little example REST API for the transit agency?
> 
> ---> It would be great if we could build a little example REST API for the transit agency. Is it possible? 
>> 
>> The same for the How to test section: “Check that the service avoids using http as a tunnel for calls to custom methods, and check that URIs do not contain method names”. I don’t see how this is a test about using Web standards. 
> The way to implement a nonstandard architecture on the web is to hide it within standard calls. Using http as a tunnel for custom methods rather than using http itself is symptomatic of not using http for anything other than a transport mechanism. URIs that contain method names are a dead giveaway that one is inventing new methods rather than using http verbs and URIs.
> 
> ---> Ok Annette! Thanks a lot for the explanation. I'm learning a lot about APIs :) 
>> 
>> 6. BP 26: Provide complete documentation for your API
>> 
>> It would be better if the example of this BP were related to the bus stops example. 
> 
> I agree. Maybe we need to implement an example transit API doc site in Swagger or something. If we want an equally nice example as the pet store one, that's not trivial.
> 
> ---> Again, it would be great to have an example for the Transit Agency. Let's see if we can work on that. 
>> 
>> I think the following phrases should be on the approach to implementation and not on the how to test section: “The quality of documentation is also related to usage and feedback from developers. Try to get constant feedback from your users about the documentation." 
> I agree.
> 
> ---> ok! moved! 
>> 
>> 7. BP 27 Avoid Breaking Changes to Your API
>> 
>> The how to test section seems more like an approach to implementation than a test. Is it possible to rewrite?
> I disagree. The bit about testing shows how to test that changes to the API do not break it, which is not the same as showing how to implement changes to the API. It is literally how to test it.
> 
> ok! Now I understand and I agree with you! Let's keep the original test ;)
>  
>> 
>> It would be great to have an example that also uses the bus stop dataset. Maybe the example of BP 27 can be related to the example of BP 26.
> Maybe we could add something like this:
> 
> Suppose the MyCity transit agency's API responds to a request for a certain bus's arrival time at a single station as http://api.mycitytransit.example.org/arrivals/buses/53/stop/12, but the agency decides it wants to make it possible to query for a range of stops at once. Rather than change the form of the request to require a range, like http://api.mycitytransit.example.org/arrivals/buses/53/stop/12-12, the agency can keep the old API call and add a new one for multiple arrivals, like http://api.mycitytransit.example.org/arrivals/buses/53/stops/1-12.
> 
> Nice! Example added!
> 
> Just summarizing, let's see if:
> - we can improve the BP Provide data up to date
> - we can add examples for BP 25 and BP 26 using the transit agency example
> 
> Thanks a lot!
> Berna 
>> 
>> Thanks a lot!
>> Bernadette
>> 
>> 
>> -- 
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
> [1] https://www.w3.org/TR/vocab-dcat/ " In order to express frequency of update in the example above, we chose to use an instance from the Content-Oriented Guidelines developed as part of the W3C Data Cube Vocabulary efforts."
> -- 
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 
> 
> 
> 
> 
> -- 
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
Received on Friday, 8 April 2016 16:36:17 UTC