Re: Comments and questions about Data Access BP from Bernadette Farias Lóscio on 2016-04-08 (public-dwbp-wg@w3.org from April 2016)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Fri, 8 Apr 2016 09:33:43 -0300
To: Annette Greiner <amgreiner@lbl.gov>
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-ID: <CANx1Pzytvicw9n6Yn1pLw4Gi9U36iwYCgN_7zV2mTc2DAnJBjg@mail.gmail.com>
Hi Annette,

Thank you for your helpful review and comments! I already made some
updates, but I still have some comments.

1. Introduction
>
> I’m not sure if the following paragraph fits in this section:
>
> On a further note, it can be observed that data on the Web is essentially
> about the description of entities identified by a unique, Web-based,
> identifier (an URI). Once the data is dumped and sent to an institute
> specialised in digital preservation the link with the Web is broken
> (dereferencing) but the role of the URI as a unique identifier still
> remains. In order to increase the usability of preserved dataset dumps it
> is relevant to maintain a list of these identifiers.
>
>
> I agree. I don't think that fits.
>


---> Removed!

>
> 2. BP 19 Provide bulk download
>
> Should data or datasets be available for bulk download? I think the BP
> should refer to datasets instead of data. I also think the meaning of bulk
> download should be made clearer.
>
> I think "datasets" is fine, as you suggest.
>

---> done!

>
>
> I don’t understand this phrase: “When Web data is distributed across many
> URIs but might logically be organized as one container, accessing the data
> in bulk can be useful." Again, I think the BP should consider datasets
> instead of data.
>
> As I understand it, the idea is this: suppose you have data that would
> logically be organized as a dataset, but it is spread over multiple
> endpoints (for example, it's available piecewise through an API or through
> subsets for download). Getting a copy of the entire dataset would then
> require multiple requests, and reassembling the pieces into the complete
> dataset would be a pain in the neck. Since the text refers to the dataset
> being broken up, "data" makes more sense. Does it help to s/container/dataset/?
>

---> I see, Annette! I think "dataset" is better than "container".

>
>
> I’m not sure I understood the example. Is it one dataset with multiple CSV
> files, or multiple datasets, each with a CSV distribution? Does the bulk
> download contain one dataset or multiple datasets?
>
> It's probably best to think of it as one dataset with multiple CSV files.
> The bulk download contains one dataset. But the definition of a dataset is
> pretty flexible, and one person's dataset is another person's collection or
> subset, so the term "dataset" can be confusing in this context.
>

---> I understand, but I'm not sure this is clear to the public. Let's
keep it like this and try to get some feedback from the community.
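To make the "one dataset, several CSV files" reading concrete, here is a minimal Python sketch. It assumes the bulk download is a single ZIP archive; the file names, columns, and MyCity stop data are made up for illustration and are not part of the BP text.

```python
import csv
import io
import zipfile

# Hypothetical per-route CSV files that together make up ONE dataset
# (MyCity bus stops); names and columns are illustrative only.
PARTS = {
    "stops-route-53.csv": [["stop_id", "name"], ["12", "Main St"], ["13", "Oak Ave"]],
    "stops-route-54.csv": [["stop_id", "name"], ["20", "Harbor Rd"]],
}


def to_csv(rows):
    """Serialize a list of rows to CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def build_bulk_download(parts):
    """Package all CSV files of one dataset into a single ZIP, so a client
    can fetch the whole dataset with one request instead of one per file."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, rows in parts.items():
            zf.writestr(name, to_csv(rows))
    return out.getvalue()


def read_bulk_download(blob):
    """Client side: reassemble every data row of the dataset from the archive."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for name in sorted(zf.namelist()):
            reader = csv.reader(io.StringIO(zf.read(name).decode("utf-8")))
            next(reader)  # skip the header row repeated in each file
            rows.extend(reader)
    return rows


if __name__ == "__main__":
    blob = build_bulk_download(PARTS)
    print(len(read_bulk_download(blob)))  # 3 data rows across both files
```

The point is only that one request returns the whole dataset; whether a ZIP, a single concatenated CSV, or something else is the right container is still open.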

>
> 3. Best Practice 20: Provide Subsets for Large Datasets
>
> In the example, can we use CSV format instead of PDF format?
>
> I was trying to keep it realistic, thinking of what transit agencies
> really do. I suppose we could use CSV, but it would be less realistic. I
> think PDF is fine in addition to having an API.
>

---> I think it would be better to use CSV than PDF, because we are always
talking about machines being able to process the data. In this case, CSV is
better, no?

>
> Is R-Citable evidence for this BP?
>
> Having a separate URI for the subset makes the subset citable.
>

---> Ok! I agree!
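A rough sketch of both points (CSV rather than PDF, and one URI per subset so each subset can be cited), assuming subsets are split by bus route. The base URI, the `route/{id}.csv` pattern, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical base URI; the BP text does not fix any URI pattern.
BASE = "http://data.mycitytransit.example.org/stops"

# Hypothetical full dataset: every stop, tagged with its route.
STOPS = [
    {"route": "53", "stop_id": "12", "name": "Main St"},
    {"route": "53", "stop_id": "13", "name": "Oak Ave"},
    {"route": "54", "stop_id": "20", "name": "Harbor Rd"},
]


def subset_uri(route):
    """One stable URI per subset, so each subset can be cited on its own."""
    return f"{BASE}/route/{route}.csv"


def subset_csv(rows, route):
    """Return the machine-readable CSV subset for one route."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["stop_id", "name"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["route"] == route:
            writer.writerow({"stop_id": row["stop_id"], "name": row["name"]})
    return buf.getvalue()


if __name__ == "__main__":
    print(subset_uri("53"))
    print(subset_csv(STOPS, "53"))
```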

>
>
> 4. BP 23 Provide data up to date
>
> The description of BP 23 says: “Data must be available in an up-to-date
> manner and the update frequency made explicit.” But the BP doesn’t mention
> how to make the update frequency available. I suggest removing “and the
> update frequency made explicit” from the description.
>
> Yeah, the update frequency often is not predictable. I do like the idea of
> reporting the frequency when it is known. If we don't have a recommendation
> about how to do that, I think we can still suggest that people do it.
> It looks like DCAT found a way of doing that in machine-readable form [1],
> though the link resolves to a page that doesn't look very official. If
> nothing else, one can include a textual statement in the documentation.
>

---> I think we should rewrite this BP to make this more explicit.
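One way the rewritten BP could show a machine-readable update frequency: a minimal JSON-LD description using `dct:accrualPeriodicity`, the Dublin Core property DCAT uses for frequency. The dataset URI and title are hypothetical; the weekly code is the SDMX frequency code from the Content-Oriented Guidelines mentioned in [1], and it should be checked against the DCAT example before we copy it into the document.

```python
import json

# Weekly frequency code from the SDMX code list (assumed to match the DCAT example).
WEEKLY = "http://purl.org/linked-data/sdmx/2009/code#freq-W"


def dataset_metadata(dataset_uri, title, frequency_uri):
    """Build a minimal JSON-LD dataset description that states the update
    frequency with dct:accrualPeriodicity. Only the vocabulary terms come
    from DCAT / Dublin Core; the values passed in are illustrative."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:accrualPeriodicity": {"@id": frequency_uri},
    }


if __name__ == "__main__":
    doc = dataset_metadata(
        "http://data.mycitytransit.example.org/dataset/bus-stops",  # hypothetical
        "MyCity bus stops",
        WEEKLY,
    )
    print(json.dumps(doc, indent=2))
```

When the frequency isn't predictable, a textual statement in the documentation (as Annette suggests) is still better than nothing.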

>
> 5. BP 25: Use Web Standards as the foundation of your API
> Is it possible to rewrite the description of the BP to make the text shorter?
> In general, BP descriptions are one or two lines.
>
> I agree it's awfully long. I'd suggest
> "When designing APIs, use an architectural style that is founded on the
> technologies of the Web itself."
>
> If some people insist that we need to list the technologies, we could say
> "When designing APIs, use an architectural style that is founded on the
> technologies of the Web itself, such as URIs, HTTP verbs, HTTP response
> codes, MIME types, typed HTTP Links, and content negotiation."
>

---> I used the first one!

>
> I’m not sure if the example is suitable for this BP. Maybe the example
> needs a better explanation or the BP needs a better example :)
>
> That example shows what makes a hypermedia API a hypermedia API. I would
> want to keep that but maybe add an example for REST more generally. It's
> difficult for me to think of a way to show an example of a REST API,
> though, other than linking to one (possibly https://w3c.github.io/w3c-api/).
> Or do we want to build and host a little example REST API for the transit
> agency?
>

---> It would be great if we could build a little example REST API for the
transit agency. Is it possible?
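As a starting point, here is a very small sketch of a REST-style transit API in Python (standard library only, WSGI). The `/stops/{id}` URI layout and the stop data are assumptions for illustration, not a proposal for the actual example.

```python
import json
import re

# Hypothetical data and URI layout, for illustration only.
STOPS = {"12": {"id": "12", "name": "Main St"}, "13": {"id": "13", "name": "Oak Ave"}}
STOP_PATH = re.compile(r"^/stops/(?P<stop_id>[^/]+)$")


def app(environ, start_response):
    """Minimal WSGI app: a URI names the resource, the HTTP verb says what to
    do with it, and standard status codes and MIME types report the result."""
    match = STOP_PATH.match(environ.get("PATH_INFO", ""))
    if match is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such resource"]
    if environ["REQUEST_METHOD"] != "GET":
        # Read-only resource: answer 405 with an Allow header instead of
        # accepting custom method names in the URI or query string.
        start_response("405 Method Not Allowed",
                       [("Allow", "GET"), ("Content-Type", "text/plain")])
        return [b"method not allowed"]
    stop = STOPS.get(match.group("stop_id"))
    if stop is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown stop"]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(stop).encode("utf-8")]


def request(path, method="GET"):
    """Call the app directly (no server) and return (status, headers, body)."""
    result = {}

    def start_response(status, headers):
        result["status"], result["headers"] = status, dict(headers)

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": method}, start_response))
    return result["status"], result["headers"], body


if __name__ == "__main__":
    print(request("/stops/12"))
    print(request("/stops/12", method="DELETE")[0])
```

For local testing, the standard library's `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` can serve it, and `curl -i http://localhost:8000/stops/12` shows the status line and `Content-Type`.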

>
> The same goes for the How to Test section: “Check that the service avoids
> using http as a tunnel for calls to custom methods, and check that URIs do
> not contain method names”. I don’t see how this is a test about using Web
> standards.
>
> The way to implement a nonstandard architecture on the web is to hide it
> within standard calls. Using http as a tunnel for custom methods rather
> than using http itself is symptomatic of not using http for anything other
> than a transport mechanism. URIs that contain method names are a dead
> giveaway that one is inventing new methods rather than using http verbs and
> URIs.
>

---> OK, Annette! Thanks a lot for the explanation. I'm learning a lot about
APIs :)
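If it helps the How to Test section, part of that check can be automated. This is a heuristic sketch only: the verb prefixes and "tunnel" parameter names are assumptions, and it only looks at URIs, so tunnelling through POST bodies would still need a manual review.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical heuristics: verb prefixes and tunnel parameter names are
# guesses to tune per API, not a list taken from the BP.
VERB_PREFIXES = ("get", "set", "create", "update", "delete", "remove", "add", "do", "invoke")
TUNNEL_PARAMS = {"method", "action", "op", "call"}
# A verb prefix followed by an uppercase letter, "_", "-" or the segment end,
# so "getArrivals" and "delete_stop" match but "address" and "download" do not.
METHOD_SEGMENT = re.compile(r"^(?:%s)(?=[A-Z_\-]|$)" % "|".join(VERB_PREFIXES))


def review(uri):
    """Return warnings for one API URI: method names in the path, or a
    method name smuggled through HTTP via a query parameter."""
    parsed = urlparse(uri)
    warnings = []
    for segment in filter(None, parsed.path.split("/")):
        if METHOD_SEGMENT.match(segment):
            warnings.append(f"method name in URI path: {segment}")
    for param in sorted(TUNNEL_PARAMS & set(parse_qs(parsed.query))):
        warnings.append(f"method tunnelled via query parameter: {param}")
    return warnings


if __name__ == "__main__":
    for uri in (
        "http://api.mycitytransit.example.org/arrivals/buses/53/stop/12",
        "http://api.mycitytransit.example.org/getArrivals?bus=53",
        "http://api.mycitytransit.example.org/service?method=getArrivals",
    ):
        print(uri, review(uri))
```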

>
> 6. BP 26: Provide complete documentation for your API
>
> It would be better if the example for this BP were related to the
> bus stops example.
>
>
> I agree. Maybe we need to implement an example transit API doc site in
> Swagger or something. If we want an equally nice example as the pet store
> one, that's not trivial.
>

---> Again, it would be great to have an example for the Transit Agency.
Let's see if we can work on that.

>
> I think the following sentences should be in the Approach to Implementation
> section and not in the How to Test section: “The quality of documentation is also
> related to usage and feedback from developers. Try to get constant feedback
> from your users about the documentation.”
>
> I agree.
>

---> ok! moved!

>
> 7. BP 27 Avoid Breaking Changes to Your API
>
> The How to Test section seems more like an approach to implementation
> than a test. Is it possible to rewrite it?
>
> I disagree. The bit about testing shows how to test that changes to the
> API do not break it, which is not the same as showing how to implement
> changes to the API. It is literally how to test it.
>

---> OK! Now I understand, and I agree with you! Let's keep the original test ;)


>
> It would be great to have an example that also uses the bus stop dataset.
> Maybe the example of BP 27 can be related to the example of BP 26.
>
> Maybe we could add something like this:
>
> Suppose the MyCity transit agency's API responds to a request for a
> certain bus's arrival time at a single station as
> http://api.mycitytransit.example.org/arrivals/buses/53/stop/12, but the
> agency decides it wants to make it possible to query for a range of stops
> at once. Rather than change the form of the request to require a range,
> like http://api.mycitytransit.example.org/arrivals/buses/53/stop/12-12,
> the agency can keep the old API call and add a new one for multiple
> arrivals, like
> http://api.mycitytransit.example.org/arrivals/buses/53/stops/1-12.
>

Nice! Example added!
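For reference, the example boils down to keeping the old route and adding a new one next to it. A minimal sketch of the dispatch, assuming the URI patterns from the example; it returns the requested stop numbers rather than real arrival times, and `resolve` is a hypothetical helper name.

```python
import re

# URI patterns taken from the example above (paths only).
OLD_ROUTE = re.compile(r"^/arrivals/buses/(?P<bus>\d+)/stop/(?P<stop>\d+)$")
NEW_ROUTE = re.compile(r"^/arrivals/buses/(?P<bus>\d+)/stops/(?P<first>\d+)-(?P<last>\d+)$")


def resolve(path):
    """Map a request path to (bus, [stops]); old URIs keep working unchanged."""
    m = OLD_ROUTE.match(path)
    if m:  # original single-stop call, left exactly as before
        return int(m["bus"]), [int(m["stop"])]
    m = NEW_ROUTE.match(path)
    if m:  # new range call, added alongside the old one
        first, last = int(m["first"]), int(m["last"])
        if first > last:
            raise ValueError("empty stop range")
        return int(m["bus"]), list(range(first, last + 1))
    raise LookupError(f"unknown resource: {path}")


if __name__ == "__main__":
    print(resolve("/arrivals/buses/53/stop/12"))     # (53, [12])
    print(resolve("/arrivals/buses/53/stops/1-12"))  # new range call
```

Existing clients never see a change: the single-stop form is not reinterpreted, and the range form lives under its own `stops` path.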

Just summarizing, let's see if:
- we can improve the BP Provide data up to date
- we can add examples for BP 25 and BP 26 using the transit agency example

Thanks a lot!
Berna

>
> Thanks a lot!
> Bernadette
>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
>
> ----------------------------------------------------------------------------
>
> [1] https://www.w3.org/TR/vocab-dcat/: "In order to express frequency of
> update in the example above, we chose to use an instance from the Content-Oriented
> Guidelines <http://www.w3.org/TR/vocab-data-cube/#dsd-cog> developed as
> part of the W3C Data Cube Vocabulary efforts."
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
>
>
>


Received on Friday, 8 April 2016 12:34:32 UTC