Re: Fwd: Best Practice 26.docx from Pieter Colpaert on 2015-12-22 (public-dwbp-wg@w3.org from December 2015)

From: Pieter Colpaert <pieter.colpaert@ugent.be>
Date: Tue, 22 Dec 2015 10:06:08 +0100
To: public-dwbp-wg@w3.org
Message-ID: <56791280.6080704@ugent.be>

Hi Annette,

This is a very interesting view, and it is very close to the (Linked) 
Data Fragments (LDF) axis idea [1,2]. That axis places the two extremes, 
data dumps and query API services, at opposite ends. It indicates that 
there are various options in between these two extremes.

The left-hand side could be seen as "data publishing", where a dataset 
can be fragmented into different subsets. The more the dataset is 
fragmented, the more expressive we could say the interface becomes, and 
the more we're moving to the right on the axis. I believe this way of 
thinking is perfect for publishing data for maximized reuse, as opposed 
to publishing a service for maximum expressiveness.
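
As a rough illustration of what fragmenting a dataset could look like (a 
minimal sketch, not taken from the slides; the toy records, the 
`fragment_by` and `fragment_paths` helpers and the `/stops/...` paths are 
all hypothetical), one dump can be split into several smaller documents, 
each of which is static and can be cached:

```python
from collections import defaultdict

# Hypothetical toy dataset; the field names are assumptions for illustration.
DATASET = [
    {"stop": "Gent", "departure": "2015-12-22T10:00"},
    {"stop": "Gent", "departure": "2015-12-22T10:30"},
    {"stop": "Brussel", "departure": "2015-12-22T10:15"},
]


def fragment_by(records, key):
    """Group records into fragments, one fragment per distinct value of `key`."""
    fragments = defaultdict(list)
    for record in records:
        fragments[record[key]].append(record)
    return dict(fragments)


def fragment_paths(fragments, base="/stops"):
    """Map each fragment to a (hypothetical) URL path under `base`."""
    return {f"{base}/{name}": rows for name, rows in fragments.items()}


# The data dump is a single document; fragmenting by stop yields one
# document per stop, so a client only downloads the subset it needs.
by_stop = fragment_paths(fragment_by(DATASET, "stop"))
```

Fragmenting on a further key (for example the departure hour) would yield 
more, smaller documents, which corresponds to moving further to the right 
on the axis.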

The right-hand side could be seen as "data services", where an HTTP 
service exposes all query functionality. Such a service is hard to keep 
highly available because, on the open Web, you cannot predict the type 
of queries you'll receive. In practice, the query interface will be 
limited in expressiveness, moving to the left on the axis, to make sure 
it can serve specific use cases. I believe this way of thinking is 
perfect for data services that need to provide functionalities to apps.
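
A limited query interface of this kind might be sketched as follows 
(purely illustrative; the `ALLOWED_PARAMS` set and the `parse_query` 
helper are hypothetical names, not part of any existing service). Only a 
fixed set of parameters is accepted, so the server knows in advance which 
kinds of queries it has to answer:

```python
# Hypothetical whitelist of query parameters the service is willing to answer.
ALLOWED_PARAMS = {"stop", "date"}


def parse_query(params):
    """Return the supported parameters, or raise if any unsupported one is used."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        # Reject anything outside the whitelist instead of trying to evaluate it.
        raise ValueError(f"unsupported query parameters: {sorted(unknown)}")
    return {name: params[name] for name in sorted(ALLOWED_PARAMS) if name in params}
```

A request such as `parse_query({"stop": "Gent"})` is accepted, while a 
request carrying an arbitrary query string under another parameter name is 
rejected before it reaches the data store.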

When describing best practices for data on the Web, I'd only describe 
the left-hand side: more expressive servers by fragmenting datasets. 
This way, high availability, maximized reuse and the REST principles 
come first.

[1] Standard LDF axis: 
https://speakerdeck.com/pietercolpaert/using-open-data-to-promote-data-innovation-1?slide=12
[2] LDF axis for transport data: 
https://speakerdeck.com/pietercolpaert/linked-connections?slide=15

Kind regards,

Pieter

On 22-12-15 03:13, Annette Greiner wrote:
> Hi Peter,
> Your point about APIs enabling some restriction in what users have 
> access to is a good one, and I completely agree with it. That's what I 
> am referring to when I talk about subsetting data. I see that as an 
> important part of what makes using an API worthwhile. Your text seemed 
> to be saying that an API was a way to avoid subsetting, which may have 
> just been an ambiguity in the phrasing.
>
> Regarding simplicity of use, I admit that in many circumstances using 
> an API can be simpler than using an alternative, but that is not 
> always the case, and I don't think it's the true advantage of an API. 
> The true value is that it is an intentionally built programming 
> interface. One needn't rely on brittle scraping or complex workflows 
> that involve downloading and parsing through unnecessary data to force 
> it to interface with the rest of one's code.
>
> My concern with the question of how difficult it is to set up an API 
> is that we remain realistic about the effort involved. Most people 
> will be comparing the option of setting up an API against the option 
> of simply posting datasets for download as files. The latter is 
> definitely easier to do, so it would not be accurate to offer building 
> an API as an easier option. Admittedly, if the infrastructure is 
> already in place, enabling an API in a data management system can be 
> pretty easy, though setting it up correctly and documenting things 
> takes a bit of work and an understanding of what's going on under the 
> hood. Compared to copying a file into a directory, it's a step up.
>
> cheers,
> -Annette
>
>
>
>
> On 12/21/15 4:53 AM, Peter.Winstanley@gov.scot wrote:
>> Hi Annette
>>
>> Re: resource-intensive queries. When trying to maintain a quality of 
>> service, one might want to prevent access to specific sets of queries 
>> that would involve significant table scans or high memory 
>> consumption, and the use of an API as an alternative to e.g. an open 
>> SPARQL endpoint is a way of constraining the query options so as to 
>> protect the overall service.
>>
>> The simplicity angle is an important one insofar as it is part of the 
>> democratisation of access to data on the web.  If we can simplify the 
>> process of accessing datastores (e.g. through APIs) then a wider 
>> range of people will begin to make data-driven applications etc.
>>
>> The same applies to the 'elementary programming' bit.  If we don't 
>> let people know that it is not rocket science to provide a simple API 
>> to some simple data sets then they may body-swerve a useful 
>> additional bit of work.  Many people who commission work that gets 
>> data onto the web are driven by the need to show a website and not by 
>> the need to provide an API.  We need to use the BPs paper as an 
>> opportunity not only to give guidance on "what" and "why" but also 
>> some insight into the challenges of "how" and to help people overcome 
>> any inertia preventing adoption.
>>
>> The re-working was done simply because, in the meeting, this was 
>> one of the elements of the document that was identified as being 
>> incomplete and in need of some work.
>>
>> Peter
>>
>> -----Original Message-----
>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>> Sent: 14 December 2015 21:24
>> To: public-dwbp-wg@w3.org
>> Subject: Re: Fwd: Best Practice 26.docx
>>
>> Peter,
>> Thanks for working to improve this.
>>
>> While I like the idea of explaining what an API is for those who may be
>> less familiar, we should be careful about how we define it. The main
>> alternatives to an API for web developers are downloads and scraping,
>> which are actually pretty simple but tedious approaches. I think the
>> value of an API for web development is not so much a matter of greater
>> simplicity but in having actual programmatic access, or hooks into the
>> data. The point is that an API is designed to explicitly enable
>> programming, whereas reusing without that requires grabbing more than
>> you want and munging the data. The last sentence of the first paragraph
>> suggests that REST is the only way to make an API, which is not the
>> case. Let's leave that argument out of this BP, as it's handled 
>> elsewhere.
>>
>> The second paragraph now reiterates the simplicity concept, which I
>> don't think is accurate or particularly helpful. As for protecting
>> against resource-intensive subsetting, I'm not sure what you mean. The
>> alternatives to using an API are not about subsetting and are not
>> particularly resource intensive; subsetting is actually a virtue of
>> using an API, because it allows one to download only the data needed
>> (something I've been pushing for a BP about for a long time, BTW).
>> Regarding other transport protocols than HTTP, I'm not sure what that
>> has to do with the intended outcome.
>>
>> As for the third paragraph, again, I don't think we should get into the
>> how-to-implement-REST discussion here. There is another BP for that.
>> Also, the suggestion that creating a web API for relational data is
>> "elementary programming" whereas RDF "can be provided with more
>> sophisticated APIs" strikes me as potentially a bit insulting to devs
>> who work with relational data.
>>
>> I'm curious what the goal of this reworking was. Perhaps we can find
>> other ways to address the underlying issues.
>> -Annette
>>
>> On 12/11/15 7:08 AM, Phil Archer wrote:
>>> This should be in the mail archive (Peter used an alternative e-mail
>>> address which is why it bounced)
>>>
>>>
>>> -------- Forwarded Message --------
>>> Subject: Best Practice 26.docx
>>> Date: Fri, 11 Dec 2015 14:43:55 +0000
>>> From: Peter.Winstanley@gov.scot
>>> To: public-dwbp-wg@w3.org
>>> CC: phila@w3.org, laufer@globo.com
>>>
>>>
>>>
>>> I have tried to make some steps to improve the BP #26 from
>>> http://w3c.github.io/dwbp/bp.html#useanAPI
>>>
>>> Hope it is a helpful move.  If you think the direction is right then
>>> let me know and I'll complete.
>>>
>>> Peter
>>>
>

-- 
+32486747122
Linked Open Transport Data researcher
UGent - MMLab - iMinds

Board of Directors Open Knowledge Belgium
http://openknowledge.be

Open Transport working group coordinator at Open Knowledge International
http://transport.okfn.org
Received on Tuesday, 22 December 2015 09:06:40 UTC