RE: Best API to consume SKOS data from Armando Stellato on 2020-03-16 (public-esw-thes@w3.org from March 2020)

From: Armando Stellato <stellato@uniroma2.it>
Date: Mon, 16 Mar 2020 15:14:04 +0000
To: Vladimir Alexiev <vladimir.alexiev@ontotext.com>, "public-esw-thes@w3.org" <public-esw-thes@w3.org>
CC: Jem Rayfield <jem.rayfield@ontotext.com>
Message-ID: <DB6PR1001MB1013508A3A78B4BC29B279C5C7F90@DB6PR1001MB1013.EURPRD10.PROD.OUTLOOK.>
Dear Vladimir,

thank you for the pointers, which are very interesting as usual.

First, a reply on pagination. We do not currently support it, but there is a reason for that, an alternative, and some planned future development.

While it is a general-purpose RDF platform, ST (Semantic Turkey) has until now (mainly) backed:


  1.  Editors, such as VocBench, the old Semantic Turkey Firefox extension and a few more developed in specific projects (so we assume the data can change quite quickly, even in the middle of a paginated request)
  2.  General-purpose editing/browsing software, not tied to specific data that could be prepared in advance to optimize some services (though this has happened as well)

As additional background: ST provides some facilities for API developers, letting them focus on the results to be returned rather than on their presentation. The “presentation” covers the usual additional information that is retrieved in (almost) all services, such as:

  *   Rendering of the resource. The system has an extension point for customizing the rendering. The basic implementation is a lexicalization-based renderer (the kind of lexicalization retrieved, i.e. rdfs labels, skos labels, skosxl labels or ontolex lexical forms, depends on the lexical model specified for the project), which can be customized based on other properties and on a user-defined template (e.g. something like <skos:notation>_<list of labels>). The list of labels is always configurable by each user.
  *   Additional information, such as the resource’s type, where the type has been defined, whether the resource is locally defined (i.e. it is in the working data graph and not, for instance, in an imported ontology or inferred), etc.
All of this is added automatically, by means of dedicated query builders, to the query developed for the service (which can thus focus only on the resources to be returned).

Given all of the above, we preferred to avoid pagination: it would not be synchronized with the changing data and would, in any case, require computing all results in order to sort them. Typical pagination relies on 1) stable, usually read-only data (which allows even basic implementations such as offset-based techniques), and 2) IDs that can be used for sorting and can be queried in advance without evaluating the whole query (keyset and seek pagination techniques). In general-purpose exploration of data, we cannot assume such IDs exist.
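
To illustrate the synchronization problem, here is a minimal sketch in Python (not ST code; the function names and the in-memory list of concept IDs are assumptions made for the example). It shows how an edit between two page requests makes offset-based paging skip a result, while keyset paging over a sortable ID does not:

```python
from bisect import bisect_right

def offset_page(sorted_keys, offset, limit):
    """Offset-based paging: skip `offset` rows, return the next `limit`."""
    return sorted_keys[offset:offset + limit]

def keyset_page(sorted_keys, after, limit):
    """Keyset (seek) paging: return the first `limit` keys greater than `after`."""
    start = 0 if after is None else bisect_right(sorted_keys, after)
    return sorted_keys[start:start + limit]

# Hypothetical sortable IDs; real data may not offer any.
concepts = ["c01", "c02", "c03", "c04", "c05", "c06"]

first_offset = offset_page(concepts, 0, 3)
first_keyset = keyset_page(concepts, None, 3)

# An editor deletes a concept between the two page requests.
concepts.remove("c02")

# Offset paging now starts one position too far and never returns "c04".
second_offset = offset_page(concepts, 3, 3)
# Keyset paging resumes right after the last key the client has seen.
second_keyset = keyset_page(concepts, first_keyset[-1], 3)
```

Note that the keyset variant only works because the example data has IDs that can be sorted and sought cheaply, which is exactly the assumption that does not hold for general-purpose exploration.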
Put simply, above a certain number of results, our data-consuming clients are expected to switch (informing the user) to search-based mode. The services support this: they can be configured to throw an exception if there are too many results, or forced to return the results anyway. In search-based mode, the same panels that normally show all results are populated only with data matching certain search constraints.
Additionally, search can be performed in several ways: simple text search (configurable to include labels, URIs, notes, etc.), advanced search (with constraints on the subjects of incoming triples or the values of outgoing ones) and custom search (stored SPARQL queries for more elaborate searches, with variables bound to user forms).
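
The switch to search-based mode can be sketched roughly as follows. This is an illustration only: the exception class, the function names and the `force` flag are hypothetical and do not reflect the actual ST service signatures.

```python
class TooManyResultsError(Exception):
    """Hypothetical error raised when a listing exceeds the configured limit."""

def list_concepts(labels, max_results, force=False):
    """Return all labels, unless there are too many and `force` is not set."""
    if len(labels) > max_results and not force:
        raise TooManyResultsError(
            f"{len(labels)} results exceed limit {max_results}")
    return sorted(labels)

def search_concepts(labels, text):
    """Simple text search: case-insensitive substring match on the label."""
    needle = text.lower()
    return sorted(label for label in labels if needle in label.lower())

def populate_panel(labels, max_results, search_text):
    """Fill the same panel either with everything or with search hits."""
    try:
        return "browse", list_concepts(labels, max_results)
    except TooManyResultsError:
        # A real client would inform the user before switching modes.
        return "search", search_concepts(labels, search_text)

labels = ["Apple", "Apricot", "Banana", "Cherry", "Pineapple"]
small = populate_panel(labels[:2], max_results=3, search_text="ap")
large = populate_panel(labels, max_results=3, search_text="ap")
```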

I’ve also mentioned the UI here, although the question was about the API, because this gives the full picture and explains some of the API choices. In addition, in many cases (still in the order of thousands of results; I wouldn’t go for that at larger orders of magnitude) some client-side optimization is possible, and we do this in VocBench: the client caches the server results without building the related structures, and builds them lazily upon request (a sort of client-side pagination).
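
A rough sketch of that client-side idea, written in Python for brevity (the class and its fields are hypothetical and not taken from VocBench): the raw rows are kept as received, and the heavier per-row structures are built only for the page that is actually requested.

```python
class LazyResultCache:
    """Keeps raw server rows and builds display nodes one page at a time."""

    def __init__(self, raw_rows, page_size):
        self._raw = list(raw_rows)   # cheap: rows exactly as received
        self._page_size = page_size
        self._built = {}             # page index -> list of built nodes
        self.build_count = 0         # how many nodes were actually built

    def _build_node(self, row):
        # Stand-in for the expensive construction of a UI structure.
        self.build_count += 1
        return {"uri": row[0], "label": row[1]}

    def page(self, index):
        """Return the nodes of one page, building them on first access only."""
        if index not in self._built:
            start = index * self._page_size
            rows = self._raw[start:start + self._page_size]
            self._built[index] = [self._build_node(row) for row in rows]
        return self._built[index]

rows = [("ex:a", "A"), ("ex:b", "B"), ("ex:c", "C"), ("ex:d", "D"), ("ex:e", "E")]
cache = LazyResultCache(rows, page_size=2)
first = cache.page(0)   # builds two nodes
again = cache.page(0)   # served from the cache, nothing is rebuilt
```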

Given that, and considering that we now have this read-only system backed by Semantic Turkey, we are considering adding, or switching to, an API that provides pagination. Besides the common techniques that work across separate requests (the “offset”, “keyset” and “seek” approaches mentioned above), another possibility is to keep the connection on the result set open between ST and the triplestore, using a token-based increment so that the iterator advances and consumes data on request (we still have to investigate whether this is feasible).
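
The token-based option could look roughly like this. It is only a sketch under stated assumptions: a plain Python iterator stands in for the open result set on the triplestore, and the registry class and its method names are hypothetical.

```python
import uuid
from itertools import islice

class CursorRegistry:
    """Maps opaque tokens to open result iterators (server-side state)."""

    def __init__(self):
        self._cursors = {}

    def open(self, results):
        """Register an open result set and hand back a token for it."""
        token = uuid.uuid4().hex
        self._cursors[token] = iter(results)
        return token

    def next_batch(self, token, size):
        """Advance the iterator behind `token` by at most `size` items."""
        iterator = self._cursors[token]
        batch = list(islice(iterator, size))
        if len(batch) < size:
            # Result set exhausted: release the underlying connection.
            del self._cursors[token]
        return batch

registry = CursorRegistry()
token = registry.open(f"concept{i}" for i in range(5))
batches = [registry.next_batch(token, 2) for _ in range(3)]
```

A real implementation would also need timeouts for abandoned tokens, since every open token keeps a connection to the store busy.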

Moving on to the API: thanks for the pointers.
I must confess that I know GraphQL only superficially since, from my perspective, I considered it a possible addition to the range of data provision APIs, yet not necessarily the main one. In general, it seems a good standard for creating content APIs for specific applications, yet I didn’t see the added value for our target (obviously I am not discarding the hypothesis; I just had different priorities to evaluate).
I have not looked at LDP again since before it reached Recommendation status. In particular, I see there is a dedicated spec for paging (https://www.w3.org/TR/ldp-paging/), and I should definitely look into it.
The idea behind Hydra also looks very promising: since we are developing the possibility for users to create custom services, allowing them to define a vocabulary for those services is surely an interesting feature.

Sorry for the lengthy reply :-) Besides the further discussion that emerged, I hope it gives Philippe more background for his investigation.

Kind regards,

Armando





From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
Sent: Sunday, March 15, 2020 2:41 PM
To: public-esw-thes@w3.org
Cc: Jem Rayfield <jem.rayfield@ontotext.com>
Subject: Re: Best API to consume SKOS data

Everyone, thanks for your feedback!

JSKOS is not just a spec, but also 14 open source software packages, including a mapping server and an HTTP server.

None of these seem to support pagination? Armando, how about your API?

Seth, I agree with your sentiment against custom APIs that make no relation to RDF, but how do you represent operations like
- search or
- "give me en labels only, not fr, and prefer en-CA if available"?

Is there interest in the community for APIs that are based on:
- LDP, which is the W3C spec for LOD APIs
- Hydra, which allows representing API operations as RDF, returns links for further operations on an object, and allows the development of generic API clients
- GraphQL, which sees huge adoption amongst frontend devs and recent popularity amongst RDF devs, including TopQuadrant, the Ontotext Platform, a W3C Community Group, etc.

Btw, all these 3 support pagination, though in different ways.
Received on Monday, 16 March 2020 15:14:22 UTC