Re: A Web Annotation Protocol compliant "Search" API from Luca De Santis on 2020-05-18 (public-openannotation@w3.org from May 2020)

From: Luca De Santis <desantis@netseven.it>
Date: Mon, 18 May 2020 19:43:50 +0200
To: Gerben <gerben@treora.com>
Cc: public-openannotation@w3.org
Message-Id: <897A0068-F80F-4F9E-A7ED-5DB77B8009C3@netseven.it>
Dear Gerben,
thank you very much for your answer.

What you are proposing is quite intriguing, although my use case is much simpler.

In particular, we are integrating our Pundit annotation tool into the Triple research project (https://www.gotriple.eu/), in which my company is involved.
In Triple we are building a discovery platform for “content” (e.g. articles) related to Social Science and Humanities. We would like to show whether an article has been annotated with Pundit (and possibly with other interoperable annotation tools).

We need a very simple API on our Annotation Server that, given the document URL, returns the annotations on it.
We already have our own API for this, as does Hypothes.is (see https://hypothes.is/api/search?uri=<..>).
I just wondered whether there is a more Web Annotation Protocol-compliant way to do that.

From what I understand, also from reading the first part of your email, the answer is no, which IMHO is quite a pity.

Since the Annotation Container is quite a handy concept, we were thinking of implementing a new API for retrieving annotations based on the container representation with annotation descriptions (https://www.w3.org/TR/annotation-protocol/#representations-with-annotation-descriptions), adding a URL parameter so that only the annotations on a specific target are returned.
The goal was to be as compliant as possible with the Web Annotation standard, but it seems that there isn’t a fully compliant way of implementing our use case.
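
To make the idea concrete, here is a rough sketch of the filtering we have in mind, under stated assumptions and not a definitive implementation. The function names, the `target` query parameter and the example.org URLs are hypothetical; the Web Annotation Protocol does not define such a parameter. The sketch takes annotations in the Web Annotation Data Model JSON shape and returns an AnnotationPage-like response holding only those whose target matches the document URL. It compares URLs as exact strings and ignores paging and URL normalisation.

```python
from typing import Any, Dict, Iterator, List
from urllib.parse import urlencode


def target_urls(annotation: Dict[str, Any]) -> Iterator[str]:
    """Yield the URLs an annotation targets.

    In the data model `target` may be a string, an object, or a list of
    either; an object may point at the document through `source`.
    """
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]
    for target in targets:
        if isinstance(target, str):
            yield target
        elif isinstance(target, dict):
            source = target.get("source", target.get("id"))
            if isinstance(source, dict):
                source = source.get("id")
            if source:
                yield source


def filter_container_by_target(
    container_url: str,
    annotations: List[Dict[str, Any]],
    document_url: str,
) -> Dict[str, Any]:
    """Build a single AnnotationPage with the annotations on document_url.

    The `target` query parameter is an assumption of this sketch, not
    something the protocol specifies.
    """
    matching = [a for a in annotations if document_url in target_urls(a)]
    query = urlencode({"target": document_url})
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"{container_url}?{query}&page=0",
        "type": "AnnotationPage",
        "partOf": {"id": f"{container_url}?{query}", "total": len(matching)},
        "startIndex": 0,
        "items": matching,
    }
```

A client that already understands Annotation Containers could then read the filtered result like any other page, which is the main appeal of this approach over a custom search response format.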

Am I wrong? Any other idea?

TIA!

Sincerely,
Luca




> On 18 May 2020, at 16:28, Gerben <gerben@treora.com> wrote:
> 
> Hello Luca and all,
> 
> Great that you bring this up; I have been intending to send a similar email to this list in the near future. Hereby!
> 
> As for the Web Annotation specs (tl;dr: looks like the WG never got around to specifying a search method)
> 
> I have not been involved in the standardisation process, but my understanding has been that the Web Annotation Protocol was made to define how to interact with an “Annotation Container”, but not how to find (a URL for) a container[1]. I suppose that defining a search protocol could be boiled down to defining a URL template for containers, plus possibly defining a vocabulary to add search-specific information such as the relevancy of each result.
> 
> Specifying search appears to have been discussed in 2015 in issue 48 “Support for search”[2], with a resolution made in a call[3]:
> 
> RESOLUTION: The WG will consider a separate document defining a non-exclusive search interface to be published at least as a Note and potentially part of Protocol
> 
> As far as I can see, this plan did not turn into anything; I asked Benjamin Young about this, perhaps he (or other WG members) will follow up about what happened to this plan.
> 
> As for ways forward (tl;dr: should we extend OpenSearch or something?)
> 
> Personally I have been planning to take another stab at creating interoperable annotation services (I poked a bit at this nearly six years ago[4]). I plan to start with making a simple browser extension that lets the user subscribe to multiple annotation sources to receive their annotations; much like an RSS/Atom feed aggregator. It would contain a discovery mechanism (probably via <link> tags) so that the user can discover annotation services by visiting their websites (again, much like with RSS/Atom). It would appear like a button to subscribe to a blog, but now you can take it with you and get its content in context wherever you go on the web (who is following whom then?). :)
> 
> Unlike with RSS/Atom, one would need the ability to search for a subset of relevant annotations, especially to get annotations targeting a given page. For this the most obvious prior art is OpenSearch[5]. It defines how a small XML file can be used to describe how to query a search engine. The file is discoverable through e.g. a <link rel="search"> tag, so that e.g. browsers can offer the user to use that search provider. The description would have a URL template to specify the endpoint to use; for example, GitHub’s description document contains this line[6]:
> 
> <Url type="text/html" method="get" template="https://github.com/search?q={searchTerms}&amp;ref=opensearch"/>
> 
> OpenSearch is designed to be extended and allows arbitrary parameters using XML namespaces, so we could introduce new parameter types as needed. In particular we would need the ability to pass a target URL instead of (or besides) the {searchTerms} parameter, plus any desired filters for author, date, etcetera. The URI template allows using custom namespaces[7], so we could invent something like this (taking the liberty of assuming a new vocabulary at "http://www.w3.org/ns/wap#"):
> 
> <Url
>   type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
>   method="get"
>   xmlns:wap="http://www.w3.org/ns/wap#"
>   xmlns:oa="http://www.w3.org/ns/anno.jsonld#"
>   template="https://example.org/annotations?uri={wap:target?}&amp;t={wap:createdAfter?}&amp;by={oa:creator?}&amp;q={searchTerms?}"
> />
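
For illustration, a client could expand a template like the one above roughly as follows. This is only a sketch: the function name `expand_template` is hypothetical, the `wap:` parameters are the invented ones from the example, and prefixes are matched as plain strings rather than being resolved to their XML namespaces as a real OpenSearch client should do. Optional parameters (those ending in `?`) without a value become the empty string; a missing required parameter is treated as an error.

```python
import re
from urllib.parse import quote

# The template as it reads after XML parsing (so "&" rather than "&amp;").
TEMPLATE = (
    "https://example.org/annotations?uri={wap:target?}"
    "&t={wap:createdAfter?}&by={oa:creator?}&q={searchTerms?}"
)

# Matches {name} and {name?}; group 2 is "?" for optional parameters.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template: str, values: dict) -> str:
    """Replace each {parameter} in an OpenSearch URL template.

    Values are percent-encoded; optional parameters without a value are
    replaced by the empty string.
    """
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required search parameter: {name}")

    return _PARAM.sub(substitute, template)


url = expand_template(TEMPLATE, {"wap:target": "https://example.com/page"})
```

With only the target given, the other parameters are simply left empty in the resulting query string, so the server would have to treat empty values as "no filter".
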
> 
> Some advantages of extending OpenSearch that I can think of:
> 
> - We’d be extending an ecosystem instead of reinventing the wheel.
> - Many aspects such as searching for text queries have already been defined, and will be understood by existing tools, which should make text search among annotations work out of the box with existing browsers or meta-search engines.
> - OpenSearch descriptors can specify any (and multiple) response formats, each with its own URL template; a search server could thus provide an endpoint to get the search results as an Annotation Container, and another endpoint to obtain results in an HTML page, or Atom, etc.
> - Autodiscovery of search services is part of the spec, so e.g. a website can include a <link rel="search" …> element to announce its annotation service.
> 
> But also some possible disadvantages:
> 
> - Although it might be the most popular standard for describing search endpoints, OpenSearch nowadays lacks a website or an organisation behind it, and appears to have been mostly dormant for many years. Trying to help breathe life back into it seems possibly worthwhile but a big step.
> - Adopting an existing spec introduces more complexity than may be required. For example, descriptors are expressed in XML, thus any tool would have to be able to parse XML to use it.
> - It seems mainly designed for public, gratis search services. One may for example want a way to describe authentication methods to get a personal(ised) annotation feed. For mechanisms beyond just putting a secret code into the URL template, this may require another (ideally orthogonal) OpenSearch extension.
> - In many cases one may want to describe more capabilities (e.g. creating annotations) that may seem inappropriate to shoehorn into OpenSearch; and if one finds or creates a separate ‘annotation service descriptor’ spec for those purposes, it is tempting to just describe the search endpoints in there.
> 
> I would be very open to suggestions other than extending OpenSearch; it just seemed the most fitting solution I have found so far. But perhaps some approach that makes use of the linked data ecosystem, like Linked Data Fragments[8], would be a more natural fit. Does anyone have tips?
> 
> Also it seems important to think about the bigger picture of annotation search services. While in a typical use case one may want to discover annotations from multiple sources on the web pages one visits, it seems undesirable to have to query each source with the URL of each page (again the question: who’s following whom?). To improve both on privacy and efficiency, I imagine one could use a trusted aggregation service that queries sources on behalf of the user, and which moreover might not run search queries but rather crawl (or subscribe to) the annotation services to get the content in bulk; somewhat like usual web search engines, except the user specifies which sources to crawl. While both the sources and the aggregator could in theory be based on the same search protocol, such an architecture may be better off with extra protocol features both for the annotation sources (to support bulk annotation crawling/subscribing), and for the aggregators (e.g. a method to add a new source to subscribe to).
> 
> Whatever the approach turns out to be, I think it would be great to collaborate on building some sort of interoperable annotation ecosystem. Thoughts welcome!
> 
> — Gerben
> 
> 
> 
> [1]: Except one discovery mechanism, serving only to let a resource announce that “Annotations on the resource SHOULD be created within the referenced Container”: https://www.w3.org/TR/annotation-protocol/#discovery-of-annotation-containers
> [2]: https://github.com/w3c/web-annotation/issues/48
> [3]: https://www.w3.org/2015/12/16-annotation-minutes.html#item03
> [4]: See https://web.hypothes.is/blog/supporting-open-annotation/
> [5]: https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md ; https://en.wikipedia.org/wiki/OpenSearch ; https://web.archive.org/web/20180421215752/http://www.opensearch.org/Home
> [6]: https://github.com/opensearch.xml
> [7]: https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md#fully-qualified-parameter-names ; related: https://web.archive.org/web/20180408193434/http://www.opensearch.org/Specifications/OpenSearch/Extensions/Parameter/1.0
> [8]: https://linkeddatafragments.org/
> 
> On 13/05/2020 22:37, Luca De Santis wrote:
>> Dear all,
>> I’m Luca De Santis of Net7, the company behind the Pundit annotation tool (https://thepund.it).
>> We are currently working on some updates of our tool. Amongst them, we are planning to develop an endpoint that supports, in read-only mode, the Web Annotation Protocol (WAP). Currently Pundit is compliant with the Web Annotation Data Model (well, quite compliant…).
>> 
>> Basically the APIs that we’d like to implement are:
>> 1. the (filtered) retrieval of “a group” of annotations
>> 2. the retrieval of a single annotation.
>> 
>> No problem for point 2, which is pretty clear.
>> 
>> Point 1, which corresponds in our use case to a “search for annotations”, is not completely clear to me. 
>> In fact, while the concept of “Annotation Containers” is very handy, I haven't seen a WAP-compliant way to pass parameters to filter results.
>> Some examples of these parameters:
>> - the URI of the target document
>> - some conditions (e.g.: on author, date, etc).
>> 
>> Is there any standardization of the possible parameters to pass to filter annotations in a container? 
>> In particular we are planning to implement this method: https://www.w3.org/TR/annotation-protocol/#representations-with-annotation-descriptions
>> 
>> Other tools/services like Hypothes.is or Europeana seem to have implemented a specific search endpoint (e.g. https://hypothes.is/api/search?uri=https://www.repubblica.it ), but if there is a clean and WAP-compliant way to implement this feature I'd stick with it.
>> 
>> Any idea on that? Thanks in advance.
>> 
>> Regards,
>> Luca De Santis
>> 
>> -- 
>> ------------------------------------------------------------------------------------------------
>> Luca De Santis / Chief Technology Officer
>> desantis@netseven.it <mailto:desantis@netseven.it> 
>> www.netseven.it <http://www.netseven.it/>
>> +39 050 55 25 74   
>> +39 335 7376 153
>> skype: lucadex
>> 
>> via G. Carducci 60 | 56017 Ghezzano (PI) - Italy
>> 
>> P.Iva e CF 01577590506
>> CCIAA di Pisa n. 01577590506 del 26/04/2001
>> Capitale Sociale 10.000,00 €
>> ------------------------------------------------------------------------------------------------

Received on Monday, 18 May 2020 17:44:09 UTC