Re: A Web Annotation Protocol compliant "Search" API

Hello Luca and all,

Great that you bring this up; I have been intending to send a similar
email to this list in the near future. Hereby!

*As for the Web Annotation specs* (tl;dr: it looks like the WG never got
around to specifying a search method)

I have not been involved in the standardisation process, but my
understanding has been that the Web Annotation Protocol was made to
define how to interact with an “Annotation Container”, but not how to
find (a URL for) a container[1]. I suppose that defining a search
protocol could be boiled down to defining a URL template for containers,
plus possibly defining a vocabulary to add search-specific information
such as the relevancy of each result.
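
To make this concrete, here is a minimal sketch of a search result being
returned as an ordinary Annotation Container. The container shape
(BasicContainer plus AnnotationCollection, with an embedded first page)
follows the Protocol; the function names, the "&page=0" page IRI and the
example data are assumptions made up for illustration, and no relevancy
vocabulary is included since none exists yet.

```python
def target_iris(annotation):
    """Return the IRIs an annotation targets (string or object targets)."""
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]
    iris = []
    for target in targets:
        if isinstance(target, str):
            iris.append(target)
        elif isinstance(target, dict):
            # Simplification: a SpecificResource names the page in "source";
            # any other object target is identified by its "id".
            iri = target.get("source") or target.get("id")
            if iri:
                iris.append(iri)
    return iris


def search_container(annotations, target, request_url):
    """Wrap the annotations targeting `target` in a single-page container."""
    matches = [a for a in annotations if target in target_iris(a)]
    page_id = request_url + "&page=0"  # hypothetical page IRI scheme
    return {
        "@context": [
            "http://www.w3.org/ns/anno.jsonld",
            "http://www.w3.org/ns/ldp.jsonld",
        ],
        "id": request_url,
        "type": ["BasicContainer", "AnnotationCollection"],
        "total": len(matches),
        "first": {"id": page_id, "type": "AnnotationPage", "items": matches},
        "last": page_id,
    }


# Made-up example data.
ANNOTATIONS = [
    {"id": "https://example.org/a/1", "type": "Annotation",
     "target": "https://example.com/page"},
    {"id": "https://example.org/a/2", "type": "Annotation",
     "target": {"source": "https://example.com/page"}},
    {"id": "https://example.org/a/3", "type": "Annotation",
     "target": "https://example.com/other"},
]

container = search_container(
    ANNOTATIONS,
    "https://example.com/page",
    "https://example.org/annotations?uri=https%3A%2F%2Fexample.com%2Fpage",
)
```

A client that already understands containers could consume such a
response unchanged; only the way of arriving at the container URL would
be new.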

Specifying search appears to have been discussed in 2015 in issue 48
“Support for search”[2], with a resolution made in a call[3]:

    RESOLUTION: The WG will consider a separate document defining a
    non-exclusive search interface to be published at least as a Note
    and potentially part of Protocol

As far as I can see, this plan did not turn into anything. I asked
Benjamin Young about this; perhaps he (or other WG members) will follow
up about what happened to this plan.

*As for ways forward* (tl;dr: should we extend OpenSearch or something?)

Personally I have been planning to take another stab at creating
interoperable annotation services (I poked a bit at this nearly six
years ago[4]). I plan to start with making a simple browser extension
that lets the user subscribe to multiple annotation sources to receive
their annotations; much like an RSS/Atom feed aggregator. It would
contain a discovery mechanism (probably via <link> tags) so that the
user can discover annotation services by visiting their websites (again,
much like with RSS/Atom). It would appear like a button to subscribe to
a blog, but now you can take it with you and get its content in context
wherever you go on the web (who is following whom then?). :)

Unlike with RSS/Atom, one would need the ability to search for a
subset of relevant annotations, especially to get annotations targeting
a given page. For this the most obvious prior art is OpenSearch[5]. It
defines how a small XML file can be used to describe how to query a
search engine. The file is discoverable through e.g. a <link
rel="search"> tag, so that e.g. browsers can offer the user to use that
search provider. The description would have a URL template to specify
the endpoint to use; for example, GitHub’s description document contains
this line[6]:

    <Url type="text/html" method="get"
    template="https://github.com/search?q={searchTerms}&amp;ref=opensearch"/>
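
As a rough sketch of how a client might read such a description
document, the following uses Python's standard XML parser to collect the
URL template offered for each response type. The description document
shown is a made-up example (only the OpenSearch 1.1 namespace is real),
and url_templates is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of OpenSearch 1.1 description documents.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Made-up description document for a hypothetical service.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical annotation search</Description>
  <Url type="text/html"
       template="https://example.org/search?q={searchTerms}"/>
  <Url type="application/atom+xml"
       template="https://example.org/search.atom?q={searchTerms}"/>
</OpenSearchDescription>"""


def url_templates(description_xml: str) -> dict:
    """Map each advertised response type to its URL template."""
    root = ET.fromstring(description_xml)
    return {
        url.get("type"): url.get("template")
        for url in root.iter(f"{{{OS_NS}}}Url")
    }


templates = url_templates(DESCRIPTION)
print(templates["text/html"])
```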

OpenSearch is designed to be extended and allows arbitrary parameters
using XML namespaces, so we could introduce new parameter types as
needed. In particular we would need the ability to pass a target URL
instead of (or besides) the {searchTerms} parameter, plus any desired
filters for author, date, etcetera. The URI template allows using custom
namespaces[7], so we could invent something like this (taking the
liberty of assuming a new vocabulary at "http://www.w3.org/ns/wap#"):

    <Url
      type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
      method="get"
      xmlns:wap="http://www.w3.org/ns/wap#"
      xmlns:oa="http://www.w3.org/ns/anno.jsonld#"
      template="https://example.org/annotations?uri={wap:target?}&amp;t={wap:createdAfter?}&amp;by={oa:creator?}&amp;q={searchTerms?}"
    />
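
A client would then fill in such a template roughly as sketched below.
This is only an illustration under assumptions: expand_template is a
hypothetical helper, and it matches parameters by their prefixed name as
written, whereas a real client should resolve each prefix against the
xmlns declarations. It follows the OpenSearch convention that an
optional parameter (marked with "?") without a value becomes an empty
string.

```python
import re
from urllib.parse import quote

# Matches {name}, {name?}, {prefix:name} and {prefix:name?}.
PARAM = re.compile(r"\{([A-Za-z0-9_.~-]+(?::[A-Za-z0-9_.~-]+)?)(\?)?\}")


def expand_template(template: str, values: dict) -> str:
    """Fill in an OpenSearch URL template.

    Values are percent-encoded. An optional parameter without a value
    becomes an empty string; a missing required one raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return PARAM.sub(substitute, template)


# The template value as a client would see it after XML parsing.
template = ("https://example.org/annotations?uri={wap:target?}"
            "&t={wap:createdAfter?}&by={oa:creator?}&q={searchTerms?}")

url = expand_template(template, {"wap:target": "https://example.com/page"})
print(url)
# https://example.org/annotations?uri=https%3A%2F%2Fexample.com%2Fpage&t=&by=&q=
```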

Some advantages of extending OpenSearch, that I can think of:

  * We’d be extending an ecosystem instead of reinventing the wheel.
  * Many aspects such as searching for text queries have already been
    defined, and will be understood by existing tools, which should make
    text search among annotations work out of the box with existing
    browsers or meta-search engines.
  * OpenSearch descriptors can specify any (and multiple) response
    formats, each with its own URL template; a search server could thus
    provide an endpoint to get the search results as an Annotation
    Container, and another endpoint to obtain results in an HTML page,
    or Atom, etc.
  * Autodiscovery of search services is part of the spec, so e.g. a
    website can include a <link rel="search" …> element to announce its
    annotation service.
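
For illustration, autodiscovery on the client side could look roughly
like the sketch below, which scans a page for <link rel="search">
elements pointing at an OpenSearch description document. The class name
and the example page are made up; the media type is the one OpenSearch
uses for description documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class SearchLinkFinder(HTMLParser):
    """Collect the description document URLs a page advertises."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link types.
        rel = (attributes.get("rel") or "").lower().split()
        if ("search" in rel
                and attributes.get("type") == OPENSEARCH_TYPE
                and attributes.get("href")):
            # Resolve relative hrefs against the page's own URL.
            self.descriptions.append(
                urljoin(self.base_url, attributes["href"]))


# Made-up page of a hypothetical annotation service.
PAGE = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="search" type="application/opensearchdescription+xml"
      href="/opensearch.xml" title="Example annotations">
</head><body></body></html>"""

finder = SearchLinkFinder("https://example.org/blog/post")
finder.feed(PAGE)
print(finder.descriptions)  # ['https://example.org/opensearch.xml']
```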

But also some possible disadvantages:

  * Although it might be the most popular standard for describing search
    endpoints, OpenSearch nowadays lacks a website or an organisation
    behind it, and appears to have been mostly dormant for many years.
    Trying to help breathe life back into it seems possibly worthwhile,
    but it would be a big step.
  * Adopting an existing spec introduces more complexity than may be
    required. For example, descriptors are expressed in XML, thus any
    tool would have to be able to parse XML to use it.
  * It seems mainly designed for public, gratis search services. One may
    for example want a way to describe authentication methods to get a
    personal(ised) annotation feed. For mechanisms beyond just putting a
    secret code into the URL template, this may require another (ideally
    orthogonal) OpenSearch extension.
  * In many cases one may want to describe more capabilities (e.g.
    creating annotations) that may seem inappropriate to shoehorn into
    OpenSearch; and if one finds or creates a separate ‘annotation service
    descriptor’ spec for those purposes, it is tempting to just describe
    the search endpoints in there.

I would be very open to other suggestions than extending OpenSearch; it
just seemed the most fitting solution I have found so far. But perhaps
some approach that makes use of the linked data ecosystem, like Linked
Data Fragments[8], would be a more natural fit. Does anyone have tips?

Also it seems important to think about the bigger picture of annotation
search services. While in a typical use case one may want to discover
annotations from multiple sources on the web pages one visits, it seems
undesirable to have to query each source with the URL of each page
(again the question: who’s following whom?). To improve both on privacy
and efficiency, I imagine one could use a trusted aggregation service
that queries sources on behalf of the user, and which moreover might not
run search queries but rather crawl (or subscribe to) the annotation
services to get the content in bulk; somewhat like usual web search
engines, except the user specifies which sources to crawl. While both
the sources and the aggregator could in theory be based on the same
search protocol, such an architecture may be better off with extra
protocol features both for the annotation sources (to support bulk
annotation crawling/subscribing) and for the aggregators (e.g. a method
to add a new source to subscribe to).
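
A toy sketch of that architecture, under heavy assumptions: the
Aggregator class and its method names are hypothetical, targets are
taken to be plain IRI strings, and the crawling or subscribing itself is
left out (ingest simply receives whatever was fetched in bulk). The
point it illustrates is that page lookups are answered from the local
index, so sources never see which pages the user visits.

```python
from collections import defaultdict


class Aggregator:
    """Hold bulk-fetched annotations, indexed by the page they target."""

    def __init__(self):
        self.sources = set()
        # target IRI -> {annotation id: annotation}
        self.by_target = defaultdict(dict)

    def subscribe(self, source_url):
        """Add a source chosen by the user."""
        self.sources.add(source_url)

    def ingest(self, source_url, annotations):
        """Store annotations obtained from a subscribed source."""
        if source_url not in self.sources:
            raise ValueError(f"not subscribed to {source_url}")
        for annotation in annotations:
            # Keyed by id, so re-crawling does not create duplicates.
            self.by_target[annotation["target"]][annotation["id"]] = annotation

    def lookup(self, page_url):
        """Answer locally; no request goes out to any source."""
        return list(self.by_target.get(page_url, {}).values())


aggregator = Aggregator()
aggregator.subscribe("https://example.org/annotations")
batch = [
    {"id": "https://example.org/a/1", "target": "https://example.com/page"},
    {"id": "https://example.org/a/2", "target": "https://example.com/other"},
]
aggregator.ingest("https://example.org/annotations", batch)
aggregator.ingest("https://example.org/annotations", batch)  # a re-crawl
print(len(aggregator.lookup("https://example.com/page")))
```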

Whatever the approach turns out to be, I think it would be great to
collaborate on making some sort of interoperable annotation ecosystem.
Thoughts welcome!

— Gerben


[1]: Except one discovery mechanism, serving only to let a resource
announce that “Annotations on the resource /SHOULD/ be created within
the referenced Container”:
https://www.w3.org/TR/annotation-protocol/#discovery-of-annotation-containers
[2]: https://github.com/w3c/web-annotation/issues/48
[3]: https://www.w3.org/2015/12/16-annotation-minutes.html#item03
[4]: See https://web.hypothes.is/blog/supporting-open-annotation/
[5]:
https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md
; https://en.wikipedia.org/wiki/OpenSearch ;
https://web.archive.org/web/20180421215752/http://www.opensearch.org/Home
[6]: https://github.com/opensearch.xml
[7]:
https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md#fully-qualified-parameter-names
; related:
https://web.archive.org/web/20180408193434/http://www.opensearch.org/Specifications/OpenSearch/Extensions/Parameter/1.0
[8]: https://linkeddatafragments.org/


On 13/05/2020 22:37, Luca De Santis wrote:
> Dear all,
> I’m Luca De Santis of Net7, the company behind the Pundit annotation
> tool ( https://thepund.it ).
> We are currently working on some updates of our tool. Amongst them, we
> are planning to develop an endpoint that supports, in read-only mode,
> the Web Annotation Protocol (WAP). Currently Pundit is compliant with
> the Web Annotation Data Model (well, quite compliant…).
>
> Basically the APIs that we’d like to implement are:
> 1. the (filtered) retrieval of “a group” of annotations
> 2. the retrieval of a single annotation.
>
> No problem for point 2, which is pretty clear.
>
> Point 1, which corresponds in our use case to a “search for
> annotations”, is not completely clear to me. 
> In fact, while the concept of “Annotation Containers” is very handy, I
> haven't seen a WAP compliant mode to pass parameters to filter results. 
> Some examples of these parameters:
> - the URI of the target document
> - some conditions (e.g.: on author, date, etc).
>
> Is there any standardization of the possible parameters to pass to
> filter annotations in a container? 
> In particular we are planning to implement this
> method https://www.w3.org/TR/annotation-protocol/#representations-with-annotation-descriptions .
>
> Other tools/services like Hypothes.is or Europeana seem to have
> implemented a specific search endpoint (e.g.
> https://hypothes.is/api/search?uri=https://www.repubblica.it ), but if
> there is a clean and WAP-compliant way to implement this feature I'd
> stick with it.
>
> Any idea on that? Thanks in advance.
>
> Regards,
> Luca De Santis
>
> -- 
> ------------------------------------------------------------------------------------------------
> *Luca De Santis* / Chief Technology Officer
> desantis@netseven.it
> www.netseven.it
> +39 050 55 25 74   
> +39 335 7376 153
> skype: lucadex
>
>
> via G. Carducci 60 | 56017 Ghezzano (PI) - Italy 
>
> P.Iva e CF 01577590506
> CCIAA di Pisa n. 01577590506 del 26/04/2001
> Capitale Sociale 10.000,00 €
> ------------------------------------------------------------------------------------------------

Received on Monday, 18 May 2020 23:28:32 UTC