Re: A Web Annotation Protocol compliant "Search" API from Christian Chiarcos on 2020-05-18 (public-openannotation@w3.org from May 2020)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Tue, 19 May 2020 01:03:44 +0200
To: Gerben <gerben@treora.com>, "Luca De Santis" <desantis@netseven.it>
Cc: public-openannotation@w3.org, "chiarcos@informatik.uni-frankfurt.de" <chiarcos@informatik.uni-frankfurt.de>
Message-ID: <op.0kt40iy5br5td5@kitaba>
Dear Luca, dear all,

this is only remotely related, but in the LD4LT CG
(https://www.w3.org/community/ld4lt) we are discussing an extension of
Web Annotation to meet the requirements of language technology on the
web. It is largely based on a harmonization between Web Annotation, the
NLP Interchange Format and several ISO TC37 standards, with use cases in
language technology and DH. This will include a reconsideration of the
WA API specifications (using WA and
https://persistence.uni-leipzig.org/nlp2rdf/specification/api.html as
starting points), and any input or feature requests would be welcome.

We're still in the process of requirements analysis, with an intermediate
survey at
https://github.com/ld4lt/linguistic-annotation/blob/master/survey/required-features.md.
The survey has not tackled the API yet; so far it has focused on the
vocabulary.

Best,
Christian

On .05.2020 at 19:43, Luca De Santis <desantis@netseven.it> wrote:

> Dear Gerben,
> thank you very much for your answer.
>
> What you are proposing is quite intriguing, albeit my use case is much  
> simpler.
>
> In particular in the Triple research project my company is involved in  
> (https://www.gotriple.eu/), we are integrating our Pundit annotation  
> tool.
> In Triple we are building a discovery platform for “content” (e.g.  
> articles) related to Social Science and Humanities. In Triple we would  
> like to show if an article has been annotated with Pundit (and possibly
> with other interoperable annotation tools).
>
> We need a very simple API on our Annotation Server that, given the  
> document URL, returns the annotations on it. We already have our own API
> for this, as Hypothes.is has (see  
> https://hypothes.is/api/search?uri=<..>).
> I just wondered if there is a more Web Annotation Protocol-savvy way to  
> do that.
>
> From what I understand, also from reading the first part of your email,
> the answer is no, which IMHO is quite a pity.
> Since the Annotation Container is quite a handy concept, we were
> thinking of implementing a new API for retrieving annotations based on
> ( https://www.w3.org/TR/annotation-protocol/#representations-with-annotation-descriptions ),
> adding a URL parameter to return only the annotations that belong to a
> specific target.
> The goal was to be as compliant as possible with the Web Annotation
> standard, but it seems that there isn’t a 100% savvy way of
> implementing our use case.
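The filtered retrieval described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `target` and `page` query parameters, the function names and the example URLs are all hypothetical, and the Web Annotation Protocol does not define any such filter. The response simply reuses the AnnotationCollection / AnnotationPage shape that the protocol uses for containers with embedded annotation descriptions.

```python
from urllib.parse import quote


def target_iris(annotation: dict) -> set:
    """Collect the IRIs an annotation targets (string, object, or list form)."""
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]
    iris = set()
    for target in targets:
        if isinstance(target, str):
            iris.add(target)
        elif isinstance(target, dict):
            # A SpecificResource names the document in "source";
            # other resources carry their IRI in "id".
            source = target.get("source", target.get("id"))
            if isinstance(source, dict):
                source = source.get("id")
            if source:
                iris.add(source)
    return iris


def search_by_target(annotations: list, target: str, endpoint: str) -> dict:
    """Return the matching annotations wrapped as an AnnotationCollection.

    The "target" and "page" query parameters are assumptions made for this
    sketch; they are not defined by the Web Annotation Protocol.
    """
    matches = [a for a in annotations if target in target_iris(a)]
    collection_id = f"{endpoint}?target={quote(target, safe='')}"
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": collection_id,
        "type": "AnnotationCollection",
        "total": len(matches),
        "first": {
            "id": f"{collection_id}&page=0",
            "type": "AnnotationPage",
            "items": matches,
        },
    }


# Hypothetical in-memory store standing in for the annotation server.
store = [
    {"id": "https://example.org/anno/1", "type": "Annotation",
     "target": "https://example.com/a"},
    {"id": "https://example.org/anno/2", "type": "Annotation",
     "target": {"source": "https://example.com/b"}},
]
result = search_by_target(store, "https://example.com/b",
                          "https://example.org/annotations/")
print(result["total"], [a["id"] for a in result["first"]["items"]])
```

A client that already understands Annotation Containers could read such a response unchanged; only the query parameter would be outside the standard.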
>
> Am I wrong? Any other idea?
>
> TIA!
>
> Sincerely,
> Luca
>
>
>
>
>> On 18 May 2020, at 16:28, Gerben <gerben@treora.com> wrote:
>>
>>
>> Hello Luca and all,
>>
>> Great that you bring this up; I have been intending to send a similar  
>> email to this list in the near future. Hereby!
>>
>> As for the Web Annotation specs (tl;dr: looks like the WG never got  
>> around to spec a search method)
>>
>> I have not been involved in the standardisation process, but my
>> understanding has been that the Web Annotation Protocol was made to
>> define how to interact with an “Annotation Container”, but not how to
>> find (a URL for) a container[1]. I suppose that defining a search
>> protocol could be boiled down to defining a URL template for
>> containers, plus possibly defining a vocabulary to add search-specific
>> information such as the relevancy of each result.
>>
>> Specifying search appears to have been discussed in 2015 in issue 48  
>> “Support for search”[2], with a resolution made in a call[3]:
>>>
>>> RESOLUTION: The WG will consider a separate document defining a
>>> non-exclusive search interface to be published at least as a Note
>>> and potentially part of Protocol
>>
>> As far as I can see, this plan did not turn into anything; I asked
>> Benjamin Young about this, perhaps he (or other WG members) will follow
>> up about what happened to this plan.
>>
>> As for ways forward (tl;dr: should we extend OpenSearch or something?)
>>
>> Personally I have been planning to take another stab at creating
>> interoperable annotation services (I poked a bit at this nearly six
>> years ago[4]). I plan to start with making a simple browser extension
>> that lets the user subscribe to multiple annotation sources to receive
>> their annotations; much like an RSS/Atom feed aggregator. It would
>> contain a discovery mechanism (probably via <link> tags) so that the
>> user can discover annotation services by visiting their websites
>> (again, much like with RSS/Atom). It would appear like a button to
>> subscribe to a blog, but now you can take it with you and get its
>> content in context wherever you go on the web (who is following whom
>> then?). :)
>>
>> Unlike with RSS/Atom, one would need the ability to search for
>> a subset of relevant annotations, especially to get annotations
>> targeting a given page. For this the most obvious prior art is
>> OpenSearch[5]. It defines how a small XML file can be used to describe
>> how to query a search engine. The file is discoverable through e.g. a
>> <link rel="search"> tag, so that e.g. browsers can offer the user the
>> option to use that search provider. The description would have a URL
>> template to specify the endpoint to use; for example, GitHub’s
>> description document contains this line[6]:
>>>
>>> <Url type="text/html" method="get"
>>> template="https://github.com/search?q={searchTerms}&amp;ref=opensearch"/>
>>
>> OpenSearch is designed to be extended and allows arbitrary parameters
>> using XML namespaces, so we could introduce new parameter types as
>> needed. In particular we would need the ability to pass a target URL
>> instead of (or besides) the {searchTerms} parameter, plus any desired
>> filters for author, date, etcetera. The URI template allows using
>> custom namespaces[7], so we could invent something like this (taking
>> the liberty of assuming a new vocabulary at
>> "http://www.w3.org/ns/wap#"):
>>>
>>> <Url
>>>  type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
>>>  method="get"
>>>  xmlns:wap="http://www.w3.org/ns/wap#"
>>>  xmlns:oa="http://www.w3.org/ns/anno.jsonld#"
>>>  template="https://example.org/annotations?uri={wap:target?}&amp;t={wap:createdAfter?}&amp;by={oa:creator?}&amp;q={searchTerms?}"
>>> />
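To make the template mechanism concrete, here is a minimal Python sketch of how a client might fill in such a URL template. It assumes the OpenSearch 1.1 convention that a parameter marked with a trailing "?" is optional and is replaced by an empty string when no value is known. The `wap:` and `oa:` parameter names are the hypothetical ones proposed above, not a published vocabulary, and the sketch matches parameters by their prefixed name rather than resolving the XML namespaces, which a real client would need to do.

```python
import re
from urllib.parse import quote

# Hypothetical template, modelled on the <Url> element proposed above.
TEMPLATE = (
    "https://example.org/annotations"
    "?uri={wap:target?}&t={wap:createdAfter?}&by={oa:creator?}&q={searchTerms?}"
)

# Matches {name} or {prefix:name}, with an optional trailing "?".
PARAM = re.compile(r"\{((?:[A-Za-z_][\w.-]*:)?[A-Za-z_][\w.-]*)(\?)?\}")


def expand(template: str, values: dict) -> str:
    """Fill an OpenSearch-style URL template with percent-encoded values."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""  # unknown optional parameters become empty strings
        raise KeyError(f"missing required parameter: {name}")

    return PARAM.sub(substitute, template)


url = expand(TEMPLATE, {"wap:target": "https://example.com/article?id=1"})
print(url)
```

Percent-encoding the whole target URL keeps its own "?" and "&" from being confused with the separators of the search request.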
>>
>> Some advantages of extending OpenSearch, that I can think of:
>> - We’d be extending an ecosystem instead of reinventing the wheel.
>> - Many aspects such as searching for text queries have already been
>>   defined, and will be understood by existing tools, which should make
>>   text search among annotations work out of the box with existing
>>   browsers or meta-search engines.
>> - OpenSearch descriptors can specify any (and multiple) response
>>   formats, each with its own URL template; a search server could thus
>>   provide an endpoint to get the search results as an Annotation
>>   Container, and another endpoint to obtain results in an HTML page,
>>   or Atom, etc.
>> - Autodiscovery of search services is part of the spec, so e.g. a
>>   website can include a <link rel="search" …> element to announce its
>>   annotation service.
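The autodiscovery step in the last point can be sketched in a few lines of Python using only the standard library. This is an illustration, not a complete client: the class name, the sample page and its URLs are made up, and only the `rel="search"` relation and the `application/opensearchdescription+xml` media type come from OpenSearch itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class SearchLinkFinder(HTMLParser):
    """Collect the OpenSearch description documents a page advertises."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.descriptors = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # "rel" may hold several space-separated relation names.
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "search" in rels and attributes.get("type") == OPENSEARCH_TYPE and href:
            # Resolve relative links against the URL of the page itself.
            self.descriptors.append(urljoin(self.base_url, href))


# Hypothetical page announcing an annotation search service.
page = (
    '<html><head><title>Blog</title>'
    '<link rel="stylesheet" href="/style.css">'
    '<link rel="search" type="application/opensearchdescription+xml" '
    'href="/opensearch.xml" title="Annotations">'
    '</head><body></body></html>'
)
finder = SearchLinkFinder("https://example.org/blog/post")
finder.feed(page)
print(finder.descriptors)
```

A browser extension would then fetch each descriptor and look for a Url element whose type it can handle, for example an Annotation Container representation.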
>>
>> But also some possible disadvantages:
>> - Although it might be the most popular standard for describing search
>>   endpoints, OpenSearch nowadays lacks a website or an organisation
>>   behind it, and appears to have been mostly dormant for many years
>>   now. Trying to help breathe life back into it seems possibly
>>   worthwhile, but a big step.
>> - Adopting an existing spec introduces more complexity than may be
>>   required. For example, descriptors are expressed in XML, thus any
>>   tool would have to be able to parse XML to use it.
>> - It seems mainly designed for public, gratis search services. One may
>>   for example want a way to describe authentication methods to get a
>>   personal(ised) annotation feed. For mechanisms beyond just putting a
>>   secret code into the URL template, this may require another (ideally
>>   orthogonal) OpenSearch extension.
>> - In many cases one may want to describe more capabilities (e.g.
>>   creating annotations) that may seem inappropriate to shoehorn into
>>   OpenSearch; and if one finds/creates a separate ‘annotation service
>>   descriptor’ spec for those purposes, it is tempting to just describe
>>   the search endpoints in there.
>>
>> I would be very open to other suggestions than extending OpenSearch; it
>> just seemed the most fitting solution I found so far. But perhaps
>> some approach that makes use of the linked data ecosystem, like
>> Linked Data Fragments[8], would be a more natural fit. Does anyone have
>> tips?
>>
>> Also it seems important to think about the bigger picture of annotation
>> search services. While in a typical use case one may want to discover
>> annotations from multiple sources on the web pages one visits, it seems
>> undesirable to have to query each source with the URL of each page
>> (again the question: who’s following whom?). To improve both on privacy
>> and efficiency, I imagine one could use a trusted aggregation service
>> that queries sources on behalf of the user, and which moreover might
>> not run search queries but rather crawl (or subscribe to) the
>> annotation services to get the content in bulk; somewhat like usual web
>> search engines, except the user specifies which sources to crawl.
>> While both the sources and the aggregator could in theory be based on
>> the same search protocol, such an architecture may be better off with
>> extra protocol features both for the annotation sources (to support
>> bulk annotation crawling/subscribing) and for the aggregators (e.g. a
>> method to add a new source to subscribe to).
>>
>> Whichever the approach will be, I think it would be great to
>> collaborate to make some sort of interoperable annotation ecosystem.
>> Thoughts welcome!
>>
>> — Gerben
>>
>>
>>
>> [1]: Except one discovery mechanism, serving only to let a resource
>> announce that “Annotations on the resource SHOULD be created within
>> the referenced Container”:
>> https://www.w3.org/TR/annotation-protocol/#discovery-of-annotation-containers
>> [2]: https://github.com/w3c/web-annotation/issues/48
>> [3]: https://www.w3.org/2015/12/16-annotation-minutes.html#item03
>> [4]: See https://web.hypothes.is/blog/supporting-open-annotation/
>> [5]:
>> https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md
>> ; https://en.wikipedia.org/wiki/OpenSearch ;
>> https://web.archive.org/web/20180421215752/http://www.opensearch.org/Home
>> [6]: https://github.com/opensearch.xml
>> [7]:
>> https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md#fully-qualified-parameter-names
>> ; related:
>> https://web.archive.org/web/20180408193434/http://www.opensearch.org/Specifications/OpenSearch/Extensions/Parameter/1.0
>> [8]: https://linkeddatafragments.org/
>>
>>
>> On 13/05/2020 22:37, Luca De Santis wrote:
>>> Dear all, I’m Luca De Santis of Net7, the company behind the Pundit
>>> annotation tool ( https://thepund.it ).
>>> We are currently working on some updates of our tool. Amongst them, we
>>> are planning to develop an endpoint that supports, in read-only
>>> mode, the Web Annotation Protocol (WAP). Currently Pundit is compliant
>>> with the Web Annotation Data Model (well, quite compliant…).
>>>
>>> Basically the APIs that we’d like to implement are:
>>> 1. the (filtered) retrieval of “a group” of annotations
>>> 2. the retrieval of a single annotation.
>>>
>>> No problem for point 2, which is pretty clear.
>>>
>>> Point 1, which corresponds in our use case to a “search for
>>> annotations”, is not completely clear to me. In fact, while the concept
>>> of “Annotation Containers” is very handy, I haven't seen a
>>> WAP-compliant way to pass parameters to filter results. Some examples
>>> of these parameters:
>>> - the URI of the target document
>>> - some conditions (e.g.: on author, date, etc).
>>>
>>> Is there any standardization of the possible parameters to pass to
>>> filter annotations in a container? In particular we are planning to
>>> implement this method:
>>> https://www.w3.org/TR/annotation-protocol/#representations-with-annotation-descriptions
>>>
>>> Other tools/services like Hypothes.is or Europeana seem to have
>>> implemented a specific search endpoint (e.g.
>>> https://hypothes.is/api/search?uri=https://www.repubblica.it ), but
>>> if there is a clean and WAP-compliant way to implement this feature
>>> I'd stick with it.
>>>
>>> Any idea on that? Thanks in advance.
>>>
>>> Regards,
>>> Luca De Santis
>>>
>>> -- 
>>> ------------------------------------------------------------------------------------------------
>>> Luca De Santis / Chief Technology Officer
>>> desantis@netseven.it www.netseven.it
>>> +39 050 55 25 74  +39 335 7376 153
>>> skype: lucadex
>>>
>>> via G. Carducci 60 | 56017 Ghezzano (PI) - Italy
>>>
>>> VAT no. and tax code 01577590506
>>> Pisa Chamber of Commerce (CCIAA) no. 01577590506, dated 26/04/2001
>>> Share capital € 10.000,00
>>> ------------------------------------------------------------------------------------------------
>>>
Received on Monday, 18 May 2020 23:04:04 UTC