Count of annotations by xpath? from Ronald Snyder on 2017-10-03 (public-openannotation@w3.org from October 2017)

From: Ronald Snyder <Ronald.Snyder@ithaka.org>
Date: Tue, 3 Oct 2017 12:14:05 +0000
To: "public-openannotation@w3.org" <public-openannotation@w3.org>
Message-ID: <996AAA06-6538-4081-9C62-1B1D896DB890@ithaka.org>

Greetings -

For an application that we’re developing, we need a summary of the annotations for a target that includes a count of annotations grouped by XPath selector value (analogous to a faceted search request/response). It’s not clear whether, or how, the Web Annotation Protocol supports this. We are currently using the MangoServer implementation for an early prototype of the application. Any suggestions, examples, or pointers to documentation on how this might be accomplished would be much appreciated.

For a little more background and context…

We (JSTOR Labs) developed a proof-of-concept application a couple of years ago to explore the idea of connecting scholarship to primary texts (literary works, historic documents, etc.) using quoted passages that were mined from journal articles and connected to the primary text using a fuzzy text matching algorithm. Two public prototypes of the concept were produced, one for Shakespeare plays and another for the US Constitution. The Shakespeare prototype can be seen here – https://labs.jstor.org/shakespeare. The prototype for the US Constitution was developed as a mobile app and is described here – http://labs.jstor.org/constitution/.


As a proof of concept this has received a very positive reaction from the academic community, and we are now embarking on a project to expand the approach significantly, providing matches to many more texts and ideally doing so in a manner that would enable other providers of scholarship (or anyone, for that matter) to connect non-JSTOR content to the same texts. In this next-generation version of the tools/infrastructure, we intend to base the implementation on the Web Annotation Data Model and Protocol and on open source code, to maximize interoperability and community involvement. The matched passages in the primary text and journal articles will be represented as a pair of annotations: one anchored in the primary text using an XPathSelector and another anchored in the journal article (often as a media fragment, as these targets will typically be page scan images).
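
A minimal sketch of what such a pair of annotations might look like, built in Python and serialized as JSON-LD. The selector types and the "@context" value come from the Web Annotation Data Model; the example.org URLs, the XPath expression, the xywh region, and the helper name make_annotation are hypothetical placeholders. How the two annotations of a pair are linked to each other is not shown here.

```python
import json

def make_annotation(source, selector):
    """Build a bare Web Annotation whose target is a selected part of `source`."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {"source": source, "selector": selector},
    }

# Annotation anchored in the primary text (hypothetical URL and XPath).
primary = make_annotation(
    "http://example.org/texts/hamlet.xml",
    {"type": "XPathSelector", "value": "/TEI/text/body/div[3]/div[1]/l[64]"},
)

# Annotation anchored in a page scan of the quoting article, using a
# media fragment (hypothetical URL and region).
article = make_annotation(
    "http://example.org/articles/12345/page-7.jpg",
    {
        "type": "FragmentSelector",
        "conformsTo": "http://www.w3.org/TR/media-frags/",
        "value": "xywh=120,340,600,80",
    },
)

print(json.dumps([primary, article], indent=2))
```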

As can be seen on the Understanding Shakespeare site, some of the texts (Hamlet, for instance) have thousands of matched quotes; the line “To be or not to be” alone has nearly 1000 quoting articles (http://labs.jstor.org/shakespeare/hamlet#line-3.1.64). Given the volume of matches for many works, it’s not practical to grab all of the matches (annotations) for a work at one time. Our approach has been to get a summary count that reflects the number of matched quotes/articles for a given chunk of text in a work (e.g., each act, scene, and/or line in a Shakespeare play) and then get the matches for that chunk only after a user has expressed interest in it (by clicking on a linked summary count, etc.).
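
A minimal sketch of the kind of summary count described above, assuming the annotations are already available as parsed JSON objects (for example, inside the server or in a batch job). It only illustrates the desired grouping; it is not a Web Annotation Protocol or MangoServer feature, and the function names and sample data are hypothetical.

```python
from collections import Counter

def xpath_values(annotation):
    """Yield the value of every XPathSelector in an annotation's target(s)."""
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]
    for target in targets:
        if not isinstance(target, dict):
            continue  # a bare IRI target carries no selector
        selectors = target.get("selector", [])
        if not isinstance(selectors, list):
            selectors = [selectors]
        for selector in selectors:
            if isinstance(selector, dict) and selector.get("type") == "XPathSelector":
                yield selector["value"]

def count_by_xpath(annotations):
    """Return a Counter mapping each distinct XPath value to its annotation count."""
    counts = Counter()
    for annotation in annotations:
        counts.update(xpath_values(annotation))
    return counts

def xpath_target(value):
    """Hypothetical helper: a target in a sample text selected by XPath."""
    return {
        "source": "http://example.org/texts/hamlet.xml",
        "selector": {"type": "XPathSelector", "value": value},
    }

# Hypothetical sample: two matches on one line, one on another.
sample = [
    {"type": "Annotation", "target": xpath_target("//l[@n='3.1.64']")},
    {"type": "Annotation", "target": xpath_target("//l[@n='3.1.64']")},
    {"type": "Annotation", "target": xpath_target("//l[@n='1.2.129']")},
]

for xpath, count in count_by_xpath(sample).most_common():
    print(count, xpath)
```

A client would then fetch the full annotations for one XPath value only when a user selects the corresponding count.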

In an implementation of something like the Understanding Shakespeare site using the Web Annotation Protocol, how would one request a count of annotations for each distinct XPath value?

Ron

--
Ron Snyder
Director of Research & Development, JSTOR Labs
301 E. Liberty St
Suite 300
Ann Arbor, MI 48104
Ron.Snyder@ithaka.org
Twitter: @rdsnyderjr

Received on Tuesday, 3 October 2017 14:38:01 UTC