Web Annotation Discovery

Hi all, for your interest: I published a proposal for a Web Annotation 
Discovery mechanism 
<https://code.treora.com/gerben/web-annotation-discovery>, along with a 
first implementation as a browser extension 
<https://code.treora.com/gerben/web-annotation-discovery-webextension> 
(and a compatible server 
<https://code.treora.com/gerben/web-annotation-discovery-server>).

The goal is that people can discover annotation sources and subscribe to 
annotation ‘feeds’ while browsing the web, then view the annotations on 
other pages they visit.

Here is a 3-minute introduction video 
<https://archive.treora.com/2022/web-annotation-discovery/introduction-screencast.webm> 
with a little demo. (To try it yourself: here are the Firefox addon 
<https://addons.mozilla.org/en-US/firefox/addon/web-annotation-discovery/> 
and example collection <https://cothink.org/gerben/random_notes/>.)

Below is a section of the proposal 
<https://code.treora.com/gerben/web-annotation-discovery>:


          Approach

    To show annotations on a visited page, the web browser needs to
    somehow obtain these annotations. Various previous annotation
    projects depend on a single global service to index the annotations
    by their target, which browsers would query for annotations
    targeting a particular page. To avoid such centralisation and cater
    for the diversity of use cases, the browser could instead query any
    annotation services of the user’s choice.

    However, querying services for annotations on visited pages has an
    enormous impact on reader privacy: to find annotations on pages
    you read, you have to tell the service which pages you read.
    Subscribing to multiple sources would reveal this information to
    even more parties.

    In many usage scenarios, the set of annotations a person is actually
    interested in is limited and comes from a known source. Centralised
    services (e.g. Hypothes.is <https://hypothes.is/>) can help discover
    annotations from any other user, but are often used for annotating
    in well-defined groups: in classrooms, among colleagues, etc.

    In such cases, there is no need for a central global index, and
    moreover the total set of annotations of interest could easily fit
    on the user’s device. This would solve the reader privacy issue as
    no querying is needed — the browser can simply look up if it has any
    relevant annotations for any visited page (and can thereby be much
    quicker too).
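
The local lookup described above could be sketched as follows. This is a minimal illustration, not the extension's actual code: the `AnnotationLibrary` class and the `target_urls` helper are hypothetical names, and the sketch assumes annotations are stored as parsed JSON objects shaped like the Web Annotation Data Model and matched to pages by URL with the fragment stripped.

```python
from collections import defaultdict
from typing import Iterator
from urllib.parse import urldefrag


def target_urls(annotation: dict) -> Iterator[str]:
    """Yield the page URLs an annotation targets (fragment removed)."""
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]
    for target in targets:
        if isinstance(target, str):
            url = target                      # target given as a plain IRI
        elif isinstance(target, dict):
            # A specific resource carries the page in "source";
            # otherwise fall back to the target's own "id".
            url = target.get("source") or target.get("id")
        else:
            continue
        if url:
            yield urldefrag(url).url


class AnnotationLibrary:
    """Hypothetical on-device store, indexed by target URL."""

    def __init__(self) -> None:
        self._by_target = defaultdict(list)

    def add(self, annotation: dict) -> None:
        for url in target_urls(annotation):
            self._by_target[url].append(annotation)

    def lookup(self, page_url: str) -> list:
        # Purely local: no service learns which page is being read.
        return list(self._by_target.get(urldefrag(page_url).url, []))


library = AnnotationLibrary()
library.add({
    "id": "https://annotations.fredsfrets.example/all/1",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Nice riff."},
    "target": {"source": "https://example.org/songs"},
})
matches = library.lookup("https://example.org/songs#intro")
```

Because the index lives on the device, visiting a page costs one dictionary access and sends no request to any annotation source.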

    Also for somewhat larger-scale annotation consumption, the total
    size may well remain manageable. For example, if an investigative
    journalist subscribes to a thousand colleagues each writing ten
    annotations per day of 1KB each, this produces roughly 4GB in a year:
    significant, but perhaps worth it for their work (for which privacy
    may be more important than disk storage). This size could still be
    reduced by an order of magnitude if, of each annotation, only its
    own URL and the URL it targets are stored (with a tradeoff for
    latency and privacy, see further below
    <https://code.treora.com/gerben/web-annotation-discovery#user-content-compacted-storage>).
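
The storage estimate can be checked with a few lines of arithmetic. The first three figures come from the example above; the 100 bytes assumed for a pair of URLs is only an illustrative guess, not a number from the proposal.

```python
SUBSCRIBED_COLLEAGUES = 1000
ANNOTATIONS_PER_DAY = 10
BYTES_PER_ANNOTATION = 1000   # "1KB", taken here as 1000 bytes
DAYS_PER_YEAR = 365

annotations_per_year = SUBSCRIBED_COLLEAGUES * ANNOTATIONS_PER_DAY * DAYS_PER_YEAR
full_size_gb = annotations_per_year * BYTES_PER_ANNOTATION / 1e9

# Assumption: an annotation's own URL plus its target URL take ~100 bytes.
BYTES_PER_URL_PAIR = 100
compact_size_gb = annotations_per_year * BYTES_PER_URL_PAIR / 1e9

print(f"{annotations_per_year:,} annotations per year")
print(f"full storage: {full_size_gb:.2f} GB")
print(f"URL-only storage: {compact_size_gb:.3f} GB")
```

This gives 3.65 GB per year for full annotations, consistent with "roughly 4GB", and about 0.365 GB under the URL-only assumption.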

    The current proposal omits any querying mechanism and adopts this
    approach of a local ‘annotation library’. The mechanisms defined
    below serve to populate this library: how to discover annotation
    sources and import their current annotations, and subscribe to a
    source/‘feed’ to obtain their future annotations. To this end, it
    selects and combines existing parts of the Web Annotation
    specifications.

    Two discovery mechanisms are defined:

     1. Annotations encountered directly, either served as a file or
        embedded in a web page.
     2. Annotation ‘feeds’: collections of annotations discovered via
        links in web pages.

The essence of the ‘feeds’ is simple: exactly like RSS Autodiscovery, a 
website can add a <link> pointing to an Annotation Collection/Container 
<https://www.w3.org/TR/annotation-protocol/#annotation-containers>, with 
the appropriate rel and type attributes, e.g.:

    <link rel="alternate"
          type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
          href="https://annotations.fredsfrets.example/all/"
          title="Fred’s frets" />
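
On the consuming side, detecting such a link could look roughly like the sketch below. It is not taken from the extension: `FeedLinkFinder` and `is_annotation_media_type` are hypothetical names, and requiring the `profile` parameter to be present is an assumption about what counts as "appropriate".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

ANNO_PROFILE = "http://www.w3.org/ns/anno.jsonld"


def is_annotation_media_type(value: str) -> bool:
    """True for application/ld+json carrying the Web Annotation profile."""
    parts = [part.strip() for part in value.split(";")]
    if parts[0].lower() != "application/ld+json":
        return False
    for param in parts[1:]:
        name, _, raw = param.partition("=")
        if name.strip().lower() == "profile":
            # The quoted value may list several space-separated profiles.
            return ANNO_PROFILE in raw.strip().strip('"').split()
    return False


class FeedLinkFinder(HTMLParser):
    """Collect (url, title) pairs for annotation feeds linked from a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if ("alternate" in rels and attr.get("href")
                and is_annotation_media_type(attr.get("type", ""))):
            # Resolve relative hrefs against the page's own URL.
            url = urljoin(self.base_url, attr["href"])
            self.found.append((url, attr.get("title", "")))


PAGE = '''<html><head>
<link rel="alternate" type="application/rss+xml" href="/rss.xml" />
<link rel="alternate"
      type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
      href="/all/" title="Fred's frets" />
</head><body></body></html>'''

finder = FeedLinkFinder("https://annotations.fredsfrets.example/")
finder.feed(PAGE)
finder.close()
```

Here the RSS link is skipped and only the annotation collection ends up in `finder.found`, which a browser could then offer to the user as a feed to subscribe to.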

Annotations ‘encountered directly’ are even simpler: these are 
annotations that the browser visits/opens directly (detected via their 
Content-Type), or that are embedded in a page (as described in the 
Embedding Web Annotations in HTML 
<https://www.w3.org/TR/annotation-html/#embed-json-ld> note).
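
For the embedded case, a rough sketch of the extraction step is given below. It assumes the annotations are embedded as JSON-LD in `<script type="application/ld+json">` elements and that an object counts as an annotation when its `type` includes `"Annotation"`; `EmbeddedAnnotationFinder` and `is_annotation` are hypothetical names, not part of the proposal or the extension.

```python
import json
from html.parser import HTMLParser


def is_annotation(item) -> bool:
    """True if a parsed JSON value looks like a Web Annotation."""
    if not isinstance(item, dict):
        return False
    types = item.get("type", [])
    if isinstance(types, str):
        types = [types]
    return "Annotation" in types


class EmbeddedAnnotationFinder(HTMLParser):
    """Collect annotations embedded as JSON-LD script elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            media_type = (dict(attrs).get("type") or "").split(";")[0]
            self._in_jsonld = media_type.strip().lower() == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._in_jsonld:
            return
        self._in_jsonld = False
        try:
            document = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            return  # ignore malformed embedded data
        items = document if isinstance(document, list) else [document]
        self.annotations.extend(item for item in items if is_annotation(item))


PAGE = '''<html><head>
<script>console.log("not JSON-LD");</script>
<script type="application/ld+json">
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://annotations.fredsfrets.example/all/1",
  "type": "Annotation",
  "body": {"type": "TextualBody", "value": "Nice riff."},
  "target": "https://example.org/songs"
}
</script>
</head><body></body></html>'''

finder = EmbeddedAnnotationFinder()
finder.feed(PAGE)
finder.close()
```

Annotations found this way could be offered for import into the local store in the same way as those opened directly.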

Initially, my intention was in fact to (also) specify a querying 
mechanism, so the browser could ask subscribed sources for annotations 
that specifically target the visited page. But that approach would 
reveal each page one visits to every annotation source/service one is 
‘subscribed’ to — a huge privacy problem, as described above and 
mentioned before 
<https://lists.w3.org/Archives/Public/public-openannotation/2021May/0004.html#:~:text=Note%20however%20that,pages%20it%20visits.> 
on this mailing list. For many use cases, local storage seems sufficient 
and preferable.

Curious to hear if anyone has thoughts on this proposal or might like to 
try it out in practice.

— Gerben

Received on Friday, 21 October 2022 22:34:10 UTC