Web Annotation Discovery from Gerben on 2022-10-21 (public-openannotation@w3.org from October 2022)

From: Gerben <gerben@treora.com>
Date: Sat, 22 Oct 2022 00:34:49 +0200
To: public-openannotation@w3.org
Message-ID: <8a48aaa7-0dec-8e57-d8c2-9a251fec4e12@treora.com>

Hi all, for your interest: I published a proposal for a Web Annotation
Discovery mechanism
<https://code.treora.com/gerben/web-annotation-discovery>, along with a
first implementation as a browser extension
<https://code.treora.com/gerben/web-annotation-discovery-webextension>
(and a compatible server
<https://code.treora.com/gerben/web-annotation-discovery-server>).

The goal is that people can discover annotation sources and subscribe to
annotation ‘feeds’ while browsing the web, then view the annotations on
other pages they visit.

Here is a 3-minute introduction video
<https://archive.treora.com/2022/web-annotation-discovery/introduction-screencast.webm>
with a little demo. (To try it yourself: here are the Firefox addon
<https://addons.mozilla.org/en-US/firefox/addon/web-annotation-discovery/>
and example collection <https://cothink.org/gerben/random_notes/>.)

Below is a section of the proposal
<https://code.treora.com/gerben/web-annotation-discovery>:

Approach

To show annotations on a visited page, the web browser needs to
somehow obtain these annotations. Various previous annotation
projects depend on a single global service to index the annotations
by their target, which browsers would query for annotations
targeting a particular page. To avoid such centralisation and cater
for the diversity of use cases, the browser could instead query any
annotation services of the user’s choice.

However, querying services for annotations on visited pages has an
enormous impact on reader privacy: to find annotations on pages
you read, you have to tell the service which pages you read.
Subscribing to multiple sources would reveal this information to
even more parties.

In many usage scenarios, the set of annotations a person is actually
interested in is limited and comes from a known source. Centralised
services (e.g. Hypothes.is <https://hypothes.is/>) can help discover
annotations from any other user, but are often used for annotating
in well-defined groups: in classrooms, among colleagues, etc.

In such cases, there is no need for a central global index, and
moreover the total set of annotations of interest could easily fit
on the user’s device. This would solve the reader privacy issue as
no querying is needed — the browser can simply look up if it has any
relevant annotations for any visited page (and can thereby be much
quicker too).
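
A minimal sketch of such a local lookup, in Python for illustration only (the class and method names are hypothetical and not taken from the proposal or its implementation). It assumes annotations are stored as parsed JSON objects and indexed by the URL they target, ignoring any fragment:

```python
from collections import defaultdict
from urllib.parse import urldefrag


class AnnotationLibrary:
    """Hypothetical in-memory store of annotations, indexed by target URL."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def add(self, annotation: dict) -> None:
        # A target may be a URL string, an object with "source" or "id",
        # or a list of those. Other shapes are skipped in this sketch.
        targets = annotation.get("target", [])
        if not isinstance(targets, list):
            targets = [targets]
        for target in targets:
            url = target
            if isinstance(url, dict):
                url = url.get("source") or url.get("id")
            if isinstance(url, dict):  # "source" given as an object
                url = url.get("id")
            if isinstance(url, str):
                self._by_target[urldefrag(url).url].append(annotation)

    def lookup(self, page_url: str) -> list:
        # Purely local: no request leaves the device when a page is visited.
        return list(self._by_target.get(urldefrag(page_url).url, []))
```

Because `lookup` only reads a local index, visiting a page reveals nothing to any annotation source.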

Also for somewhat larger-scale annotation consumption, the total
size may well remain manageable. For example, if an investigative
journalist subscribes to a thousand colleagues each writing ten
annotations per day of 1KB each, this produces roughly 4GB in a year:
significant, but perhaps worth it for their work (for which privacy
may be more important than disk storage). This size could still be
reduced by an order of magnitude if, for each annotation, only its
own URL and the URL it targets are stored (with a tradeoff for
latency and privacy, see further below
<https://code.treora.com/gerben/web-annotation-discovery#user-content-compacted-storage>).
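
The estimate can be checked with a few lines of arithmetic. The figures for colleagues, annotations per day and annotation size are those given above; the 100 bytes for a compacted entry (two URLs) is an assumption chosen only to illustrate the order-of-magnitude reduction:

```python
colleagues = 1000
annotations_per_day = 10
bytes_per_annotation = 1000  # 1KB, taken here as 1000 bytes
days_per_year = 365

annotations_per_year = colleagues * annotations_per_day * days_per_year

# Storing every annotation in full.
full_gb = annotations_per_year * bytes_per_annotation / 1e9
print(full_gb)  # 3.65, i.e. roughly 4GB

# Assumption: a compacted entry (annotation URL + target URL) takes ~100 bytes.
compact_bytes_per_annotation = 100
compact_gb = annotations_per_year * compact_bytes_per_annotation / 1e9
print(compact_gb)  # 0.365
```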

The current proposal omits any querying mechanism and adopts this
approach of a local ‘annotation library’. The mechanisms defined
below serve to populate this library: how to discover annotation
sources and import their current annotations, and how to subscribe
to a source/‘feed’ to obtain its future annotations. To this end, it
selects and combines existing parts of the Web Annotation
specifications.

Two discovery mechanisms are defined:

1. Annotations encountered directly, either served as a file or
embedded in a web page.
2. Annotation ‘feeds’: collections of annotations discovered via
links in web pages.

The essence of the ‘feeds’ is simple: exactly like RSS Autodiscovery, a
website can add a <link> pointing to an Annotation Collection/Container
<https://www.w3.org/TR/annotation-protocol/#annotation-containers>, with
the appropriate rel and type attributes, e.g.:

    <link rel="alternate"
      type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'
      href="https://annotations.fredsfrets.example/all/"
      title="Fred’s frets" />
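
As a rough sketch of how a client might detect such a link (Python here for illustration; the function and class names are hypothetical and this is not the extension's actual code), one can scan a page's `<link>` elements for `rel="alternate"` combined with the JSON-LD media type and the Web Annotation profile. The media type parsing is deliberately simplified:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

ANNO_PROFILE = "http://www.w3.org/ns/anno.jsonld"


def is_annotation_type(value: str) -> bool:
    """True for application/ld+json with the Web Annotation profile."""
    # Simplification: splits on ';' without handling ';' inside quotes.
    parts = [part.strip() for part in value.split(";")]
    if parts[0].lower() != "application/ld+json":
        return False
    for part in parts[1:]:
        name, _, val = part.partition("=")
        if name.strip().lower() == "profile":
            # The profile parameter may list several space-separated URIs.
            if ANNO_PROFILE in val.strip().strip('"').split():
                return True
    return False


class FeedLinkParser(HTMLParser):
    """Collects (url, title) pairs for annotation feed links."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        media_type = attributes.get("type") or ""
        if "alternate" in rels and href and is_annotation_type(media_type):
            # Resolve relative links against the page URL.
            self.feeds.append((urljoin(self.base_url, href), attributes.get("title")))


def find_annotation_feeds(html: str, base_url: str) -> list:
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

Other `alternate` links, such as RSS feeds, are ignored because their `type` does not match.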

Annotations ‘encountered directly’ are even simpler: these are
annotations that the browser visits/opens directly (detected via their
Content-Type), or that are embedded in a page (as described in the
Embedding Web Annotations in HTML
<https://www.w3.org/TR/annotation-html/#embed-json-ld> note).
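
For the embedded case, a hedged sketch (Python, hypothetical names; it assumes annotations are embedded as JSON-LD inside `<script type="application/ld+json">` elements and are recognised by a `type` that includes `Annotation`) could look like this:

```python
import json
from html.parser import HTMLParser


def is_annotation(item: dict) -> bool:
    # "type" may be a single string or a list of strings.
    types = item.get("type") or []
    if isinstance(types, str):
        types = [types]
    return "Annotation" in types


class EmbeddedAnnotationParser(HTMLParser):
    """Collects annotations found in JSON-LD script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type.startswith("application/ld+json"):
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._in_jsonld:
            return
        self._in_jsonld = False
        try:
            document = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            return  # Not valid JSON: skip this script element.
        items = document if isinstance(document, list) else [document]
        for item in items:
            if isinstance(item, dict) and is_annotation(item):
                self.annotations.append(item)


def find_embedded_annotations(html: str) -> list:
    parser = EmbeddedAnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.annotations
```

Other JSON-LD in the page (for example schema.org metadata) is skipped because its `type` is not `Annotation`.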

Initially, my intention was in fact to (also) specify a querying
mechanism, so the browser could ask subscribed sources for annotations
that specifically target the visited page. But that approach would
reveal each page one visits to every annotation source/service one is
‘subscribed’ to — a huge privacy problem, as described above and
mentioned before
<https://lists.w3.org/Archives/Public/public-openannotation/2021May/0004.html#:~:text=Note%20however%20that,pages%20it%20visits.>
on this mailing list. For many use cases, local storage seems sufficient
and preferable.

Curious to hear if anyone has thoughts on this proposal or might like to
try it out in practice.

— Gerben

Received on Friday, 21 October 2022 22:34:10 UTC