Re: Web annotations for physical texts

From: Benjamin Young <byoung@bigbluehat.com>
Date: Mon, 15 Oct 2018 20:33:02 +0000
To: Christopher Blackwell <cwblackwell@gmail.com>, "public-openannotation@w3.org" <public-openannotation@w3.org>
CC: Steven Harms <sgharms@stevengharms.com>
Message-ID: <BN6PR06MB27700364053AEB3FCD25F28CB2FD0@BN6PR06MB2770.namprd06.prod.outlook.com>
It's likely best--given the vast array of options--that one store as many matching target expressions as one is able to generate at the time the annotation is recorded (or perhaps later with machines).

As in...
  "target": [
    {
      "source": "urn:isbn:...",
      "selector": {
        ...some nifty new selector for physical dimensions, pages, etc...
      }
    }
  ]

There are also EPUB CFIs, of course...and likely many more we've missed... >_>
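
A rough sketch of storing more than one target expression for the same passage, written in Python so it can be run and inspected. Everything specific here is an assumption made for illustration: the ISBN and EPUB URL are placeholders, the CFI value is made up, and "PageSelector" is a hypothetical selector type that the Web Annotation Data Model does not define. The FragmentSelector with an EPUB CFI "conformsTo" value is the one part taken from the data model itself.

```python
import json

# Placeholder identifiers: neither refers to a real book.
BOOK_ISBN_URN = "urn:isbn:0000000000"
EPUB_URL = "https://example.org/book.epub"


def build_annotation(comment, page, cfi):
    """Return a Web Annotation whose target list describes one passage two ways."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": comment},
        "target": [
            {
                # Physical copy, addressed by page number.
                "source": BOOK_ISBN_URN,
                # HYPOTHETICAL: "PageSelector" is not part of the Web
                # Annotation Data Model; it stands in for a new selector
                # for pages and physical dimensions.
                "selector": {"type": "PageSelector", "page": page},
            },
            {
                # EPUB edition, addressed by an EPUB CFI fragment.
                "source": EPUB_URL,
                "selector": {
                    "type": "FragmentSelector",
                    "conformsTo": "http://www.idpf.org/epub/linking/cfi/epub-cfi.html",
                    "value": cfi,
                },
            },
        ],
    }


# The CFI below is an invented value used only to fill the field.
annotation = build_annotation("Nice turn of phrase.", 42, "epubcfi(/6/4!/4/10/2:3)")
print(json.dumps(annotation, indent=2))
```

More target expressions (a CTS URN, a text quote, and so on) could be appended to the same list as they become available.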

As this exploration goes along, if anyone wants to write these findings up on the wiki, that'd be super amazing:





From: Christopher Blackwell <cwblackwell@gmail.com>
Sent: Saturday, October 13, 2018 5:39 PM
To: public-openannotation@w3.org
Cc: Steven Harms
Subject: Re: Web annotations for physical texts

Hi Steven,

Some thoughts on your questions…

CTS URNs are for machine-actionable identification and retrieval of passages of text, so their job really is different from that of a human-readable label. In our projects we use the plain text CEX format ( https://cite-architecture.github.io/citedx/CEX-spec-3.0.1/ ) for capturing data and loading it into services, and it is at that level that we can attach human-readable labels to works and editions.

Here’s a link that will (after a short delay, the server seems a little slow today) deliver a passage of text, with a label attached (and linked commentaries and some other stuff):


As for citing a page of a book… CTS really is about _texts_ rather than _books_. A CTS-URN captures the semantics of a “text” defined as “an ordered hierarchy of citation objects”.
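
To make the "ordered hierarchy" concrete, here is a minimal Python sketch that splits a CTS URN into its colon-separated fields and then into dot-separated levels. The function and class names are invented for this example and are not part of any CITE library; the sample URN is only illustrative. Ranges and subreferences are deliberately left out.

```python
from typing import NamedTuple, Optional, Tuple


class CtsUrn(NamedTuple):
    namespace: str                      # e.g. "greekLit"
    work: Tuple[str, ...]               # work hierarchy, most general level first
    passage: Optional[Tuple[str, ...]]  # citation levels, or None when absent


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a single-passage CTS URN into its colon- and dot-separated parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], tuple(parts[3].split("."))
    passage_text = parts[4] if len(parts) == 5 else ""
    if "-" in passage_text or "@" in passage_text:
        # Ranges and subreferences are deliberately left out of this sketch.
        raise ValueError("ranges and subreferences are not handled here")
    passage = tuple(passage_text.split(".")) if passage_text else None
    return CtsUrn(namespace, work, passage)


urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")
print(urn.work)     # ('tlg0012', 'tlg001', 'msA')
print(urn.passage)  # ('1', '1')
```

Nothing in either hierarchy corresponds to a page, which is why page-based citation does not fit naturally.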

For our texts, at least, pages in a physical edition constitute a structure orthogonal to the citation hierarchy of a work.

So I don’t think there is a low-friction way to bend CTS away from canonically citable (= citations independent of any particular expression of a text) texts to texts citable only by pages in a particular printed edition.

We associate CTS texts with “pages”, but it involves quite a bit of integration. This might be way more than you want to get into, but to give an example…


The above is a URL that will display an object in an ordered collection of manuscript folios; “urn:cite2:hmt:msA.v1:12r” identifies folio 12-recto of a physical manuscript.

And this is a record that identifies a graph of (a) a passage of text (CtsUrn), (b) a physical folio (Cite2Urn), and (c) a digital image mapping the passage on the folio:
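
A loose Python sketch of the shape of such a record, not the actual CITE data format. The class and function names are invented for illustration, the passage URN is only an example pairing, and the image URN (including its region suffix) is a hypothetical placeholder; only the folio URN comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PassageOnFolio:
    passage: str  # CTS URN of the text passage
    folio: str    # CITE2 URN of the physical folio
    image: str    # URN of the image region showing the passage


def folios_for(passage: str, records: List[PassageOnFolio]) -> List[str]:
    """Return the folio URNs of every record that cites the given passage."""
    return [r.folio for r in records if r.passage == passage]


records = [
    PassageOnFolio(
        passage="urn:cts:greekLit:tlg0012.tlg001.msA:1.1",  # illustrative
        folio="urn:cite2:hmt:msA.v1:12r",
        # HYPOTHETICAL image URN with a made-up region of interest.
        image="urn:cite2:example:images.v1:img001@0.1,0.2,0.3,0.1",
    ),
]

print(folios_for("urn:cts:greekLit:tlg0012.tlg001.msA:1.1", records))
```

The point is that the page association lives in a separate record that links three identifiers, rather than inside the CTS URN itself.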


Chris B.

Christopher W. Blackwell
The Louis G. Forgione University Professor
Department of Classics
Furman University

On Oct 13, 2018, at 4:14 AM, Steven Harms <sgharms@stevengharms.com> wrote:

Given two endorsements for CTS in short order, I read the description, and it seemed intuitive and seemed to cover the required specificity easily. As such:


Would become



1. Intuitive!


1. With ISBN we lose the human friendliness of, say, “JK Rowling wrote HP & the Philosopher’s Stone.” This can be remedied, of course, by a higher container holding human-friendly data, but it seems like an obvious nit to address. MLA and other citation schemes preserve this visibly in the citation.


1. How to handle <PASSAGE> in a book?

Pasting the full text seems onerous. To annotate passage p, I don’t want to have to type in passage p *and* my annotation. This would also set one afoul of copyright holders.

Further, range offsets, while completely reasonable, are not generally given outside of epic poetry or other classics.

Certainly many e-readers make this calculation possible and that will surely be the correct scheme for annotations from that medium. However, my focus remains real books ;)

The most common scheme for a popular book would be the page. The docs state, failing an offset:

> A reference to an individual passage is formatted as dot-separated components representing one or more levels of the citation hierarchy defined in a CTS TextInventory for that work.

Now for most popular works, there is no CTS TextInventory — to the best of my knowledge.

So: is there a low-friction way to refer to a page?

Thanks for the suggestions so far,


(Typos and blunders my own as I’m on vacation without access to a keyboard ;))

On Thu, Oct 11, 2018 at 3:54 AM Christopher Blackwell <cwblackwell@gmail.com> wrote:
Dear Steven,

The CTS URN might be helpful:


Part of the CITE Architecture: http://cite-architecture.github.io

(Disclosure: This is a thing I’ve worked on over the years.)

This blog post points to some live examples of real data integrated with CTS URNs:


If this looks at all interesting, please don’t hesitate to send along further questions.

Chris B.

Christopher W. Blackwell
The Louis G. Forgione University Professor
Department of Classics
Furman University

On Oct 10, 2018, at 1:57 PM, Steven Harms <sgharms@stevengharms.com> wrote:


I am interested in creating annotations on physical books [1].

As the name "web annotations" suggests, the default goal of the Web Annotation Working Group would be, of course, to annotate IRI-referable targets with IRI-identifiable Annotations.

1. Is there a model whereby we could point to a physical resource in a URI / IRI format (and thus join the existing Web Annotation universe), *or*
2. Is there a framework that might support referring to physical books that I've simply not found?
3. Or should I plan to use JSON-LD to "forge my own path"?

I hope to post an example of what #3 might look like, but I'd like to double-check my understanding before engaging in such an effort, tabula rasa.



[1]: https://stevengharms.com/research/semweb-topic/problem_statement/

Steven G. Harms
PGP: E6052DAF<https://pgp.mit.edu/pks/lookup?op=get&search=0x337AF45BE6052DAF>

Received on Monday, 15 October 2018 20:33:30 UTC
