Shelley-Godwin Archive as a use case of OA and Shared Canvas

A week ago, MITH released digital facsimile editions of the draft and fair
copy notebooks in which Mary Shelley wrote Frankenstein. If you follow us
on Twitter or are active in the digital humanities, you have probably
already seen some posts about this. Since the project makes heavy use of
Open Annotation and related data models, I wanted to draw this list's
attention to some of its technical aspects for those who might be
interested.

tl;dr: we augmented the Shared Canvas data model to allow visual
reconstruction of the original artifacts, which are rendered by a JavaScript
application that uses HTML instead of SVG.

The website is at http://shelleygodwinarchive.org/

Our code is available on GitHub:
  Shared Canvas viewer, Drupal module, and search service:
    https://github.com/umd-mith/sga
  Shared Canvas manifest generation:
    https://github.com/umd-mith/scalanvas


When we started the project, Open Annotation (and thus Shared Canvas) used
typical RDF serializations. For web applications, this meant RDF/XML or
RDF/JSON. To avoid requiring an RDF/XML library to parse the XML and
extract the triples, we chose to use RDF/JSON, and this is still what our
viewer uses.
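For readers who have not worked with RDF/JSON, the following sketch (Python for brevity; our viewer itself is JavaScript) shows why it needs no RDF/XML library. A document is a plain JSON object of the form {subject: {predicate: [object, ...]}}, where each object carries a "type" ("uri", "literal", or "bnode"), a "value", and optionally "lang" or "datatype", so two nested loops over ordinary JSON recover every triple. The annotation and canvas URIs are hypothetical examples, and this is an illustration rather than our parsing code.

```python
import json
from typing import Iterator, NamedTuple, Optional


class Term(NamedTuple):
    """An RDF/JSON object value; kind is "uri", "literal", or "bnode"."""
    kind: str
    value: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Term


def rdf_json_triples(doc: str) -> Iterator[Triple]:
    """Flatten an RDF/JSON document into (subject, predicate, object) triples."""
    graph = json.loads(doc)  # ordinary JSON parsing, no RDF library needed
    for subject, predicates in graph.items():
        for predicate, objects in predicates.items():
            for o in objects:
                yield Triple(subject, predicate,
                             Term(o["type"], o["value"],
                                  o.get("lang"), o.get("datatype")))


# Hypothetical annotation targeting a region of a hypothetical canvas.
example = json.dumps({
    "http://example.org/anno/1": {
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
            {"type": "uri", "value": "http://www.w3.org/ns/oa#Annotation"}],
        "http://www.w3.org/ns/oa#hasTarget": [
            {"type": "uri",
             "value": "http://example.org/canvas/1#xywh=0,0,100,50"}],
    }
})

for t in rdf_json_triples(example):
    print(t.subject, t.predicate, t.obj.value)
```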

Given the graph nature of the data, we chose to use a library we've been
developing in-house (MITHgrid) that provides an in-browser triple store
from which we can extract the various annotations and components based on
their relationships.
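MITHgrid's own API is not shown here. As a generic illustration of the idea, assuming triples are stored as (subject, predicate, object) strings, a small in-memory store with wildcard pattern matching is enough to collect every annotation painted onto a given canvas. The sketch treats a target as belonging to the canvas if it is the canvas URI itself, a media fragment of it (canvas#xywh=...), or a resource whose oa:hasSource is the canvas (as with an oa:SpecificResource). The class, function, and "ex:" identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

OA = "http://www.w3.org/ns/oa#"
Triple = Tuple[str, str, str]


class TripleStore:
    """A tiny in-memory triple store indexed by subject and by predicate."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._all: Set[Triple] = set()
        self._by_s: Dict[str, Set[Triple]] = defaultdict(set)
        self._by_p: Dict[str, Set[Triple]] = defaultdict(set)
        for t in triples:
            self.add(t)

    def add(self, t: Triple) -> None:
        self._all.add(t)
        self._by_s[t[0]].add(t)
        self._by_p[t[1]].add(t)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self._by_s.get(s, set())
        elif p is not None:
            candidates = self._by_p.get(p, set())
        else:
            candidates = self._all
        for t in candidates:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


def annotations_on(store: TripleStore, canvas: str) -> Set[str]:
    """Annotations whose oa:hasTarget is the canvas, a fragment of it,
    or a resource whose oa:hasSource is the canvas."""
    found = set()
    for anno, _, target in store.match(p=OA + "hasTarget"):
        if target == canvas or target.startswith(canvas + "#"):
            found.add(anno)
        elif any(store.match(s=target, p=OA + "hasSource", o=canvas)):
            found.add(anno)
    return found


store = TripleStore([
    ("ex:anno1", OA + "hasTarget", "ex:canvas1#xywh=10,20,300,400"),
    ("ex:anno2", OA + "hasTarget", "_:sr1"),
    ("_:sr1", OA + "hasSource", "ex:canvas1"),
    ("ex:anno3", OA + "hasTarget", "ex:canvas2"),
])
print(sorted(annotations_on(store, "ex:canvas1")))  # ['ex:anno1', 'ex:anno2']
```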

We are in the process of updating our Shared Canvas manifest generation
process to use JSON-LD. We have already used JSON-LD to create partial
manifests that drive the table of contents view (e.g., on
https://shelleygodwinarchive.org/contents/frankenstein), which is written
with Backbone.js.

JSON-LD lets us construct a manifest that maps well onto Backbone.js
collections and models. We intend to rework our viewer to use Backbone.js
instead of MITHgrid, though we have not yet set a timeline for this.
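To illustrate why compacted JSON-LD fits Backbone.js collections so naturally, here is a hypothetical partial manifest. The @context URL, the "sequences"/"canvases" property names, and all URIs are assumptions for this sketch, not our actual manifest format. Every node is a plain JSON object identified by "@id", so a list of nodes can be handed to a collection as model attributes once "@id" is copied to "id", Backbone's default idAttribute.

```python
import json
from typing import Any, Dict, List

# Hypothetical partial manifest in compacted JSON-LD (illustrative only).
MANIFEST = json.loads("""
{
  "@context": "http://example.org/sga-context.jsonld",
  "@id": "http://example.org/manifest/frankenstein",
  "label": "Frankenstein draft notebooks",
  "sequences": [
    {"@id": "http://example.org/seq/1",
     "canvases": [
       {"@id": "http://example.org/canvas/1", "label": "f. 1r"},
       {"@id": "http://example.org/canvas/2", "label": "f. 1v"}
     ]}
  ]
}
""")


def to_models(nodes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn JSON-LD node objects into model attribute hashes.

    Keywords such as "@id" and "@type" are dropped from the attributes,
    and "@id" is copied to "id" so it can serve as the model id.
    """
    models = []
    for node in nodes:
        attrs = {k: v for k, v in node.items() if not k.startswith("@")}
        attrs["id"] = node["@id"]
        models.append(attrs)
    return models


# A table-of-contents "collection": one model per canvas, in manifest order.
canvases = to_models(MANIFEST["sequences"][0]["canvases"])
toc = [(m["id"], m["label"]) for m in canvases]
print(toc)
```

Alternatively, a Backbone model can set `idAttribute: "@id"` and skip the renaming step entirely.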


We chose to move to HTML from SVG at the last minute for a few reasons.

Because we don't have text that is amenable to OCR, we don't have bounding
boxes for words or lines. Instead, we augmented the Shared Canvas data
model to provide for text zones whose contents are not positioned with
pixel-based coordinates (though the text zone annotations themselves target
pixel-like coordinates on the canvas). To help with readability, we use
scrollbars so that all of the text in a zone can be viewed even when there
is too much of it to fit in the zone's bounding box.
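A minimal sketch of how such a text zone can be laid out in HTML, assuming the zone annotation's target uses a W3C Media Fragments spatial selector (#xywh=x,y,w,h in canvas pixels); for brevity it handles only the default and pixel: forms, not percent:. Only the zone's box is computed from canvas coordinates, scaled to the rendered size of the canvas; the text inside flows normally, and `overflow:auto` supplies scrollbars when the transcription is too long for the box. The function names are hypothetical, not part of our code.

```python
import re
from typing import Tuple

# Media Fragments spatial selector, e.g. "...#xywh=100,200,300,150".
# Assumption: only the default/pixel unit is supported in this sketch.
XYWH = re.compile(r"#xywh=(?:pixel:)?(\d+),(\d+),(\d+),(\d+)$")


def parse_xywh(target: str) -> Tuple[int, int, int, int]:
    """Extract (x, y, w, h) canvas coordinates from a target URI."""
    m = XYWH.search(target)
    if m is None:
        raise ValueError(f"no pixel xywh fragment in {target!r}")
    x, y, w, h = (int(g) for g in m.groups())
    return x, y, w, h


def zone_style(target: str, canvas_width: int, rendered_width: float) -> str:
    """CSS for an HTML text zone overlaid on a canvas drawn at another size.

    The box is scaled from canvas coordinates; the text inside is not
    positioned by pixels, and overflow:auto adds scrollbars if it overflows.
    """
    scale = rendered_width / canvas_width
    x, y, w, h = parse_xywh(target)
    return (f"position:absolute; left:{x * scale:.1f}px; top:{y * scale:.1f}px; "
            f"width:{w * scale:.1f}px; height:{h * scale:.1f}px; overflow:auto;")


print(zone_style("http://example.org/canvas/1#xywh=100,200,300,150", 1000, 500))
```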

In our original code, we used SVG as the overall managing framework for
painting the canvas. This allowed us to provide a scalable display that
could grow or shrink as the browser window was resized. We used a
foreignObject to embed the text in an HTML body inside the SVG. This may
seem strange, but it gave us a consistent interface for all objects being
painted onto the canvas: every code module handling an object type could
expect an SVG region into which it could draw whatever it needed.
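The original SVG approach can be sketched roughly as follows, using Python string-building for brevity; the renderer names, signatures, and image file name are hypothetical, not our module API. Each renderer receives the same kind of region and returns SVG markup: an image becomes an `<image>` element, and text becomes an XHTML `<body>` inside a `<foreignObject>`. The outer `<svg>` uses a `viewBox`, so the whole canvas scales with the width of its container.

```python
from html import escape
from typing import Callable, Dict, Iterable, NamedTuple, Tuple

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
XHTML_NS = "http://www.w3.org/1999/xhtml"


class Region(NamedTuple):
    """A rectangle in canvas coordinates handed to every renderer."""
    x: int
    y: int
    w: int
    h: int


# Hypothetical common interface: (region, content) -> SVG markup string.
Renderer = Callable[[Region, str], str]


def render_image(r: Region, src: str) -> str:
    return (f'<image x="{r.x}" y="{r.y}" width="{r.w}" height="{r.h}" '
            f'xlink:href="{escape(src)}"/>')


def render_text(r: Region, text: str) -> str:
    # HTML inside SVG: an XHTML <body> wrapped in a <foreignObject>.
    return (f'<foreignObject x="{r.x}" y="{r.y}" width="{r.w}" height="{r.h}">'
            f'<body xmlns="{XHTML_NS}"><p>{escape(text)}</p></body>'
            '</foreignObject>')


RENDERERS: Dict[str, Renderer] = {"image": render_image, "text": render_text}


def paint_canvas(width: int, height: int,
                 items: Iterable[Tuple[str, Region, str]]) -> str:
    """Paint (kind, region, content) items into one <svg> that scales via viewBox."""
    body = "".join(RENDERERS[kind](region, content)
                   for kind, region, content in items)
    return (f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}" '
            f'viewBox="0 0 {width} {height}" width="100%">{body}</svg>')


svg = paint_canvas(1000, 1400, [
    ("image", Region(0, 0, 1000, 1400), "folio-1r.jpg"),
    ("text", Region(100, 200, 300, 150), "Transcription of this zone"),
])
print(svg)
```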

Unfortunately, WebKit has some bugs in its SVG engine. When a UI element is
visible in the HTML embedded in the SVG, the content of the foreignObject
is *not* resized.

This was the first strike against SVG, and it led us to rewrite how we
handle the text-only visualization of the Shared Canvas manifest.


The second bug involved the browser's own page zoom. WebKit, again, has
problems with SVG embedded in a web page that has been zoomed in or out:
only at actual size did the SVG and HTML work correctly together. We were
using a map-based JavaScript application to pan and zoom the image, and
when the browser was not showing the page at actual size, the image was
shifted toward the lower right or upper left, depending on the zoom level.

With only a week left before launch, we rewrote the code to use HTML to
provide a panning/zooming interface for the image side of the facsimile
viewer. This also solved some (but not all) of the compatibility issues we
saw with Firefox.
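With the image side in HTML, panning and zooming come down to maintaining a CSS transform on the image element. The following is a minimal sketch of that arithmetic, assuming `transform-origin: 0 0` and a `translate(...) scale(...)` transform; it is not our viewer's code, and the `View` class is hypothetical. The key step is zooming about the cursor: to keep the image point under screen position p fixed when the scale changes by a factor k, the translation becomes t' = p - (p - t) * k.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class View:
    """Pan/zoom state for an image inside a fixed HTML viewport.

    Image point (ix, iy) appears on screen at (tx + s*ix, ty + s*iy),
    matching CSS `transform: translate(tx, ty) scale(s)` with
    `transform-origin: 0 0`.
    """
    tx: float = 0.0
    ty: float = 0.0
    s: float = 1.0

    def pan(self, dx: float, dy: float) -> None:
        """Shift the image by (dx, dy) screen pixels."""
        self.tx += dx
        self.ty += dy

    def zoom_at(self, px: float, py: float, factor: float) -> None:
        """Scale by `factor`, keeping the image point under (px, py) fixed."""
        self.tx = px - (px - self.tx) * factor
        self.ty = py - (py - self.ty) * factor
        self.s *= factor

    def to_image(self, px: float, py: float) -> Tuple[float, float]:
        """Map a screen point back to image coordinates."""
        return (px - self.tx) / self.s, (py - self.ty) / self.s

    def css(self) -> str:
        """Value for the image element's CSS `transform` property."""
        return f"translate({self.tx:g}px, {self.ty:g}px) scale({self.s:g})"


view = View()
view.zoom_at(200, 100, 2)   # double the size around screen point (200, 100)
print(view.css())           # translate(-200px, -100px) scale(2)
```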


Current work focuses on documenting how we use the Shared Canvas data
model and the changes needed to bring that usage in line with the current
Open Annotation and Shared Canvas data models. We will publish this
documentation as part of a GitHub repository, to be released in a few
weeks, that will contain all of our TEI and manifests.

-- Jim

Received on Friday, 8 November 2013 14:57:53 UTC