Re: Converting annotations stored in the PDF into open annotation standard

On Tue, Sep 2, 2014 at 8:23 PM, Mitar <mmitar@gmail.com> wrote:

> Hi!
>
> (As suggested, reposting here from hypothes.is dev mailing list:
> http://list.hypothes.is/archive/dev/2014-08/0000013.html, and one
> existing reply:
> http://list.hypothes.is/archive/dev/2014-09/0000002.html)
>
> Does anyone know of any existing code/project which would help
> converting annotations which are already in the PDF (like those made
> by Preview) into open annotation standard?
>
> So PDF.js allows to extract annotations, but the annotations are not
> really anchored on the PDF, they just have an icon. On the other side,
> highlights are not part of extracted annotations, but it seems they
> are really added as a colored box to the PDF content. So how would one
> access them and convert them to open annotation standard?
>

Great question, Mitar!

From what little I know, PDF annotations are really just another layer
of PDF content. They're stored as "Annot" dictionaries on the actual Page.
Outside of positioning on the page, not much in those objects (from a quick
read) seems to give one the ability to re-anchor those annotations on
another representation...sadly.
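
To make that concrete, here is a rough Python sketch of how one of those
"Annot" dictionaries might be mapped onto an Open Annotation in JSON-LD. Much
of it is my own assumption rather than anything PDF.js or the spec
prescribes: the function name, the shape of the input dict (just Subtype,
Rect and Contents, already extracted by some PDF library), the 2013 OA
JSON-LD context URL, and the use of an RFC 3778 `viewrect` fragment as the
selector. The Rect values are passed through in PDF user space; a viewer may
expect a different origin, and this sketch does not convert for that.

```python
import json

# Assumed JSON-LD context for the 2013 Open Annotation community draft.
OA_CONTEXT = "http://www.w3.org/ns/oa-context-20130208.json"
# RFC 3778 describes fragment identifiers for application/pdf.
PDF_FRAGMENT_SPEC = "http://tools.ietf.org/rfc/rfc3778"


def annot_to_open_annotation(pdf_url, page_number, annot):
    """Map a simplified PDF annotation dict to an OA-style JSON-LD dict.

    `annot` is a hypothetical, already-extracted dict with the keys
    "Subtype", "Rect" ([llx, lly, urx, ury]) and "Contents".
    """
    llx, lly, urx, ury = annot["Rect"]
    left, right = min(llx, urx), max(llx, urx)
    bottom, top = min(lly, ury), max(lly, ury)

    # viewrect=left,top,width,height, with no coordinate conversion applied.
    fragment = "page={}&viewrect={:g},{:g},{:g},{:g}".format(
        page_number, left, top, right - left, top - bottom
    )

    return {
        "@context": OA_CONTEXT,
        "@type": "oa:Annotation",
        "hasBody": {
            "@type": ["cnt:ContentAsText", "dctypes:Text"],
            "chars": annot.get("Contents", ""),
        },
        "hasTarget": {
            "@type": "oa:SpecificResource",
            "hasSource": pdf_url,
            "hasSelector": {
                "@type": "oa:FragmentSelector",
                "conformsTo": PDF_FRAGMENT_SPEC,
                "value": fragment,
            },
        },
    }


if __name__ == "__main__":
    example = {"Subtype": "Text", "Rect": [72, 700, 200, 720],
               "Contents": "Check this claim"}
    oa = annot_to_open_annotation("http://example.org/paper.pdf", 3, example)
    print(json.dumps(oa, indent=2))
```

Note that this only carries over the position on the page, which is exactly
the limitation described above: nothing here anchors the annotation to the
text itself.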

PDF.js uses (at least in the JS and output HTML) some fragment identifiers
that reference part of the thing linked to:
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf#G13.2015387
and that identifier pulled up the same content in both PDF.js and the
built-in Chrome viewer. Those seem to be what the PDF spec calls "named
destinations" which essentially map to an "explicit destination" in a
syntax that looks something like: `[page /XYZ left top zoom]`
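
For anyone who wants to poke at those, here is a small Python sketch that
parses the textual form of such an explicit destination. The class and
function names are made up for illustration, it only handles the /XYZ form,
and it assumes the page is written either as an indirect reference (such as
`12 0 R`) or as a bare number. As I read the spec, a `null` value means
"keep the current value", so it is mapped to None here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XYZDestination:
    page: str                # page reference exactly as written, e.g. "12 0 R"
    left: Optional[float]    # None when the source says "null"
    top: Optional[float]
    zoom: Optional[float]


def parse_xyz_destination(text: str) -> XYZDestination:
    """Parse a string like "[12 0 R /XYZ 72 720 null]"."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed array")

    tokens = body[1:-1].split()
    if "/XYZ" not in tokens:
        raise ValueError("only /XYZ destinations are handled in this sketch")

    split_at = tokens.index("/XYZ")
    page = " ".join(tokens[:split_at])
    args = tokens[split_at + 1:]
    if not page or len(args) != 3:
        raise ValueError("expected: [page /XYZ left top zoom]")

    def number_or_none(token: str) -> Optional[float]:
        return None if token == "null" else float(token)

    left, top, zoom = (number_or_none(t) for t in args)
    return XYZDestination(page, left, top, zoom)


if __name__ == "__main__":
    print(parse_xyz_destination("[12 0 R /XYZ 72 720 null]"))
```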

Lastly, here's the Annotations section from the PDF Spec 6th Edition (1.7):
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf

This paragraph in particular holds promise:

> The behavior of each annotation type is implemented by a software module
> called an annotation handler. Handlers for the standard annotation types
> are built directly into the PDF viewer application; handlers for additional
> types can be supplied as plug-in extensions.


I've no idea how far PDF.js has gotten on that front, but perhaps that's
where this could start for supporting "native" OA annotations and converting
existing ones into that format.

Exciting times ahead. :)

Also, forgive me if that was all repetition for you, Mitar. I wrote it
out as I researched it, as much for myself (and other PDF initiates) as for
posterity. ;)

Thanks again for starting this thread!


> I opened a slightly related issue on PDF.js tracker:
>
> https://github.com/mozilla/pdf.js/issues/5252
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
>

Received on Wednesday, 3 September 2014 20:45:36 UTC