Re: Pundit and Open Annotation

On Aug 22, 2013, at 7:23 AM, Simone Fonda <fonda@netseven.it> wrote:

> -- XPointers
> Quoting section 2.1.4: “For example, fragments of HTML cannot be used
> to describe an arbitrary range of text.”, but then in tab 3.2.1, you
> cite xpointer as fragment specification for XML documents. Why is it
> stated that fragments of HTML cannot be used for arbitrary ranges of
> texts?


I suspect that we saw HTML as a fluid document and (non-HTML) XML (e.g., TEI or MEI) as a more stable document description. The practice I see on the web is that HTML is a page description language, not a semantic markup language, and thus subject to the whims of the web site designer. HTML5 should help with some of this, but I still think that, for example, a DocBook version of a document will be more stable than the HTML rendering of that same document.


> -- Text position selector and “DOM String Comparison”
> Could someone point us to an implementation of this normalisation
> routine? (removing tags, replacement of character entities, etc). How
> does this work with respect to spaces (double, triple, etc)?
> 
> All we’ve done so far is iterate over text nodes (DOM leaves) to get
> their .nodeValue, but we’re not sure (at all!) it returns the same
> result expected by the spec.


What we've done in our Shelley-Godwin Archive project is to pick out all of the text nodes in order and ignore all other nodes. I think that's what we do when we call 'data.documentElement.textContent' on the XML file returned by a successful AJAX call in the browser. It seems to match what we do when creating our Shared Canvas manifest.
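
To make the "text nodes in order, ignore everything else" idea concrete, here is a minimal Python sketch using the standard library's xml.dom.minidom. The function name and the sample markup are made up for illustration; this is not the code we run in the browser, and it does not claim to be the normalisation the spec intends. Note that it does no whitespace collapsing: runs of spaces come through exactly as they appear in the source, and character entities are resolved by the XML parser.

```python
from xml.dom import minidom


def text_in_document_order(xml_source: str) -> str:
    """Concatenate every text node in document order, ignoring all other nodes.

    Hypothetical helper: comments and processing instructions are skipped,
    and whitespace is left untouched.
    """
    document = minidom.parseString(xml_source)
    pieces = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                pieces.append(child.data)
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(document.documentElement)
    return "".join(pieces)


# Invented sample: two spaces after "Frankenstein" and one character entity.
sample = "<line>Frankenstein  <del>or</del> &amp; the <add>Modern</add> Prometheus</line>"
print(text_in_document_order(sample))
```

Offsets computed against a string built this way only line up with another implementation if both sides treat whitespace and entities the same way, so that is the part worth comparing first.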


> -- SVG Selector
> What is the position of OA with respect to percentage-based values?
> There’s no statement or example about that, but we think that using %
> instead of absolute numbers is a great plus when dealing with images,
> as this approach allows the same shape to be easily used on any
> instance of the image, thumb / low res / hi res etc.


We've been using absolute pixel values when creating annotations of video, but we also record the extents of the play surface with exif:height and exif:width, which aren't part of the standard. My worry with percentages is getting sufficient precision for large images.
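
The precision worry can be put into numbers with a small Python sketch. The function names and the 30000-pixel width are assumptions chosen for illustration, not values from our project: the point is only that a percentage rounded to a fixed number of decimal places can address positions no finer than a certain pixel step, and that step grows with the size of the image.

```python
def pixel_step(decimal_places: int, extent_px: int) -> float:
    """Smallest distance in pixels between two positions that can be told apart
    when percentages are rounded to `decimal_places` decimals."""
    return extent_px / (100 * 10 ** decimal_places)


def round_trip(pixel: float, extent_px: int, decimal_places: int) -> float:
    """Convert a pixel position to a rounded percentage and back to pixels."""
    percent = round(pixel / extent_px * 100, decimal_places)
    return percent / 100 * extent_px


# Hypothetical large image, 30000 pixels wide.
width = 30000
print(pixel_step(2, width))         # step size with two decimal places
print(round_trip(12346, width, 2))  # position after a two-decimal round trip
print(round_trip(12346, width, 4))  # same position with four decimal places
```

With two decimal places the addressable step on this hypothetical image is several pixels wide, so percentages would need enough decimal places for the largest image they are expected to cover.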


> -- Annotation items
> Bodies of our annotations are named graph with a (soon)
> dereferenceable URL. It contains triples that a user created
> explicitly, like:
> ## ex:AWebPage ex:cites dbpedia:KarlMarx
> ## ex:AWebPage ex:similarTo ex:AnotherWebPage
> 
> However, in order to meaningfully visualise the annotation additional
> info is needed. Think of an rdf:label, a dc:description or a
> foaf:depiction for the dbpedia:KarlMarx or for the ex:AWebPage. Such
> information was not created by the annotator (the user), but rather
> copied from dbpedia or automatically extracted from the <title> tag of
> a web page, and stored (cached?) in our system.
> 
> We obviously want to distinguish between the triples created by the
> user and such additional triples that say something more about the
> "items" used in the annotation (dbpedia:KarlMarx). Consider that the
> very same item can be used in more than one annotation by different
> annotators and we want to be able to delete/modify it without
> affecting the other annotations that are using it, and possibly
> without complex or time consuming algorithms to check already used
> items, etc... Plus: we want to be open to the possibility that a
> certain item has different rdf:labels (for example) in different
> annotations.
> 
> What we do now is maintain an additional named graph for each
> annotation (called the "Items Graph") and connect it to the annotation
> explicitly:
> ## ex:MyAnnotation ex:items ex:MyItemsGraph.
> 
> This way we can:
> - have a very fast access to common information needed to display a
> meaningful annotation
> - safely delete/modify/copy/move each annotation independently
> - display correctly and consistently the annotation over time (despite
> unavailability of the original sources, data changes etc)
> 
> So the question is: has anyone already faced a similar problem?
> Is there a recommended solution? Or is this something outside the
> scope of the OA specs?
> 


The closest thing I can think of that we've looked at is the nanopublication model (http://www.nanopub.org//guidelines/1.8/) as a guide to how we might capture similar information in OA. In this case, we might have the Assertion, Supporting, and Attribution boxes (in the figure in section 1.3 of the guidelines) be named graphs, with the Assertion being the primary annotation. We would create secondary annotations connecting the Supporting and Attribution information to the primary annotation. This would allow reuse of any of the components as well as provenance tracking for each (for example, Alice claims that Supporting1 is why the Assertion is true while Bob claims that Supporting2 is the reason, neither of whom made the Assertion).


> -- Notebooks
> In Pundit a notebook is a collection of annotations that a user has
> built following some personal criteria. In the RDF world there are a
> number of solutions to represent aggregations. Currently we simply use
> an ad-hoc property attached to every annotation:
> ## ex:MyAnnotation pundit:isIncludedIn ex:MyNotebook
> and
> ## ex:MyNotebook pundit:includes ex:MyAnnotation
> 
> Is there a recommendation about collections of annotations? We'd
> really like to keep it as simple as possible.


My only experience with collections of annotations is from the Shared Canvas data model (http://www.shared-canvas.org/datamodel/spec/#AnnotationList):

<> a ore:Aggregation ;
  ore:aggregates <Anno1>, <Anno2> .
 
If you want the annotations to be in a particular order:

<> a ore:Aggregation, rdf:List ;
  ore:aggregates <Anno1>, <Anno2> ;
  rdf:first <Anno1> ;
  rdf:rest ( <Anno2> ) .

I've removed the Shared Canvas-specific triples.
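
If the notebooks are generated by software, the two shapes above can be produced from a plain list of annotation identifiers. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, the output is plain Turtle text with the prefix declarations left out (as in the snippets above), and no RDF library is involved.

```python
def aggregation_turtle(annotation_iris, ordered=False):
    """Render an ore:Aggregation of annotations as Turtle text.

    Hypothetical helper. With ordered=True the aggregation is also typed as
    an rdf:List, mirroring the ordered example above.
    """
    if not annotation_iris:
        raise ValueError("at least one annotation is required")
    refs = [f"<{iri}>" for iri in annotation_iris]
    members = ", ".join(refs)
    if not ordered:
        return f"<> a ore:Aggregation ;\n  ore:aggregates {members} ."
    # The remaining members go into a Turtle collection; "( )" is rdf:nil.
    rest = " ".join(refs[1:])
    rest_collection = f"( {rest} )" if rest else "( )"
    return (
        "<> a ore:Aggregation, rdf:List ;\n"
        f"  ore:aggregates {members} ;\n"
        f"  rdf:first {refs[0]} ;\n"
        f"  rdf:rest {rest_collection} ."
    )


print(aggregation_turtle(["Anno1", "Anno2"]))
print(aggregation_turtle(["Anno1", "Anno2"], ordered=True))
```

A real implementation would more likely build the graph with an RDF library and let it serialize; the string version is only meant to show how the unordered and ordered forms relate.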


> Thanks for any comment, and sorry again for the length of the mail.
> 
> Best,
> Simone


Not a problem. Hope this helps.

-- Jim

Received on Thursday, 22 August 2013 13:33:06 UTC