Re: Pundit and Open Annotation

Hi Simone,

Comments inline...

On Thu, Aug 22, 2013 at 7:32 AM, James Smith <jgsmith@gmail.com> wrote:
> On Aug 22, 2013, at 7:23 AM, Simone Fonda <fonda@netseven.it> wrote:
>
>> -- XPointers
>> Quoting section 2.1.4: “For example, fragments of HTML cannot be used
>> to describe an arbitrary range of text.”, but then in table 3.2.1, you
>> cite XPointer as a fragment specification for XML documents. Why is it
>> stated that fragments of HTML cannot be used for arbitrary ranges of
>> text?
>
> I suspect that we saw HTML as a fluid document and (non-HTML) XML (e.g., TEI or MEI) as a more stable document description. The practice I see on the web is that HTML is a page description language, not a semantic markup language, and thus subject to the whims of the web site designer. HTML5 should help with some of this, but I still think that, for example, a DocBook version of a document will be more stable than the HTML rendering of that same document.

It's actually easier than that... the specifications of fragments for
HTML and XHTML don't allow XPointers.

Quoting http://tools.ietf.org/html/rfc2854 (text/html) and
http://tools.ietf.org/rfc/rfc3236 (XHTML), respectively:

For documents labeled as text/html, the fragment identifier designates
the correspondingly named element; any element may be named with the
"id" attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may
be named with a "name" attribute.

Until [XMLMIME] gets updated, fragment identifiers for XHTML documents
designate the element with the corresponding ID attribute value (see
[XML] section 3.3.1); any XHTML element with the "id" attribute.

As HTML5 is not XHTML based, it seems unlikely that they'll change the
fragment RFC.
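
A minimal sketch of what the text/html rule quoted above amounts to, in
Python with only the standard library. The names resolve_fragment and
FragmentFinder are made up for illustration and are not from either RFC.
The point is that a fragment can only pick out a whole named element, so
an XPointer-style fragment matches nothing:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Elements that the quoted text/html rule allows to be named via "name".
NAMEABLE = {"a", "applet", "frame", "iframe", "img", "map"}


class FragmentFinder(HTMLParser):
    """Remember the first start tag whose id (or permitted name) matches."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.match = None

    def handle_starttag(self, tag, attrs):
        if self.match is not None:
            return
        attributes = dict(attrs)
        by_id = attributes.get("id") == self.fragment
        by_name = tag in NAMEABLE and attributes.get("name") == self.fragment
        if by_id or by_name:
            self.match = tag


def resolve_fragment(url, html):
    """Return the tag name designated by the URL's fragment, or None."""
    _, fragment = urldefrag(url)
    if not fragment:
        return None
    finder = FragmentFinder(fragment)
    finder.feed(html)
    return finder.match
```

With this reading, "#intro" can select <p id="intro">, but there is no
way to say "characters 10 to 25 of that paragraph" in the fragment itself.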


>> -- Text position selector and “DOM String Comparison”
>> Could someone point us to an implementation of this normalisation
>> routine? (removing tags, replacement of character entities, etc). How
>> does this work with respect to spaces (double, triple, etc)?
>>
>> All we’ve done so far is iterate over text nodes (DOM leaves) to get
>> their .nodeValue, but we’re not sure (at all!) it returns the same
>> result expected by the spec.
>
>
> What we've done in our Shelley-Godwin Archive project work is to pick out all of the text nodes in order and ignore all other nodes. I think that's what we do when we call 'data.documentElement.textContent' on the XML file returned by a successful AJAX call in the browser. It seems to match what we do when creating our Shared Canvas manifest.

Yep.  It's just the normalization that's specified by DOM
implementations, so browsers should do it correctly for you.
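
For anyone wanting to check this outside a browser, here is a rough
Python equivalent of reading textContent on the document element,
assuming well-formed XML input. The function name text_content is
hypothetical. It concatenates the text nodes in document order, lets the
parser resolve character entities, skips comments, and does not collapse
runs of whitespace:

```python
import xml.etree.ElementTree as ET


def text_content(xml_source):
    """Concatenate all text nodes of the document in document order.

    The parser resolves entities such as &amp; and drops comments by
    default; whitespace is kept exactly as it appears in the source.
    """
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())
```

So double or triple spaces in the source stay double or triple spaces in
the string that the text position offsets are counted against.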


>> -- SVG Selector
>> What is the position of OA with respect to percentage-based values?
>> There’s no statement or example about that, but we think that using %
>> instead of absolute numbers is a great plus when dealing with images,
>> as this approach allows the same shape to be easily used on any
>> instance of the image, thumb / low res / hi res etc.
>
>
> We've been using absolute pixel values when creating annotations of video, but we also record the extents of the play surface with exif:height and exif:width, which aren't part of the standard. My worry with percentages is getting sufficient precision for large images.

The "position of Open Annotation" is neutral regarding percentages vs
absolute values.

My personal preference is the same as Jim's, which is to use absolute
values in the normal case:
* Without going into floating point percentages, they likely won't be
sufficiently accurate
* It may be difficult to lay out the percentage if the client cannot
determine the height and width of the image.

However I agree that reuse on different, equivalent images is much
easier with percentages if those equivalencies are known. If you allow
for accurate percentages, can do the math for the layout, and know the
equivalencies, then percentages seem the right way to go.
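
To make the trade-off concrete, here is a small Python sketch of the
layout math, assuming the client knows the pixel width and height of the
image instance it is drawing on. The function names are made up for
illustration and nothing here is part of the OA spec:

```python
def percent_to_pixels(points_pct, width, height):
    """Map (x%, y%) points onto an image of the given pixel size."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be known and positive")
    return [(x * width / 100.0, y * height / 100.0) for x, y in points_pct]


def pixels_to_percent(points_px, width, height):
    """Map pixel points on an image of the given size to (x%, y%)."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be known and positive")
    return [(x * 100.0 / width, y * 100.0 / height) for x, y in points_px]
```

The same (25.0, 50.0) point lands at (1000, 1500) on a 4000x3000 master
and at (100, 150) on a 400x300 thumbnail. On the other hand, pixel 1234
on a 10000 pixel wide image is 12.34%, and rounding that to a whole
percent moves it to pixel 1200, which is the precision concern above.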


>> -- Annotation items
>> Bodies of our annotations are named graphs with a (soon)
>> dereferenceable URL. [...]
>> However, in order to meaningfully visualise the annotation additional
>> info is needed. Think of an rdfs:label, a dc:description or a
>> foaf:depiction for the dbpedia:KarlMark or for the ex:AWebPage. Such
>> information was not created by the annotator (the user), but rather
>> copied from dbpedia or automatically extracted from the <title> tag of
>> a web page, and stored (cached?) in our system.
[...]
>> So the question is: has anyone already faced a similar problem?
>> Is there a recommended solution? Or is this something outside the
>> scope of the OA specs?
>
>
> The closest thing I can think that we've looked at is the nanopublication model (http://www.nanopub.org//guidelines/1.8/) as a guide to how we might capture similar information as OA. In this case, we might have the Assertion, Supporting, and Attribution boxes (in the figure in section 1.3 of the guidelines) be named graphs with the Assertion being the primary annotation. We would create secondary annotations connecting the Supporting and Attribution information to the primary annotation. This would allow reuse of any of the components as well as provenance tracking for each (for example, Alice claims that Supporting1 is why the Assertion is true while Bob claims that Supporting2 is the reason, neither of whom made the Assertion).

That seems out of the scope of Open Annotation, as I understand the
use case.  In theory all of the information is discoverable by the
follow-your-nose principle of linked data, it's just that that is not
efficient enough for a good user experience, and hence supplying it
along with the annotation is desirable. Additional information can be
added into the annotation graph ... or in a linked named graph as you
do if you need to maintain the provenance of the information.  So in
my view you're doing it right already, and maybe there's a best
practice to be written up, but it's not really part of an annotation
specification to solve network efficiency of linked data.


>> -- Notebooks
>> In Pundit a notebook is a collection of annotations that a user has
>> built following some personal criteria. In the RDF world there are a
>> number of solutions to represent aggregations. Currently we simply use
>> an ad-hoc property attached to every annotation:
>> Is there a recommendation about collections of annotations? We'd
>> really like to keep it as simple as possible.
>
> My only experience with collections of annotations is from the Shared Canvas data model (http://www.shared-canvas.org/datamodel/spec/#AnnotationList):
>
> <> a ore:Aggregation ;
>   ore:aggregates <Anno1>, <Anno2> .
>
> If you want the annotations to be in a particular order:
>
> <> a ore:Aggregation, rdf:List ;
>   ore:aggregates <Anno1>, <Anno2> ;
>   rdf:first <Anno1> ;
>   rdf:rest ( <Anno2> ) .
>
> I've removed the Shared Canvas-specific triples.

The question of sets of annotations has come up in the past, but we
haven't felt it appropriate to try and specify anything. The issues
are:
* Requirements -- we don't have a good feel for all of the
requirements for collections of annotations. That could be solved if
people are willing to work on it, of course :)
* Order in RDF -- rdf:List and rdf:Seq are ugly monstrosities that are
opaque to SPARQL, plus see the JSON-LD discussion regarding
serialization.  Order may or may not be necessary, and would change
the representation significantly.
* Metadata about the collection is likely very use case specific.

Not to derail, but for the next version of Shared Canvas, we're going
to drop the ORE aggregations as unnecessary overkill and just use
rdf:List in a pattern friendly to JSON-LD rather than a multiclass
resource.
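
Purely as an illustration of what an ordered list looks like in JSON-LD,
and not the actual Shared Canvas pattern (which isn't decided in this
message), here is a Python sketch that builds such a document. The "ex"
namespace, the "items" term and the function name annotation_list are all
assumptions made up for the example. The only real JSON-LD feature relied
on is "@container": "@list", which makes a plain JSON array expand to an
rdf:List so the order survives:

```python
import json


def annotation_list(list_id, annotation_ids):
    """Build a JSON-LD document holding annotation IRIs in a fixed order."""
    return {
        "@context": {
            # Hypothetical namespace and term, for illustration only.
            "ex": "http://example.org/ns#",
            "items": {
                "@id": "ex:items",
                "@type": "@id",
                "@container": "@list",
            },
        },
        "@id": list_id,
        "items": list(annotation_ids),
    }


def to_json(document):
    """Serialize the document; JSON arrays keep their element order."""
    return json.dumps(document, indent=2)
```

A client then reads "items" as an ordinary array, without walking
rdf:first / rdf:rest by hand.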

And thanks for the typos! As they don't change anything substantial,
I'll fix them without incrementing the spec version.
(The advantage of still being a Community Draft!)

Rob

Received on Thursday, 22 August 2013 15:18:11 UTC