Re: Pundit and Open Annotation

Thank you Robert and James for the useful replies!
I'm putting a few comments inline.


On Thu, Aug 22, 2013 at 5:09 PM, Robert Sanderson <azaroth42@gmail.com> wrote:

> Hi Simone,
>
> Comments inline...
>
> On Thu, Aug 22, 2013 at 7:32 AM, James Smith <jgsmith@gmail.com> wrote:
> > On Aug 22, 2013, at 7:23 AM, Simone Fonda <fonda@netseven.it> wrote:
> >
> >> -- XPointers
> >> Quoting section 2.1.4: “For example, fragments of HTML cannot be used
> >> to describe an arbitrary range of text.”, but then in tab 3.2.1, you
> >> cite XPointer as the fragment specification for XML documents. Why is it
> >> stated that fragments of HTML cannot be used for arbitrary ranges of
> >> text?
> >
> > I suspect that we saw HTML as a fluid document and (non-HTML) XML (e.g.,
> > TEI or MEI) as a more stable document description. The practice I see on
> > the web is that HTML is a page description language, not a semantic markup
> > language, and thus subject to the whims of the web site designer. HTML5
> > should help with some of this, but I still think that, for example, a
> > DocBook version of a document will be more stable than the HTML rendering
> > of that same document.
>
> It's actually easier than that... the specifications of fragments for
> HTML and XHTML don't allow XPointers.
>
> Quoting http://tools.ietf.org/rfc/rfc3236 and
> http://tools.ietf.org/html/rfc2854:
>
> For documents labeled as text/html, the fragment identifier designates
> the correspondingly named element; any element may be named with the
> "id" attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may
> be named with a "name" attribute.
>
> Until [XMLMIME] gets updated, fragment identifiers for XHTML documents
> designate the element with the corresponding ID attribute value (see
> [XML] section 3.3.1); any XHTML element with the "id" attribute.
>
> As HTML5 is not XHTML based, it seems unlikely that they'll change the
> fragment RFC.
>


I see... so does that mean that if we in Pundit want an XPointer-based selector,
it shouldn't be a subclass of FragmentSelector? I think the RDF structure
of the FragmentSelector fits well, but we could get the same result without
subclassing it, if that is more "formally" correct.



>
>
> >> -- Text position selector and “DOM String Comparison”
> >> Could someone point us to an implementation of this normalisation
> >> routine? (removing tags, replacement of character entities, etc). How
> >> does this work with respect to spaces (double, triple, etc)?
> >>
> >> All we’ve done so far is iterating over text nodes (DOM leaves) to get
> >> their .nodeValue, but we’re not sure (at all!) it returns the same
> >> result expected by the spec.
> >
> >
> > What we've done in our Shelley-Godwin Archive project work is to pick out
> > all of the text nodes in order and ignore all other nodes. I think that's
> > what we do when we call 'data.documentElement.textContent' on the XML file
> > returned by a successful AJAX call in the browser. It seems to match what
> > we do when creating our Shared Canvas manifest.
>
> Yep.  It's just the normalization that's specified by DOM
> implementations, so browsers should do it correctly for you.
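
For readers outside the browser, the same flattening can be sketched in Python
with the standard library: parse the XML, join every text node in document
order, and count character offsets in the result. The function names and the
sample document are assumptions for illustration, and this only approximates
what a DOM's textContent returns; it has not been checked against the spec
wording.

```python
import xml.etree.ElementTree as ET


def text_content(xml_source):
    """Concatenate all text nodes in document order, much like DOM textContent."""
    root = ET.fromstring(xml_source)
    # itertext() yields each element's text and tail; tags are dropped and
    # character entities were already replaced by the parser.
    return "".join(root.itertext())


def position_of(xml_source, quote):
    """Return (start, end) character offsets of `quote` in the flattened text."""
    text = text_content(xml_source)
    start = text.find(quote)
    if start < 0:
        return None
    return start, start + len(quote)


# Hypothetical sample with an entity, inline markup and repeated spaces.
doc = "<p>Tom &amp; <em>Jerry</em>  run   fast</p>"
print(repr(text_content(doc)))
print(position_of(doc, "Jerry"))
```

Note that runs of spaces are kept exactly as they appear in the source; nothing
here collapses double or triple spaces, so offsets count every one of them.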
>
>
> >> -- SVG Selector
> >> What is the position of OA with respect to percentage-based values?
> >> There’s no statement or example about that, but we think that using %
> >> instead of absolute numbers is a great plus when dealing with images,
> >> as this approach allows the same shape to be easily used on any
> >> instance of the image, thumb / low res / hi res etc.
> >
> >
> > We've been using absolute pixel values when creating annotations of
> > video, but we also record the extents of the play surface with exif:height
> > and exif:width, which aren't part of the standard. My worry with
> > percentages is getting sufficient precision for large images.
>
> The "position of Open Annotation" is neutral regarding percentages vs
> absolute values.
>
> My personal preference is the same as Jim's, to use absolute values in
> the normal case:
> * Without going into floating point percentages, they likely won't be
> sufficiently accurate
> * It may be difficult to lay out the percentage if the client cannot
> determine the height and width of the image.
>
> However I agree that reuse on different, equivalent images is much
> easier with percentages if those equivalencies are known. If you allow
> for accurate percentages, can do the math for the layout, and know the
> equivalencies, then percentages seem the right way to go.
>

I'm not an SVG expert, but it seems to me there is no simple way
to express percentages in SVG. Am I wrong?
In that case I guess Simone is likely to keep the JSON-based selector
(our own) that we are using at the moment. Maybe we could propose it as
an extension to OA?
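
The trade-off discussed above can be sketched in a few lines of Python:
fractional percentages survive the round trip and transfer to another
rendition, while whole-number percentages drift on a large image. The image
sizes, the box and the function names are made up for illustration; this is
not the Pundit selector format.

```python
def to_percent(box, width, height):
    """Convert an (x, y, w, h) pixel box to percentages of the image size."""
    x, y, w, h = box
    return (100 * x / width, 100 * y / height,
            100 * w / width, 100 * h / height)


def to_pixels(percent_box, width, height):
    """Convert a percentage box to whole pixels for a given rendition."""
    px, py, pw, ph = percent_box
    return (round(px * width / 100), round(py * height / 100),
            round(pw * width / 100), round(ph * height / 100))


full = (12000, 9000)          # hypothetical hi-res image size in pixels
box = (4567, 1234, 321, 150)  # hypothetical region in hi-res pixels

exact = to_percent(box, *full)
print(to_pixels(exact, *full))       # fractional percentages round-trip

coarse = tuple(round(p) for p in exact)
print(to_pixels(coarse, *full))      # whole percentages lose the region

print(to_pixels(exact, 1200, 900))   # same shape on a smaller rendition
```

Both conversions need the width and height of the target image, which is the
second concern raised in the quoted reply.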


>
>
> >> -- Annotation items
> >> Bodies of our annotations are named graphs with a (soon)
> >> dereferenceable URL. [...]
> >> However, in order to meaningfully visualise the annotation additional
> >> info is needed. Think of an rdfs:label, a dc:description or a
> >> foaf:depiction for the dbpedia:KarlMark or for the ex:AWebPage. Such
> >> information was not created by the annotator (the user), but rather
> >> copied from dbpedia or automatically extracted from the <title> tag of
> >> a web page, and stored (cached?) in our system.
> [...]
> >> So the question is: has anyone already faced a similar problem?
> >> Is there a recommended solution? Or is this something outside the
> >> scope of the OA specs?
> >
> >
> > The closest thing I can think that we've looked at is the
> > nanopublication model (http://www.nanopub.org//guidelines/1.8/) as a
> > guide to how we might capture similar information as OA. In this case, we
> > might have the Assertion, Supporting, and Attribution boxes (in the figure
> > in section 1.3 of the guidelines) be named graphs with the Assertion being
> > the primary annotation. We would create secondary annotations connecting
> > the Supporting and Attribution information to the primary annotation. This
> > would allow reuse of any of the components as well as provenance tracking
> > for each (for example, Alice claims that Supporting1 is why the Assertion
> > is true while Bob claims that Supporting2 is the reason, neither of whom
> > made the Assertion).
>
> That seems out of the scope of Open Annotation, as I understand the
> use case.  In theory all of the information is discoverable by the
> follow-your-nose principle of linked data, it's just that that is not
> efficient enough for a good user experience, and hence supplying it
> along with the annotation is desirable. Additional information can be
> added into the annotation graph ... or in a linked named graph as you
> do if you need to maintain the provenance of the information.  So in
> my view you're doing it right already, and maybe there's a best
> practice to be written up, but it's not really part of an annotation
> specification to solve network efficiency of linked data.
>

Good, I agree with this viewpoint and that it is out of scope for OA. Thanks.


>
>
> >> -- Notebooks
> >> In Pundit a notebook is a collection of annotations that a user has
> >> built following some personal criteria. In the RDF world there are a
> >> number of solutions to represent aggregations. Currently we simply use
> >> an ad-hoc property attached to every annotation:
> >> Is there a recommendation about collections of annotations? We'd
> >> really like to keep it as simple as possible.
> >
> > My only experience with collections of annotations is from the Shared
> > Canvas data model
> > (http://www.shared-canvas.org/datamodel/spec/#AnnotationList):
> >
> > <> a ore:Aggregation ;
> >   ore:aggregates <Anno1>, <Anno2> .
> >
> > If you want the annotations to be in a particular order:
> >
> > <> a ore:Aggregation, rdf:List ;
> >   ore:aggregates <Anno1>, <Anno2> ;
> >   rdf:first <Anno1> ;
> >   rdf:rest ( <Anno2> ) .
> >
> > I've removed the Shared Canvas-specific triples.
>
> The question of sets of annotations has come up in the past, but we
> haven't felt it appropriate to try and specify anything. The issues
> are:
> * Requirements -- we don't have a good feel for all of the
> requirements for collections of annotations. That could be solved if
> people are willing to work on it, of course :)
> * Order in RDF -- rdf:List and rdf:Seq are ugly monstrosities that are
> opaque to SPARQL, plus see the JSON-LD discussion regarding
> serialization.  Order may or may not be necessary, and would change
> the representation significantly.
> * Metadata about the collection is likely very use case specific.
>
> Not to derail, but for the next version of Shared Canvas, we're going
> to drop the ORE aggregations as unnecessary overkill and just use
> rdf:List in a pattern friendly to JSON-LD rather than a multiclass
> resource.
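
To show what ordering costs in plain RDF terms, here is a small Python sketch
that expands an ordered notebook into the rdf:first / rdf:rest chain behind an
rdf:List and then walks it back. Triples are plain tuples; the "ex:" names, the
blank node labels and both function names are hypothetical, and this is not the
Shared Canvas or Pundit serialization.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_triples(head, members):
    """Return the rdf:first / rdf:rest triples for an ordered list.

    `head` names the first list node; later nodes get made-up blank node labels.
    """
    if not members:
        raise ValueError("an empty list is just rdf:nil; nothing to chain")
    triples = []
    node = head
    for index, member in enumerate(members):
        triples.append((node, RDF + "first", member))
        is_last = index == len(members) - 1
        next_node = RDF + "nil" if is_last else f"_:b{index + 1}"
        triples.append((node, RDF + "rest", next_node))
        node = next_node
    return triples


def read_rdf_list(head, triples):
    """Walk the chain from `head` and return the members in order."""
    lookup = {(s, p): o for s, p, o in triples}
    members = []
    node = head
    while node != RDF + "nil":
        members.append(lookup[(node, RDF + "first")])
        node = lookup[(node, RDF + "rest")]
    return members


# Hypothetical notebook with three annotations.
annotations = ["ex:Anno1", "ex:Anno2", "ex:Anno3"]
triples = rdf_list_triples("ex:notebook", annotations)
print(len(triples))
print(read_rdf_list("ex:notebook", triples))
```

Each member costs two triples, and recovering the order means following the
chain node by node rather than matching a single pattern.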



>
> And thanks for the typos! As they don't change anything substantial,
> I'll fix them without incrementing the spec version.
> (The advantage of still being a Community Draft!)
>
> Rob
>
>

Thank you again,

best,

Christian

Received on Friday, 23 August 2013 14:06:01 UTC