Pundit and Open Annotation from Simone Fonda on 2013-08-22 (public-openannotation@w3.org from August 2013)

From: Simone Fonda <fonda@netseven.it>
Date: Thu, 22 Aug 2013 13:23:05 +0200
To: public-openannotation@w3.org
Message-ID: <CAJLkhv7LOyj4FE0_X0tSwe+3=si==Kd120fHQBpdsYAJ40YOEQ@mail.gmail.com>
Hello everyone,
after a long lurking period, let me introduce myself with a huge wall of text!

Sorry for that. We are redesigning the RDF model for Pundit
(http://thepund.it) and we would like to reach (finally!) 100%
compatibility with the latest OA spec.

We've gone carefully through it, and some questions popped up as well
as some very minor typos. I'm sorry if some questions have already
been discussed and answered, feel free to point us to some archived
discussion we missed or similar.


To give you a little background: Pundit is a browser-based annotation
tool focused on web resources, mainly HTML pages and images. Our
annotation bodies are always expressed as explicit semantic
statements "subject - predicate - object".


-- XPointers
Quoting section 2.1.4: “For example, fragments of HTML cannot be used
to describe an arbitrary range of text.”, but then in table 3.2.1 you
cite XPointer as the fragment specification for XML documents. Why is it
stated that fragments of HTML cannot be used for arbitrary ranges of
text?


-- Text position selector and “DOM String Comparison”
Could someone point us to an implementation of this normalisation
routine? (removing tags, replacement of character entities, etc). How
does this work with respect to spaces (double, triple, etc)?

All we’ve done so far is iterate over text nodes (DOM leaves) to get
their .nodeValue, but we’re not sure (at all!) that this returns the
result expected by the spec.
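
To make the whitespace question concrete, here is a minimal sketch of
such a routine using only the Python standard library. It is not the
routine from the spec: the names TextExtractor and plain_text are
hypothetical, and leaving whitespace untouched is an assumption made
purely to show how it affects offsets.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def plain_text(html):
    """Return the text content of `html`; whitespace is left as-is (assumption)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The double space after "Karl" is kept, so it shifts every later offset.
sample = "<p>Karl  <b>Marx</b> &amp; Engels</p>"
text = plain_text(sample)
start = text.index("Marx")
end = start + len("Marx")
```

If runs of spaces were collapsed instead, the same selection would start
one character earlier, so both sides need to agree on this point for
start/end offsets to match.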


-- SVG Selector
What is the position of OA with respect to percentage-based values?
There’s no statement or example about that, but we think that using %
instead of absolute numbers is a great plus when dealing with images,
as this approach allows the same shape to be easily used on any
instance of the image, thumb / low res / hi res etc.
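
As an illustration of the point, a small sketch of how one
percentage-based rectangle maps onto different renditions of the same
image. The function name and the (x, y, w, h) tuple layout are
assumptions for this example, not something taken from the OA spec.

```python
def percent_rect_to_pixels(rect, width, height):
    """Scale a rectangle given in percentages to pixel coordinates.

    `rect` is (x, y, w, h), each a percentage (0-100) of the image size.
    """
    x, y, w, h = rect
    return (
        x * width / 100,
        y * height / 100,
        w * width / 100,
        h * height / 100,
    )


# One stored shape, applied to a thumbnail and to a hi-res rendition.
region = (10, 20, 50, 25)
thumb = percent_rect_to_pixels(region, 200, 100)
full = percent_rect_to_pixels(region, 2000, 1000)
```

With absolute numbers, a separate shape would have to be stored or
rescaled for each rendition.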


-- Annotation items
The bodies of our annotations are named graphs with (soon)
dereferenceable URLs. Each contains triples that a user created
explicitly, like:
## ex:AWebPage ex:cites dbpedia:KarlMarx
## ex:AWebPage ex:similarTo ex:AnotherWebPage

However, in order to visualise the annotation meaningfully, additional
info is needed. Think of an rdfs:label, a dc:description or a
foaf:depiction for dbpedia:KarlMarx or for ex:AWebPage. Such
information was not created by the annotator (the user), but rather
copied from dbpedia or automatically extracted from the <title> tag of
a web page, and stored (cached?) in our system.

We obviously want to distinguish between the triples created by the
user and such additional triples that say something more about the
"items" used in the annotation (dbpedia:KarlMarx). Consider that the
very same item can be used in more than one annotation by different
annotators and we want to be able to delete/modify it without
affecting the other annotations that are using it, and possibly
without complex or time consuming algorithms to check already used
items, etc. Plus: we want to be open to the possibility that a
certain item has different rdfs:labels (for example) in different
annotations.

What we do now is maintain an additional named graph for each
annotation (called the "Items Graph") and connect it to the annotation
explicitly:
## ex:MyAnnotation ex:items ex:MyItemsGraph.

This way we can:
- have very fast access to the common information needed to display a
meaningful annotation
- safely delete/modify/copy/move each annotation independently
- display the annotation correctly and consistently over time (despite
unavailability of the original sources, data changes, etc.)
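
The arrangement above can be sketched with a toy in-memory store. This
is only an illustration, not how Pundit is implemented: the store
layout, the "/body" and "/items" graph names and both function names are
hypothetical; only ex:items and oa:hasBody come from the mail and the OA
model.

```python
# A toy quad store: graph name -> set of (subject, predicate, object) triples.

def add_annotation(store, name, body_triples, item_triples):
    """Store an annotation with its own body graph and its own Items Graph."""
    body, items = name + "/body", name + "/items"  # hypothetical graph names
    store[body] = set(body_triples)
    store[items] = set(item_triples)
    store.setdefault("default", set()).update({
        (name, "oa:hasBody", body),
        (name, "ex:items", items),
    })


def delete_annotation(store, name):
    """Remove one annotation; no other annotation's graphs are touched."""
    store.pop(name + "/body", None)
    store.pop(name + "/items", None)
    store["default"] = {t for t in store.get("default", set()) if t[0] != name}


store = {}
add_annotation(
    store, "ex:Anno1",
    [("ex:AWebPage", "ex:cites", "dbpedia:KarlMarx")],
    [("dbpedia:KarlMarx", "rdfs:label", "Karl Marx")],
)
add_annotation(
    store, "ex:Anno2",
    [("ex:AnotherWebPage", "ex:cites", "dbpedia:KarlMarx")],
    [("dbpedia:KarlMarx", "rdfs:label", "Marx, Karl")],
)
delete_annotation(store, "ex:Anno1")
```

Both annotations use dbpedia:KarlMarx with a different label, and
deleting the first one leaves the second one's Items Graph intact
without any check for shared items.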

So the question is: has anyone already faced a similar problem?
Is there a recommended solution? Or is this something outside the
scope of the OA specs?


-- Notebooks
In Pundit a notebook is a collection of annotations that a user has
built following some personal criteria. In the RDF world there are a
number of solutions to represent aggregations. Currently we simply use
an ad-hoc property attached to every annotation:
## ex:MyAnnotation pundit:isIncludedIn ex:MyNotebook
and
## ex:MyNotebook pundit:includes ex:MyAnnotation

Is there a recommendation about collections of annotations? We'd
really like to keep it as simple as possible.


-- Typos
3.2.2.1 “The normalization routine maybe be automatically be performed”
3.4.1 “Dereferencable”
3.4.1 “The Style class in the Open Annotation model for CSS …”,
shouldn't it be the CssStyle class?
5.2 “Dereferencable”
5.3 “deferencable”
Graphs: in a couple of tables it is stated that “This class is not
used directly in Annotations, only subclasses are”, but then the graph
includes it, while the RDF does not. Isn't this a bit misleading? See
3.3 States, 3.4 Styles.




Thanks for any comment, and sorry again for the length of the mail.

Best,
Simone
Received on Thursday, 22 August 2013 13:03:59 UTC