Re: Annotation Concept vs Document (was Level 1 comments) from Antoine Isaac on 2013-01-10 (public-openannotation@w3.org from January 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 10 Jan 2013 22:18:09 +0100
To: public-openannotation <public-openannotation@w3.org>
Message-ID: <50EF3011.9050202@few.vu.nl>
Hi Rob,

I'm happy to see we fundamentally agree ;-)

I will insist one last time, however, because I fear I might have been unclear (as maybe too often, unfortunately...)
I was not suggesting that OA itself creates two classes. On the contrary. If there's a second class, it should be provided by some other effort focused on the provenance of data in general. OA is about annotations. Hands off from more general requirements!

My core concern is that you're focusing your one node on the wrong interpretation (annotation-serialization and not annotation-concept).

I'd be curious to know, in fact, whether people are using it, or will use it, as such anyway. Who is using oa:Annotation as a serialization, *seriously*? Especially:
- changing the URI every time a file is generated
- using oa:equivalentTo to connect to previous versions

I'm happy to have these questions put to the community. I'm not an implementer myself, after all (though I'd suffer from bad modeling, in my own role).
But it is such a weird thing to have only serialization-focused resources. I'm all for trying to use OA as a vocabulary for data about annotations. But I'm much more afraid of dealing with the consequences of using it as a vocabulary for data about files that describe annotations. There are too many cans of worms involved with that.

So my own recommendation for now would be to mint these two serialization-level properties, and leave the community one more year to agree on where to put them. I'm pretty sure it will depend on the format used, if they are about serialization.

In addition to that (and of course even more so if you keep to the current interpretation!), your suggested editor notes would be helpful. In fact they could help trigger the discussion on where to put the serialization-level statements.

I still have remarks on some of the drawbacks:


> * There's significant cost in terms of 303 implementation and URI maintenance.  You can't just push an Annotation document up on a website and be done with it, you really need some publishing infrastructure to do the right thing.


I'm not so sure we can be so positive here. Not every URI needs to be 303-redirected. To the extent I understood the recipes, the URI of a datasource (and this would be the case for a serialization) can be served with an HTTP 200. And people afraid of 303-redirection for the Annotation-concept could use hash URIs built on top of the serialization URI.

Note that even if you forced the two-resource solution, there would be room for a flexible implementation illustrating the tradeoff.
People interested in "stable" annotation-concepts should probably indeed use 303 redirection.
The others can have http://example.com/aSerialization served with an HTTP 200 and http://example.com/aSerialization#theAnnotationConcept trivially accessible from the document they have retrieved. The downside of this is that the concept has "stable" statements but no "stable" identifier anymore (as it's built on top of http://example.com/aSerialization). But well, this is for the implementers who are primarily interested in the current OA annotation-as-serialization scenario anyway; they may not notice much difference.
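
To make the hash-URI option concrete, here is a minimal Python sketch of why no redirection is needed in that case: a client strips the fragment before issuing the HTTP request, so the concept URI resolves to the serialization document, which can simply be served with a 200. The helper name request_uri is hypothetical, and the two URIs are only the examples used above.

```python
from urllib.parse import urldefrag

# Example URIs from the discussion above (illustrative only).
SERIALIZATION = "http://example.com/aSerialization"
CONCEPT = "http://example.com/aSerialization#theAnnotationConcept"


def request_uri(resource_uri: str) -> str:
    """Return the URI a client actually requests over HTTP.

    The fragment identifier is never sent to the server, so a hash URI
    is retrieved through the document URI it is built on.
    """
    return urldefrag(resource_uri).url


# Both the serialization and the concept are fetched from the same document,
# so a plain HTTP 200 on the serialization URI serves both.
print(request_uri(SERIALIZATION))
print(request_uri(CONCEPT))
```

The same sketch also shows the downside mentioned above: the concept identifier cannot outlive the serialization URI it is built on.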


> * There's a not insignificant cost in additional number of triples.


As the triples would be distributed over two resources (or one resource and whatever the serialization format offers for representing data about a given formalization), there is just one additional link needed to relate these two resources.
And there are possibly huge gains from avoiding the duplication that happens when an annotation is re-published over different serializations.
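
A rough way to check this counting argument is the Python sketch below. It is only an illustration under stated assumptions: the ex:serializedAt, ex:serializedBy and ex:describes properties are hypothetical placeholders, the concept is assumed to carry three statements, and each re-publication in the one-node model is assumed to repeat them and to link back to the previous version with oa:equivalentTo.

```python
# Concept-level statements (assumed: type, body, target).
CONCEPT_STATEMENTS = [
    ("rdf:type", "oa:Annotation"),
    ("oa:hasBody", "ex:body"),
    ("oa:hasTarget", "ex:target"),
]
# Hypothetical serialization-level properties.
SERIALIZATION_PROPERTIES = ["ex:serializedAt", "ex:serializedBy"]


def one_node(serializations):
    """Annotation-as-serialization: every new file repeats the concept statements."""
    triples = set()
    previous = None
    for uri in serializations:
        triples.update((uri, p, o) for p, o in CONCEPT_STATEMENTS)
        triples.update((uri, p, f"{uri}/{p}") for p in SERIALIZATION_PROPERTIES)
        if previous is not None:
            triples.add((uri, "oa:equivalentTo", previous))
        previous = uri
    return triples


def two_nodes(concept, serializations):
    """Concept stated once; each serialization adds one link to it."""
    triples = {(concept, p, o) for p, o in CONCEPT_STATEMENTS}
    for uri in serializations:
        triples.update((uri, p, f"{uri}/{p}") for p in SERIALIZATION_PROPERTIES)
        triples.add((uri, "ex:describes", concept))  # hypothetical linking property
    return triples


docs = ["ex:doc1", "ex:doc2", "ex:doc3"]
print(len(one_node(docs)), len(two_nodes("ex:anno", docs)))
```

With a single serialization the two-node model costs one triple more; under these assumptions it becomes cheaper as soon as the annotation is re-published.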

Daring an analogy with the infamous Batman example [1], having two resources suggests a Batman vs. Bruce Wayne distinction, while having annotations as serializations hints that the relevant grain is the Batman of every slice of time (a discrete time, though ;-) )


> * We still need equivalentTo for when there is any change to the Annotation, such as taking an embedded resource and publishing it with an HTTP URI.

Yes, but at least now equivalentTo would be used to bridge gaps that are more significant from a "business" perspective (e.g., when the structure of an annotation changes).

Antoine


[1] http://blog.soton.ac.uk/webteam/2010/09/02/the-modeler/


> Hi Antoine, all,
>
> I don't think it's feasible for the current version to change this from how it stands to mandating the identification (or at least modeling with a blank node) of both the concept and the document separately. It was discussed at length at the first CG face to face as well as explicitly rejected in OAC previously. I cannot speak for AO, Annotea, Annotator, Pundit or other existing systems in terms of design choice and discussion, but the vast majority that we looked at also did not have two nodes.
>
> My proposal is to:
> * Add an editor's note to the spec in the Provenance section saying that it is under discussion, and especially if the Named Graphs specs reach maturity, it may change in the future.
> * Increase the detail in the Provenance Mapping Appendix to allow it to be used in practice to identify the Concept separately from the Documents, to enable systems that do have a requirement for this to play in both worlds.
>
> To attempt to summarize the issues:
>
> The advantages of two nodes:
> * Explicitly distinguishes the concept and the document that describes the concept (see ORE Aggregation vs Resource Map)
> * Could reuse the same existing predicates attached to the two nodes, eg dcterms:creator and dcterms:created, instead of inventing our own to distinguish concept and document
> * Likely to fit better in the future if and when Named Graphs are properly standardized
> * Could avoid equivalentTo in place of owl:sameAs for the republishing case of simple format transformation (but not if the graph changes)
>
> However:
> * You still need two identifiers for the Annotation concept, due to the trust and dereference issues.
> * There's significant cost in terms of 303 implementation and URI maintenance. You can't just push an Annotation document up on a website and be done with it, you really need some publishing infrastructure to do the right thing.
> * There's a not insignificant cost in additional number of triples.
> * We still need equivalentTo for when there is any change to the Annotation, such as taking an embedded resource and publishing it with an HTTP URI.
> * It's still likely to change in the future, given Named Graphs and so forth.
>
> In other words, it's again the choice between preciseness of modeling and ease of implementation.
>
> To put my cards on the table, I'm personally in favor of the two node approach, as it seems significantly cleaner and number of new identifiers/triples has not been a fundamental axis for design decisions. Or at least until the textual body discussion :)
>
> Rob
>
Received on Thursday, 10 January 2013 21:18:40 UTC