Re: Annotation Concept vs Document (was Level 1 comments)

On 1/9/13 6:05 PM, Robert Sanderson wrote:
> On Wed, Jan 9, 2013 at 7:44 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>     I really don't like that an annotation resource is in fact denoting a serialization.
>     This puts a big burden on recognizing an annotation after it has passed a data conversion step, which will happen quite often in the kind of interoperability scenarios you're after.
>
>
> Yes, and hence Section 3.3.2. about expressing the equivalence of resources, including Annotations.
>
>     There is a need for representing an annotation as a more abstract business object, which is "created" by people or smart agents. Of course I understand the need for requirements on the provenance of documents and data sources, but that seems quite distinct (and to me, much less important).
>
>
> But there isn't such a need for Bodies, which is where the actual content is? It seems inconsistent not to want an identifier for the Body, but to want an identifier for the concept of the Annotation as well as one for the Annotation Document.


I never said I don't want an identifier for the body. I said that for some bodies I don't feel an identifier is needed. That's quite different.
Plus, yes, I don't feel an identifier for an annotation is always needed (even though that's not good practice, really). For now I'm just arguing about the pattern and type.

>
> Could you give some use cases when you would need to really distinguish the two, that aren't covered by equivalentTo?


This is quite an absurd question. Of course equivalentTo will cover most cases, it would be a real pity otherwise. I'm arguing it's not covering them in a very effective manner, and the whole idea of considering annotations firstly as serializations is moot.

Let's consider the scenario where a complex annotation ann1 (with hasBody, hasTitle, creator, etc.) is issued by one system and re-used by another, which re-publishes it in a different packaging/serialization. In this case the two systems publish the same 'annotation-as-concept'; only the serialization data (10-20% of all triples, maybe less if there are specific resources without identifiers to re-use) changes. But because the current OA dictates that annotations are serializations, the second system is forced to re-issue an identifier (ann2) for it.
The problem is how to handle the duplication. Should all triples of ann1 be attached to ann2, too? Where should the duplication stop (hasBody is duplicated, serializedAt is not, but what about the others)?
oa:equivalentTo allows in principle this duplication to happen. But:
- the current semantics fall quite short on specifying how it should happen. And quite logically: it would be a pity if a domain-specific vocabulary like OA ends up fully specifying what should be specified at another, more fundamental level (how to handle equivalence between resources).
- the fact that there would be such massive triple duplication is quite a bad thing.
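
A minimal sketch of the duplication scenario above, in plain Python with triples as tuples. All ex: identifiers, the timestamps, and the split between "concept" and "serialization" properties are assumptions made for illustration; OA does not define that split, which is exactly the gap being discussed.

```python
# Hypothetical triples for ann1; every identifier here is illustrative only.
ANN1 = "ex:ann1"
ANN2 = "ex:ann2"

ann1_triples = {
    (ANN1, "oa:hasBody", "ex:body1"),
    (ANN1, "oa:hasTarget", "ex:target1"),
    (ANN1, "oa:annotatedBy", "ex:alice"),
    (ANN1, "oa:annotatedAt", "2013-01-09T18:05:00Z"),
    (ANN1, "oa:serializedBy", "ex:systemA"),
    (ANN1, "oa:serializedAt", "2013-01-09T18:06:00Z"),
}

# Assumed list of serialization-level properties (not specified by OA).
SERIALIZATION_PROPS = {"oa:serializedBy", "oa:serializedAt"}


def republish(triples, old, new, serialized_by, serialized_at):
    """Copy concept-level triples from old to new and add fresh serialization data."""
    copied = {
        (new, p, o)
        for (s, p, o) in triples
        if s == old and p not in SERIALIZATION_PROPS
    }
    fresh = {
        (new, "oa:serializedBy", serialized_by),
        (new, "oa:serializedAt", serialized_at),
        (new, "oa:equivalentTo", old),
    }
    return copied | fresh


ann2_triples = republish(
    ann1_triples, ANN1, ANN2, "ex:systemB", "2013-01-10T08:00:00Z"
)
# Everything in ann2 except the three fresh triples is a copy of an ann1 triple.
duplicated = len(ann2_triples) - 3
print(f"{duplicated} of {len(ann1_triples)} ann1 triples duplicated under {ANN2}")
```

In this toy case only the two serialization triples differ; everything else has to be restated under the new identifier, and a consumer still has to guess which properties belong on which side.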


>
>     I still think that respecting the one-to-one principle is important in these matters: attributing statements (e.g., oa:serializedBy and oa:hasTarget) to one URI, while they belong to different levels, can be very confusing in a paradigm (the Semantic Web one) that expects this kind of mixture not to happen.
>
>
> I agree ... except that it happens all the time, as people take short cuts like this in RDF when they're not considered dangerous.


Yes, that's why I did not make such a fuss about options 2-3.


>
>     Actually I'd be curious to hear about the feedback you received on ORE ResourceMap. I personally don't think it was technically so bad.
>
> Neither did we, of course :) The feedback I recall was:
>
> * From the non LOD community, the mandatory 303 was seen as unnecessarily complex to implement for little visible gain.
> * From the LOD community, the fact that we required information be attached to the Resource Map, which they saw as a temporary artifact only needed to serialize the graph for a particular HTTP transaction, was seen as unnecessarily complex.
> * And of course, the never-ending HttpRange14 arguments.


Ok. Note that Named Graphs may actually solve some of these: as far as I know, they can represent documents/serializations, i.e., have identifiers that do not force a 303.
(btw I suppose we can also use hash ids in some cases to alleviate this 303 issue).


>
>     To me a good way for handling solution 1 would be for OA to just coin the properties serializedAt and serializedBy and *defer to other 'data provenance' proposals* (NGs, ResourceMaps, PROV...) for how to use them, i.e., on which resource exactly to attach them. Of course we could provide a couple of examples as guidance.
>
> We could formalize Appendix A (Mapping to W3C Provenance Model) further and introduce actual classes for the Annotating and Serializing activities, Annotation entity as opposed to the Serialization entity, etc. However I am very wary of introducing multiple ways to express the same information, and the current solution seems (to me) to be the easiest compromise.
> On the other hand, so long as it was additional information, systems that need the division could add it, and systems that don't would ignore it. So the Annotation would have oa:annotatedAt/By and oa:serializedAt/By for the majority of cases, and might also have identities provided for the additional PROV entities and activities.
>
>
>     On option 2 or 3: I trust that if there's one resource, then it should mainly denote the more abstract annotation, not the serialization. I think this has fewer pitfalls for interoperability between applications. If you're searching for a justification: just imagine the kind of horrible questions data consumers will ask about the semantics of oa:equivalentTo! (whether or not higher-level statements like oa:hasBody or oa:hasTarget should be propagated across equivalent annotations -- I believe they should).
>
>
> I would have thought the other way around. That it's easier to assert equivalence between documents than it is between concepts.


Come on, the notion of serialization is even lower-level than that of most documents. Let me add a syntactically meaningless space in one JSON or XML representation of an annotation, and re-publish it. I'd have to consider it a new serialization (it's not the same number of bytes, and it's generated at a different time) and have to use equivalentTo (or have others scratch their heads trying to infer one), and then all the triples for the annotation concept would have to be duplicated again by me or the data re-user.
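
To illustrate the whitespace point, here is a small sketch using only the Python standard library. The JSON keys and ex: identifiers are made up for the example and are not the OA JSON serialization.

```python
import hashlib
import json

# Two hypothetical JSON representations; the second has one extra space.
original = '{"@id": "ex:ann1", "hasBody": "ex:body1", "hasTarget": "ex:target1"}'
republished = '{"@id": "ex:ann1",  "hasBody": "ex:body1", "hasTarget": "ex:target1"}'


def digest(text):
    """Hash the exact bytes of a serialization."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Byte-level identity: what "annotation as serialization" compares.
same_bytes = digest(original) == digest(republished)
# Content-level identity: what "annotation as concept" compares.
same_content = json.loads(original) == json.loads(republished)

print("same bytes:", same_bytes)
print("same content:", same_content)
```

The two strings differ as byte sequences but parse to the same structure, so a model that identifies the annotation with its serialization sees two resources where a concept-level model sees one.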


> In fact the concept should only ever have one URI, or can be trivially owl:sameAs other identical concepts. The metadata about the document is what prevents us from using owl:sameAs in the first place, rather than oa:equivalentTo.


Yes, precisely: we have to coin a new property next to the 10-or-so equivalence properties that already exist (owl:sameAs, etc.), and then scratch our heads over what the operational semantics of that thing would be (if there's no spec on how to transfer/duplicate the triples from one annotation to another, it's useless).


>
>     And we could keep the current pattern but updating the semantics of serializedBy to mean something like
>     "this resource [which is an 'abstract' annotation] has been serialized by X"
>     as opposed to "this serialization was carried out by X" as I understand the meaning of serializedBy now.
>     This property would become a kind of 'shortcut':
>     anAbstractAnnotation -serializedBy-> X
>     standing for the hypothetical path
>     anAbstractAnnotation -hasSerialization-> anAnnotationSerialization -createdBy-> X
>
>
> Yes, this is what the PROV mapping expresses.


No: the PROV mapping is
anAnnotationSerialization -oa:serializedBy-> X
stands for
anAnnotationSerialization -prov:wasGeneratedBy-> aSerializationActivity -prov:wasAssociatedWith-> X

By the way, the mapping to PROV of course introduces the notion of annotation-as-concept. My whole point is that it makes sense to have this annotation considered the main resource in the model, because that's the level where most of the crucial triples attached to the oa:Annotation actually lie.
Your choice of words and identifiers in the PROV mapping is quite revealing: Anno1 is an oa:Annotation, but then the PROV entity is also called <Annotation>...
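
The two readings of serializedBy discussed above can be sketched as two different expansions of the same shortcut triple. This is plain Python with triples as tuples; the ex: identifiers and the way the intermediate nodes are named are hypothetical, hasSerialization and createdBy are the placeholder property names from the quoted proposal, and prov:wasAssociatedWith is the PROV-O name of the association property.

```python
def expand_prov_mapping(serialization, agent):
    """Reading in the PROV mapping: the subject is the serialization itself."""
    activity = f"{serialization}#serializing"  # hypothetical activity node
    return [
        (serialization, "prov:wasGeneratedBy", activity),
        (activity, "prov:wasAssociatedWith", agent),
    ]


def expand_shortcut(annotation, agent):
    """Proposed reading: the subject is the abstract annotation."""
    serialization = f"{annotation}#serialization"  # hypothetical serialization node
    return [
        (annotation, "hasSerialization", serialization),
        (serialization, "createdBy", agent),
    ]


prov_path = expand_prov_mapping("ex:annoDoc1", "ex:systemA")
shortcut_path = expand_shortcut("ex:anno1", "ex:systemA")

for triple in prov_path + shortcut_path:
    print(triple)
```

In the first expansion the starting node is already the serialization; in the second, the serialization is an intermediate node hanging off the abstract annotation, which stays the main resource.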


>
>     Side question: I'd be curious to hear whether
>     oa:Annotation rdfs:subClassOf ore:Aggregation
>     holds for you (for me it does!)
>
> We tried that in OAC, you may not be surprised to hear. It ... was not well received.
> See: http://www.openannotation.org/spec/alpha2/#DM_Baseline and compare to /alpha and /beta


Well, if you already held the view that annotations are serializations, it's no surprise that a mapping to ore:Aggregation (aggregations being rather abstract beasts) was not well received!

Antoine

Received on Thursday, 10 January 2013 08:38:35 UTC