Re: Annotation Concept vs Document (was Level 1 comments) from Robert Sanderson on 2013-01-15 (public-openannotation@w3.org from January 2013)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 15 Jan 2013 16:58:38 -0700
To: public-openannotation <public-openannotation@w3.org>
Message-ID: <CABevsUFGhW2RuK1XsCZ4QODN7wb7VbSxXuCZB_L6WCVtFg5DLA@mail.gmail.com>
To report back explicitly about this topic...

Paolo, Herbert, Antoine and I looked through the specification
carefully over the weekend to see what impact this change would have.
We determined that having the URI of the Annotation primarily identify
the concept rather than the serialization would not drastically change
the interpretation or use of the model.

The significant changes were in the mapping to the PROV model, which
now introduces the Annotation Document rather than the Annotation
Concept, and in the wording of several sections. The
serializedBy/serializedAt shortcuts remain in the specification, so
we have a one-node model with shortcuts rather than one that
explicitly requires two nodes to discuss the concept separately from
the serialization. We also discussed the possibility of a three-node
model (!) consisting of Annotation Concept, Annotation Graph and
Annotation Document... which was quickly accepted as both ultimately
correct and the most annoying :)
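To make the "one-node model with shortcuts" concrete: Antoine's message quoted below reads oa:serializedBy as standing for a path through a separate serialization node. The following is a minimal Python sketch of that reading, assuming triples are plain tuples. The ex:hasSerialization, ex:createdBy and ex:createdAt properties, and minting the document URI by appending "#document", are hypothetical illustrations, not terms from the specification.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from each shortcut to the property it stands for on a
# separate serialization (document) node. The ex: names are illustrative only.
SHORTCUTS = {
    "oa:serializedBy": "ex:createdBy",
    "oa:serializedAt": "ex:createdAt",
}


def expand_shortcuts(graph: List[Triple], doc_suffix: str = "#document") -> List[Triple]:
    """Rewrite a one-node graph into an explicit two-node (concept + document) graph.

    Shortcut triples move onto a document node linked from the concept node;
    all other triples (e.g. oa:hasTarget) stay on the concept node unchanged.
    """
    expanded: List[Triple] = []
    linked = set()  # concept nodes that already have a hasSerialization link
    for s, p, o in graph:
        if p in SHORTCUTS:
            doc = s + doc_suffix  # hypothetical URI minting for the document node
            if s not in linked:
                expanded.append((s, "ex:hasSerialization", doc))
                linked.add(s)
            expanded.append((doc, SHORTCUTS[p], o))
        else:
            expanded.append((s, p, o))
    return expanded


ANNO = "http://example.org/anno1"  # hypothetical annotation URI
one_node = [
    (ANNO, "rdf:type", "oa:Annotation"),
    (ANNO, "oa:hasTarget", "http://example.org/page1"),
    (ANNO, "oa:serializedBy", "http://example.org/softwareX"),
    (ANNO, "oa:serializedAt", "2013-01-15T16:58:38-07:00"),
]

for triple in expand_shortcuts(one_node):
    print(triple)
```

The point of the sketch is that the expansion is mechanical: a consumer who cares about the concept/document distinction can recover the two-node form, while everyone else works with a single node.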

We feel that this addresses the concerns in the thread below, but
would encourage everyone to read the new draft with a critical eye in
this regard.

Many thanks!

Rob


On Wed, Jan 9, 2013 at 8:16 AM, James Smith <jgsmith@gmail.com> wrote:
> I think I have some of the same discomfort Antoine is voicing w.r.t. serialization and annotation. Worrying about the serialization as a fundamental component of an annotation's identity strikes me as more of a document model problem than a data model problem.
>
> I can see where URI-A and URI-B denote different annotations because they are two different URLs, but I have a hard time understanding why URI-A as RDF/XML and URI-A as JSON-LD should represent two different annotations. They are simply representations of the same underlying resource named URI-A. The resource happens to be an OA annotation.
>
> We've seen this in other areas. For example, with images, we might have a URL that points to the image resource but have the image available in various serializations such as JPEG, PNG, or GIF. There are times when we will need to annotate the image as an image regardless of serialization, but there can be times when we will need to annotate the image in a particular serialization (e.g., the JPEG and PNG are fine, but the GIF has an artifact). OA provides a way to distinguish these cases.
>
> We need both kinds of addressability. I don't see why we can't treat annotations as just another resource when we use them as targets/bodies.
>
> Or am I totally missing the point?
>
>
> On Jan 9, 2013, at 9:44 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi Rob,
>>
>> Ouch. We're coming to a comment I'd have made only while reviewing 3.2.3!
>>
>> I really don't like that an annotation resource is in fact denoting a serialization.
>> This puts a big burden on recognizing an annotation after it has passed a data conversion step, which will happen quite often in the kind of interoperability scenarios you're after.
>> There is a need for representing an annotation as a more abstract business object, which is "created" by people or smart agents. Of course I understand the need for requirements on the provenance of documents and data sources, but that seems quite distinct (and, to me, much less important).
>>
>> I still think that respecting the one-to-one principle is important in these matters: attributing statements (e.g., oa:serializedBy and oa:hasTarget) to one URI, while they belong to different levels, can be very confusing in a paradigm (the Semantic Web one) that expects this kind of mixture not to happen.
>>
>> Actually, I'd be curious to hear about the feedback you received on the ORE ResourceMap. I personally don't think it was technically so bad. My guess is that the negative feedback might have been motivated by the very act of trying to meet a very general requirement (data sources) within a vocabulary designed for a more specific requirement (aggregations), especially at a time when there were other approaches (SPARQL named graphs) being devised. OK, NGs were not a standard then and are still not. But I understand why some people wanted to avoid the emergence of another proposal, possibly difficult to reconcile with NGs, just as they were embarking on bringing NGs to the next level.
>>
>>
>> To me, a good way of handling solution 1 would be for OA to just coin the properties serializedAt and serializedBy and *defer to other 'data provenance' proposals* (NGs, ResourceMaps, PROV...) for how to use them, i.e., on which resource exactly to attach them. Of course we could provide a couple of examples as guidance.
>> I suppose you will not like it, but it's quite legitimate given that the solutions at hand are not mature or consensual yet. The community could sort out later which is the best solution.
>> It could also be that different (sub) communities stick to different options. But that can be ok as well: perhaps there is one solution which is perfect for RDF but horrible for another...
>>
>>
>>
>> On option 2 or 3: I trust that if there's one resource, then it should mainly denote the more abstract annotation, not the serialization. I think this has fewer pitfalls for interoperability between applications. If you're searching for a justification, just imagine the kind of horrible questions data consumers will ask about the semantics of oa:equivalent! (whether or not higher-level statements like oa:hasBody or oa:hasTarget should be propagated across equivalent annotations; I believe they should).
>>
>> And we could keep the current pattern but update the semantics of serializedBy to mean something like
>> "this resource [which is an 'abstract' annotation] has been serialized by X"
>> as opposed to "this serialization was carried out by X" as I understand the meaning of serializedBy now.
>> This property would become a kind of 'shortcut':
>> anAbstractAnnotation -serializedBy-> X
>> standing for the hypothetical path
>> anAbstractAnnotation -hasSerialization-> anAnnotationSerialization -createdBy-> X
>>
>>
>> Side question: I'd be curious to hear whether
>> oa:Annotation rdfs:subClassOf ore:Aggregation
>> holds for you (for me it does!)
>>
>>
>>
>> Cheers,
>>
>> Antoine
>>
>>
>>>
>>> Dear all,
>>>
>>> To pick up on one of Antoine's comments in particular:
>>>
>>> On Sun, Jan 6, 2013 at 8:47 AM, Antoine Isaac <aisaac@few.vu.nl <mailto:aisaac@few.vu.nl>> wrote:
>>>
>>>
>>>    2. "An Annotation is the expression of a relationship between two or more resources in the form of a serialized graph."
>>>    I find this confusing. Serialization is a representation in one syntax. This hints that an annotation serialized in RDF/XML is not the same as an annotation serialized in Turtle... I would remove "in the form of a serialized graph".
>>>
>>>
>>> That is actually the exact intent. An Annotation is a document, which necessarily has a serialization. Therefore the RDF/XML serialization of a graph, from URI-A, is a different "Annotation" from the same graph serialized in JSON-LD from URI-B.
>>>
>>> This is to avoid having to have multiple nodes, one identifying the Annotation and the other identifying the serialization. This was met with large rounds of disdain from the Linked Data community when it was done in the Open Archives ORE spec (Conceptual Aggregation vs Resource Map Document) and necessitates the use of the 303 redirect paradigm.
>>>
>>> The two options considered:
>>>
>>> (1) Have multiple nodes. One for the serialization, one for the Annotation concept.
>>> Costs:
>>> * Have to mint and maintain two identifiers. People don't like doing this. Look at the "textual body" discussion!
>>> * Have to have a 303 redirection service
>>> * Have to include both in the graph in order to have the serializedBy/serializedAt information
>>> * Have to have specific instructions as to what to refer to, Concept or Serialization, in further Annotations
>>>
>>>
>>> (2) Have a single identity that represents the serialization
>>> Costs:
>>> * Have to either explain the issue in detail to people who probably don't care, or gloss over it and hope Antoine doesn't notice :)
>>> * Have to have serializedBy/At and annotatedBy/At to properly maintain the provenance information
>>>
>>> We figured that option (2) was the lesser of the two evils.
>>>
>>> The hypothetical option (3) is to have a single identity that represents the concept, but it would be much harder to justify why you get a representation from a concept.
>>>
>>> Our proposed solution is to keep the text in the introduction as is, but explain the situation further in the Provenance section for people who care about it.
>>>
>>> Rob & Paolo
>>>
>>>
>>
>>
>
>
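Option (1) in the quoted thread lists a 303 redirection service among its costs. As a rough illustration of that extra moving part, here is a minimal Python sketch assuming a hypothetical example.org layout: the concept URI never returns content itself but answers 303 See Other, pointing at one document URI per serialization (RDF/XML or JSON-LD) chosen by a deliberately naive Accept-header match that ignores q-values. The URIs, the DEFAULT_TYPE fallback and the resolve function are hypothetical, not part of the OA specification.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: one concept URI, one document URI per serialization.
CONCEPT_URI = "http://example.org/anno1"
DOCUMENTS: Dict[str, str] = {
    "application/rdf+xml": "http://example.org/anno1.rdf",
    "application/ld+json": "http://example.org/anno1.jsonld",
}
DEFAULT_TYPE = "application/ld+json"  # assumed fallback when nothing matches


def resolve(uri: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a GET on uri.

    The concept URI is answered with 303 See Other pointing at a document;
    document URIs are served directly (200); anything else is 404.
    Accept parsing takes media types in listed order and ignores q-values.
    """
    if uri == CONCEPT_URI:
        for media_type in (t.split(";")[0].strip() for t in accept.split(",")):
            if media_type in DOCUMENTS:
                return 303, DOCUMENTS[media_type]
        return 303, DOCUMENTS[DEFAULT_TYPE]
    if uri in DOCUMENTS.values():
        return 200, None
    return 404, None


print(resolve(CONCEPT_URI, "application/rdf+xml"))  # (303, 'http://example.org/anno1.rdf')
print(resolve("http://example.org/anno1.rdf", "*/*"))  # (200, None)
```

Under this layout the RDF/XML and JSON-LD documents are distinct resources with their own URIs, yet both are reached from, and describe, the same concept URI; that is the distinction the two-node option makes explicit and the one-node option folds into shortcuts.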
Received on Tuesday, 15 January 2013 23:59:06 UTC