RE: [model] Clarifying annotation architecture

 

 

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Tuesday, August 04, 2015 4:43 AM
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Tim Cole <t-cole3@illinois.edu>; Frederick Hirsch <w3c@fjhirsch.com>; W3C Public Annotation List <public-annotation@w3.org>
Subject: Re: [model] Clarifying annotation architecture

 

 

> On 04 Aug 2015, at 00:30 , Robert Sanderson <azaroth42@gmail.com> wrote:

> 

> 

> Thanks Tim!

> 

> On Mon, Aug 3, 2015 at 1:45 PM, Timothy Cole <t-cole3@illinois.edu> wrote:

> I remain unconvinced that assigning a motivation or role to an Embedded Textual Body that has not been assigned a URI in the Annotation is bad RDF.

> 

> 

> In my mind, if you don't give an Embedded Textual Body a URI (i.e., if you let it default to a blank node) then that Embedded Textual Body will not be reused outside the current Annotation, and therefore the role assigned to the Embedded Textual Body in the context of the current annotation is the only role which this particular instance of the string will ever have.

> 

> 

> That's true if you're the server providing the Annotation with a blank node body ... but on receipt of an Annotation, servers MUST assign a URI to the annotation, and MAY assign URIs to all other nodes. Suddenly your blank node becomes a real resource and can be referenced from outside of the annotation.

 

Hm. Why? I think that would be an error.

 

So on the last call I advocated for making the MAY into a SHOULD, at least for oa classes in the annotation lacking URIs (I am not always known for my consistency). The rationale was to facilitate interoperability by providing a way to reference bodies and targets that otherwise would not have identifiers because the client had no good way to mint them. But clearly there are tradeoffs. And, as alluded to, it may make a difference how the URI is minted.

 

We also have some interesting language in the Model right now in section 4.1: "If it is not considered important to allow other Annotations or systems to refer to the Specific Resource, then a blank node may be used instead." Rob, will this be part of what we look at under the new issue you added to address whether parts of 4.1 belong in the Protocol rather than the Model?

 

I am not sure why a URI must be assigned to a blank node body. However, if indeed it must, then (just as in any other RDF environment) I believe it is a requirement that URI-s assigned to a body must be unique. See also what the RDF 1.1 concept document says about this:

 

[[[

In situations where stronger identification is needed, systems MAY systematically replace some or all of the blank nodes in an RDF graph with IRIs. Systems wishing to do this SHOULD mint a new, globally unique IRI (a Skolem IRI) for each blank node so replaced.

]]] http://www.w3.org/TR/rdf11-concepts/#section-skolemization

 

There is also a skolemization approach described in that section. We can even require a MUST instead of a SHOULD for this.
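
To make the quoted text concrete, here is a minimal sketch of what such a replacement could look like on the server side. It is only an illustration under stated assumptions: triples are plain Python tuples, blank nodes are strings starting with "_:", the Literal class, the skolemize function and the example.org base are all hypothetical names of mine, and the body text is made up. The "/.well-known/genid/" path is the one the RDF 1.1 Concepts section suggests for Skolem IRIs.

```python
import uuid

# Hypothetical base; a real server would use its own authority here.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


class Literal(str):
    """Marks a string as an RDF literal rather than an IRI or blank node."""


def is_blank(term):
    # Assumption of this sketch: blank nodes are non-literal strings
    # written with the usual "_:" prefix.
    return not isinstance(term, Literal) and term.startswith("_:")


def skolemize(triples, base=SKOLEM_BASE):
    """Replace every blank node with a freshly minted, unique IRI.

    Within one call the same blank node label always maps to the same
    IRI, so the shape of the graph is preserved.
    """
    mapping = {}

    def replace(term):
        if not is_blank(term):
            return term
        if term not in mapping:
            mapping[term] = base + uuid.uuid4().hex
        return mapping[term]

    # Only subjects and objects can be blank nodes; predicates are kept.
    result = [(replace(s), p, replace(o)) for s, p, o in triples]
    return result, mapping


# An annotation whose embedded body was sent as a blank node.
annotation = [
    ("http://example.org/anno1", "oa:hasBody", "_:b0"),
    ("_:b0", "rdf:value", Literal("Nice point")),
]

skolemized, mapping = skolemize(annotation)
```

Because each call mints new UUID-based IRIs, two different annotations can never end up sharing a body IRI by accident, which is the uniqueness requirement I mean above.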

 

Ivan

 

 

> 

> Relying on blank node anonymity is a dangerous route to follow, and cuts off options such as annotating part of the body.  For that you'd need a SpecificResource with a Source of the body URI, and a selector to say which part of the text you're talking about. And now we're back into assigning context-specific properties to URIs in the global space :(
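
As a rough illustration of the structure described in the quoted paragraph (a sketch only, not text from the Model): a second annotation that targets part of a body needs an IRI for that body. The function name, the example.org IRI and the JSON-LD-like dictionary layout below are hypothetical; oa:SpecificResource, oa:hasSource, oa:hasSelector and oa:TextPositionSelector are the Open Annotation terms I am assuming.

```python
def select_part_of_body(body_iri, start, end):
    """Build a SpecificResource that points at characters start..end of a body."""
    if body_iri.startswith("_:"):
        # A blank node label has no meaning outside its own graph,
        # so another annotation cannot use it as a source.
        raise ValueError("body needs an IRI to be referenced from outside")
    return {
        "@type": "oa:SpecificResource",
        "oa:hasSource": {"@id": body_iri},
        "oa:hasSelector": {
            "@type": "oa:TextPositionSelector",
            "oa:start": start,
            "oa:end": end,
        },
    }


# Hypothetical IRI that a server assigned to a formerly blank body.
target = select_part_of_body("http://example.org/anno1/body", 0, 10)
```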

> 

>  Rather than try to correct and augment these examples in this email thread, the way to go *if there's enough interest* is to create a wiki page or the like where people can argue the details, correct mistakes and augment with additional examples. Would this be worthwhile to do?

> 

> 

> Yes I think so :)

> 

> Deleting the examples, and pulling out the questions:

> 

> 

> The biggest drawback here is that I now have 2 separate annotations that were really created together by the agent. We've had discussions about Annotation Sets, and though not accommodated by our current model or protocol drafts, implementers are using Annotation Sets in practice.  LDP patterns offer a way to group or cluster annotations, albeit not all that efficiently and requiring knowledge of Direct Containers (or Indirect Containers), an LDP feature that not everyone is choosing to implement.

> 

> 

> The issue here is that then the client needs to be able to create the Container as well as the Annotations.  I think that's quite a step up in terms of implementation requirements, as clients could then create arbitrary structures.  If they can create Direct or Indirect Containers, there would be a LOT of sanity checking required. (e.g. I create a Direct Container that adds junk into _your_ Annotation Set :S )

> 

> As much as I'm a proponent of LDP, the structure seems less amenable to client side manipulation compared to the management of resources within existing structures.

> 

> 

> Each annotation (previously each body) can now be referenced on its own, which is good for many use cases. One glaring limitation -- LDP does not currently provide a way (that I know of) to post or get all of 3A, 3B, and 3C in a single HTTP request. Possibly we could define a best practice. There is also a 4th resource (the LDP Direct Container) involved which is not shown. This may be too much overhead. Still the annotation set concept is useful and is being used. There are other (non-LDP) ways to do Annotation Sets. Do we want to make any accommodation for Annotation Sets (e.g., add a class and predicate to our data model context)? If we did, this option might become more attractive. Note that in an editing scenario, subsequent annotations, e.g., feedback from the author on the proposed edit, can be added to the set.

> 

> 

> There would need to be a profile that inlined all of the Annotations into the AnnotationSet as an additional requirement. Whether the Container is the AnnotationSet or not would need to be determined (e.g. a new Basic Container, not a Direct/Indirect Container).

> 

> The cross-overs with search and paging are also interesting questions.  In a search, do I get the Set or the individual Annotations? If there's 1000 Annotations in my set, do I have to do paging on it?  etc.

> 

> 

> 

>  ***Option 2 - body-level motivations***

> 

> 

> Graph 1 can be rewritten with Embedded Textual Bodies or with SpecificResource Bodies, allowing implementers to express additional properties of the bodies, such as role or motivatedBy (illustrated below). Graph 4 below, using Embedded Textual Bodies, is inconsistent with the argument you have made that to express the role of a body you must always have a Specific Resource (see above).

> 

> 

> Right, and see above for why I think even blank nodes here are a problem in a distributed ecosystem rather than a single interaction.  Graph 5 is the most verbose, but I believe the most correct.

> 

>  ***Option 3 - specializations of the hasBody predicate***

> 

> 

> This is probably the least extensible approach and creates problems for those not using RDF inferencing.

> 

> 

> Yes.

> 

> R

> 

> --

> Rob Sanderson

> Information Standards Advocate

> Digital Library Systems and Services

> Stanford, CA 94305

 

 

----

Ivan Herman, W3C

Digital Publishing Activity Lead

Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153

ORCID ID: http://orcid.org/0000-0003-0782-2704

 

 

 

 

Received on Tuesday, 4 August 2015 16:04:14 UTC