RE: My thoughts on the multi-body alternatives (as shown on Tim's wiki page) from Timothy Cole on 2015-08-17 (public-annotation@w3.org from August 2015)

From: Timothy Cole <t-cole3@illinois.edu>
Date: Mon, 17 Aug 2015 12:13:34 -0500
To: "'Ivan Herman'" <ivan@w3.org>, "'Robert Sanderson'" <azaroth42@gmail.com>
CC: "'W3C Public Annotation List'" <public-annotation@w3.org>
Message-ID: <004a01d0d910$43043e00$c90cba00$@illinois.edu>
Rob-

Focusing on your +0 for a role on the EmbeddedContent class and +1 for a role on the oa:SpecificResource class...

Now that EmbeddedContent is in our namespace (having replaced our prior reliance on the now-defunct Representing Content in RDF effort), I don't see any meaningful distinction between these classes that would make one more suitable than the other for attaching a role. Personally, I would be +1 for both of these patterns in JSON:

"body" : {
    "type" : "Specific",
    "source" : "http://example.org/body1.html",
    "role" : "describing"
}

"body" : {
    "type" : "Embedded",
    "value" : "I would be +1 for this.",
    "role" : "commenting"
}
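A minimal sketch, in Python, of how a plain JSON client might read the role from either pattern above in the same way. It assumes only the key names used in the snippets ("type", "source", "value", "role"); the helper name `body_role` and the decision to reject a body with no explicit "type" are illustrative assumptions, not part of any draft.

```python
# The two proposed body patterns as Python dicts (key names taken from the snippets above).
specific_body = {
    "type": "Specific",
    "source": "http://example.org/body1.html",
    "role": "describing",
}
embedded_body = {
    "type": "Embedded",
    "value": "I would be +1 for this.",
    "role": "commenting",
}


def body_role(body: dict) -> tuple[str, str]:
    """Return (type, role) for a body, treating both classes the same way.

    Hypothetical helper: it insists on an explicit "type" (no implicit typing)
    and checks that each type carries the key it depends on.
    """
    body_type = body.get("type")
    if body_type == "Specific" and "source" not in body:
        raise ValueError("a Specific body needs a source")
    if body_type == "Embedded" and "value" not in body:
        raise ValueError("an Embedded body needs a value")
    if body_type not in ("Specific", "Embedded"):
        raise ValueError(f"missing or unknown type: {body_type!r}")
    return body_type, body["role"]


for b in (specific_body, embedded_body):
    print(body_role(b))
```

The same lookup serves both classes, which is the practical side of the argument that neither is more suitable than the other for carrying a role.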

My rationale (FWIW): I see the key characteristic of both classes as the ability to create, and give identity to (as needed), a resource required for a specific annotation. To my mind, that is what makes both of them suitable objects to which to attach properties specific to the annotation. The main substantive distinction is that one is limited to resources that can be expressed as strings (rdf:value), while the other is always derived from an existing resource (oa:hasSource). But although we introduce SpecificResource in the context of using only a segment or portion of a resource, SpecificResource can also be used effectively as a kind of proxy for a resource in its entirety (as we are discussing in connection with Role). Similarly, although we introduce EmbeddedContent in connection with text/plain bodies, this class can also be used for embedding text/html, text/xml, application/xml, image/svg+xml, etc., in fact anything that can be expressed as a string. For example, use XML to create an SVG meme and it can serve as the body of your annotation.
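To illustrate the point that EmbeddedContent is not limited to text/plain, here is a rough sketch that wraps an SVG string as an Embedded body. The "format" key, the sample SVG, and the helper `embedded_body` are hypothetical; the well-formedness check uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG "meme": any markup that can be written out as a string.
svg_markup = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">'
    '<text x="10" y="30">+1 for this</text>'
    "</svg>"
)


def embedded_body(value: str, media_type: str, role: str) -> dict:
    """Wrap string content as an Embedded body.

    The "format" key name is an assumption for illustration; the only
    requirement discussed is that the content be expressible as a string.
    """
    if media_type.endswith("xml"):
        ET.fromstring(value)  # raises ET.ParseError if the markup is not well-formed
    return {"type": "Embedded", "value": value, "format": media_type, "role": role}


body = embedded_body(svg_markup, "image/svg+xml", "commenting")
root = ET.fromstring(body["value"])
print(root.tag)  # {http://www.w3.org/2000/svg}svg
```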

Both may appear as blank nodes in an Annotation, but both may also be assigned a URI (though I tend to think this would not be the norm). That does mean, as you point out for EmbeddedContent resources, that we would be allowing a role to be assigned to a resource that could be reused. But I think the same is true for SpecificResource, even more so given the current language: "If the Specific Resource has an HTTP URI, then the exact segment of the Source resource that it identifies, and only the segment, must be returned when the URI is dereferenced." So if associating a role directly with an EmbeddedContent meme is wrong because it could be created with, or subsequently given, a dereferenceable URI, then I think the same is true for SpecificResource.
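The reuse concern can be made concrete by merging the triples of two annotations. This sketch assumes a hypothetical `ex:hasRole` predicate and made-up example URIs. When the role hangs directly off a shared, URI-identified resource, the merged graph gives that one resource two roles; when each annotation has its own intermediate node, the roles stay separate. If that intermediate node were given a URI and reused by a second annotation, the first case would return, which is the point above.

```python
VIDEO = "http://example.org/video1"
ROLE = "ex:hasRole"  # hypothetical role predicate


def roles_by_node(triples):
    """Group role values by the node they are attached to."""
    out = {}
    for s, p, o in triples:
        if p == ROLE:
            out.setdefault(s, set()).add(o)
    return out


# Case 1: role attached directly to the shared, URI-identified video.
anno1 = [("http://example.org/anno1", "oa:hasBody", VIDEO), (VIDEO, ROLE, "commenting")]
anno2 = [("http://example.org/anno2", "oa:hasBody", VIDEO), (VIDEO, ROLE, "questioning")]

# Case 2: role attached to a per-annotation node (SpecificResource or EmbeddedContent).
# Blank node labels are kept distinct, as they must be when merging separate documents.
anno3 = [
    ("http://example.org/anno1", "oa:hasBody", "_:sr1"),
    ("_:sr1", "oa:hasSource", VIDEO),
    ("_:sr1", ROLE, "commenting"),
]
anno4 = [
    ("http://example.org/anno2", "oa:hasBody", "_:sr2"),
    ("_:sr2", "oa:hasSource", VIDEO),
    ("_:sr2", ROLE, "questioning"),
]

merged_direct = roles_by_node(anno1 + anno2)  # the video ends up with both roles
merged_proxy = roles_by_node(anno3 + anno4)   # each proxy keeps exactly one role
print(merged_direct)
print(merged_proxy)
```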

Of course, my counter-argument would be that since both EmbeddedContent and SpecificResource are instantiated specifically for use in a particular annotation context, any reuse, while not explicitly discouraged, must account for the properties assigned to the resource when it was instantiated. This is certainly arguable, and I haven't come up with a good precedent yet, but my main point is that both classes are equally suitable (or equally unsuitable) for a hasRole property.

As an aside, if we do decide that SpecificResource and EmbeddedContent together are the right direction for resolving the role issue (my main concern here is that I don't like the idea of implicit typing in JSON-LD; I think we need to include the type explicitly in this situation), I think we should consider introducing EmbeddedContent and SpecificResource together in the data model. This would mean introducing SpecificResource before its first use in Section 4.1, where we begin talking about Specifiers. I think it would also be a good idea not to tie EmbeddedContent so closely to Textual Bodies, but rather to make clear that it can be used for just about any resource that can be expressed as a string.

Thanks,

Tim Cole

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Monday, August 17, 2015 12:30 AM
To: Robert Sanderson <azaroth42@gmail.com>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Subject: Re: My thoughts on the multi-body alternatives (as shown on Tim's wiki page)


> On 16 Aug 2015, at 21:08 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
> Thanks Ivan!  Replies inline.
> 
> On Sun, Aug 16, 2015 at 3:50 AM, Ivan Herman <ivan@w3.org> wrote
> 
>> Here are my (random) thoughts:
>> - I believe that the pattern
>>   "a" : {
>>      "b" : "something",
>>      "c" : "something else"
>>   }
>> is a fairly natural pattern in JSON. To be specific, the fact that
>> 
>> "body" : "This image is worth viewing on my desktop."
>> is transformed into something like
>> "body" : {
>>         "source": "This image is worth viewing on my desktop.",
>>         "role" : "commenting"
>> }
>> is not, as far as I can judge, shocking for a JSON user.
>> 
> That actually doesn't work as is (oa:hasSource must be a URI), but yes, something like that should not be too disturbing.
> 

Sorry. This is a mistake in the email, not on the wiki page.
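Ivan's string-to-object upgrade, with the literal moved to "value" as Rob's correction requires, can be sketched as follows. The helper name `with_role` is hypothetical.

```python
from typing import Union


def with_role(body: Union[str, dict], role: str) -> dict:
    """Upgrade a bare string body to an object carrying a role.

    Hypothetical helper. The text goes under "value", not "source",
    because oa:hasSource must point to a URI, not a literal.
    """
    if isinstance(body, str):
        return {"value": body, "role": role}
    return {**body, "role": role}


print(with_role("This image is worth viewing on my desktop.", "commenting"))
# {'value': 'This image is worth viewing on my desktop.', 'role': 'commenting'}
```

The object form is only one nesting level deeper than the string form, which is why it should not surprise a JSON user.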

>> Except when "b" or "c" is "id", this pattern translates perfectly well through JSON-LD to RDF: it becomes an anonymous blank node, i.e., one for which there is no attempt even to provide an identifier.
>> 
> Until such time as a server assigns a URI to a resource that was 
> formerly a blank node, via some sort of skolemization routine. Per: 
> http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes
> 
> This is recommended as a pattern by Bizer and Heath:  
> http://linkeddatabook.com/editions/1.0/#htoc16
> 
> "The scope of blank nodes is limited to the document in which they appear, meaning it is not possible to create RDF links to them from external documents, reducing the potential for interlinking between different Linked Data sources. In addition, it becomes much more difficult to merge data from different sources when blank nodes are used, as there is no URI to serve as a common key. Therefore, all resources in a data set should be named using URI references."
> 
> And by David Wood, Michael Hausenblas (et al), in their Linked Data book:
> 
> "You should note that many people avoid using blank nodes. Blank nodes 
> can cause some difficulty when you get them back in query results 
> because you can’t query them later. They don’t have a name, so you 
> can’t resolve them. For this reason, many people just make up URIs whenever they need to and avoid blank nodes altogether."
> 
> 
> So unless we propose that blank nodes MUST NOT be given URIs (and a very quick -1 to that, unless we also intend to require LDPatch, and another -1 to that), relying on resources staying blank nodes is a dangerous assumption, in my opinion.
> 

Let us not go into a discussion of the usage of blank nodes. Suffice it to say that, with all due respect to Chris and the others above, I do not agree with a rigid interpretation of these sentences. These are personal opinions anyway; there is no such thing as a Linked Data Standard.

It is not necessary to discuss this because, of course, a system MAY generate a URI for a blank node; any statement that required the usage of a blank node at a given point would be contrary to the RDF standard.
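For reference, the skolemization Rob mentions can be sketched as below. RDF 1.1 Concepts suggests minting Skolem IRIs under the `/.well-known/genid/` path; the triple encoding, the `example.org` authority, and the hypothetical `ex:hasRole` predicate are assumptions for illustration. After this step the formerly anonymous body has a URI, which is the situation Rob is worried about.

```python
import uuid

# Triples as (subject, predicate, object); "_:" marks a blank node label,
# literals are written with surrounding double quotes.
triples = [
    ("http://example.org/anno1", "oa:hasBody", "_:b0"),
    ("_:b0", "rdf:value", '"This image is worth viewing on my desktop."'),
    ("_:b0", "ex:hasRole", "commenting"),  # hypothetical role predicate
]


def skolemize(triples, authority="http://example.org"):
    """Replace each blank node label with a fresh, globally unique Skolem IRI.

    The same label always maps to the same IRI within one call.
    """
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
            return mapping[t]
        return t

    return [(term(s), p, term(o)) for s, p, o in triples], mapping


skolemized, mapping = skolemize(triples)
for t in skolemized:
    print(t)
```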

> 
>> In other words, I believe that using that pattern should be something we embrace.
>> - However, if the blank node is not anonymous, i.e., we *must* add an "id", then I think that is a problem. It forces the user either to mint a (fairly artificial) URI (e.g., a urn:XXXX) or to use the _:XXX pattern for a blank node ID. That makes the structure more complex, and forces a JSON user to use a notion (the blank node ID) which is far from obvious. I believe we should try to avoid that.
>> 
> Agree with not *requiring* an ID, but also to stress that we also shouldn't require that it never have an id.
> 

As I said, it would even be contrary to RDF to do so. I believe that in all our examples and encodings we should use the anonymous blank node and be silent on any ID, and let users and/or implementations decide whether to add a URI.

However, I believe we should create constructions that do not *require* a serialization to mint URIs or blank node identifiers; such constructions would impose a major cognitive load on JSON users who are not familiar with RDF constructs and notions.
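A simplified sketch of why the author never has to write a blank node identifier: a JSON-LD-style processor mints labels such as `_:b0` for nested objects that lack an "id". This ignores `@context`, uses the JSON keys directly as predicates, and treats strings starting with http:// or https:// as IRIs; all of that is a simplification, not real JSON-LD expansion. The annotation and target URIs are made-up examples.

```python
import itertools
import json


def flatten(node: dict, triples: list, counter) -> str:
    """Turn a nested JSON object into (subject, predicate, object) triples.

    Objects without an "id" get a processor-minted blank node label, so the
    author never writes one. Returns the subject used for this node.
    """
    subject = node.get("id") or f"_:b{next(counter)}"
    for key, value in node.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            obj = flatten(value, triples, counter)  # nested object becomes its own node
        elif isinstance(value, str) and value.startswith(("http://", "https://")):
            obj = value  # simplification: treat URL-looking strings as IRIs
        else:
            obj = json.dumps(value)  # everything else becomes a quoted literal
        triples.append((subject, key, obj))
    return subject


annotation = {
    "id": "http://example.org/anno1",
    "body": {"value": "This image is worth viewing on my desktop.", "role": "commenting"},
    "target": "http://example.org/image1.jpg",
}
triples = []
flatten(annotation, triples, itertools.count())
for t in triples:
    print(t)
```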

> With -1 meaning cannot live with, +0 being can live with if that's the general consensus, and +1 being strongly prefer...
> 
>> This is the reason that I have to agree with Doug that the 'role 
>> assignment' approach is probably way too complex for a JSON user, and 
>> we should drop it. This in spite of the fact that, from a Semantics 
>> point of view, it is certainly attractive (that is why I was in 
>> favour of it, originally). Sorry Ray:-)
>> 
> Agreed. While the role assignment approach can be ignored when it doesn't apply, and doesn't make assertions that aren't always true, it limits the generalization of the approach to tags, semantic tags and other situations where annotation-specific information must be associated with a resource. It's also more complex and surfaces the RDF blank node issue. So I'm also not in favor of the approach, but it's better than others.
> 
> Role Assignment:  +0

Actually, my vote would be '-1' for the reasons stated above.

> 
> 
>> - The subproperty approach seems to be very simple; the JSON
>> structure (see, e.g., [2]) is structurally very close to the
>> serialization without any role assignment (e.g., [3]). What worries me
>> the most is the proliferation of additional predicates, and the fact
>> that the environment (including in JSON) has to, in effect, implement
>> the subproperty relationship. It looks a bit like spaghetti code, and
>> may not be easy to extend.
>> 
> Agreed. Again, it doesn't break the RDF framework, and it might be argued that creating subproperties is in fact best practice, but we're trying to solve the problem for pure JSON clients, not for clients with a full RDF stack that could determine that xxx:hasReplacement is a subPropertyOf oa:hasBody. So again, in terms of fulfilling *all* of the requirements (must not break RDF, must be friendly to developers), it's not great.
> 
> SubProperties: +0

Again, my vote would be -1 (for the same reasons)
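To make the concern about clients having to implement the subproperty relationship concrete, here is a rough sketch of what a pure JSON client would need. Only `xxx:hasReplacement` comes from the thread; `xxx:hasComment` and `xxx:hasQuestion` are made up, and the lookup assumes the table has no cycles.

```python
# Hypothetical subPropertyOf table a plain JSON client would have to ship and maintain.
SUBPROPERTY_OF = {
    "xxx:hasReplacement": "oa:hasBody",
    "xxx:hasComment": "oa:hasBody",
    "xxx:hasQuestion": "xxx:hasComment",  # chains make the lookup recursive
}


def is_subproperty(prop, parent: str) -> bool:
    """True if prop equals parent or reaches it via subPropertyOf links (assumes no cycles)."""
    while prop is not None:
        if prop == parent:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def bodies(annotation: dict) -> list:
    """Collect every value whose key is oa:hasBody or one of its subproperties."""
    return [v for k, v in annotation.items() if is_subproperty(k, "oa:hasBody")]


anno = {
    "xxx:hasReplacement": "http://example.org/new.txt",
    "xxx:hasQuestion": "http://example.org/q.txt",
    "oa:hasTarget": "http://example.org/target",
}
print(bodies(anno))
```

Every new subproperty means another table entry on every client, which is the proliferation problem in miniature.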


> 
> 
>> - The 'role attached to a resource' and the 'role as a class' alternatives have a very similar structure when serialized (e.g., [4] and [5]). In fact, as I said, the current SemanticTag notion is already a representation of the 'role as a class' pattern. I must admit that I cannot see a big difference between the two; they look fairly similar to me, and I am not sure how I would choose between them. I can live with both.
>> 
> The JSON pattern is the one we want to adopt, I agree, but the devil is in the details.
> 
> Both, as stated, generate broken RDF when used with resources that have identity.  We explicitly made a change to the CG model to fix this exact issue for Semantic Tags in the FPWD, and this would revert that fix.  A video must not be given a class of oa:Comment in one Annotation and a class of oa:Question in another, which this model would require.
> 
> Role as a Class:  -1
> Role attached to _any_ Resource: -1
> 
> 

Rob, I am not sure what you are arguing against. All the examples, as created by Tim (I just tried to beautify them), avoid this issue; the patterns you describe are not the alternatives on the table.

> The embedded content resource, while it is a blank node, does not suffer from having its role conflated with it. However (as above), when it is given a URI, it falls into the same pattern as the video. As Ivan has already demonstrated (by putting a literal into hasSource), the confusion this would generate would be huge, particularly if we also remove types from the representation. We would need to explain when to use one pattern and when to use the other, thereby defeating the purpose of making the developer's life simpler.

Again, I do not know what you are arguing against (except that I made a mistake *in my mail*, not the wiki page).

> 
> So, I don't think it really meets the requirements of making things easier.  As soon as a server receives and transforms the pattern into the one needed for the non-blank-node resources, the client needs to now understand two patterns anyway.
> 
> Role attached to EmbeddedContent or SpecificResource: +0
> 
> 
> The role of the resource in the annotation is not a property of the resource, it is a relationship between the Annotation and the Resource.  Given that we don't want to do subproperties, there is only one possible method to use, which is to reify that relationship into a resource and a role.  This is (IMO) what Specific Resources and Motivations are, respectively.
> 
> A Specific Resource is the body or target -as it relates to the annotation-.  It's not the entire image, it's the segment identified by the Specific Resource and described by the selector.  It's not just the part of the image, it's the part of the image as identified by the Specific Resource, and described by the selector and the CSS Style. It's not every representation of the image, it's the JPG representation, as identified by the Specific Resource and described via the HTTP Request State.  It's not any role of the image, it's the role of tagging, as identified by the Specific Resource, and described by the Motivation.
> 
> As a hopefully illuminating historical note, Specific Resources were previously Constrained Resources, and Specifiers were Constraints [1]. This was because the selectors (etc.) constrain the scope of the resource. We changed the name to Specific for two reasons: the notion of X Specific Resource vs. X Generic Resource in Tim Berners-Lee's 2006 ontology [3], and the fact that constraint-based programming/reasoning is a very different thing. We also experimented with ORE Proxies for the same role [2] (which would have looked like role assignments) and discarded them for the same reasons as above.
> 
> [1] http://www.openannotation.org/spec/beta/#DM_Constraint
> [2] http://www.openannotation.org/spec/alpha2/#DM_Segments
> [3] http://www.w3.org/2006/gen/ont
> 
> So ... with -one consistent change- (allow Motivation to be associated with SpecificResource) we solve the problem in the desired tree hierarchy, for both body and target, without introducing new structure (role assignment) or opening the flood gates for new subproperties.  We solve the tagging inconsistency at the same time, for free.

Again, I am not sure what you are arguing against or for in this case. Can you please look at the wiki page and say what problems you have with those patterns?

I *think* the only difference between what you say and what is in the examples is that you seem to *require* that all resources used as the domain of the "role" attribute be typed as SpecificResource. I am neutral about this requirement in terms of the model, but I am '-1' on *requiring* the type to be stated explicitly in the serialization. Again, for a non-RDF user, the move from

"body" : "This image is worth viewing on my desktop."

to

"body" : {
 "value" : "This image is worth viewing on my desktop.",
 "role" : "commenting"
}

is easy to grasp, the additional type information, ie,

"body" : {
 "type" : "specific",
 "value" : "This image is worth viewing on my desktop.",
 "role" : "commenting"
}

would again be an obstacle.
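One way a consumer could accept Ivan's untyped form is to infer the class from which key is present. The rule below is a hypothetical sketch, and it is exactly the kind of implicit typing Tim is wary of: it only works while "value" and "source" never appear together.

```python
def infer_type(body: dict) -> str:
    """Guess a body's class when "type" is omitted.

    Hypothetical rule: an explicit "type" always wins; otherwise "value"
    implies Embedded and "source" implies Specific. Anything else is ambiguous.
    """
    if "type" in body:
        return body["type"]
    has_value, has_source = "value" in body, "source" in body
    if has_value and not has_source:
        return "Embedded"
    if has_source and not has_value:
        return "Specific"
    raise ValueError("cannot infer type: need exactly one of value/source")


print(infer_type({"value": "This image is worth viewing on my desktop.", "role": "commenting"}))
```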

> 
> Role attached to SpecificResource: +1
> 

As far as I am concerned, I am +1 for "Role attached to resources" but -1 if that is combined with a *requirement* to explicitly denote the resource as a Specific Resource. And, actually, I am 0 (or also +1) on the 'Role as Class' alternative, which I regard as almost a variant of roles attached to resources.

> 
>> A side issue, though: we should align, imho, the Semantic tagging structure to whichever we choose.
> 
> Agreed. And Tagging. If we can have a single consistent model, that would be great!
> 
> [Leaving out typing and multiplicity, which I think we should discuss 
> separately from roles]
> 

Agreed, although, as I said above, it may be an essential part of the picture.

Ivan

> Rob
> 
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Monday, 17 August 2015 17:18:02 UTC