Re: My thoughts on the multi-body alternatives (as shown on Tim's wiki page) from Ivan Herman on 2015-08-17 (public-annotation@w3.org from August 2015)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 17 Aug 2015 07:29:44 +0200
To: Robert Sanderson <azaroth42@gmail.com>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <1990C4EB-D968-4D77-9BD7-536F14DE6FC3@w3.org>
> On 16 Aug 2015, at 21:08 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
> Thanks Ivan!  Replies inline.
> 
> On Sun, Aug 16, 2015 at 3:50 AM, Ivan Herman <ivan@w3.org> wrote:
> 
>> Here are my (random) thoughts:
>> - I believe that the pattern
>>   "a" : {
>>      "b" : "something",
>>      "c" : "something else"
>>   }
>> is a fairly natural pattern in JSON. To be specific, the fact that
>> 
>> "body" : "This image is worth viewing on my desktop."
>> is transformed into something like
>> "body" : {
>>         "source": "This image is worth viewing on my desktop.",
>>         "role" : "commenting"
>> }
>> is not, as far as I can judge, shocking for a JSON user.
>> 
> That actually doesn't work as is (oa:hasSource must be a URI), but yes, something like that should not be too disturbing.
> 

Sorry. This is a mistake in the email, not on the wiki page.

>> Except when "b" or "c" is "id", this pattern translates perfectly well through JSON-LD to RDF: it is an anonymous blank node, i.e., one for which there is not even an attempt to provide an identifier.
>> 
> Until such time as a server assigns a URI to a resource that was formerly a blank node, via some sort of skolemization routine. Per: http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes
> 
> This is recommended as a pattern by Bizer and Heath:  http://linkeddatabook.com/editions/1.0/#htoc16
> 
> "The scope of blank nodes is limited to the document in which they appear, meaning it is not possible to create RDF links to them from external documents, reducing the potential for interlinking between different Linked Data sources. In addition, it becomes much more difficult to merge data from different sources when blank nodes are used, as there is no URI to serve as a common key. Therefore, all resources in a data set should be named using URI references."
> 
> And by David Wood, Michael Hausenblas (et al), in their Linked Data book:
> 
> "You should note that many people avoid using blank nodes. Blank nodes can cause
> some difficulty when you get them back in query results because you can’t query them
> later. They don’t have a name, so you can’t resolve them. For this reason, many people
> just make up URIs whenever they need to and avoid blank nodes altogether."
> 
> 
> So unless we propose that blank nodes MUST NOT be given URIs (and a very quick -1 to that, unless we also intend to require LDPatch, and another -1 to that), relying on resources staying blank nodes is a dangerous assumption, in my opinion.
> 

Let us not get into a discussion on the usage of blank nodes. Suffice it to say that, with all due respect to Chris and the others cited above, I do not agree with a rigid interpretation of these sentences. These are personal opinions anyway; there is no such thing as a Linked Data Standard.

It is not necessary to discuss this because, of course, a system MAY generate a URI for a blank node; any statement that would require a resource to remain a blank node at a given point would be contrary to the RDF standard.
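
To make that concrete, here is a minimal sketch (in Python) of what such a server-side step could look like. It is an illustration under assumptions, not a proposal: the base URI and the function name are hypothetical, the "value"/"role" keys follow the wiki examples, and the ".well-known/genid/" path is the one suggested for Skolem IRIs in RDF 1.1 Concepts.

```python
import uuid

# Hypothetical base URI; the ".well-known/genid/" path is the one that
# RDF 1.1 Concepts suggests for Skolem IRIs.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def skolemize(node, mint=lambda: uuid.uuid4().hex):
    """Return a copy of a JSON structure in which every object that
    lacks an "id" (i.e., every anonymous blank node) gets a minted URI."""
    if isinstance(node, list):
        return [skolemize(item, mint) for item in node]
    if not isinstance(node, dict):
        return node  # plain string or number: nothing to name
    result = {key: skolemize(value, mint) for key, value in node.items()}
    # Keep an existing "id"; mint one only when there is none.
    result.setdefault("id", SKOLEM_BASE + mint())
    return result


annotation = {
    "id": "http://example.org/anno1",
    "body": {
        "value": "This image is worth viewing on my desktop.",
        "role": "commenting",
    },
}
named = skolemize(annotation)
```

The client's original structure is left untouched; only the server's copy carries the minted identifiers, so the client never has to write one itself.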

> 
>> In other words, I believe that using that pattern should be something we embrace.
>> - However, if the blank node is not anonymous, i.e., if we *must* add an "id", that, I think, is a problem. It forces the user either to mint a (fairly artificial) URI (e.g., a urn:XXXX) or to use the _:XXX pattern for a blank node ID. That makes the structure more complex and forces a JSON user to deal with a notion (the blank node ID) which is far from obvious. I believe we should try to avoid that.
>> 
> Agree with not *requiring* an ID, but also to stress that we also shouldn't require that it never have an id.
> 

As I said, it would even be contrary to RDF to do so. I believe that in all our examples and encodings we should use the anonymous blank node and be silent on any ID, and let users and/or implementations decide whether they want to add a URI.

However: I believe we should create constructions that do not *require* a serialization to mint URIs or blank node identifiers; requiring them would impose a major cognitive load on JSON users who are not familiar with RDF constructs and notions.

> With -1 meaning cannot live with, +0 being can live with if that's the general consensus, and +1 being strongly prefer...
> 
>> This is the reason that I have to agree with Doug that the 'role assignment' approach is probably way too complex for a JSON user, and we should drop it. This in spite of the fact that, from a Semantics point of view, it is certainly attractive (that is why I was in favour of it, originally). Sorry Ray:-)
>> 
> Agreed. While the role assignment approach is able to be ignored when it doesn't apply and doesn't make assertions that aren't always true, it limits the generalization of the approach to tags, semantic tags and other situations where annotation specific information must be associated with a resource.  It's also more complex and surfaces the RDF blank node issue. So, I'm also not in favor of the approach, but it's better than others.
> 
> Role Assignment:  +0

Actually, my vote would be '-1' for the reasons stated above.

> 
> 
>> - The subproperty approach seems to be very simple; the JSON structure (see, e.g., [2]) is structurally very close to the serialization without any role assignment (e.g., [3]). What worries me the most is the proliferation of additional predicates, and the fact that the environment (including in JSON) has to, in effect, implement the subproperty relationship. It looks a bit like spaghetti code, and may not be obvious to extend.
>> 
> Agreed.  Again, it doesn't break the RDF framework, and it might be argued that it's in fact best practice to create subproperties, but we're trying to solve the problem for pure json clients, not clients with a full RDF stack that could determine that xxx:hasReplacement is a subPropertyOf oa:hasBody.  So again, in terms of fulfilling *all* of the requirements (must not break RDF, must be friendly to developers), it's not great.
> 
> SubProperties: +0

Again, my vote would be -1 (for the same reasons)
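
To illustrate the concern, here is a rough sketch (in Python) of what a pure JSON client without any RDFS inference would have to carry around. The key names and role names in the table are hypothetical; only xxx:hasReplacement was mentioned in this thread.

```python
# Hypothetical JSON keys standing in for subproperties of oa:hasBody,
# each mapped to the role it implies. A JSON client has to maintain
# this table by hand because it cannot infer rdfs:subPropertyOf.
SUBPROPERTY_OF_BODY = {
    "replacement": "replacing",
    "comment": "commenting",
}


def bodies(annotation):
    """Collect (role, body) pairs, treating every known key as a body."""
    found = []
    if "body" in annotation:
        found.append((None, annotation["body"]))
    for key, role in SUBPROPERTY_OF_BODY.items():
        if key in annotation:
            found.append((role, annotation[key]))
    return found


# "correction" is a new subproperty this client has never heard of.
annotation = {"comment": "Nice image.", "correction": "Fix the caption."}
result = bodies(annotation)
```

Here the "correction" body is silently dropped: every new subproperty means updating the table in every client, which is the extensibility problem.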


> 
> 
>> - The 'role attached to a resource' and the 'role as a class' patterns have a very similar structure when serialized (e.g., [4] and [5]). In fact, as I said, the current SemanticTag notion is already a representation of the 'role as a class' pattern. I must admit that I cannot see a big difference between the two; they look fairly similar to me, and I am not sure how I would choose between them. I can live with both.
>> 
> The JSON pattern is the one we want to adopt, I agree, but the devil is in the details.
> 
> Both, as stated, generate broken RDF when used with resources that have identity.  We explicitly made a change to the CG model to fix this exact issue for Semantic Tags in the FPWD, and this would revert that fix.  A video must not be given a class of oa:Comment in one Annotation and a class of oa:Question in another, which this model would require.
> 
> Role as a Class:  -1
> Role attached to _any_ Resource: -1
> 
> 

Rob, I am not sure what you are arguing against. All the examples, as created by Tim (I just tried to beautify them), avoid this issue; the patterns you describe are not among the alternatives on the table.

> The embedded content resource, while it is a blank node, does not suffer from having its role conflated with it.  However (as above) when it gets given a URI, it falls into the same pattern as the video. As Ivan has already demonstrated (by putting a literal into hasSource) the confusion that this would generate would be huge, and particularly if we also remove types from the representation. We would need to explain when to use one pattern and when to use the other, thereby defeating the purpose of making the developer's life simpler.

Again, I do not know what you are arguing against (except that I made a mistake *in my mail*, not the wiki page).

> 
> So, I don't think it really meets the requirements of making things easier.  As soon as a server receives and transforms the pattern into the one needed for the non-blank-node resources, the client needs to now understand two patterns anyway.
> 
> Role attached to EmbeddedContent or SpecificResource: +0
> 
> 
> The role of the resource in the annotation is not a property of the resource, it is a relationship between the Annotation and the Resource.  Given that we don't want to do subproperties, there is only one possible method to use, which is to reify that relationship into a resource and a role.  This is (IMO) what Specific Resources and Motivations are, respectively.
> 
> A Specific Resource is the body or target -as it relates to the annotation-.  It's not the entire image, it's the segment identified by the Specific Resource and described by the selector.  It's not just the part of the image, it's the part of the image as identified by the Specific Resource, and described by the selector and the CSS Style. It's not every representation of the image, it's the JPG representation, as identified by the Specific Resource and described via the HTTP Request State.  It's not any role of the image, it's the role of tagging, as identified by the Specific Resource, and described by the Motivation.
> 
> As a hopefully illuminating historical note, previously Specific Resources were Constrained Resources, and Specifiers were Constraints [1]. This was because the selectors (etc) constrain the scope of the resource.  We changed the name to Specific for two reasons ... the notion of X Specific Resource vs X Generic Resource in Tim Berners-Lee's 2006 ontology [3], and that constraint based programming/reasoning is a very different thing.  We also played with ORE Proxies for the same role [2] (which would have looked like role assignments) and discarded for the same reasons as above.
> 
> [1] http://www.openannotation.org/spec/beta/#DM_Constraint
> [2] http://www.openannotation.org/spec/alpha2/#DM_Segments
> [3] http://www.w3.org/2006/gen/ont
> 
> So ... with -one consistent change- (allow Motivation to be associated with SpecificResource) we solve the problem in the desired tree hierarchy, for both body and target, without introducing new structure (role assignment) or opening the flood gates for new subproperties.  We solve the tagging inconsistency at the same time, for free.

Again, I am not sure what you are arguing against or for in this case. Can you please look at the wiki page and say what problems you have with those patterns?

I *think* the only difference between what you say and what is in the examples is that you seem to *require* that all resources used as the domain of the "role" attribute be typed as SpecificResource. I am neutral on such a requirement in terms of the model, but I have a '-1' against *requiring* that the type be stated explicitly in the serialization. Again, for a non-RDF user, the move from

"body" : "This image is worth viewing on my desktop."

to

"body" : {
 "value" : "This image is worth viewing on my desktop.",
 "role" : "commenting"
}

is easy to grasp; the additional type information, i.e.,

"body" : {
 "type" : "specific",
 "value" : "This image is worth viewing on my desktop.",
 "role" : "commenting"
}

would again be an obstacle.
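
As an illustration of how small that first step is for a JSON developer, here is a minimal sketch (in Python) that accepts either form of "body". The function name and the default_role parameter are hypothetical; the "value" and "role" keys are the ones used in the examples.

```python
def normalize_body(body, default_role=None):
    """Accept either the plain-string form or the object form of "body"
    and return the object form. No "type" key is required or added."""
    if isinstance(body, str):
        expanded = {"value": body}
        if default_role is not None:
            expanded["role"] = default_role
        return expanded
    return dict(body)  # already an object: return a shallow copy


simple = normalize_body("This image is worth viewing on my desktop.",
                        default_role="commenting")
explicit = normalize_body({
    "value": "This image is worth viewing on my desktop.",
    "role": "commenting",
})
```

Both calls yield the same structure, and at no point does the developer have to know about a SpecificResource type.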

> 
> Role attached to SpecificResource: +1
> 

As far as I am concerned, I am +1 for "Role Attached to resources" but -1 if that is combined with a *requirement* to explicitly denote the resource as a Specific Resource. And, actually, I am 0 (or also +1) on the 'Role as Class' alternative, which I regard as almost a variant of the roles attached to resources.

> 
>> A side issue, though: we should align, imho, the Semantic tagging structure to whichever we choose.
> 
> Agreed. And Tagging. If we can have a single consistent model, that would be great!
> 
> [Leaving out typing and multiplicity, which I think we should discuss separately from roles]
> 

Agreed, although… as I said above, it may be an essential part of the picture.

Ivan

> Rob
> 
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Monday, 17 August 2015 05:29:54 UTC