Re: My thoughts on the multi-body alternatives (as shown on Tim's wiki page) from Ivan Herman on 2015-08-19 (public-annotation@w3.org from August 2015)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 19 Aug 2015 07:14:19 +0200
To: Doug Schepers <schepers@w3.org>
Cc: Robert Sanderson <azaroth42@gmail.com>, Tim Cole <t-cole3@illinois.edu>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <37C51F52-B151-45C0-AE13-AFDC988639DA@w3.org>

Doug,

I need some clarification, and have a comment…

> On 19 Aug 2015, at 01:26 , Doug Schepers <schepers@w3.org> wrote:
> 
> Hi, Rob–
> 
> I'm going to propose a compromise. I don't think either of us will love it, but I hope we can both live with it.
> 
> I'm going to use "object"/"property" terminology, but feel free to reformulate into RDF terminology.
> 
> 
> 1) We allow (but don't require) an "id" property on each object, to make it addressable;
> 
> 
> 2) We strive for a single consistent structure that applies equally to Body, Target, Tag, and so on, modulo the proposal in #3.
> 
> 
> 3) We make the nested "source" object a SHOULD, while the empty-node construct is only a MAY, and only allowed for text-literal resources, and (maybe?) only in a non-Linked Data context; we define a clear equivalence mapping.
> 
> For example, these statements would be equivalent:
> 
>  "body" : {
>    "role" : "tagging",
>    "source" : {
>      "value" : "+1"
>    }
>  }
> 
>  "body" : {
>    "role" : "tagging",
>    "value" : "+1"
>  }
> 
> But these would not:
> 
>  "body" : {
>    "role" : "linking",
>    "source" : {
>      "type": "Image",
>      "id": "http://example.com/image.png"
>    }
>  }
> 
>  "body" : {
>    "role" : "linking",
>    "type": "Image",
>    "id": "http://example.com/image.png"
>  }
> 

When you say 'would not', you mean these two are not equivalent, I guess.

I was actually wondering whether we may want to go one step further: namely, that the object associated with "body" MUST have a "source" (or maybe an alternative "value" for textual content), and that this is what identifies the 'real' annotation body. In other words, the second construction would become a bug. If so, the value of "source" may also be a simple URI, i.e., these two would be equivalent:

"body" : {
   "role"   : "linking",
   "source" : "http://example.com/image.png"
}

"body" : {
   "role"  : "linking",
   "source" : {
       "id" : "http://example.com/image.png"
   }
}

I guess what I am getting at is that (also in the model) the object assigned to body may either be a plain literal, to handle the simple cases, or, essentially, a Specific Resource.
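
To make the intended equivalences concrete, here is a rough Python sketch of how a client might normalize a body before comparing or storing it. This is only an illustration under my own assumptions, not anything the model defines: the name normalize_body is hypothetical, and so is the choice to treat a bare literal and the "value" shorthand as textual content under "source".

```python
from typing import Any, Dict, Union

Body = Union[str, Dict[str, Any]]


def normalize_body(body: Body) -> Dict[str, Any]:
    """Return a canonical form in which "source" is always an object."""
    # Simple case: a bare literal becomes embedded textual content.
    if isinstance(body, str):
        return {"source": {"value": body}}

    if "source" in body and "value" in body:
        raise ValueError('body must not carry both "source" and "value"')

    # Keep "role", the body's own "id", etc. untouched.
    normalized = {k: v for k, v in body.items() if k not in ("source", "value")}
    if "source" in body:
        source = body["source"]
        # A plain URI string is shorthand for an object holding only that "id".
        normalized["source"] = {"id": source} if isinstance(source, str) else dict(source)
    elif "value" in body:
        # Assumed shorthand: "value" on the body stands for textual content.
        normalized["source"] = {"value": body["value"]}
    else:
        # Under the MUST proposed above, a body without either is an error.
        raise ValueError('body needs a "source" (or "value" for text)')
    return normalized
```

With this, the two "linking" bodies above normalize to the same dictionary, while a body that only carries an "id" is rejected.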

As you said, the body itself may or may not have its own ID, but that ID is distinct from the ID of, say, the image in this case.

That also means that the idiom we use in the current model:

"body" : {
 "id" :  "http://example.com/image.png"
}

becomes moot and the correct way is

"body" : {
 "source" :  "http://example.com/image.png"
}

and this makes the corresponding RDF a little bit more complex. But I would prefer to push the complexity onto the RDF side (which can easily cope with this) if it makes things simpler for the non-RDF user community.
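
As a rough illustration of that change, a converter for existing data could look like the Python sketch below. The function name migrate_body is hypothetical, and the sketch assumes that an "id" on a body with neither "source" nor "value" was meant to identify the resource rather than the body itself.

```python
from typing import Any, Dict


def migrate_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite {"id": uri} into {"source": uri}; leave other bodies unchanged."""
    if "source" in body or "value" in body:
        # Already in the proposed shape; any "id" here names the body itself.
        return dict(body)
    if "id" not in body:
        raise ValueError('cannot migrate a body without "id", "source" or "value"')
    migrated = {k: v for k, v in body.items() if k != "id"}
    # Assumption: in the old idiom "id" identified the resource, not the body.
    migrated["source"] = body["id"]
    return migrated
```

Bodies that already have a "source" pass through unchanged, so the conversion can be run more than once without harm.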

(As an aside: if we go down that line, and reflecting on one of Rob's remarks in another mail, maybe we indeed want to come back to our decision and use "@id" instead of "id". It would emphasize its very particular role and usage.)

Ivan

> 
> 
> 4) We define the default type of "value" to be "text". We require explicit "type" values for all other datatypes.
> 
> For example, these statements would be equivalent:
> 
>  "body" : {
>    "role" : "commenting",
>    "source" : {
>      "type" : "text",
>      "value" : "This reminds me of a meme…"
>    }
>  }
> 
>  "body" : {
>    "role" : "commenting",
>    "source" : {
>      "value" : "This reminds me of a meme…"
>    }
>  }
> 
>  "body" : {
>    "role" : "commenting",
>    "value" : "This reminds me of a meme…"
>  }
> 
> 
> 
> 5) We continue to discuss property names that might be more intuitive. For example, I find "source" less clear than "content", and I'd like to see different proposals for the terms "EmbeddedContent" and "SpecificResource".
> 
> 
> Thoughts?
> 
> Regards–
> –Doug
> 
> 
> On 8/18/15 3:57 PM, Robert Sanderson wrote:
>> 
>> 
>> On Tue, Aug 18, 2015 at 12:01 PM, Doug Schepers <schepers@w3.org
>> <mailto:schepers@w3.org>> wrote:
>> 
>>    Hi, Rob–
>>    On 8/17/15 2:41 PM, Robert Sanderson wrote:
>> 
>>        On Mon, Aug 17, 2015 at 10:13 AM, Timothy Cole wrote:
>>        Now that resource has two roles, tagging and commenting.
>> 
>> 
>>    Can you please describe again (I feel you've mentioned it before)
>>    the use case for this 'body' reuse? Especially in the case where the
>>    body is a text literal?
>> 
>> 
>> Sure.
>> 
>> As a reviewer using an annotation tool to comment on a paper, I want my
>> review to be persistent and referenced. It might refer to other papers
>> beyond the one I'm commenting on, for example to point out plagiarism or
>> to suggest other sources, but the review is of the target paper.
>> 
>> As a paper author, I want to link that review to my cited paper as a
>> justification for its value.  E.g. the same content reviews one paper
>> and provides support for another paper.
>> 
>> The review starts off as a block of text in an annotation client.  It is
>> then transferred via the protocol to a server.  The server creates a URI
>> for it.
>> The second annotation takes that URI and uses it as the body, with a
>> different role.
>> 
>> 
>> And another:
>> 
>> I post on twitter a comment noting a typo on a wikipedia page.
>> A system then uses that more specifically as the justification for an
>> annotation that also suggests the change, both using more specific
>> motivations.
>> 
>> 
>> And another:
>> 
>> I post on medium my thoughts about a particular politically charged
>> topic.  It's a comment on a wikipedia page.
>> People on both sides of the topic take the same post different ways and
>> use it as support for their view and a dismissal of the opposition.
>> 
>> 
>> And another:
>> 
>> I transcribe a quote from a book as part of a crowd-sourcing platform.
>> I then use that quote as a comment on the museum exhibit that it is
>> talking about.
>> 
>> I can go on if needed.
>> 
>>    Is this use case common, or is it an edge case?
>> 
>> 
>> Common.
>> 
>>    I'm having a hard time imagining a large-scale annotation
>>    application that would reuse body literals, rather than simply
>>    having multiple instances of similar bodies, each contained in its
>>    own annotation. The user experience and workflow aren't clear to me.
>> 
>> 
>> When the authorship of the body is important. Which is almost always.
>> Note that the author of the body is not necessarily the author of the
>> annotation, as per the examples above, bar the last one.
>> 
>> 
>>    I totally understand that multiple annotations might use the same
>>    external resource (e.g. a picture or video) as a body, but that's a
>>    different case with a different object structure (and a different
>>    UX/workflow).
>> 
>> 
>> All of the above *start* as plain text, so the same UX for the first
>> part.  The second annotation doesn't need to re-type the text; it
>> selects existing content instead.  So I think I agree that there is a
>> different workflow, even if the same UI might allow both.
>> 
>> However I disagree that there must be a different structure.  Having a
>> consistent structure for both uses -of the same body- seems important,
>> as clients and servers will otherwise need to implement both, depending
>> on the otherwise arbitrary order in which the annotations were created.
>> 
>> 
>>    At some point, if you're pointing to 2 different external resources,
>>    it seems like it would be hard to delineate between an annotation
>>    with multiple targets (or bodies), rather than a clear body-target
>>    relationship, and I don't see what kind of annotation client would
>>    structure things that way.
>> 
>> 
>> I don't understand this, sorry.
>> 
>>    I assume that your annotation client does something like this… can
>>    you tell us how that works?
>> 
>> 
>> And I'm not sure what you're asking for here.
>> 
>> 
>> 
>>        Rather than consistently using the Specific resource pattern:
>> 
>>        "body": {
>>            "role": "tagging",
>>            "source": {
>>              "id": "http://repo.org/bodies/1",
>>              "value": "+1"
>>            }
>>        }
>> 
>>        Which will always work at the (IMO minimal) cost of slightly
>>        more structure.
>>        It's also clearer without the explicit types, as role can only be on
>>        SpecificResource.
>> 
>> 
>>    Is this structure allowed, or required? If it's simply allowed,
>>    then we agree. If it's required, then I'm a bit less comfortable.
>> 
>> 
>> It would be required for resources with URIs.  I would prefer to require
>> it also for Embedded content for consistency, and to keep the separation
>> of concerns per my response to Tim.
>> 
>> 
>>    When we extrapolate to multiple bodies (which is really what we're
>>    talking about), the extra code becomes more obvious:
>> 
>>    "body" : [
>>       { "role" : "tagging", "value" : "+1"},
>>       { "role" : "commenting", "value" : "This reminds me of a meme…" },
>>       { "role" : "linking", "source" : "http://example.com/image.png" }
>>    ]
>> 
>> (Fixed and compacted inline)
>> 
>>    versus:
>> 
>>    "body" : [
>>       {
>>         "role" : "tagging",
>>         "source" : { "value" : "+1" }
>>       },
>>       {
>>         "role" : "commenting",
>>         "source" : { "value" : "This reminds me of a meme…" }
>>       },
>>       {
>>         "role" : "linking",
>>         "source" :  "http://example.com/image.png"
>>       }
>>    ]
>> 
>> (Fixed and compacted inline)
>> 
>> But in most cases:
>> 
>> "body" : [
>>   {
>>     "role" : "tagging",
>>     "source" : { "value" : "+1" }
>>   },
>>   {
>>     "role" : "commenting",
>>     "source" : { "type" : "text", "value" : "This reminds me of a meme…" }
>>   },
>>   {
>>     "role" : "linking",
>>     "source" :  {  "type": "Image", "id": "http://example.com/image.png" }
>>   }
>> ]
>> 
>>    At that point, it's not clear what this structure buys us, though
>>    I'll admit that the uniformity of structure it adds between constructs
>>    of different types might make it easier to always do the right thing.
>> 
>> 
>> Uniformity in data structures is good, rather than constantly having to
>> test for the existence of different structures.  Also in terms of making
>> it easier to do the right thing, and the actual complexity of the
>> structure, if you have to explain one thing well, that's easier than
>> explaining two things well plus when you would choose to use one or the
>> other.
>> 
>> Especially when you have to understand and implement either both anyway,
>> or just one.
>> 
>> 
>>        That is why I'm +0, rather than -1.  I can live with it if
>>        needed, but I
>>        think there's a better way that separates the two concerns:
>> 
>>        EmbeddedContent:  Transfer content of any type for any resource,
>>        URI or
>>        no, in the serialized annotation.  (Which is why we talked about
>>        it in
>>        the Serialization section in the CG docs)
>>        SpecificResource:  Make annotation specific assertions about a
>>        Body or
>>        Target resource. (Until now, that has been selector, state,
>>        style and
>>        scope ... we're just adding another specifier of role)
>> 
>> 
>>    Perhaps the terms "EmbeddedContent" and "SpecificResource" are
>>    throwing me off a bit. Are those terms used in LD/RDF, or are they
>>    terms we've introduced?
>> 
>> 
>> We introduced both.
>> 
>> We (the WG) introduced EmbeddedContent to replace the defunct
>> ContentAsText work, after many failed efforts to get the people
>> responsible for it to take it forwards.
>> http://www.w3.org/TR/Content-in-RDF10/
>> 
>> And (as earlier in the thread) we (the Open Annotation Collaboration,
>> pre CG) introduced Specific Resource based on Tim Berners-Lee's notion
>> of Specific vs Generic resources in the web architecture, previously
>> called Constrained resources.
>> 
>> 
>> Hope that helps,
>> 
>> Rob
>> 
>> --
>> Rob Sanderson
>> Information Standards Advocate
>> Digital Library Systems and Services
>> Stanford, CA 94305
> 


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 19 August 2015 05:14:32 UTC