RE: Bodies resource from Benjamin from Timothy Cole on 2015-09-02 (public-annotation@w3.org from September 2015)

From: Timothy Cole <t-cole3@illinois.edu>
Date: Wed, 2 Sep 2015 09:53:38 -0500
To: "'Ivan Herman'" <ivan@w3.org>, "'Robert Sanderson'" <azaroth42@gmail.com>, "'Benjamin Young'" <bigbluehat@hypothes.is>
CC: "'W3C Public Annotation List'" <public-annotation@w3.org>
Message-ID: <078901d0e58f$27618190$762484b0$@illinois.edu>
Ivan-

I like the approach you suggest in 2.2, based on Benjamin's idea of 'introduc[ing] an alternative "bodies" (as Benjamin put it) which has this new structure,' but only on the assumption that it is not part of the Data Model TR.

The TR we are creating is an interoperability spec. Our goal should be a spec that some will adopt exactly, others will adopt with local augmentations, and still others will adapt, modify and optimize for local contexts, including non-RDF contexts and constrained contexts that would not be practical or wise to build into an interoperability model. If we can illustrate such optimizations, that may encourage developers to build on our foundation (making it easier to map subsequently for export to the shared interoperability model) rather than start entirely from scratch. But wouldn't it be appropriate to put such illustrations outside the Data Model TR?

So, as you suggest in your follow-up post, let's update the Data Model TR with the best consensus we can muster, maintaining focus on flexibility, generality, interoperability and potential for adaptability, and see where we are. Would it then be reasonable to have one or more task forces propose additional documentation illustrating 'sanctioned' adaptations / simplifications for more constrained contexts?

-Tim Cole

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Wednesday, September 02, 2015 5:29 AM
To: Robert Sanderson <azaroth42@gmail.com>; Benjamin Young <bigbluehat@hypothes.is>
Cc: W3C Public Annotation List <public-annotation@w3.org>
Subject: Re: Bodies resource from Benjamin

Rob, Benjamin

I have the following issues with the overall scheme.

1. I am not sure how 'closed' our spec is. What I mean is: is it possible for an application to add its own properties to a body (or target) object? I would expect it to be possible, e.g., to add an additional property like

"body" : {
   "tags": [ {"text": "correction"}, {"text": "typo"} ],
   "comments": {"text": "wow...I should learn to type..."},
   "sourceOrganization" : "http://my.company.org"
}

where "sourceOrganization" is mapped, through an additional JSON-LD @context, to schema:sourceOrganization.

If this is allowed (and I do not see why not), then the client has to do more than just iterate through all the properties; it has to make an extra effort to find out which ones are roles and which are something else. This is the same weakness as the one in the subProperties pattern.
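
To make the concern concrete, here is a minimal sketch of a client reading the body above. The role vocabulary KNOWN_ROLES and both function names are hypothetical, chosen only for illustration; nothing in the model defines them.

```python
from typing import List, Set

# Hypothetical role vocabulary a client might know about; not defined by any spec.
KNOWN_ROLES = {"tags", "comments", "replacements"}

# The example body above, including the application-specific extension property.
body = {
    "tags": [{"text": "correction"}, {"text": "typo"}],
    "comments": {"text": "wow...I should learn to type..."},
    "sourceOrganization": "http://my.company.org",
}


def naive_roles(body: dict) -> List[str]:
    """Treat every key except id/type as a role (the 'just iterate' approach)."""
    return [key for key in body if key not in ("id", "type")]


def checked_roles(body: dict, known: Set[str]) -> List[str]:
    """Keep only keys the client already knows to be roles."""
    return [key for key in body if key in known]


print(naive_roles(body))                 # ['tags', 'comments', 'sourceOrganization']
print(checked_roles(body, KNOWN_ROLES))  # ['tags', 'comments']
```

The naive version reports "sourceOrganization" as a role; the checked version avoids that only because the client carries its own list of roles, which is exactly the extra effort described above.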

2. What is not clear is whether the usage of Composite would be the ONLY way of expressing a body (whether I have a meaningful role or not).

2.1. If it is the only way, then we make the simple thing more complicated. If the approaches we are discussing these days become part of the spec, I can do something like

body: {
   "content" : "http://www.ex.org"
}

If the body can only be a Composite, then I must add an extra, albeit artificial, role, ending up with something like

"body" : {
   "annotation" : { "id" : "http://www.ex.org" } }

where "annotation" is a meaningless role. Even if there *is* a role

body: {
   "role" : "commenting",
   "content" : "http://www.ex.org"
}

would become

body: {
   "commenting" : { "id" : "http://www.ex.org" } }

which I do find more complicated, although it is a matter of taste.


2.2. If this is NOT the only way, then maybe it is better to separate the two approaches: allow for "body" as we have now (or will have in the new TR) and introduce an alternative "bodies" (as Benjamin put it) which has this new structure, i.e., uses "Composite". (Or, alternatively, require the presence of "Composite" when that is used, but I think that would be very error-prone.) That may work, although with the caveat in point 1 above. (Although redundancy is not good in a spec, there may be cases, like this one, where we may want to live with it.)
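
If both forms were allowed side by side, a client could reduce them to the same list of (role, resource) pairs. A minimal sketch, assuming the flat form uses the "role" and "content" keys from 2.1, the role-keyed form lives under a separate "bodies" key as in 2.2, and each role maps to a single object with an "id"; the function name body_pairs is hypothetical.

```python
from typing import List, Optional, Tuple


def body_pairs(annotation: dict) -> List[Tuple[Optional[str], str]]:
    """Return (role, resource IRI) pairs from either body shape."""
    pairs = []
    if "body" in annotation:
        # Flat form from 2.1: an optional "role" next to "content".
        body = annotation["body"]
        pairs.append((body.get("role"), body["content"]))
    if "bodies" in annotation:
        # Role-keyed form from 2.2: each key names a role, each value has an "id".
        for role, resource in annotation["bodies"].items():
            if role in ("id", "type"):
                continue
            pairs.append((role, resource["id"]))
    return pairs


flat = {"body": {"role": "commenting", "content": "http://www.ex.org"}}
keyed = {"bodies": {"type": "Composite", "commenting": {"id": "http://www.ex.org"}}}
print(body_pairs(flat))   # [('commenting', 'http://www.ex.org')]
print(body_pairs(keyed))  # [('commenting', 'http://www.ex.org')]
```

The role-keyed branch still inherits the caveat from point 1: an extension key such as "sourceOrganization" would be treated as a role.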

I think what we have to decide here is what exactly the role (sic!) of "role" is in practice. I understand there are very valid use cases where we have multiple bodies with multiple roles. What percentage of use cases are those? We obviously have to optimize for the majority of use cases and make that majority easy to express, so knowing this may help us decide which direction to go.

Cheers

Ivan


> On 02 Sep 2015, at 05:20 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
> Okay, as this has not been discussed previously, we should give it a fair shot...
> 
> Riffing on the strawperson from Benjamin and the mention of Composites from Jacob, today I have been trying to model the requirements we have in the following way:
> 
> The body or target of an annotation is a Composite, where that resource has relationships to the included resources.  Those relationships would replace the use of motivations, and the base set would be enumerated in the model.  Additional relationships could be created to cover further use cases, such as the copy-edit replacement or the canvas-painting motivation in IIIF.
> 
> This would NOT suffer from the main objection to using subProperties of hasBody/hasTarget (that it would be impossible to determine which resources were bodies, which were targets and which were neither), because hasBody/hasTarget would point to the Composite. As the Composite is a construction within the Annotation, there would be no need for properties other than the motivation-replacing relationships; thus a pure JSON/JavaScript client could iterate through all of the properties (excluding id and type) and know that they were bodies (or targets) even if it does not understand their semantics. For annotations with a known structure, the direct method of accessing the information would work (anno.body.replacement), meeting the performance requirements expressed.
> 
> For example:
> 
> {
>   "@context": [
>    "http://www.w3.org/ns/anno.jsonld",
>    "http://example.org/ns/edit.jsonld"
>   ],
>   "id": "http://example.org/anno1",
>   "type": "Annotation",
>   "target": {
>    "resource": "http://example.com/doc1"  // should really be a SpecificResource with Selector
>   },
>   "body": {
>      "type": "Composite",
>      "tags": [ {"text": "correction"}, {"text": "typo"} ],
>      "comments": {"text": "wow...I should learn to type..."},
>      "replacements": {"text": "itinerary"}
>   }
> }
> 
> Here "resource" is oa:item (which we could rename), and "tags", "comments" and "replacements" are all subProperties of it. "replacements" is defined in the second context.
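
The two access paths described above can be sketched in Python, using the example annotation as a plain dict (the inline "should really be a SpecificResource" note is dropped, since JSON has no comments). A generic client skips "id" and "type" and treats every remaining key of the Composite as a body; a client that knows the structure reads "replacements" directly. The function name composite_bodies is hypothetical.

```python
anno = {
    "@context": [
        "http://www.w3.org/ns/anno.jsonld",
        "http://example.org/ns/edit.jsonld",
    ],
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "target": {"resource": "http://example.com/doc1"},
    "body": {
        "type": "Composite",
        "tags": [{"text": "correction"}, {"text": "typo"}],
        "comments": {"text": "wow...I should learn to type..."},
        "replacements": {"text": "itinerary"},
    },
}


def composite_bodies(composite: dict) -> list:
    """Return (relationship, resource) pairs for every key except id/type."""
    pairs = []
    for key, value in composite.items():
        if key in ("id", "type"):
            continue
        # A key may hold one resource or an (unordered) array of resources.
        for resource in (value if isinstance(value, list) else [value]):
            pairs.append((key, resource))
    return pairs


# Generic client: needs no knowledge of what "replacements" means.
for relationship, resource in composite_bodies(anno["body"]):
    print(relationship, resource["text"])

# Client that knows the structure: direct access.
print(anno["body"]["replacements"]["text"])
```

This only holds while the Composite carries nothing but id, type and relationships; an extension key such as "sourceOrganization" would be reported as a body.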
> 
> 
> However, don't get too excited ... this does NOT work with the other multiplicity constructions where the order of the members is important.
> Although in JSON, the value of "tags" is an array, that is actually the following turtle:
> 
> _:body a oa:Composite ;
>   oa:hasTag [ oa:text "correction"], [ oa:text "typo"] .
> 
> And not an rdf:List:
> 
> _:body a oa:Composite ;
>   oa:hasTag ( [oa:text "correction"] [oa:text "typo"]) .
> 
> (Apologies to those who do not speak Turtle as a native language: the first is not ordered, the second is.)
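
The difference can be sketched without an RDF library by writing out the triples each form produces. A minimal illustration, assuming triples are plain Python tuples, the tag resources are reduced to plain strings, and list-node labels such as "_:l0" are generated by hand; the function names are hypothetical. Repeated values of one predicate form an unordered set, while an rdf:List records order through an rdf:first/rdf:rest chain.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def repeated_values(subject: str, pred: str, values: List[str]) -> Set[Triple]:
    """oa:hasTag A, B: one triple per value; a graph is a set, so order is lost."""
    return {(subject, pred, v) for v in values}


def rdf_list(subject: str, pred: str, values: List[str]) -> Set[Triple]:
    """oa:hasTag ( A B ): a chain of list nodes that records the order."""
    nodes = [f"_:l{i}" for i in range(len(values))]
    triples = {(subject, pred, nodes[0] if nodes else RDF_NIL)}
    for i, (node, value) in enumerate(zip(nodes, values)):
        nxt = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples |= {(node, RDF_FIRST, value), (node, RDF_REST, nxt)}
    return triples


def read_list(triples: Set[Triple], subject: str, pred: str) -> List[str]:
    """Walk rdf:first/rdf:rest from the list head to recover the ordered values."""
    lookup = {(s, p): o for s, p, o in triples}
    node, out = lookup[(subject, pred)], []
    while node != RDF_NIL:
        out.append(lookup[(node, RDF_FIRST)])
        node = lookup[(node, RDF_REST)]
    return out


tags = ["correction", "typo"]
print(repeated_values("_:body", "oa:hasTag", tags) ==
      repeated_values("_:body", "oa:hasTag", tags[::-1]))  # True: order not kept
print(read_list(rdf_list("_:body", "oa:hasTag", tags), "_:body", "oa:hasTag"))  # ['correction', 'typo']
```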
> 
> Thus, a Choice of comments could NOT be modeled as:
> 
> "body": {
>   "type": "Choice",
>   "comments": ["http://eg.org/comment-en", "http://eg.org/comment-fr"]
> }
> 
> Because this would require comments / oa:hasComment to be both ordered in some instances and not ordered in others, which is not possible.
> 
> Two separate keys and predicates that reflect the same motivation could be created (commentsList / oa:hasCommentList, comments / oa:hasComment) but that seems pretty terrible, especially as the values for the two would be identical.
> 
> The current proposal does allow for this use case, as the Composite or Choice would simply have SpecificResources with motivations as items/members.
> 
> Can anyone else see how this could work (in RDF and JSON, please) while fulfilling all of the requirements?
> 
> Rob
> 
> 
> 
> On Tue, Sep 1, 2015 at 10:06 AM, Doug Schepers <schepers@w3.org> wrote:
> Hi, Benjamin–
> 
> I realize that you were probably just putting out a strawman for discussion, and that you were probably making a different point, but since you are talking in code, I thought it would be useful to make a specific point about your code.
> 
> Just a high-level response, inline…
> 
> On 9/1/15 11:40 AM, Benjamin Young wrote:
> On Tue, Sep 1, 2015 at 11:21 AM, Robert Sanderson wrote:
> 
> 
>         Where this is trending now in my head is that we *keep*
>         motivation on the annotation, but create classes for bodies.
>         What this *might* look like in JSON-LD is something like:
> 
>         ```
>         {
>            "type": "Annotation",
>            "motivation": "editing",
>            "bodies": {
>              "tags": ["correction", "typo"],
>              "comment": "wow...I should learn to type...",
>              "edit": {
>                "original": "itinirary",
>                "replacement": "itinerary"
>              },
> 
> This should not be necessary, under any of the proposals we'd been considering thus far.
> 
> My immediate reaction was (I think) similar to Rob's:
> 
>     * A pattern for extension that doesn't involve subProperties is what
>     we have now.
> 
> If I'm reading Rob correctly, this means that none of the bodies (or targets) should have special sub-properties (or sub-structures) of the same type (e.g. motives/motivations/roles) that require special parsing or processing.
> 
> (Note that Target does have Selectors, each with idiosyncratic properties, but in this case I think it's unavoidable, and they are clearly defined.)
> 
> 
> Without making any judgment for or against other aspects of your strawman, and keeping everything else the same to isolate this single point for discussion, here's how I'd reformulate your strawman:
> 
>  ```
>  {
>     "type": "Annotation",
>     "motivation": "editing",
>     "bodies": {
>       "tags": ["correction", "typo"],
>       "comment": "wow...I should learn to type...",
>       "edit": "itinerary",
>       "related": ["http://dictionary.reference.com/browse/itinerary"]
>     },
>     "target": {
>       "source": "http://example.com/doc1",
>       "selector": {
>         "type": "oa:TextQuoteSelector",
>         "exact": "itinirary"
>       }
>     }
>  }
>  ```
> 
> Yes, it's slightly longer. But it has the same functionality, and it avoids two crucial problems:
> 
> 1) the needless duplication of information;
> 1a) you'd need a TextQuoteSelector in the target anyway to correctly anchor the selection;
> 1b) mechanisms that duplicate information in multiple places are prone to getting out of sync and causing problems;
> 
> 2) the need for idiosyncratic and potentially unpredictable additional structures or properties within a known type of property;
> 2a) this makes processing more difficult even for known structures of this type;
> 2b) introducing such a structure into an extension point sets a pattern that makes graceful degradation very difficult.
> 
> 
> And, again, it's not necessary. I think it's useful for us to talk about these edge cases (and central use cases) because it helps us validate that our design is practical and versatile. In this case, you wrote some strawman code that might well have been written by a developer unfamiliar with the data model's design principles, and we were easily able to reformulate it into something that avoids the problems.
> 
> This tells me 2 things:
> 
> 1) the data model is strong and flexible;
> 
> 2) we need to be really clear about how the model works, in terms the average developer can understand, and show explicitly how to add extensions (where they can be added, and how they should be structured); we can provide examples to make it clearer (like Rob's “antecedent” and “subsequent” motives).
> 
> 
> 
> 
> On a related topic (which I'm putting here just to capture it)… Note that my formulation has a somewhat interesting side effect. Since the TextQuoteSelector doesn't have a "prefix" or "suffix", it's ambiguous which instance of the "exact" quote value "itinirary" it's referring to if there is more than one misspelling in the same document. Is it the first instance? The last instance? All instances? Is this a hack for spellcheck, or an abuse of the data model? Should this be expressed as multiple targets? Or should we define some "all instances" property? Or should we require a "prefix" and/or "suffix"? Is the Data Model the right place to define UA behavior for resolving selectors? Or should there be another spec, perhaps something that defines UA behavior for selectors in terms of RangeFinder and other APIs?
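
The ambiguity is easy to demonstrate. A minimal sketch, assuming a plain-string document, simple substring matching, and that "prefix" and "suffix" must sit immediately before and after the quote; real user agents would resolve selectors against the DOM, and which behavior they should follow is exactly the open question. The function name quote_matches and the sample text are hypothetical.

```python
def quote_matches(text: str, exact: str, prefix: str = "", suffix: str = "") -> list:
    """Return start offsets where exact occurs with the given prefix/suffix around it."""
    hits = []
    start = text.find(exact)
    while start != -1:
        before_ok = text[:start].endswith(prefix)             # "" always matches
        after_ok = text[start + len(exact):].startswith(suffix)
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(exact, start + 1)
    return hits


doc = "Our itinirary starts Monday. Please print the itinirary twice."
print(len(quote_matches(doc, "itinirary")))                       # 2: ambiguous
print(len(quote_matches(doc, "itinirary", prefix="print the ")))  # 1: disambiguated
```

With only "exact", both occurrences match; adding a "prefix" or "suffix" narrows the selection to one.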
> 
> Food for thought.
> 
> Regards–
> –Doug
> 
> 
> 
> 
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 2 September 2015 14:56:04 UTC