Re: Copy-edit use case JSON strawman (was: Re: Bodies resource from Benjamin) from Doug Schepers on 2015-09-01 (public-annotation@w3.org from September 2015)

From: Doug Schepers <schepers@w3.org>
Date: Tue, 1 Sep 2015 14:21:46 -0400
To: Benjamin Young <bigbluehat@hypothes.is>
Cc: Robert Sanderson <azaroth42@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <55E5ECBA.2040406@w3.org>
Hi, Benjamin–

On 9/1/15 1:43 PM, Benjamin Young wrote:
> Changed the subject line as we're way off the map at this point. :)
>
> On Tue, Sep 1, 2015 at 1:06 PM, Doug Schepers wrote:
>
>     Hi, Benjamin–
>
>     I realize that you were probably just putting out a strawman for
>     discussion, and that you were probably making a different point, but
>     since you are talking in code, I thought it would be useful to make
>     a specific point about your code.
>
>
> Right. The strawman actually had nothing to do with any copy-editing use
> case--outside of that's the use case that started the multi-bodies
> discussion to begin with...so I took a crack at it. ;)

Okay, understood.


>     Just a high-level response, inline…
>
>     On 9/1/15 11:40 AM, Benjamin Young wrote:
>
>         On Tue, Sep 1, 2015 at 11:21 AM, Robert Sanderson wrote:
>
>
>                  Where this is trending now in my head is that we *keep*
>                  motivation on the annotation, but create classes for
>         bodies.
>                  What this *might* look like in JSON-LD is something like:
>
>                  ```
>                  {
>                     "type": "Annotation",
>                     "motivation": "editing",
>                     "bodies": {
>                       "tags": ["correction", "typo"],
>                       "comment": "wow...I should learn to type...",
>                       "edit": {
>                         "original": "itinirary",
>                         "replacement": "itinerary"
>                       },
>
>
>     This should not be necessary, under any of the proposals we'd been
>     considering thus far.
>
>
> Given this singular scenario (in which I did, unfortunately, leave out
> the TextQuoteSelector...or a TextPositionSelector...) your proposal
> almost works.

I think it does work, details below.


> More below...
>
>
>     My immediate reaction was (I think) similar to Rob's:
>
>              * A pattern for extension that doesn't involve
>         subProperties is what
>              we have now.
>
>
>     If I'm reading Rob correctly, this means that none of the bodies (or
>     targets) should have special sub-properties (or sub-structures) of
>     the same type (e.g. motives/motivations/roles) that require special
>     parsing or processing.
>
>     (Note that Target does have Selectors each with idiosyncratic
>     properties, but in this case, I think it's unavoidable and they are
>     clearly defined.)
>
>
>     Without making any judgment for or against other aspects of your
>     strawman, and keeping everything else the same to isolate this
>     single point for discussion, here's how I'd reformulate your strawman:
>
>       ```
>       {
>          "type": "Annotation",
>          "motivation": "editing",
>          "bodies": {
>            "tags": ["correction", "typo"],
>            "comment": "wow...I should learn to type...",
>            "edit": "itinerary",
>            "related": ["http://dictionary.reference.com/browse/itinerary"]
>          },
>          "target": {
>            "source": "http://example.com/doc1",
>            "selector": {
>              "type": "oa:TextQuoteSelector",
>              "exact": "itinirary"
>            }
>          }
>       }
>       ```
>
>
>     Yes, it's slightly longer. But has the same functionality, and it
>     avoids two crucial problems:
>
>     1) the needless duplication of information;
>     1a) you'd need a TextQuoteSelector in the target anyway to correctly
>     anchor the selection;
>     1b) mechanisms that duplicate information in multiple places are
>     prone to getting out of sync and causing problems;
>
>
> It's only "needless duplication" if you don't have more than one selector.
>
> Given that you can have more than one selector (even of the same
> type...as with any of these constructs), there's no way to tie the edit
> to the selector without making them both resources (giving them URLs or
> assigning them a blank node identifier).

I suggest that it works with any TextQuoteSelector.

I think it does require at least one "exact" value for a 
TextQuoteSelector, but it can be a replacement suggestion for any and 
all "exact" values of TextQuoteSelector on any Target.
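
To make that concrete, here is a rough Python sketch of how a UA might 
pair the "edit" body with every "exact" value it finds. This is only an 
illustration: the function name proposed_replacements is made up, and the 
flat "bodies"/"target" shape follows the strawman in this thread, not 
the published model.

```python
def proposed_replacements(annotation):
    """Pair the "edit" body with each TextQuoteSelector "exact" value.

    Assumes the strawman shape from this thread: "bodies" is a dict with
    an "edit" string, and "target" is a dict (or list of dicts) whose
    "selector" is a dict (or list of dicts).
    """
    edit = annotation.get("bodies", {}).get("edit")
    if edit is None:
        return []

    targets = annotation.get("target", [])
    if isinstance(targets, (dict, str)):
        targets = [targets]

    pairs = []
    for target in targets:
        if not isinstance(target, dict):
            continue  # a bare URL target has no selector to replace
        selectors = target.get("selector", [])
        if isinstance(selectors, dict):
            selectors = [selectors]
        for selector in selectors:
            if (selector.get("type") == "oa:TextQuoteSelector"
                    and "exact" in selector):
                pairs.append((target.get("source"), selector["exact"], edit))
    return pairs


annotation = {
    "type": "Annotation",
    "motivation": "editing",
    "bodies": {"edit": "itinerary"},
    "target": {
        "source": "http://example.com/doc1",
        "selector": {"type": "oa:TextQuoteSelector", "exact": "itinirary"},
    },
}

for source, original, replacement in proposed_replacements(annotation):
    print(f"{source}: replace {original!r} with {replacement!r}")
```

Each (source, original, replacement) tuple is a suggestion only; nothing 
here writes to the original document.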

Reversing the case, what if there are multiple "edit" (or maybe a new 
motive, "replace" or "copy-edit" might be better) Bodies and a single 
TextQuoteSelector "exact" value on a single Target?

Then the UA could decide what behavior makes the most sense; it could 
not present any "accept" option at all, but still render the bodies; it 
could offer each of the bodies as an "accept" option (like a 
spellchecker does); or it could do something else.
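
As a sketch of what those choices could look like in a UA (Python, purely 
illustrative), assume the "edit" bodies have already been collected into 
a list. The policy names "offer_each" and "render_only" are hypothetical 
UA settings I'm inventing here, not anything in the model:

```python
def accept_options(edit_bodies, policy="offer_each"):
    """Return the replacement strings a UA would offer for one selection.

    `policy` is a hypothetical UA setting, not part of any spec:
      "offer_each"  - offer every distinct edit body, spellchecker-style
      "render_only" - offer no "accept" option; bodies are still rendered
    """
    if policy == "render_only":
        return []
    if policy == "offer_each":
        options = []
        for body in edit_bodies:
            if body not in options:  # drop duplicates, keep first-seen order
                options.append(body)
        return options
    raise ValueError(f"unknown policy: {policy}")


suggestions = ["itinerary", "itinerant", "itinerary"]
print(accept_options(suggestions))
print(accept_options(suggestions, policy="render_only"))
```

Either way the bodies themselves are unchanged; the policy only decides 
which of them become actionable.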

I'm not suggesting we mandate UA behavior, but we could lay out clear 
guidelines (or we could put some loose requirements somewhere).

What about multiple bodies and multiple targets? I'd argue that that's a 
bad (if valid) annotation, and UA vendors shouldn't produce such 
annotations, but that's for the market to decide.


>     2) the need for idiosyncratic and potentially unpredictable
>     additional structures or properties within a known type of property
>     2a) this makes processing more difficult even for known structures
>     of this type
>     2b) introducing such a structure into an extension point sets a
>     pattern that makes graceful degradation very difficult
>
>
> FWIW, this "edit" object would likely be pretty far out of scope for our
> charter, and left up to implementors or other specs and working groups.

I don't agree it's out of scope.

All we're suggesting is a way that such annotations can be structured in 
an actionable way.

We should stop short of suggesting how the actual write operation to the 
original document happens, but as far as conveying the annotator's 
intent, it's no more out of scope than allowing replies or any other 
action on an annotation.

W3C WGs are empowered to explore the bounds of their scope. Sometimes 
that does lead to collaborations with other WGs, or even to the launch 
of new WGs. No need to be overly prescriptive, so long as we don't 
distract ourselves or step on another WG's work.


> It would certainly make use of the selector system where possible, but
> it would also have a requirement to survive the changing of the document
> and still be a representative "edit"--so, at the very least, the "edit"
> and the "selection" would have to be expressly related.
>
>
>       And, again, it's not necessary. I think it's useful for us to
>     talk about these edge cases (and central use cases) because it helps
>     us validate that our design is practical and versatile. In this
>     case, you wrote some strawman code that might well have been done by
>     a developer unfamiliar with the data model's design principles, and
>     we were easily able to reformulate it into something that easily
>     avoids the problems.
>
> Almost. ;)
>
>     This tells me 2 things:
>
>     1) the data model is strong and flexible;
>
>     2) we need to be really clear about how the model works, in terms
>     the average developer can understand, and show explicitly how to add
>     extensions (where they can be added, and how they should be
>     structured); we can provide examples to make it clearer (like Rob's
>     “antecedent” and “subsequent” motives).
>
>     On a related topic (which I'm putting here just to capture it)…
>     Note that my formulation has a somewhat interesting side
>     effect. Since the TextQuoteSelector doesn't have a "prefix" or
>     "suffix", it's ambiguous which instance of the "exact" quote value
>     "itinirary" it's referring to, if there was more than one
>     misspelling in the same document. Is it the first instance? The last
>     instance? All instances? Is this a hack for spellcheck, or an abuse
>     of the data model? Should this be expressed as multiple targets? Or
>     should we define some "all instances" property? Or should we require
>     a "prefix" and/or "suffix"? Is the Data Model the right place to
>     define UA behavior for resolving selectors? Or should there be
>     another spec, perhaps something that defines UA behavior for
>     selectors in terms of RangeFinder and other APIs?
>
>
> Given the reading of 4.2.2:
> http://www.w3.org/TR/annotation-model/#h5_text-quote-selector
> "This Selector describes a range of text by copying it, and including
> some range of text immediately before and after it to distinguish
> between multiple copies of the same sequence of characters within the
> document."
>
> Given the recommendation (and provision) of prefix and suffix, the
> expectation would be that it selects every "itinirary" typo in the document.

Makes sense to me, but it should be explicit.
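
For what it's worth, here is one way that reading could be sketched in 
Python. It is an assumption about UA behavior, not something the Data 
Model specifies today, and find_quote_matches is a name I made up: with 
no "prefix" or "suffix" every occurrence of "exact" matches, and 
supplying either one narrows the result.

```python
def find_quote_matches(text, exact, prefix="", suffix=""):
    """Return the start offset of every occurrence of `exact` in `text`
    that is immediately preceded by `prefix` and followed by `suffix`.

    With an empty prefix and suffix this returns all occurrences, which
    is the "select every instance" reading discussed in this thread.
    """
    if not exact:
        return []
    matches = []
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            matches.append(start)
        start = text.find(exact, start + 1)
    return matches


doc = "Our itinirary is set. Send the itinirary today."
print(find_quote_matches(doc, "itinirary"))                 # both typos
print(find_quote_matches(doc, "itinirary", prefix="the "))  # second one only
```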

Regards–
–Doug
Received on Tuesday, 1 September 2015 18:21:50 UTC