Re: Robust Anchoring and Selectors (was: "Annotation" and "annotation") from Ivan Herman on 2014-11-14 (public-annotation@w3.org from November 2014)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 15 Nov 2014 06:34:27 +0800
To: Doug Schepers <schepers@w3.org>
Cc: Jacob Jett <jjett2@illinois.edu>, Robert Sanderson <azaroth42@gmail.com>, W3C Public Annotation List <public-annotation@w3.org>
Message-Id: <242652EC-F183-448D-82A5-62EC0C161F37@w3.org>
> On 15 Nov 2014, at 06:00 , Doug Schepers <schepers@w3.org> wrote:
> 
> Hi, Jacob–
> 
> I agree with you that the basic building blocks of annotations have utility beyond the annotation use case; in fact, I think that's the best prospect we have for getting native implementations (in browsers, for example).
> 
> The charter already treats robust anchoring as a separate aspect; it just so happens that the data model also deals directly with selectors.
> 
> I don't think it's useful at this point to completely split out the selector aspect into its own thing, since that would undermine the utility of the data model spec.

I respectfully disagree, but that is a subject for later discussion. There is no problem with publishing the data model right now as is (it is a stake in the ground, and that is important) and possibly splitting the document later. But I see the selector specification plus the anchoring as something very important by itself, to be used in other areas as well (metadata is probably the most obvious one). Reuse may become far more natural if it is taken out of the strict annotation context.

But, again, this is a discussion for a later day in my view.

Ivan

> The data model has some basic selectors, and should include an extension point for other selectors not included in the basic data model.
> 
> At some later point, we might come up with a more generic way of specifying those selectors and the robust anchoring mechanisms for various frameworks (e.g. in browser, in semweb systems, in server-side APIs). Then we might decide how to reference those in the data model spec. Baby steps.
> 
> Even the robust anchoring topic can be broken down into useful components.
> 
> I'm working on a proposal, discussed briefly at TPAC, for a client-side 'findText' API, which would use the selectors from an annotation to return a DOM range object. This has interest from some browser vendors. But this would only be for text, not any arbitrary markup (images, data, math (?), graphics). To be part of a full solution, it should probably be wrapped in a higher-level API that also uses other arbitrary selectors to find other arbitrary content.
> 
> But it's a start!
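
A minimal sketch of the 'findText' idea described above may help make it concrete. Everything here is an assumption for illustration: the `TextQuoteSelector` shape, the `findText` signature, and the `TextRange` result are hypothetical and are not taken from Doug's proposal. The sketch works on a plain string and returns character offsets, where a browser implementation would presumably return a DOM range. It does exact matching only, so it is not yet "robust" against changes in the document.

```typescript
// Hypothetical selector shape: the quoted text plus optional context
// on either side, used to disambiguate repeated matches.
interface TextQuoteSelector {
  exact: string;
  prefix?: string;
  suffix?: string;
}

// Hypothetical result: half-open character offsets [start, end).
interface TextRange {
  start: number;
  end: number;
}

// Return the first occurrence of `exact` whose surrounding text matches
// the given prefix and suffix, or null if there is none.
function findText(text: string, selector: TextQuoteSelector): TextRange | null {
  const { exact, prefix = "", suffix = "" } = selector;
  if (exact.length === 0) return null;

  let from = 0;
  while (true) {
    const start = text.indexOf(exact, from);
    if (start === -1) return null;
    const end = start + exact.length;

    // Compare the text immediately before and after the candidate match.
    const prefixOk =
      text.slice(Math.max(0, start - prefix.length), start) === prefix;
    const suffixOk = text.slice(end, end + suffix.length) === suffix;
    if (prefixOk && suffixOk) return { start, end };

    // Context did not match; keep scanning from the next position.
    from = start + 1;
  }
}
```

For example, `findText("the cat sat; the cat ran", { exact: "cat", suffix: " ran" })` skips the first "cat" because its suffix does not match and selects the second one. A higher-level API of the kind mentioned above could dispatch on selector type and call a function like this only for text.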
> 
> 
> In the meantime, let's keep in mind what a useful set of building blocks for robust anchoring/selectors would be for your use case. Maybe you could write up a simplified use case scenario and derived requirements for this, so we have it captured?
> 
> Regards-
> -Doug
> 
> On 11/14/14 4:41 PM, Jacob Jett wrote:
>> With regards to both the tagging and highlighting/bookmarking cases, I do
>> see both of those as kinds of annotating activities.
>> 
>> Transcribing I tend to think of as transcribing. Your transcription example
>> brings me to another simmering issue though. The entire portion of the
>> model concerning selectors generalizes to use cases beyond annotation, as
>> I've mentioned to you in the past. Is there any possibility of spinning it
>> off into its own standard-building activity?
>> 
>> Beyond its obvious use in the transcribing example you have already given,
>> I have a collection building use case that requires that it be possible to
>> gather any arbitrary portion of any arbitrary resource into a collection.
>> 
>> To illustrate a specific example, I have some digital humanities scholar
>> who would like to gather together all of the poems written by women
>> authors between 1800 and 1825, i.e., early 19th century. The poetry
>> collection is to be used as an input for a text analysis process, as such
>> the scholar wants to jettison all of the extraneous material (page headers,
>> page footers, tables of contents, pretty much all of the structure and
>> metadata regarding the books that the poems appear in). The segmentation
>> techniques that we've already worked out during the development of the oa
>> model would be perfect for this task, except that the semantics of the
>> predicates are limited to cases of annotation. Which leaves me in the
>> rather awkward position of re-inventing the wheel through super-classing...
>> :(
>> 
>> My point is that annotations are not the web's Swiss Army knives, but this
>> model does have something of a monopoly on good strategies for arbitrary
>> segmentation of resources. We should be relatively precise with what an
>> annotation is and be very wary of scope creep.
>> 
>> An aside: some other thoughts about whether or not highlighting/bookmarking
>> is actually bodiless. I don't believe that these are actually bodiless.
>> These are both traditional, physical annotating activities and so make
>> sense to fold in here. The difference in this case though is the intended
>> target of the "annotation".
>> 
>> Under normal circumstances our expectation of an annotation body is that it
>> should contain some content of interest to a human being. We are just using
>> machines as the middleware to get it to that human. In the highlighting and
>> bookmarking case the "body's" content is directly intended for both machine
>> consumption and subsequently machine action, e.g., apply this style to this
>> arbitrary thing. This is because the human consumable content appears to be
>> quite abstract indeed.
>> 
>> If we wanted to contemplate more about what this human consumable content
>> is then we might consider whether or not annotating something with a color
>> is not in fact equivalent to annotating it with an image, or some sounds,
>> etc, vis a vis a human end user. It seems to me that it is and so, I'm not
>> so confident that we actually have the license to say that the
>> highlighting/bookmarking use case is equivalent to there being no body
>> content in the annotation at all. It seems much safer to conclude that the
>> effort of carrying that content has been punted to something machine
>> actionable (i.e., CSS in this case) rather than the more ordinary formats
>> we normally expect to do that work (e.g., text, video, images, etc.). Which
>> begs the question as to whether or not there are additional kinds of
>> machine actionable annotations beyond bookmarking and highlighting. Bob
>> Morris's editing use case comes to mind and the idea of "expectations"
>> rears its ugly head once again.
>> 
>> Regards,
>> 
>> Jacob
>> 
>> 
>> 
>> 
>> _____________________________________________________
>> Jacob Jett
>> Research Assistant
>> Center for Informatics Research in Science and Scholarship
>> The Graduate School of Library and Information Science
>> University of Illinois at Urbana-Champaign
>> 501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
>> (217) 244-2164
>> jjett2@illinois.edu
>> 
>> On Fri, Nov 14, 2014 at 1:36 PM, Robert Sanderson <azaroth42@gmail.com>
>> wrote:
>> 
>>> 
>>> Hi Jacob,
>>> 
>>> On Fri, Nov 14, 2014 at 9:47 AM, Jacob Jett <jjett2@illinois.edu> wrote:
>>> 
>>>> I think though we need to be clearer that it isn't just any kind of
>>>> association between things (RDF does that natively) but rather it is a
>>>> certain kind of association. It has specific semantics and while the web
>>>> presents certain opportunities for new functionalities at the end of the
>>>> day an annotation links some content or process to a specific thing that it
>>>> is intended to convey some information about.
>>>> 
>>> 
>>> Typically [[weaselwords]] they are, but there are many use cases where
>>> people intuitively want to use an annotation without the aboutness.
>>> 
>>> Some examples:
>>> 
>>> * Transcribing some text in an image for accessibility ... is the
>>> transcribed text "about" the region of the image? Maybe?
>>> 
>>> * Tagging the mention of an entity in some text with its identifier. Is
>>> the URI "about" the segment of text? Maybe?
>>> 
>>> * Highlights and bookmarks -- there's no Body to be about the target.
>>> 
>>> 
>>> 
>>>> I would not, for instance, use an annotation to gather things into a
>>>> collection. That's a very different kind of association to make.
>>>> 
>>> 
>>> Nor would I, personally, and yet tagging is often used for exactly that
>>> reason.  For example, tagging your music files is essentially organizing
>>> them into sub-collections, and interfaces let you filter on those.  Or
>>> github issues -- the tags (labels) are almost always used to group the
>>> issues into what amounts to collections.
>>> 
>>> I agree (of course!) with the spirit of being precise in our definitions
>>> and at the same time being easy to understand and relate to, but it's very
>>> hard without excluding significant use cases that have been considered to
>>> be in scope of the work.
>>> 
>>> Rob
>>> 
>>> 
>>> 
>>> --
>>> Rob Sanderson
>>> Technology Collaboration Facilitator
>>> Digital Library Systems and Services
>>> Stanford, CA 94305
>>> 
>> 
> 
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Friday, 14 November 2014 22:35:09 UTC