Re: use case clarification - cross format annotations from Jacob Jett on 2014-12-05 (public-annotation@w3.org from December 2014)

From: Jacob Jett <jjett2@illinois.edu>
Date: Fri, 5 Dec 2014 14:57:01 -0600
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Frederick Hirsch <w3c@fjhirsch.com>, Nick Stenning <nick@whiteink.com>, Web Annotation <public-annotation@w3.org>
Message-ID: <CABzPtBJomwK_kbKmQgGcbPai3Y7GTe8o_TNeddOH6+ub1b4GRg@mail.gmail.com>
+1 to what Rob has noted here.

Some thoughts on DOIs: it seems to me that we might be able to reuse the
DOI's URI as the identifier for the Choice node, since it acts as something
of a fulcrum among the various representations. IIRC, selectors are
specific to the individual resources named by the Choice, so there
shouldn't be any nasty collisions there.
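A minimal sketch of that idea, assuming the Open Annotation JSON-LD terms `oa:Choice`, `oa:default`, `oa:item`, `oa:SpecificResource`, `oa:hasSource` and `oa:hasSelector`. The DOI, URLs and selector values are hypothetical. The Choice node takes the DOI as its `@id`, while each selector stays attached to the representation it was made against.

```python
import json


def choice_target(doi_uri, representations):
    """Build an OA-style Choice whose @id is the DOI URI.

    `representations` is a list of (url, selector) pairs. Each selector is
    attached to its own representation, never to the Choice itself.
    """
    items = [
        {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": url},
            "oa:hasSelector": selector,
        }
        for url, selector in representations
    ]
    return {
        "@id": doi_uri,
        "@type": "oa:Choice",
        # Treating the first representation as the default is an assumption.
        "oa:default": items[0],
        "oa:item": items[1:],
    }


target = choice_target(
    "http://dx.doi.org/10.1234/example",  # hypothetical DOI
    [
        ("http://example.org/paper.pdf",
         {"@type": "oa:FragmentSelector", "rdf:value": "page=3"}),
        ("http://example.org/paper.html",
         {"@type": "oa:TextQuoteSelector", "oa:exact": "double helix"}),
    ],
)
print(json.dumps(target, indent=2))
```

Because the selectors live on the items, a PDF page selector and an HTML text-quote selector can coexist under the same DOI without colliding.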

We're hoping to do something similar with our workset model for the HTRC.
One of the issues is that we have multiple representations for the pages of
each digitized volume (e.g., a text file and an image file). Ideally we'd
like a distinct identifier that persistently identifies the page-sized chunk
of content independently of its respective representations. Since I'm
hoping to reuse some of the selector concepts there, our decision here is
going to be important.
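One way to sketch that page-level identity, assuming a simple in-memory model. The `Page` and `Representation` classes, the identifier scheme and the file URLs are hypothetical and not part of the actual HTRC workset model.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    url: str
    media_type: str


@dataclass
class Page:
    # Persistent identifier for the page-sized chunk of content,
    # independent of any particular file that renders it.
    page_id: str
    representations: list = field(default_factory=list)

    def representation_for(self, media_type):
        """Return the first representation with the given media type, or None."""
        for rep in self.representations:
            if rep.media_type == media_type:
                return rep
        return None


page = Page(
    "urn:example:volume-42:page-7",  # hypothetical identifier
    [
        Representation("http://example.org/v42/p7.txt", "text/plain"),
        Representation("http://example.org/v42/p7.jpg", "image/jpeg"),
    ],
)
print(page.representation_for("image/jpeg").url)
```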

With regard to DOIs: since they identify non-information resources,
perhaps we might consider adopting a community best practice that promotes
the Choice solution (even in cases where the annotator is only providing
one representation). This would keep the selectors operating at the correct level
of object (the representation) while still allowing the annotator to target
the non-info resource.
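A sketch of what such a best practice might look like in a client, assuming OA-style JSON-LD terms (`oa:Choice`, `oa:default`, `oa:item`, `oa:SpecificResource`). The function name and example values are hypothetical. A target made against a single representation is wrapped in a one-representation Choice identified by the DOI, so the selector stays on the representation.

```python
def wrap_in_choice(doi_uri, target):
    """Wrap a single representation-level target in a Choice named by the DOI.

    If `target` is already a Choice it is returned unchanged; otherwise it
    becomes the Choice's default, keeping any selector on the representation.
    """
    if target.get("@type") == "oa:Choice":
        return target
    return {
        "@id": doi_uri,
        "@type": "oa:Choice",
        "oa:default": target,
        "oa:item": [],
    }


single = {
    "@type": "oa:SpecificResource",
    "oa:hasSource": {"@id": "http://example.org/paper.pdf"},
    "oa:hasSelector": {"@type": "oa:FragmentSelector", "rdf:value": "page=3"},
}
wrapped = wrap_in_choice("http://dx.doi.org/10.1234/example", single)  # hypothetical DOI
```

Applying the wrapper twice is harmless, which makes it easy to enforce the practice at annotation-creation time.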

Regards,

Jacob


_____________________________________________________
Jacob Jett
Research Assistant
Center for Informatics Research in Science and Scholarship
The Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
(217) 244-2164
jjett2@illinois.edu

On Fri, Dec 5, 2014 at 1:50 PM, Robert Sanderson <azaroth42@gmail.com>
wrote:

>
> All,
>
> Apologies for only catching up with this thread now due to travel.
>
> Some things to note, all from my non-chairy perspective.
>
> ## EPUB and IDPF
>
> The EPUB world has this same requirement.  The solution that was arrived
> at for the use of Open Annotation in that space was:
> http://www.idpf.org/epub/oa/#h.hnfijet1uk3j
>
> Notably, the recommendation is to give as much metadata as possible to
> allow client systems (possibly working against an offline library of
> content) the best opportunity to discover appropriate works.  If there is a
> known, unique URI, then that's great but there are many situations when
> that won't be available, even if the current representation is online.
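A rough sketch of how a client working against an offline library might use that metadata: prefer a known unique identifier, otherwise fall back to matching descriptive fields. The field names (`identifier`, `title`, `creator`), the matching rule and the library entries are assumptions for illustration, not taken from the IDPF document.

```python
def find_work(target_meta, library):
    """Return library entries that plausibly match the annotation's target.

    `target_meta` and each library entry are dicts of descriptive metadata.
    A shared unique identifier wins outright; otherwise title and creator
    must both match, ignoring case and surrounding whitespace.
    """
    ident = target_meta.get("identifier")
    if ident:
        exact = [book for book in library if book.get("identifier") == ident]
        if exact:
            return exact

    def norm(value):
        return (value or "").strip().lower()

    title = norm(target_meta.get("title"))
    creator = norm(target_meta.get("creator"))
    if not title:
        # Too little metadata to match on; refuse rather than match everything.
        return []
    return [
        book for book in library
        if norm(book.get("title")) == title and norm(book.get("creator")) == creator
    ]


LIBRARY = [  # hypothetical offline library
    {"identifier": "urn:example:book-1", "title": "Moby-Dick", "creator": "Herman Melville"},
    {"identifier": "urn:example:book-2", "title": "Walden", "creator": "Henry David Thoreau"},
]
```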
>
> ## Web Architecture
>
> In the web architecture we have resources with identity that can provide
> representations.  As a specification building best practice, we should
> strive to follow that architecture.  Thus if we have an annotation about a
> Work, then that Work should be a resource which is identified by a URI.
> The problem is that we rarely have that URI, and even if the publishing
> system knows it, it has no recommended way to convey it when providing the
> representation.   This isn't an issue unique to annotation, of course, but
> is something for which we could consider providing guidance in the
> annotation space -- a method for a resource to ask a client to annotate a
> supplied URI instead of the URI shown in the browser.
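Purely as an illustration of what such a hint could look like, the sketch below has the page declare its preferred URI with `<link rel="canonical">` (an existing HTML convention, registered in RFC 6596) and the client annotate that URI when present, falling back to the browser's URL otherwise. Using `rel="canonical"` for this purpose is an assumption; nothing in this thread recommends it, and the URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonical = attrs["href"]


def uri_to_annotate(browser_url, html):
    """Prefer the page's declared canonical URI over the URL in the browser."""
    finder = CanonicalLinkFinder()
    finder.feed(html)
    if finder.canonical:
        # Resolve relative hrefs against the page URL.
        return urljoin(browser_url, finder.canonical)
    return browser_url
```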
>
> ## Choice as a Workaround
>
> When presented with multiple URIs that convey the same information, a
> Choice could be used to maintain that list of representations.  As Choices
> are resources, the Choice could be maintained outside of the annotation and
> dereferenced when used.  There are clearly problems, but it avoids the FRBR
> issue [1] of trying to guess exactly what the annotator is trying to
> comment on -- the file they currently see, all the way up to the concept of
> the intellectual content that the file conveys.
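A sketch of keeping the Choice outside the annotation: the annotation's target is just the Choice's URI, and a client dereferences it to get the list of representations. The in-memory `CHOICES` store stands in for an HTTP fetch and, like the URIs, is hypothetical.

```python
# Hypothetical store standing in for Choice resources published on the web.
CHOICES = {
    "http://example.org/choices/dna-paper": {
        "@id": "http://example.org/choices/dna-paper",
        "@type": "oa:Choice",
        "oa:default": {"@id": "http://example.org/dna.pdf"},
        "oa:item": [{"@id": "http://example.org/dna.html"}],
    }
}


def fetch(uri):
    """Stand-in for dereferencing a URI over HTTP."""
    return CHOICES[uri]


def representations(annotation, resolve=fetch):
    """Return the representation URIs behind an annotation's target."""
    target = annotation["oa:hasTarget"]
    if isinstance(target, str):
        # Target given by reference only: dereference it when used.
        target = resolve(target)
    if target.get("@type") != "oa:Choice":
        return [target["@id"]]
    members = [target["oa:default"]] + target.get("oa:item", [])
    return [member["@id"] for member in members]


anno = {"@type": "oa:Annotation", "oa:hasTarget": "http://example.org/choices/dna-paper"}
print(representations(anno))
# ['http://example.org/dna.pdf', 'http://example.org/dna.html']
```

Updating the shared Choice then changes the representations seen by every annotation that points at it, without touching the annotations themselves.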
>
> [1] If you don't know what FRBR is, I encourage you to remain ignorant and
> not waste valuable time and braincells ;)
>
> ## Scope
>
> We're definitely not going to solve it perfectly... but should we try to
> solve it at all?  Content negotiation is an architectural option, and
> annotating the generic URI plus negotiation for the representation would
> fix a lot of the issues.  It's just that we don't have or know the URIs
> that do this, when all the browser sees is the representation's URI.  See
> the webarch topic above :)
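A simplified sketch of server-side negotiation behind a generic URI: the annotation targets the generic URI, and the representation is picked from the client's `Accept` header. The parser handles only media ranges and `q` values and ignores the finer precedence rules of RFC 7231. The URLs are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept, available):
    """Pick the URL of the best available representation, or None."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for media_type, url in available.items():
            if matches(media_range, media_type):
                return url
    return None


AVAILABLE = {  # representations behind one generic URI (hypothetical)
    "text/html": "http://example.org/dna.html",
    "application/pdf": "http://example.org/dna.pdf",
}
```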
>
> ## DOIs and Fragments
>
> As Bill knows, I'm ... less of a fan of DOIs than others in the scholarly
> publishing sector.  One thing to note is that once you hit any redirecting
> URI, such as a DOI, the use of fragments to identify segments of the
> resource goes out the window.  That fragment will be at best lost, and at
> worst end up referring to something completely unexpected when the
> publisher sends you to an HTML splash page, rather than the PDF that was
> originally annotated.  The best we can hope for is to include as much
> description about the target resource as we can, and hope that the client
> can do something sensible with it.
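A small sketch of why this happens, using hypothetical URLs: the fragment is never sent to the server, and when a redirect's `Location` carries no fragment of its own, HTTP clients reapply the original one (RFC 7231, section 7.1.2) to whatever resource they land on, which may be an HTML splash page rather than the annotated PDF. The `follow_redirect` helper is illustrative only.

```python
from urllib.parse import urldefrag


def follow_redirect(original_uri, location):
    """Apply the fragment-inheritance rule for a 3xx Location header."""
    _, fragment = urldefrag(original_uri)
    loc_url, loc_fragment = urldefrag(location)
    if loc_fragment or not fragment:
        return location
    return f"{loc_url}#{fragment}"


annotated = "http://dx.doi.org/10.1234/example#page=3"  # PDF-style fragment
sent_to_server, kept_by_client = urldefrag(annotated)
landed = follow_redirect(annotated, "http://publisher.example.org/article/123")
print(landed)  # http://publisher.example.org/article/123#page=3
```

The resulting `#page=3` is now interpreted against an HTML page, where it means nothing or something unrelated, which is why extra description of the target is the safer fallback.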
>
>
> Rob
>
>
>
> On Fri, Dec 5, 2014 at 11:23 AM, Frederick Hirsch <w3c@fjhirsch.com>
> wrote:
>
>> Thanks to Paolo and Nick for clarifying this.
>>
>> It seems we can simplify by assuming server-side intelligence where
>> needed in conjunction with identifier standardization done elsewhere.
>>
>> Not sure of the downside of this approach.
>>
>> regards, frederick
>>
>> Frederick Hirsch
>> @fjhirsch
>>
>> On Dec 5, 2014, at 4:27 AM, Nick Stenning <nick@whiteink.com> wrote:
>>
>> > On Thu, Dec 4, 2014, at 19:39, Bill Kasdorf wrote:
>> >> Isn't it a problem, though, that the DOI _identifies_ the document but it
>> >> doesn't necessarily _locate_, or link _to_ the document?
>> >
>> > I'm not sure it is a problem, because as Paolo has already mentioned in
>> > a new thread on this subject, these problems can be addressed on the
>> > server, behind an API.
>> >
>> >
>> > ### Situation 1: retrieve all annotations for current page
>> >
>> > I am on a web page, say "http://jimwatson.com/papers/dna.html". I want
>> > to retrieve all annotations for that web page. I can, with a dumb
>> > client, make a call to a search API, providing only the page URL. The
>> > server can then, in principle:
>> >
>> > 1a) look up the URL in an internal cache mapping URLs to identifiers,
>> > OR, in the event of a cache miss
>> > 1b) fetch the URL, and scan it for metadata such as the
>> > previously-mentioned "dc.identifier" meta tags
>> > 2) as a result of 1a) or 1b), resolve the URL to a set of URLs:
>> >
>> >        {canonical identifiers for the document} ∪ {URLs for the current
>> >        representation}
>> >
>> > 3) return all search results for that broader set
>> >
>> > This is what Paolo has referred to as "Target extension" in
>> > http://lists.w3.org/Archives/Public/public-annotation/2014Dec/0021.html
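The server-side steps above can be sketched as follows. The cache, the `fetch_html` callable, the in-memory annotation index and the DOI are assumptions; only the page URL and the `dc.identifier` meta tag come from the thread. Step numbers in the comments refer to Nick's list.

```python
from html.parser import HTMLParser


class IdentifierMetaFinder(HTMLParser):
    """Step 1b: collect <meta name="dc.identifier" content="..."> values."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "dc.identifier":
            if attrs.get("content"):
                self.identifiers.append(attrs["content"])


def expand_target(url, cache, fetch_html):
    """Steps 1a-2: resolve a page URL to {canonical ids} ∪ {representation URLs}."""
    if url in cache:                      # 1a) cache hit
        return cache[url]
    finder = IdentifierMetaFinder()       # 1b) cache miss: fetch and scan
    finder.feed(fetch_html(url))
    expanded = set(finder.identifiers) | {url}
    cache[url] = expanded
    return expanded


def search(url, index, cache, fetch_html):
    """Step 3: return annotations on any URI in the expanded set."""
    uris = expand_target(url, cache, fetch_html)
    return [anno for anno in index if anno["target"] in uris]


PAGE = "http://jimwatson.com/papers/dna.html"
HTML = '<html><head><meta name="DC.identifier" content="doi:10.1234/dna"></head></html>'
INDEX = [  # hypothetical stored annotations
    {"id": 1, "target": "doi:10.1234/dna"},
    {"id": 2, "target": PAGE},
    {"id": 3, "target": "http://example.org/unrelated"},
]
cache = {}
found = search(PAGE, INDEX, cache, lambda u: HTML)
print([anno["id"] for anno in found])  # [1, 2]
```

The client only ever sends the page URL; everything else happens behind the API.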
>> >
>> >
>> > ### Situation 2: retrieve all pages for current annotation
>> >
>> > I am starting with an annotation (perhaps previously retrieved from
>> > storage) and I want to find all pages which it annotates. The question
>> > that sits at the core of this discussion is:
>> >
>> >    "Should all the information I need be contained within the
>> >    annotation itself, or can I rely on the use of a supporting API
>> >    to help me?"
>> >
>> > My feeling is that answering that the annotation should be
>> > self-contained results in a horrendously complicated wire format that
>> > almost no client implementations will know how to support.
>> >
>> > By contrast, answering that you can rely on a (perhaps domain-specific)
>> > storage API, which knows how to resolve DOIs and other canonicalised
>> > identifiers into representation URLs and vice versa, allows for:
>> >
>> > - relatively simple clients
>> > - lower network overhead
>> > - vastly increased flexibility in mapping URIs -> URLs and vice versa
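A minimal sketch of the kind of mapping such a storage API might keep internally, assuming a simple in-memory bidirectional table. The class name and identifiers are hypothetical. Clients only send a URI or identifier; the mapping stays on the server.

```python
from collections import defaultdict


class IdentifierResolver:
    """Bidirectional map between canonical identifiers (e.g. DOIs) and representation URLs."""

    def __init__(self):
        self._urls_for = defaultdict(set)
        self._ids_for = defaultdict(set)

    def register(self, identifier, url):
        self._urls_for[identifier].add(url)
        self._ids_for[url].add(identifier)

    def urls(self, identifier):
        """Identifier -> representation URLs."""
        return set(self._urls_for.get(identifier, ()))

    def identifiers(self, url):
        """Representation URL -> canonical identifiers."""
        return set(self._ids_for.get(url, ()))

    def equivalents(self, uri):
        """Every URI naming the same document as `uri`, including `uri` itself."""
        ids = self.identifiers(uri)
        if uri in self._urls_for:
            ids.add(uri)
        result = {uri} | ids
        for ident in ids:
            result |= self.urls(ident)
        return result


resolver = IdentifierResolver()
resolver.register("doi:10.1234/dna", "http://example.org/dna.pdf")
resolver.register("doi:10.1234/dna", "http://jimwatson.com/papers/dna.html")
```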
>> >
>> > To expand on the last point: if we put this mapping in the data model,
>> > we are limited to the concepts that can reasonably be expressed in a
>> > data format we are expecting people to parse.
>> >
>> > If we allow this mapping to be encoded in a program that runs behind an
>> > API, we have the full power of any programming language and any
>> > necessary domain assumptions to help us.
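To illustrate the point about encoding the mapping in a program rather than in the data format, here is a sketch of a domain-specific rule that would be awkward to express declaratively inside an annotation: several URL shapes for the same arXiv-style paper collapse to one identifier. The URL patterns and the `arxiv:` identifier form are assumptions for illustration only.

```python
import re

# Hypothetical rule: abstract, PDF and versioned URLs all name one paper.
ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?$"
)


def canonical_id(url):
    """Map a representation URL to a canonical identifier, or None if no rule applies."""
    match = ARXIV_URL.match(url)
    if match:
        return "arxiv:" + match.group(1)
    return None


print(canonical_id("https://arxiv.org/abs/1412.1234"))
print(canonical_id("http://arxiv.org/pdf/1412.1234v2.pdf"))
```

New rules like this can be added on the server without changing the wire format or any client.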
>> >
>> > -N
>> >
>>
>>
>>
>
>
> --
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305
>
Received on Friday, 5 December 2014 20:58:09 UTC