Re: use case clarification - cross format annotations from Benjamin Young on 2014-12-11 (public-annotation@w3.org from December 2014)

From: Benjamin Young <bigbluehat@hypothes.is>
Date: Thu, 11 Dec 2014 15:09:01 -0500
To: Robert Sanderson <azaroth42@gmail.com>
Cc: W3C Public Annotation List <public-annotation@w3.org>, Frederick Hirsch <w3c@fjhirsch.com>, Nick Stenning <nick@whiteink.com>
Message-ID: <CAE3H5FJgGw3JrXgHLCPRbVWp0YAZuSbj-Px9NyESr0R+vuhMng@mail.gmail.com>
"The best we can hope for is to include as much description about the
target resource as we can, and hope that the client can do something
sensible with it."

I couldn't agree more.

Thanks for the summary, Rob,
Benjamin
On Dec 5, 2014 2:51 PM, "Robert Sanderson" <azaroth42@gmail.com> wrote:

>
> All,
>
> Apologies for only catching up with this thread now due to travel.
>
> Some things to note, all from my non-chairy perspective.
>
> ## EPUB and IDPF
>
> The EPUB world has this same requirement.  The solution that was arrived
> at for the use of Open Annotation in that space was:
> http://www.idpf.org/epub/oa/#h.hnfijet1uk3j
>
> Notably, the recommendation is to give as much metadata as possible to
> allow client systems (possibly working against an offline library of
> content) the best opportunity to discover appropriate works.  If there is a
> known, unique URI, then that's great but there are many situations when
> that won't be available, even if the current representation is online.
>
> ## Web Architecture
>
> In the web architecture we have resources with identity that can provide
> representations.  As a specification building best practice, we should
> strive to follow that architecture.  Thus if we have an annotation about a
> Work, then that Work should be a resource which is identified by a URI.
> The problem is that we rarely have that URI, and even if the publishing
> system knows it, it has no recommended way to convey it when providing the
> representation.   This isn't an issue unique to annotation, of course, but
> is something for which we could consider providing guidance in the
> annotation space -- a method for a resource to ask a client to annotate a
> supplied URI instead of the URI shown in the browser.
>
> ## Choice as a Workaround
>
> When presented with multiple URIs that convey the same information, a
> Choice could be used to maintain that list of representations.  As a Choice
> is a resource, it could be maintained outside of the annotation and
> dereferenced when used.  There are clearly problems, but it avoids the FRBR
> issue [1] of trying to guess exactly what the annotator is trying to
> comment on -- the file they currently see, all the way up to the concept of
> the intellectual content that the file conveys.
>
> [1] If you don't know what FRBR is, I encourage you to remain ignorant and
> not waste valuable time and braincells ;)
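
A minimal sketch of the Choice workaround described above, written as a Python dict serialised to JSON-LD. The class and property names (oa:Choice, oa:default, oa:item, oa:hasBody, oa:hasTarget) are taken from the Open Annotation community draft as best understood here and should be checked against it; every example.org URI is hypothetical.

```python
import json

def choice_of(default_uri, alternative_uris):
    """Build an oa:Choice listing representations that convey the same content."""
    return {
        "@type": "oa:Choice",
        "oa:default": {"@id": default_uri},
        "oa:item": [{"@id": uri} for uri in alternative_uris],
    }

# Hypothetical annotation whose target is a Choice between an HTML and a
# PDF representation of the same paper.
annotation = {
    "@context": {"oa": "http://www.w3.org/ns/oa#"},
    "@type": "oa:Annotation",
    "oa:hasBody": {"@id": "http://example.org/comments/1"},
    "oa:hasTarget": choice_of(
        "http://example.org/papers/dna.html",
        ["http://example.org/papers/dna.pdf"],
    ),
}

print(json.dumps(annotation, indent=2))
```

The Choice is inlined here for brevity. Maintaining it outside the annotation would mean replacing the inline object with a reference to a URI at which the Choice can be dereferenced.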
>
> ## Scope
>
> We're definitely not going to solve it perfectly... but should we try to
> solve it at all?  Content negotiation is an architectural option, and
> annotating the generic URI plus negotiation for the representation would
> fix a lot of the issues.  It's just that we don't have or know the URIs
> that do this, when all the browser sees is the representation's URI.  See
> the webarch topic above :)
>
> ## DOIs and Fragments
>
> As Bill knows, I'm ... less of a fan of DOIs than others in the scholarly
> publishing sector.  One thing to note is that once you hit any redirecting
> URI, such as a DOI, the use of fragments to identify segments of the
> resource goes out the window.  That fragment will be at best lost, and at
> worst end up referring to something completely unexpected when the
> publisher sends you to an HTML splash page, rather than the PDF that was
> originally annotated.  The best we can hope for is to include as much
> description about the target resource as we can, and hope that the client
> can do something sensible with it.
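
A small sketch of why fragments and redirecting URIs mix badly, assuming a client that follows the fragment-inheritance rule in RFC 7231, section 7.1.2. The DOI, the publisher URL, and the function names are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

def request_uri(annotated_uri: str) -> str:
    """URI actually sent to the server: the client strips the fragment first."""
    return urldefrag(annotated_uri).url

def after_redirect(annotated_uri: str, location: str) -> str:
    """URI the client ends up at after following one redirect.

    If the Location value has no fragment of its own, the client carries
    the original fragment over to the new URI.
    """
    base, fragment = urldefrag(annotated_uri)
    target = urljoin(base, location)
    if fragment and not urldefrag(target).fragment:
        target = f"{target}#{fragment}"
    return target

# A fragment written for a PDF, attached to a (hypothetical) DOI link.
annotated = "https://doi.org/10.1000/example#page=3"
splash = "https://publisher.example/article/123/splash.html"

print(request_uri(annotated))             # the resolver never sees "#page=3"
print(after_redirect(annotated, splash))  # "#page=3" now points into an HTML page
```

The resolver cannot act on the fragment because it never receives it, and the client reapplies a PDF-style fragment to whatever representation the publisher chooses to serve.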
>
>
> Rob
>
>
>
> On Fri, Dec 5, 2014 at 11:23 AM, Frederick Hirsch <w3c@fjhirsch.com>
> wrote:
>
>> Thanks to Paolo, Nick for clarifying this.
>>
>> It seems we can simplify by assuming server side intelligence where
>> needed in conjunction with identifier standardization done elsewhere.
>>
>> Not sure of the downside of this approach.
>>
>> regards, frederick
>>
>> Frederick Hirsch
>> @fjhirsch
>>
>> On Dec 5, 2014, at 4:27 AM, Nick Stenning <nick@whiteink.com> wrote:
>>
>> > On Thu, Dec 4, 2014, at 19:39, Bill Kasdorf wrote:
>> >> Isn't it a problem, though, that the DOI _identifies_ the document
>> >> but it doesn't necessarily _locate_, or link _to_ the document?
>> >
>> > I'm not sure it is a problem, because as Paolo has already mentioned in
>> > a new thread on this subject, these problems can be addressed on the
>> > server, behind an API.
>> >
>> >
>> > ### Situation 1: retrieve all annotations for current page
>> >
>> > I am on a web page, say "http://jimwatson.com/papers/dna.html". I want
>> > to retrieve all annotations for that web page. I can, with a dumb
>> > client, make a call to a search API, providing only the page URL. The
>> > server can then, in principle:
>> >
>> > 1a) look up the URL in an internal cache mapping URLs to identifiers,
>> > OR, in the event of a cache miss
>> > 1b) fetch the URL, and scan it for metadata such as the
>> > previously-mentioned "dc.identifier" meta tags
>> > 2) as a result of 1a) or 1b), resolve the URL to a set of URIs:
>> >
>> >        {canonical identifiers for the document} ∪ {URLs for the current
>> >        representation}
>> >
>> > 3) return all search results for that broader set
>> >
>> > This is what Paolo has referred to as "Target extension" in
>> >
>> > http://lists.w3.org/Archives/Public/public-annotation/2014Dec/0021.html
>> >
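
The server-side steps in Situation 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any existing annotation store: the function names, the in-memory cache and index, the regex-based metadata scan, and the example DOI are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical scan for <meta name="dc.identifier" content="..."> tags.
# A real server would use an HTML parser; a regex keeps the sketch short.
META_RE = re.compile(
    r'<meta\s+name=["\']dc\.identifier["\']\s+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

def extract_identifiers(html: str) -> Set[str]:
    """Step 1b: collect dc.identifier values from the page markup."""
    return set(META_RE.findall(html))

def expand_target(url: str,
                  cache: Dict[str, Set[str]],
                  fetch: Callable[[str], str]) -> Set[str]:
    """Steps 1a, 1b and 2: map a page URL to identifiers plus the URL itself."""
    identifiers = cache.get(url)                       # 1a: cache lookup
    if identifiers is None:                            # cache miss
        identifiers = extract_identifiers(fetch(url))  # 1b: fetch and scan
        cache[url] = identifiers
    return identifiers | {url}                         # 2: the broader set

def search(url: str,
           cache: Dict[str, Set[str]],
           fetch: Callable[[str], str],
           index: Dict[str, List[str]]) -> List[str]:
    """Step 3: return annotation ids for every URI in the expanded set."""
    results: List[str] = []
    for target in sorted(expand_target(url, cache, fetch)):
        results.extend(index.get(target, []))
    return results

# Usage with a stubbed fetch function and a made-up DOI.
page = "http://jimwatson.com/papers/dna.html"
html = '<head><meta name="dc.identifier" content="doi:10.1000/example"></head>'
index = {"doi:10.1000/example": ["anno-1"], page: ["anno-2"]}
print(search(page, {}, lambda u: html, index))
```

The client passes only the page URL. The expansion to the broader set happens entirely behind the search call.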
>> >
>> > ### Situation 2: retrieve all pages for current annotation
>> >
>> > I am starting with an annotation (perhaps previously retrieved from
>> > storage) and I want to find all pages which it annotates. The question
>> > that sits at the core of this discussion is:
>> >
>> >    "Should all the information I need be contained within the
>> >    annotation itself,
>> >     or can I rely on the use of a supporting API to help me?"
>> >
>> > My feeling is that answering that the annotation should be
>> > self-contained results in a horrendously complicated wire format that
>> > almost no client implementations will know how to support.
>> >
>> > By contrast, answering that you can rely on a (perhaps domain-specific)
>> > storage API, which knows how to resolve DOIs and other canonicalised
>> > identifiers into repr URLs and vice versa, allows for:
>> >
>> > - relatively simple clients
>> > - lower network overhead
>> > - vastly increased flexibility in mapping URIs -> URLs and vice versa
>> >
>> > To expand on the last point: if we put this mapping in the data model,
>> > we are limited to the concepts that can reasonably be expressed in a
>> > data format we are expecting people to parse.
>> >
>> > If we allow this mapping to be encoded in a program that runs behind an
>> > API, we have the full power of any programming language and any
>> > necessary domain assumptions to help us.
>> >
>> > -N
>> >
>>
>>
>>
>
>
> --
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305
>
Received on Thursday, 11 December 2014 20:09:31 UTC