- From: Benjamin Young <bigbluehat@hypothes.is>
- Date: Thu, 11 Dec 2014 15:09:01 -0500
- To: Robert Sanderson <azaroth42@gmail.com>
- Cc: W3C Public Annotation List <public-annotation@w3.org>, Frederick Hirsch <w3c@fjhirsch.com>, Nick Stenning <nick@whiteink.com>
- Message-ID: <CAE3H5FJgGw3JrXgHLCPRbVWp0YAZuSbj-Px9NyESr0R+vuhMng@mail.gmail.com>

"The best we can hope for is to include as much description about the
target resource as we can, and hope that the client can do something
sensible with it."

I couldn't agree more.

Thanks for the summary, Rob,
Benjamin

On Dec 5, 2014 2:51 PM, "Robert Sanderson" <azaroth42@gmail.com> wrote:
>
> All,
>
> Apologies for only catching up with this thread now due to travel.
>
> Some things to note, all from my non-chairy perspective.
>
> ## EPUB and IDPF
>
> The EPUB world has this same requirement.  The solution that was arrived
> at for the use of Open Annotation in that space was:
> http://www.idpf.org/epub/oa/#h.hnfijet1uk3j
>
> Notably, the recommendation is to give as much metadata as possible to
> allow client systems (possibly working against an offline library of
> content) the best opportunity to discover appropriate works.  If there is a
> known, unique URI, then that's great but there are many situations when
> that won't be available, even if the current representation is online.
>
> ## Web Architecture
>
> In the web architecture we have resources with identity that can provide
> representations.  As a specification building best practice, we should
> strive to follow that architecture.  Thus if we have an annotation about a
> Work, then that Work should be a resource which is identified by a URI.
> The problem is that we rarely have that URI, and even if the publishing
> system knows it, it has no recommended way to convey it when providing the
> representation.  This isn't an issue unique to annotation, of course, but
> is something for which we could consider providing guidance in the
> annotation space -- a method for a resource to ask a client, instead of
> annotating the URI in the browser, to please use this supplied URI.
>
> ## Choice as a Workaround
>
> When presented with multiple URIs that convey the same information, a
> Choice could be used to maintain that list of representations.
> As Choices are resources, it could be maintained outside of the
> annotation and dereferenced when used.  There are clearly problems, but
> it avoids the FRBR issue [1] of trying to guess exactly what the
> annotator is trying to comment on -- the file they currently see, all the
> way up to the concept of the intellectual content that the file conveys.
>
> [1] If you don't know what FRBR is, I encourage you to remain ignorant and
> not waste valuable time and braincells ;)
>
> ## Scope
>
> We're definitely not going to solve it perfectly... but should we try to
> solve it at all?  Content negotiation is an architectural option, and
> annotating the generic URI plus negotiation for the representation would
> fix a lot of the issues.  It's just that we don't have or know the URIs
> that do this, when all the browser sees is the representation's URI.  See
> the webarch topic above :)
>
> ## DOIs and Fragments
>
> As Bill knows, I'm ... less of a fan of DOIs than others in the scholarly
> publishing sector.  One thing to note is that once you hit any redirecting
> URI, such as a DOI, the use of fragments to identify segments of the
> resource goes out the window.  That fragment will be at best lost, and at
> worst end up referring to something completely unexpected when the
> publisher sends you to an HTML splash page, rather than the PDF that was
> originally annotated.  The best we can hope for is to include as much
> description about the target resource as we can, and hope that the client
> can do something sensible with it.
>
>
> Rob
>
>
>
> On Fri, Dec 5, 2014 at 11:23 AM, Frederick Hirsch <w3c@fjhirsch.com>
> wrote:
>
>> Thanks to Paolo, Nick for clarifying this.
>>
>> It seems we can simplify by assuming server side intelligence where
>> needed in conjunction with identifier standardization done elsewhere.
>>
>> Not sure of the downside of this approach.
>>
>> regards, frederick
>>
>> Frederick Hirsch
>> @fjhirsch
>>
>> On Dec 5, 2014, at 4:27 AM, Nick Stenning <nick@whiteink.com> wrote:
>>
>> > On Thu, Dec 4, 2014, at 19:39, Bill Kasdorf wrote:
>> >> Isn't it a problem, though, that the DOI _identifies_ the document
>> >> but it doesn't necessarily _locate_, or link _to_ the document?
>> >
>> > I'm not sure it is a problem, because as Paolo has already mentioned in
>> > a new thread on this subject, these problems can be addressed on the
>> > server, behind an API.
>> >
>> >
>> > ### Situation 1: retrieve all annotations for current page
>> >
>> > I am on a web page, say "http://jimwatson.com/papers/dna.html". I want
>> > to retrieve all annotations for that web page. I can, with a dumb
>> > client, make a call to a search API, providing only the page URL. The
>> > server can then, in principle:
>> >
>> > 1a) look up the URL in an internal cache mapping URLs to identifiers,
>> > OR, in the event of a cache miss
>> > 1b) fetch the URL, and scan it for metadata such as the
>> > previously-mentioned "dc.identifier" meta tags
>> > 2) as a result of 1a) or 1b), resolve the URL to a set of URLs:
>> >
>> >     {canonical identifiers for the document} ∪ {URLs for the current
>> >     representation}
>> >
>> > 3) return all search results for that broader set
>> >
>> > This is what Paolo has referred to as "Target extension" in
>> >
>> > http://lists.w3.org/Archives/Public/public-annotation/2014Dec/0021.html
>> >
>> >
>> > ### Situation 2: retrieve all pages for current annotation
>> >
>> > I am starting with an annotation (perhaps previously retrieved from
>> > storage) and I want to find all pages which it annotates. The question
>> > that sits at the core of this discussion is:
>> >
>> >     "Should all the information I need be contained within the
>> >     annotation itself,
>> >     or can I rely on the use of a supporting API to help me?"
>> >
>> > My feeling is that answering that the annotation should be
>> > self-contained results in a horrendously complicated wire format that
>> > almost no client implementations will know how to support.
>> >
>> > By contrast, answering that you can rely on a (perhaps domain-specific)
>> > storage API, which knows how to resolve DOIs and other canonicalised
>> > identifiers into repr URLs and vice versa, allows for:
>> >
>> > - relatively simple clients
>> > - lower network overhead
>> > - vastly increased flexibility in mapping URIs -> URLs and vice versa
>> >
>> > To expand on the last point. If we put this mapping in the data model,
>> > we are limited to the concepts that can reasonably be expressed in a
>> > data format we are expecting people to parse.
>> >
>> > If we allow this mapping to be encoded in a program that runs behind an
>> > API, we have the full power of any programming language and any
>> > necessary domain assumptions to help us.
>> >
>> > -N
>> >
>>
>
> --
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305
>
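
For anyone following along, the server-side lookup Nick describes under Situation 1 might be sketched roughly as below. This is only an illustration under stated assumptions, not anything proposed on the list: the names `IdentifierScanner`, `expand_target`, `search_annotations`, the `fetch` callable, the dict-based `cache`, and the flat `{"target": ...}` annotation shape are all hypothetical, and a real service would sit in front of a proper store and HTTP client.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set


class IdentifierScanner(HTMLParser):
    """Collects the content of <meta name="dc.identifier"> tags (step 1b)."""

    def __init__(self) -> None:
        super().__init__()
        self.identifiers: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name == "dc.identifier" and content:
            self.identifiers.add(content)


def expand_target(url: str,
                  fetch: Callable[[str], str],
                  cache: Dict[str, Set[str]]) -> Set[str]:
    """Resolve a page URL to {canonical identifiers} ∪ {the URL itself}."""
    identifiers = cache.get(url)           # step 1a: try the cache first
    if identifiers is None:                # cache miss
        scanner = IdentifierScanner()
        scanner.feed(fetch(url))           # step 1b: fetch and scan metadata
        identifiers = scanner.identifiers
        cache[url] = identifiers
    return identifiers | {url}             # step 2: the broader target set


def search_annotations(url: str,
                       annotations: List[dict],
                       fetch: Callable[[str], str],
                       cache: Dict[str, Set[str]]) -> List[dict]:
    """Step 3: return every annotation whose target is in the expanded set."""
    targets = expand_target(url, fetch, cache)
    return [a for a in annotations if a["target"] in targets]
```

The client here stays "dumb": it sends only the page URL, and the mapping from URLs to identifiers lives entirely behind the hypothetical search call, which is the trade-off Nick argues for.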
Received on Thursday, 11 December 2014 20:09:31 UTC