Re: Commentary on the "protocol" draft

Hi Ivan,

From my perspective, the things we would have done differently come from
the choice of technology stack and internal architecture, rather than LDP.
The LDP specification just puts in writing the best practices around REST,
with the addition of formalizing containers to manage and discover
resources, paging of large resources, and patching of RDF descriptions.
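For concreteness, a container's representation might look something like the
sketch below -- hand-written JSON-LD with made-up URIs, along the lines of the
examples in the protocol draft. The annotation URIs can be relative to the
container, so only short strings need to be transferred:

```json
{
  "@context": {"@vocab": "http://www.w3.org/ns/ldp#"},
  "@id": "http://example.org/annotations/",
  "@type": "BasicContainer",
  "contains": [
    {"@id": "anno1"},
    {"@id": "anno2"},
    {"@id": "anno3"}
  ]
}
```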

The Triannon architecture uses a graph internally, in order to work with a
Ruby gem called ActiveTriples (like ActiveModel but for RDF). This added
quite a lot of overhead, compared to treating the JSON serialization as the
internal model and transforming to Turtle (or other) on the way out.  There
are reasons why we did that, but the choice did make the implementation
harder, especially for an RDF newbie.

The overheads of following the standards come from the web architecture
rather than LDP -- as the nodes in the graph (anno, body, target, specific
resources, agents, etc) are resources, they should have their own URIs and
be separately dereferenceable, meaning that some agent needs to do that
dereferencing to stitch back together the entire "annotation" graph for the
client to work with.  Either all the clients can do that or some middleware
component can, which is why we implemented Triannon. The resources are all
still available by themselves, Triannon just saves the client some HTTP
interactions.  The same will be true at a much larger magnitude with search
results... getting back just a list of URIs rather than the full
annotations would be very expensive in terms of number of transactions
between client and server.
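To make the stitching concrete, here's a minimal sketch of what a client (or
middleware like Triannon) has to do. All of the URIs and the in-memory "web"
are hypothetical stand-ins for real HTTP dereferences; the point is that each
linked resource costs one more round trip:

```python
# Simulated responses: each resource describes only itself, with links out
# to related resources (body, target, source), per web architecture.
WEB = {
    "http://example.org/anno/1": {
        "@id": "http://example.org/anno/1",
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": "http://example.org/anno/1/body"},
        "oa:hasTarget": {"@id": "http://example.org/anno/1/target"},
    },
    "http://example.org/anno/1/body": {
        "@id": "http://example.org/anno/1/body",
        "@type": "cnt:ContentAsText",
        "cnt:chars": "A comment",
    },
    "http://example.org/anno/1/target": {
        "@id": "http://example.org/anno/1/target",
        "@type": "oa:SpecificResource",
        "oa:hasSource": {"@id": "http://example.com/page.html"},
    },
}

def dereference(uri):
    """Stand-in for an HTTP GET of the resource's own description."""
    return WEB.get(uri)

def stitch(uri, seen=None):
    """Recursively replace bare {"@id": ...} references with full
    descriptions, leaving unresolvable URIs (e.g. the external source
    document) as plain references."""
    seen = seen if seen is not None else set()
    doc = dereference(uri)
    if doc is None or uri in seen:
        return {"@id": uri}
    seen.add(uri)
    out = {}
    for key, value in doc.items():
        if isinstance(value, dict) and set(value) == {"@id"}:
            out[key] = stitch(value["@id"], seen)
        else:
            out[key] = value
    return out

anno = stitch("http://example.org/anno/1")
# anno now embeds the body and target descriptions in one document,
# at the cost of one extra "request" per linked resource.
```

Triannon does this assembly server-side, so the client makes one request
instead of three (or more).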

Basically, we made it much harder on ourselves than we needed to if all we
wanted was a simple annotation storage platform. But that's not all we
wanted :)

However, there are advantages to the approach, which become evident when
more complex annotations are used.  For example, instead of recreating
specific resources, they can be referenced by their URI.  This solves the
problem that Jacob raised about commentary threads that are in response to
an annotation AND about the target of the original annotation, for
example.  It makes updating much easier as you never need to worry about
blank nodes; instead you PUT or PATCH the body's newly minted URI and don't
need to worry about the rest of the annotation.  This will let us use
JSON-Patch (which would fail with blank nodes) as an alternative format to
LD-Patch.
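As a sketch of why the minted URI matters: with a real URI for the body, a
client can send a small JSON Patch (RFC 6902) document against just that
resource. The body and patch below are made up for illustration, and
apply_patch handles only the "replace" op on top-level keys -- enough to show
the idea, not a full RFC 6902 implementation:

```python
# The body resource, addressable at its own URI (no blank nodes involved).
body = {
    "@id": "http://example.org/anno/1/body",
    "@type": "cnt:ContentAsText",
    "cnt:chars": "A comment",
}

# A JSON Patch document targeting just the field that changed.
patch = [
    {"op": "replace", "path": "/cnt:chars", "value": "An improved comment"}
]

def apply_patch(doc, ops):
    """Apply a minimal subset of RFC 6902: "replace" on top-level members."""
    doc = dict(doc)  # shallow copy; fine for this flat example
    for op in ops:
        if op["op"] == "replace":
            key = op["path"].lstrip("/")  # RFC 6901 pointer, top level only
            if key not in doc:
                raise KeyError(key)  # "replace" requires the member to exist
            doc[key] = op["value"]
    return doc

updated = apply_patch(body, patch)
# Only the body resource is PATCHed at its own URI; the rest of the
# annotation graph is untouched.
```

With blank nodes this wouldn't work, since there is no stable URI (or stable
JSON Pointer path) to address the node being changed.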

Hope that helps :)

Rob

On Sun, Feb 8, 2015 at 12:28 AM, Ivan Herman <ivan@w3.org> wrote:

> Good to know this Robert, thanks!
>
> There is an obvious question, then: if LDP wouldn't have been around, what
> would you have done differently? Where did LDP get in the way, and where
> did it help?
>
> Ivan
>
> > On 07 Feb 2015, at 20:21 , Robert Sanderson <azaroth42@gmail.com> wrote:
> >
> >
> > Hi Ivan, all,
> >
> > On Fri, Feb 6, 2015 at 3:29 AM, Ivan Herman <ivan@w3.org> wrote:
> > just while reading your reply, one thought... In many respects, these
> types of issues become clearer if one does some sort of a proof-of-concept
> mini-implementation.
> >
> > I think that would be great if someone has the capacity to try it out!
> >
> > Do you think it is realistic to
> > - take an existing LDP implementation (if there isn't any that would
> really work, we already have a problem!)
> >
> > So for full disclosure...
> >
> > We have done something similar to the proposed protocol at Stanford
> using Fedora4 as the LDP implementation, and the proposal takes into
> account the feedback from that work.  Before going further, this was the
> first time the developer had done anything with RDF, let alone LDP, and was
> only somewhat familiar with Rails.  So the challenges that she worked
> through might have been avoided with a different LDP back end, or sticking
> strictly to JSON rather than also working with the graph natively.
> >
> > The biggest issue we ran into is generating the combined representation
> from multiple nodes in the graph at request time.  This is because the LDP
> implementation takes a very strict view that if you ask for the URI of the
> annotation, then all you get back is the information about that resource
> with only links to any related resources ... like the body and target.
> This isn't ideal as the client needs all of the information at once, and
> having the client make one request rather than several is obviously better.
> The LDP implementation also skolemizes all blank nodes, so even if the
> annotation is submitted with a blank node for an embedded body, it will get
> turned into a resource with a real URI.  All of which are legitimate
> strategies with LDP, and likely even best practice in an atomistic,
> resource-centric world, but they make the client developer's life less easy.
> >
> > Other issues we ran into were how to request different json-ld contexts
> for the same annotation, and how to associate metadata with resources not
> under the LDP server's control, such as managing the assertion that
> cnn.com/index.html dc:format 'text/html'.
> >
> > Our solution was to build a piece of middleware (Triannon) that manages
> the interactions with Fedora4 on behalf of a simpler client. On our roadmap
> is to swap out the Fedora4 backend to see if the middleware either still
> works with a different backend (hooray interoperability) or which aspects
> become unnecessary due to different choices that the other implementation
> made.
> >
> > For reference, if anyone wants to look at it, with the understanding
> that Triannon is still under active development:
> >  Triannon:  https://github.com/sul-dlss/triannon
> >  Fedora4: https://github.com/fcrepo4/fcrepo4 ; docs:
> https://wiki.duraspace.org/display/FEDORA40/Quick+Start
> >
> > All that said ... I believe that the current protocol spec could be
> implemented from scratch to a usable, if not production worthy, quality in
> a day or so by someone using tools and a language they're fluent with.  I
> may get a chance to put my coding where my mouth is, but there are many
> others on the list much better positioned and qualified than me!
> >
> >
> > - one of our implementers would try to make some sort of a mock-up to
> see if the basic mechanism can be set up easily
> > "Easily" is an operative word. If it turns out that setting up such a
> proof-of-concept would be difficult then we really may have a problem...
> >
> > From my experience, I expect that the server will be harder to implement
> than the client, and there will be significantly fewer server
> implementations than client implementations.  Which is to say that if we
> optimize ease of server development we may be making the client development
> harder... which is not the right way round, in my opinion.  We should think
> carefully about the tradeoffs between client development, server
> development and re-use of existing standards.
> >
> >
> > Just a thought...
> >
> > And a very good thought :)  Rough consensus and working code leads to
> the best standards.
> >
> > Rob
> >
> >
> >
> > Ivan
> >
> >
> > > On 05 Feb 2015, at 16:31 , Robert Sanderson <azaroth42@gmail.com>
> wrote:
> > >
> > >
> > > Thanks Nick!  Responses inline below.
> > >
> > > On Wed, Feb 4, 2015 at 9:45 AM, Nick Stenning <nick@whiteink.com>
> wrote:
> > > There are a couple of relatively minor questions I have, such as:
> > > - When you say "the REST best practice guidelines" -- are you
> > > referring to a specific document?
> > >
> > > No, so that's probably over-worded.  I just mean REST in general,
> rather than some arbitrary pattern.
> > > Will change the wording there.
> > >
> > >
> > > - You say "interactions are designed to take place over HTTP, but the
> > > design should not prevent implementations that carry the transactions
> > > over other protocols". Perhaps I've misunderstood what you mean, but
> > > this seems to be a design constraint that is at odds with REST. How can
> > > a protocol which specifies verbs and status codes be implementable out
> > > of an HTTP context? My inclination would be to remove this altogether
> > > and be clear that we are designing an HTTP protocol for annotation.
> > >
> > > +1  The principle came from a requirement I've heard in the past that
> interactions should be able to take place over a persistent stream like
> websockets or similar.  Agree, now that I've actually put some words down,
> that it's not feasible at the same time as using LDP without some abstract
> API spec that gets instantiated multiple ways. Which seems like overkill.
> > > If there's interest in stream based protocols, I think we could have a
> second protocol document down the line that's hopefully just how to
> translate into a message.
> > >
> > >
> > > - As specified, a container representation MUST contain a list of
> > > contained annotations. This seems increasingly impractical as a
> > > container grows in size, and doesn't seem to admit much subtlety around
> > > authorisation and access controls. Is this part of LDP? If so perhaps
> > > the container model doesn't fit here -- would a conforming
> > > implementation on top of the Hypothes.is data store have to include
> tens
> > > of thousands of annotation URIs in the top-level resource body?
> > >
> > > Yes, it is a requirement we've inherited.  There's a few mitigating
> factors:
> > >
> > > * There's the notion of paging in LDP:
> http://www.w3.org/TR/ldp-paging/
> > >    This would let the response be split up into chunks as needed.
> > >
> > > * Only the URI needs to be transferred, not the entire annotation. The
> URI can be relative, so the JSON-LD in example 2 (
> http://w3c.github.io/web-annotation/protocol/wd/#containers-for-annotations)
> with 10,000 annoX strings is what the response would look like.
> > >
> > > *  The requirement comes from discovery, as LDP doesn't have any
> search capabilities inherently, so the list of contained resources is to
> let a harvesting follow-your-nose approach work.  Other than for discovery
> of annotation URIs, I'm not sure why you'd ever retrieve the container
> description.
> > >
> > > * There could be many such containers to divide up the list, each with
> their own authorization/authentication controls.  Auth isn't currently
> specified in LDP but is on the roadmap for future work.
> > >
> > > Overall I agree it's not ideal, but I don't think it's a significant
> barrier to entry either?
> > >
> > >
> > > I also have one overall concern, which is that the design you've
> > > proposed seems like a compromise between two rather different groups of
> > > users, which by virtue of being a compromise doesn't really satisfy
> > > either of them.
> > >
> > > My postulate is that there are two broad classes of people who need
> > > annotation protocols:
> > > (1) Bulk data stores and archival services
> > > (2) User-facing clients
> > >
> > >
> > > Agreed. I wanted to map directly into LDP with as few additional
> requirements / constraints as possible in the first cut at this.  The
> result is something that seems like it should work but isn't optimized for
> any particular scenario. (premature optimization and all that)
> > >
> > >
> > > Bulk clients [...] will be stymied by:
> > > - requirement to list all annotations in a container
> > >
> > > Not sure that the client would ever need to retrieve the container
> description.
> > >
> > > - no bulk retrieval or submission
> > >
> > > Agreed. Bulk retrieval seems like a search operation, bulk submission
> OTOH is harder.
> > >
> > >  - no easy way to retrieve updates since $TIMESTAMP (q.v. SLEEP [1])
> > >
> > > Harvesting by time also seems like search.
> > >
> > >
> > > User-facing clients [...] won't like:
> > > - no normative specification of how to search for annotations relevant
> to the current page
> > >
> > > Agreed.
> > >
> > > - LD Patch
> > > - distinction between PUT and PATCH
> > >
> > > Yep. There's also the JSON Patch format (RFC 6902:
> https://tools.ietf.org/html/rfc6902) that we could adopt as we can
> mandate the structure of the JSON via the JSON-LD context and serialization
> spec.  OTOH, patching is just a hard problem, particularly so with graphs,
> and doubled by the existence of blank nodes where the id isn't consistent.
> > >
> > >
> > > - no guidance on error handling
> > >
> > > +1, to be discussed.
> > >
> > > - conneg
> > >
> > > I'm not a big fan of content negotiation either, TBH, but it's part of
> the web architecture.  This is one I think we can plaster over a little by
> setting the defaults sensibly so that conneg doesn't really need to be done
> for basic usage by browser based clients.
> > >
> > >
> > > Overall, I think it might be worth thinking about whether splitting these
> > > two use cases would allow us to focus on:
> > > - interoperability and bulk data flows for group (1)
> > > - simple data formats, low implementation overhead and
> document-oriented
> > > workflows for group (2)
> > >
> > > I think that would be very valuable as a way to make progress.
> There'll need to be some compromises between them as we don't want two
> totally separate protocols, but finding the right ones to make seems like
> our task in this WG :)
> > >
> > > Rob
> > >
> > > --
> > > Rob Sanderson
> > > Information Standards Advocate
> > > Digital Library Systems and Services
> > > Stanford, CA 94305
> >
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > ORCID ID: http://orcid.org/0000-0003-0782-2704
> >
> >
> >
> >
> >
> >
> >
> > --
> > Rob Sanderson
> > Information Standards Advocate
> > Digital Library Systems and Services
> > Stanford, CA 94305
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
>


-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Received on Sunday, 8 February 2015 18:37:38 UTC