Re: ldp wishlist for crosscloud from Martynas Jusevičius on 2014-11-09 (public-rww@w3.org from November 2014)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Sun, 9 Nov 2014 19:14:25 +0100
To: Melvin Carvalho <melvincarvalho@gmail.com>
Cc: public-rww <public-rww@w3.org>, Sandro Hawke <sandro@w3.org>
Message-ID: <CAE35VmxaJvwc2SKDQCDBPPZuEz_gc_ry54mVSwKevhCTzz52Mw@mail.gmail.com>
Better link #1:
https://github.com/Graphity/graphity-client/wiki/Reinventing-Web-applications#linked-data-platform

On Sun, Nov 9, 2014 at 7:13 PM, Martynas Jusevičius
<martynas@graphity.org> wrote:
> Hey all,
>
> I will not waste my time convincing people LDP was a wrong design
> choice, but I'm certain we have quite a few of Sandro's points covered
> with our declarative Linked Data approach:
> https://github.com/Graphity/graphity-client/wiki/Reinventing-Web-applications#templates
> https://github.com/Graphity/graphity-client/wiki/How-Graphity-works
>
>
> Martynas
> graphityhq.com
>
>
>
> On Sun, Nov 9, 2014 at 6:56 PM, Melvin Carvalho
> <melvincarvalho@gmail.com> wrote:
>> FYI: Sandro just posted an excellent, detailed wish list to the LDP WG.
>> Perhaps 75%+ could be related to RWW standards.
>>
>> Would love to see any, or all of these points, move closer to
>> standardization...
>>
>> ---------- Forwarded message ----------
>> From: Sandro Hawke <sandro@w3.org>
>> Date: 9 November 2014 18:06
>> Subject: ldp wishlist for crosscloud
>> To: Linked Data Platform WG <public-ldp-wg@w3.org>
>>
>>
>> As you may know, these days most of my time goes not to W3C staff work but to
>> funded research toward building "Crosscloud", an architecture for software
>> which allows users to control their data and which should encourage
>> innovation by making it much easier to develop powerful, open multi-user
>> (social) software.
>>
>> Back in January, we started off building on LDP, with Andrei creating
>> cimba.co.  It's a microblogging app, intended to replicate some of early
>> Twitter's functionality in a completely decentralized way using generic
>> (non-application-specific) LDP.    To make that work, we had to extend LDP
>> with Access Control; clients can tell the server who can do what with each
>> resource.   We also made no use of Direct Containers, Indirect Containers,
>> or Paging.   It's just Basic Containers, Access Control, WebID-TLS for
>> client authentication, Turtle for data, and non-RDF resources for photos.
>> (Maybe I'm forgetting some details; for demo see 2 minute video at [1].)
>>
>> While cimba basically works, it's painful in various ways and unable to do
>> many things, showing us that we need much more support from the servers.
>> We've also started building several more apps which are showing other things
>> that are important to have.
>>
>> We don't have it all figured out yet, let alone implemented, but here are a
>> few of the things we probably need.   I'm providing this list to help with
>> re-chartering, although most of these are not yet mature enough for
>> standardization.   Maybe they will be in 6-12 months, though.    As you look
>> at this list, one thing to figure out is how we will know when each module
>> is ready for the WG to take up.
>>
>> == 1.  Queries
>>
>> This is a big one.   It's impractical to have the cimba WebApp, running in
>> the browser, do all the GETs (hundreds, at least) every time it starts.  It
>> needs to do a small number of queries, and have the server manage all the
>> aggregation.   The server has to be able to query across other servers as
>> well as itself.
>>
>> We're currently playing with forms of "Link-Following SPARQL", but also with
>> a more restricted MongoDB-like query language, both for easier
>> implementation and for response-time/load guarantees.
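To illustrate what a restricted, MongoDB-like query might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: resources are plain dicts of property-value pairs, the store is presumed to already hold copies aggregated from other servers (cross-server fetching is not shown), and the only operators are equality and a `$in` membership test, borrowing MongoDB's operator name rather than its full semantics.

```python
from typing import Any, Dict, List

Resource = Dict[str, Any]


def matches(resource: Resource, query: Dict[str, Any]) -> bool:
    """True if every constraint in `query` holds for `resource`.

    A constraint is either a literal value (equality) or {"$in": [...]}
    (the property's value must be one of the listed values).
    """
    for prop, constraint in query.items():
        if prop not in resource:
            return False
        value = resource[prop]
        if isinstance(constraint, dict) and "$in" in constraint:
            if value not in constraint["$in"]:
                return False
        elif value != constraint:
            return False
    return True


def run_query(store: List[Resource], query: Dict[str, Any]) -> List[Resource]:
    """Evaluated on the server: one query replaces many client-side GETs."""
    return [r for r in store if matches(r, query)]


store = [
    {"@id": "http://alice.example/post/1", "type": "Post", "author": "alice"},
    {"@id": "http://bob.example/post/7", "type": "Post", "author": "bob"},
    {"@id": "http://bob.example/profile", "type": "Person", "name": "Bob"},
]
feed = run_query(store, {"type": "Post", "author": {"$in": ["alice", "bob"]}})
print([r["@id"] for r in feed])
# ['http://alice.example/post/1', 'http://bob.example/post/7']
```

The browser sends one small query and the server does the aggregation. Keeping the operator set this small is also what makes load and response time easier to bound than with general SPARQL.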
>>
>> Queries make resource-paging obsolete, which is why I've lost interest in
>> paging.
>>
>> == 2.  Change Notification to Web Servers
>>
>> If a server acting on behalf of the end-user is going to aggregate data from
>> other servers, it needs to be able to keep its copy in sync. Traditional web
>> cache + polling works only when it's okay to be seconds or minutes out of
>> date; many multi-user apps require much more responsiveness than that, so we
>> see a need for one server to be able to subscribe to change notification
>> from another.
>>
>> One might want something like PATCH to make this more efficient, but at the
>> moment it looks like we can keep the resources small enough that it doesn't
>> matter.
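A minimal in-process sketch of server-to-server change notification, assuming notifications carry only the changed resource's URL and the subscriber then re-fetches the whole resource (acceptable while resources stay small, so no PATCH is involved). The class names are hypothetical, and the direct method call in `put` stands in for whatever network callback a real protocol would define.

```python
from typing import Callable, Dict, List


class OriginServer:
    """Owns the data and notifies subscribers when a resource changes."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}
        self.subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, url: str, callback: Callable[[str], None]) -> None:
        self.subscribers.setdefault(url, []).append(callback)

    def get(self, url: str) -> str:
        return self.resources[url]

    def put(self, url: str, body: str) -> None:
        self.resources[url] = body
        # Stand-in for a network notification: only the URL is sent, no diff.
        for notify in self.subscribers.get(url, []):
            notify(url)


class AggregatingServer:
    """Acts for an end-user; keeps local copies in sync via notifications."""

    def __init__(self, origin: OriginServer) -> None:
        self.origin = origin
        self.cache: Dict[str, str] = {}

    def watch(self, url: str) -> None:
        self.cache[url] = self.origin.get(url)
        self.origin.subscribe(url, self.on_change)

    def on_change(self, url: str) -> None:
        # Re-fetch the whole resource; fine while resources stay small.
        self.cache[url] = self.origin.get(url)


origin = OriginServer()
origin.put("http://alice.example/status", "hello")
mirror = AggregatingServer(origin)
mirror.watch("http://alice.example/status")
origin.put("http://alice.example/status", "hello again")
print(mirror.cache["http://alice.example/status"])  # hello again
```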
>>
>> == 3.  Change Notification to Web Clients
>>
>> Similarly, Web Apps often need to know immediately when data has changed.
>> While it might be nice to have this be the same protocol as (2), our
>> preliminary investigation suggests the engineering trade-offs make that
>> impractical.   So, this needs to be its own protocol. Probably it's just a
>> tweak to the query protocol where query results, rather than being a single
>> response collecting all the results, are ongoing add-result and
>> remove-result events.
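One way to picture query results as ongoing events: the server diffs a standing query's old and new result sets and pushes `add-result` / `remove-result` events, and the client folds them into what it displays. This is a sketch under assumptions: results are identified by resource URL, and events are simple (name, URL) pairs rather than any defined wire format.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (event name, URL of the matching resource)


def diff_results(old: Set[str], new: Set[str]) -> List[Event]:
    """Events a server could push when a standing query's results change."""
    events = [("remove-result", url) for url in sorted(old - new)]
    events += [("add-result", url) for url in sorted(new - old)]
    return events


def apply_events(results: Set[str], events: Iterable[Event]) -> Set[str]:
    """Client side: fold the event stream into the locally displayed results."""
    updated = set(results)
    for name, url in events:
        if name == "add-result":
            updated.add(url)
        elif name == "remove-result":
            updated.discard(url)
        else:
            raise ValueError(f"unknown event {name!r}")
    return updated


before = {"http://a.example/p1", "http://b.example/p2"}
after = {"http://b.example/p2", "http://c.example/p3"}
events = diff_results(before, after)
print(events)
# [('remove-result', 'http://a.example/p1'), ('add-result', 'http://c.example/p3')]
print(apply_events(before, events) == after)  # True
```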
>>
>> == 4.  Operation over WebSockets
>>
>> It almost certainly makes sense to use WebSockets for (3), but it also makes
>> sense to use them for all the current LDP operations for high performance.
>> A modest client and server can probably process at least 1000 GETs per
>> second, but in practice, without WebSockets, they'll be slowed by an order of
>> magnitude because of round-trip delays.    Say the RTT is 50ms; then we
>> can do 20 round trips per second per connection.    Most browsers allow at most 6
>> connections per hostname [2], so that's 120 round trips per second, max, no
>> matter how much CPU and RAM and bandwidth you have.
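The throughput ceiling in this paragraph is simple arithmetic; a small Python helper reproduces it, assuming each connection carries one request at a time with no pipelining.

```python
def max_round_trips_per_second(rtt_ms: float, connections: int) -> float:
    """Ceiling on request/response exchanges per second when every connection
    waits for each response before sending its next request (no pipelining)."""
    per_connection = 1000.0 / rtt_ms  # 50 ms RTT -> 20 round trips per second
    return per_connection * connections


ceiling = max_round_trips_per_second(rtt_ms=50, connections=6)
print(ceiling)  # 120.0
# Compared with a server able to handle 1000 GETs per second, the
# latency-bound ceiling is roughly 8x lower.
print(1000 / ceiling)
```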
>>
>> I'm still thinking about what this might look like.     Strawman is
>> something like each client-to-server message is a JSON object like { "verb":
>> "GET", "resource":"http://example.org", "accept": "text/html", "seq":7 } and
>> responses are like { "in-response-to": 7, "status": 200, "contentType":
>> "text/html", "content": "<html>......</html>" }
>>
>> So the higher levels don't have to know it's not normal HTTP, *but* we can
>> have hundreds or thousands of requests pipelined.     Also, we can have
>> multiple responses, or something, for event notification.   This would also
>> allow for more transactional operation, if desired.   (Maybe
>> "partial-response-to" and "final-response-to".)
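A minimal sketch of the client-side bookkeeping this strawman implies: each request gets a sequence number, many requests can be sent before any response arrives, and each response is matched to its request through the echoed number (spelled `in-response-to` here). The class name is hypothetical, and the actual WebSocket transport is left out; `make_request` just returns the text frame that would be sent.

```python
import json
from itertools import count
from typing import Any, Dict, Tuple


class PipelinedClient:
    """Client-side bookkeeping for JSON request/response frames over one socket."""

    def __init__(self) -> None:
        self._seq = count(1)
        self.pending: Dict[int, Dict[str, Any]] = {}  # seq -> request awaiting reply

    def make_request(self, verb: str, resource: str, accept: str) -> str:
        seq = next(self._seq)
        msg = {"verb": verb, "resource": resource, "accept": accept, "seq": seq}
        self.pending[seq] = msg
        return json.dumps(msg)  # text frame that would be sent over the WebSocket

    def handle_response(self, frame: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        resp = json.loads(frame)
        request = self.pending.pop(resp["in-response-to"])  # KeyError if unknown seq
        return request, resp


client = PipelinedClient()
# Send several requests back to back without waiting for any response.
frames = [client.make_request("GET", f"http://example.org/r{i}", "text/turtle")
          for i in range(3)]
# Responses may arrive in any order; the echoed seq pairs each with its request.
reply = {"in-response-to": 2, "status": 200, "contentType": "text/turtle", "content": ""}
req, resp = client.handle_response(json.dumps(reply))
print(req["resource"], resp["status"])  # http://example.org/r1 200
print(sorted(client.pending))           # [1, 3]
```

Because requests never wait on each other, the per-connection round-trip limit no longer caps throughput; the partial/final response variants mentioned in the text would just be additional response kinds keyed by the same number.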
>>
>> == 5.  Non-Listing Containers
>>
>> I want end-points that I can POST to, and GET some information about,
>> without being swamped by an enumeration of everything posted there.   I
>> don't want to have to include a Prefer header to avoid that swamping.
>>
>> You might consider this a matter of taste, but I think it's an important usability
>> issue.
>>
>> Again, with querying, you probably don't want to just be dumping the list of
>> contained resources.   Querying also lets us control inlining, etc.
>> Basically, if querying is available, I think we can skip serializing
>> membership/containment triples.
>>
>> == 6.  PUT-to-Create
>>
>> There are situations where the client needs to lay out, on the server, an
>> assortment of resources with carefully controlled URLs, such as a static
>> website with interlinked html, css, js, images, etc.    This should be
>> doable with PUT, where PUT creates the resource inside the container that
>> owns that URL space.
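A sketch of how a server might decide where a PUT-created resource belongs, assuming containers are identified by URLs ending in `/` and the owning container is the one with the longest URL that is a proper prefix of the target. The function names, the in-memory store, and the choice of 409 when no container owns the URL are illustrative assumptions; 201 Created and 204 No Content are the usual HTTP responses for creating and replacing via PUT.

```python
from typing import Dict, List, Optional


def owning_container(url: str, containers: List[str]) -> Optional[str]:
    """Longest container URL (ending in "/") that is a proper prefix of `url`."""
    candidates = [c for c in containers
                  if c.endswith("/") and url.startswith(c) and url != c]
    return max(candidates, key=len, default=None)


def put(resources: Dict[str, bytes], members: Dict[str, List[str]],
        url: str, body: bytes) -> int:
    """Hypothetical PUT handler; returns an HTTP status code."""
    if url in resources:
        resources[url] = body
        return 204  # No Content: existing resource replaced
    container = owning_container(url, list(members))
    if container is None:
        return 409  # Conflict (illustrative choice): no container owns this URL
    resources[url] = body
    members[container].append(url)
    return 201  # Created inside the owning container


members: Dict[str, List[str]] = {"http://site.example/": [], "http://site.example/css/": []}
resources: Dict[str, bytes] = {}
print(put(resources, members, "http://site.example/css/main.css", b"body {}"))  # 201
print(members["http://site.example/css/"])  # ['http://site.example/css/main.css']
print(put(resources, members, "http://site.example/css/main.css", b"p {}"))     # 204
print(put(resources, members, "http://other.example/x.js", b""))                # 409
```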
>>
>> == 7.  DELETE WHERE
>>
>> One of our current demo apps is a game that is likely to generate a dozen
>> resources per second per user.   Asking for each of those resources to be
>> individually deleted afterwards seems rather silly, even problematic, so a
>> DELETE WHERE operation would be nice.
>>
>> Yes, one could put them all in a container in this case, and define it as a
>> kind of container that deletes its contained resources when it's deleted,
>> but there are situations where that won't work as well.  Maybe we want to
>> delete the resources after about 60 seconds have gone by, for example.
>> Easy to do with a DELETE WHERE, hard to do otherwise.
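A sketch of a DELETE WHERE over an in-memory store, assuming each resource records a numeric `created` timestamp (a hypothetical property) and the condition is given as a predicate. A fixed clock value keeps the example deterministic; a real server would read the current time instead.

```python
from typing import Any, Callable, Dict

Props = Dict[str, Any]


def delete_where(store: Dict[str, Props], predicate: Callable[[Props], bool]) -> int:
    """Delete every resource whose properties satisfy `predicate`; return the count."""
    doomed = [url for url, props in store.items() if predicate(props)]
    for url in doomed:  # collect first, then delete, to avoid mutating while iterating
        del store[url]
    return len(doomed)


now = 1_000_000.0  # fixed clock value; a real server might use time.time()
store = {
    "http://game.example/move/1": {"type": "Move", "created": now - 90},
    "http://game.example/move/2": {"type": "Move", "created": now - 30},
    "http://game.example/score": {"type": "Score", "created": now - 90},
}
removed = delete_where(store, lambda r: r["type"] == "Move" and now - r["created"] > 60)
print(removed, sorted(store))
# 1 ['http://game.example/move/2', 'http://game.example/score']
```

One request removes every expired game move, instead of one DELETE per resource.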
>>
>> ==  8.  WebMention for Data, backlinks used in Queries
>>
>> The basics of WebMention are in-scope for the Social Web WG, but it's not
>> clear they'll apply it to arbitrary raw data, or say how the back-links are
>> made available for use in queries.   Like many of these, this might be joint
>> work with SWWG.
>>
>> ==  9.  Client Authentication
>>
>> Arguably this is quite out of scope, and yet it's hard to operate without
>> it.   Especially things like (2) are easier with some kind of
>> authentication.
>>
>> For a strawman of how easy it could be:
>> https://github.com/sandhawke/spot/blob/master/spec.md
>>
>> == 10.  Access Control
>>
>> Obviously.
>>
>> My current radical theory is that all I need is a flag that a page is
>> owner-only, public, or group-read, and then a way to define the group of
>> identities (see (9)) who can read it.    Most people imagine we need to
>> control a lot more than read access, and perhaps we do, but I'm currently
>> working with the theory that everyone makes their own contributions in their
>> own space, notifying but never actually "writing" to anyone else's.
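The three-way flag could be as small as the following sketch. Identities are assumed to be WebID-style URLs established by whatever client authentication (9) provides, with `None` meaning an unauthenticated request; the enum and function names are hypothetical. Writes are limited to the owner, matching the theory that everyone writes only in their own space.

```python
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    OWNER_ONLY = "owner-only"
    PUBLIC = "public"
    GROUP_READ = "group-read"


def can_read(visibility: Visibility, owner: str, reader: Optional[str],
             group: Optional[Set[str]] = None) -> bool:
    """`reader` is an authenticated identity, or None for an anonymous request."""
    if visibility is Visibility.PUBLIC:
        return True
    if reader is None:
        return False
    if reader == owner:
        return True
    if visibility is Visibility.GROUP_READ:
        return group is not None and reader in group
    return False  # OWNER_ONLY and the reader is not the owner


def can_write(owner: str, requester: Optional[str]) -> bool:
    # Everyone writes only in their own space, so only the owner may write.
    return requester is not None and requester == owner


alice = "https://alice.example/#me"
bob = "https://bob.example/#me"
friends = {bob}
print(can_read(Visibility.GROUP_READ, alice, bob, friends))  # True
print(can_read(Visibility.OWNER_ONLY, alice, bob))           # False
print(can_read(Visibility.PUBLIC, alice, None))              # True
print(can_write(alice, bob))                                 # False
```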
>>
>> == 11.  Combined Metadata and Content operations
>>
>> I don't think I can put this very crisply, but I've started thinking about
>> resources as looking like this:
>>
>> { property1: value1,
>>    property2: value2,
>>    ...
>>    content: "<html>....</html>",
>>    contentType: "text/html"
>>    ...
>> }
>>
>> and it's so much nicer.   Basically, every resource is property-value
>> pairs, and some of that pv data is "content".    If you don't do something
>> like this, queries and notifications and all that require us to bifurcate
>> into a mechanism that's all about the content and another that's all about
>> the metadata.
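A small sketch of the uniform model, assuming `content` and `contentType` are ordinary keys alongside the other properties and are simply absent for a data-only (null-content) resource. The helper name is hypothetical; the point is that one filtering mechanism covers metadata and content type alike, with no separate path for content.

```python
from typing import Any, Dict, List, Optional


def make_resource(properties: Dict[str, Any], content: Optional[bytes] = None,
                  content_type: Optional[str] = None) -> Dict[str, Any]:
    """Every resource is property-value pairs; content is just two more of them."""
    resource = dict(properties)
    if content is not None:
        resource["content"] = content
        resource["contentType"] = content_type
    return resource


photo = make_resource({"title": "Sunset", "creator": "alice"}, b"\x89PNG...", "image/png")
note = make_resource({"title": "Groceries", "creator": "alice"})  # null-content resource
everything: List[Dict[str, Any]] = [photo, note]

# The same filtering mechanism serves metadata and content type alike.
by_alice = [r["title"] for r in everything if r.get("creator") == "alice"]
images = [r["title"] for r in everything
          if (r.get("contentType") or "").startswith("image/")]
print(by_alice)  # ['Sunset', 'Groceries']
print(images)    # ['Sunset']
```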
>>
>> LDP-RS's then become content-free resources, or null-content resources, but
>> much less fundamentally different.   With the current LDP framing, what
>> happens when you PUT an image to an LDP-RS or PUT rdf to what you created as
>> an image?   This model clears that up nicely.
>>
>> But this might only work in the face of other assumptions I'm making, like
>> the only triples at <R> are in a graph rooted at <R>, so you can think of
>> them all as properties of R.    Also I've resolved httpRange-14 by saying
>> I'm only interested in proper information-resource-denoting URLs, and you
>> can use indirect properties for talking about people, places, events, etc.
>> Maybe those radical assumptions are necessary for making this work.
>>
>> == 12.  Forwarding
>>
>> We need to be able to move resources, because it's very hard to pick a URL
>> and stick to it for decades.   And if it's used as part of other apps, and
>> you don't stick to it, you'll break them.   The fear of this will, I
>> suspect, significantly impede adoption.
>>
>> I propose three mechanisms.   Any one of them might work; between the three
>> I'm fairly confident.
>>
>> 1.  Servers SHOULD check all their outgoing links at least once every 30
>> days.   If they get a 301 response, they SHOULD update the link in place.
>> A valid reason not to change it is that this is some kind of frozen/static
>> page that can't be changed.
>>
>> 2.  When a client gets a 301, following a link it got from server A, it
>> should notify server A, so A can rewrite the link sooner.   This could use a
>> .well-known end-point on A, or there could be a Report-Link-Issues-To header
>> on every resource which A serves telling clients how to report any 301s (and
>> 404s) they find.
>>
>> 3.  The notification mechanism (2) above should include move notifications,
>> so when a page is being watched, if it moves the watcher will be immediately
>> notified and able to change its link.
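Mechanism 1 could look roughly like the following, assuming a `probe` function that fetches a URL without following redirects and returns the status code and `Location` header (stubbed here so the example runs offline). Only a single redirect hop is handled, and scheduling the 30-day sweep is left out. The 404s it collects are the kind of thing mechanism 2 would report back to the linking server.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# probe(url) -> (status code, Location header or None), fetched without following
# redirects; stubbed below so the example runs offline.
Probe = Callable[[str], Tuple[int, Optional[str]]]


def refresh_links(links: List[str], probe: Probe) -> Tuple[List[str], List[str]]:
    """Rewrite links that answer 301 to their new location; collect 404s."""
    updated: List[str] = []
    broken: List[str] = []
    for url in links:
        status, location = probe(url)
        if status == 301 and location:
            updated.append(urljoin(url, location))  # Location may be relative
        else:
            updated.append(url)
            if status == 404:
                broken.append(url)
    return updated, broken


def fake_probe(url: str) -> Tuple[int, Optional[str]]:
    table = {
        "http://old.example/a": (301, "http://new.example/a"),
        "http://gone.example/b": (404, None),
    }
    return table.get(url, (200, None))


links, broken = refresh_links(
    ["http://old.example/a", "http://gone.example/b", "http://ok.example/c"], fake_probe)
print(links)   # ['http://new.example/a', 'http://gone.example/b', 'http://ok.example/c']
print(broken)  # ['http://gone.example/b']
```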
>>
>> All this works much better if in addition to 301 we have a way to say a
>> whole tree has moved.    That is, all URLs starting http://foo.example/x/
>> should now be considered redirected to http://bar.example/y/, etc.
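Assuming tree moves are published as a mapping from old URL prefix to new prefix, rewriting a link means applying the longest matching prefix. Prefixes ending in `/` keep `http://foo.example/x/` from accidentally matching `http://foo.example/xyz`. The mapping format and function name are assumptions for illustration.

```python
from typing import Dict


def rewrite_for_tree_moves(url: str, moves: Dict[str, str]) -> str:
    """Apply the longest matching old-prefix -> new-prefix move to `url`."""
    matching = [old for old in moves if url.startswith(old)]
    if not matching:
        return url
    old = max(matching, key=len)
    return moves[old] + url[len(old):]


moves = {"http://foo.example/x/": "http://bar.example/y/"}
print(rewrite_for_tree_moves("http://foo.example/x/photos/1.jpg", moves))
# http://bar.example/y/photos/1.jpg
print(rewrite_for_tree_moves("http://foo.example/xyz", moves))
# http://foo.example/xyz  (unchanged: the prefix ends in "/")
```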
>>
>> With these mechanisms in place, links from compliant servers should start to
>> transition quickly and drop off to zero after 30 days. Obviously links from
>> hand-maintained resources, and printed on paper, etc., won't change, but those
>> are usually consumed by humans who are better able to deal with a broken
>> link anyway.
>>
>> == More...
>>
>> I'm sure there's more, but this gives the general shape of things. Do we
>> want the new charter to target some of these?   To allow for some of these?
>> And again: how do we assess when each of these is mature enough for a WG to
>> begin looking at it?
>>
>> Thanks for considering this.
>>
>>       -- Sandro
>>
>>
>> [1] https://www.youtube.com/watch?v=z0_XaJ97rF0
>> [2] http://www.browserscope.org/?category=network&v=top
>>
>>
Received on Sunday, 9 November 2014 18:14:53 UTC