Re: ldp wishlist for crosscloud from Martynas Jusevičius on 2014-11-09 (public-rww@w3.org from November 2014)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Sun, 9 Nov 2014 19:13:25 +0100
To: Melvin Carvalho <melvincarvalho@gmail.com>
Cc: public-rww <public-rww@w3.org>, Sandro Hawke <sandro@w3.org>
Message-ID: <CAE35Vmxam10qTNRrgfbAN6fcGrDYL=FNJpUtby7_b1OcFSamvQ@mail.gmail.com>
Hey all,

I will not waste my time convincing people LDP was a wrong design
choice, but I'm certain we have quite a few of Sandro's points covered
with our declarative Linked Data approach:
https://github.com/Graphity/graphity-client/wiki/Reinventing-Web-applications#templates
https://github.com/Graphity/graphity-client/wiki/How-Graphity-works


Martynas
graphityhq.com



On Sun, Nov 9, 2014 at 6:56 PM, Melvin Carvalho
<melvincarvalho@gmail.com> wrote:
> FYI: Sandro just posted an excellent, detailed, wish list to the LDP WG.
> Perhaps 75%+ could be related to RWW standards.
>
> Would love to see any, or all of these points, move closer to
> standardization...
>
> ---------- Forwarded message ----------
> From: Sandro Hawke <sandro@w3.org>
> Date: 9 November 2014 18:06
> Subject: ldp wishlist for crosscloud
> To: Linked Data Platform WG <public-ldp-wg@w3.org>
>
>
> As you may know, these days most of my time is no longer W3C-staff but is
> funded research toward building "Crosscloud", an architecture for software
> which allows users to control their data and which should encourage
> innovation by making it much easier to develop powerful, open multi-user
> (social) software.
>
> Back in January, we started off building on LDP, with Andrei creating
> cimba.co.  It's a microblogging app, intending to replicate some of early
> Twitter, in a completely decentralized way using generic
> (non-application-specific) LDP.    To make that work, we had to extend LDP
> with Access Control; clients can tell the server who can do what with each
> research.   We also made no use of Direct Containers, Indirect Containers,
> or Paging.   It's just Basic Containers, Access Control, WebID-TLS for
> client authentication, Turtle for data, and non-RDF resources for photos.
> (Maybe I'm forgetting some details; for demo see 2 minute video at [1].)
>
> While cimba basically works, it's painful in various ways and unable to do
> many things, showing us that we need much more support from the servers.
> We've also started building several more apps which are showing other things
> that are important to have.
>
> We don't have it all figured out yet, let alone implemented, but here are a
> few of the thing we probably need.   I'm providing this list to help with
> re-chartering, although most of these are not yet mature enough for
> standardization.   Maybe they will be in 6-12 months, though.    As you look
> at this list, one thing to figure out is how will we know when this module
> is ready for the WG to take up.
>
> == 1.  Queries
>
> This is a big one.   It's impractical to have the cimba WebApp, running in
> the browser, do all the GETS (hundreds, at least) every time it starts.  It
> needs to do a small number of queries, and have the server manage all the
> aggregation.   The server has to be able to query across other servers as
> well as itself.
>
> We're currently playing with forms of "Link-Following SPARQL", but also with
> a more restricted MongoDB-like query language, both for easier
> implementation and for response-time/load guarantees.
>
> Queries make resource-paging obsolete, which is why I've lost interest in
> paging.
>
> == 2.  Change Notification to Web Servers
>
> If a server acting on behalf of the end-user is going to aggregate data from
> other servers, it needs to be able to keep its copy in sync. Traditional web
> cache + polling works only when it's okay to be seconds or minutes out of
> date; many multi-user apps require much more responsiveness than that, so we
> see a need for one server to be able to subscribe to change notification
> from another.
>
> One might want something like PATCH to make this more efficient, but at the
> moment it looks like we can keep the resources small enough that it doesn't
> matter.
>
> == 3.  Change Notification to Web Clients
>
> Similarly, Web Apps often need to know immediately when data has changed.
> While it might be nice to have this be the same protocol as (2), our
> preliminary investigation suggests the engineering trade-offs make that
> impractical.   So, this needs to be its own protocol. Probably it's just a
> tweak to the query protocol where query results, rather than being a single
> response collecting all the results, are ongoing add-result and
> remove-result events.
>
> == 4.  Operation over WebSockets
>
> It almost certainly makes sense to use WebSockets for (3), but it also makes
> sense to use them for all the current LDP operations for high performance.
> A modest client and server can probably process at least 1000 GETs per
> second, but in practice, without WebSockets, they'll be slowed an order of
> magnitude because of round trip delays.    That is, say RTT is 50ms, so we
> can do 20 round trips per second.    Most browsers allow at most 6
> connections per hostname [2], so that's 120 round trips per second, max, no
> matter how much CPU and RAM and bandwidth you have.
>
> I'm still thinking about what this might look like.     Strawman is
> something like each client-to-server message is a JSON object like { "verb":
> "GET", "resource":"http://example.org", "accept": "text/html", "seq":7 } and
> response are like { "in-reponse-to": 7, "status": 200, "contentType":
> "text/html", "content": "<html>......</html>" }
>
> So the higher levels don't have to know it's not normal HTTP, *but* we can
> have hundreds or thousands of requests pipelined.     Also, we can have
> multiple responses, or something, for event notification.   This would also
> allow for more transactional operation, if desired.   (Maybe
> "partial-response-to" and "final-response-to".)
>
> == 5.  Non-Listing Containers
>
> I want end-points that I can POST to, and GET some information about,
> without being swamped by an enumeration of everything posted there.   I
> don't want to have to include a Prefer header to avoid that swamping.
>
> You might consider this a taste, but I think it's an important usability
> issue.
>
> Again, with querying, you probably don't want to just be dumping the list of
> contained resources.   Querying also lets us control inlining, etc.
> Basically, if querying is available, I think we can skip serializing
> membership/containment triples.
>
> == 6.  PUT-to-Create
>
> There are situations where the client needs to lay out, on the server, an
> assortment of resources with carefully controlled URLs, such as a static
> website with interlinked html, css, js, images, etc.    This should be
> doable with PUT, where PUT creates the resource inside the container that
> owns that URL space.
>
> == 7.  DELETE WHERE
>
> One of our current demo apps is a game that is likely to generate a dozen
> resources per second per user.   Asking for each of those resources to be
> individually deleted afterwards seems rather silly, even problematic, so a
> DELETE WHERE operation would be nice.
>
> Yes, one could put them all in a container in this case, and define it as a
> kind of container that deletes its contained resources when it's deleted,,
> but there are situations where that wont work as well.  Maybe we want to
> delete the resources after about 60 seconds have gone by, for example.
> Easy to do with a DELETE WHERE, hard to do otherwise.
>
> ==  8.  WebMention for Data, backlinks used in Queries
>
> The basics of WebMention are in-scope for the Social Web WG, but it's not
> clear they'll apply it to arbitrary raw data, or say how the back-links are
> made available for use in queries.   Like many of these, this might be joint
> work with SWWG.
>
> ==  9.  Client Authentication
>
> Arguable this is quite out of scope, and yet it's hard to operate without
> it.   Especially things like (2) are easier with some kind of
> authentication.
>
> For a strawman of how easy it could be:
> https://github.com/sandhawke/spot/blob/master/spec.md
>
> == 10.  Access Control
>
> Obviously.
>
> My current radical theory is I only need is a flag that a page is
> owner-only, public, or group-read, and then a way to define the group of
> identities (see (9)) who can read it.    Most people imagine we need to
> control a lot more than read access, and perhaps we do, but I'm currently
> working with the theory that everyone makes their own contributions in their
> own space, notifying but never actually "writing" to anyone else's.
>
> == 11.  Combined Metadata and Content operations
>
> I don't think I can put this very crisply, but I've started thinking about
> resources as looking like this:
>
> { property1: value1,
>    property2: value2,
>    ...
>    content: "<html>....</html>",
>    contentType: "text/html"
>    ...
> }
>
> and it's so much nicer.   Basically, every resource is properties-value
> pairs, and some of that pv data is "content".    If you don't do something
> like this, queries and notifications and all that require us to bifurcate
> into a mechanism that's all about the content and another that's all about
> the metadata.
>
> LDP-RS's then become content-free resources, or null-content resources, but
> much less fundamentally different.   With the current LDP framing, what
> happens when you PUT an image to an LDP-RS or PUT rdf to what you created as
> an image?   This model clears that up nicely.
>
> But this might only work in the face of other assumptions I'm making, like
> the only triples at <R> are in a graph rooted at <R>, so you can think of
> them all as properties of R.    Also I've resolved httpRange-14 by saying
> I'm only interested in proper information-resource-denoting URLs, and you
> can use indirect properties for talking about people, places, events, etc.
> Maybe those radical assumptions are necessary for making this work.
>
> 12.  Forwarding
>
> We need to be able to move resources, because it's very hard to pick a URL
> and stick to it for decades.   And if it's used as part of other apps, and
> you don't stick to it, you'll break them.   The fear of this will, I
> suspect, significantly impede adoption.
>
> I propose three mechanisms.   Any one of them might work; between the three
> I'm fairly confident.
>
> 1.  Servers SHOULD check all their outgoing links at least once every 30
> days.   If they get a 301 response, they SHOULD update the link in place.
> Valid reason not to change it is this is some kind of a frozen/static page
> that can't be changed.
>
> 2.  When a client gets a 301, following a link it got from server A, it
> should notify server A, so A can rewrite the link sooner.   This could use a
> .well-known end-point on A, or there could be a Report-Link-Issues-To header
> on every resource which A serves telling clients how to report any 301s (and
> 404s) it finds.
>
> 3.  The notification mechanism (2) above, should include move notifications,
> so when a page is being watched, if it moves the watcher will be immediately
> notified and able to change its link.
>
> All this works much better if in addition to 301 we have a way to say a
> whole tree has moved.    That is, all URLs starting http://foo.example/x/
> should not be considered redirected to http://bar.example/y/, etc.
>
> With these mechanisms in place, links from compliant servers should start to
> transition quickly and drop off to zero after 30 days. Obviously links from
> hand-maintained resources, and printed on paper, etc, wont change, but those
> are usually consumed by humans who are better able to deal with a broken
> link anyway.
>
> == More...
>
> I'm sure there's more, but this gives the general shape of things. Do we
> want the new charter to target some of these?   To allow for some of these?
> And again: how do we assess when each of these is mature enough for a WG to
> begin looking at it?
>
> Thanks for considering this.
>
>       -- Sandro
>
>
> [1] https://www.youtube.com/watch?v=z0_XaJ97rF0
> [2] http://www.browserscope.org/?category=network&v=top
>
>
Received on Sunday, 9 November 2014 18:13:53 UTC