- From: Martynas Jusevičius <martynas@graphity.org>
- Date: Sun, 9 Nov 2014 19:13:25 +0100
- To: Melvin Carvalho <melvincarvalho@gmail.com>
- Cc: public-rww <public-rww@w3.org>, Sandro Hawke <sandro@w3.org>

Hey all,

I will not waste my time convincing people LDP was a wrong design choice, but I'm certain we have quite a few of Sandro's points covered with our declarative Linked Data approach:

https://github.com/Graphity/graphity-client/wiki/Reinventing-Web-applications#templates
https://github.com/Graphity/graphity-client/wiki/How-Graphity-works

Martynas
graphityhq.com

On Sun, Nov 9, 2014 at 6:56 PM, Melvin Carvalho <melvincarvalho@gmail.com> wrote:
> FYI: Sandro just posted an excellent, detailed wish list to the LDP WG. Perhaps 75%+ of it could be related to RWW standards.
>
> Would love to see any, or all, of these points move closer to standardization...
>
> ---------- Forwarded message ----------
> From: Sandro Hawke <sandro@w3.org>
> Date: 9 November 2014 18:06
> Subject: ldp wishlist for crosscloud
> To: Linked Data Platform WG <public-ldp-wg@w3.org>
>
> As you may know, these days most of my time is no longer W3C-staff but is funded research toward building "Crosscloud", an architecture for software which allows users to control their data and which should encourage innovation by making it much easier to develop powerful, open multi-user (social) software.
>
> Back in January, we started off building on LDP, with Andrei creating cimba.co. It's a microblogging app, intending to replicate some of early Twitter in a completely decentralized way using generic (non-application-specific) LDP. To make that work, we had to extend LDP with Access Control; clients can tell the server who can do what with each resource. We also made no use of Direct Containers, Indirect Containers, or Paging. It's just Basic Containers, Access Control, WebID-TLS for client authentication, Turtle for data, and non-RDF resources for photos. (Maybe I'm forgetting some details; for a demo, see the 2-minute video at [1].)
>
> While cimba basically works, it's painful in various ways and unable to do many things, showing us that we need much more support from the servers. We've also started building several more apps, which are showing other things that are important to have.
>
> We don't have it all figured out yet, let alone implemented, but here are a few of the things we probably need. I'm providing this list to help with re-chartering, although most of these are not yet mature enough for standardization. Maybe they will be in 6-12 months, though. As you look at this list, one thing to figure out is how we will know when a given module is ready for the WG to take up.
>
> == 1. Queries
>
> This is a big one. It's impractical to have the cimba WebApp, running in the browser, do all the GETs (hundreds, at least) every time it starts. It needs to do a small number of queries and have the server manage all the aggregation. The server has to be able to query across other servers as well as itself.
>
> We're currently playing with forms of "Link-Following SPARQL", but also with a more restricted MongoDB-like query language, both for easier implementation and for response-time/load guarantees.
>
> Queries make resource-paging obsolete, which is why I've lost interest in paging.
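>
> To make that concrete, here's a rough TypeScript sketch of the client side. The /query endpoint and the exact vocabulary are assumptions of mine, not a defined protocol; the point is one request instead of hundreds:
>
>     // Sketch: one link-following query replaces hundreds of client GETs.
>     // The user's own server does the cross-server aggregation.
>     const query = `
>       PREFIX sioc: <http://rdfs.org/sioc/ns#>
>       PREFIX dcterms: <http://purl.org/dc/terms/>
>       SELECT ?post ?content ?created WHERE {
>         <https://alice.example/profile#me> sioc:follows ?person .
>         ?person sioc:creator_of ?post .   # ?post may live on another server
>         ?post sioc:content ?content ;
>               dcterms:created ?created .
>       }
>       ORDER BY DESC(?created) LIMIT 50`;
>
>     const response = await fetch("https://alice.example/query", {
>       method: "POST",
>       headers: { "Content-Type": "application/sparql-query" },
>       body: query,
>     });
>     const results = await response.json();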
>
> == 2. Change Notification to Web Servers
>
> If a server acting on behalf of the end-user is going to aggregate data from other servers, it needs to be able to keep its copy in sync. Traditional web cache + polling works only when it's okay to be seconds or minutes out of date; many multi-user apps require much more responsiveness than that, so we see a need for one server to be able to subscribe to change notifications from another.
>
> One might want something like PATCH to make this more efficient, but at the moment it looks like we can keep the resources small enough that it doesn't matter.
>
> == 3. Change Notification to Web Clients
>
> Similarly, Web Apps often need to know immediately when data has changed. While it might be nice to have this be the same protocol as (2), our preliminary investigation suggests the engineering trade-offs make that impractical. So, this needs to be its own protocol. Probably it's just a tweak to the query protocol where query results, rather than being a single response collecting all the results, are ongoing add-result and remove-result events.
>
> == 4. Operation over WebSockets
>
> It almost certainly makes sense to use WebSockets for (3), but it also makes sense to use them for all the current LDP operations, for high performance. A modest client and server can probably process at least 1000 GETs per second, but in practice, without WebSockets, they'll be slowed an order of magnitude because of round-trip delays. That is, say RTT is 50ms, so we can do 20 round trips per second. Most browsers allow at most 6 connections per hostname [2], so that's 120 round trips per second, max, no matter how much CPU and RAM and bandwidth you have.
>
> I'm still thinking about what this might look like. The strawman is something like: each client-to-server message is a JSON object like
>
>     { "verb": "GET", "resource": "http://example.org",
>       "accept": "text/html", "seq": 7 }
>
> and responses are like
>
>     { "in-response-to": 7, "status": 200,
>       "contentType": "text/html", "content": "<html>......</html>" }
>
> So the higher levels don't have to know it's not normal HTTP, *but* we can have hundreds or thousands of requests pipelined. Also, we can have multiple responses, or something, for event notification. This would also allow for more transactional operation, if desired. (Maybe "partial-response-to" and "final-response-to".)
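>
> A minimal TypeScript sketch of that framing, just to show the multiplexing (the endpoint URL is made up, and the message fields are only the strawman above):
>
>     // Many LDP requests multiplexed over one WebSocket, matched up by
>     // "seq", so throughput is no longer bounded by round trips.
>     // (A real client would wait for ws.onopen before sending.)
>     const ws = new WebSocket("wss://server.example/ldp");
>     let seq = 0;
>     const pending = new Map<number, (msg: any) => void>();
>
>     function request(verb: string, resource: string, accept: string) {
>       return new Promise<any>((resolve) => {
>         const id = ++seq;
>         pending.set(id, resolve);
>         ws.send(JSON.stringify({ verb, resource, accept, seq: id }));
>       });
>     }
>
>     ws.onmessage = (event) => {
>       const msg = JSON.parse(event.data);
>       // "in-response-to" ties each response back to its request.
>       const resolve = pending.get(msg["in-response-to"]);
>       if (resolve) {
>         pending.delete(msg["in-response-to"]);
>         resolve(msg);
>       }
>     };
>
>     // Usage: hundreds of requests can be in flight at once.
>     const urls = ["https://server.example/a", "https://server.example/b"];
>     const pages = await Promise.all(
>       urls.map((u) => request("GET", u, "text/turtle"))
>     );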
>
> == 5. Non-Listing Containers
>
> I want end-points that I can POST to, and GET some information about, without being swamped by an enumeration of everything posted there. I don't want to have to include a Prefer header to avoid that swamping.
>
> You might consider this a matter of taste, but I think it's an important usability issue.
>
> Again, with querying, you probably don't want to just be dumping the list of contained resources. Querying also lets us control inlining, etc. Basically, if querying is available, I think we can skip serializing membership/containment triples.
>
> == 6. PUT-to-Create
>
> There are situations where the client needs to lay out, on the server, an assortment of resources with carefully controlled URLs, such as a static website with interlinked html, css, js, images, etc. This should be doable with PUT, where PUT creates the resource inside the container that owns that URL space.
>
> == 7. DELETE WHERE
>
> One of our current demo apps is a game that is likely to generate a dozen resources per second per user. Asking for each of those resources to be individually deleted afterwards seems rather silly, even problematic, so a DELETE WHERE operation would be nice.
>
> Yes, one could put them all in a container in this case, and define it as a kind of container that deletes its contained resources when it's deleted, but there are situations where that won't work as well. Maybe we want to delete the resources after about 60 seconds have gone by, for example. Easy to do with a DELETE WHERE, hard to do otherwise.
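>
> As a sketch of the shape of the idea (nothing like this is standardized; the pattern-in-the-body design, the container URL, and the vocabulary are all assumptions):
>
>     // Hypothetical: one request deletes every contained resource
>     // created more than 60 seconds ago, instead of one DELETE each.
>     const cutoff = new Date(Date.now() - 60_000).toISOString();
>
>     await fetch("https://game.example/moves/", {
>       method: "DELETE",
>       headers: { "Content-Type": "application/sparql-query" },
>       // The server would delete every resource ?r matching the pattern.
>       body: `
>         PREFIX dcterms: <http://purl.org/dc/terms/>
>         PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>         SELECT ?r WHERE {
>           ?r dcterms:created ?created .
>           FILTER (?created < "${cutoff}"^^xsd:dateTime)
>         }`,
>     });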
>
> == 8. WebMention for Data, backlinks used in Queries
>
> The basics of WebMention are in scope for the Social Web WG, but it's not clear they'll apply it to arbitrary raw data, or say how the back-links are made available for use in queries. Like many of these, this might be joint work with SWWG.
>
> == 9. Client Authentication
>
> Arguably this is quite out of scope, and yet it's hard to operate without it. Especially things like (2) are easier with some kind of authentication.
>
> For a strawman of how easy it could be: https://github.com/sandhawke/spot/blob/master/spec.md
>
> == 10. Access Control
>
> Obviously.
>
> My current radical theory is that all I need is a flag that a page is owner-only, public, or group-read, and then a way to define the group of identities (see (9)) who can read it. Most people imagine we need to control a lot more than read access, and perhaps we do, but I'm currently working with the theory that everyone makes their own contributions in their own space, notifying but never actually "writing" to anyone else's.
>
> == 11. Combined Metadata and Content operations
>
> I don't think I can put this very crisply, but I've started thinking about resources as looking like this:
>
>     { property1: value1,
>       property2: value2,
>       ...
>       content: "<html>....</html>",
>       contentType: "text/html",
>       ...
>     }
>
> and it's so much nicer. Basically, every resource is property-value pairs, and some of that property-value data is "content". If you don't do something like this, queries and notifications and all that require us to bifurcate into a mechanism that's all about the content and another that's all about the metadata.
>
> LDP-RSs then become content-free resources, or null-content resources, but much less fundamentally different. With the current LDP framing, what happens when you PUT an image to an LDP-RS, or PUT RDF to what you created as an image? This model clears that up nicely.
>
> But this might only work in the face of other assumptions I'm making, like the only triples at <R> are in a graph rooted at <R>, so you can think of them all as properties of R. Also, I've resolved httpRange-14 by saying I'm only interested in proper information-resource-denoting URLs, and you can use indirect properties for talking about people, places, events, etc. Maybe those radical assumptions are necessary for making this work.
>
> == 12. Forwarding
>
> We need to be able to move resources, because it's very hard to pick a URL and stick to it for decades. And if a URL is used as part of other apps, and you don't stick to it, you'll break them. The fear of this will, I suspect, significantly impede adoption.
>
> I propose three mechanisms. Any one of them might work; between the three I'm fairly confident.
>
> 1. Servers SHOULD check all their outgoing links at least once every 30 days. If they get a 301 response, they SHOULD update the link in place. A valid reason not to change it is that it's in some kind of frozen/static page that can't be changed.
>
> 2. When a client gets a 301 following a link it got from server A, it should notify server A, so A can rewrite the link sooner. This could use a .well-known end-point on A, or there could be a Report-Link-Issues-To header on every resource which A serves, telling clients how to report any 301s (and 404s) they find.
>
> 3. The notification mechanism of (2) above should include move notifications, so when a page is being watched and it moves, the watcher will be immediately notified and able to change its link.
>
> All this works much better if, in addition to 301, we have a way to say a whole tree has moved. That is, all URLs starting http://foo.example/x/ should now be considered redirected to http://bar.example/y/, etc.
>
> With these mechanisms in place, links from compliant servers should start to transition quickly and drop off to zero after 30 days. Obviously links from hand-maintained resources, printed on paper, etc., won't change, but those are usually consumed by humans, who are better able to deal with a broken link anyway.
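>
> Mechanism 1 is simple enough to sketch in TypeScript (rewriteLink() is a hypothetical stand-in for whatever storage update applies, and this assumes a server-side fetch, such as Node 18+, whose "manual" redirect mode exposes the status and Location header):
>
>     declare function rewriteLink(from: string, to: string): void;
>
>     // Run for each outgoing link, at least once every 30 days.
>     async function refreshLink(url: string): Promise<string> {
>       const res = await fetch(url, { method: "HEAD", redirect: "manual" });
>       if (res.status === 301) {
>         const target = res.headers.get("Location");
>         if (target) {
>           rewriteLink(url, target); // update the stored link in place
>           return target;
>         }
>       }
>       return url; // not moved (or a frozen page we must not rewrite)
>     }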
>
> == More...
>
> I'm sure there's more, but this gives the general shape of things. Do we want the new charter to target some of these? To allow for some of these? And again: how do we assess when each of these is mature enough for a WG to begin looking at it?
>
> Thanks for considering this.
>
>      -- Sandro
>
> [1] https://www.youtube.com/watch?v=z0_XaJ97rF0
> [2] http://www.browserscope.org/?category=network&v=top

Received on Sunday, 9 November 2014 18:13:53 UTC