
[whatwg] Web-sockets + Web-workers to produce a P2P website or application

From: Andrew de Andrade <andrew@deandrade.com.br>
Date: Fri, 22 Jan 2010 00:03:44 -0200
Message-ID: <42b395981001211803s6f5e9a2cj173996c210ad0267@mail.gmail.com>
Comments inline.

On Thu, Jan 21, 2010 at 5:44 PM, Ivan Žužak <izuzak at gmail.com> wrote:
> Hi Andrew,
>
> That's an interesting idea. Here are some of my thoughts:
>
> As your Google friend noted, I'm wondering why you'd want to implement
> this on the HTML5 level, not on the browser (C++) level. Implementing
> such a protocol within the browser would probably be faster, more
> secure and a better design choice since the functionality is basically
> the same for every web application. I'm guessing the reason is that
> you do not want to achieve this by changing/extending the browser but
> rather by using new HTML5 technologies instead. Am I right?

While I realize that this is best implemented at the browser level, my
concern is whether it would be broadly adopted by all browser vendors
if it were implemented at that level. It's an idea that only really
has value once it reaches critical mass. For example, it would be an
extremely potent idea if the two most popular browsers had implemented
it by 2014 or 2015 or so. HTML5 is a web technology spec, and I
imagine that putting this in the spec would greatly increase the
chances of browsers supporting the functionality.

However, I am not familiar with the politics that govern whether or
not browser product managers implement these features at the browser
level.

Being pragmatic, do you think it's possible to get widespread adoption
quickly for a feature like this at the browser level, or do you agree
that it would be more likely to be adopted quickly if it were part of
a W3C spec?


>
> If so, then a major problem is that the browser is not a network
> server, rather only a client. In order for a browser A to connect
> using WebSockets to a browser B which executes some process, browser B
> must expose a network accessible end-point to which that process is
> tied, i.e. the browser must expose a TCP/IP end-point. I guess that the
> NAT traversal problem Melvin mentioned basically covers this.
>

Assuming that this idea really only sees adoption within a few years,
won't IPv6 resolve the issue of NAT traversal?


> If the functionality you wish to achieve is distributing serving of
> static content to clients which already acquired content, would a
> dynamically changeable HTTP redirect table implemented on the server
> cover the same functionality? E.g. 1) the client would request and
> receive a web resource from the server, 2) if indicated the resource
> was static - the client would expose the resource from its machine by
> creating a network accessible end-point and tying a process for
> serving the content to it, 3) the client would let the server know it
> is serving the content, 4) the server would register the client
> end-point as a possible server for the content, 5) when another client
> would request the resource from the server, the server would HTTP
> redirect it to the end-point of a client that is serving it. This
> solution implies that a specific resource must be fetched from a
> single server, while your idea implies fetching a resource from
> different servers in parallel (as in torrent protocols) and would
> probably be faster. What do you think?

I'm not sure. I don't really have the technical depth to evaluate this
solution. I'll ask some of my friends what they think.
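That said, as far as I can follow the mechanics, the server-side
bookkeeping might look something like the sketch below. Every name in
it is invented, and it is only meant to show the shape of the redirect
table Ivan describes, not a real implementation:

```javascript
// Illustrative sketch of a dynamically changeable redirect table.
// A real server would also need peer timeouts, health checks, and
// limits on how many peers it tracks per resource.
class RedirectTable {
  constructor() {
    // resource URL -> list of client end-points currently serving it
    this.servers = new Map();
  }

  // Steps 3/4: a client announces it is now serving a static resource.
  register(resource, endpoint) {
    if (!this.servers.has(resource)) this.servers.set(resource, []);
    this.servers.get(resource).push(endpoint);
  }

  // A client went away; stop redirecting requests to it.
  unregister(resource, endpoint) {
    const list = this.servers.get(resource) || [];
    this.servers.set(resource, list.filter((e) => e !== endpoint));
  }

  // Step 5: pick a peer to redirect the next request to, or null to
  // fall back to serving the resource from the origin server itself.
  pickEndpoint(resource) {
    const list = this.servers.get(resource) || [];
    if (list.length === 0) return null;
    return list[Math.floor(Math.random() * list.length)];
  }
}
```

A request handler would then answer with an HTTP redirect to whatever
pickEndpoint() returns, and fall back to serving the file itself when
it returns null.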

>
> In a broader context of connecting both servers and client devices
> into a large network for scalable execution of applications, your idea
> reminds me of a recent blog article I read:
> http://highscalability.com/blog/2009/12/16/building-super-scalable-systems-blade-runner-meets-autonomic.html.
> Definitely have a look at it - it's a bit on the visionary side and
> longish, but worth it. I believe a planet-wide execution platform
> consisting of every network device will eventually happen and that's
> why I think we should all be contributing in that direction, which
> your idea definitely is.

I took a brief look. This article looks very interesting. I'm going to
have a gander at it tomorrow. Maybe I'll shoot the HighScalability
people an email and see if they want to comment on this idea on their
blog.

>
> Regards,
> Ivan Zuzak
>
> On Tue, Jan 19, 2010 at 17:59, Andrew de Andrade
> <andrew at deandrade.com.br> wrote:
>> I have an idea for a possible use case that as far as I can tell from
>> previous discussions on this list has not been considered or at least
>> not in the form I present below.
>>
>> I have a friend whose company produces and licenses online games for
>> social networks such as Facebook, Orkut, etc.
>>
>> One of the big problems with these games is the sheer amount of static
>> content that must be delivered via HTTP once the application becomes
>> popular. In fact, if a game becomes popular overnight, the scaling
>> problems with this static content quickly become a technical and
>> financial problem.
>>
>> To give you an idea of the magnitude and scope, more than 4 TB of
>> static content is streamed on a given day for one of the applications.
>> It's very likely that others with similarly popular applications have
>> encountered the same challenge.
>>
>> When thinking about how to resolve this, I took my usual approach of
>> asking how we might decentralize the content delivery and move towards
>> an agent-based message-passing model, so that we do not have a single
>> technical bottleneck and so that we can dissipate the cost of
>> delivering this content.
>>
>> My idea is to use web-sockets to allow the browser to function more
>> or less like a bit-torrent client. Along with this, web-workers would
>> provide threads for handling the code that would function as a server,
>> serving the static content to peers also using the program.
>>
>> If you have lots of users (thousands) accessing the same application,
>> you effectively have the equivalent of one torrent with a large swarm
>> of users, where the torrent is a package of the most frequently
>> requested static content. (I am assuming that the static content
>> requests follow a power law distribution, with only a few static files
>> being responsible for the overwhelming bulk of static data
>> transferred.)
>>
>> As I have only superficial knowledge of the technologies involved and
>> the capabilities of HTML5, I passed this idea by a couple of
>> programmer friends to get their opinions. Generally they thought it
>> was a very interesting idea, but that as far as they know, the
>> specification as it stands now is incapable of accommodating such a
>> use case.
>>
>> Together we arrived at a few criticisms of this idea that appear to be
>> resolvable:
>>
>> -- Privacy issues
>> -- Security issues (man-in-the-middle attacks)
>> -- Content labeling (i.e. how does the browser know what content is
>> truly static and therefore safe to share?)
>> -- Content signing (i.e. is there some sort of hash that allows the
>> peers to confirm that the content has not been adulterated?)
>>
>> All in all, many of these issues have been solved by the many talented
>> programmers who developed the current bit-torrent protocol,
>> algorithms and security features. The idea would simply be to design
>> HTML5 in such a way that it permits the browser to function as a
>> full-fledged web-application bit-torrent client-server.
>>
>> Privacy issues can be resolved by possibly defining something such as
>> "browser security zones" or "content labels" whereby the content
>> provider (application developer) labels content (such as images and
>> CSS files) as safe to share (static content) and labels dynamic
>> content (such as personal photos, documents, etc.) as unsafe to share.
>>
>> Also in discussing this, we came up with some potentially useful
>> extensions to this use case.
>>
>> One would be the versioning of the "torrent file", such that the
>> torrent file could represent versions of the application, i.e. I
>> release an application that is version 1.02 and it becomes very
>> popular and there is a sizable swarm. At some point in the future I
>> release a new version with bug-fixes and additional features (such as
>> CSS sprites for the social network game). I should be able to
>> propagate this new version to all clients in the swarm so that over
>> some time window such as 2 to 4 hours all clients in the swarm
>> discover (via push or pull) the new version and end up downloading it
>> from the peers with the new version. The only security feature I could
>> see being required is that, once a client discovers that
>> there is a new version, it would hit up the original server to
>> download a signature/fingerprint file to verify that the new version
>> that it is downloading from its peers is legitimate.
>>
>> The interesting thing about this idea is that it would permit large
>> portions of sites to exist in virtual form. Long-term I can imagine
>> large non-profit sites such as Wikipedia functioning on top of this
>> structure in such a way that it greatly reduces the amount of funding
>> necessary. It would be partially distributed, with updates to Wikipedia
>> being distributed via lots of tiny versions from super-nodes à la a
>> Skype-type P2P model.
>>
>> This would also take a lot of power out of the hands of those telcos
>> that are anti-net neutrality. This feature would basically permit a
>> form of net neutrality by moving content to the fringes of the
>> network.
>>
>> Let me know your thoughts and if you think this would be possible
>> using Web-sockets and web-workers, and if not, what changes would be
>> necessary to allow this to evolve.
>>
>> Sincerely,
>>
>> Andrew J. L. de Andrade
>> São Paulo, Brazil
>>
>> (P.S. I consider myself a pretty technical person, but I don't really
>> program. I only dabble in programming as a hobby and to better
>> understand my colleagues. Feel free to be as technical as you want in
>> your reply, but please forgive me if I make or made any bonehead
>> mistakes.)
>>
>
Received on Thursday, 21 January 2010 18:03:44 UTC

This archive was generated by hypermail 2.4.0 : Wednesday, 22 January 2020 16:59:20 UTC