Re: Is a Perfect Storm Forming For Distributed Social Networking? from Tim Anglade on 2009-08-12 (public-xg-socialweb@w3.org from August 2009)

From: Tim Anglade <tim.anglade@af83.com>
Date: Wed, 12 Aug 2009 13:24:33 +0200
To: Dave Raggett <dsr@w3.org>
Cc: Story Henry <henry.story@bblfish.net>, Melvin Carvalho <melvincarvalho@gmail.com>, public-xg-socialweb@w3.org
Message-Id: <76DF55CD-FDBB-406C-92CB-281A625A4E34@af83.com>
Hi there.

On 12 Aug 09, at 12:07, Dave Raggett wrote:

> On Wed, 12 Aug 2009, Story Henry wrote:
>
>> On 12 Aug 2009, at 10:46, Dave Raggett wrote:
>
>>> One of the challenges for distributed social networking is dealing  
>>> with sudden hotspots where a huge spike of interest in a single  
>>> person causes the server that hosts that person's profile to  
>>> falter under the load.

Wait, what? Actually that scenario is almost a non-issue since a
single profile or asset (video, etc.) can be cached pretty
efficiently. There are algorithms to detect what is becoming trendy
very fast and replicate many copies where it's popular before it
becomes an issue (Akamai bought its first two private islands from
the IP of this kind of algorithm). On the open-source side, one can
very easily implement one's own simple algorithm to push popular pages
into a few memcached instances thrown here and there, dynamically.
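To make the "simple algorithm" idea concrete, here is a minimal sketch,
not anything Akamai or anyone else actually runs. It counts hits per
page over a sliding window and starts caching a page once it crosses a
threshold. The names HotPageDetector and serve, the window and
threshold values, and the plain dict standing in for memcached are all
assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class HotPageDetector:
    """Counts hits per page over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=60.0, threshold=100):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.hits = defaultdict(deque)  # page key -> timestamps of recent hits

    def record_hit(self, key, now=None):
        """Record one hit; return True while the page counts as hot."""
        now = time.time() if now is None else now
        recent = self.hits[key]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        return len(recent) >= self.threshold


def serve(key, detector, cache, render, now=None):
    """Serve a page, pushing it into the cache once it becomes hot."""
    hot = detector.record_hit(key, now)
    if key in cache:
        return cache[key]
    body = render(key)
    if hot:
        cache[key] = body  # a real setup would write to memcached with a TTL
    return body
```

A real deployment would also need expiry and invalidation, and would
replicate to caches near the readers; the sketch only shows the
detect-then-promote step.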

The real technical burdens today in Social Networking come from assets
or information that can't be cached too easily without missing users'
expectations. (Aggregated) Activity Feeds (such as the one presented
on your Facebook homepage, not the one on your profile), for example,
are very, very hard to handle properly.

(They are even harder to handle in a distributed or peer-to-peer  
context, if you were about to ask.)
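A rough sketch of why aggregated feeds resist caching, assuming the
simplest possible design (merge at read time). The function name, the
friends_of and posts_by dictionaries, and the (timestamp, author, text)
tuples are hypothetical; each author's list is assumed to be sorted
newest first.

```python
import heapq
from itertools import islice


def aggregated_feed(viewer, friends_of, posts_by, limit=10):
    """Merge each friend's newest-first posts into one newest-first feed."""
    streams = [posts_by.get(friend, []) for friend in friends_of.get(viewer, [])]
    # heapq.merge expects every input already sorted in the requested order.
    merged = heapq.merge(*streams, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))
```

The result depends on the viewer's own friend list, so one cached copy
cannot be shared between users the way a single profile page can, and
any new post by any friend makes the viewer's cached feed stale.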

>>> This suggests the value for applying peer to peer techniques to  
>>> dynamically distributing the load across many machines. Peer to  
>>> peer techniques can also help to sustain performance for search by  
>>> distributing the processing across many machines.

Well, I think you really mean distributed here instead of peer-to- 
peer. There's no distinct advantage peer-to-peer has over plain  
distributed for this scenario. Actually, being able to enforce a  
minimal amount of control (availability of nodes, forced instantiation  
of new nodes in extreme contexts) gives “corporate-distributed” a  
small advantage here over peer-to-peer.

>> Yes, though I think it should be very rare. Cloud computing
>> solutions should be able to deal with such situations quite well.
>> In Social Networks I think it is very rare that one person garners
>> so much visibility that a simple server would get overloaded.
>> Easier than changing the protocols for such a situation is to
>> simply upgrade your service contract for a period.
>
> Hmm, the YouTube experience suggests otherwise. An obscure video  
> posted by someone previously unknown suddenly catches the attention  
> of people all around the World. It may not be practical to upgrade  
> your service contract, and just expanding the monthly download limit  
> may not solve the problem if the server is overloaded.

I'm assuming you mean the uploader's personal homepage server (that
he'd have linked from his YouTube page) would get overloaded, because
I have yet to see YouTube go down because of a single video. Anyway,
Heroku [1], for example, does instant (< 5 seconds) provisioning of CPU
and RAM, across several machines, across several datacenters. The only
thing they can't scale in “real time” is the DB capacity (takes < 30
minutes). Sounds pretty practical to me. Oh and yeah, you can scale
back down once the surge is over, so it's not like you're signing up
for a two-year contract when you bump your specs up.

> Cloud based servers don't solve the problem unless your server is
> virtualized across a large number of machines. A further issue is  
> putting all your eggs in one basket, in this case, by relying on a  
> cluster of machines owned by a single supplier e.g. Amazon or Google  
> as proposed for the "pushbutton web", see [1]

Well OK. The putting your eggs in one basket argument really has two  
meanings. But I'm gonna assume you mean that users should be worried  
because if their host goes down, their whole site or service goes  
down. Well, Amazon, Google and the rest are distributed across  
datacenters in the world so those scenarios are highly unlikely  
(although it did happen at Amazon at least once, I'll grant you that).  
Still, I'm not exactly losing sleep over the availability of my S3- 
hosted data at night. That's what SLAs are made for.

> If everyone has to pay for their own server (often bundled as part  
> of the ISP fees) we can be independent of any one company. P2P  
> technologies have proven their worth, and could be used to couple  
> everyone's server into a robust social web with automatic load  
> balancing.

Woah, woah. Where to begin.

1) P2P technologies have proven their worth but also their limits.
Most importantly, no QoS (see: Skype's erratic text/voice/video
quality; the failure of bittorrent live video streaming). Even in a
context like ours that involves relatively small amounts of mostly
text-based data, I'd be worried. Your example is actually very, very
poorly suited for P2P. P2P makes what is already popular (and has been
for some time) easy to access. It makes the unknown or unpopular
disappear forever. It's not built to archive or make available. It's
made for stealth distribution of what is famous and very connected. So
a big problem arises at the early stage, when there are very few or
no copies of the data available on the network.
Case in point. If today I want to download “Slumdog Millionaire” off
bittorrent (don't try this at home, kids) it's very easy [2]. 38k
people seeding at the time of this email, 22k trying to leech it, and
it'll probably stay at those levels for a long time. OK, let's
contrast this with a more obscure reference. Let's say I want to grab
Patlabor 2. Another critically-acclaimed movie (albeit a bit older),
that some people consider to be Japanese director Mamoru Oshii's
finest offering. There the situation is dire [3]. No copy available.
And that video might (in a copyright-respecting context) be the next
big thing. The next thing people would want to talk about, and it
would have little or no availability. Which, in turn, might prevent it
from being talked about in the first place. Of course, the
availability problem is addressed differently by different protocols.
But it's not an issue that can be overlooked and it's inherent to P2P
systems.

2) “Automatic load balancing”. Hm. Good luck on that one. Honestly,  
outside of a closed, somewhat poll-able or controlled environment, I  
don't know how you can do this. But maybe you have insider vendor info  
at W3C that I don't. If you meant “distributed dissemination” (again,  
à la bittorrent), then I understand.

3) This whole theory that's been thrown around about P2P Social
Networks assumes that we have solved the problem of multi-point,
heterogeneous data synchronization. We haven't. Microsoft can't do it
correctly, Apple can't do it correctly, Google can't do it correctly.
The fact is, it's a problem we might never be able to solve in our
lifetime and that we probably don't need solved [4], especially not to
do next-generation social networking.
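A back-of-the-envelope sketch for the availability problem in point 1),
under a deliberately crude assumption that is mine and not taken from
any protocol: every peer holding a copy is online independently with
the same probability. The function name and parameters are
hypothetical.

```python
def availability(copies, uptime):
    """Chance that at least one of `copies` independent peers is online."""
    if copies < 0 or not 0.0 <= uptime <= 1.0:
        raise ValueError("copies must be >= 0 and uptime within [0, 1]")
    # All holders are offline with probability (1 - uptime) ** copies.
    return 1.0 - (1.0 - uptime) ** copies
```

With peers online 10% of the time, two copies give roughly a 19%
chance that the data is reachable at all, and zero copies give none.
Tens of thousands of seeders push the figure to effectively 100%,
which is the popular-versus-obscure gap described above.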

> [1] http://dashes.com/anil/2009/07/the-pushbutton-web-realtime-becomes-real.html
>
> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
>

That is all.


Cheers,
Tim

[1] http://heroku.com/pricing
[2] http://btjunkie.org/torrent/Slumdog-Millionaire-2008-DVDSCR-Ahashare/43338af483ef4ce670cec079a8732bc196c4538c6bea
[3] http://www.mininova.org/tor/895938
[4] http://www.joelonsoftware.com/items/2008/05/01.html
Received on Wednesday, 12 August 2009 11:25:17 UTC