[whatwg] The vision behind the <device> element and the ConnectionPeer interface from Ian Hickson on 2010-07-13 (public-whatwg-archive@w3.org from July 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 13 Jul 2010 23:49:52 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1007132253160.7242@ps20323.dreamhostps.com>

Several people have asked me to elaborate on the vision I have for the
<device> element and the ConnectionPeer interface which are currently
specified in draft form here:

http://www.whatwg.org/specs/web-apps/current-work/complete/commands.html#devices
http://www.whatwg.org/specs/web-apps/current-work/complete/commands.html#connectionpeer

There are several use cases that are informing this work:

- Video conferencing
- Real-time games
- Peer-to-peer file transfer

For video conferencing, we need a mechanism through which a local camera
and microphone setup can be enabled and scripts in the page can be granted
access to it, but only with the user's consent and without being
vulnerable to the user being tricked (e.g. through click-jacking). There
then needs to be a way to transmit this stream of video to another peer's
browser, for display. There is also a need to display the stream locally.

For real-time games, we need a mechanism that can send data to a peer or
server, where low latency is more important than reliability. That is,
it's not critical that packets arrive in order (or at all); once a packet
is late, it's useless and there is no reason to resend it. For example,
transient game state information becomes stale quickly. (Video is similar;
only data for the latest frame is interesting: data for older frames is
not useful.)
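
As a rough illustration of "late means useless", here is a minimal sketch
of a receiver that keeps only the newest packet and never asks for a
resend. The Packet type, its seq field and the LatestOnlyReceiver class
are assumptions made up for this example; nothing here is part of any
draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int        # sender-assigned sequence number (hypothetical field)
    payload: bytes  # e.g. transient game state


class LatestOnlyReceiver:
    """Keeps only the newest packet; late or duplicate packets are dropped."""

    def __init__(self) -> None:
        self.latest: Optional[Packet] = None
        self.dropped = 0

    def receive(self, packet: Packet) -> bool:
        # A packet that is not newer than what we already hold is stale:
        # discard it rather than asking the sender to resend anything.
        if self.latest is not None and packet.seq <= self.latest.seq:
            self.dropped += 1
            return False
        self.latest = packet
        return True


receiver = LatestOnlyReceiver()
# Packets arrive out of order, and packet 2 never arrives at all.
for seq in (1, 4, 3, 5):
    receiver.receive(Packet(seq, f"state-{seq}".encode()))
```

Packet 3 shows up after packet 4 and is simply dropped; the missing
packet 2 is never requested again.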

For peer-to-peer file transfer, we need a mechanism to send data, either
in the form of text or binary, to another peer.

In all three cases, we need a mechanism to establish a connection to
another peer. This is a non-trivial problem, because of NAT and firewalls.
Therefore, we may need help from a third-party server to establish the
connection.

Ideally, I'd like the solution to involve no special code at the JS level,
and I'd like the server-side connection helper to be something that can be
implemented by just setting up a server with no special code and minimal
configuration. Thus all the complexity would be in the browsers and in
these reusable servers. The only work the author needs to do in this
vision is have scripts running in the browsers of both peers decide that
they want to connect to each other, and have them exchange some opaque
information and information about the helper server, e.g. by sending a
message through XMLHttpRequest to the server to bounce to the other user.

To do this, there have to be three configuration strings:

1. A description of the helper server to give to the browsers, so they
know who to contact to get things started.

2. A description of the first peer that contains all the information the
second peer needs to connect to the first peer.

3. A description of the second peer that contains all the information the
first peer needs to connect to the second peer.

The server can generate the data for 1.

The browser for the first peer can generate the data for 2.

The browser for the second peer can generate the data for 3.

The script then just needs to get the data for 1 and give it to the two
peers, the data for 2 and give it to the second peer, and the data for 3
and give it to the first peer.
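
The exchange described above can be sketched as follows. This is only a
model of the author's side of things: the Peer class, its method names
and the string formats are hypothetical, and in the real design the
strings would stay opaque to the script.

```python
from typing import Optional


class Peer:
    """Stand-in for one browser; the strings it handles are opaque to scripts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.server_config: Optional[str] = None
        self.remote_config: Optional[str] = None

    def local_config(self) -> str:
        # Hypothetical format; the real one is deliberately left undefined.
        return f"peer-config:{self.name}"

    def add_server_config(self, config: str) -> None:
        self.server_config = config

    def add_remote_config(self, config: str) -> None:
        self.remote_config = config

    def ready(self) -> bool:
        return self.server_config is not None and self.remote_config is not None


def introduce(server_config: str, first: Peer, second: Peer) -> None:
    """The only work the script does: shuttle three strings around."""
    # String 1 (from the helper server) goes to both peers.
    first.add_server_config(server_config)
    second.add_server_config(server_config)
    # String 2 (from the first peer) goes to the second peer.
    second.add_remote_config(first.local_config())
    # String 3 (from the second peer) goes to the first peer.
    first.add_remote_config(second.local_config())


alice, bob = Peer("alice"), Peer("bob")
introduce("helper-config:example", alice, bob)
```

In practice introduce() would be split across two pages, with the strings
relayed through the author's own server.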

Once the browsers have the information, they can set up a connection to
each other using the helper server. For example, in the simplest case, the
information for 2 is just the IP address and listening port of the first
peer, the information for 3 is just the IP address and listening port of
the second peer, and the browsers just have to each open a TCP connection
to each other to set up a control channel over which everything else can
be done.
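
To make the simplest case concrete, here is a minimal sketch using plain
TCP sockets on the loopback interface. The "host:port" string format and
the helper names make_config and parse_config are assumptions for
illustration only, and for brevity only one peer connects to the other
rather than both connecting to each other.

```python
import socket
from typing import Tuple


def make_config(listener: socket.socket) -> str:
    """Describe a listening socket as a hypothetical 'host:port' string."""
    host, port = listener.getsockname()
    return f"{host}:{port}"


def parse_config(config: str) -> Tuple[str, int]:
    """Recover the address and port from a 'host:port' string."""
    host, _, port = config.rpartition(":")
    return host, int(port)


# First peer: listen on an ephemeral loopback port and publish string 2.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listener.settimeout(5)
config_2 = make_config(listener)

# Second peer: receives string 2 (via the script) and opens the channel.
second = socket.create_connection(parse_config(config_2), timeout=5)
first, _ = listener.accept()
first.settimeout(5)

# Use the control channel once, then close the sending side.
second.sendall(b"hello from peer 2")
second.shutdown(socket.SHUT_WR)

chunks = []
while True:
    chunk = first.recv(1024)
    if not chunk:  # empty read means the other side finished sending
        break
    chunks.append(chunk)
received = b"".join(chunks)

for sock in (first, second, listener):
    sock.close()
```

This only works when both peers are directly reachable, which is exactly
what NATs and firewalls tend to prevent.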

Now of course things aren't that simple -- in practice the information
will have to include details such as keys to make sure the browsers can
prove to each other that they are who they expect to be, there will have
to be information obtained from the helper server (e.g. the IP address of
the NAT router), and there will likely be information about what protocols
are supported, and so on.

None of this has to be exposed to the script, though, which is the great
thing about this mechanism.

So what is missing here?

Well, the main thing missing in this vision is the format of the
configuration strings, and rules for how they are to be interpreted. We
need a specification that defines how the helper servers and browsers are
to interact, and how, once they have a control channel set up, they are
to set up further channels: for example, how to get ready to receive UDP
packets for a video stream, how to get ready to receive a file, and so on.

I believe this is a self-contained problem, which can be defined as a
black box that exposes only the following:

Browser side:
Methods to:
- add a configuration string from the other peer
- add a configuration string from the helper server
- open a connection
- close a connection
- send plain text reliably
- send plain text unreliably, with low latency
- send binary data reliably
- send binary data unreliably, with low latency
- start sending a stream
- stop sending a stream
Callbacks for:
- getting a configuration string for the other side
- the connection being established
- a permanent failure to establish a connection
- the connection being shut down
- receiving text
- receiving binary data
- being notified of a new stream being received
- being notified that a stream has been dropped

Server side:
Callbacks for:
- getting a configuration string for the browsers

Everything else (how the connections work, what the configuration strings
look like, etc.) would just be something internal to that specification,
and separate from HTML.
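
One possible shape for that black box, written as abstract interfaces, is
sketched below. Every name here (PeerConnectionBox, HelperServer, the
on_* callbacks) is hypothetical and is not the ConnectionPeer draft. The
four "send" methods in the list are folded into two with a reliable flag,
which is just one design choice among several.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class PeerConnectionBox(ABC):
    """Browser side of the black box (hypothetical names throughout)."""

    # Callbacks; the embedding code assigns these before opening.
    on_local_config: Optional[Callable[[str], None]] = None   # string for the other side
    on_connect: Optional[Callable[[], None]] = None           # connection established
    on_failure: Optional[Callable[[], None]] = None           # permanent failure to connect
    on_disconnect: Optional[Callable[[], None]] = None        # connection shut down
    on_text: Optional[Callable[[str], None]] = None           # text received
    on_binary: Optional[Callable[[bytes], None]] = None       # binary data received
    on_stream_added: Optional[Callable[[object], None]] = None
    on_stream_removed: Optional[Callable[[object], None]] = None

    @abstractmethod
    def add_peer_config(self, config: str) -> None:
        """Add a configuration string from the other peer."""

    @abstractmethod
    def add_server_config(self, config: str) -> None:
        """Add a configuration string from the helper server."""

    @abstractmethod
    def open(self) -> None:
        """Open a connection."""

    @abstractmethod
    def close(self) -> None:
        """Close a connection."""

    @abstractmethod
    def send_text(self, text: str, reliable: bool = True) -> None:
        """Send plain text; reliable=False trades delivery for low latency."""

    @abstractmethod
    def send_binary(self, data: bytes, reliable: bool = True) -> None:
        """Send binary data; reliable=False trades delivery for low latency."""

    @abstractmethod
    def add_stream(self, stream: object) -> None:
        """Start sending a stream."""

    @abstractmethod
    def remove_stream(self, stream: object) -> None:
        """Stop sending a stream."""


class HelperServer(ABC):
    """Server side of the black box."""

    @abstractmethod
    def get_browser_config(self) -> str:
        """Return the configuration string to hand to the browsers."""
```

Nothing in these interfaces mentions addresses, ports, keys or
protocols; those would live entirely inside the implementation.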

The black box could also include details like how to negotiate different
bitrates for video streams; these again don't have to be exposed to the
script, at least not in the first version. Since the browser is the one
generating the stream and the one doing the connection work, the black box
can directly control the stream generation as well.

If this is something that resonates with people, then what we really need
is for a group of people to get together and design this black box. If
people would like to do it through the WHATWG I'd be happy to create a
mailing list and whatever other infrastructure is needed for this purpose;
alternatively it could be done at the IETF or in some other group.

If there are any questions about this, please feel free to ask them; I'll
try to explain what I envision here in more detail. If this isn't a
direction people are interested in, then that's fine too. :-)

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 13 July 2010 16:49:52 UTC