Raw sockets feedback from Mozilla Network team from Jonas Sicking on 2013-08-25 (public-sysapps@w3.org from August 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Sun, 25 Aug 2013 16:26:46 -0700
To: "public-sysapps@w3.org" <public-sysapps@w3.org>, Patrick McManus <mcmanus@ducksong.com>
Message-ID: <CA+c2ei8-XOB99C9JcfGt+jdN23iQHWtZKVBGFpmMkLeczy15aw@mail.gmail.com>
Hi All,

I asked Patrick McManus from the Mozilla Network team to have a look
over the Raw Sockets draft. Here's his feedback (please keep Patrick
on cc for the replies since he's not subscribed to this list):

* the concept of an isolated "default local interface" (used in a few
different places) doesn't really align with networking.. generally
when a local interface isn't specified for a socket the one it is
assigned is derived from looking up the remote address in the routing
table and taking the address of the interface with the most preferred
route to the remote address.. This is equally true of TCP and UDP.

think about a case where you've got 3 interfaces defined on your
machine - 192.168.16.1 which is a natted address used to connect to
the internet, 130.215.21.5 which is an address assigned to you while
you're connected to your university's VPN, and 127.0.0.1 (localhost).

Without additional context - none of those qualify as the default
local interface. What generally happens is that when you ask to
connect to 8.8.8.8 your local address is assigned to be 192.168.16.1
because your Internet route will be used for 8.8.8.8.. but if you ask
to connect to 130.215.21.1 your local address is assigned to be
130.215.21.5.. and if you want to connect to 127.0.0.1 your local
address is also 127.0.0.1. So the remote address and the routing table
matter - there really isn't a default local address outside of that
context.

so in general whenever you want a local interface (and you did not
explicitly provide one) it can only be determined after you provide
the remote address and a system call is made to consult the routing
table.

you specifically asked about
https://github.com/sysapps/raw-sockets/issues/24 .. I'm not concerned
about blocking IO here.. the address lookup will require a system call
but it's just another kernel service with a quick response.. no
different than gettimeofday() or something really. To me the issue is
really just that the concept of assigning a local address is
nonsensical until you have assigned the remote one.
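
A minimal Python sketch of the point above, assuming an ordinary BSD-style
sockets stack. The helper name local_address_for and the default port 53
are illustrative only and do not come from the draft. It shows that the
local address only becomes known once a remote address has been supplied
and the kernel has consulted the routing table:

```python
import socket


def local_address_for(remote_host: str, remote_port: int = 53) -> str:
    """Return the local IPv4 address the OS would use to reach remote_host.

    Hypothetical helper for illustration; not part of any spec.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets. It only records the
        # peer, which makes the kernel pick a route and a local address.
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback is routed via the loopback interface.
    print(local_address_for("127.0.0.1"))
```

Calling the same helper with a public or VPN address would be expected to
return a different local address per route, but the exact values depend on
the machine's interfaces and routing table.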

* "bind the socket to any available randomly selected local port" -
it's not clear you want to say randomly here. Sometimes local ports are
assigned sequentially according to availability.
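
For illustration, a short Python sketch of how a caller typically asks
the OS for "any available" port: bind to port 0 and read back what was
assigned. The function name is hypothetical. Nothing here promises that
the number is random; the selection policy belongs to the operating system:

```python
import socket


def bind_to_os_chosen_port() -> int:
    """Bind to port 0 and return whichever port the OS assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 means "pick any free port for me". Whether the choice is
        # random or sequential is an OS policy, not something we control.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(bind_to_os_chosen_port())
```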

* I don't really understand the loopback attribute. What does it mean
to set it to true but connect to 8.8.8.8? What does it mean to set it
to false but connect to 15.15.15.15 which your OS has bound to the
localhost interface? What purpose does it serve at all?

* I don't understand the onHalfClose event.. how do you know if the
server called half close or if it hung up completely? (they look the
same on the wire)
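
A small Python sketch of why the two cases are hard to tell apart, using
a loopback TCP connection. The helper names tcp_pair and what_reader_sees
are made up for this example. Whether the peer half-closes with
shutdown(SHUT_WR) or closes outright, the reader observes the same thing,
an end-of-stream (empty read):

```python
import socket


def tcp_pair():
    """Return a connected (client, server) TCP socket pair over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def what_reader_sees(full_close: bool) -> bytes:
    """Have the server end its side, then report what the client reads."""
    client, server = tcp_pair()
    try:
        if full_close:
            server.close()                    # hang up completely
        else:
            server.shutdown(socket.SHUT_WR)   # half close: done sending only
        return client.recv(1024)              # empty bytes means end-of-stream
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(what_reader_sees(full_close=False) == what_reader_sees(full_close=True))
```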

* there doesn't seem to be any discussion of the Nagle algorithm or a
mapping to TCP_NODELAY anywhere in here and it's an important topic for
TCP applications. I would suggest that you provide a TCP attribute
called sendCoalescing which defaults to false. Have the documentation
point out that this corresponds to the Nagle algorithm, which in most
TCP APIs defaults to true/on, but because it is often the source of
performance problems we have changed the traditional default.
Applications that do a lot of small sends that aren't expecting
replies to each one (e.g. an ssh application) should enable Nagle for
networking performance but most applications will not want to. A bit
more radically you could just disable Nagle all the time without an
attribute, but if you do that the API document should really mention
it and the ssh client is an example of somewhere where such a config
is not optimal.
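
As a rough sketch of how the suggested attribute could map onto the
existing socket option: sendCoalescing is only the name proposed above,
and the two helper functions below are hypothetical. TCP_NODELAY is the
standard option, and setting it turns the Nagle algorithm off:

```python
import socket


def set_send_coalescing(sock: socket.socket, enabled: bool) -> None:
    """Map a hypothetical sendCoalescing flag onto TCP_NODELAY."""
    # Coalescing enabled  -> Nagle on  -> TCP_NODELAY = 0
    # Coalescing disabled -> Nagle off -> TCP_NODELAY = 1
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def get_send_coalescing(sock: socket.socket) -> bool:
    """Report whether Nagle (send coalescing) is currently on."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_send_coalescing(s, False)  # the default proposed above
        print(get_send_coalescing(s))
```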

* the TCP onMessage event should be called onData or something. A
message, at least in network parlance, is data with a preserved
length.. UDP is like that - if you send a 500 byte message the
receiver either gets 500 bytes or nothing.. but TCP is all about data
streams.. so if you send 500 bytes in one call the receiver could end
up with anywhere from the first 1 to 500 bytes in its first read and
TCP doesn't provide any way to tell if it is just a partial down
payment.. folks used to TCP APIs will be used to that - it's just the
term "message" is confusing.
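
To make the stream behaviour concrete, here is a minimal Python sketch of
the loop that TCP readers usually write because a single read may return
only part of what was sent. The helper name recv_exactly is an assumption
for this example; it relies on the application already knowing how many
bytes to expect, since TCP itself carries no message length:

```python
import socket


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP socket, looping over partial reads."""
    chunks = []
    remaining = n
    while remaining > 0:
        # recv() may return anywhere from 1 to `remaining` bytes.
        chunk = sock.recv(remaining)
        if not chunk:
            # Empty read: the peer ended the stream before n bytes arrived.
            raise ConnectionError("stream ended with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```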

* While we're talking nomenclature please don't use the term "raw"
anywhere in this document. That is a well known networking term and it
doesn't mean access to TCP and UDP interfaces - this was brought up to
me by several folks at the IETF meeting who were confused about the
applicability of this spec because of the use of the term raw sockets.
(raw sockets generally give access to ethernet level framing in normal
networking parlance) These are "transport level socket interfaces" or
"tcp/udp sockets" or so on..

* for the server socket API it should be called "onAccept" instead of
"onConnect" to match the commonly understood sockets API - accept() is
the system call you use to take an incoming connection. There doesn't
seem to be a compelling reason to invent new lingo for well understood
operations.

* the server socket API doesn't need an onOpen event.. there is
nothing that happens in between the constructor and onOpen that could
block.

* some folks will question why the server socket API doesn't contain a
backlog attribute that corresponds to the listen() system call that is
traditionally part of the socket API
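
For reference, a short Python sketch of the traditional server-side calls
mentioned in the last few points: listen() with its backlog argument and
accept() for taking an incoming connection. The function name
open_listener and the backlog value 16 are illustrative assumptions:

```python
import socket


def open_listener(port: int = 0, backlog: int = 16) -> socket.socket:
    """Create a loopback TCP listener with an explicit backlog."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", port))
    # backlog is a hint for how many not-yet-accepted connections the OS
    # may queue; the OS is free to clamp or adjust it.
    listener.listen(backlog)
    return listener


if __name__ == "__main__":
    with open_listener() as listener:
        client = socket.create_connection(listener.getsockname())
        # accept() hands over one queued incoming connection.
        conn, peer = listener.accept()
        print(peer[0])
        conn.close()
        client.close()
```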

* on security - we need to think about this a little harder. What does
it mean to be priv'd enough to use this API? Simply being an installed
app or being an audited/signed one? The security implications are
pretty staggering here and I'm pretty sure the answer needs to be more
than "unprivd js off a webpage can't do this". Our users' privacy is
pretty much undermined by allowing this.. I know this is desired as a
backwards looking bridge, but the truth is it brings new functionality
to the mobile platform and that platform ought to at least be dealing
only in TLS and DTLS as table stakes.. While I think TLS and DTLS
ought to be mandatory - at the very least they ought to be possible and
it doesn't really look like that use case has been fully baked into
the API yet.

* I guess I'm also concerned about TCPSocket.send().. the definition
of it says that if it exceeds an internal buffer of unknowable size it
must close the socket and throw an error. How can an application use
that safely if it doesn't know what value will overrun the socket and
trigger the exception and a close?

Rather than the true/false semantic being used as a return value here
(which requires the whole send be buffered) it would be traditional to
let the send accept 0->N bytes of the N bytes being sent and have that
(0->N value) be the return code. Partial sends are part and parcel of
stream APIs. That way if I have 4MB to send but you've only got 1MB of
buffers I don't have to magically guess that - I do a 4MB write, 1MB
gets buffered - 1MB is returned and I come back later to try and write
the next 3MB. (either immediately which probably returns 0 or after an
ondrain event).
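
A minimal Python sketch of the partial-send semantic described above,
assuming a non-blocking TCP socket. The helper name send_some is
hypothetical. It returns how many of the offered bytes were accepted (0 to
N), so the caller can retry the remainder later instead of guessing the
buffer size:

```python
import socket


def send_some(sock: socket.socket, data: bytes) -> int:
    """Queue as much of data as fits right now; return the count accepted.

    Assumes sock has been put in non-blocking mode by the caller.
    """
    try:
        # send() may accept only a prefix of data and reports its length.
        return sock.send(data)
    except BlockingIOError:
        # Buffers are full: nothing was accepted, try again after a drain.
        return 0


def remaining_after_send(sock: socket.socket, data: bytes) -> bytes:
    """Send what fits and return the unsent tail for a later attempt."""
    accepted = send_some(sock, data)
    return data[accepted:]
```

A caller would keep the returned tail and call remaining_after_send again
once the socket becomes writable, which plays the role of the ondrain
event mentioned above.
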
Received on Sunday, 25 August 2013 23:27:44 UTC