[whatwg] PeerConnection feedback from Harald Alvestrand on 2011-04-13 (public-whatwg-archive@w3.org from April 2011)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Wed, 13 Apr 2011 11:33:51 +0200
Message-ID: <4DA56DFF.8050709@alvestrand.no>
Since Ian seems to prefer to jumble all threads on a given group of 
issues together in one message, I'll attempt to use the same format this 
time.

On 04/12/11 04:09, Ian Hickson wrote:
> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
>> A lot of firewalls (including Google's, I believe) drop the subsequent
>> part of fragmented UDP packets, because it's impossible to apply
>> firewall rules to fragments without keeping track of all fragmented UDP
>> packets that are in the process of being transmitted (and keeping track
>> would open the firewalls to an obvious resource exhaustion attack).
>>
>> This has made UDP packets larger than the MTU pretty useless.
> So I guess the question is do we want to limit the input to a fixed value
> that is the lowest used MTU (576 bytes per IPv4), or dynamically and
> regularly determine what the lowest possible MTU is?
>
> The former has a major advantage: if an application works in one
> environment, you know it'll work elsewhere, because the maximum packet
> size won't change. This is a serious concern on the Web, where authors tend
> to do limited testing and thus often fail to handle rare edge cases well.
>
> The latter has a major disadvantage: the path MTU might change, meaning we
> might start dropping data if we don't keep trying to determine the Path
> MTU. Also, it's really hard to determine the Path MTU in practice.
>
> For now I've gone with the IPv4 "minimum maximum" of 576 minus overhead,
> leaving 504 bytes for user data per packet. It seems small, but I don't
> know how much data people normally send along these low-latency unreliable
> channels.
>
> However, if people want to instead have the minimum be dynamically
> determined, I'm open to that too. I think the best way to approach that
> would be to have UAs implement it as an experimental extension at first,
> and for us to get implementation experience on how well it works. If
> anyone is interested in doing that I'm happy to work with them to work out
> a way to do this that doesn't interfere with UAs that don't yet implement
> that extension.
The practical MTU of the current Internet is the Ethernet MTU: 1500 
bytes minus headers.
The IPv6 "minimum maximum" of 1280 bytes was chosen to leave some room 
for headers, tunnels and so on.

My suggestion would be to note that applications need to be aware that 
due to firewalls and other types of black holes, you might get 
consistent packet loss for packets larger than a given size, typically 
1280 bytes or 1480 bytes, and leave it at that.
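
To make the sizes concrete, here is a minimal sketch of the per-packet payload budget. It assumes plain UDP directly over IP, with no IP options, extension headers or tunnels; the function names are illustrative and not taken from any spec draft.

```javascript
// Header sizes in bytes (minimum sizes, no IP options or extension headers).
const IPV4_HEADER = 20;
const IPV6_HEADER = 40;
const UDP_HEADER = 8;

// Largest UDP payload that fits in one unfragmented packet for a given MTU.
function maxUdpPayload(mtu, ipVersion) {
  const ipHeader = ipVersion === 6 ? IPV6_HEADER : IPV4_HEADER;
  return mtu - ipHeader - UDP_HEADER;
}

// Split a byte array into chunks that each fit in one packet.
function splitForMtu(bytes, mtu, ipVersion) {
  const size = maxUdpPayload(mtu, ipVersion);
  if (size <= 0) throw new RangeError("MTU too small for IP and UDP headers");
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}

console.log(maxUdpPayload(1500, 4)); // 1472
console.log(maxUdpPayload(1280, 6)); // 1232
console.log(maxUdpPayload(576, 4));  // 548
```

The 504-byte figure quoted above evidently reserves further per-packet overhead beyond these two headers.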
>
> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
>> On 03/29/11 03:00, Ian Hickson wrote:
>>> On Wed, 23 Mar 2011, Harald Alvestrand wrote:
>>>>> Is there really an advantage to not using SRTP and reusing the RTP
>>>>> format for the data messages?
>>> Could you elaborate on how (S)RTP would be used for this? I'm all in
>>> favour of deferring as much of this to existing protocols as possible,
>>> but RTP seemed like massive overkill for sending game status packets.
>> If "data" was defined as an RTP codec ("application/packets?"), SRTP
>> could be applied to the packets.
>>
>> It would impose a 12-byte header in front of the packet and the
>> recommended authentication tag at the end, but would ensure that we
>> could use exactly the same procedure for key exchange
> We already use SDP for key exchange for the data stream.
Yes, with a means of applying encryption that is completely unique to 
this specification. I'm not fond of novel cryptography designed by 
non-cryptographers; seen that done before.
(I've also seen flaws found in novel cryptography designed by 
cryptographers....)
>
>> multiplexing of multiple data streams on the same channel using SSRC,
> I don't follow. What benefit would that have?
If, for instance, an FPS wants one stream of events for bullet 
trajectories and another stream of events for sound-source movements, 
multiple data streams will allow the implementor to not invent his own 
multiplexing layer.
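
As a rough illustration of what SSRC-based multiplexing would look like, the sketch below packs data behind the 12-byte RTP fixed header (RFC 3550) and routes received packets by SSRC. The payload type 96, the SSRC values and the function names are assumptions for the example; it ignores SRTP, CSRC lists and header extensions.

```javascript
// Build a minimal 12-byte RTP fixed header (RFC 3550) followed by the payload.
// payloadType 96 is an assumption: the first value in the dynamic range.
function packRtp(ssrc, sequence, timestamp, payload, payloadType = 96) {
  const packet = new Uint8Array(12 + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 0x80);               // version 2, no padding, extension or CSRCs
  view.setUint8(1, payloadType & 0x7f); // marker bit clear
  view.setUint16(2, sequence & 0xffff); // DataView writes big-endian by default
  view.setUint32(4, timestamp >>> 0);
  view.setUint32(8, ssrc >>> 0);
  packet.set(payload, 12);
  return packet;
}

// Route an incoming packet to the handler registered for its SSRC.
// Assumes the sender used no CSRC list and no header extension.
function demux(packet, handlers) {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const ssrc = view.getUint32(8);
  const handler = handlers.get(ssrc);
  if (handler) handler(packet.subarray(12));
  return ssrc;
}

const BULLETS = 0x1111, SOUNDS = 0x2222; // hypothetical SSRC values
const seen = [];
const handlers = new Map([
  [BULLETS, (data) => seen.push(["bullets", data.length])],
  [SOUNDS, (data) => seen.push(["sounds", data.length])],
]);
demux(packRtp(SOUNDS, 1, 0, new Uint8Array([1, 2, 3])), handlers);
console.log(seen); // [ [ 'sounds', 3 ] ]
```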
>
>> and procedures for identifying the stream in SDP (if we continue to use
>> SDP) - I believe SDP implicitly assumes that all the streams it
>> describes are RTP streams.
> That doesn't seem to be the case, but I could be misinterpreting SDP.
> Currently, the HTML spec includes instructions on how to identify the
> stream in SDP; if those instructions are meaningless due to a
> misunderstanding of SDP then we should fix it (and in that case, it might
> indeed make a lot of sense to use RTP to carry this data).
I'm not familiar with any HTTP-in-SDP spec; can you point out the reference?
>> I've been told that defining RTP packetization formats for a codec needs
>> to be done carefully, so I don't think this is a full specification, but
>> it seems that the overhead of doing so is on the same order of magnitude
>> as the currently proposed solution, and the security properties then
>> become very similar to the properties for media streams.
> There are very big differences in the security considerations for media
> data and the security considerations for the data stream. In particular,
> the media data can't be generated by the author in any meaningful way,
> whereas the data is entirely under author control. I don't think it is
> safe to assume that the security properties that we have for media streams
> necessarily work for data streams.

If we support streaming from recorded files, without transcoding, the 
difference is a lot smaller, since the attacker can create a handcrafted 
"audio/video data" file.
If we allow simplistic codecs like L16 or mu-law, we can't even tell by 
file analysis that it's not a valid file.

Have we ruled out the transmission of recorded data, or mandated 
transcoding?
>
> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
>>>>> Recording any of these requires much more specification than just
>>>>> "record here".
>>> Could you elaborate on what else needs specifying?
>> One thing I remember from an API design talk I viewed: "An ability to
>> record to a file means that the file format is part of your API."
> Indeed.
>
>
>> For instance, for audio recording, it's likely that you want control
>> over whether the resulting file is in Ogg Vorbis format or in MP3
>> format; for video, it's likely that you may want to specify that it will
>> be stored using the VP8 video codec, the Vorbis audio codec and the
>> Matroska container format. These desires have to be communicated to the
>> underlying audio/video engine, so that the proper transforms can be
>> inserted into the processing stream
> Yes, we will absolutely need to add these features in due course. Exactly
> what we should add is something we have to determine from implementation
> experience.
>
>
>> and I think they have to be communicated across this interface; since
>> the output of these operations is a blob without any inherent type
>> information, the caller has to already know which format the media is
>> in.
> Depending on the use case and on implementation trajectories, this isn't a
> given. For example, if all UAs end up implementing one of two
> codec/container combinations and we don't expose any controls, it may be
> that the first few bytes of the output file are in practice enough to
> fully identify the format used.
>
> Note also that Blob does have a MIME type, so even without looking at the
> data itself, or at the UA string, it may be possible to get a general idea
> of the container and maybe even codecs.
I was looking at this from the other end: When I as a script author 
start a Record() process, I need to have some insight into what the 
format of the Blob (or whatever it is) is going to be.

It's possible that a reasonable method is generate-and-test:

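    // Sketch only: record(), callback and getRecordedData() are used as
    // named above; acceptableMimeType() and report() are application code.
    var recorder = stream.record()
    recorder.callback = testFormat   // pass the function itself, do not call it
    recorder.getRecordedData()
    function testFormat(blob) {
        var mimetype = blob.type     // a Blob exposes its MIME type as .type
        if (!acceptableMimeType(mimetype)) {
            report("Can't record, I don't like this format")
        }
    }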

but it doesn't seem optimal to me; if the browser is able to record in 
OGG and MP3, and the application is willing to accept uploaded MP3 files 
but not OGG (or vice versa), it seems unreasonable to be unable to 
record just because the default format for the browser is the "wrong one".
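
One alternative to generate-and-test would be for the script to state which formats it accepts and for the recorder to pick one it supports. The sketch below shows only that selection step; the idea that a browser would expose its list of recording formats, and the name chooseRecordingFormat, are assumptions and not part of the current draft.

```javascript
// Return the first format the application accepts that the browser can
// also produce, or null if there is no overlap.
function chooseRecordingFormat(browserFormats, acceptedFormats) {
  const supported = new Set(browserFormats.map((f) => f.toLowerCase()));
  for (const format of acceptedFormats) {
    if (supported.has(format.toLowerCase())) return format;
  }
  return null;
}

// The browser lists Ogg first, but the application only takes MP3 uploads.
console.log(chooseRecordingFormat(["audio/ogg", "audio/mpeg"], ["audio/mpeg"]));
// audio/mpeg
```
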
> On Fri, 8 Apr 2011, Harald Alvestrand wrote:
>> The current (April 8) version of section 9.4 says that the config string for a
>> PeerConnection object is this:
>> ---------------------------
>> The allowed formats for this string are:
>>
>> "TYPE 203.0.113.2:3478"
>> Indicates a specific IP address and port for the server.
>>
>> "TYPE relay.example.net:3478"
>> Indicates a specific host and port for the server; the user agent will look up
>> the IP address in DNS.
>>
>> "TYPE example.net"
>> Indicates a specific domain for the server; the user agent will look up the IP
>> address and port in DNS.
>>
>> The "TYPE" is one of:
>>
>> STUN
>> Indicates a STUN server
>> STUNS
>> Indicates a STUN server that is to be contacted using a TLS session.
>> TURN
>> Indicates a TURN server
>> TURNS
>> Indicates a TURN server that is to be contacted using a TLS session.
>> -------------------------------
>> I believe this is insufficient, for a number of reasons:
>> - For future extensibility, new forms of init data need to be passed without
>> invalidating old implementations. This indicates that a serialized JSON object
>> with a few keys of defined meaning is a better basic structure than a format
>> string with no format identifier.
> The format is already defined in a forward-compatible manner.
> Specifically, UAs are currently required to ignore everything past the
> first line feed character. In a future version, we could extend this API
> by simply including additional data after the linefeed.
The cost of supporting formats is the cost of writing parsers; the JSON 
string parser already exists, and allows extensibility within the scope 
of JSON, while the parser for the new string object will have to be 
written, and changed each time the spec gets extended.

One of the reasons people have given for why they use XML rather than 
the RFC-822 key:value (or key:value, value, value) syntax is that the 
parsers for XML are regular, while the RFC-822 parsers fill up with 
special-casing all the time; they're willing to pay the (hefty) overhead 
of XML in order to have a regularized parser.
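
To illustrate the parser-cost point, here is a simplified sketch of a hand-written parser for the "TYPE host:port" string next to the JSON equivalent. The regular expression is my own approximation of the format quoted above (it does not handle IPv6 literals), and the JSON key names are hypothetical.

```javascript
// Parse the first line of a configuration string such as
// "TURNS relay.example.net:3478". Everything after the first line feed
// is ignored, as described for the current draft.
function parseConfigString(config) {
  const firstLine = config.split("\n", 1)[0];
  const match = /^(STUN|STUNS|TURN|TURNS) ([^\s:]+)(?::(\d+))?$/.exec(firstLine);
  if (!match) throw new SyntaxError("Unrecognized configuration: " + firstLine);
  return {
    type: match[1],
    host: match[2],
    port: match[3] === undefined ? null : Number(match[3]),
  };
}

// The same information as JSON; the key names are hypothetical.
function parseConfigJson(config) {
  const { type, host, port = null } = JSON.parse(config);
  return { type, host, port };
}

console.log(parseConfigString("STUN example.net\nfuture data"));
console.log(parseConfigJson('{"type":"TURN","host":"relay.example.net","port":3478}'));
```

Any new field means touching the regular expression in the first function, while the second only needs to read another key.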
>
>> - For use with STUN and TURN, we need to support the case where we need a STUN
>> server and a TURN server, and they're different.
> TURN servers are STUN servers, at least according to the relevant RFCs, as
> far as I can tell. Can you elaborate on which TURN servers do not
> implement STUN, or explain the use cases for having different TURN and
> STUN servers? This is an area where I am most definitely not an expert, so
> any information here would be quite helpful.
They use the same protocol, but for two different purposes: STUN servers 
tell you what your address is, and TURN servers relay data. STUN is so 
cheap, it's not unreasonable to assume that people will not bother with 
authentication-for-use; for TURN, limiting access to your own customers 
is definitely something you expect people to do.

Google Talk deploys its STUN service at stun.l.google.com, and its 
TURN-like service (it's not quite TURN compliant) at relay.l.google.com.

At the moment, they are backed by the same binary, but the DNS lookup 
for the two names does not return the same result.

>
>> - The method of DNS lookup is not specified. In particular, it is not
>> specified whether SRV records are looked up or not.
> This seems to be entirely specified. Please ensure that you are reading
> the normative conformance criteria for user agents, and not the
> non-normative authoring advice, which is only a brief overview.
Yes, I spoke a bit hastily. The authoritative text says (unless I missed 
something):

    * The IP address, host name, or domain name of the server is host.
    * The port to use is port. If this is the empty string, then only a
      domain name is configured (and the ICE Agent will use DNS SRV
      requests to determine the IP address and port).

This needs a reference to the relevant RFC to be complete (section 9 of 
RFC 5389 for STUN,
RFC 5766 section 6.1 for TURN). It doesn't specify what will happen if 
there is a domain name, no port, and no SRV records (by referencing the 
RFCs, this becomes clear - you look up the A/AAAA record and use port 
3478 or 5349 as appropriate).
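
The lookup order can be sketched as follows. The default ports (3478 in the clear, 5349 over TLS) and the SRV service labels come from RFC 5389 and RFC 5766; the resolver hooks resolveSrv and resolveAddress are hypothetical and injected by the caller, and SRV priority/weight handling is left out.

```javascript
// Default ports from RFC 5389 and RFC 5766: 3478 in the clear, 5349 over TLS.
function defaultPort(type) {
  return type === "STUNS" || type === "TURNS" ? 5349 : 3478;
}

// SRV owner name to query for a bare domain, e.g. "_stun._udp.example.net".
// TLS variants run over TCP; UDP is assumed for the others.
function srvName(type, domain) {
  const secure = type === "STUNS" || type === "TURNS";
  return `_${type.toLowerCase()}._${secure ? "tcp" : "udp"}.${domain}`;
}

// resolveSrv and resolveAddress are injected, hypothetical resolver hooks:
//   resolveSrv(name)     -> Promise<Array<{ target, port }>>
//   resolveAddress(host) -> Promise<string[]>  (A/AAAA results)
async function locateServer({ type, host, port }, resolveSrv, resolveAddress) {
  if (port !== null) {
    // Explicit port: no SRV lookup, just resolve the host.
    return { addresses: await resolveAddress(host), port };
  }
  const records = await resolveSrv(srvName(type, host));
  if (records.length > 0) {
    const { target, port: srvPort } = records[0]; // priority/weight ignored here
    return { addresses: await resolveAddress(target), port: srvPort };
  }
  // No SRV records: fall back to A/AAAA and the protocol's default port.
  return { addresses: await resolveAddress(host), port: defaultPort(type) };
}
```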

The same section says:

> The long-term username for the STUN or TURN server is the ASCII
> serialization
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#ascii-serialization-of-an-origin>
> of the entry script
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/browsers.html#entry-script>'s
> origin
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#origin>;
> the long-term password is the empty string.

I found this exceedingly surprising; this effectively means that you're 
not protecting your STUN/TURN exchanges.

>
>> - We have no evaluation that shows that we'll never need the unencrypted
>> TCP version of STUN or TURN, or that we need to support the encrypted
>> STUN version. We should either support all formats that the spec can
>> generate, or we should get a reasonable survey of implementors on what
>> they think is needed.
>
> If anyone has any data on this, that would indeed be quite useful.

> On Fri, 8 Apr 2011, Glenn Maynard wrote:
>> FWIW, I thought the block-of-text configuration string was peculiar and
>> unlike anything else in the platform.  I agree that using a
>> configuration object (of some kind) makes more sense.
> An object wouldn't work very well as it would add additional steps in the
> case where someone just wants to transmit the configuration information to
> the client as data. Using JSON strings as input as Harald suggested could
> work, but seems overly verbose for such simple data.
FWIW, I'm completely indifferent to whether the caller or the callee 
calls JSON.parse() (or equivalent) on the data. YMMV.

When data is passed across the wire, it has to be serialized; when it's 
deserialized, the result should not have the potential of being an 
active object (aka "virus carrier"). Apart from that, I don't care.
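
A minimal sketch of what I mean by inert deserialization, assuming the configuration arrives as JSON text; the field names and the readServerConfig helper are hypothetical. JSON.parse yields only plain data, and the function copies out just the fields it knows, with the types it expects.

```javascript
const ALLOWED_TYPES = new Set(["STUN", "STUNS", "TURN", "TURNS"]);

// Parse JSON text and return a fresh object holding only validated fields,
// so nothing else from the wire reaches the caller.
function readServerConfig(text) {
  const raw = JSON.parse(text); // plain data only: JSON cannot carry functions
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    throw new TypeError("Expected a JSON object");
  }
  if (!ALLOWED_TYPES.has(raw.type)) throw new TypeError("Unknown server type");
  if (typeof raw.host !== "string" || raw.host === "") {
    throw new TypeError("Missing host");
  }
  const port = raw.port === undefined ? null : raw.port;
  if (port !== null && !(Number.isInteger(port) && port > 0 && port < 65536)) {
    throw new TypeError("Invalid port");
  }
  return { type: raw.type, host: raw.host, port };
}
```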
Received on Wednesday, 13 April 2011 02:33:51 UTC