- From: Justin Uberti <juberti@google.com>
- Date: Mon, 11 Apr 2011 23:17:30 -0700
CIL

On Mon, Apr 11, 2011 at 7:09 PM, Ian Hickson <ian at hixie.ch> wrote:

> On Tue, 29 Mar 2011, Robert O'Callahan wrote:
> > Ian Hickson wrote:
> > > I agree that (on the long term) we should support stream filters on streams, but I'm not sure I understand <video>'s role in this. Wouldn't it be more efficient to have something that takes a Stream on one side and outputs a Stream on the other, possibly running some native code or JS in the middle?
> >
> > We could.
> >
> > I'm trying to figure out how this is going to fit in with audio APIs. Chris Rogers from Google is proposing a graph-based audio API to the W3C Audio incubator group which would overlap considerably with a Stream processing API like you're suggesting (although in his proposal processing nodes, not streams, are first-class).

> Indeed. I think it would make sense to have nodes in this graph that could take Streams as input, or output the resulting data as Streams.

> > A fundamental problem here is that HTML media elements have the functionality of both sources and sinks.

> Indeed. Unfortunately, at the time that we were designing <video>, the later needs of multitrack video and video conferencing were not completely clear. If we could go back, I think it would make sense to split the part of <video> that does network traffic and the part of <video> that does rendering and UI control from each other, if only to make it possible now to have them be split further for video conferencing and multitrack. Sadly, that's not really an option.

> > You want to see <video> and <audio> only as sinks which accept streams. But in that case, if we create an audio processing API using Streams, we'll need a way to download stream data for processing that doesn't use <audio> and <video>, which means we'll need to replicate <src> elements, the type attribute, networkstates, readystates, possibly the 'loop' attribute... should we introduce a new object or element that provides those APIs? How much can be shared with <video> and <audio>? Should we be trying to share? (In Chris Rogers' proposal, <audio> elements are used as sources, not sinks.)

> I think at this point we should probably just make media elements (<video> and <audio>) support being used both as sources and as sinks. They'll just be a little overweight when used just for one of those purposes.

> Basically I'm suggesting viewing media elements like this:
>
>       URL to network resource
>       URL to Stream object
>       URL to Blob object
>         |
>         |      ----------------------------
>         +->   :SINK                 SOURCE: -+
>   ------------.        T        .-----------  |
>               |        |        |             |
>               |        |        |        Input for
>               |        |        |        Audio API
>               |        |        |
>                \               /
>                 \             /
>                        V
>                     DISPLAY
>                       AND
>                   SOUND BOARD
>
> It's a source that happens to have built-in monitor output. Or a sink that happens to have a monitor output port. Depending on how you want to see it.
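For illustration, a rough sketch of that source-with-monitor model, using the draft API names discussed in this thread; the getUserMedia option string, the PeerConnection constructor, and the Stream-to-URL step are all assumptions here and may not match what is eventually specified:

    // Sketch only: draft API shapes assumed from this thread, not a final interface.
    navigator.getUserMedia("audio,video", function (stream) {
      // Sink with a built-in monitor: render the local stream in a <video> element.
      var monitor = document.querySelector("video");
      monitor.src = URL.createObjectURL(stream);   // hypothetical Stream-to-URL step
      monitor.play();

      // The same stream also acts as a source for other consumers, e.g. a
      // PeerConnection (or a node in a graph-based audio API).
      var pc = new PeerConnection("STUN stun.example.net:3478", sendSignalingMessage);
      pc.addStream(stream);
    }, function (error) {
      // Handle the failure case (user refused, no devices, etc.).
    });

    // Placeholder: relay SDP/ICE messages to the remote peer over the
    // application's own signaling channel (XHR, WebSocket, ...).
    function sendSignalingMessage(message) { /* application-specific */ }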
> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
> > A lot of firewalls (including Google's, I believe) drop the subsequent part of fragmented UDP packets, because it's impossible to apply firewall rules to fragments without keeping track of all fragmented UDP packets that are in the process of being transmitted (and keeping track would open the firewalls to an obvious resource exhaustion attack).
> >
> > This has made UDP packets larger than the MTU pretty useless.

> So I guess the question is do we want to limit the input to a fixed value that is the lowest used MTU (576 bytes per IPv4), or dynamically and regularly determine what the lowest possible MTU is?

> The former has a major advantage: if an application works in one environment, you know it'll work elsewhere, because the maximum packet size won't change. This is a serious concern on the Web, where authors tend to do limited testing and thus often fail to handle rare edge cases well.

> The latter has a major disadvantage: the path MTU might change, meaning we might start dropping data if we don't keep trying to determine the Path MTU. Also, it's really hard to determine the Path MTU in practice.

> For now I've gone with the IPv4 "minimum maximum" of 576 minus overhead, leaving 504 bytes for user data per packet. It seems small, but I don't know how much data people normally send along these low-latency unreliable channels.

> However, if people want to instead have the minimum be dynamically determined, I'm open to that too. I think the best way to approach that would be to have UAs implement it as an experimental extension at first, and for us to get implementation experience on how well it works. If anyone is interested in doing that I'm happy to work with them to work out a way to do this that doesn't interfere with UAs that don't yet implement that extension.

In practice, applications assume that the minimum MTU is 1280 (the minimum IPv6 MTU), and limit payloads to about 1200 bytes so that with framing they will fit into a 1280-byte MTU. Going down to 576 would significantly increase the packetization overhead.
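To make the size question concrete, here is a minimal sketch of chunking an application message to fit a per-packet payload budget before handing it to the unreliable channel; the channel object, its send() method, and the framing are placeholders, not a specified API:

    // Sketch: fit application messages into a per-packet payload budget.
    // 504 bytes is the figure proposed above; code assuming a 1280-byte minimum
    // MTU would use roughly 1200 instead. Assumes single-byte characters.
    var MAX_PAYLOAD = 504;
    var HEADER_BUDGET = 16;                        // room for a "seq/total:" prefix
    var CHUNK_SIZE = MAX_PAYLOAD - HEADER_BUDGET;

    function sendChunked(channel, message) {
      var total = Math.ceil(message.length / CHUNK_SIZE);
      for (var i = 0; i < total; i++) {
        var body = message.substr(i * CHUNK_SIZE, CHUNK_SIZE);
        // Prefix each chunk so the receiver can reassemble despite loss/reordering.
        channel.send(i + "/" + total + ":" + body);
      }
    }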
> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
> > On 03/29/11 03:00, Ian Hickson wrote:
> > > On Wed, 23 Mar 2011, Harald Alvestrand wrote:
> > > > > Is there really an advantage to not using SRTP and reusing the RTP format for the data messages?
> > >
> > > Could you elaborate on how (S)RTP would be used for this? I'm all in favour of deferring as much of this to existing protocols as possible, but RTP seemed like massive overkill for sending game status packets.
> >
> > If "data" was defined as an RTP codec ("application/packets?"), SRTP could be applied to the packets.
> >
> > It would impose a 12-byte header in front of the packet and the recommended authentication tag at the end, but would ensure that we could use exactly the same procedure for key exchange

> We already use SDP for key exchange for the data stream.

> > multiplexing of multiple data streams on the same channel using SSRC,

> I don't follow. What benefit would that have?

If you are in a conference that has 10 participants, you don't want to have to set up a new transport for each participant. Instead, SSRC provides an excellent way to multiplex multiple media streams over a single RTP session (and network transport).

> > and procedures for identifying the stream in SDP (if we continue to use SDP) - I believe SDP implicitly assumes that all the streams it describes are RTP streams.

> That doesn't seem to be the case, but I could be misinterpreting SDP. Currently, the HTML spec includes instructions on how to identify the stream in SDP; if those instructions are meaningless due to a misunderstanding of SDP then we should fix it (and in that case, it might indeed make a lot of sense to use RTP to carry this data).

> > I've been told that defining RTP packetization formats for a codec needs to be done carefully, so I don't think this is a full specification, but it seems that the overhead of doing so is on the same order of magnitude as the currently proposed solution, and the security properties then become very similar to the properties for media streams.

> There are very big differences in the security considerations for media data and the security considerations for the data stream. In particular, the media data can't be generated by the author in any meaningful way, whereas the data is entirely under author control. I don't think it is safe to assume that the security properties that we have for media streams necessarily work for data streams.

> On Tue, 29 Mar 2011, Harald Alvestrand wrote:
> > > > > Recording any of these requires much more specification than just "record here".
> > >
> > > Could you elaborate on what else needs specifying?
> >
> > One thing I remember from an API design talk I viewed: "An ability to record to a file means that the file format is part of your API."

> Indeed.

> > For instance, for audio recording, it's likely that you want control over whether the resulting file is in Ogg Vorbis format or in MP3 format; for video, it's likely that you may want to specify that it will be stored using the VP8 video codec, the Vorbis audio codec and the Matroska container format. These desires have to be communicated to the underlying audio/video engine, so that the proper transforms can be inserted into the processing stream

> Yes, we will absolutely need to add these features in due course. Exactly what we should add is something we have to determine from implementation experience.

> > and I think they have to be communicated across this interface; since the output of these operations is a blob without any inherent type information, the caller has to already know which format the media is in.

> Depending on the use case and on implementation trajectories, this isn't a given. For example, if all UAs end up implementing one of two codec/container combinations and we don't expose any controls, it may be that the first few bytes of the output file are in practice enough to fully identify the format used.

> Note also that Blob does have a MIME type, so even without looking at the data itself, or at the UA string, it may be possible to get a general idea of the container and maybe even codecs.
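As a concrete illustration of the "first few bytes" point: a sketch that guesses a recorded Blob's container from its MIME type if the UA set one, and otherwise from well-known magic numbers (Ogg files start with "OggS"; Matroska/WebM files start with the EBML signature 0x1A 0x45 0xDF 0xA3). The recording API that produces the blob is assumed, not specified:

    // Sketch: identify the container format of a recorded Blob.
    function identifyContainer(blob, callback) {
      if (blob.type) {                 // e.g. "video/webm" or "audio/ogg", if set by the UA
        callback(blob.type);
        return;
      }
      var reader = new FileReader();
      reader.onload = function () {
        var b = new Uint8Array(reader.result);
        if (b[0] === 0x4F && b[1] === 0x67 && b[2] === 0x67 && b[3] === 0x53) {
          callback("Ogg");             // "OggS"
        } else if (b[0] === 0x1A && b[1] === 0x45 && b[2] === 0xDF && b[3] === 0xA3) {
          callback("Matroska/WebM");   // EBML header
        } else {
          callback("unknown");
        }
      };
      reader.readAsArrayBuffer(blob.slice(0, 4));
    }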
> On Wed, 30 Mar 2011, Stefan Håkansson LK wrote:
> > This is absolutely correct, and it is not only about codecs or container formats. Maybe you need to supply info like audio sampling rate, video frame rate, video resolution, ... There was an input on this already last November:
> > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-November/029069.html

> Indeed. The situation hasn't changed since then:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030484.html

> On Tue, 29 Mar 2011, Stefan Håkansson LK wrote:
> > > > > > The web application must be able to define the media format to be used for the streams sent to a peer.
> > > > >
> > > > > Shouldn't this be automatic and renegotiated dynamically via SDP offer/answer?
> > > >
> > > > Yes, this should be (re)negotiated via SDP, but what is unclear is how the SDP is populated based on the application's preferences.
> > >
> > > Why would the Web application have any say on this? Surely the user agent is in a better position to know what to negotiate, since it will be doing the encoding and decoding itself.
> >
> > The best format of the coded media being streamed from UA a to UA b depends on a lot of factors. An obvious one is that the codec used is supported by both UAs.... As you say much of it can be handled without any involvement from the application.
> >
> > But let's say that the app in UA a does "addStream". The application in UA b (the same application as in UA a) has two <video> elements, one using a large display size, one using a small size. The UAs don't know in which element the stream will be rendered at this stage (that will be known first when the app in UA b connects the stream to one of the elements at "onaddstream"), so I don't understand how the UAs can select a suitable video resolution without the application giving some input. (Once the stream is being rendered in an element the situation is different - then UA b has knowledge about the rendering and could somehow inform UA a.)

> I had assumed that the video would at first be sent with some more or less arbitrary dimensions (maybe the native ones), and that the receiving UA would then renegotiate the dimensions once the stream was being displayed somewhere. Since the page can let the user change the <video> size dynamically, it seems the UA would likely need to be able to do that kind of dynamic update anyway.

> On Thu, 31 Mar 2011, Lachlan Hunt wrote:
> > When getUserMedia() is invoked with unknown options, the spec currently implicitly requires a PERMISSION_DENIED error to be thrown.
> >
> > e.g. navigator.getUserMedia("foo");
> >
> > In this case, the option for "foo" is unknown. Presumably, this would fall under platform limitations, and would thus jump from step 11 to the failure case, and throw a permission denied error.
> >
> > We are wondering if this is the most ideal error to throw in this case, as opposed to introducing a more logical NOT_SUPPORTED error, and if it might be useful to authors to distinguish these cases?
> >
> > We assume, however, that if the author requests "audio,foo", and the user grants access to audio, then the success callback would be invoked, despite the unknown option for "foo".

> Good point. I've updated the spec to fire NOT_SUPPORTED_ERR if there's no known value.
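For authors, the distinction would look roughly like this; the error object's shape and the constant names are assumptions based on the draft discussed above:

    // Sketch: telling "user refused" apart from "option not supported".
    navigator.getUserMedia("audio,foo", function (stream) {
      // "audio" was granted; the unknown "foo" option is ignored.
    }, function (error) {
      if (error.code === error.PERMISSION_DENIED) {
        // The user (or platform policy) refused access.
      } else if (error.code === error.NOT_SUPPORTED_ERR) {
        // None of the requested options is a value this UA recognises.
      }
    });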
> On Fri, 8 Apr 2011, Harald Alvestrand wrote:
> > The current (April 8) version of section 9.4 says that the config string for a PeerConnection object is this:
> > ---------------------------
> > The allowed formats for this string are:
> >
> > "TYPE 203.0.113.2:3478"
> >   Indicates a specific IP address and port for the server.
> >
> > "TYPE relay.example.net:3478"
> >   Indicates a specific host and port for the server; the user agent will look up the IP address in DNS.
> >
> > "TYPE example.net"
> >   Indicates a specific domain for the server; the user agent will look up the IP address and port in DNS.
> >
> > The "TYPE" is one of:
> >
> > STUN
> >   Indicates a STUN server
> > STUNS
> >   Indicates a STUN server that is to be contacted using a TLS session.
> > TURN
> >   Indicates a TURN server
> > TURNS
> >   Indicates a TURN server that is to be contacted using a TLS session.
> > -------------------------------
> >
> > I believe this is insufficient, for a number of reasons:
> >
> > - For future extensibility, new forms of init data needs to be passed without invalidating old implementations. This indicates that a serialized JSON object with a few keys of defined meaning is a better basic structure than a format string with no format identifier.

> The format is already defined in a forward-compatible manner. Specifically, UAs are currently required to ignore everything past the first line feed character. In a future version, we could extend this API by simply including additional data after the linefeed.
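For concreteness, a sketch contrasting the two shapes being debated: the current one-line text format (everything after the first line feed reserved for future extensions) and the serialized-JSON alternative Harald suggests. The parsing helper and the JSON key names are purely illustrative:

    // Current draft: "TYPE host[:port]" on the first line; later lines reserved.
    var textConfig = "STUN stun.example.net:3478";
    var firstLine = textConfig.split("\n")[0];
    var serverType = firstLine.split(" ")[0];      // "STUN", "STUNS", "TURN" or "TURNS"
    var serverAddress = firstLine.split(" ")[1];   // "stun.example.net:3478"

    // Suggested alternative: a JSON object whose keys have defined meanings,
    // so new fields can be added without breaking old parsers (key names here
    // are hypothetical).
    var jsonConfig = JSON.stringify({
      stun: "stun.example.net:3478",
      turn: "relay.example.net:3478"
    });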
> > - For use with STUN and TURN, we need to support the case where we need a STUN server and a TURN server, and they're different.

> TURN servers are STUN servers, at least according to the relevant RFCs, as far as I can tell. Can you elaborate on which TURN servers do not implement STUN, or explain the use cases for having different TURN and STUN servers? This is an area where I am most definitely not an expert, so any information here would be quite helpful.

> > - The method of DNS lookup is not specified. In particular, it is not specified whether SRV records are looked up or not.

> This seems to be entirely specified. Please ensure that you are reading the normative conformance criteria for user agents, and not the non-normative authoring advice, which is only a brief overview.

> > - We have no evaluation that shows that we'll never need the unencrypted TCP version of STUN or TURN, or that we need to support the encrypted STUN version. We should either support all formats that the spec can generate, or we should get a reasonable survey of implementors on what they think is needed.

> If anyone has any data on this, that would indeed be quite useful.

> On Fri, 8 Apr 2011, Harald Alvestrand wrote:
> > BTW, I haven't been on this list that long... if anyone has advice on whether such discussions are better as buganizer threads or as whatwg mailing list threads, please give it!

> Discussion is best on the mailing list. In general Bugzilla is best for straight-forward bugs rather than design discussions.

> On Fri, 8 Apr 2011, Glenn Maynard wrote:
> > FWIW, I thought the block-of-text configuration string was peculiar and unlike anything else in the platform. I agree that using a configuration object (of some kind) makes more sense.

> An object wouldn't work very well as it would add additional steps in the case where someone just wants to transmit the configuration information to the client as data. Using JSON strings as input as Harald suggested could work, but seems overly verbose for such a simple data.

I have a feeling that this configuration information will only start off simple.

> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 11 April 2011 23:17:30 UTC