From: Ian Hickson <ian@hixie.ch>
Date: Tue, 12 Apr 2011 02:09:23 +0000 (UTC)
On Tue, 29 Mar 2011, Robert O'Callahan wrote:
> Ian Hickson wrote:
> >
> > I agree that (in the long term) we should support stream filters on
> > streams, but I'm not sure I understand <video>'s role in this.
> > Wouldn't it be more efficient to have something that takes a Stream on
> > one side and outputs a Stream on the other, possibly running some
> > native code or JS in the middle?
>
> We could.
>
> I'm trying to figure out how this is going to fit in with audio APIs.
> Chris Rogers from Google is proposing a graph-based audio API to the W3C
> Audio incubator group which would overlap considerably with a Stream
> processing API like you're suggesting (although in his proposal
> processing nodes, not streams, are first-class).

Indeed. I think it would make sense to have nodes in this graph that
could take Streams as input, or output the resulting data as Streams.

> A fundamental problem here is that HTML media elements have the
> functionality of both sources and sinks.

Indeed. Unfortunately, at the time that we were designing <video>, the
later needs of multitrack video and video conferencing were not
completely clear. If we could go back, I think it would make sense to
separate the part of <video> that does network traffic from the part of
<video> that does rendering and UI control, if only to make it possible
now to have them be split further for video conferencing and multitrack.
Sadly, that's not really an option.

> You want to see <video> and <audio> only as sinks which accept streams.
> But in that case, if we create an audio processing API using Streams,
> we'll need a way to download stream data for processing that doesn't use
> <audio> and <video>, which means we'll need to replicate <source>
> elements, the type attribute, network states, ready states, possibly the
> 'loop' attribute... should we introduce a new object or element that
> provides those APIs? How much can be shared with <video> and <audio>?
> Should we be trying to share? (In Chris Rogers' proposal, <audio>
> elements are used as sources, not sinks.)

I think at this point we should probably just make media elements
(<video> and <audio>) support being used both as sources and as sinks.
They'll just be a little overweight when used for only one of those
purposes.

Basically I'm suggesting viewing media elements like this:

     URL to network resource
     URL to Stream object
     URL to Blob object
          |
          |     ----------------------------
          +--> | :SINK            SOURCE: | --> Input for
                ----------.       .--------     Audio API
                          |       |
                          \       /
                           \     /
                            \   /
                             \ /
                              V
                   DISPLAY AND SOUND BOARD

It's a source that happens to have built-in monitor output. Or a sink
that happens to have a monitor output port. Depending on how you want to
see it.
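To make the source-and-sink idea a bit more concrete, here's a rough
sketch of how script might drive such an element from both sides. The
names here (a createObjectURL()-style hookup for pointing the element at
a Stream, a captureStream()-style call for reading data back out, and a
connectInput() method on some audio graph node) are purely illustrative
placeholders, not anything the spec currently defines:

  // Illustrative only: the method names below are assumptions.

  // Sink use: point a <video> element at an incoming Stream (e.g. one
  // received from a PeerConnection) and let it act as the monitor output.
  function playIncomingStream(stream) {
    var video = document.createElement("video");
    video.src = URL.createObjectURL(stream); // hypothetical Stream-to-URL hookup
    video.autoplay = true;
    document.body.appendChild(video);
    return video;
  }

  // Source use: read data back out of the same element and feed it to an
  // audio processing graph. captureStream() and connectInput() are
  // made-up placeholders for whatever the audio API ends up exposing.
  function routeToAudioGraph(video, audioGraphNode) {
    var outgoing = video.captureStream(); // hypothetical source-side method
    audioGraphNode.connectInput(outgoing);
  }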
On Tue, 29 Mar 2011, Harald Alvestrand wrote:
>
> A lot of firewalls (including Google's, I believe) drop the subsequent
> part of fragmented UDP packets, because it's impossible to apply
> firewall rules to fragments without keeping track of all fragmented UDP
> packets that are in the process of being transmitted (and keeping track
> would open the firewalls to an obvious resource exhaustion attack).
>
> This has made UDP packets larger than the MTU pretty useless.

So I guess the question is: do we want to limit the input to a fixed
value that is the lowest used MTU (576 bytes, per the IPv4 minimum), or
dynamically and regularly determine what the lowest possible MTU is?

The former has a major advantage: if an application works in one
environment, you know it'll work elsewhere, because the maximum packet
size won't change. This is a serious concern on the Web, where authors
tend to do limited testing and thus often fail to handle rare edge cases
well.

The latter has a major disadvantage: the path MTU might change, meaning
we might start dropping data if we don't keep trying to determine the
path MTU. Also, it's really hard to determine the path MTU in practice.

For now I've gone with the IPv4 "minimum maximum" of 576 minus overhead,
leaving 504 bytes for user data per packet. It seems small, but I don't
know how much data people normally send along these low-latency
unreliable channels.

However, if people would instead like the minimum to be dynamically
determined, I'm open to that too. I think the best way to approach that
would be to have UAs implement it as an experimental extension at first,
and for us to get implementation experience on how well it works. If
anyone is interested in doing that, I'm happy to work with them to find a
way of doing it that doesn't interfere with UAs that don't yet implement
the extension.
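For illustration, here's roughly what that fixed limit would mean for a
page sending game state over the unreliable channel. The send() call and
the exact overhead figure are assumptions for the sake of the example,
not normative details:

  // Illustrative sketch: MAX_PAYLOAD and the send() call are assumptions.
  var MAX_PAYLOAD = 504; // 576-byte IPv4 "minimum maximum" minus overhead

  function sendGameUpdate(peerConnection, update) {
    var text = JSON.stringify(update);

    // The limit is in bytes, not UTF-16 code units, so measure the UTF-8
    // length of the string rather than using text.length.
    var byteLength = unescape(encodeURIComponent(text)).length;

    if (byteLength > MAX_PAYLOAD) {
      // Too big for one unreliable packet; send a smaller summary instead
      // (a real application might split the state across several messages).
      text = JSON.stringify({ seq: update.seq, full: false });
    }

    peerConnection.send(text); // hypothetical data-channel send()
  }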
On Tue, 29 Mar 2011, Harald Alvestrand wrote:
> On 03/29/11 03:00, Ian Hickson wrote:
> > On Wed, 23 Mar 2011, Harald Alvestrand wrote:
> > > >
> > > > Is there really an advantage to not using SRTP and reusing the RTP
> > > > format for the data messages?
> >
> > Could you elaborate on how (S)RTP would be used for this? I'm all in
> > favour of deferring as much of this to existing protocols as possible,
> > but RTP seemed like massive overkill for sending game status packets.
>
> If "data" was defined as an RTP codec ("application/packets?"), SRTP
> could be applied to the packets.
>
> It would impose a 12-byte header in front of the packet and the
> recommended authentication tag at the end, but would ensure that we
> could use exactly the same procedure for key exchange,

We already use SDP for key exchange for the data stream.

> multiplexing of multiple data streams on the same channel using SSRC,

I don't follow. What benefit would that have?

> and procedures for identifying the stream in SDP (if we continue to use
> SDP) - I believe SDP implicitly assumes that all the streams it
> describes are RTP streams.

That doesn't seem to be the case, but I could be misinterpreting SDP.

Currently, the HTML spec includes instructions on how to identify the
stream in SDP; if those instructions are meaningless due to a
misunderstanding of SDP then we should fix it (and in that case, it might
indeed make a lot of sense to use RTP to carry this data).

> I've been told that defining RTP packetization formats for a codec needs
> to be done carefully, so I don't think this is a full specification, but
> it seems that the overhead of doing so is on the same order of magnitude
> as the currently proposed solution, and the security properties then
> become very similar to the properties for media streams.

There are very big differences in the security considerations for media
data and the security considerations for the data stream. In particular,
the media data can't be generated by the author in any meaningful way,
whereas the data is entirely under author control. I don't think it is
safe to assume that the security properties that we have for media
streams necessarily work for data streams.

On Tue, 29 Mar 2011, Harald Alvestrand wrote:
> > > >
> > > > Recording any of these requires much more specification than just
> > > > "record here".
> > >
> > > Could you elaborate on what else needs specifying?
>
> One thing I remember from an API design talk I viewed: "An ability to
> record to a file means that the file format is part of your API."

Indeed.

> For instance, for audio recording, it's likely that you want control
> over whether the resulting file is in Ogg Vorbis format or in MP3
> format; for video, it's likely that you may want to specify that it will
> be stored using the VP8 video codec, the Vorbis audio codec and the
> Matroska container format. These desires have to be communicated to the
> underlying audio/video engine, so that the proper transforms can be
> inserted into the processing stream,

Yes, we will absolutely need to add these features in due course. Exactly
what we should add is something we have to determine from implementation
experience.

> and I think they have to be communicated across this interface; since
> the output of these operations is a blob without any inherent type
> information, the caller has to already know which format the media is
> in.

Depending on the use case and on implementation trajectories, this isn't
a given. For example, if all UAs end up implementing one of two
codec/container combinations and we don't expose any controls, it may be
that the first few bytes of the output file are in practice enough to
fully identify the format used.

Note also that Blob does have a MIME type, so even without looking at the
data itself, or at the UA string, it may be possible to get a general
idea of the container and maybe even codecs.
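As a sketch of what such after-the-fact identification could look like
from script, assuming the recording API hands back a Blob, and using a
couple of well-known container signatures purely as examples:

  // Illustrative sketch: assumes the recorder produces a Blob, and that
  // the container can be guessed from its declared type or first bytes.
  function identifyRecording(blob, callback) {
    if (blob.type) {
      // e.g. "video/webm" or "audio/ogg", if the UA filled it in.
      callback(blob.type);
      return;
    }

    var reader = new FileReader();
    reader.onload = function () {
      var bytes = new Uint8Array(reader.result);
      if (bytes[0] === 0x1A && bytes[1] === 0x45 &&
          bytes[2] === 0xDF && bytes[3] === 0xA3) {
        callback("Matroska/WebM container");   // EBML magic number
      } else if (bytes[0] === 0x4F && bytes[1] === 0x67 &&
                 bytes[2] === 0x67 && bytes[3] === 0x53) {
        callback("Ogg container");             // "OggS"
      } else {
        callback("unknown container");
      }
    };
    reader.readAsArrayBuffer(blob.slice(0, 4));
  }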
On Wed, 30 Mar 2011, Stefan Håkansson LK wrote:
>
> This is absolutely correct, and it is not only about codecs or container
> formats. Maybe you need to supply info like audio sampling rate, video
> frame rate, video resolution, ... There was an input on this already
> last November:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-November/029069.html

Indeed. The situation hasn't changed since then:

   http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030484.html

On Tue, 29 Mar 2011, Stefan Håkansson LK wrote:
> > > > > The web application must be able to define the media format to
> > > > > be used for the streams sent to a peer.
> > > >
> > > > Shouldn't this be automatic and renegotiated dynamically via SDP
> > > > offer/answer?
> > >
> > > Yes, this should be (re)negotiated via SDP, but what is unclear is
> > > how the SDP is populated based on the application's preferences.
> >
> > Why would the Web application have any say on this? Surely the user
> > agent is in a better position to know what to negotiate, since it will
> > be doing the encoding and decoding itself.
>
> The best format of the coded media being streamed from UA a to UA b
> depends on a lot of factors. An obvious one is that the codec used is
> supported by both UAs.... As you say much of it can be handled without
> any involvement from the application.
>
> But let's say that the app in UA a does "addStream". The application in
> UA b (the same application as in UA a) has two <video> elements, one
> using a large display size, one using a small size. The UAs don't know
> in which element the stream will be rendered at this stage (that will be
> known first when the app in UA b connects the stream to one of the
> elements at "onaddstream"), so I don't understand how the UAs can select
> a suitable video resolution without the application giving some input.
> (Once the stream is being rendered in an element the situation is
> different - then UA b has knowledge about the rendering and could
> somehow inform UA a.)

I had assumed that the video would at first be sent with some more or
less arbitrary dimensions (maybe the native ones), and that the receiving
UA would then renegotiate the dimensions once the stream was being
displayed somewhere. Since the page can let the user change the <video>
size dynamically, it seems the UA would likely need to be able to do that
kind of dynamic update anyway.

On Thu, 31 Mar 2011, Lachlan Hunt wrote:
>
> When getUserMedia() is invoked with unknown options, the spec currently
> implicitly requires a PERMISSION_DENIED error to be thrown.
>
> e.g. navigator.getUserMedia("foo");
>
> In this case, the option for "foo" is unknown. Presumably, this would
> fall under platform limitations, and would thus jump from step 11 to the
> failure case, and throw a permission denied error.
>
> We are wondering if this is the most ideal error to throw in this case,
> as opposed to introducing a more logical NOT_SUPPORTED error, and if it
> might be useful to authors to distinguish these cases?
>
> We assume, however, that if the author requests "audio,foo", and the
> user grants access to audio, then the success callback would be invoked,
> despite the unknown option for "foo".

Good point. I've updated the spec to fire NOT_SUPPORTED_ERR if there's no
known value.
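For illustration, here's roughly how an author could tell the two failure
modes apart with the callback-style getUserMedia() being discussed. The
shape of the error object (a code property plus the two constants) is an
assumption for the example, and attachStreamToPage() and showMessage()
are just placeholder page functions:

  // Illustrative only: the error object's shape is assumed, not quoted
  // from the spec.
  navigator.getUserMedia("audio,foo", function (stream) {
    // Success: "foo" was unknown, but audio was granted anyway.
    attachStreamToPage(stream); // placeholder page function
  }, function (error) {
    if (error.code === error.NOT_SUPPORTED_ERR) {
      // None of the requested options were recognised by this UA.
      showMessage("This browser cannot capture the requested media.");
    } else if (error.code === error.PERMISSION_DENIED) {
      // The options were understood, but the user declined access.
      showMessage("Capture was not permitted.");
    }
  });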
On Fri, 8 Apr 2011, Harald Alvestrand wrote:
>
> The current (April 8) version of section 9.4 says that the config string
> for a PeerConnection object is this:
> ---------------------------
> The allowed formats for this string are:
>
>   "TYPE 203.0.113.2:3478"
>      Indicates a specific IP address and port for the server.
>
>   "TYPE relay.example.net:3478"
>      Indicates a specific host and port for the server; the user agent
>      will look up the IP address in DNS.
>
>   "TYPE example.net"
>      Indicates a specific domain for the server; the user agent will
>      look up the IP address and port in DNS.
>
> The "TYPE" is one of:
>
>   STUN
>      Indicates a STUN server
>   STUNS
>      Indicates a STUN server that is to be contacted using a TLS session.
>   TURN
>      Indicates a TURN server
>   TURNS
>      Indicates a TURN server that is to be contacted using a TLS session.
> -------------------------------
>
> I believe this is insufficient, for a number of reasons:
>
> - For future extensibility, new forms of init data need to be passed
>   without invalidating old implementations. This indicates that a
>   serialized JSON object with a few keys of defined meaning is a better
>   basic structure than a format string with no format identifier.

The format is already defined in a forward-compatible manner.
Specifically, UAs are currently required to ignore everything past the
first line feed character. In a future version, we could extend this API
by simply including additional data after the linefeed.
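As a sketch of what that rule means for anything consuming the
configuration string - this is not the spec's parsing algorithm, just an
illustration of the ignore-after-the-first-line-feed behaviour:

  // Illustrative helper, not the spec's algorithm: honour the rule that
  // everything after the first line feed is ignored, then split the
  // first line into its TYPE token and server address.
  function parseConfiguration(configuration) {
    var firstLine = configuration.split("\n")[0]; // drop any extension data
    var space = firstLine.indexOf(" ");
    return {
      type: firstLine.substring(0, space),    // "STUN", "STUNS", "TURN", "TURNS"
      server: firstLine.substring(space + 1)  // "203.0.113.2:3478", "example.net", ...
    };
  }

  // A future revision could append extension data after a line feed
  // without breaking older consumers:
  var config = "TURNS relay.example.net:3478\nfuture-extension-data";
  var parsed = parseConfiguration(config);
  // parsed is { type: "TURNS", server: "relay.example.net:3478" }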
> - For use with STUN and TURN, we need to support the case where we need
>   a STUN server and a TURN server, and they're different.

TURN servers are STUN servers, at least according to the relevant RFCs,
as far as I can tell. Can you elaborate on which TURN servers do not
implement STUN, or explain the use cases for having different TURN and
STUN servers? This is an area where I am most definitely not an expert,
so any information here would be quite helpful.

> - The method of DNS lookup is not specified. In particular, it is not
>   specified whether SRV records are looked up or not.

This seems to be entirely specified. Please ensure that you are reading
the normative conformance criteria for user agents, and not the
non-normative authoring advice, which is only a brief overview.

> - We have no evaluation that shows that we'll never need the unencrypted
>   TCP version of STUN or TURN, or that we need to support the encrypted
>   STUN version. We should either support all formats that the spec can
>   generate, or we should get a reasonable survey of implementors on what
>   they think is needed.

If anyone has any data on this, that would indeed be quite useful.

On Fri, 8 Apr 2011, Harald Alvestrand wrote:
>
> BTW, I haven't been on this list that long... if anyone has advice on
> whether such discussions are better as buganizer threads or as whatwg
> mailing list threads, please give it!

Discussion is best on the mailing list. In general, Bugzilla is best for
straightforward bugs rather than design discussions.

On Fri, 8 Apr 2011, Glenn Maynard wrote:
>
> FWIW, I thought the block-of-text configuration string was peculiar and
> unlike anything else in the platform. I agree that using a
> configuration object (of some kind) makes more sense.

An object wouldn't work very well, as it would add additional steps in
the case where someone just wants to transmit the configuration
information to the client as data. Using JSON strings as input, as Harald
suggested, could work, but seems overly verbose for such simple data.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'