[![W3C][1]][2]

# Media Capture Task Force Teleconference

## 09 Oct 2012

[Agenda][3]

See also: [IRC log][4]

## Attendees

Present: Adam_Bergkvist, Anant_Narayanan, Dominique_Hazael-Massieux, Eric_Rescorla, Giri_Mandyam, Harald_Alvestrand, Jim_Barnett, Josh_Soref, Randell_Jesup, Stefan_Hakansson, Travis_Leithead

Regrets:

Chair: hta, stefanh

Scribe: Josh_Soref

## Contents

* [Topics][5]
  1. [Minutes Approval][6]
  2. [capture settings of a MediaStreamTrack][7]
  3. [Constraints and Memory][8]
  4. [Recording API proposal][9]
  5. [Direct assignment][10]
  6. [AOB][11]
* [Summary of Action Items][12]

* * *

Date: 09 October 2012

scribe: Josh_Soref

### Minutes Approval

MoM last meeting: [http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0149.html][13]

Resolution: Minutes from last meeting are approved

### capture settings of a MediaStreamTrack

[http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/proposals/SettingsAPI_proposal_v4.html][14]

Travis: talking about the proposal made last week
... this is an update of multiple previous proposals
... particularly for device settings, such as microphones/web cameras
... the first section describes a proposal to remove the existing notion of a LocalMediaStream
... along with the rationale
... the second section describes how we propose creating multiple kinds of track objects
... today we have a vanilla-generic MediaStream Track object
... this proposal factors it out into Video and Audio Track objects
... and further factors them to Video and Audio devices
... the third section describes the mechanism for making changes to settings
... and reading settings back
... a setting can either take an Enumerated set of values, or a Range of values
... it also provides a list of proposed settings
... for Cameras
... as well as for Microphones
... and it describes the event(s) that fire as a result of a settings change
... the fourth section covers a Device List
... a way for a web developer to discretely discover devices ...
starting from getUserMedia
... the Device List is a list of obtainable objects
... but a web page wouldn't automatically get it
... the fifth and last section is a proposed set of Constraints relating to section 3
... for use with getUserMedia
... there are also examples of how this would work to accomplish scenarios
... let me recap the feedback i've received so far
... very little feedback about section 1
... section 2 has received little feedback
... it harmonizes with a counter proposal that richt_ made last month
... it's essentially what he proposed
... it introduces the concept of a Picture Device Track
... i expected to hear feedback on this
... i'm curious to know the group's thoughts on that
... section 3 has received feedback on the mechanism for changing settings
... what happens when devices decide to alter settings as a result of the environment
... and how we respond to that
... and how we use the events (constraintSuccess, constraintError)
... most of the feedback is about section 4, the device list
... most of the feedback is about privacy
... if i approve one camera, that doesn't imply i'm approving all cameras.
... that's good feedback, i'm working on how we could preserve this structure

that's how I read it

ekr: my understanding is that you could only enumerate one type
... once you've been given permission for that type?
... under no circumstances is approval of the front camera permission for access to the obverse camera

Travis: i was very lenient at first about privacy issues

This goes back to the entire 'fingerprinting' issue

Travis: initially you can access a list of other devices of the same class

ekr: it's imperative that there's no access to devices beyond what the user provides
... there's a distinct question relating to fingerprinting

Travis: i think i understand your feedback

ekr: you should be able to interrogate the list of devices at any time ...
but any request to activate must be associated with a user action

Travis: i think i agree with that
... we have another proposal variant
... which allows for inspection, but not enabling without consent

ekr: i understand people objecting to enumeration
... but the people i speak to in security view access as a security block

adambe: this relates to capabilities
... a range from all information about a camera
... down to is video/is audio
... down to nothing
... allowing an application to inspect the whole list is ... XXa1

hta: there's a shift in the thinking about this
... i think people objected to getCapabilities
... if people have stopped objecting to that, it's certainly the simplest way forward

adambe: i think we had consensus around hasAudio/hasVideo

hta: i think we had consensus on deviceCount
... but not a clear consensus on what makes an application trusted

adambe: i think that's correct
... not hearing someone objecting to unrestricted enumeration
... doesn't indicate there isn't objection

anant: W3C's security WG released a statement that "fingerprinting is no longer an issue"

(where was that statement made?)

anant: i think we're ok with enumeration
... enumeration is ok, but actual access is ok

hta: anant, does enumeration include device capabilities?

anant: are you talking about returning constraints?

hta: yes

anant: i think that's fine
... whatever we return in the list, i think is fine to return

hta: working hypothesis: any application can use getCapabilities at any time

Travis: i'd like to voice my word of caution
... i backed the word of caution about not exposing arbitrary attributes
... based on the principle of fingerprinting
... while this may seem contradictory
... if a user has approved "a camera"
... i've crossed the first bridge
... and then if we take this a step further, allow the application to request permission for additional resources ...
i'm not sure i'm comfortable with getCapabilities in a general sense

+1 on not comfortable on general getCapabilities

gmandyam: you mentioned later in the document
... where would you have Photo capabilities?

Travis: the Video Device (like a web camera) may provide a Picture Device
... and you can use that device to apply settings to a high resolution picture
... those settings don't apply to the picture stream
... they only apply to the takePicture API

This seems a reasonable way to handle video with pictures

gmandyam: i didn't understand how preview would work
... wrt takePicture

video stream is preview

gmandyam: you should have a video stream continuously during the takePicture

Travis: my thought is that the VideoDevice lets you configure your Video stream
... you can go into the PictureDevice
... which may support a 12mp resolution (i.e. much better than video)
... you could request that resolution on the PictureDevice
... that wouldn't affect your Video element
... but takePicture would apply those settings
... take the large (12mp) image and then return back to the Video stream resolution
... i spoke w/ the MS video team this morning
... relating to hta's comment about cameras that dynamically resize their output for different reasons
... some cameras put settings for the camera to the maximum
... and the camera drivers resample it down for video
... so the sensor is working at high res
... that may dramatically reduce framerate

This matches the general thrust of what Mozilla was thinking of in picture capture IMHO

anant: i like takePicture
... we have an api we've implemented
... do you feel constraints for a PictureDevice are significantly different from a Video stream?
... to me, the answer seems to be yes ...
filter/autofocus

Pictures tend to have an almost-infinite set of parameters :-)

anant: for Firefox OS, we have an autofocus

[https://wiki.mozilla.org/WebAPI/CameraControl][15]

anant: You mentioned permissions

[http://lists.w3.org/Archives/Public/public-webappsec/2012Sep/0048.html][16]

anant: in that message, he says that users concerned about tracking will need a special UA
... the UX for that doesn't seem great
... the first is "allow enumerate" and then "pick a camera"
... if we can get a nicer experience with only one popup, and somehow do enumeration after authorization
... i'm ok with that
... how do you intend to expose Device List?

Travis: you get it from an existing Device object

anant: that seems convoluted
... i'd prefer a simpler approach
... sophisticated apps will want to enumerate first
... and then pick a device

[ time check: 5 minutes remaining for this topic ]

Josh_Soref, you wanted to note that some video cameras support auto focus

adambe: on anant's comments
... trying to enumerate first triggers two popups
... for every device, there's at least one popup

ekr: in Aurora, the popup has a chooser
... to let you pick the device you want
... if you look at Google Hangouts
... it has an in-content interface to select which devices you want
... how much of that interface would continue to be possible under WebRTC
... what should a site be able to do?
... as that would inform what to offer the user
... i don't want two choosers
... and have that be XXek for the user

adambe: last week we were cautious about fingerprinting
... and today we aren't
... it feels strange

dom, you wanted to ask about statement on fingerprinting and to temper the extent of "rough consensus on giving up on fingerprinting" and to test his mike

dom: anant, thanks for that link
... i wouldn't say that their link is a statement of the world in W3C
... it's limited to web apps sec
... i was on a privacy call two weeks ago
... and i don't think that's their view ...
i don't think that view is broadly accepted
... i'm happy to take an action to research that

ekr: i cochair webappsec with bradh
... it wasn't a statement on behalf of the Web App Sec WG
... we should probably have a meeting at TPAC to talk about this

dom: i agree that it makes sense to talk about this at TPAC

**ACTION:** dom to clarify W3C position on fingerprinting [recorded in [http://www.w3.org/2012/10/09-mediacap-minutes.html#action01][17]]

Created ACTION-10 - Clarify W3C position on fingerprinting [on Dominique Hazaël-Massieux - due 2012-10-16].

dom: coming back to this WG
... i'd be very cautious about making design decisions assuming this is no longer a concern

### Constraints and Memory

hta: once you get a device
... after having specified constraints
... take as a given that some devices will change their configuration
... should an application expect a device to stay within constraints?
... or should they expect it to wander outside?
... if we ask to change its configuration
... can we expect that all previously applied constraints are still applicable (unless overridden)?

XXcc: XXcd?

Travis: i want to question whether devices will change their configuration
... that may be true for a peer connection
... but for a device (camera/microphone), it's never the device
... but perhaps the OS that responds to input

hta: i was using "Device" as shorthand for "device, drivers, and everything else beyond the browser"

Disagree, dsp-enabled-cameras will adapt frame rates with no OS input I believe

hta: mac cameras are famous for adjusting framerate under low-light conditions

Travis: that's the mac os doing it
... not the camera under its own volition

hta: it's hard to see where that line is
... if we accept "device" as "everything below the api surface"

Travis: the platform evolves
... we have apis exposing environmental sensors
... you may want to implement these in the application itself
... we should provide the way to do those things if you want to ...
make the assumption that the device is a consistent mechanism
... apply state, read state
... be able to depend on that

hta: i'm skeptical

dom, you wanted to ask about modularity and schedule (for a change :)

dom: this api brings a number of fairly deep changes
... i'm wondering what the plan is around the schedule for this set of features
... is this part of the main spec
... is it a distinct module?
... are we slipping our schedule?

stefanh: feedback we've gotten is that the MediaStream api wasn't supported
... people wanted additional features
... i guess we're slipping

dom: does that mean implementers aren't shipping getUserMedia?
... i know MS doesn't announce shipping plans
... maybe Mozilla can comment?

anant: we want to support getUserMedia and MediaStream
... we don't support everything
... our intention is to support everything from getUserMedia/MediaStream as in the draft

dom: that conditions the work of the simple getUserMedia api, so sticking to our schedule seems reasonable

jesup: about hardware/dumb-hardware/smart-hardware
... my experience from embedded devices
... webcams do adaptations automatically unless you stop them
... maybe the OS can do this
... whether the OS/camera does it
... the framerate varies according to light level
... we shouldn't assume the hardware is dumb
... assume the hardware may be more active than that
... be prepared for that
... it's going to be
... and in many cases it already is

stefanh: ... ... can you elaborate on the relation between getUserMedia constraints and constraints in the request operation

Travis: the proposal defines constraints for Video/Audio in section 5
... e.g. a width/height constraint
... either a number or min-max range
... the request api
... when you invoke for a settings change
... they build up iteratively
... so if you change 1024x768 to 800x600
... you request 800x600
... each time you make a request, you build onto the structure being generated for you ...
when your context ends, the constraints being built are applied
... a question applies to specific values or ranges

stefanh: if you start with 25-30Hz
... and then 15Hz?
... if it's outside your original constraints, is that ok?

Travis: if you specify within the device range
... but outside the getUserMedia request
... you still try to honor that

hta: any other comments?

### Recording API proposal

(I guess we haven't quite determined how we integrate this in the spec, but we can figure that after the call)

Jim_Barnett: 4 high-level questions

[http://lists.w3.org/Archives/Public/public-media-capture/2012Oct/0010.html][18]

Jim_Barnett: do we want recording to be a separate interface or a partial?

separate interface++

Jim_Barnett: a lot of people like a separate one
... Travis identified not likely allowing overlapping recordings
... what's the relationship between recording and media capture?
... if there are separate apis, we might be able to make things simpler with a lower level api?
... XXf?
... do we think there are any MTI formats?

I suggest to the list generally

Travis: i think i should bring up the background of the Track object instead of a MediaStream
... we started off trying to record a MediaStream
... which is what any sane person would have thought would work
... after trying to get some data out of a stream
... you have to face that a MediaStream is mutable
... tracks can come/go at any time
... as a recorder, trying to latch onto a media stream
... you have to specify the behavior of your recorder under all of those changing conditions
... that's how we ended up specifying a Track level based Recorder

jesup: i understand the concern about MediaStream v. Tracks
... but trying to integrate Tracks and synchronize them seems to be hard
... for the basic non-mutating case it seems nice to solve this

Jim_Barnett: if we keep the track level api, you can do the more sophisticated thing with that
... hta's suggestion ...
if your format can handle it, great; if not, it gets an error
... but if we don't have mandatory formats
... then recorders will behave very differently on different platforms

hta: recordings will fail
... for many reasons
... saving because the browser ran out of disk for temporary storage or fail

hta: if a recording fails because you ask for something the stream doesn't support

Travis: that's a fair assumption
... when i discussed Recording with the MS Media Folks
... they assumed all different Tracks in the Media Stream
... would be layered into a container format that could be supported
... they asked for a track limit
... i said we're going to have only one track
... but i learned there are container formats that support multiple tracks
... we can support, say, 2 tracks and set that as a cap for a recording

DVDs (mpeg2-ts I assume) can have N video tracks, and N audio tracks i believe

stefanh: for a media element, it's specified so tracks can come and go

Jim_Barnett: don't they have a content primary track?

stefanh: they used to, but last i checked, they didn't really
... for recorder, you should record all media tracks

Jim_Barnett: so for recorder, it should try to record everything
... and then have it throw if it fails?
... for more complicated recording, you'd have to pull the data out into your own object and record that
... if we make them the same interface, it simplifies things

gmandyam: it looks like w8 was the inspiration for this
... the Android api allows for setting an audio interface and a video interface
... why didn't you just do that?

Jim_Barnett: the track by track basis ...
for media applications, you need access to just one track

Android media recorder: [http://developer.android.com/reference/android/media/MediaRecorder.html][19]

Jim_Barnett: for video tracking, you need just the video

If you need to work on a single track, create a derivative MediaStream with one track

Jim_Barnett: for speech recognition, you need just the audio track

MediaStream Processing API :-)

gmandyam: i don't know XXq

Jim_Barnett: there's no way to access video/media in their own format
... we need an api to ask for media in a known format

Travis: why do we latch onto MediaStream/Track v. a standalone recorder?

gmandyam: yep

adambe: say you have a media stream
... it has a video track playing
... and another video track starts playing
... do you expect to have 2 media tracks?
... suddenly the content is switched

Travis: we don't know
... and they're complicated problems to figure out

Jim_Barnett: there's very little structure to Media Streams/Tracks
... you could try to assume there's a primary track
... but that may work for some cases and not others
... that's a reason to have a low-level api

adambe: say there's a conference
... and one pair records the conference
... a viewer might want to be able to switch between the different participants
... recording a stream is exactly as it'd look in a video element
... the resulting thing

Jim_Barnett: if a viewer could switch during playback
... you'd include all in the file

agree to Jim

Jim_Barnett: and the viewer would choose

adambe: while that's neat

stefanh is correct

adambe: i think that it's more reasonable to just record the visible track

Jim_Barnett: you could have a MediaStream where it has 4 Tracks
... each of which is being displayed

Travis: we could think of the recorder as a Destination for a MediaStream
... instead of part of the Pipeline
... the Recorder could build a notion of a primary track
... putting the control into the application ...
getting away from the view of the application

hta: you might want to look at the Web Audio API proposal
... it's implemented in Chrome on Mac
... there you can get Audio from a MediaStream track
... i think that's implemented as a destination

adambe: we have the notion of enabled/disabled tracks in a stream
... but i think we're moving away from that

hta: the proposal has gotten a good deal of feedback. we'll take it to the list

### Direct assignment

hta: createURL()
... instead of doing that on the video source
... we have an attribute on the