Media Capture Task Force Teleconference -- 09 Oct 2012

<trackbot> Date: 09 October 2012

<scribe> scribe: Josh_Soref

Minutes Approval

<stefanh> MoM last meeting: http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0149.html

Resolution: Minutes from last meeting are approved

capture settings of a MediaStreamTrack

<Travis> http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/proposals/SettingsAPI_proposal_v4.html

Travis: talking about the proposal made last week
... this is an update of multiple previous proposals
... particularly for device settings, such as microphones/web cameras
... the first section describes a proposal to remove the existing notion of a LocalMediaStream
... along with the rationale
... the second section describes how we propose creating multiple kinds of track objects
... today we have a vanilla-generic MediaStream Track object
... this proposal factors it out into Video and Audio Track objects
... and further factors them to Video and Audio devices
... the third section describes the mechanism for making changes to settings
... and reading settings back
... a setting can either take an Enumerated set of values, or a Range of values
... it also provides a list of proposed settings
... for Cameras
... as well as for Microphones
... and it describes the event(s) that fire as a result of a settings change
... the fourth section covers a Device List
... a way for a web developer to discretely discover devices
... starting from getUserMedia
... the Device List is a list of obtainable objects
... but a web page wouldn't automatically get it
... the fifth, and last section, is a proposed set of Constraints relating to section 3
... for use with getUserMedia
... there are also examples of how this would work to accomplish scenarios
... let me recap the feedback i've received so far
... very little feedback about section 1
... section 2 has received little feedback
... it harmonizes with a counter proposal that richt_ made last month
... it's essentially what he proposed
... it introduces the concept of a Picture Device Track
... i expected to hear feedback on this
... i'm curious to know the group's thoughts on that
... section 3... has received feedback on the mechanism for changing settings
... what happens when devices decide to alter settings as a result of the environment
... and how we respond to that
... and how we use the events (constraintSuccess, constraintError)
... most of the feedback is about section 4, the device list
... most of the feedback is about privacy
... if i approve one camera, that doesn't imply i'm approving all cameras.
... that's good feedback, i'm working on how we could preserve this structure
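
A minimal sketch of the two setting shapes described for section 3, where a setting takes either an enumerated set of values or a range of values. The type names, the `accepts` helper, and the example values are assumptions made for illustration and are not taken from the proposal.

```typescript
// Hypothetical shapes for illustration; none of these names come from the proposal.
type EnumeratedSetting = { kind: "enumerated"; values: readonly string[] };
type RangeSetting = { kind: "range"; min: number; max: number };
type Setting = EnumeratedSetting | RangeSetting;

// True when the requested value is one the setting could take.
function accepts(setting: Setting, requested: string | number): boolean {
  if (setting.kind === "range") {
    return typeof requested === "number" &&
      requested >= setting.min && requested <= setting.max;
  }
  return typeof requested === "string" && setting.values.indexOf(requested) !== -1;
}

// Example settings (assumed values): a zoom range and a facing-mode enumeration.
const zoom: Setting = { kind: "range", min: 1, max: 4 };
const facing: Setting = { kind: "enumerated", values: ["front", "back"] };
```

A discriminated union keeps the two shapes apart, so a caller has to check which kind of setting it has before comparing values.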

<jesup> that's how I read it

ekr: my understanding is that you could only enumerate one type
... once you've been given permission for that type?
... under no circumstances is approval of the front camera permission for access to the obverse camera

Travis: i was very lenient at first about privacy issues

<jesup> This goes back to the entire 'fingerprinting' issue

Travis: initially you can access a list of other devices of the same class

ekr: it's imperative that there's no access to devices beyond what the user provides
... there's a distinct question relating to fingerprinting

Travis: i think i understand your feedback

ekr: you should be able to interrogate the list of devices at any time
... but any request to activate must be associated with a user action

Travis: i think i agree with that
... we have another proposal variant
... which allows for inspection, but not enabling without consent
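
One way to picture the variant discussed here, where a page may inspect the device list at any time but may only activate a device the user has approved. This is a sketch under assumptions: `DeviceRegistry`, `grant`, and `activate` are hypothetical names, and `grant` merely stands in for whatever consent UI a browser would show.

```typescript
// Hypothetical model; class and method names are illustrative only.
interface CaptureDeviceInfo { id: string; kind: "audio" | "video" }

class DeviceRegistry {
  private granted: { [id: string]: boolean } = {};
  constructor(private devices: CaptureDeviceInfo[]) {}

  // Inspection is allowed at any time and grants nothing.
  list(): CaptureDeviceInfo[] {
    return this.devices.slice();
  }

  // Stands in for the user approving one specific device.
  grant(id: string): void {
    this.granted[id] = true;
  }

  // Activation fails unless the user approved this exact device.
  activate(id: string): CaptureDeviceInfo {
    const matches = this.devices.filter(d => d.id === id);
    if (matches.length === 0) throw new Error("unknown device: " + id);
    if (this.granted[id] !== true) throw new Error("no user consent for: " + id);
    return matches[0];
  }
}
```

Consent is tracked per device id, so approving one camera does not imply approval of any other camera.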

ekr: i understand people objecting to enumeration
... but the people i speak to in security view access as a security block

adambe: this relates to capabilities
... a range from all information about a camera
... down to is video/is audio
... down to nothing
... allowing an application to inspect the whole list is
... XXa1

hta: there's a shift in the thinking about this
... i think people objected to getCapabilities
... if people have stopped objecting to that, it's certainly the simplest way forward

adambe: i think we had consensus around hasAudio/hasVideo

hta: i think we had consensus on deviceCount
... but not a clear consensus on what makes an application trusted

adambe: i think that's correct
... not hearing someone objecting to unrestricted enumeration
... doesn't indicate there isn't objection

anant: w3's security WG released a statement that "fingerprinting is no longer an issue"

<dom> (where was that statement made?)

anant: i think we're ok with enumeration
... enumeration is ok, but actual access is not

hta: anant, does enumeration include device capabilities?

anant: are you talking about returning constraints?

hta: yes

anant: i think that's fine
... whatever we return in the list, i think is fine to return

hta: working hypothesis: any application can use getCapabilities at any time

Travis: i'd like to voice my word of caution
... i backed the word of caution about not exposing arbitrary attributes
... based on the principle of fingerprinting
... while this may seem contradictory
... if a user has approved "a camera"
... i've crossed the first bridge
... and then if we take this a step further, allow the application to request permission for additional resources
... i'm not sure i'm comfortable with getCapabilities in a general sense

<dom> +1 on not comfortable on general getCapabilities

gmandyam: you mentioned later in the document
... where would you have Photo capabilities?

Travis: the Video Device (like a web camera) may provide a Picture Device
... and you can use that device to apply settings to a high resolution picture
... those settings don't apply to the picture stream
... they only apply to the takePicture API

<jesup> This seems a reasonable way to handle video with pictures

gmandyam: i didn't understand how preview would work
... wrt takePicture

<jesup> video stream is preview

gmandyam: you should have a video stream continuously during the takePicture

Travis: my thought is that the VideoDevice lets you configure your Video stream
... you can go into the PictureDevice
... which may support a 12mp resolution (i.e. much better than video)
... you could request that resolution on the PictureDevice
... that wouldn't affect your Video element
... but takePicture would apply those settings
... take the large (12mp image) and then return back to the Video stream resolution
... i spoke w/ the MS video team this morning
... relating to hta's comment about cameras that dynamically resize their output for different reasons
... some cameras put settings for the camera to the maximum
... and the camera drivers resample it down for video
... so the sensor is working at high res
... that may dramatically reduce framerate
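
The takePicture flow described in this turn can be sketched roughly as follows: picture settings are kept apart from video settings, are used only for the still, and the stream then returns to its own resolution. `CameraModel`, `FrameSize`, and the `sensor` field are hypothetical names, and the model ignores real driver behaviour such as resampling.

```typescript
// Hypothetical model; names and numbers are illustrative, not from the proposal.
type FrameSize = { width: number; height: number };

class CameraModel {
  sensor: FrameSize; // what the sensor is currently producing

  constructor(public video: FrameSize, public picture: FrameSize) {
    this.sensor = video;
  }

  // Switch to the picture settings for one still, then return to video.
  takePicture(): FrameSize {
    this.sensor = this.picture;
    const still = { width: this.sensor.width, height: this.sensor.height };
    this.sensor = this.video;
    return still;
  }
}
```

For instance, `new CameraModel({ width: 640, height: 480 }, { width: 4000, height: 3000 })` yields a 4000x3000 still while the video settings stay at 640x480.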

<jesup> This matches the general thrust of what Mozilla was thinking of in picture capture IMHO

anant: i like takePicture
... we have an api we've implemented
... do you feel constraints for a PictureDevice are significantly different from a Video stream?
... to me, the answer seems to be yes
... filter/autofocus

<jesup> Pictures tend to have an almost-infinite set of parameters :-)

anant: for Firefox OS, we have an autofocus

<anant> https://wiki.mozilla.org/WebAPI/CameraControl

anant: You mentioned permissions

<anant> http://lists.w3.org/Archives/Public/public-webappsec/2012Sep/0048.html

anant: in that message, he says that users concerned about tracking will need a special UA
... the UX for that doesn't seem great
... the first is "allow enumerate" and then "pick a camera"
... if we can get a nicer experience with only one popup, and somehow do enumeration after authorization
... i'm ok with that
... how do you intend to expose Device List?

Travis: you get it from an existing Device object

anant: that seems convoluted
... i'd prefer a simpler approach
... sophisticated apps will want to enumerate first
... and then pick a device

[ time check: 5 minutes remaining for this topic ]

<Zakim> Josh_Soref, you wanted to note that some video cameras support auto focus

adambe: on anant's comments
... trying to enumerate first triggers two popups
... for every device, there's at least one popup

ekr: in Aurora, the popup has a chooser
... to let you pick the device you want
... if you look at Google Hangouts
... it has an in content interface to select which devices you want
... how much of that interface would continue to be possible under WebRTC
... what should a site be able to do?
... as that would inform what to offer the user
... i don't want two choosers
... and have that be XXek for the user

adambe: last week we were cautious about fingerprinting
... and today we aren't
... it feels strange

<Zakim> dom, you wanted to ask about statement on fingerprinting and to temperate the extent of "rough consensus on giving up on fingerprinting" and to test his mike

dom: anant, thanks for that link
... i wouldn't say that their link is a statement of the world in W3C
... it's limited to web apps sec
... i was on a privacy call two weeks ago
... and i don't think that's their view
... i don't think that view is broadly accepted
... i'm happy to take an action to research that

ekr: i cochair webappsec with bradh
... it wasn't a statement on behalf of the Web App Sec WG
... we should probably have a meeting at TPAC to talk about this

dom: i agree that it makes sense to talk about this at TPAC

<hta> ACTION: dom to clarify W3C position on fingerprinting [recorded in http://www.w3.org/2012/10/09-mediacap-minutes.html#action01]

<trackbot> Created ACTION-10 - Clarify W3C position on fingerprinting [on Dominique Hazaël-Massieux - due 2012-10-16].

dom: coming back to this WG
... i'd be very cautious about making design decisions assuming this is no longer a concern

Constraints and Memory

hta: once you get a device
... after having specified constraints
... take as a given that some devices will change their configuration
... should an application expect a device to stay within constraints?
... or should they expect it wanders outside?
... if we ask to change its configuration
... can we expect that all previously applied constraints are still applicable (unless overridden)

XXcc: XXcd?

Travis: i want to question that devices will change their configuration
... that may be true for a peer connection
... but for a device (camera/microphone), it's never the device
... but perhaps the OS that responds to input

hta: i was using "Device" as shorthand for "device, drivers, and everything else beyond the browser"

<jesup> Disagree, dsp-enabled-cameras will adapt frame rates with no OS input I believe

hta: mac cameras are famous for adjusting framerate under low-light conditions

Travis: that's the mac os doing it
... not the camera under its own volition

hta: it's hard to see where that line is
... if we accept "device" as "everything below the api surface"

Travis: the platform evolves
... we have apis exposing environmental sensors
... you may want to implement these in the application itself
... we should provide the way to do those things if you want to
... make the assumption that the device is a consistent mechanism
... apply state, read state
... be able to depend on that

hta: i'm skeptical

<Zakim> dom, you wanted to ask about modularity and schedule (for a change :)

dom: this api brings a number of fairly deep changes
... i'm wondering what the plan is around the schedule for this set of features
... is this part of the main spec
... is it a distinct module?
... are we slipping our schedule?

stefanh: feedback we've gotten is that the MediaStream api wasn't supported
... people wanted additional features
... i guess we're slipping

dom: does that mean implementers aren't shipping getUserMedia?
... i know MS doesn't announce shipping plans
... maybe mozilla can comment?

anant: we want to support getUserMedia and MediaStream
... we don't support everything
... our intention is to support everything from getUserMedia/MediaStream as in the draft

dom: that conditions the work of the simple getUserMedia api

so sticking to our schedule seems reasonable

jesup: about hardware/dumb-hardware/smart-hardware
... my experience from embedded devices
... webcams do adaptations automatically unless you stop them
... maybe the OS can do this
... whether the OS/camera does it
... the framerate varies according to light level
... we shouldn't assume the hardware is dumb
... assume the hardware may be more active than that
... be prepared for that
... it's going to be
... and in many cases it already is

stefanh: ...
... can you elaborate on the relation between getUserMedia constraints and constraints in the request operation

Travis: the proposal defines constraints for Video/Audio in section 5
... e.g. a width/height constraint
... either a number or min-max range
... the request api
... when you invoke for a settings change
... they build up iteratively
... so if you change 1024x768 to 800x600
... you request 800x600
... each time you make a request, you build onto the structure being generated for you
... when your context ends, the constraints being built are applied
... a question applies to specific values or ranges

stefanh: if you start with 25-30hz
... and then 15hz?
... if it's outside your original constraints, is that ok?

Travis: if you specify within the device range
... but outside the getUserMedia request
... you still try to honor that
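
A rough sketch of the request behaviour described in this exchange: requests accumulate, a later request for the same setting replaces an earlier one, the accumulated set is applied when the context ends, and a value is honored if it falls within the device range even when it lies outside the original getUserMedia constraints. `SettingsQueue`, `flush`, and the mapping of results onto the constraintSuccess and constraintError events are assumptions made for illustration.

```typescript
// Hypothetical model of the request behaviour; names are illustrative only.
type NumericRange = { min: number; max: number };
type FlushResult = { applied: { [name: string]: number }; rejected: string[] };

class SettingsQueue {
  private pending: { [name: string]: number } = {};

  // deviceRanges: what the device itself supports, not the getUserMedia constraints.
  constructor(private deviceRanges: { [name: string]: NumericRange }) {}

  // Each request builds onto the pending set; a later request replaces an earlier one.
  request(name: string, value: number): void {
    this.pending[name] = value;
  }

  // Stands in for "when your context ends": apply what the device can honor.
  flush(): FlushResult {
    const result: FlushResult = { applied: {}, rejected: [] };
    for (const name of Object.keys(this.pending)) {
      const range = this.deviceRanges[name];
      const value = this.pending[name];
      if (range && value >= range.min && value <= range.max) {
        result.applied[name] = value; // assumed to correspond to constraintSuccess
      } else {
        result.rejected.push(name);   // assumed to correspond to constraintError
      }
    }
    this.pending = {};
    return result;
  }
}
```

In this model, a stream opened at 25-30 Hz that later requests 15 Hz has the request applied as long as the device range includes 15.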

hta: any other comments?

Recording API proposal

<dom> (I guess we haven't quite determined how we integrate this in the spec, but we can figure that after the call)

Jim_Barnett: 4 high level questions

<stefanh> http://lists.w3.org/Archives/Public/public-media-capture/2012Oct/0010.html

Jim_Barnett: do we want recording to be a separate interface or a partial?

<jesup> separate interface++

Jim_Barnett: a lot of people like a separate one
... Travis identified not likely allowing overlapping recordings
... what's the relationship between recording and media capture?
... if there are separate apis, we might be able to make things simpler with a lower level api?
... XXf?
... do we think there are any MTI formats?

<jesup> I suggest to the list generally

Travis: i think i should bring up the background of the Track object instead of a MediaStream
... we started off trying to record a MediaStream
... which is what a sane person would have thought would work
... after trying to get some data out of a stream
... you have to face that a MediaStream is mutable
... tracks can come/go at any time
... as a recorder, trying to latch onto a media stream
... you have to specify the behavior of your recorder under all of those changing conditions
... that's how we ended up specifying a Track level based Recorder

jesup: i understand the concern about MediaStream v. Tracks
... but trying to integrate Tracks and synchronize them seems to be hard
... for the basic non-mutating case it seems nice to solve this

Jim_Barnett: if we keep the track level api, you can do the more sophisticated thing with that
... hta's suggestion
... if your format can handle it, great, if not, it gets an error
... but if we don't have mandatory formats
... then recorders will behave very differently on different platforms

hta: recordings will be failed
... for many reasons
... say, because the browser ran out of disk for temporary storage

<jesup> or fail

hta: if a recording fails because you ask for something the stream doesn't support

Travis: that's a fair assumption
... when i discussed Recording with the MS Media Folks
... they assumed all different Tracks in the Media Stream
... would be layered into a container format that could be supported
... they asked for a track limit
... i said we're going to have only one track
... but i learned there are container formats that support multiple tracks
... we can support say 2 tracks and set that as a cap for a recording

<jesup> DVD's (mpeg2-ts I assume) can have N video tracks, and N audio tracks i believe

stefanh: for a media element, it's specified so tracks can come and go

Jim_Barnett: don't they have a content primary track?

stefanh: they used to, but last i checked, they didn't really
... for recorder, you should record all media tracks

Jim_Barnett: so for recorder, it should try to record everything
... and then have it throw if it fails?
... for more complicated recording, you'd have to pull the data out into your own object and record that
... if we make them the same interface, it simplifies things

gmandyam: it looks like w8 was the inspiration for this
... the android api allows for setting an audio interface and a video interface
... why didn't you just do that?

Jim_Barnett: the track by track basis
... for media applications, you need access to just one track

<hta> Android media recorder: http://developer.android.com/reference/android/media/MediaRecorder.html

Jim_Barnett: for video tracking, you need just the video

<jesup> If you need to work on a single track, create a derivative MediaStream with one track

Jim_Barnett: for speech recognition, you need just the audio track

<jesup> MediaStream Processing API :-)

gmandyam: i don't know XXq

Jim_Barnett: there's no way to access video/media in their own format
... we need an api to ask for media in a known format

Travis: why do we latch onto MediaStream/Track v. a standalone recorder?

gmandyam: yep

adambe: say you have a media stream
... it has a video track playing
... and another video track starts playing
... do you expect to have 2 media tracks?
... suddenly the content is switched

Travis: we don't know
... and they're complicated problems to figure out

Jim_Barnett: there's very little structure to Media Streams/Tracks
... you could try to assume there's a primary track
... but that may work for some cases, but not others
... that's a reason to have a low level api

adambe: say there's a conference
... and one peer records the conference
... a viewer might want to be able to switch between the different participants
... recording a stream is exactly as it'd look in a video element
... the resulting thing

Jim_Barnett: if a viewer could switch during playback
... you'd include all in the file

<stefanh> agree to Jim

Jim_Barnett: and the viewer would choose

adambe: while that's neat

<jesup> stefanh is correct

adambe: i think that it's more reasonable to just record the visible track

Jim_Barnett: you could have a MediaStream where it has 4 Tracks
... each of which is being displayed

Travis: we could think of the recorder as a Destination for a MediaStream
... instead of part of the Pipeline
... the Recorder could build a notion of a primary track
... putting the control into the application
... getting away from the view of the application

hta: you might want to look at the Web Audio API proposal
... it's implemented in Chrome on Mac
... there you can get Audio from a MediaStream track
... i think that's implemented as a destination

adambe: we have the notion of enabled/disabled tracks in a stream
... but i think we're moving away from that

hta: the proposal has gotten a deal of feedback. we'll take it to the list

Direct assignment

hta: createURL()
... instead of doing that on the video source
... we have an attribute on the <video>

anant: roc has posted to the list
... we're not excited about using URLs to represent a MediaStream
... for many reasons as noted on the list
... Adam Barth has pointed out the many reasons why assignment to src of anything but a URL is a bad idea
... so there's a compromise
... it's also easy for the developer to grok
... i don't see advantages to representing Media Stream as a URL

<jesup> This is available now in Mozilla Nightly and soon Aurora as video.mozSrcObject = stream

anant: it adds a new element to a spec outside our group's specification domain
... our proposal is to not support URLs at all
... there are lifetime issues
... you have to come up with a new url prefix
... which is icky
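
To illustrate the lifetime concern raised here, the sketch below contrasts a URL-based approach, where something must keep the stream reachable until the URL is revoked, with direct assignment to an attribute on the element. `StreamLike`, `VideoSink`, `UrlRegistry`, and the `stream:` prefix are hypothetical stand-ins so the example runs without a browser. The `srcObject` name follows the action item recorded later in these minutes.

```typescript
// Hypothetical stand-ins so the sketch runs without a browser.
interface StreamLike { id: string }
interface VideoSink { src: string; srcObject: StreamLike | null }

// URL approach: a registry must keep each stream reachable until the URL is revoked.
class UrlRegistry {
  private entries: { [url: string]: StreamLike } = {};
  private counter = 0;

  create(stream: StreamLike): string {
    const url = "stream:" + this.counter++; // made-up prefix
    this.entries[url] = stream;
    return url;
  }
  revoke(url: string): void { delete this.entries[url]; }
  size(): number { return Object.keys(this.entries).length; }
}

// Direct assignment: the element holds the stream itself; nothing to revoke.
function attach(sink: VideoSink, stream: StreamLike): void {
  sink.srcObject = stream;
}
```

With the registry, a page that forgets to call `revoke` leaves the entry in place. With direct assignment, the reference lives only as long as the element holds it.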

adambe: any version of direct assigning
... will require some changes to the <video> element

dom: if it's just adding a new attribute to the <video> element interface

<stefanh> guess it is the media element

dom: we could define it as a supplemental interface
... using the WebIDL partial interface {}
... if we need to specify how video.src is handled
... we'd probably need to coordinate with HTML WG

stefanh: maybe we don't need to change the src attribute
... but we'd need to define what happens if you have video.src and then do video.somethingElse =

anant: i think we can define all those behaviors in the partial interface

dom: i think we should try to do something on our own
... but we should coordinate with them

hta: i'll assign an action to anant

<hta> ACTION: anant to write up proposal for srcObject [recorded in http://www.w3.org/2012/10/09-mediacap-minutes.html#action02]

<trackbot> Created ACTION-11 - Write up proposal for srcObject [on Anant Narayanan - due 2012-10-16].

AOB

anant: is this the last tel-conf before tpac?

hta: yes

anant: see everyone there

<hta> ACTION: anant to Send a proposal about device enumeration. [recorded in http://www.w3.org/2012/10/09-mediacap-minutes.html#action03]

<trackbot> Created ACTION-12 - Send a proposal about device enumeration. [on Anant Narayanan - due 2012-10-16].

hta: thanks Josh_Soref for scribing

Josh_Soref: see everyone at TPAC

trackbot, end meeting

<jesup> Josh_Soref: ++
