- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Wed, 05 Nov 2014 15:43:18 +0100
- To: public-media-capture@w3.org
Hi,
The minutes of our F2F meeting during TPAC last week are available at:
http://www.w3.org/2014/10/30-mediacap-minutes.html
and copied as text below.
Please send corrections to the list.
Dom
Media Capture F2F (TPAC2014)
30-31 Oct 2014
See also: [2]IRC log
[2] http://www.w3.org/2014/10/30-mediacap-irc
Attendees
Present
Harald, Stefan, Alexandre, Dom, DanBurnett, Justin,
Peter, Martin, EKR, AdamRoach, Jan-Ivar, ShijunS,
BernardA, Cullen, DanDrutta, DanRomascanu, PhilippeCohen
(observer), Josh_Soref (observer), NickDoty (observer)
Remote
anssik, robman, ningxin, Domenic
Regrets
Chair
hta, stefanh
Scribe
dom, alexG, npdoty
Contents
* [3]Topics
1. [4]Welcome
2. [5]Authenticated Origins
3. [6]MediaStreamTrack.ended
4. [7]Audio output device enumeration
5. [8]Next steps for Media Capture and Streams document
6. [9]Mediacapture bugs
7. [10]Media Capture Depth Stream Extensions
8. [11]Screensharing
9. [12]getUserMedia testing
10. [13]Generating MediaStream from HTMLMediaElement
* [14]Summary of Action Items
__________________________________________________________
<inserted> ScribeNick: dom
Welcome
hta: this is the media capture task force meeting as part of
the WebRTC WG F2F
... We're hoping to be at the stage where we stop making big
changes to the document
... we need to review a last set of changes before we're sure
we're done
<stefanh> minutes:
[15] http://lists.w3.org/Archives/Public/public-media-capture/2014Oct/0186.html
stefanh: minutes approved
Authenticated Origins
stefanh: this topic has been discussed in the task force before
... it's been quite a heated debate
hta: we have a TAG representative on the phone
domenic: that would be me
... I work for Google, was elected to the TAG ~a year ago
hta: we wanted someone from the TAG to get some keywords on why
imposing that restriction might be a good idea
domenic: the TAG has made a statement that we should move
sensitive APIs to authenticated origins over time
... i.e. a deprecation plan for using getUserMedia on plain
http
ekr: I find that analysis uncompelling
... the attacker can detour you from an insecure origin
domenic: a more interesting technical question is whether this
provides protection to the user
... we've seen that it does
<alexG> domenic: we think authenticated origins provide more
protection against proven attacks.
<scribe> scribenick: alexG
ekr: supposing that https is enough even if you connect to an
attacker site is not going to work.
... the problem is not asking GUM access over https
... the problem is knowing when you can trust, whether it is
http / https
domenic: at least with https we can have a little bit more
control and add red flags in the address bar.
<npdoty> it sounds like ekr is arguing that the user doesn't
have any reason to trust the origin on http-only, but that it
should be allowed in that case anyway
ekr: what is the way forward with this question in the absence
of a specific use case one way or another?
domenic: i just wanted to convey the position of TAG, thank you
hta: this is a good overview of the disagreements today
domenic: .... The specs should be consistent or it is not going
to work. We should make things better for users down the path.
nobody is proposing anything crazy, really.
hta: this is actually how the first message came across. that
might be the reason for the ... strength ... of the reaction
domenic: I'm happy I clarified this then.
adamR: justin, do you have comment on this conversation?
<npdoty> it sounded to me like TAG was not suggesting a "flag
day"
justin: chrome would like to move to more https in the future,
but breaking existing content is not ok. We should have a multi
year deprecation process, but having a flag day today is not
going to happen.
ekr: couldn't we move ahead with GUM like it is today?
dom: I heard your point, and I find it compelling.
today, GUM works on any origin, and there is no
compelling reason that we see to move away from that today.
domenic: the worst outcome would be for users to see that you
guys are putting together specs without making sure you're
looking toward the future, and just saying that's how we do it
today, and so be it. We would like you guys to have a future
direction statement.
ekr: what about a non-normative note?
hta: what about a statement like "a conformant browser MAY
decide to make this available over https only."
<npdoty> that doesn't lend itself to interoperability
ekr: I thought it was the case, but if it's not, I'd be happy to
do it.
hta: let's make it the case.
matthew: source selection
<npdoty> +1
domenic: someone brought up the interop issue. one problem would
be if one browser worked only over https, and another one
did not; then calls would not be established.
ekr: can we stop saying that people who don't want to go for
https don't care about the users?
... we have disagreement here, and let's agree to disagree.
... can we state that everybody wants to do what is right for
the user, we just disagree about how to do it?
domenic: ok
matthew: I was observing that the GUM API as it stands might
not have this property
... but we have other things in the spec that potentially
change the profile, hum, usage of this thing.
getuserdevices has the possibility to enumerate devices and
expose some things. it does not change the security profile,
but gives us the capacity to expose more or less information
that in turn influences the user's decision
martin: is there an opportunity there to use this ?
ekr: my bad :)
dom: we have a rough agreement
... that non-normative note is good
domenic: yes, and I encourage that note to encourage https
hta: any volunteers to draft this note?
ekr: i suggest that justin does it
hta: ekr has volunteered to do it, and requested help from
justin.
... domenic, thank you for showing up and enabling this
discussion
dom: we're happy for you to stay of course if you want
domenic: ok, good luck guys
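The non-normative note discussed above might translate into something like the following page-side check. This is a hypothetical sketch, not spec text: the function name is ours, and treating https: and localhost as "authenticated" is an assumption the group left to the note's eventual wording.

```javascript
// Hypothetical sketch: gate a page's own getUserMedia use on an
// "authenticated origin". The https:/localhost definition is an
// assumption for illustration, not the group's agreed definition.
function isAuthenticatedOrigin(url) {
  const u = new URL(url);
  return u.protocol === "https:" ||
         u.hostname === "localhost" ||
         u.hostname === "127.0.0.1";
}
```

A page could call `isAuthenticatedOrigin(location.href)` before prompting, and steer users toward the https version of the site otherwise.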
MediaStreamTrack.ended
jib: presenting MediaStreamTrack.ended
... I'm just going to show the problem
... present a solution
... and then we can talk about it
... JS arrow functions as an example
... <slide 3>
I'm also going to use promises
jib: here is background info and links
burn: and it will also be part of the specs
jib: <slide 4>
... we have this ended event
... the only thing it tells you is that the track has ended.
... two kinds of problems
... the call could have hung up or dropped (which one?)
... GUM could have stopped capturing, had a permission problem,
or a driver issue (which one?)
... <slide 5>
... so I propose an ended promise
... it allows us to differentiate between two cases: success and
failure
... consistent with the usage of promises for state change
ekr: why not a "started" equivalent?
... my point is that not all those events have state changes
jib: I'm not proposing to replace the existing ones, I just
want to show another way to get the errors and differentiate
between several "types" of ended tracks
hta: history of the problem: can we tell the difference between
a track that ended with an error and one that ended without?
ekr: my concern is API consistency
burn: you did not say you suggested we should remove the
original one
ekr: that's even worse if we don't remove it: we have two
APIs for the same thing, and I don't know when to use which
ShijunSun(MS): is it the right way to handle all the errors,
from different objects?
jib: let's get to the second example
<slide 7>
jib: in this example, I don't care if I succeeded, it is just
showing the different syntaxes
... you can do a switch
... I did a pull request where I pulled all the existing errors
and showed how this would look.
... here the error can happen upfront, or later on
... you just don't want to end up not catching an error
... and there has been no other proposal that does all that so
far.
juberti: I wouldn't like to use the excuse of having promises
for GUM to use them everywhere else.
ekr: i agree with justin
jib: does seem like ......
shijunshun: +1
stefan: do we have the need for this?
ekr/juberti: yes
jib: you need to make *some changes*
ekr/justin: yes
jib: then why not use promises?
ekr: why not in this case, but not all events have that need,
and we should not use promises everywhere just because they are
good for GUM.
jib: I hear the consistency argument
... there is another pattern
... we should use the best language that solves the problem
ekr: I do not think promises are that language
adam: events happen more than once
jib: ended only happens once
adam: yes, but for other events, it does not hold and promises
should not be used.
<Domenic> promises for state transitions that happen only once
and that people might be interested in after the fact are
strictly better than events
<Domenic> events are not appropriate for that case
ekr: it can't be all promises
<Domenic> and we have only used them because historically they
were all we had
juberti: we changed from having a consistent use of callbacks,
and now we would have some promises and some callbacks, and I
don't like that as it does not bring much added value.
<dom> Domenic, the argument that is being made is that mixing
events and promises for handling events makes for a confusing
API
<Domenic> you have to be looking for consistency among things
that are alike
burn: we spent a lot of time defining which ones should be
events and which should be callbacks
<Domenic> things that can happen more than once and things
that happen once are not alike
<dom> well, both describe the object state machine
burn: and we spent a lot of time making sure that programmers
could almost guess from the pattern when one should be used.
dom: we need a tech proposal.
<Domenic> dom: that's fair
ekr: right, I'm happy to write a proposal
hta: there is such a proposal in the bug that triggered that
proposal
ekr: even better, I'll do nothing!
hta: there seems to be a rough consensus that we should extend
events and not use promises.
jib: any other questions on this?
... thank you
hta: we are now pretty ahead of schedule
Audio output device enumeration
hta: let's have the audio output device enumeration discussion
<dom> [16]Justin's slides
[16]
https://www.w3.org/wiki/images/d/d6/Output_Device_Selection%2C_TPAC_2014.pdf
juberti: in addition to having enumeration of input devices, we
also have the same feature for OUTPUT devices but we have no
way to access it.
... why would we do that? it's the #1 requested feature, ahead of
screensharing and others.
... usage scenario
... changing to a usb or bluetooth headset
... right now, you have to change the system settings
<npdoty> does "in chrome" mean "in the web page, not in browser
chrome"?
<slide 4>
juberti: no API for setting devices.
... we have a way to enumerate them, but no way to SET them
<npdoty> why do we even have a way to enumerate output devices?
<dom> npdoty, this idea was to enable this use case
<dom> (even though we've been missing the last piece of that
puzzle)
juberti: you want to avoid cases where an arbitrary
webpage can play content on your audio device without user consent.
... a prompt would not be practical
... <slide 5>
... any mediaElement (<audio> or <video>) would have an ID
... by default, set to empty, and using the default output (always
OK, today's case)
... specific devices could be set (using the enum API info)
if the application is authorized. web audio could also use it.
... <slide 7>
... in most cases you could use the same group ID for input /
output devices.
... for other apps that would need finer granularity, there
would be another way of doing this.
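The group-based permission idea in the slides can be sketched as a small selection function. This is illustrative only: the function name and the permission model are assumptions, not spec text, and the device objects just mirror the {deviceId, groupId, kind} shape of enumerated devices.

```javascript
// Sketch of group-coupled permission: output devices sharing a
// groupId with an already-permitted input device are usable
// without a new prompt (per the slide-7 discussion).
function allowedOutputs(devices, permittedInputIds) {
  const allowedGroups = new Set(
    devices
      .filter(d => d.kind === "audioinput" && permittedInputIds.has(d.deviceId))
      .map(d => d.groupId));
  return devices.filter(
    d => d.kind === "audiooutput" && allowedGroups.has(d.groupId));
}
```

The "finer granularity" case juberti mentions (outputs not grouped with any input, as in phil's scenarios below) would fall outside this function and need the separate permission path.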
burn: permission is then for the group by default?
juberti: exactly.
dan: would that show all permutations in the grouping? How do
you define the grouping?
juberti: that's for composite devices; they have the same device
ID / group ID.
ekr: what would be the lifetime of the permission ?
juberti: same as GUM.
ekr: as long as origin is tagged, the permission stays.
martin: if you have persistent permission, it means you have
access to all devices at any time.
juberti: yes, if you have access to all INPUT devices, and all
OUTPUT devices are grouped with input devices, that's true.
martin: I think we should make it explicit.
juberti: the coupling is quite elegant, and better than just
using input devices.
martin: we don't need more prompts
<npdoty> indeed. why are we pushing this onto the page?
adam: even if I, as an application, have access to all the
input devices, I might not have access to all output devices?
<dom> for proper UX, npdoty (it's hard to build a nice user
experience when everything is pushed to the browser chrome)
juberti: you can already enumerate all of them, you just can't
use output devices.
... you know, by using system permissions, you already have
practical access to all devices.
<npdoty> dom, the group thinks proper UX is more likely to
happen if we distribute it out to every different web developer
in the world
<npdoty> ?
juberti: I think that 99% will either use the default setting,
or ONE specific coupling they will give permission to.
shijunsun: how do we handle on-the-fly plugging in or plugging
out of devices?
juberti: not sure yet.
<dom> npdoty, I think that's a correct characterization, yes
martin: ....
shijunsun: we have the notion of a default device; if anything
is plugged in, the headphone has priority, and we fall back to
the default automatically. Now, it seems that webrtc would be a
regression from what we propose today.
martin: I know how to solve that problem, I think
martin: there would be physical and logical devices
by using logical devices, we can switch on the fly between
physical devices.
shijunsun: it does not have to be in the OS, could be in IE.
juberti: the enumeration API should present those so the user
knows which one to choose from
shijunsun: iframes might have different settings, so we have to
be careful.
juberti: things working out of an iframe would be an issue
anyway, if only for PeerConnection.
shijunsun: my comment was more about the scope. do we want it
restricted? do we want all pages to have control, including
iframes, kind of overriding iframe settings?
hta: about usage,
... earlier in the week I was in the audio group
... and they are very interested in using the same mechanism.
shijunsun: great, let's make sure the use cases are all
written.
burn: let's say you are in an iframe
... you can only set a device as output if you have permission
to do so, even though you could see it in the enum
juberti: well not exactly, I think we can enumerate all, but
you only get access through grouping.
<Zakim> npdoty, you wanted to ask if the cases are so often
coupled, why does it need to be exposed at all?
<someone from W3C> your assumption seems to be that the
coupling is very frequent. it does not seem it needs to be
enumerated. and you're also adding a whole list of permission
dialogs.
juberti: this avoids permission dialog
... having a generic API ......
... the API we have here announces that abstraction, but
underneath we have to deal with another layer ...
... we have to deal with the cases where input and output are
not a unique physical device.
npdoty: if the browser does not handle the setting, will the
website allow me to do it?
martin: the site might have many things it wants to do at
different points in time
... if you play music, you might want to keep rendering that
music on the same device.
... but when you have a site that simultaneously plays music and
communication
... you don't really have the flexibility today to handle the
user experience the way you want
phil: many output devices in our case are not "grouped" with
input devices
and it's very important for us that the app should be able to
use different devices.
phil: another use case: my son is listening to the radio with a
headset, while I'm watching a movie locally on my
computer
juberti: we did an app that does media mixing, kind of garage
band. there is no input for the app. if we are saying that
permission comes only from GUM, ....
... if you use a real pro audio app, you already understand the
notion of a door hanger
<npdoty> it sounds to me like we're suggesting that every page
can choose non-coupled input/output (or maybe it won't have
implemented it), which will cause more permission dialogs, but
the user can also choose it separately on the browser
juberti: for most of web users, the permission is much simpler
<npdoty> and if the user sets it in their browser first and
then the site wants to change it?
juberti: but this API also gives us the capacity to use door
hanger for more professional apps.
<npdoty> and if the user asks where they're supposed to
configure audio output? in the site or in the browser? or in
the site first but maybe overridden in the browser?
ekr: do I understand that the goal is to allow a website to
minimize the number of prompts for the most generic cases
juberti: yes
... 90% would be: use the default or use that one specific set
of devices.
fluffy:
<ekr> I use the system microphone and the headset for sound
fluffy: the app would enumerate the devices
... the app would then ask permission for a specific device
... the door hanger would then kick in, and the app would get
access?
juberti: yes, or you have given persistent permission to that
device beforehand and the door hanger would not even be needed
martin: is there a need for labels for groups ....?
... you said default and default group
juberti: ....
martin: the use case you mentioned is only for app that are
already using this API
juberti: yes
martin: then they should be aware of this problem, and have a
UI, and so on
<npdoty> mt, because you think developers are unlikely to make
mistakes about edge cases of hardware?
juberti: well, yes, but they could still make a bad choice.
generally, the complexity is not transferred to the app.
dan: ... would it be good to be able to select input/output
only to simplify the list ?
juberti: practicalities make it something we don't want.
phil: is there a way for JS to know in advance which
permissions it has access to?
juberti: yes
phil: some devices are also accessible, how do we populate the
drop down with that?
juberti: good point
... how do we do for output devices what we do with the input
devices? .... that's a good question, I need to think about
that.
phil: enumerate devices might prompt once for allowing ALL
devices to be used. so the enumerate API also allows them in
one step.
juberti: yes, we could do that, but it would be difficult for
users to understand.
martin:
<npdoty> it would be a new permission model to say you get
permission to things that are less egregious than any
permissions you've already granted.
juberti/martin: discussion about how to do it right.
phil: just to clarify, i just want a way for the user to enable
all the output device.
juberti: we might need something new to enable what you propose
burn: the persistent permission implies access to all input
devices
... and that surprises me
... <reading specs>
... I'm realizing that we actually give permission to ALL
devices, while I thought it would give permission for a
specific device (the one i agree on in the prompt)
... the implementation consequences are minimal (at least in
chrome), but for the user it's quite a shock; I was not
personally aware that I was giving away that much
dom: we have to contact other groups for that discussion, e.g.
web audio; HTMLMediaElement belongs to another group, and so on
and so forth. we need cross-group coordination.
juberti: I think we need to document the attack scenario, and
reach consensus at least within the group before we bring it to
other groups.
dom: my perspective is that we should really try to spec it
juberti: how do you do it?
dom: you do a partial interface .....
<Zakim> dom, you wanted to ask where to spec this, talk about
coordination with other groups
juberti: yes, that would be way more efficient
ekr: the problem I typically run into is when I am using the
system microphone, with a non-standard headset.
ekr .....
ekr: there are also hierarchy of devices .....
dom: next steps?
juberti: take this proposal and make it into a pull request
against existing specs
dom: I would make it a spec on its own.
juberti: ok, is there a template, and where should that thing
reside?
dom: i can guide you.
juberti: ok great, i know who to delegate to.
... i also think that there are a couple of questions that
showed up here today and should be written as well in the
document.
hta: we're still ahead of schedule
I propose a 15 min break
hta: so break until 20 past.
Next steps for Media Capture and Streams document
<inserted> scribenick: npdoty
talking about Last Call
dom: a refresher on Last Call
... assuming we get consensus to go to Last Call
... have to make a number of decisions about how that last call
will happen
... have to decide the amount of time for comments. W3C Process
minimum is 3 weeks, but can be longer
... review will be open to everyone, but some groups we should
specifically contact
... during the time of the formal review period, need to
formally track each comment, formally respond, formally seek
feedback to our response
hta: a formal definition of "formal"?
dom: need to log each comment (like to the mailing list), needs
to send a response, best effort to see that the comment is
accepted by the commenter
... not every comment needs to be considered an issue
... some comments may repeat existing issues without raising
new information
... even if the comment is not raising a new issue, need to
indicate to the commenter, past discussion and arguments
burn: typically we track every comment that comes in. need to
be prepared to give a proposed resolution
... eg "we already discussed this and we decided not to do
this" or "clarification we'll want to do"
... need to communicate that proposed resolution to the
commenter
... make your best effort to get back their acceptance or
rejection of your proposed resolution
... often give a time limit, if we don't hear from you in two
weeks, then we'll assume you accept our resolution
... should separately track implied vs. explicit acceptance, in
order to have clarity for the transition call later
dom: have a tool for tracking comments that we might or might
not use
... groups we have intersection with, groups mentioned in our
charter
... first list of groups
... Webapps, TAG, Audio, HTML, WAI PF, IETF RTCWeb
... forgot for the slides, but should add the Privacy Interest
Group (PING)
npdoty: thanks
dom: might ask the RTCWeb group to formally chime in
... just my suggestion, for reductions or extensions
... once we're done with Last Call comments
... either go to Candidate Recommendation (no substantive
changes that requires more reviews)
... otherwise, need to go back to Last Call
... transition request to the W3C Director, including the
detailed review of the comments we have received
... for commenters who don't accept the resolution, would check
whether we need a Formal Objection, with a separate process
... Last Call can be a difficult period, which this group may
be familiar with
... attention from groups who may not have followed all the
details of your work
burn: in my experience, Last Call can effectively be first call
fluffy: do you try to get feedback from those groups before we
get into the formal Last Call step?
burn: one way is to involve these groups before Last Call
... ask them ahead of time. may save you from doing a second
Last Call
dom: we've had a number of interactions with TAG and WebApps
... had some early reviews from Privacy Interest Group, but doc
has changed significantly
burn: met with WAI rep, indicated an area they care about a lot
... should get involved sooner rather than later
fluffy: as comments get moved to Formal Objections, who can
raise those?
dom: anyone can raise a Formal Objection.
no Membership requirement, any individual or organization
dom: Formal Objection is not something done cheaply, as a
social matter. requires quite detailed documentation
hta: what constitutes a Last Call comment?
... any message to the mailing list?
dom: if there's ambiguity, you can ask
... most cases it's fairly clear
burn: in some groups, could say that anything from a public
list was a Last Call comment
... but now all groups are operating in public
... social issues, but that doesn't stop some people
dom: understood that WG members should not raise Last Call
comments, but can
... for example, if you understand something that's new
... could have a separate mailing list for comments
... most groups just use public mailing lists
burn: for every comment, it's useful to have an email track as
well as minutes. so that later you can point back to it
... track discussion of comments, not just the comment itself
dom: the tool I'm thinking of can do some of this tracking
<mic noise>
dom: when would we go to Last Call for getUserMedia?
hta: one requirement is to close the bugs
... tomorrow we are going through the remaining bugs (8)
... and the group needs consensus to go to Last Call
... if we have wildly different opinions....
burn: time to go to Last Call is that we don't expect
substantive changes (otherwise CR)
... we have a note in the document today about things we're
expressly seeking feedback on
... about promise backward-compatibility navigator syntax
... and a few editorial notes in the document
hta: once we close these 8 bugs, does the group believe it's in
a state where we should issue a Last Call?
fluffy: how many people have read the document in the last six
months?
... read, not looked at
burn: we should not wait long at all to request review from
these other groups, whether or not Last Call
dom: one of the advantages of wide review of Last Call is to
limit ourselves about not wanting to make big substantive
changes
... developers don't like that as much
burn: the next exclusion period for intellectual property. Last
Call triggers one
mt: what should we do with changes during this time? (don't
want to make changes during the Last Call review)
dom: could make partial interfaces / new specs
... or look at a new version, could be in a different branch
fluffy: should seriously read this document, because it's going
to be frozen for a while
hta: where it's possible in a reasonable way to write a
separate document that extends interfaces, that's preferable
... a separate question about what makes sense about
integrating or keeping a separate spec
burn: if you know you have something substantial to add to this
document
... then it's not really the last Last Call
... putting the community through official review steps
mt: the tension between the idea that we have a living spec
fluffy: this is not a living spec. Last Call is a sign that
we're freezing it
burn: you don't typically do a Last Call unless you're really
indicating that you're done with it
hta: basic conflict between publishing Rec track vs. living
specs
fluffy: if we allocate ten people from this room to review this
document beginning to end, would get a lot of comments
... we should do that before we issue a Last Call and get those
comments from a dozen different groups
dom: goal should be a conservative approach to commenting
fluffy: we should fix the things that everyone will indicate
that we fix
ekr: we should get approximate signoff from implementers, prior
to Last Call
... if those people are basically happy, we can talk about
going to Last Call. but if they're not, then we need to resolve
those issues first
fluffy: we put out a deadline for comments twice. only two
responses?
... can we get volunteers from several, separate individuals
from major implementers to review?
timeless: once we have an announce list for reviews, I'll be a
part of it. I would do a pass, I would do a very detailed
review
<Zakim> timeless, you wanted to say i might read it
timeless: or could contact some individuals like me separately
fluffy: everybody who's ever read it before has had a lot of
comments. rate doesn't seem to be dropping
burn: need a full pass through of entire document
dom: specific action items?
... who volunteers?
hta: give it two weeks for comments. 15 November
<dom> ACTION: ShijunS to make full review of getUserMedia - due
Nov 21 [recorded in
[17]http://www.w3.org/2014/10/30-mediacap-minutes.html#action01
]
<trackbot> Error finding 'ShijunS'. You can review and register
nicknames at
<[18]http://www.w3.org/2011/04/webrtc/mediacap/track/users>.
<dom> ACTION: Shijun to make full review of getUserMedia - due
Nov 21 [recorded in
[19]http://www.w3.org/2014/10/30-mediacap-minutes.html#action02
]
<trackbot> Error finding 'Shijun'. You can review and register
nicknames at
<[20]http://www.w3.org/2011/04/webrtc/mediacap/track/users>.
mt: a big document. would take time, but IETF/vacation are
conflicts
<dom> ACTION: martin to make full review of getUserMedia - due
Nov 28 [recorded in
[21]http://www.w3.org/2014/10/30-mediacap-minutes.html#action03
]
<trackbot> Error finding 'martin'. You can review and register
nicknames at
<[22]http://www.w3.org/2011/04/webrtc/mediacap/track/users>.
burn: November and December can be a slow time for responses
<dom> ACTION: Josh to make full review of getUserMedia - due
Nov 28 [recorded in
[23]http://www.w3.org/2014/10/30-mediacap-minutes.html#action04
]
<trackbot> Created ACTION-30 - Make full review of getusermedia
[on Josh Soref - due 2014-11-28].
<dom> ACTION: juberti to make full review of getUserMedia - due
Nov 28 [recorded in
[24]http://www.w3.org/2014/10/30-mediacap-minutes.html#action05
]
<trackbot> Error finding 'juberti'. You can review and register
nicknames at
<[25]http://www.w3.org/2011/04/webrtc/mediacap/track/users>.
hta: will note to the mailing list that we have a few
volunteers for comments by November 28th, and we're soliciting
more
burn: even comments indicating that you can't understand it, is
useful information
<dom> ACTION: PhilCohen to do full review of getUserMedia - due
Nov 28 [recorded in
[26]http://www.w3.org/2014/10/30-mediacap-minutes.html#action06
]
<trackbot> Error finding 'PhilCohen'. You can review and
register nicknames at
<[27]http://www.w3.org/2011/04/webrtc/mediacap/track/users>.
dom: but we do want to finalize this thing
mt: will generate pull requests for editorial, grammatical
things
fluffy: commits, can cherry pick, but grateful for any review
at this point
stefanh: end of the morning agenda
... will continue in this room after lunch with #webrtc
<agenda discussion>
Mediacapture bugs
fluffy: "volume" is underdefined
hta: we could define it as a number of decibels, which would be
inconsistent with HTML
fluffy: but it's not that. my proposal is that it's a
multiplier in a linear space
... 0 is silence. 1 is maximum volume
... a volume setting you can move up and down between 0 and 1
... could be a linear or logarithmic curve, just pick one. this
is linear
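fluffy's proposed semantics reduce to a one-line gain stage. A minimal sketch, assuming his definition (the helper name is ours, not spec text): "volume" is a linear multiplier in [0, 1] applied per sample, where 0 is silence and 1 is the unattenuated signal.

```javascript
// Sketch of "volume" as a linear multiplier in [0, 1]:
// 0 silences the signal, 1 leaves it unattenuated.
function applyVolume(samples, volume) {
  if (!(volume >= 0 && volume <= 1)) {
    throw new RangeError("volume must be a linear multiplier in [0, 1]");
  }
  return samples.map(s => s * volume);
}
```

Under this reading, the previously confusing 0.5 simply halves each sample's amplitude (a linear, not logarithmic, curve).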
hta: using a constraint as if it were a control
ekr: if you want this, why not use WebAudio?
dom: doesn't make sense as a constraint
fluffy: we had some confusion over 0.5. could remove "volume".
not sure WebAudio covers all cases
ekr: is there some reason you can't do it with a filter?
fluffy: different implementations will do it in different ways
mt: isolated streams is an example
fluffy: maybe we shouldn't re-open whether to have volume or
not. only proposed change is explaining the meaning of 0.5
hta: let's integrate this change and close this bug
burn: a clarification, not a change to the requirements for it.
... if we all agree
mt: some encouragement will be provided. it's probably a bad
idea to do it over an unauthenticated origin
hta: will assign that to ekr
npdoty: should be clear about whether the requirement on stored
permissions is normative
ekr: it should be normative, as it is in IETF
npdoty: and that stored permissions section would be a good
place for the additional encouragement, and should use a better
definition for "secure origin"
... may follow up in email
jib: constrainable pattern, which is abstract. and specific use
in getUserMedia
... specific use doesn't need to be abstract. should say
exactly what is returned
... reuse the existing MediaTrackConstraintSet dictionary,
which may be added to in the future
... a second dictionary, a subset of the capability set
... hopefully I get back success and get back values
... capabilities is a superset of constraints which is a
superset of settings
... pull request illuminates that the datatypes are related
... should we write two more dictionaries (enumerating the same
keys), or should we just re-use the same type?
... re-use the same type because capabilities are exactly the
same structure (based on the prose)
burn: IDL, we don't say that, that would be a change to the
document
jib: we could use a narrower data type for the returned set,
but it could easily be the same data type
mt: no content-accessible type information is available
... maybe it should return an array of strings rather than a
dictionary anyway
... don't mind about the difference between capabilities and
constraints. tough for spec authors and implementers, but oh
well
... JavaScript more natural to use an array, with indexOf
jib: could be a fourth use of the dictionary. return a
dictionary that you can enumerate, all the keys you find in
there are supported
... UA puts in some truthy, an object
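jib's proposal — a dictionary whose enumerable keys are the
supported constraint names, each mapped to a truthy value — could
look like the following sketch. The dictionary contents and
helper names are assumptions for illustration only:

```javascript
// Hypothetical UA-provided dictionary per jib's proposal: every key
// present maps to a truthy value and names a supported constraint.
const supportedConstraints = { width: true, height: true, frameRate: true };

// An application tests support with a simple key lookup...
function isSupported(dict, name) {
  return Boolean(dict[name]);
}

// ...or enumerates all supported constraint names at once.
function supportedNames(dict) {
  return Object.keys(dict).filter((k) => dict[k]);
}
```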
burn: trying to remember why we did it this way
fluffy, where are you?
burn: don't want to put the same defined type for all those
different returns
... because they're not the same return
jib: X, Y and Z are different things, even if they're the same
type
... we need more specific text. either this pull request with
using the same dictionary, or we define more specific
dictionaries
hta: separate discussion of getSupportedConstraints
... capabilities, you might want to look at the value, modify
it slightly and then send it back to the browser
burn: even if you want them to be almost the same data
structure, I'd rather see different names for them
jib: different names, same type
... argument type, argument name
dom: developers are not likely to read the spec
mt: something we typically leave to editorial discretion
... if they can address it in some way, leave it up to them
... we will review the outcome and ensure it's not crazy
... acceptable?
jib: fine. but want to specify something, not just abstract
types
burn: I hear you.
... we already have the prose for it, but now have the IDL
fluffy: legal syntax is different
dom: has anyone started implementing?
jib: hoping not to make any functional changes at this point
hta: WG position is to leave to editorial discretion
dom: WebIDL must be valid
fluffy: editors please bring us a proposal
[adjourned for lunch.]
re-convene at 1pm
Media Capture Depth Stream Extensions
<dom> [28]Media Capture Depth Stream Extensions specification
[28] http://w3c.github.io/mediacapture-depth/
<anssik>
[29]https://docs.google.com/presentation/d/1mwlD8H_RzlB2JheyjqX
xa7sMSMTN8x96VgSzjy5B4pc/view
[29]
https://docs.google.com/presentation/d/1mwlD8H_RzlB2JheyjqXxa7sMSMTN8x96VgSzjy5B4pc/view
<dom> [30]Anssi's slides
[30]
https://docs.google.com/presentation/d/1mwlD8H_RzlB2JheyjqXxa7sMSMTN8x96VgSzjy5B4pc/view
<dom> ScribeNick: dom
Anssi: I'm Anssi Kostiainen from Intel; with me are Ningxin Hu
from Intel and Rob Manson (Invited Expert)
... we discussed the idea of bringing 3D camera to the Web last
year at TPAC
... I remember polling for interest back then
... lots has happened since then
... we collected use cases, played with the spec and wrote code
... we will be summarizing this
... [slide 2]
... The spec is about making 3D cameras a first-class citizen
of the Web platform
... up to now, these have required special plugins
... the native platforms have these capabilities
<hta> Stefan is running the slides (so you don't have to say
"stefan or someone")
Anssi: the approach we've taken is to integrate with existing
APIs as much as possible
... reusing primitives rather than inventing new APIs
... this means relying on getUserMedia, Canvas 2D, WebGL
... if you attended the symposium on Wednesday, you saw a live
demonstration on stage
... TimBL mentioned it as exciting :)
... [slide 3]
... Current status: we started with use cases and requirements
— thanks for the contributions!
... it took 2 to 3 months to make sure we had a solid set of
requirements
... over the summer, we started drafting the specification and
published as a FPWD two weeks ago
... parallel to this work, Ningxin has been working on an
experimental implementation which was used on stage on
Wednesday
... the code is available
Ningxin: the build is available on Windows; the source code is
also available
Anssi: the references are given on the last slide
... [slide 4]
... Regarding use cases: some of them are obvious, like video
games (e.g fruit ninja with your hands)
... 3D object scanning: measure a sofa by pointing at it
... video conferencing — it would let you remove the
background; or make the experience more immersive
... lots of use cases in augmented reality too
... Rob, maybe you want to expand with your favorite AR
Rob: you can add virtual objects behind real objects
... all AR could be improved with depth tracking
Anssi: this is only scratching the surface — there are lots of
other use cases
... I think it's as significant as bringing the RGB stream to
the Web, with lots of potential
... [slide 5]
... This summarizes our IDL interfaces
... not all of them are complete yet
... but this is our current view of what needs to be done
... we're very open to feedback on this
... we've already received good feedback from canvas
implementors — we'll adjust based on this
... I won't go into the details — look at the spec for that
... DepthData is the data structure that holds the depth map
... CameraParameters, soon to be renamed CameraIntrinsics
... it's associated with the DepthData
... it represents the mathematical relationships between the 3D
space and its projection in the image plane
<anssik> [31]http://en.wikipedia.org/wiki/Pinhole_camera_model
[31] http://en.wikipedia.org/wiki/Pinhole_camera_model
Anssi: it's the minimal data required for the pinhole camera
model
... these are the only two new interfaces we're adding; the
rest are extensions to existing interfaces
<juberti> please, please, can we add getTracks(kind) instead of
getDepthTracks
Anssi: We add a boolean flag to MediaStreamConstraints,
similar to the audio and video booleans
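A sketch of the extended constraints shape being described — a
"depth" boolean next to "audio" and "video". The buildConstraints
helper is hypothetical, and the getUserMedia call is shown as a
comment since it only exists in browsers:

```javascript
// Sketch of the proposed MediaStreamConstraints extension: a "depth"
// boolean flag alongside the existing "audio" and "video" booleans.
function buildConstraints({ video = false, depth = false } = {}) {
  const constraints = {};
  if (video) constraints.video = true;
  if (depth) constraints.depth = true;
  return constraints;
}

// In a browser, per the draft spec, this would be used as:
// navigator.mediaDevices.getUserMedia(buildConstraints({ video: true, depth: true }))
//   .then((stream) => { /* attach to <video> elements, etc. */ });
```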
Martin: are you planning on having constraints for these
devices?
anssi: we've chosen to wait for the constraints discussion to
stabilize
martin: I think we're stable enough; we would need your input
on what constraints would be needed in this space
Rob: in a lot of ways, the constraints can be very similar to
the video constraints (e.g. minimal range for width and height)
it's also related to CameraIntrinsics - but they'll largely
just be read only Settings and Capabilities
anssi: we're still looking at this
... the group seems to be open to us proposing new
constraints
... thanks for that feedback
... we will take care of that aspect
... the next interface we're extending: we add getDepthTracks()
which returns a sequence of depth tracks
<robman> +1 to getTracksKind() or a more generic idea
dom: justin noted he would prefer to have a generic
getTracks(kind) instead of the specific getDepthTracks
anssi: noted; we'll look at this too
... Next interface is adding the "depth" kind attribute
<juberti> this of course would be generic and obsolete
getAudioTracks and getVideoTracks
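The generic accessor juberti is asking for can be sketched as a
kind filter; the helper name and the stub stream below are
invented for illustration:

```javascript
// Sketch of a generic getTracks(kind): filter a stream's tracks by
// kind instead of adding per-kind getters like getDepthTracks().
function getTracksOfKind(stream, kind) {
  const tracks = stream.getTracks();
  return kind === undefined ? tracks : tracks.filter((t) => t.kind === kind);
}

// Works against anything exposing getTracks(), e.g. this stub:
const stubStream = {
  getTracks: () => [{ kind: "video" }, { kind: "depth" }, { kind: "audio" }],
};
```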
anssi: In addition to extending these getUserMedia interfaces,
we have also additional APIs on the Canvas Context API
... similar to the imagedata apis
... we're having discussions with the canvas editors
... [Usage example slide]
... this is copy-pasted from the use cases doc
... this shows how easy it is for someone familiar with
getUserMedia to use that API
... [next steps slide]
... we're engaging with the Khronos folks for a minor extension
to WebGL to be able to pipe data to the WebGL context
Ningxin: we are proposing a small WebGL extension called
WEBGL_texture_from_depth_video
... with that extension, Web app developers need to know
whether they can upload a video element representing a depth
stream to WebGL
... using shaders
... this extension defines the circumstances under which an
HTML video element with depth data can be uploaded there
... we will define the format of the texture
... this is a proposal against WebGL 1.0
... if WebGL2.0 comes, we will update the texture format to
match
... DepthData as unsigned short is to be as close as possible
to the native representation of the depth stream
... (which is what most 3D cameras give)
... so as to limit CPU processing as much as possible, and
leave as much as possible to GPU parallelism
Anssi: we've talked with Dom with regard to the collaboration
with Khronos
... we're currently working on an informal technical basis
... we'll keep both groups updated when we make progress on
either side
... that's our model for operation
... Khronos has an established model for WebGL extensions;
there are tens of extensions that are widely implemented
... ningxin and Rob are the ones watching this space most
closely
... the other part of our work is to address open issues
... we use github issue tracker to track open issues
... that's the place to go to if you want to open new issues
... the slide shows the list of currently identified issues
... the highest priority items should be resolved before we
publish a heartbeat wd
<robman> NOTE: a range of these issues are likely to be
resolved as part of the update to use the ConstrainablePattern
bernard: the issue list talks about transmission via WebRTC
peerconnection
... that would mean changes to WebRTC 1.0?
<scribe> ... new codecs?
peter: what happens if you @@@
ningxin: this is still under discussion
... we're looking at an extension to H264 to support 3D TV to
carry the depth data besides the RGB data in the stream
... there are already several extensions in the codec space to
do that
... there is also an extension to SDP to describe that kind of
media
... we're looking at all these to see if we can support that
transmission
peter: with regard to PeerConnection, it's critical to
determine if it's a separate track or part of the same codec
Rob: our proposal is that it's a different track, that looks
like a video track
peter: but that requires different RTP packets
Shijun: the codec extension defines a different bitstream from
the video
... I was the first proposer for stereo video coding for H264
10 years ago
... I'm working on this at Microsoft still
... it's a fun project, but I'm not sure it's ready for prime
time
Anssi: it's good we have the right people in the room — we'll
continue the discussion on the mailing list
Martin: we need to understand what adding a depth track to a
peerconnection means
... this has impact on many things
bernard: unless the codec supports this, you simply won't get
anything
stefanh: we can extend the timeslot a bit for having this
discussion
anssi: is everyone here active in the task force? we would like
to keep you in the loop and would appreciate your continued
contributions
... we're currently at this phase where we're just getting more
and more people to look at our work and giving feedback
... we appreciate feedback from people interested in this
technology and with the right background
peter: can we do getTracks(kind) instead of getAudioTracks /
getVideoTracks?
hta: we already have that
martin: let's just not add kind-specific tracks any more
Shijun: for any stereo-related topic, it would be useful to
check with other groups on whether stereo videos are supported
in video tags
... if we don't have any surface to render a 3D video, what
would we do with these streams?
... (even if there are other use cases without that)
anssi: note that webrtc is not required to make use of this;
the same way getUserMedia is used well beyond WebRTC
shijun: I'm not saying don't do stereo video capture
... but whether we want to make that transmissible via WebRTC
is another question
ningxin: regarding the 3D video question, our proposal makes it
possible to use the depth stream independently
... 3D cameras can capture the depth stream without the RGB
stream
... e.g. for hand gesture detection
Martin: if we have both video and depth with different
constraints, what would that do to the cameras?
Rob: we need to calibrate the cameras
... but otherwise, the constraints should apply to both at
the same time
... for a calibrated stream with the two together, you should
consider them as a single source
Shijun: these are two sensors with different ids
... synchronizing the signals across these is quite challenging
... delays can induce headaches
martin: if you request a mediastream with both video and depth,
you get back a single mediastream
... which by definition are kept in synchrony
rob: asking for both depth and video gets you a calibrated
stream
dom: the WebRTC story is far from being done
... but looking at it will be a good test of the extensibility
of the WebRTC API
hta: the depthinput kind needs to be added to the enum of
device types in enumerateDevices()
... (I don't want to think of depthoutput quite yet)
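With a "depthinput" value added to the device-kind enum, depth
cameras would be discoverable the same way as "audioinput" and
"videoinput" devices. A sketch over stubbed device records (the
records and helper are invented here for illustration):

```javascript
// Sketch: selecting depth cameras from an enumerateDevices()-style
// list, assuming the proposed "depthinput" kind is added to the enum.
function devicesOfKind(devices, kind) {
  return devices.filter((d) => d.kind === kind);
}

// Stubbed records in the shape enumerateDevices() resolves with:
const stubDevices = [
  { kind: "audioinput", label: "Microphone" },
  { kind: "videoinput", label: "RGB camera" },
  { kind: "depthinput", label: "Depth camera" }, // proposed kind
];
```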
martin: let's get the non-WebRTC stuff done; the WebRTC
interactions are a whole new enterprise
... we should scope the work to reflect that
anssi: makes sense
... thanks for the great feedback
... [demos]
... [magic xylophone demo]
ningxin: this is based on a demo that was done with a simple
RGB stream analysis
... trying to detect the movements by analysing the RGB stream
... I modified that to add depth data
... there is a video representing the depth data
... js-based hand recognition is based on js-handtracking
... also originally based on RGB data; we updated it to use
depth data
... it's more accurate, more stable and more performant
... we can extract the background and apply the recognition
algorithm only on the foreground objects, reducing computation
... because depth cameras are infra-red based, they can be
used in low-illumination contexts
anssi: [fruit ninja demo]
ningxin: the idea is similar; still based on js-handtracking
library
... originally, this is done with a mouse or a touch screen
... here we integrate this with finger gestures
... you can also see the integration of the depth image that we
composite above the background and beyond the foreground
... this is done via WebGL via shaders
... that demonstrates depth rendering with WebGL still in 2D
space
anssi: [RGB+Depth with WebGL]
ningxin: we initiate a request with depth only where the user
is seen only as data
... then we request also RGB data
... main idea is here to use RGB texture and depth for the
positioning
anssi: hopefully this gave a good idea of what this technology
is about
... please give feedback, and let's discuss this further on the
mailing list or in github, etc
... the spec has still lots of room for changes
... we have ongoing discussions with the canvas folks
... if you're active also in the HTML WG, the public-canvas-api
mailing list is where this is discussed
stefanh: thank you anssi!
<robman> excellent discussion and feedback - thank you everyone
stefanh: no further questions from the room
hta: I think this work is great; if scoped adequately (i.e.
getusermedia stuff first), this will be useful in many contexts
<robman> 8)
hta: I'm glad you brought that here!
Screensharing
martin: this discussion will be around directions and guidance
in this space
... we want to create media stream from stuff on the screen
... [slide 3] this is what we are looking at for the API — an
additional source property to the mediastreamconstraints
... that source could apply to non-video cases (e.g. get the
audio from a given app)
... [slide 5] we're down with terrible names, but not sure what
to do
... "monitor" is also used in the context of audio
burn: I like "system"; it works well for audio, video, etc
martin: only issue is that system sounds global, whereas we
would want it for some sub-part (e.g. a window)
dom: we could hook the API on another end point than
navigator.mediaDevices
martin: jib suggested the same
pthatcher: another suggestion would be to enumerate "system"
devices to which you could then constrain your getUserMedia
call
martin: that doesn't quite fit with the security model where we
want the user to be entirely in control of what is shared
... we want to distinguish the sources that are enumerable and
the ones that aren't
... screen sharing would fit in the latter case
shijun: the browser is also in a better position to pre-select
sources
martin: the UX mock up our UI guys have done comes with a
doorhanger where the user can pick which windows to share (with
no default selection)
... none of this is exposed to the app until it is agreed by
the user
... I don't think that's a problem that is too troubling here
... but we need a common taxonomy to have these discussions
... I'm gravitating towards "monitor", "window", and "browser"
(without distinguishing tab or window)
<robman> display?
peter: window doesn't really apply well across mobile / desktop
... leaving it a bit generic is valuable
martin: "system" is too generic because my microphone is part
of system
burn: maybe getOutputMedia?
peter: (or alternatively 'source: output')
martin: I'll go with one of these
... for application, our Cisco friends who have experience in
this area have shared that "application" doesn't work very well
for users
... someone decides to share a power point presentation
... they choose to share the whole PPT application rather than
just the single presentation they want to show
... and leak information without realizing
... so the suggestion was to simplify the interface and leave
it to the browser to determine how to e.g. composite several
windows into one if they feel that's adequate
... this would be a new enum constraint
... we distinguish browser windows from other app windows for
same origin protection
... breaking that isolation is potentially too scary to be
just dismissed via a getUserMedia prompt
dom: for "browser" filtering, would you also identify other
instances of the current browser? of other browsers?
martin: typically, only the current instance since that's the
only one the attacker is potentially in control of
... doing that with other browsers requires a lot more
difficult social engineering
... it's a bit more far-fetched for other browsers, but not
completely unthinkable either
... we may want to filter other browsers out
... clearly sharing another browser window is something we want
to enable without too much work for the user
... we're currently whitelisting, but want to have a more
generic approach
... but we're still trying to figure how to get a clear signal
from the user that they trust enough the site to share that
other site
bernard: there are other risks we're not protecting against
(e.g. an app revealing the user's password)
martin: users are likely to understand this
... but users probably don't understand the cross-origin risks
where we need to protect a site from another site
alexandre: what's the impact on the API?
... independently of the UI impact
martin: it doesn't have impact on the API
... there will be a number of sources that can produce screen
sharing
... and access to any one of those will be what the browser
allows combined with the consent mechanism
shijun: for screen sharing, should we isolate by default?
martin: it doesn't solve the problem - the attacker could
simply send the screen sharing stream to itself
shijun: I think there won't be a unique solution; but I think
isolated streams might be part of the solution
dom: what if one could only start screen sharing after a
peerIdentity-certified connection has already been established?
martin: I'm not convinced this would really help
dom: but I guess this shows that not all policy decisions can
be API-neutral
martin: right... will need to think more about this, please
send more ideas and suggestions
... [slide 6]
... some platforms offer distinction between logical and
visible windows
shijun: on some platforms, content that is not visible is not
fully rendered
martin: screen sharing guys have tricks to circumvent this
alex: we need to make sure that the security considerations
don't break interop and make it horrible to develop for
hta: what are our next steps?
martin: we've been working on this informally so far
... there is a w3c github repo but that's as much formality
as we've got
... we probably need to decide whether this group wants to take
it on
hta: so the next step should be to present a draft to the group
and push it to FPWD
martin: ok, let's proceed with that; I'll make a few updates on
terminology before we get that reviewed
hta: rough plan would be to achieve consensus on FPWD by the
week after IETF
... is there rough consensus here we should proceed with that?
shijun: agreed
[lots of heads nodding]
getUserMedia Testing
[32]Dom's slides
[32] http://www.w3.org/2014/Talks/dhm-gum-testing/
<stefanh> slides shown on the getUserMedia test suite
<stefanh> (webrtc test suite non-existent right now)
Generating MediaStream from HTMLMediaElement
[33]https://github.com/dontcallmedom/mediacapture-fromelement
[33] https://github.com/dontcallmedom/mediacapture-fromelement
Summary of Action Items
[NEW] ACTION: Josh to make full review of getUserMedia - due
Nov 28 [recorded in
[34]http://www.w3.org/2014/10/30-mediacap-minutes.html#action04
]
[NEW] ACTION: juberti to make full review of getUserMedia - due
Nov 28 [recorded in
[35]http://www.w3.org/2014/10/30-mediacap-minutes.html#action05
]
[NEW] ACTION: martin to make full review of getUserMedia - due
Nov 28 [recorded in
[36]http://www.w3.org/2014/10/30-mediacap-minutes.html#action03
]
[NEW] ACTION: PhilCohen to do full review of getUserMedia - due
Nov 28 [recorded in
[37]http://www.w3.org/2014/10/30-mediacap-minutes.html#action06
]
[NEW] ACTION: Shijun to make full review of getUserMedia - due
Nov 21 [recorded in
[38]http://www.w3.org/2014/10/30-mediacap-minutes.html#action02
]
[NEW] ACTION: ShijunS to make full review of getUserMedia - due
Nov 21 [recorded in
[39]http://www.w3.org/2014/10/30-mediacap-minutes.html#action01
]
[End of minutes]
__________________________________________________________
Minutes formatted by David Booth's [40]scribe.perl version
1.138 ([41]CVS log)
$Date: 2014-11-05 14:38:03 $
__________________________________________________________
[40] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[41] http://dev.w3.org/cvsweb/2002/scribe/
Received on Wednesday, 5 November 2014 14:43:40 UTC