- From: Francois Daoust <fd@w3.org>
- Date: Tue, 08 Nov 2011 16:37:54 +0100
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi all,
The minutes of the first day of last week's F2F meeting are available at:
http://www.w3.org/2011/10/31-webrtc-minutes.html
... and copied as raw text below.
I'll send the minutes for day 2 right after this email.
Minutes include links to slides. By their very nature, minutes are kind of a dry read and do not always manage to convey the arguments that have been exchanged. I'll work on a summary of the two days that I'll send later on.
Thanks,
Francois.
-----
WebRTC WG F2F Santa Clara - Day 1/2
31 Oct 2011
[2]Agenda
[2] http://www.w3.org/2011/04/webrtc/wiki/October_31_-_November_1_2011
See also: [3]IRC log
[3] http://www.w3.org/2011/10/31-webrtc-irc
Attendees
Present - group participants
Harald_Alvestrand, Adam_Bergkvist, Dan_Burnett,
Francois_Daoust, Dan_Druta, Christophe_Eyrignoux,
Narm_Gadiraju, Vidhya_Gholkar, Stefan_Hakansson,
Cullen_Jennings, Kangchan_Lee, Wonsuk_Lee, Kepeng_Li,
Gang_Liang, Anant_Narayanan, Eric_Rescorla, Youngsun_Ryu,
Youngwan_So, Timothy_Terriberry, Rich_Tibbett, Justin_Uberti,
Milan_Young
Present - observers
Adrian_Bateman, Robin_Berjon, Mauro_Cabuto, Suresh_Chitturi,
Manyoung_Cho, Mohammed_Dadas, Shunan_Fan, Tatsuya_Hayashi,
Dominique_Hazael-Massieux, Tatsuya_Igarashi,
David_Yushin_Kim, Ingmar_Kliche, Dong-Young_Lee, Ileana_Leuca
(a few other observers attended the meeting)
Chair
Harald_Alvestrand, Stefan_Hakansson
Scribe
francois, Rich, burn, fluffy, anant, DanD
Contents
* [4]Topics
1. [5]IETF Architecture Overview
2. [6]Use-cases and Requirements
3. [7]Security requirements
4. [8]Status and plans in the DAP WG
5. [9]Access control model and privacy/security aspects
6. [10]Stages for moving to a Rec
7. [11]Low Level Control
8. [12]Data Streams
9. [13]MediaStream
* [14]Summary of Action Items
See also: [15]Minutes of day 2/2
_________________________________________________________
[15] http://www.w3.org/2011/11/01-webrtc-minutes.html
Stefan: [starting meeting. Reviewing agenda]
IETF Architecture Overview
Slides: [16]RTCWEB Architecture (PDF)
[16] http://www.w3.org/2011/04/webrtc/wiki/images/7/79/WEBRTC_Overview_TPAC_SC_presentation.pdf
hta: The goal for RTCWeb is real-time communication between browsers
... arbitrarily define that as within ~100ms
... Trying to drive a design by use cases. Must have a design that
meets the priority use cases.
... we want to design general purpose functions.
... one use case we're looking at is the interworking with legacy
systems. We're fairly sure we want to make that work.
hta: relays must be possible otherwise we don't have a universal
solution.
... <goes through the basic architecture in his slide deck>
hta: All components (except RTCWeb implementing browsers) must be
assumed evil.
... Keep trust to a minimum
... Need to look at mechanisms for establishing trust from a web
page to a browser.
... data congestion control must also be a priority.
... RTP exists. We will use it.
... encrypt everything
<this is controversial>
hta: considering DTLS-SRTP key negotiation for that purpose.
... UI issues are important to the overall security.
... always fun to agree on codecs
... connection management: least controversial proposal is ROAP
... We expect innovation in what-connects-to-what
... ROAP does allow us to interconnect to SIP and XMPP based systems
... lots of other pieces, media buffering, muting, game control.
hta: a lot of that needs to be done in the browser.
burn: ...caveated by keeping in mind that we want to allow
innovation.
hta: W3C has an Audio group defining interfaces for accessing audio
data.
... hopefully we'll be able to use that but we need to confirm that
down the line.
... All of this is captured in [17]draft-ietf-rtcweb-overview-02.txt
[17] http://tools.ietf.org/html/draft-ietf-rtcweb-overview-02
DanD: We know the web extends beyond browsers. We do have the ability
to execute web apps in non-browser UAs.
DanD: We need to ensure that a browser endpoint can communicate with
a non-browser endpoint.
hta: We need communication to devices that are not browsers.
... We should not lose track of the browser use cases first and
foremost.
... One principle is that as long as the other side obeys the
interface then it doesn't matter what it is.
DanD: Another comment RE: interdependencies with other groups. One
example is on the discovery of the capabilities on other devices.
... this might be a missing piece in our discussions to date.
anant: There are some capabilities in the proposal to negotiate.
fluffy: If we figure out how the protocols work for interoperability
then we might get this legacy interworking.
Use-cases and Requirements
Slides: [18]Use Cases and Requirements (odp format)
Draft: [19]Web Real-Time Communication Use-cases and Requirements
IETF draft
[18] http://www.w3.org/2011/04/webrtc/wiki/images/e/e1/Use_cases_and_reqs_v3.odp
[19] http://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06
stefan: <goes through some of the key use cases in his presentation>
hta: regarding the Distributed Music Band use case. We're going to
need really low latency. Concert-mode? We also need to distinguish
between voice and music: for the former we will remove noise, which
is not suitable for the latter.
francois: Perhaps we should try to stick to something simple since
the really low latency issue is a problem.
stefan: It's in the use cases document anyway so we can discuss
further on that.
... In the document there is a list of use cases where the
discussion has died out.
... or not concluded.
... such use cases relate to different situations, E911, Recording,
Emergency access, Security Camera. Large multi-party session etc.
... these use cases could get added to the document if they get more
support.
... draft-jesup. I think we should cover both unreliable and
reliable data channels for WebRTC data.
stefan: draft-sipdoc. 4 requirements derived. I think this is
covered by the current use cases document
<juberti> I agree, these data use cases should go into this doc.
<juberti> We only have one use case for data in the current doc.
stefan: draft-kaplan. Doesn't introduce new use cases but does put a
lot more requirements on the document.
... Questions/comments on the use cases?
DanD: Observation: augmented reality is not covered.
<francois> [20]Open issues on use cases and Req on WebRTC WG wiki
[20] http://www.w3.org/2011/04/webrtc/wiki/Main_Page#Use_cases_and_requirements
richt: we've been looking at that. We have the building blocks.
Would be good to have a use case on this.
DanD: that's covered in some of these use cases but maybe something
we could add
cullen: The ability to overlay a video stream on top of another
would be good.
richt: you could do it with canvas
cullen: that has a big security implication.
... will talk about it later on.
DanD: plus video might come from an ad-serving service.
fluffy: Back to the 1-800-FEDEX use case. Anything we can provide to
scope that out further?
stefan: not my area of specialty so feedback on this use case would
be good.
fluffy: The use case puts emphasis on DTMF.
burn: I agree that DTMF is extremely important. We have to support
DTMF.
stefan: let's take a break since we're waiting on next presenter.
Security requirements
Slides: [21]Security requirements
[21] http://svn.resiprocate.org/rep/ietf-drafts/ekr/tpac2011/rtcweb-security.pdf
ekr: The IETF is trying to work on threat models and security models.
I don't think we're at the consensus level yet, but here are the
directions.
... [showing slides]
ekr: Funny state: Browser threat model, browser protects you. It
includes the notion that you're in an Internet cafe. Basic security
technique is isolation.
... Site A and site B sandboxed.
... Browser acts as a trusted base.
... IETF adds the Internet threat model: "you hand the packets to
the attacker to deliver".
... In the IETF oriented view of the universe, cryptography is the
main technique.
... We can't force people to use cryptography all the time.
... We need a solid protection under the browser threat model, and
the best we can on the Internet threat model
... 3 main issues: 1) access to "local devices" (use my camera,
microphone)
... 2) Communications security. If we do our job right, we won't
have to worry too much about that here.
... 3) consent to communications, ties in with CORS, WebSockets
... Starting with access to local devices:
... If you visit a malicious site, you have no idea where your
video is going. It can bug you. Somehow we need the user to
consent, but it's not clear when, or how many times.
... One thing I do want to mention is that people make a distinction
between sending video to a site and sending video to another peer,
but from a technical perspective, they are the same.
... Permissions models: we need short-term permissions, click on a
button for an Amazon customer service. Not a long-term permission.
... Until last night, I thought we needed long-term permissions.
... Tim indicated that he was not sure browsers will want to do
this.
... Do you want to support long-term permissions? That's a question
for the group
burn: why isn't this just a browser policy question?
cullen: the question here is: is it a requirement for the group?
burn: went through it in another group. Informed user consent is
needed but can take the form of downloading the browser.
ekr: Then, there's the notion of per-peer permissions.
... Another example of the short-term case, showing an example of an
injected ad.
... [thoughts on UI for short-term permissions]
... This has implications for the API.
... user clicks and calls Ford, but he's on Slashdot
... Dialog showing video call. There needs to be a non-maskable
indicator of call status so that you know you're still on the call.
You need to be consistently aware that the call is going on.
... Access to microphone/camera linked with call permission.
... Back to the example, Slashdot might have to be able to say a
word.
... [thoughts on UI for long-term permissions]
... Interface should be different. Possible: door hanger style UI.
You want an action that is less easy for people to do during a call.
... There's a tension between convenience and security. It gives a
lot of power to the site.
... That's an open question whether we want to support that or not.
... The IETF has been assuming we want it, so that's great feedback
to have if we actually don't
... [thoughts on peer-identity based permissions]
<juberti> I think we want to find a way to handle this. We don't
want the web platform to miss something that will be present in
native app platforms.
cullen: what's important to you is where is that going. Our media is
going to a different place than the Web site. The identity is
important.
burn: same issue in the Speech XG.
hta: Usually, you can read the form and find the address in the
form, but sometimes the address is constructed by the JavaScript.
ekr: Partial digression on network attackers. If I'm in an Internet
Cafe, and an attacker manages to inject an Iframe, he can bug my
computer, redirecting the call to him. The attacker controls the
network on HTTP.
... Assumption is that it's safe to authorize PokerWeb and then surf
the Internet. It's basically the same on your Wifi if not secure
enough.
... An open question is: should this facility be available on HTTP
at all? Mandate HTTPS?
... e.g. an HTTPS page that loads jQuery through HTTP
DanD: not all the devices have the ability to securely preserve a
token. That would be a good way to solve the problem.
ekr: [thoughts on consent for real-time peer-to-peer communication]
... From a protocol point of view, we have ICE. Remember that you
cannot trust the JS.
burn: the point is you disabled security completely
ekr: I don't entirely agree that it's the same thing
... Transaction ID needs to be hidden from the JavaScript
... When I surf to HTTP gmail, any attacker can inject the
JavaScript and redirect calls for him.
... In the context of SIP, we've already addressed most of the
communications security issues.
... There's also protocol attack issue which hopefully should not be
a real problem in the end.
... otherwise security issue.
... Assuming that a ROAP-style API is used, it would be good to
hide security settings from the JavaScript.
AdamB: IDs might be owned by Facebook, and so on.
ekr: my view is: 3 basic scenarios. 1) Gmail to Gmail, Facebook to
Facebook, etc. 2) Gmail to Facebook, etc. where you'll need
federation of ID. 3) Identity separated from the service I use to
make the call.
... I have some possible solutions for that. Happy to discuss.
Cullen: My position is a bit stronger. This group wants encrypted
calls, but if you can't tell who the call is going to, that's
useless.
... We need to take that into account.
hta: for many cases, I think it's quite ok to say that the call is
encrypted to an identity and that this identity is verified by the
fact that the guy I talk to presented himself.
cullen: I want to know the trust chain. If this call is being
intercepted, I want to have some indication on that.
anant: slightly disagree with what Harald said.
... The federated use case.
burn: how do you know that things are going to the right person?
Anant: given that we have that use case in the document, we have to
touch upon that issue.
... We want a completely peer-to-peer system in the end.
ekr: Is there a good way to bootstrap these systems? I think the
answer is "yes".
Status and plans in the DAP WG
Stefan: wanted to know status of controlling camera and microphone.
robin: Hi. I'm chair of DAP. We need to figure out how we split the
work on who does what.
... We haven't done a lot of work on Media Capture recently.
... One dividing line that could be useful: DAP could be picking up
media capture very quickly, some interest from DAP side.
... We would do the simple thing that doesn't include streaming or
any complex processing.
... Then hopefully this would be pluggable in what this group needs
burn: what do you mean without streams?
robin: you could not bind a video stream to some back channel, but
you could do stuff such as video mail or recording.
... In the declarative style, most of it in the browser.
hta: main difference is who controls the UI.
Anant: If you're going to do programmatic access, important to agree
on what they look like between groups. Another solution is you take
care of declarative, and we handle programmatic way.
Anant: If you do programmatic way, we may end up with two APIs doing
sensibly the same thing
robin: heard feedback that some people wanted to do simple things
immediately.
anant: cannot "simple" be done with pure declarative approach?
robin: not really.
anant: something we've discussed in Mozilla. Media type in the
input, such as video/mp4. The browser prompts the user with a camera
view. A nice property is that it avoids having to deal with security
issues.
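The declarative idea anant describes could be sketched roughly like this; the markup shape and the `capture` attribute name are illustrative assumptions, not a settled spec:

```javascript
// Hypothetical sketch of the declarative capture idea: the page declares a
// media type on an <input>, and the UA itself drives the camera preview and
// permission prompt, so the page never touches the device directly.
// The "capture" attribute name here is an assumption for illustration.
function captureInputMarkup(mediaType) {
  return '<input type="file" accept="' + mediaType + '" capture>';
}

console.log(captureInputMarkup("video/mp4"));
// → <input type="file" accept="video/mp4" capture>
```

Because the UA owns the whole interaction, the security model collapses to the file-picker's: the page only ever sees the recorded result, never the live device.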
robin: it would be useful if you had a demo you could show in DAP.
We're meeting Thursday/Friday.
Adrian: Microsoft just joined DAP. One of our interests is media
capture. API based on what getUserMedia is doing. WebRTC could build
on top of this API. This way, we could split the work easily.
Anant: does that mean that you have use cases that require
programmatic APIs?
Adrian: yes, in general we want developers to build their own
experience.
Cullen: how do you deal with permissions?
Adrian: same way as other APIs
Cullen: agree with short-term, long-term permissions presented here?
Adrian: need to check, but didn't look wrong.
richt: in Opera, we agree that many use cases require getUserMedia
but we want to decouple that from peer-to-peer connectivity. So
agree to split things up.
Anant: can two groups work on the same spec?
Adrian: liaison explicit in the charter of WebRTC. Feasible for DAP
to own the spec and go through the liaison.
richt: Peer-to-peer relies on a stream. We give you a stream and you
deal with it.
Cullen: it's a bit more complex than that, because of the hardware
support for compression, and permissions too.
... It sounds like DAP needs a permissions model as well and doesn't
have one for the time being.
... We have all the permission problems that have to be enforced at
the getUserMedia level.
richt: the barcode scanner, face recognition use cases haven't been
taken up in the group.
cullen: I don't think anyone will disagree with these use cases
hta: want to make things more complicated ;)
... If you go on with the assumption that media is always sourced
locally, you're in the bad corner.
... As long as it's a media stream, the current getUserMedia doesn't
care where the stream is coming from. I look at it as a first and
easy step.
... thinking about Web Introducers.
robin: That's a DAP deliverable. I'd rather not drag this spec in
this discussion, although I agree it's a good way to make
introductions.
Anant: the resources you get are not more privileged.
hta: I was more thinking about my computer getting access to your
camera.
... We might want to explore deeper levels of complexity for passing
streams around at a later stage.
... In terms of where things go, the WebRTC WG is chartered to get
this thing done. The charter is written in such a way that if
someone else does it, that's good!
... What I don't want to happen is one group that comes with a
vocabulary that describes front camera, back camera, etc. and
another group coming with one on camera orientation, in particular.
dom: can getUserMedia be split from WebRTC spec in general?
Independently of where the final spec resides, that's something
people are interested in seeing sooner rather than later.
cullen: I'm just wondering how much faster things will be if we
split things out. Browser vendors in WebRTC already indicated their
intention to implement the spec.
dom: Implementations of getUserMedia in Opera.
hta: there's one in Chrome too, but it's part of RTC.
richt: we're going to push something out soon with getUserMedia.
burn: actually, it's a "super-subset".
Anant: if it's published as a separate spec, the use cases of
getUserMedia are a subset of use cases.
cullen: what I worry about is totally changing the directions we're
going to in something we're supposed to ship in a matter of months.
[discussion on Microsoft joining WebRTC]
dom: one way to have the IPR commitments that we want is to split
spec out.
... That means adding the SOTD, and accepting DAP's input.
Anant: if we start taking input from DAP, we're going to lose time.
dom: I don't think so, actually.
... nothing more than what we'll get with last call comments.
[further discussion on getUserMedia]
burn: this group wants to move forward very quickly. Others want it
for other purposes. Is there a way to do something quickly that does
not prevent other uses?
hta: getUserMedia returns a MediaStream, so MediaStream needs to be
defined before getUserMedia
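hta's ordering point is visible in the call shape itself: the success callback of getUserMedia hands the page a MediaStream, so MediaStream has to be specified first. A minimal sketch using a stub UA so the flow runs outside a browser; the callback-style signature follows the early drafts, but the stream fields here are assumptions:

```javascript
// Stub user agent: a real browser would prompt the user before granting.
const ua = {
  getUserMedia(options, onSuccess, onError) {
    if (!options.audio && !options.video) {
      onError(new Error("nothing requested"));
      return;
    }
    // The success callback receives a MediaStream-like object, which is
    // why MediaStream must be defined before getUserMedia can be.
    onSuccess({
      label: "stream-1",
      audioTracks: options.audio ? 1 : 0,
      videoTracks: options.video ? 1 : 0,
    });
  },
};

let granted;
ua.getUserMedia({ audio: true, video: true },
  (stream) => { granted = stream; },
  (err) => { console.error(err.message); });

console.log(granted.audioTracks + granted.videoTracks); // → 2
```

The same dependency would hold for a split spec: a standalone getUserMedia document still needs MediaStream defined somewhere it can reference.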
cullen: [back to hardware support for video compression]
... Lots of things are wrong and need to be fixed. We haven't
focused on this right now. I'd like to see use cases that we're
missing (yours are great, richt).
stefan: that's the direction I'd like to follow, yes.
burn: yes, would be good to have use cases to see what's missing.
richt: the only thing we get from going to DAP is extra IPR
coverage and comments.
cullen: is there a way to get comments early on?
adrian: there's a lot of process involved to get comments sent to a
group we're not participating in.
[discussion on IPR commitment]
robin: Nothing bad in splitting up and doing a joint deliverable.
dom: getting comments is something the group needs to do.
Suresh(RIM): so what happens to the draft in DAP's group?
robin: we'll kill it and keep the declarative one.
richt: it needs killing. Nothing happened on this spec for a year.
Stefan: so what do we need to do in the end?
dom: we need to ensure DAP agrees with that direction and then you
need to split up the part.
... The key question is where you draw the line. The administrative
side is easy.
Anant: Fine to reference WebRTC spec for definition of MediaStream?
dom: yes, but introduces a dependency in terms of timeline.
... Other question is editing.
cullen: I want someone with deep understanding of video
Adrian: we're happy to participate to make things easier since we're
making things more complex to start with.
robin: ready to volunteer an editor?
Adrian: I think so.
burn: if requirements are separable, that may be good to separate
them.
cullen: I think this group should agree on the mailing-list before
things get done.
Stefan: we have had chairs discussions earlier on.
richt: all of the work is staying in WebRTC in the end.
robin: all you get is better IPR protection and better comments.
cullen: important to put it on the list, first time people will hear
about it.
stefan: anyone objecting to have a joint deliverable?
PROPOSED RESOLUTION: split up getUserMedia and publish as joint
deliverable with DAP WG.
cullen: worried that joint deliverables always take longer.
robin: one thing that is important is to specify which mailing-list
takes discussions. We really should not have joint deliverable where
discussion is split in groups. Smallest issues turn into a war when
that happens.
<richt> proposal to RESOLUTION status: one/two week period for
mailing list discussion. Resolution to be made on next conf. call.
(?)
cullen: this whole thing is an integrated system. It's going to be
very difficult to discuss this without discussing other ideas.
dom: I think the key issue is splitting the spec, not the joint
deliverable.
robin: if we can't split the discussion, then we probably can't
split the spec.
burn: question is: can we write WebRTC requirements for getUserMedia
precisely enough for this virtual joint working group.
cullen: you'll need so many low-level details in getUserMedia
robin: two actions: one on splitting the spec, second on refining
joint proposal.
<scribe> ACTION: anant to check how to split getUserMedia from the
spec [recorded in
[22]http://www.w3.org/2011/10/31-webrtc-minutes.html#action01]
<trackbot> Created ACTION-8 - Check how to split getUserMedia from
the spec [on Anant Narayanan - due 2011-11-07].
<scribe> ACTION: robin to draft a proposal for a joint
deliverable. [recorded in
[23]http://www.w3.org/2011/10/31-webrtc-minutes.html#action02]
<trackbot> Sorry, couldn't find user - robin
burn: Adrian, do you actually need to see something pulled out first
before you can help out?
Adrian: we can help with splitting out the spec, I think.
burn: it's more a practical question, given the way editors work in
WebRTC.
cullen: can someone send use cases on one of the mailing-lists?
<scribe> ACTION: tibbett to send new use cases on getUserMedia to
webRTC mailing-list [recorded in
[24]http://www.w3.org/2011/10/31-webrtc-minutes.html#action04]
<trackbot> Created ACTION-9 - Send new use cases on getUserMedia to
webRTC mailing-list [on Richard Tibbett - due 2011-11-07].
[discussion on DAP interaction over]
Access control model and privacy/security aspects
Slides: [25]WebRTC: User Security and Privacy
[25] http://www.w3.org/2011/04/webrtc/wiki/images/7/73/Webrtc_privacy.pdf
anant: currently don't specify what happens with user permission
when using getUserMedia
... UAs vary, so may not be appropriate to define a standard for
permissions
... propose we write guidelines for browsers rather than something
mandated
richt: this is definitely difficult to get right. The UA should
provide an opt-in.
francois: typically such SHOULD requirements aren't testable so they
become guidelines in the end
... there is a way to make such informative statements
hta: browser differentiation is harmful to users. we have enough
browser representation here to figure out where we have agreement
and should have recommendations that reduce unnecessary
differentiation
richt: we don't mention doorhangers because there is a lot more that
can be done.
fluffy: we can say "browser needs to somehow do X" without
specifying precisely how.
... if completely optional no one implements. we can learn from
existing softphones, etc. I like the "check my hair" dialog, a UA
where there is a popup that tells you you're sending video and who
you're sending it to. PeerConnection could confirm that this is
correct.
(UA = User Agent = Browser)
fluffy: e.g. JS can select the camera and provide the name of the
contact, which is displayed at the same time.
... can't check before connection happens, but later can cancel if
PeerConnection learns name is wrong
anant: mandate requirements on UI but not how to do it.
burn: +1
anant: hta believes that an opera user contacts a chrome user, so
differences could be confusing. right?
francois: some apps will use getUserMedia to send it, and others
will use it for local purposes, so needs are different
anant: maybe app has to make clear what media will be used for.
francois: user might have consented to call in advance of using
getusermedia
anant: we can check for stored permission
... do we have consensus to lay out steps but not specify how?
(generally yes)
richt: not sure. we don't know what we need to show yet
anant: we know some things, like previewing video
richt: anything that doesn't affect interop should not be required
fluffy: where we need encrypted name we need to require this
richt: let's not bake in too quickly because we are still
experimenting
fluffy: today we support encrypted media (but not yet required).
problem would be like using TLS but not showing name of site.
anant: we need global identifiers
adambe: with p2p may not know all names in advance.
anant: UI for accepting and initiating calls may be very different
adambe: what about two people talking and a third joins? media
streams already available.
fluffy: same problem if you have a single conversation moved from
one endpoint to another
hta: good to discuss, but don't agree with cullen's request to
mandate requirements. want to hear about stuff other than just
names
anant: (returning to slides)
... do we allow apps to enumerate devices? no, would like for app to
request what it needs (say, hints proposal).
... if the user agrees, we fire the success callback.
... user should always have complete control over what is
transmitted, independent of what the app asks for
adambe: with proper hints you need to enumerate and can get same
result. prefer hints approach
fluffy: every app i use for voice and video allows me to switch
cameras and mics. how does that work
anant: don't want app to choose switching, but want user to be able
to switch
... UI has to be in the UA, independent of the app
burn: in html speech we have notion of default mic. app doesn't
choose, the user does via the chrome.
fluffy: yes, happens all the time. i'm using existing crummy mic or
camera, go find a better one and plug it in.
Tim: others want to know what's available in advance so you don't
even present an option if it doesn't exist
anant: hints can solve this. some hints are compulsory, others
optional.
francois: can't app just check?
anant: this way doesn't reveal info about user.
burn: failures give user info
richt: yes, hints are good. web app doesn't need to know which
camera.
<richt> webapps provide a hint in the true sense of the word but the
impl. can fallback to any camera if necessary (rather than fail).
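richt's hint-with-fallback idea could be sketched as follows; the device and hint object shapes are hypothetical, and the single generic failure mirrors the anti-fingerprinting point made in this discussion:

```javascript
// Resolve a camera hint against the devices the UA knows about. An
// optional hint falls back to any available camera rather than failing,
// so the page learns nothing about which devices exist; only a
// compulsory hint that cannot be met produces a failure, and it is a
// single generic one regardless of the cause.
function resolveHint(devices, hint) {
  const match = devices.find((d) => d.facing === hint.facing);
  if (match) return match;
  if (hint.compulsory) return null; // one generic error, whatever the reason
  return devices[0] || null;        // any camera rather than a failure
}

const devices = [{ facing: "back" }];
console.log(resolveHint(devices, { facing: "front", compulsory: false }).facing); // → back
console.log(resolveHint(devices, { facing: "front", compulsory: true }));         // → null
```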
<francois> francois: exposing capabilities is a fingerprinting
issue. Exposing "incapabilities" is as well.
anant: the comment was that UIs are best when they know what devices
are available
<francois> anant: right, the key is the time it takes, so that the
app can't tell whether a failure is due to an incapability or to a
user action.
hta: if you don't know what's available you can't distinguish
between "you need more cameras to run this app" and "you need to
allow me to use more cameras"
richt: we can't allow fingerprinting
... one error, regardless of how it fails
fluffy: when would you need a case where you'd rather have a failure
than use a hint?
... would rather feed one camera into both than a failure
anant: (back to slides, showing early mockup)
... doorhanger hanging off info bar indicates that it's a web app
rather than the browser. don't like this approach, but best so far.
... we have "hair check", live preview of camera before
communication is active. can mute audio, click to share cameras
... webcam button on address bar gives you options to change cameras
(in UI, part of browser)
adambe: what about a webcam with a microphone built into it
anant: we should allow it, but may be an advanced checkbox. want 95%
of use cases to be handled
fluffy: users need to be able to change where sounds come from
and where they go.
... we will see this more and more as you have more devices. "skype
headset" and "facebook headset"
richt: what about tabbing implications? when you switch tabs you
need to know what happens
anant: will get to that
... (back to slides) default to what app asks for but users can
always override
... preferences pane to control all
anant: mockup used one-time permission grant model
... we allow user to say "always allow example.org to access a/v"
tim: if browser on phone and in pocket and permission has been
given, app could just turn it on in my pocket. accelerometer info
can tell you that the person is walking (and may have in pocket).
richt: we will use some kind of visual and/or vibration to indicate
anant: we need something because users won't want to click every
time at facebook
richt: we could try to learn it based on user behavior
fluffy: from privacy standpoint, webex on your phone and laptop
could do this today.
... it always starts with strong privacy position and eventually
disappears to no privacy. better to have something only strong
enough that it is still used
... indicators are probably more important than prevention
... anything stronger than this will be widely ignored.
richt: that is already in spec
anant: maybe we should also have vibration or audio indication
stefan: how is this compared to geolocation
richt: we are 10, they are 2
adambe: like "watch position" but without user knowing
anant: need to let the user know that a previously-given permission
is now being used
fluffy: users hate apps that grab the device and turn on the
indicator. it needs to happen only when the device is used.
anant: in today's world we won't exclusively grab device that way
anymore
... should web app be able to specify what type of access it needs?
richt: user should always be in control
francois: maybe the app could say instead when it doesn't need
long-term access
hta: option of granting long-term access only the second time you
try it has worked well
anant: (back to slides) initially tried to tie the permission grant
to a time-frame and domain name.
... developers hated this. want permissions tied to user session,
not just domain
... could perhaps allow app itself to revoke a permission if it
detects a change in user session.
fluffy: can the JS app provide a user-identifying token, so we can
index using both user criteria and this token?
anant: yes, as optional param in JS call. could try it.
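fluffy's suggestion amounts to keying stored grants on both the origin and an app-supplied, user-identifying token, so a long-term grant on a shared machine doesn't leak from one user to another. A sketch under those assumptions (all names hypothetical):

```javascript
// Long-term permission store keyed by (origin, token). The token is an
// optional opaque value the JS app passes in; without one, grants fall
// back to being per-origin only, as before.
const grants = new Set();
const grantKey = (origin, token) => origin + "|" + (token || "");

function rememberGrant(origin, token) { grants.add(grantKey(origin, token)); }
function hasGrant(origin, token) { return grants.has(grantKey(origin, token)); }

rememberGrant("https://example.org", "alice");
console.log(hasGrant("https://example.org", "alice")); // → true
console.log(hasGrant("https://example.org", "bob"));   // → false
```

As noted in the discussion, the token is opaque to the browser (it never inspects the cookie), which is why it has to come from the app; the trade-off is that only cooperating "good guy" sites will bother to supply it.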
richt: browser can handle this since it runs session.
anant: we don't know what's in cookie, so no.
... but most websites won't use it.
burn: financial sites will like this.
fluffy: bad guys don't care but helps good guys ==> okay
richt: if you injected script that just replays in different domain
you can get permission easily
anant: how
richt: user-installed script
anant: yeah, but then you can do anything
... (back to slides, showing mockup of notification)
... one option is the entire tab pulses, with camera/mic control
right on tab.
richt: we pin audio/video. user has to explicitly request keeping
it.
hta: needs to be in spec
... switch tabs all the time and want my voice to be heard
anant: tricky across all UIs, including video phone
fluffy: something unspecified that irritates users is whether a
video starts playing when you open a new tab. we should make this
the same everywhere
anant: we browser vendors need to work this out.
... prefer default of not blocking audio/video just because you
switched tabs. if new tab wants to start video, should ask user.
richt: but may be hard to tell which tab has audio/video
anant: if whole tab pulses it works
... (back to summary slide) what happens if device already in use by
other app
... maybe can't tell which app is requesting access
... what is interaction for incoming call. assume signed in to
service to receive call/audio
fluffy: yes, but others might want web apps that run in the
background and have no bar (headless web apps)
hta: if a headless web app reads SDP off disk and passes it into
PeerConnection, it should just work, with no browser connection.
anant: so we should allow headless apps and let browser determine
how incoming call works.
fluffy: some chrome has to be involved when video is requested.
anant: yes. js can tell user about incoming call, but then need to
get permission.
hta: gum (getUserMedia) should have enough info to identify where
call is from
... apps will want "one button accept". can't avoid showing some
chrome. would be better for that to be the doorhanger. need an
extra API call so the web app calls the receiver's browser and asks
if they want to accept. then get doorhanger.
oops, previous speaker was anant
richt: (missed detailed example)
fluffy: sometimes want long-term approval to at least negotiate and
reveal IP address. also a different mode where don't reveal IP
address until user has accepted.
... first one allows you to deal with ICE slowness by doing ICE and
acceptance in parallel.
francois: users won't understand this distinction.
fluffy: okay, then maybe don't need first case.
anant: we don't know how to implement incoming call.
richt: can do OS-level notification
anant: yes, but also want to give all the user controls when
accepting call.
... other questions (not on slides)
... what about embedded iframes? we don't allow anything other than
toplevel to do that. an iframe would have to pop up its own toplevel
window to do this.
richt: what happens with geolocation?
anant: we don't do the same but would like to
... other use case is where ad is embedded in slashdot. In that case
slashdot is accepting responsibility and you are giving permission
to slashdot.
richt: iframes from different origin
anant: yes. if same origin we just let them through.
(general approval of this approach)
adambe: what about a call-in widget you can add to a page?
anant: can't avoid this.
adambe: could sandbox the iframe.
anant: problem is that the user doesn't know it's a different site.
... when new top bar user can tell
... also, only allow long-term approval for https
... don't enforce https for all uses, but definitely if site wants
long-term access
fluffy: what about mixed content
... will probably need more discussion. everyone will hate requiring
https, but they may realize they need it.
... difficulty today requiring https is that many sites would break
today. but with new sites where everything needs to be built from
scratch, like with webrtc, we could require it now. we should
consider it.
... but we need more info.
richt: could do TLS in JS, so that might take care of it
hta: that's giving JS direct access to TCP
... JS should not have this power!
Stages for moving to a Rec
Slides: [26]W3C Recommendation Track
[26] http://www.w3.org/2011/04/webrtc/wiki/images/5/5c/Webrtc_w3c_rec_track.pdf
Moving on to Dan talking about W3C Recommendation practice
... [discussion on consensus, moving to First Public Working Draft,
periodic publication]
... Good to reach out to groups with opinions early.
... On the "Candidate Recommendation" slide: at this stage, you
defend that the document meets its requirements - at this point,
you need to have a test suite that tests the spec, not the
implementations.
anant: Is this code and what is it run against?
francois: There is another group trying to come up with generic test
framework that can be used
... Should think about how to write a testable specification when
you write the spec
Dan: great to have the spec working be the same as the assertion
code in test
Can two implementations share code? If there's a good answer,
perhaps OK, but ...
Sometimes there are single implementations of optional features
Dan: on to Proposed Recommendation slide
francois: This is stage where W3C members have their last chance to
comment
Dan: On to Recommendation slide
anant: How do we deal with a later version of the spec, for
features we wanted in a later version?
francois: Need to recharter the WG and go through the same process.
... there is also a Proposed Edited Recommendation to include
errata (not very common)
Dan: On to addressing public comments
Harald: What's the process when we can't agree?
francois: The group is strongly encouraged to avoid such situations.
Comments can get escalated as formal objection that goes up to W3C
Director.
Dan: On to Status of WebRTC API draft slide
richard: should we stay at Candidate Recommendation for a year or so?
Dan: better to have exit criteria - such as meeting a given number
of implementations
Dan: the two specs are on their own timelines, apart from whatever
reference dependencies exist
anant: If we are doing two specs, should we push out our dates
beyond Q2 ?
francois: at the point we know we won't make it, we will need to
update
Low Level Control
Slides: [27]Low-level control
[27] http://www.w3.org/2011/04/webrtc/wiki/images/f/f6/Webrtc_lowlevel.pdf
Moving to low-level control presentation by burn
Dan: the original proposal for a low level API (link in slide 2)
received limited discussion and little support in the IETF's
signaling API discussions
... But there is some interest in a low level API
... Look at [28]requirements document (IETF) by hadriel to drive
discussion
... Hints vs Capabilities will be an interesting discussion
... Some discussion now but we should move it to list soon
... Existing requirements are not at the same level (they are
higher level) than what we want for low level hints and capabilities
... Browser UI requirements are things we've discussed and should
move into the current document
[28] http://tools.ietf.org/html/draft-kaplan-rtcweb-api-reqs-00
Dan: Media properties are the interesting ones
... A2-1 a web API to learn what codecs a browser supports
anant: How does this relate to JS application-level
decoders/encoders?
fluffy: that's independent of an API that exposes what codecs the
browser takes
tim: the API can only be used after the user has consented, so
there's already some trust in the app
fluffy: we should go through all of the requirements
<juberti> regarding fingerprinting, aren't we sending user-agent
already
<derf> We've (jokingly) discussed replacing the user-agent with an
empty string.
<juberti> i think there are enough implementation differences that
fingerprinting can be done using existing apis.
juberti: need to be able to query browser capabilities so that JS
can generate SDP on its own
(without user consent?)
<juberti> user consent is ok
<juberti> this would happen around the same time as camera access
<derf> But if you're going to have hardware codecs, capabilities can
differ even with the same UA.
<juberti> the thought experiment here is whether it would be
possible to fully implement signaling, except for telling the
browser what the offer and answer are.
<juberti> (fully implement signaling in JS)
<ekr> there are a lot of fingerprinting mechanisms out there. is
this really making it worse?
tim: but how can you restrict information if you want JS to
encode/decode (eg: hardware support for some codecs at certain
resolutions)
<derf> ekr: It clearly makes it worse. The question is, is it worth
the price?
<juberti> I don't like having to expose a billion knobs to JS, but
if we can give the browser a SDP blob from JS, that might allow a
flexible but simple compromise.
harald: if you negotiate on the principle that SDP is generated
independently of setting up media streams then you don't need
permission - there are use cases for that
<juberti> to generate said blob, we need to know what the browser
supports.
<derf> juberti: Sounds like you're asking to give the browser an
ANSWER from JS, and you want an OFFER in order to generate it.
<derf> Or did I miss what you were really asking?
<juberti> derf: I want to generate an OFFER in JS. I send the offer
to the remote side, and also tell my own browser about it. The
remote side generates an ANSWER in JS from the OFFER, tells the
browser about both, and sends the ANSWER back to the initiator. The
initiator then plugs the received ANSWER into the browser, and media
flows.
<derf> juberti: Okay. Why can't you do that with ROAP today?
<juberti> a) you can't generate the OFFER, since you don't know the
browser caps. b) even if you could generate your own offer, there's
no way to tell the local browser about it. lastly, the state machine
for ROAP lives inside the browser, so the JS can only do what ROAP
allows (i.e. no trickle candidates like Jingle)
<ekr> clarification: trickle candidates is candidates in pieces like
with Jingle transport-info?
<juberti> ekr: exactly
<derf> juberti: fluffy is saying what I would have replied to you
right now.
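The offer/answer flow juberti describes above can be sketched with plain objects standing in for the two browsers. Every name here (`makeBrowser`, `initiate`, `answer`, `complete`) is a hypothetical illustration of the flow, not a proposed API:

```javascript
// Mock "browser" that simply records the offer/answer it is told about.
// In juberti's flow the JS generates the OFFER itself, sends it to the
// remote side, and tells its own browser; the remote JS generates the
// ANSWER, tells its browser about both, and returns the answer.
function makeBrowser(name) {
  return { name, localDescription: null, remoteDescription: null };
}

function initiate(localBrowser, makeOffer) {
  const offer = makeOffer();             // JS-generated OFFER
  localBrowser.localDescription = offer; // tell our own browser about it
  return offer;                          // ...and signal it to the peer
}

function answer(remoteBrowser, offer, makeAnswer) {
  remoteBrowser.remoteDescription = offer;
  const ans = makeAnswer(offer);         // JS-generated ANSWER
  remoteBrowser.localDescription = ans;  // remote browser told about both
  return ans;                            // signalled back to the initiator
}

function complete(localBrowser, ans) {
  localBrowser.remoteDescription = ans;  // media can now flow
}

const alice = makeBrowser("alice");
const bob = makeBrowser("bob");
const offer = initiate(alice, () => "OFFER-sdp");
const ans = answer(bob, offer, (o) => "ANSWER-to-" + o);
complete(alice, ans);
```

Note that this is exactly the part juberti says ROAP cannot do today: generating the offer in JS and telling the local browser about it.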
dan: we jumped from A2-2 to A2-3, but they both look like they go
together
fluffy: what is the use case for knowing codec properties? it only
makes sense if you can control the properties
<Mani> would it be more appropriate to require that the capabilities
described should be consistent with the capneg RFC5939 security
properties?
adam: is A2-2/A2-3 a codec abstraction of some kind?
harald: you want to select the best possible codec for a given
bandwidth requirement
harald: different for video and images etc.
richt: considering whether you can update the SDP proposal the
browser sends to the JS directly through JavaScript
cullen: when we get to ROAP, we'll see that it's possible.
anant: in order for JavaScript to add things to SDP, it needs to be
able to query.
cullen: if the browser supports stuff that it didn't say it
supports, then it's only normal that you cannot use it.
... I think you're going to get that one way or the other, so not
opposed to an API.
hta: we don't have an opaque proposal between browsers right now.
cullen: in the SIP proposal, you do
hta: cannot be used to setup the initial connection
<ekr> SIP isn't really opaque, it just looks opaque.
cullen: if we're trying to protect from fingerprinting, we need to
know what kind of information we think we can reveal.
anant: hardware information is the critical key
... Easy to identify who the user is with some nuances on hardware
capabilities.
hta: are we making it worse in a way that makes a difference?
that's the question.
[exchanges about fingerprinting]
cullen: my guess is that even if fingerprinting reveals that I'm
using a MacBook Air, that's still a large set.
<ekr> there's a lot more uniqueness than that. For instance, window
size, fonts, plugin support, etc.
<ekr> Important to distinguish between new capabilities that expose
more information to the server versus capabilities that expose info
to the peer.
burn: going through the requirements provides food for thought on
the relevant issues.
hta: looking at A2-4, in many scenarios, the application is the best
place to know what can be cut off.
... e.g. stop sending video that's not crucial for this
communication.
cullen: I would be very concerned if the congestion control loop was
done in JavaScript.
hta: my thinking is that, in the case when the message is "no way to
get more than 100Kb/s through", the app can react and select the
streams it wants to send.
... then the browser can take it from there.
cullen: level of control in JavaScript is: on/off, framerate,
bandwidth... a slippery slope.
... Where do we draw the line?
... Implementation experience will teach us a lot here.
hta: I very much agree with that.
anant: declarative approach could work, e.g. "please turn on the
stream at this bitrate"
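hta's and anant's point above — the browser reports an available-bandwidth figure and the app, not a JS congestion loop, declaratively picks which streams to keep — could look something like this. The stream list, the priority field, and the `selectStreams` helper are all hypothetical illustrations:

```javascript
// Hypothetical app-side reaction to a "no more than X kbps gets
// through" signal from the browser: keep streams in priority order
// until the bandwidth budget is spent. Congestion control itself
// stays in the browser; the app only chooses what to send.
function selectStreams(streams, availableKbps) {
  const sorted = [...streams].sort((a, b) => a.priority - b.priority);
  const kept = [];
  let budget = availableKbps;
  for (const s of sorted) {
    if (s.kbps <= budget) {
      kept.push(s.name);
      budget -= s.kbps;
    }
  }
  return kept;
}

const streams = [
  { name: "audio", kbps: 40, priority: 0 },  // most important
  { name: "video", kbps: 300, priority: 1 },
  { name: "screen", kbps: 200, priority: 2 },
];

// Browser signals only 100 kbps available: audio fits, the rest do not.
const kept = selectStreams(streams, 100);
```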
burn: moving on to audio levels in streams, requirements A2-8 and
A2-9
cullen: there's a security implication, I think. An attacker can
detect volume, and could perhaps derive words from that.
[moving on to A3-x requirements]
<ekr> cullen: depends on granularity with which it is reported
cullen: getting for SSRC and CNAME is good. Setting is more of an
issue.
hta: what if you negotiate the Payload Type value and then change it
afterwards?
... I don't see a reason to allow an API to do something that is not
useful.
burn: A3-4 is basically already possible.
anant: what does it mean to set the audio and video codecs of
streams you receive?
... At the point of rendering, it's too late.
hta: taking A3-4, A3-5, A3-6, A3-7 and A3-8 together, it amounts
to "the application must be able to configure a media stream across
RTP sessions".
... I don't think that's the right approach, but I'd actually
prefer to see a requirement like that.
<juberti> for receive codecs, you might choose to change the PT
mapping.
<juberti> and you'd need to tell the media layer about that.
[discussion on A3-10 and A3-11, same in requirements although not as
low-level]
anant: do we have use cases that we can map to these requirements?
That would be useful.
burn: there was some general description that provided context for
these. I didn't want to read it here.
anant: it would be easier to get it into the spec if these
requirements were motivated by actual use cases.
... We should get more specific about the level of extensibility we
need.
burn: there is a list in section 3 of this document. It explains
what the problems are
anant: not convinced by argument 6) (some Web application developers
may prefer to make the decision of which codecs/media-properties).
... don't see why you need to involve the server at all.
hta: it's clear that we don't have general agreement on how this is
phrased.
... let's wrap this up.
burn: Moving on to the hints API, last discussed on the
mailing-list. A simple example is "audioType: 'spoken' or 'music'"
... question is which level of detail.
... Agreement that this is needed.
... Question is do we need an API for that?
anant: new things will keep coming. Extensibility is needed.
cullen: agree.
... IANA registry could be used, I think.
burn: problem in other groups is knowing the IETF process. Won't be
a problem here.
hta: we have to define some kind of namespaces for hints. Just one
level, multiple levels, strings, tokenized, etc.
DanD: two things, structure and semantics.
burn: someone may want to propose finer granularity that you want to
relate to other values.
... in the end, they are hints, so it doesn't matter so much. If you
give something that is general, and something that is specific, you
don't know what you're going to end up with.
adam: side comment that the hints should be an optional argument to
addStream.
[agreed]
stefan: we should reuse MediaStreamHints object for getUserMedia
anant: true.
hta: having just one registry is probably ok. For video, you could
have a hint saying low resolution.
burn: one registry makes sense.
anant: different object but same values
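A hypothetical sketch of what was just agreed — hints as an optional argument to addStream, drawing values from a single shared registry also usable by getUserMedia. The registry contents, `normalizeHints`, and the object shapes are illustrative assumptions, not spec text:

```javascript
// Hypothetical flat registry of hint values (one registry, as agreed).
const HINT_REGISTRY = new Set(["spoken", "music", "low-resolution"]);

// Minimal MediaStreamHints filter: unknown values are silently dropped,
// since hints are advisory and must never break the call.
function normalizeHints(hints) {
  const accepted = {};
  for (const [key, value] of Object.entries(hints || {})) {
    if (HINT_REGISTRY.has(value)) accepted[key] = value;
  }
  return accepted;
}

// Sketch of addStream taking hints as an optional second argument.
function addStream(peerConnection, stream, hints) {
  peerConnection.streams.push({ stream, hints: normalizeHints(hints) });
}

const pc = { streams: [] };
// "hologram" is not in the registry, so only "spoken" survives.
addStream(pc, { id: "mic" }, { audioType: "spoken", videoType: "hologram" });
```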
burn: moving on to Statistics API.
... MediaStream.getStats()
DanD: where do you specify the timeframe for those statistics?
... maybe just "what the system knows".
<derf> burn: Just a nit... if your processingDelay is 20 ms, I
expect your framerate is 50 fps.
cullen: agree. Maybe we can steal this from the IETF XRBLOCK WG
hta: the caller can always call the function twice and check
difference.
... just return total, and the time you think it is at the time when
the function is called. Then easier to compute average.
DanD: important for that to be extensible.
cullen: there need to be some stats that are mandatory to support.
Multiple layers of stats are possible.
... any structure you put in there is not really useful; you have
to know the property.
hta: structure might buy you some namespacing.
... Same property may be defined in different areas, so prefixing
might be good.
burn: I'm not hearing any disagreement here.
hta: I note the devil's in the details.
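hta's earlier suggestion for the statistics API — return cumulative totals plus the time they were sampled, and let the caller call getStats() twice and diff the snapshots to get an average — can be sketched as follows. The snapshot shape (`bytesSent`, `timestampMs`) is a hypothetical assumption:

```javascript
// Hypothetical cumulative stats snapshot: the browser only reports
// running totals plus the time they were sampled.
function makeSnapshot(bytesSent, timestampMs) {
  return { bytesSent, timestampMs };
}

// The caller computes an average bitrate by diffing two snapshots,
// e.g. from two successive getStats() calls.
function averageBitrateKbps(older, newer) {
  const deltaBytes = newer.bytesSent - older.bytesSent;
  const deltaSeconds = (newer.timestampMs - older.timestampMs) / 1000;
  if (deltaSeconds <= 0) return 0; // degenerate or reversed window
  return (deltaBytes * 8) / 1000 / deltaSeconds; // kilobits per second
}

const first = makeSnapshot(1000000, 10000);
const second = makeSnapshot(1500000, 20000);
// 500000 bytes over 10 s => 400 kbps
const rate = averageBitrateKbps(first, second);
```

This keeps the browser side trivial (no timeframe parameter, as DanD asked about) while still letting apps compute averages over any window they choose.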
burn: then, moving on to Capabilities API
... ROAP proposes to get an SDP blob back.
... getCapabilities() would return an SDP blob.
... It's using the syntax to represent capabilities
cullen: let's take fingerprinting off the table for a second. This
seems to make sense, though it may not be the syntax you could dream
about to list codecs you support.
... This seems to give you all the information.
anant: why do you need this info in advance?
... more reliable to wait until getUserMedia. No guarantee you'll
get video when the call is made.
DanD: I would render a different UI if I know video is not
available.
anant: you could do that later on.
cullen: lots of applications grey out the video when it's not
available, for instance.
... use case for "video", not specific codec.
DanD: on a mobile device, I may present a widget on the screen if I
know I have video support.
anant: I understand the argument. I don't like it because you need
to gracefully handle the case when video is not available in any
case.
Tim: the expectation is that it would be rare.
hta: you should be able to set a callback that "if capabilities
change, I want to know"
cullen: right.
... First, is video available? Then, can someone come up with a use
case for more detailed info?
[more discussion on fingerprinting, if you know when the camera
comes in, you can correlate the user on Facebook and Google+, for
instance]
burn: general interest in something like this, except
getCapabilities early on and then callbacks.
anant: we can figure out later on if it's callback or event.
... we're going to try what Cullen suggests: simple audio/video,
then if someone comes up with a use case for more, we'll add more.
DanD: good, but let's not restrict. Extensibility would be good, not
to change the spec afterwards.
burn: suggests that the browser simply lies about more specific
parameters.
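A rough sketch of the capabilities direction agreed here — coarse audio/video availability only, plus a change callback per hta's request. The watcher object and all its names are entirely hypothetical:

```javascript
// Hypothetical coarse capabilities source: just "is audio/video
// available?" (Cullen's suggestion), with a change callback (hta's).
function makeCapabilityWatcher() {
  const state = { audio: true, video: false };
  const listeners = [];
  return {
    getCapabilities() {
      // Return a copy so callers cannot mutate internal state.
      return { audio: state.audio, video: state.video };
    },
    oncapabilitieschange(cb) {
      listeners.push(cb);
    },
    // Simulates the browser noticing a device change
    // (e.g. a camera being plugged in).
    _deviceChanged(kind, available) {
      state[kind] = available;
      for (const cb of listeners) {
        cb({ audio: state.audio, video: state.video });
      }
    },
  };
}

const watcher = makeCapabilityWatcher();
let lastSeen = watcher.getCapabilities(); // { audio: true, video: false }
watcher.oncapabilitieschange((caps) => { lastSeen = caps; });
watcher._deviceChanged("video", true);    // camera plugged in
```

Whether the notification ends up as a callback or an event is deliberately left open, matching anant's comment above.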
... 3 APIs presented here. Who's gonna do this?
cullen: happy to work on the callback, with Anant's help.
burn: will work on the hints API
cullen: all three of them assigned to the spec editors.
Data Streams
Slides: [29]WebRTC Data Streams
[29] https://docs.google.com/presentation/pub?id=10OpPqGB2hhXxMFLeqok5wrwL10oDzUK4Vq7hqy_N5pc&start=false
juberti: There are use cases for unreliable data
... Need for the datachannel for mesh apps
... Encryption should be required for the data channel
... Design for DataStream should be similar to MediaStream
... there is no need for inheritance between DataStream and
MediaStream
... We'll use the same flow as in MediaStream to attach to the
PeerConnection, instead of an atomic flow
fluffy: I like this proposal. I think the priority needs to be
addressed as people tend to set priority high.
juberti: We can keep it very high level with specific enumerations
fluffy: Trying to come up with some other prioritization ideas
anant: What is the use case for the readyToSend?
juberti: Application should have some notion of the flow stage
... You need to know if you have buffer available
anant: we should align this with WebSockets
fluffy: we need flow control for a large transfer
hta: the JS app has the concept of blocking
anant: What if the developer wants to block?
Adam: It can't
anant: API looks good
... How about security considerations?
... how do you know who's on the other side
fluffy: You would have been able to send this anyway
anant: what are the different attack possibilities? Should be
captured
juberti: What's unique is that you can send it in peer to peer way.
No server involved
hta: You said data must be encrypted
... being encrypted will take care of some concerns
... it would make more sense to have a constructor by itself and
then attach it to a PeerConnection
Milan: Question about ack
juberti: The choices considered for the wire protocol make it useful
Milan: Protocol has an ack and it doesn't need to be exposed
... an example with the ack would be useful to understand
juberti: I'll take it as an action point
Stefan: we can conclude this session
juberti: I'll have it updated and sent to the mailing list for
review
fluffy: this is just the API proposal not the actual implementation,
right?
... We're moving along with this until we figure out the
implementation.
juberti: Requirements came from the wire protocol
fluffy: looks good. Can we build it?
... That's what I'm concerned about, and maybe we should relax our
requirements
<francois> [ref possible alignment with WebSockets, perhaps change
"sendMessage" to "send"]
francois: there's a process called feature at risk
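A hedged sketch of the DataStream shape discussed in this session: WebSocket-style `send()` and `bufferedAmount` (per the alignment suggestion above), standalone construction followed by attachment to a PeerConnection (per hta). Every name here is a hypothetical illustration:

```javascript
// Hypothetical DataStream modeled on WebSocket: send(), bufferedAmount,
// a reliable/unreliable option, constructed standalone and then
// attached to a peer connection.
class DataStream {
  constructor(label, options) {
    this.label = label;
    this.reliable = options ? options.reliable !== false : true;
    this.bufferedAmount = 0;
    this.attached = false;
    this.sent = []; // mock: record what was "sent"
  }
  // WebSocket-style send() rather than sendMessage().
  send(data) {
    if (!this.attached) {
      throw new Error("DataStream not attached to a PeerConnection");
    }
    this.bufferedAmount += data.length;
    this.sent.push(data);
  }
}

// Constructor-then-attach flow, mirroring how MediaStreams are added.
function attachDataStream(peerConnection, stream) {
  peerConnection.dataStreams.push(stream);
  stream.attached = true;
}

const pc = { dataStreams: [] };
const chat = new DataStream("chat", { reliable: false }); // unreliable data
attachDataStream(pc, chat);
chat.send("hello");
```

Tracking `bufferedAmount` gives the app the flow-control signal fluffy asks for on large transfers, without exposing the protocol-level acks Milan mentions.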
MediaStream
Slides: [30]MediaStream slides (odp format)
[30] http://www.w3.org/2011/04/webrtc/wiki/images/1/1c/MediaStream_TPAC_2011.odp
[going through slides]
cullen: why do audio tracks come first?
adam: if the last track is not a video track, you can assume there's
no video in there.
... there used to be 2 lists.
anant: the order doesn't have to correspond to anything.
cullen: there's another ordering in SDP.
anant: not related.
cullen: wondering whether that ordering could be the same.
... just strikes me as something weird.
DanD: think we should be explicit that the order does not have to
match that of SDP
anant: the only people who have to worry about that is browser
vendors, no need to be exposed to users.
stefan: I liked it better when there were two different lists.
adam: it was easier to query whether there is audio or video.
... Moving on to definitions.
... MediaStream represents stream of media data. Do I need to go
through it?
cullen: I find this definition fascinating. Can you have stereo
audio in two tracks? Are voice and video one track? Audio and DTMF?
No idea.
anant: a track is the lowest you can go. Having 5.1 audio in one
track looks weird.
<juberti> what about comfort noise?
<juberti> is that the same track as audio?
cullen: need some group for synchronization, but separate thing.
anant: getObjectURL function is on the MediaStream, right? When you
assign a stream to a video element.
cullen: presumably, if I have a stream with 3 video tracks, I want
to send it to 3 different video elements.
anant: media fragment could be used to select the track you're
interested in.
DanD: as long as we all agree on what's inside, we're in good shape.
... This is a good start for a glossary.
cullen: let's say the graphics card has VP8 support. You can't
assume that the clone happens before the decoding happens.
[discussion on gstreamer and tracks]
anant: I think gstreamer has two separate track-like objects for
stereo audio.
tim: surely, a 5.1 audio is one source for gstreamer.
adam: the motivation to remove the parallel between MediaStreamTrack
and media track is that audio was a multi-entry list whereas video
was an exclusive track.
hta: basically one MediaStreamTrack is one stream of audio.
cullen: stereo is two tracks, 5.1 is 6 tracks. That's very easy to
deal with.
anant: you want to be able to disable audio tracks.
tim: how do I know which track is the rear right and so on?
DanD: technically, with 3D video, you'll want to sync those two
tracks.
francois: 6 tracks for 5.1 audio means disabling audio is disabling
6 tracks.
anant: we can add a layer at MediaStream level.
burn: the real world allows both, combined or not.
cullen: question is does something that is jointly coded with
multiple channels, is that one track?
... If that's one track with a bunch of channels, the fact that it
could be represented as two tracks sounds like a complete disaster.
... We need some abstraction layer to ease the life of Web
developers.
hta: in the case of 4 microphones, you want to send 4 tracks. With
6, you want to send 6 tracks.
anant: I think early implementations will only support one or two
channels at most.
tim: there are plenty of places where we can get audio that is not
one channel.
anant: right, from files, for instance.
... my preference is to stick to a MediaStreamTrack as the lowest
thing.
adam: moving on. An instance of a MediaStreamTrack can only belong
to one MediaStream.
anant: noting that "track" is really not the same thing as a track
in container formats, etc., so we need to be explicit in the doc
about that, not to create additional confusion.
[meeting adjourned, discussion on MediaStream to be continued on
[31]day 2]
[31] http://www.w3.org/2011/11/01-webrtc-minutes.html
Summary of Action Items
[NEW] ACTION: anant to check how to split getUserMedia from the spec
[recorded in
[32]http://www.w3.org/2011/10/31-webrtc-minutes.html#action01]
[NEW] ACTION: rich to send new use cases on getUserMedia to webRTC
mailing-list [recorded in
[33]http://www.w3.org/2011/10/31-webrtc-minutes.html#action03]
[NEW] ACTION: robin to draft a proposal for a joint deliverable.
[recorded in
[34]http://www.w3.org/2011/10/31-webrtc-minutes.html#action02]
[End of minutes]
Received on Tuesday, 8 November 2011 15:38:27 UTC