[minutes] W3C WebRTC WG F2F in Santa Clara - day 1/2 - 2011-10-31

From: Francois Daoust <fd@w3.org>
Date: Tue, 08 Nov 2011 16:37:54 +0100
Message-ID: <4EB94CD2.1050708@w3.org>
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi all,

The minutes of the first day of last week's F2F meeting are available at:

... and copied as raw text below.

I'll send the minutes for day 2 right after this email.

Minutes include links to slides. By their very nature, minutes are kind of a dry read and do not always manage to convey the arguments that have been exchanged. I'll work on a summary of the two days that I'll send later on.


WebRTC WG F2F Santa Clara - Day 1/2

31 Oct 2011


       [2] http://www.w3.org/2011/04/webrtc/wiki/October_31_-_November_1_2011

    See also: [3]IRC log

       [3] http://www.w3.org/2011/10/31-webrtc-irc


    Present - group participants
           Harald_Alvestrand, Adam_Bergkvist, Dan_Burnett,
           Francois_Daoust, Dan_Druta, Christophe_Eyrignoux,
           Narm_Gadiraju, Vidhya_Gholkar, Stefan_Hakansson,
           Cullen_Jennings, Kangchan_Lee, Wonsuk_Lee, Kepeng_Li,
           Gang_Liang, Anant_Narayanan, Eric_Rescorla, Youngsun_Ryu,
           Youngwan_So, Timothy_Terriberry, Rich_Tibbett, Justin_Uberti,

    Present - observers
           Adrian_Bateman, Robin_Berjon, Mauro_Cabuto, Suresh_Chitturi,
           Manyoung_Cho, Mohammed_Dadas, Shunan_Fan, Tatsuya_Hayashi,
           Dominique_Hazael-Massieux, Tatsuya_Igarashi,
           David_Yushin_Kim, Ingmar_Kliche, Dong-Young_Lee, Ileana_Leuca
           (a few other observers attended the meeting)

    Chair
           Harald_Alvestrand, Stefan_Hakansson

    Scribes
           francois, Rich, burn, fluffy, anant, DanD


      * Topics
          1. IETF Architecture Overview
          2. Use-cases and Requirements
          3. Security requirements
          4. Status and plans in the DAP WG
          5. Access control model and privacy/security aspects
          6. Stages for moving to a Rec
          7. Low Level Control
          8. Data Streams
          9. MediaStream
      * Summary of Action Items

    See also: [15]Minutes of day 2/2

      [15] http://www.w3.org/2011/11/01-webrtc-minutes.html

    Stefan: [starting meeting. Reviewing agenda]

IETF Architecture Overview

    Slides: [16]RTCWEB Architecture (PDF)

      [16] http://www.w3.org/2011/04/webrtc/wiki/images/7/79/WEBRTC_Overview_TPAC_SC_presentation.pdf

    hta: The goal for RTCWeb is real-time communication between browsers
    ... arbitrarily define that as within ~100ms
    ... Trying to drive the design by use cases. Must have a design that
    meets the priority use cases.
    ... we want to design general purpose functions.
    ... one use case we're looking at is the interworking with legacy
    systems. We're fairly sure we want to make that work.

    hta: relays must be possible otherwise we don't have a universal
    ... <goes through the basic architecture in his slide deck>

    hta: All components (except RTCWeb implementing browsers) must be
    assumed evil.
    ... Keep trust to a minimum
    ... Need to look at mechanisms for establishing trust from a web
    page to a browser.
    ... data congestion control must also be a priority.
    ... RTP exists. We will use it.
    ... encrypt everything

    <this is controversial>

    hta: considering DTLS-SRTP key negotiation for that purpose.
    ... UI issues are important to the overall security.
    ... always fun to agree on codecs
    ... connection management: least controversial proposal is ROAP
    ... We expect innovation in what-connects-to-what
    ... ROAP does allow us to interconnect to SIP and XMPP based systems
    ... lots of other pieces, media buffering, muting, game control.

    hta: a lot of that needs to be done in the browser.

    burn: ...caveated by keeping in mind that we want to allow

    hta: W3C has an Audio group defining interfaces for accessing audio
    ... hopefully we'll be able to use that but we need to confirm that
    down the line.
    ... All of this is captured in [17]draft-ietf-rtcweb-overview-02.txt

      [17] http://tools.ietf.org/html/draft-ietf-rtcweb-overview-02

    DanD: We know the web goes beyond browsers. We do have the ability to
    execute web apps in non-browser UAs.

    DanD: We need to ensure that a browser endpoint can communicate with
    a non-browser endpoint.

    hta: We need communication to devices that are not browsers.
    ... We should not lose track of the browser use cases first and
    ... One principle is that as long as the other side obeys the
    interface then it doesn't matter what it is.

    DanD: Another comment RE: interdependencies with other groups. One
    example is on the discovery of the capabilities on other devices.
    ... this might be a missing piece in our discussions to date.

    anant: There are some capabilities in the proposal to negotiate.

    fluffy: If we figure out how the protocols work for interoperability
    then we might get this legacy interworking.

Use-cases and Requirements

    Slides: [18]Use Cases and Requirements (odp format)
    Draft: [19]Web Real-Time Communication Use-cases and Requirements
    IETF draft

      [18] http://www.w3.org/2011/04/webrtc/wiki/images/e/e1/Use_cases_and_reqs_v3.odp
      [19] http://tools.ietf.org/html/draft-ietf-rtcweb-use-cases-and-requirements-06

    stefan: <goes through some of the key use cases in his presentation>

    hta: regarding the Distributed Music Band use case. We're going to
    need really low latency. Concert-mode? We also need to distinguish
    between voice and music where we will remove noise from the former
    that is not suitable for the latter.

    francois: Perhaps we should try to stick to something simple since
    the really low latency issue is a problem.

    stefan: It's in the use cases document anyway so we can discuss
    further on that.
    ... In the document there is a list of use cases where the
    discussion has died out.
    ... or not concluded.
    ... such use cases relate to different situations: E911, recording,
    emergency access, security camera, large multi-party sessions, etc.
    ... these use cases could get added to the document if they get more
    ... draft-jesup. I think we should cover both unreliable and
    reliable data channels for WebRTC data.

    stefan: draft-sipdoc. 4 requirements derived. I think this is
    covered by the current use cases document

    <juberti> I agree, these data use cases should go into this doc.

    <juberti> We only have one use case for data in the current doc.

    stefan: draft-kaplan. Doesn't introduce new use cases but does put a
    lot more requirements on the document.
    ... Questions/comments on the use cases?

    DanD: Observation: augmented reality is not covered.

    <francois> [20]Open issues on use cases and Req on WebRTC WG wiki

      [20] http://www.w3.org/2011/04/webrtc/wiki/Main_Page#Use_cases_and_requirements

    richt: we've been looking at that. We have the building blocks.
    Would be good to have a use case on this.

    DanD: that's covered in some of these use cases but maybe something
    we could add

    cullen: The ability to overlay a video stream on top of another
    would be good.

    richt: you could do it with canvas

    cullen: that has a big security implication.
    ... will talk about it later on.

    DanD: plus video might come from an ad-serving service.

    fluffy: Back to the 1-800-FEDEX use case. Anything we can provide to
    scope that out further?

    stefan: not my area of specialty so feedback on this use case would
    be good.

    fluffy: The use case puts emphasis on DTMF.

    burn: I agree that DTMF is extremely important. We have to support

    stefan: let's take a break since we're waiting on next presenter.

Security requirements

    Slides: [21]Security requirements

      [21] http://svn.resiprocate.org/rep/ietf-drafts/ekr/tpac2011/rtcweb-security.pdf

    ekr: IETF is trying to work on threat models and security models. I
    don't think we're at the consensus level yet, but here are the
    ... [showing slides]

    ekr: Funny state: Browser threat model, browser protects you. It
    includes the notion that you're in an Internet cafe. Basic security
    technique is isolation.
    ... Site A and site B sandboxed.
    ... Browser acts as a trusted base.
    ... IETF adds the Internet threat model: "you hand the packets to
    the attacker to deliver".
    ... In the IETF oriented view of the universe, cryptography is the
    main technique.
    ... We can't force people to use cryptography all the time.
    ... We need a solid protection under the browser threat model, and
    the best we can on the Internet threat model
    ... 3 main issues: 1) access to "local devices" (use my camera,
    ... 2) Communications security. If we do our job right, we won't
    have to worry too much about that here.
    ... 3) consent to communications, ties in with CORS, WebSockets
    ... Starting with access to local devices:
    ... If you visit a malicious site, you have no idea where your
    video is going to. It can bug you. Somehow we need the user to
    consent, but it's not clear when, how many times.
    ... One thing I do want to mention is that people make a distinction
    between sending video to a site and sending video to another peer,
    but from a technical perspective, they are the same.
    ... Permissions models: we need short-term permissions, click on a
    button for an Amazon customer service. Not a long-term permission.
    ... Until last night, I thought we needed long-term permissions.
    ... Tim indicated that he was not sure browsers will want to do
    ... Do you want to support long-term permissions? That's a question
    for the group

    burn: why isn't this just a browser policy question?

    cullen: the question here is: is it a requirement for the group?

    burn: went through it in another group. Informed user consent is
    needed but can take the form of downloading the browser.

    ekr: Then, there's the notion of per-peer permissions.
    ... Another example of the short-term case, showing an example of an
    injected ad.
    ... [thoughts on UI for short-term permissions]
    ... This has implications for the API.
    ... user clicks and calls Ford, but he's on Slashdot
    ... Dialog showing video call. There needs to be a non-maskable
    indicator of call status so that you know you're still on the call.
    You need to be consistently aware that the call is going on.
    ... Access to microphone/camera linked with call permission.
    ... Back to the example, Slashdot might have to be able to say a
    ... [thoughts on UI for long-term permissions]
    ... Interface should be different. Possible: door hanger style UI.
    You want an action that is less easy for people to do during a call.
    ... There's a tension between convenience and security. It gives a
    lot of power to the site.
    ... That's an open question whether we want to support that or not.
    ... IETF has been assuming we want it, so it would be great to get
    feedback if we actually don't
    ... [thoughts on peer-identity based permissions]
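    The three kinds of permission ekr outlined (a short-term grant tied
    to a single call, a long-term grant for a site, and a grant tied to
    a peer identity) can be modelled as a small browser-side lookup.
    This is a minimal Python sketch for illustration only:
    `PermissionStore`, its methods, the call IDs and the example origins
    are hypothetical names, not part of any draft API, and a real
    browser would keep this state out of reach of page JavaScript.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class PermissionStore:
    """Hypothetical browser-side permission state, never exposed to page JS."""

    # Long-term grants as (origin, peer) pairs; peer None means "any peer".
    long_term: Set[Tuple[str, Optional[str]]] = field(default_factory=set)
    # Short-term grants: (origin, call_id) pairs valid only while the call lasts.
    active_calls: Set[Tuple[str, str]] = field(default_factory=set)

    def grant_long_term(self, origin: str, peer: Optional[str] = None) -> None:
        self.long_term.add((origin, peer))

    def start_call(self, origin: str, call_id: str) -> None:
        # Recorded when the user clicks "call" in browser-owned UI.
        self.active_calls.add((origin, call_id))

    def end_call(self, origin: str, call_id: str) -> None:
        # The short-term permission disappears with the call.
        self.active_calls.discard((origin, call_id))

    def may_capture(self, origin: str, peer: str,
                    call_id: Optional[str] = None) -> bool:
        if call_id is not None and (origin, call_id) in self.active_calls:
            return True
        # Otherwise fall back to a site-wide or peer-specific long-term grant.
        return ((origin, None) in self.long_term
                or (origin, peer) in self.long_term)
```

    A short-term grant stops working as soon as the call ends, while a
    peer-scoped long-term grant does not extend to other peers.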

    <juberti> I think we want to find a way to handle this. We don't
    want the web platform to miss something that will be present in
    native app platforms.

    cullen: what's important to you is where that is going. Our media is
    going to a different place than the Web site. The identity is

    burn: same issue in the Speech XG.

    hta: Usually, you can read the form and find the address in the
    form, but sometimes the address is constructed by the JavaScript.

    ekr: Partial digression on network attackers. If I'm in an Internet
    Cafe, and an attacker manages to inject an iframe, he can bug my
    computer, redirecting the call to him. The attacker controls the
    network on HTTP.
    ... Assumption is that it's safe to authorize PokerWeb and then surf
    the Internet. It's basically the same on your Wi-Fi if it's not secured
    ... An open question is: should this facility be available on HTTP
    at all? Mandate HTTPS?
    ... e.g. an HTTPS page that loads jQuery through HTTP
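    One possible answer to the HTTP/HTTPS question is a policy that only
    offers persistent permissions to HTTPS pages whose scripts are all
    fetched over HTTPS, since a single script loaded over HTTP (the
    jQuery case) lets a network attacker inject code into the page. The
    Python sketch below assumes absolute script URLs; the function name
    `allow_persistent_grant` is hypothetical, and the policy itself is
    one option under discussion, not something the group agreed on.

```python
from typing import Iterable
from urllib.parse import urlsplit


def allow_persistent_grant(page_url: str, script_urls: Iterable[str]) -> bool:
    """Hypothetical policy: long-term permissions only for HTTPS pages with
    no mixed-content scripts. Assumes absolute URLs."""
    if urlsplit(page_url).scheme != "https":
        # Anyone on the cafe network can rewrite a plain HTTP page.
        return False
    # One script over HTTP is enough to inject code into the origin.
    return all(urlsplit(u).scheme == "https" for u in script_urls)
```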

    DanD: not all the devices have the ability to securely preserve a
    token. That would be a good way to solve the problem.

    ekr: [thoughts on consent for real-time peer-to-peer communication]
    ... From a protocol point of view, we have ICE. Remember that you
    cannot trust the JS.

    burn: the point is you disabled security completely

    ekr: I don't entirely agree that it's the same thing
    ... Transaction ID needs to be hidden from the JavaScript
    ... When I surf to HTTP gmail, any attacker can inject the
    JavaScript and redirect calls to himself.
    ... In the context of SIP, we've already addressed most of
    communications security issues.
    ... There's also protocol attack issue which hopefully should not be
    a real problem in the end.
    ... otherwise security issue.
    ... Assuming that ROAP style API is used, we're going to make it
    good to hide security settings from JavaScript.
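    The point that the transaction ID must be hidden from JavaScript can
    be illustrated with an ICE-style consent check: the browser picks an
    unguessable ID, sends it to the candidate address, and only treats
    the peer as consenting if a response echoes that exact ID. Because
    the page never sees the ID, a script cannot forge the answer. This
    Python sketch is illustrative; `ConsentCheck` and its methods are
    hypothetical names, and only the 96-bit ID size comes from STUN
    (RFC 5389).

```python
import secrets


class ConsentCheck:
    """Hypothetical browser-internal consent check; the transaction ID is
    generated and kept here and is never handed to page JavaScript."""

    def __init__(self) -> None:
        # STUN transaction IDs are 96 bits (RFC 5389).
        self._txid = secrets.token_bytes(12)
        self.consented = False

    def outgoing_request(self) -> bytes:
        # Goes to the network stack, not to the page.
        return self._txid

    def on_response(self, txid: bytes) -> None:
        # Only a peer that actually received our request can echo the ID.
        if secrets.compare_digest(txid, self._txid):
            self.consented = True
```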

    AdamB: IDs might be owned by Facebook, and so on.

    ekr: my view is: 3 basic scenarios. 1) Gmail to Gmail, Facebook to
    Facebook, etc. 2) Gmail to Facebook, etc. where you'll need
    federation of ID. 3) Identity separated from the service I use to
    make the call.
    ... I have some possible solutions for that. Happy to discuss.

    Cullen: My position is a bit stronger. This group wants encrypted
    calls, but if you can't tell who the call is going to, that's
    ... We need to take that into account.

    hta: for many cases, I think it's quite ok to say that the call is
    encrypted to an identity and that this identity is verified by the
    fact that the guy I talk to presented himself.

    cullen: I want to know the trust chain. If this call is being
    intercepted, I want to have some indication on that.

    anant: slightly disagree with what Harald said.
    ... The federated use case.

    burn: how do you know that things are going to the right person?

    Anant: given that we have that use case in the document, we have to
    touch upon that issue.
    ... We want a completely peer-to-peer system in the end.

    ekr: Is there a good way to bootstrap these systems? I think the
    answer is "yes".

Status and plans in the DAP WG

    Stefan: wanted to know status of controlling camera and microphone.

    robin: Hi. I'm chair of DAP. We need to figure out how we split the
    work on who does what.
    ... We haven't done a lot of work on Media Capture recently.
    ... One dividing line that could be useful: DAP could be picking up
    media capture very quickly, some interest from DAP side.
    ... We would do the simple thing that doesn't include streaming or
    any complex processing.
    ... Then hopefully this would be pluggable into what this group needs.

    burn: what do you mean without streams?

    robin: you could not bind a video stream to some back channel, but
    you could do stuff such as video mail or recording.
    ... In the declarative style, most of it in the browser.

    hta: main difference is who controls the UI.

    Anant: If you're going to do programmatic access, important to agree
    on what they look like between groups. Another solution is you take
    care of declarative, and we handle programmatic way.

    Anant: If you do it the programmatic way, we may end up with two APIs
    doing essentially the same thing

    robin: heard feedback that some people wanted to do simple things

    anant: cannot "simple" be done with pure declarative approach?

    robin: not really.

    anant: something we've discussed at Mozilla. Media type in the
    input, such as video/mp4. The browser prompts user with camera view.
    Nice property is that it avoids dealing with security issues in a nice

    robin: it would be useful if you had a demo you could show in DAP.
    We're meeting Thursday/Friday.

    Adrian: Microsoft just joined DAP. One of our interests is media
    capture. API based on what getUserMedia is doing. WebRTC could build
    on top of this API. This way, we could split the work easily.

    Anant: does that mean that you have use cases that require
    programmatic APIs?

    Adrian: yes, in general we want developers to build their own

    Cullen: how do you deal with permissions?

    Adrian: same way as other APIs

    Cullen: agree with short-term, long-term permissions presented here?

    Adrian: need to check, but didn't look wrong.

    richt: in Opera, we agree that many use cases require getUserMedia
    but we want to decouple that from peer-to-peer connectivity. So
    agree to split things up.

    Anant: can two groups work on the same spec?

    Adrian: liaison explicit in the charter of WebRTC. Feasible for DAP
    to own the spec and go through the liaison.

    richt: Peer-to-peer relies on a stream. We give you a stream and you
    deal with it.

    Cullen: that's a bit more complex than that, because of the hardware
    support for compression, and permissions too.
    ... It sounds like DAP needs a permissions model as well and doesn't have
    one for the time being.
    ... We have all the permission problems that have to be enforced at
    the getUserMedia level.

    richt: the barcode scanner, face recognition use cases haven't been
    taken up in the group.

    cullen: I don't think anyone will disagree with these use cases

    hta: want to make things more complicated ;)
    ... If you go on with the assumption that media is always sourced
    locally, you're in the bad corner.
    ... As long as it's a media stream, the current getUserMedia doesn't
    care where the stream is coming from. I look at it as a first and
    easy step.
    ... thinking about Web Introducers.

    robin: That's a DAP deliverable. I'd rather not drag this spec in
    this discussion, although I agree it's a good way to make

    Anant: the resources you get are not more privileged.

    hta: I was more thinking about my computer getting access to your
    ... We might want to explore deeper levels of complexity for passing
    streams around at a later stage.
    ... In terms of where things go, the WebRTC WG is chartered to get
    this thing done. The charter is written in such a way that if
    someone else does it, that's good!
    ... What I don't want to happen is one group that comes with a
    vocabulary that describes front camera, back camera, etc. and
    another group coming with one on camera orientation, in particular.

    dom: can getUserMedia be split from WebRTC spec in general?
    Independently of where the final spec resides, that's something
    people are interested in seeing sooner rather than later.

    cullen: I'm just wondering how much faster things will be if we
    split things out. Browser vendors in WebRTC already indicated their
    intention to implement the spec.

    dom: Implementations of getUserMedia in Opera.

    hta: there's one in Chrome too, but as part of RTC.

    richt: we're going to push something out soon with getUserMedia.

    burn: actually, it's a "super-subset".

    Anant: if it's published as a separate spec, the use cases of
    getUserMedia are a subset of use cases.

    cullen: what I worry about is totally changing the directions we're
    going to in something we're supposed to ship in a matter of months.

    [discussion on Microsoft joining WebRTC]

    dom: one way to have the IPR commitments that we want is to split
    spec out.
    ... That means adding the SOTD, and accepting DAP's input.

    Anant: if we start taking input from DAP, we're going to lose time.

    dom: I don't think so, actually.
    ... nothing more than what we'll get with last call comments.

    [further discussion on getUserMedia]

    burn: this group wants to move forward very quickly. Others want it
    for other purposes. Is there a way to do something quickly that does
    not prevent other uses?

    hta: getUserMedia returns a MediaStream, so MediaStream needs to be
    defined before getUserMedia

    cullen: [back to hardware support for video compression]
    ... Lots of things are wrong and need to be fixed. We haven't
    focused on this right now. I'd like to see use cases that we're
    missing (yours are great, richt).

    stefan: that's the direction I'd like to follow, yes.

    burn: yes, would be good to have use cases to see what's missing.

    richt: the only thing we get from getting to DAP is extra IPR
    coverage and comments.

    cullen: is there a way to get comments early on?

    adrian: there's a lot of process involved to get comments sent to a
    group we're not participating in.

    [discussion on IPR commitment]

    robin: Nothing bad in splitting up and doing a joint deliverable.

    dom: getting comments is something the group needs to do.

    Suresh(RIM): so what happens to the draft in DAP's group?

    robin: we'll kill it and keep the declarative one.

    richt: it needs killing. Nothing happened on this spec for a year.

    Stefan: so what do we need to do in the end?

    dom: we need to ensure DAP agrees with that direction and then you
    need to split up the part.
    ... The key question is where you draw the line. The administrative
    side is easy.

    Anant: Fine to reference WebRTC spec for definition of MediaStream?

    dom: yes, but introduces a dependency in terms of timeline.
    ... Other question is editing.

    cullen: I want someone with deep understanding of video

    Adrian: we're happy to participate to make things easier since we're
    making things more complex to start with.

    robin: ready to volunteer an editor?

    Adrian: I think so.

    burn: if requirements are separable, that may be good to separate

    cullen: I think this group should agree on the mailing-list before
    things get done.

    Stefan: we have had chairs discussions earlier on.

    richt: all of the work is staying in WebRTC in the end.

    robin: all you get is better IPR protection and better comments.

    cullen: important to put it on the list, first time people will hear
    about it.

    stefan: anyone objecting to have a joint deliverable?

    PROPOSED RESOLUTION: split up getUserMedia and publish as joint
    deliverable with DAP WG.

    cullen: worried that joint deliverables always take longer.

    robin: one thing that is important is to specify which mailing-list
    takes discussions. We really should not have joint deliverable where
    discussion is split in groups. Smallest issues turn into a war when
    that happens.

    <richt> proposal to RESOLUTION status: one/two week period for
    mailing list discussion. Resolution to be made on next conf. call.

    cullen: this whole thing is an integrated system. It's going to be
    very difficult to discuss this without discussing other ideas.

    dom: I think the key issue is splitting the spec, not the joint

    robin: if we can't split the discussion, then we probably can't
    split the spec.

    burn: question is: can we write WebRTC requirements for getUserMedia
    precisely enough for this virtual joint working group.

    cullen: you'll need so much low-level details in getUserMedia

    robin: two actions: one on splitting the spec, second on refining
    joint proposal.

    <scribe> ACTION: anant to check how to split getUserMedia from the
    spec [recorded in

    <trackbot> Created ACTION-8 - Check how to split getUserMedia from
    the spec [on Anant Narayanan - due 2011-11-07].

    <scribe> ACTION: robin to draft a draft proposal for joint
    deliverable. [recorded in

    <trackbot> Sorry, couldn't find user - robin

    burn: Adrian, do you actually need to see something pulled out first
    before you can help out?

    Adrian: we can help with splitting out the spec, I think.

    burn: it's more a practical question, given the way editors work in

    cullen: can someone send use cases on one of the mailing-lists?

    <scribe> ACTION: tibbett to send new use cases on getUserMedia to
    webRTC mailing-list [recorded in

    <trackbot> Created ACTION-9 - Send new use cases on getUserMedia to
    webRTC mailing-list [on Richard Tibbett - due 2011-11-07].

    [discussion on DAP interaction over]

Access control model and privacy/security aspects

    Slides: [25]WebRTC: User Security and Privacy

      [25] http://www.w3.org/2011/04/webrtc/wiki/images/7/73/Webrtc_privacy.pdf

    anant: we currently don't specify what happens with user permission
    when using getUserMedia
    ... UAs vary, so may not be appropriate to define a standard for
    ... propose we write guidelines for browsers rather than something

    richt: this is definitely difficult to get right. UA should provide
    opt-in in UA

    francois: typically such SHOULD requirements aren't testable so they
    become guidelines in the end
    ... there is a way to make such informative statements

    hta: browser differentiation is harmful to users. we have enough
    browser representation here to figure out where we have agreement
    and should have recommendations that reduce unnecessary

    richt: we don't mention doorhangers because there is a lot more that
    can be done.

    fluffy: we can say "browser needs to somehow do X" without
    specifying precisely how.
    ... if completely optional no one implements. we can learn from
    existing softphones, etc. I like the "check my hair" dialog, a UA
    where there is a popup that tells you you're sending video and who
    you're sending it to. PeerConnection could confirm that this is

    (UA = User Agent = Browser)

    fluffy: e.g. JS can select camera, provide name of contact that
    is displayed at the same time.
    ... can't check before connection happens, but later can cancel if
    PeerConnection learns name is wrong

    anant: mandate requirements on UI but not how to do it.

    burn: +1

    anant: hta believes that an Opera user may contact a Chrome user, so
    differences could be confusing, right?

    francois: some apps will use getUserMedia to send it, and others
    will use it for local purposes, so needs are different

    anant: maybe app has to make clear what media will be used for.

    francois: user might have consented to call in advance of using

    anant: we can check for stored permission
    ... do we have consensus to lay out steps but not specify how?

    (generally yes)

    richt: not sure. we don't know what we need to show yet

    anant: we know some things, like previewing video

    richt: anything that doesn't affect interop should not be required

    fluffy: where we need encrypted name we need to require this

    richt: let's not bake in too quickly because we are still

    fluffy: today we support encrypted media (but not yet required).
    problem would be like using TLS but not showing name of site.

    anant: we need global identifiers

    adambe: with p2p may not know all names in advance.

    anant: UI for accepting and initiating calls may be very different

    adambe: what about two people talking and a third joins. media
    streams already available.

    fluffy: same problem if you have single conversation moved from one
    endpoint to another

    hta: good to discuss, but don't agree with cullen's request to
    mandate requirements. want to hear about stuff other than just

    anant: (returning to slides)
    ... do we allow apps to enumerate devices? no, would like for app to
    request what it needs (say, hints proposal).
    ... if user agrees, we return success call.
    ... user should always have complete control over what is
    transmitted, independent of what the app asks for

    adambe: with proper hints you need to enumerate and can get same
    result. prefer hints approach

    fluffy: every app I use for voice and video allows me to switch
    cameras and mics. how does that work?

    anant: don't want app to choose switching, but want user to be able
    to switch
    ... UI has to be in the UA, independent of the app

    burn: in html speech we have notion of default mic. app doesn't
    choose, the user does via the chrome.

    fluffy: yes, happens all the time. I'm using existing crummy mic or
    camera, go find a better one and plug it in.

    Tim: others want to know what's available in advance so you don't
    even present an option if it doesn't exist

    anant: hints can solve this. some hints are compulsory, others

    francois: can't app just check?

    anant: this way doesn't reveal info about user.

    burn: failures give user info

    richt: yes, hints are good. web app doesn't need to know which

    <richt> webapps provide a hint in the true sense of the word but the
    impl. can fallback to any camera if necessary (rather than fail).
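    The hints approach discussed here (the app states what it would
    like, some hints are compulsory and others are not, and the browser
    falls back to any suitable device rather than failing) can be
    sketched as a matching step that runs inside the browser, so the
    page never sees the device list. In this Python sketch, `Device`,
    its `facing` attribute and `pick_device` are hypothetical
    illustrations, not the hints format from the slides.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Device:
    kind: str                     # "audio" or "video"
    facing: Optional[str] = None  # hypothetical attribute, e.g. "front"/"back"


def pick_device(devices: List[Device], kind: str, hints: Dict[str, str],
                required: FrozenSet[str] = frozenset()) -> Optional[Device]:
    """Runs inside the browser; the page only learns the outcome."""
    candidates = [d for d in devices if d.kind == kind]
    # Compulsory hints filter the candidates; if nothing is left, fail.
    for key in required:
        candidates = [d for d in candidates
                      if getattr(d, key, None) == hints.get(key)]
    if not candidates:
        return None

    # Optional hints only rank; any remaining device is an acceptable fallback.
    def score(d: Device) -> int:
        return sum(1 for k, v in hints.items() if getattr(d, k, None) == v)

    return max(candidates, key=score)
```

    With only a front camera, an optional "back" hint still yields the
    front camera, while the same hint marked compulsory fails.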

    <francois> francois: exposing capabilities is fingerprinting issue.
    Exposing "incapabilities" is as well.

    anant: the comment was that UIs are best when they know what devices
    are available

    <francois> anant: right, the key is the time it takes so that the
    app can't tell it's a fail because of an incapability and a user

    hta: if you don't know what's available you can't distinguish
    between "you need more cameras to run this app" and "you need to
    allow me to use more cameras"

    richt: we can't allow fingerprinting
    ... one error, regardless of how it fails
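    The "one error, regardless of how it fails" point, combined with
    the remark about timing, suggests that a missing device should
    neither produce a distinct error nor fail noticeably faster than a
    user refusal. The Python sketch below illustrates this under
    assumptions: `MediaAccessError` and `request_media` are hypothetical
    names, the 1 to 5 second delay range is purely illustrative, and
    `sleep` is injectable only to make the sketch testable.

```python
import random
import time
from typing import Callable, Optional


class MediaAccessError(Exception):
    """The single, reason-free error the page sees (hypothetical name)."""


def request_media(device_available: bool,
                  ask_user: Callable[[], bool],
                  sleep: Callable[[float], None] = time.sleep,
                  rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    if not device_available:
        # Don't fail instantly: wait a human-like interval so the page
        # cannot tell "no such device" from "user said no" by timing.
        sleep(rng.uniform(1.0, 5.0))  # illustrative range, not a tuned value
        raise MediaAccessError("media access failed")
    if not ask_user():
        raise MediaAccessError("media access failed")
    return "stream"  # placeholder for a real MediaStream
```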

    fluffy: when would you need a case where you'd rather have a failure
    than use a hint?
    ... would rather feed one camera into both than a failure

    anant: (back to slides, showing early mockup)
    ... doorhanger hanging off info bar indicates that it's a web app
    rather than the browser. don't like this approach, but best so far.
    ... we have "hair check", live preview of camera before
    communication is active. can mute audio, click to share cameras
    ... webcam button on address bar gives you options to change cameras
    (in UI, part of browser)

    adambe: what about webcam with microphone display in it

    anant: we should allow it, but may be an advanced checkbox. want 95%
    of use cases to be handled

    fluffy: users need to be able to change where sounds come from
    and where they go.
    ... we will see this more and more as you have more devices. "skype
    headset" and "facebook headset"

    richt: what about tabbing implications? when you switch tabs need
    to know what happens

    anant: will get to that
    ... (back to slides) default to what app asks for but users can
    always override
    ... preferences pane to control all

    anant: mockup used one-time permission grant model
    ... we allow user to say "always allow example.org to access a/v"

    tim: if browser on phone and in pocket and permission has been
    given, app could just turn it on in my pocket. accelerometer info
    can tell you that the person is walking (and may have in pocket).

    richt: we will use some kind of visual and/or vibration to indicate

    anant: we need something because users won't want to click every
    time at facebook

    richt: we could try to learn it based on user behavior

    fluffy: from privacy standpoint, webex on your phone and laptop
    could do this today.
    ... it always starts with strong privacy position and eventually
    disappears to no privacy. better to have something only strong
    enough that it is still used
    ... indicators are probably more important than prevention
    ... anything stronger than this will be widely ignored.

    richt: that is already in spec

    anant: maybe should also have vibration or audio indication

    stefan: how is this compared to geolocation

    richt: we are 10, they are 2

    adambe: like "watch position" but without user knowing

    anant: need to let the user know when an app with a previously given
    permission is now using it

    fluffy: users hate apps that grabbed the device and turned on the
    indicator. it needs to be on when the device is actually used.

    anant: in today's world we won't exclusively grab device that way
    ... should web app be able to specify what type of access it needs?

    richt: user should always be in control

    francois: maybe app could say instead when it doesn't need long-term
    access

    hta: option of granting long-term access only the second time you
    try it has worked well

    anant: (back to slides) initially tried to tie permission grant to a
    time-frame and domain name.
    ... developers hated this. want permissions tied to user session
    not just domain
    ... could perhaps allow app itself to revoke a permission if it
    detects a change in user session.

    fluffy: can JS app provide a user-identifying token, so can index
    using both user criteria and this token

    anant: yes, as optional param in JS call. could try it.
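
    A minimal sketch of the idea above, assuming a browser-side store that
    keys grants by origin plus an optional app-supplied user token, and
    that lets the app revoke its own grant when the user session changes.
    PermissionStore, grant(), isGranted() and revoke() are hypothetical
    names for illustration, not a proposed API.

```typescript
// Hypothetical sketch: permission grants keyed by origin plus an optional
// user token supplied by the web app (all names are illustrative).
type Device = "audio" | "video";

class PermissionStore {
  // Key is "<origin>|<userToken>"; value lists the devices granted.
  private grants: Record<string, Device[]> = {};

  private key(origin: string, userToken?: string): string {
    return origin + "|" + (userToken || "");
  }

  grant(origin: string, devices: Device[], userToken?: string): void {
    this.grants[this.key(origin, userToken)] = devices.slice();
  }

  // A grant made under one user token does not carry over to another user
  // of the same origin.
  isGranted(origin: string, device: Device, userToken?: string): boolean {
    const devices = this.grants[this.key(origin, userToken)];
    return devices !== undefined && devices.indexOf(device) !== -1;
  }

  // Lets the app drop its own grant when it detects a user-session change.
  revoke(origin: string, userToken?: string): void {
    delete this.grants[this.key(origin, userToken)];
  }
}

const store = new PermissionStore();
store.grant("https://example.org", ["audio", "video"], "alice");
console.log(store.isGranted("https://example.org", "video", "alice")); // true
console.log(store.isGranted("https://example.org", "video", "bob"));   // false
```

    As noted in the discussion, sites that never pass a token simply fall
    back to a per-origin grant.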

    richt: browser can handle this since it runs session.

    anant: we don't know what's in cookie, so no.
    ... but most websites won't use it.

    burn: financial sites will like this.

    fluffy: bad guys don't care but helps good guys ==> okay

    richt: if you injected script that just replays in different domain
    you can get permission easily

    anant: how

    richt: user-installed script

    anant: yeah, but then you can do anything
    ... (back to slides, showing mockup of notification)
    ... one option is the entire tab pulses, with camera/mic control
    right on tab.

    richt: we pin audio/video. user has to explicitly request keeping

    hta: needs to be in spec
    ... switch tabs all the time and want my voice to be heard

    anant: tricky across all UIs, including video phone

    fluffy: something unspecified that irritates user is whether a video
    starts playing when you open a new tab. we should make this the same

    anant: we browser vendors need to work this out.
    ... prefer default of not blocking audio/video just because you
    switched tabs. if new tab wants to start video, should ask user.

    richt: but may be hard to tell which tab has audio/video

    anant: if whole tab pulses it works
    ... (back to summary slide) what happens if device already in use by
    other app
    ... maybe can't tell which app is requesting access
    ... what is interaction for incoming call. assume signed in to
    service to receive call/audio

    fluffy: yes, but others might want web apps that run in the
    background and have no bar (headless web apps)

    hta: if a headless web app reads SDP off disk and passes it into
    PeerConnection, it should just work, with no browser connection.

    anant: so we should allow headless apps and let browser determine
    how incoming call works.

    fluffy: some chrome has to be involved when video is requested.

    anant: yes. js can tell user about incoming call, but then need to
    get permission.

    anant: gum (getUserMedia) should have enough info to identify where
    call is from
    ... apps will want "one button accept". can't avoid showing some
    chrome. would be better for that to be the doorhanger. need extra
    API call so web app calls receiver's browser and asks if they want
    to accept. then get doorhanger.

    richt: (missed detailed example)

    fluffy: sometimes want long-term approval to at least negotiate and
    reveal IP address. also a different mode where don't reveal IP
    address until user has accepted.
    ... first one allows you to deal with ICE slowness by doing ICE and
    acceptance in parallel.

    francois: users won't understand this distinction.

    fluffy: okay, then maybe don't need first case.

    anant: we don't know how to implement incoming call.

    richt: can do OS-level notification

    anant: yes, but also want to give all the user controls when
    accepting call.
    ... other questions (not on slides)
    ... what about embedded iframes? we don't allow anything other than
    toplevel to do that. an iframe would have to pop up its own toplevel
    window to do this.

    richt: what happens with geolocation?

    anant: we don't do the same but would like to
    ... other use case is where ad is embedded in slashdot. In that case
    slashdot is accepting responsibility and you are giving permission
    to slashdot.

    richt: iframes from different origin

    anant: yes. if same origin we just let them through.

    (general approval of this approach)

    adambe: what about a call-in widget you can add to a page?

    anant: can't avoid this.

    adambe: could sandbox the iframe.

    anant: problem is that user doesn't know it's a different site.
    ... with a new toplevel window, the user can tell
    ... also, only allow long-term approval for https
    ... don't enforce https for all uses, but definitely if site wants
    long-term access
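
    The grant options could be sketched as a small policy function that
    combines two points raised above: long-term access only for https
    origins (anant), and only offered once the user has already granted
    one-time access (hta). The function name, option names and the
    "at least one prior grant" threshold are assumptions for illustration.

```typescript
// Hypothetical policy sketch: which permission choices the browser offers.
type GrantOption = "deny" | "once" | "always";

function offeredOptions(origin: string, priorOneTimeGrants: number): GrantOption[] {
  const options: GrantOption[] = ["deny", "once"];
  const isHttps = origin.indexOf("https://") === 0;
  // Long-term access requires https and at least one earlier one-time grant.
  if (isHttps && priorOneTimeGrants >= 1) {
    options.push("always");
  }
  return options;
}

console.log(offeredOptions("http://example.org", 3).join(","));  // deny,once
console.log(offeredOptions("https://example.org", 0).join(",")); // deny,once
console.log(offeredOptions("https://example.org", 1).join(",")); // deny,once,always
```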

    fluffy: what about mixed content
    ... will probably need more discussion. everyone will hate requiring
    https, but they may realize they need it.
    ... difficulty today requiring https is that many sites would break
    today. but with new sites where everything needs to be built from
    scratch, like with webrtc, we could require it now. we should
    consider it.
    ... but we need more info.

    richt: could do tls as JS, so that might take care of it

    hta: that's giving JS direct access to TCP
    ... JS should not have this power!

Stages for moving to a Rec

    Slides: [26]W3C Recommendation Track

      [26] http://www.w3.org/2011/04/webrtc/wiki/images/5/5c/Webrtc_w3c_rec_track.pdf

    Moving on to Dan talking about W3C Recommendation practice
    ... discussion on consensus, moving to First Public Working Draft,
    periodic publication
    ... Good to reach out to groups with opinions early.
    ... On the "Candidate Recommendation" slide: at this stage, you
    defend the document, and you need to have a test suite that tests
    the spec, not the implementations.

    anant: Is this code and what is it run against?

    francois: There is another group trying to come up with generic test
    framework that can be used
    ... Should think about how to write a testable specification when
    you write the spec

    Dan: great to have the spec wording be the same as the assertion
    code in the test

    Can two implementations share code? If you have a good answer,
    perhaps OK, but ...

    Sometimes there are single implementations of optional features

    Dan: on to Proposed Recommendation slide

    francois: This is the stage where W3C members have their last chance to

    Dan: On to Recommendation slide

    anant: How do we deal with a later version of the spec, for features
    we want in a later version?

    francois: Need to recharter the WG and go through the same process.
    ... there is also a Proposed Edited Recommendation to include errata
    (not very common)

    Dan: On to addressing public comments

    Harald: What's the process when we can't agree?

    francois: The group is strongly encouraged to avoid such situations.
    Comments can get escalated as a formal objection that goes up to W3C

    Dan: On to Status of WebRTC API draft slide

    richard: should we stay at Candidate Recommendation for a year or so?

    Dan: better to have an exit criteria - such as meet this number of

    Dan: two specs can be on their own timelines, apart from whatever
    reference dependencies there are

    anant: If we are doing two specs, should we push out our dates
    beyond Q2?

    francois: at the point we know we won't make it, then will need to

Low Level Control

    Slides: [27]Low-level control

      [27] http://www.w3.org/2011/04/webrtc/wiki/images/f/f6/Webrtc_lowlevel.pdf

    Moving to low-level control presentation by burn

    Dan: original proposal for a low level API (link in slide 2)
    received limited discussion and little support from IETF's signaling
    ... But there is some interest in a low level API
    ... Look at [28]requirements document (IETF) by hadriel to drive
    ... Hints vs Capabilities will be an interesting discussion
    ... Some discussion now but we should move it to list soon
    ... Existing requirements are not at the same level as (they are
    higher level than) what we want for low level hints and capabilities
    ... Browser UI requirements are things we've discussed and should
    move into the current document

      [28] http://tools.ietf.org/html/draft-kaplan-rtcweb-api-reqs-00

    Dan: Media properties are the interesting ones
    ... A2-1 a web API to learn what codecs a browser supports

    anant: How does this relate to JS application-level

    fluffy: that's independent of an API that exposes what codecs the
    browser takes

    tim: the API can only be used after the user has consented, so
    there's already some trust in the app

    fluffy: we should go through all of the requirements

    <juberti> regarding fingerprinting, aren't we sending user-agent

    <derf> We've (jokingly) discussed replacing the user-agent with an
    empty string.

    <juberti> i think there are enough implementation differences that
    fingerprinting can be done using existing apis.

    juberti: need to be able to query browser capabilities so that JS
    can generate SDP on its own

    (without user consent?)

    <juberti> user consent is ok

    <juberti> this would happen around the same time as camera access

    <derf> But if you're going to have hardware codecs, capabilities can
    differ even with the same UA.

    <juberti> the thought experiment here is whether it would be
    possible to fully implement signaling, except for telling the
    browser what the offer and answer are.

    <juberti> (fully implement signaling in JS)

    <ekr> there are a lot of fingerprinting mechanisms out there. is
    this really making it worse?

    tim: but how can you restrict information if you want JS to
    encode/decode (eg: hardware support for some codecs at certain

    <derf> ekr: It clearly makes it worse. The question is, is it worth
    the price?

    <juberti> I don't like having to expose a billion knobs to JS, but
    if we can give the browser an SDP blob from JS, that might allow a
    flexible but simple compromise.

    harald: if you negotiate on the principle that SDP is generated
    independently of setting up media streams then you don't need
    permission - there are use cases for that

    <juberti> to generate said blob, we need to know what the browser

    <derf> juberti: Sounds like you're asking to give the browser an
    ANSWER from JS, and you want an OFFER in order to generate it.

    <derf> Or did I miss what you were really asking?

    <juberti> derf: I want to generate an OFFER in JS. I send the offer
    to the remote side, and also tell my own browser about it. The
    remote side generates an ANSWER in JS from the OFFER, tells the
    browser about both, and sends the ANSWER back to the initiator. The
    initiator then plugs the received ANSWER into the browser, and media

    <derf> juberti: Okay. Why can't you do that with ROAP today?

    <juberti> a) you can't generate the OFFER, since you don't know the
    browser caps. b) even if you could generate your own offer, there's
    no way to tell the local browser about it. lastly, the state machine
    for ROAP lives inside the browser, so the JS can only do what ROAP
    allows (i.e. no trickle candidates like Jingle)

    <ekr> clarification: trickle candidates is candidates in pieces like
    with Jingle transport-info?

    <juberti> ekr: exactly
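
    The flow juberti describes can be sketched as a small simulation,
    assuming the JS builds the OFFER and ANSWER itself and only hands the
    results to the browser, with candidates trickled separately. The
    MediaEngine class, its methods, and the codec list standing in for an
    SDP blob are all hypothetical; this is not ROAP or any existing API.

```typescript
// Hypothetical sketch: JS owns the offer/answer exchange and tells its
// browser ("MediaEngine") about both descriptions.
interface Description {
  type: "offer" | "answer";
  codecs: string[]; // placeholder for a real SDP blob
}

class MediaEngine {
  local: Description | null = null;
  remote: Description | null = null;
  remoteCandidates: string[] = [];

  setLocal(d: Description): void { this.local = d; }
  setRemote(d: Description): void { this.remote = d; }

  // Trickle-style: candidates may arrive one by one after the exchange.
  addRemoteCandidate(candidate: string): void { this.remoteCandidates.push(candidate); }

  // In this sketch, media may flow once both descriptions are known.
  ready(): boolean { return this.local !== null && this.remote !== null; }
}

// Building the offer in JS presumes the app can query the browser's caps.
function buildOffer(localCodecs: string[]): Description {
  return { type: "offer", codecs: localCodecs.slice() };
}

// The answerer keeps only the offered codecs it also supports.
function buildAnswer(offer: Description, localCodecs: string[]): Description {
  const common = offer.codecs.filter(c => localCodecs.indexOf(c) !== -1);
  return { type: "answer", codecs: common };
}

const caller = new MediaEngine();
const callee = new MediaEngine();

const offer = buildOffer(["PCMA", "PCMU"]);
caller.setLocal(offer);                   // initiator tells its own browser
callee.setRemote(offer);                  // offer travels over app signaling
const answer = buildAnswer(offer, ["PCMU", "G722"]);
callee.setLocal(answer);
caller.setRemote(answer);                 // answer sent back to the initiator
callee.addRemoteCandidate("candidate-1"); // trickled later, Jingle-style

console.log(caller.ready(), callee.ready(), answer.codecs.join(",")); // true true PCMU
```

    Because the state machine lives in the app here rather than in the
    browser, the app is free to trickle candidates, which is the point
    juberti raises about ROAP.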

    <derf> juberti: fluffy is saying what I would have replied to you
    right now.

    dan: we jumped from A2-2 to A2-3, but they both look like they go

    fluffy: what is the use case for knowing codec properties? it only
    makes sense if you can control the properties

    <Mani> would it be more appropriate to require that the capabilities
    described should be consistent with the capneg RFC5939 security

    adam: is A2-2/A2-3 a codec abstraction of some kind?

    harald: you want to select the best possible codec for a given
    bandwidth requirement

    harald: different for video and images etc.

    richt: considering whether you can update the SDP proposal the
    browser sends to the JS directly through JavaScript

    cullen: when we get to ROAP, we'll see that it's possible.

    anant: in order for JavaScript to add things to SDP, it needs to be
    able to query.

    cullen: if the browser supports stuff that it didn't say it
    supports, then it's only normal that you cannot use it.
    ... I think you're going to get that one way or the other, so not
    opposed to an API.

    hta: we don't have an opaque proposal between browsers right now.

    cullen: in the SIP proposal, you do

    hta: cannot be used to setup the initial connection

    <ekr> SIP isn't really opaque, it just looks opaque.

    cullen: if we're trying to protect from fingerprinting, we need to
    know what kind of information we think we can reveal.

    anant: hardware information is the critical key
    ... Easy to identify who the user is with some nuances on hardware

    hta: are we making it worse in a way that makes a difference?
    that's the question.

    [exchanges about fingerprinting]

    cullen: my guess is that even if fingerprinting revealed that I'm
    using a MacBook Air, that's still a large set.

    <ekr> there's a lot more uniqueness than that. For instance, window
    size, fonts, plugin support, etc.

    <ekr> Important to distinguish between new capabilities that expose
    more information to the server versus capabilities that expose info
    to the peer.

    burn: going through the requirements provides food for issues that
    are relevant.

    hta: looking at A2-4, in many scenarios, the application is the best
    place to know what can be cut off.
    ... e.g. stop sending video that's not crucial for this

    cullen: I would be very concerned if the congestion control loop was
    done in JavaScript.

    hta: my thinking is that, in the case when the message is "no way to
    get more than 100Kb/s through", the app can react and select the
    streams it wants to send.
    ... then the browser can take it from there.
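
    hta's scenario can be sketched as a one-shot selection, assuming the
    browser reports a hard ceiling and the app ranks its own outgoing
    streams; congestion control itself stays in the browser, which
    addresses cullen's concern. The stream names, priorities, bitrates
    and the greedy rule are illustrative assumptions.

```typescript
// Hypothetical sketch: the app picks which streams to keep under a cap.
interface OutgoingStream { id: string; priority: number; kbps: number; }

function selectStreams(streams: OutgoingStream[], capKbps: number): string[] {
  // Highest priority first; keep a stream only if it still fits the cap.
  const sorted = streams.slice().sort((a, b) => b.priority - a.priority);
  const chosen: string[] = [];
  let used = 0;
  sorted.forEach(s => {
    if (used + s.kbps <= capKbps) {
      chosen.push(s.id);
      used += s.kbps;
    }
  });
  return chosen;
}

const streams: OutgoingStream[] = [
  { id: "voice", priority: 3, kbps: 40 },
  { id: "slides", priority: 2, kbps: 50 },
  { id: "camera", priority: 1, kbps: 300 },
];
console.log(selectStreams(streams, 100).join(", ")); // voice, slides
```

    A greedy pass can skip a large high-priority stream and keep a smaller
    one; a real app might prefer to degrade that stream instead.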

    cullen: level of control in JavaScript is: on/off, framerate,
    bandwidth... slippery road.
    ... Where do we draw the line?
    ... Implementation experience will teach us a lot here.

    hta: I very much agree with that.

    anant: declarative approach could work, e.g. "please turn on the
    stream at this bitrate"

    burn: moving on to the audio level requirements A2-8 and A2-9

    cullen: there's a security implication, I think. An attacker can
    detect volume, and could perhaps derive words from that.

    [moving on to A3-x requirements]

    <ekr> cullen: depends on granularity with which it is reported

    cullen: getting for SSRC and CNAME is good. Setting is more of an

    hta: what if you negotiate the Payload Type value and then change it
    ... I don't see a reason to allow an API to do something that is not

    burn: A3-4 is basically already possible.

    anant: what does it mean to set the audio and video codecs of
    streams you receive?
    ... At the point of rendering, it's too late.

    hta: take all A3-4, A3-5, A3-6, A3-7, A3-8 together, it amounts to
    "the application must be able to configure a media stream across RTP
    ... I don't think it's the right approach, but I'd actually prefer
    to see a requirement like that.

    <juberti> for receive codecs, you might choose to change the PT

    <juberti> and you'd need to tell the media layer about that.

    [discussion on A3-10 and A3-11, same in requirements although not as

    anant: do we have use cases that we can map to these requirements?
    That would be useful.

    burn: there was some general description that provided some context
    for these. I didn't want to read it here.

    anant: it would be easier to get it into the spec if these
    requirements were motivated by actual use cases.
    ... We should get more specific about the level of extensibility we

    burn: there is a list in section 3 of this document. It explains
    what the problems are

    anant: not convinced by argument 6) (some Web application developers
    may prefer to make the decision of which codecs/media-properties).
    ... don't see why you need to involve the server at all.

    hta: it's clear that we don't have general agreement on how this is
    ... let's wrap this up.

    burn: Moving on to hints API, last discussed on the mailing-list.
    Simple example is "audioType: 'spoken' or 'music'"
    ... question is which level of details.
    ... Agreement that this is needed.
    ... Question is do we need an API for that?

    anant: new things will keep coming. Extensibility is needed.

    cullen: agree.
    ... IANA registry could be used, I think.

    burn: problem in other groups is knowing the IETF process. Won't be
    a problem here.

    hta: we have to define some kind of namespaces for hints. Just one
    level, multiple levels, strings, tokenized, etc.

    DanD: two things, structure and semantics.

    burn: someone may want to propose finer granularity that you want to
    relate to other values.
    ... in the end, they are hints, so it doesn't matter so much. If you
    give something that is general, and something that is specific, you
    don't know what you're going to end up with.

    adam: side comment that the hints should be an optional argument to


    stefan: we should reuse MediaStreamHints object for getUserMedia

    anant: true.

    hta: having just one registry is probably ok. For video, you could
    have a hint saying low resolution.

    burn: one registry makes sense.

    anant: different object but same values
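
    "Different object but same values" could look like a single registry
    table used to filter hints passed to both getUserMedia and a
    MediaStreamHints-like object. The registry contents (audioType from
    burn's example, a videoResolution entry for hta's low-resolution hint)
    and the filterHints() helper are assumptions, not registered values.

```typescript
// Hypothetical sketch: one registry of hint names and allowed values.
const HINT_REGISTRY: Record<string, string[]> = {
  audioType: ["spoken", "music"],
  videoResolution: ["low", "high"],
};

type Hints = Record<string, string>;

// Hints are advisory, so unregistered names or values are dropped
// rather than treated as errors.
function filterHints(hints: Hints): Hints {
  const result: Hints = {};
  Object.keys(hints).forEach(name => {
    const allowed = HINT_REGISTRY[name];
    if (allowed !== undefined && allowed.indexOf(hints[name]) !== -1) {
      result[name] = hints[name];
    }
  });
  return result;
}

console.log(JSON.stringify(filterHints({ audioType: "music", colour: "blue" })));
// {"audioType":"music"}
```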

    burn: moving on to Statistics API.
    ... MediaStream.getStats()

    DanD: where do you specify the timeframe for those statistics?
    ... maybe just "what the system knows".

    <derf> burn: Just a nit... if your processingDelay is 20 ms, I
    expect your framerate is 50 fps.

    cullen: agree. Maybe we can steal this from the IETF XRBLOCK WG

    hta: the caller can always call the function twice and check
    ... just return total, and the time you think it is at the time when
    the function is called. Then easier to compute average.
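
    hta's suggestion can be sketched as follows, assuming getStats()
    returns cumulative counters plus the time they were read; the caller
    takes two snapshots and derives rates. The StatsSnapshot shape and
    the counter names are hypothetical.

```typescript
// Hypothetical sketch: averages derived from two cumulative snapshots.
interface StatsSnapshot {
  timestampMs: number;            // time at which the totals were read
  totals: Record<string, number>; // monotonically increasing counters
}

// Average per-second rate of one counter between two snapshots.
function ratePerSecond(earlier: StatsSnapshot, later: StatsSnapshot, name: string): number {
  const dt = (later.timestampMs - earlier.timestampMs) / 1000;
  if (dt <= 0) {
    throw new Error("snapshots must be taken at increasing times");
  }
  const a = earlier.totals[name];
  const b = later.totals[name];
  if (a === undefined || b === undefined) {
    throw new Error("unknown counter: " + name);
  }
  return (b - a) / dt;
}

const first: StatsSnapshot = { timestampMs: 10000, totals: { framesEncoded: 300, bytesSent: 500000 } };
const second: StatsSnapshot = { timestampMs: 12000, totals: { framesEncoded: 400, bytesSent: 750000 } };
console.log(ratePerSecond(first, second, "framesEncoded"));          // 50 (fps)
console.log(ratePerSecond(first, second, "bytesSent") * 8 / 1000);   // 1000 (kbit/s)
```

    Returning totals keeps the browser free of any averaging window, which
    sidesteps DanD's question about the timeframe.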

    DanD: important for that to be extensible.

    cullen: there need to be some stats that are mandatory to
    support. Multiple layers of stats are possible.
    ... any structure you put in there is not really useful, you have to
    know the property.

    hta: structure might buy you some namespace.
    ... Same property may be defined in different areas, so prefixing
    might be good.

    burn: I'm not hearing any disagreement here.

    hta: I note devil's in the details.

    burn: then, moving on to Capabilities API
    ... ROAP proposes to get an SDP blob back.
    ... getCapabilities() would return an SDP blob.
    ... It's using the syntax to represent capabilities
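
    If getCapabilities() returned an SDP blob as on the slide, an app
    could read codec names from the standard "m=" and "a=rtpmap:" lines
    (a=rtpmap:<payload type> <encoding name>/<clock rate>). The blob below
    is made up and abbreviated, not a complete SDP description, and
    codecsByMedia() is an illustrative helper.

```typescript
// Sketch: list codec names per media type from an SDP capabilities blob.
function codecsByMedia(sdp: string): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  let currentMedia = "";
  sdp.split(/\r?\n/).forEach(line => {
    if (line.indexOf("m=") === 0) {
      // "m=<media> <port> <proto> <fmt> ..." starts a new media section.
      currentMedia = line.substring(2).split(" ")[0];
      result[currentMedia] = [];
    } else if (line.indexOf("a=rtpmap:") === 0 && currentMedia !== "") {
      // "a=rtpmap:<pt> <encoding>/<clock rate>[/<params>]"
      const encoding = line.split(" ")[1];
      if (encoding !== undefined) {
        result[currentMedia].push(encoding.split("/")[0]);
      }
    }
  });
  return result;
}

const blob = [
  "v=0",
  "m=audio 9 RTP/AVP 0 8",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "m=video 9 RTP/AVP 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(JSON.stringify(codecsByMedia(blob)));
// {"audio":["PCMU","PCMA"],"video":["VP8"]}
```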

    cullen: let's take fingerprinting off the table for a second. This
    seems to make sense, though it may not be the syntax you could dream
    about to list codecs you support.
    ... This seems to give you all the information.

    anant: why do you need this info in advance?
    ... more reliable to wait until getUserMedia. No guarantee you'll
    get video when the call is made.

    DanD: I would render a different UI if I know video is not available

    anant: you could do that later on.

    cullen: lots of application grey out the video when not available
    for instance.
    ... use case for "video", not specific codec.

    DanD: on a mobile device, I may present a widget on the spec if I
    know video is supported.

    anant: I understand the argument. I don't like it because you need
    to gracefully handle the case when video is not available in any
    case.
    Tim: the expectation is that it would be rare.

    hta: you should be able to set a callback that "if capabilities
    change, I want to know"

    cullen: right.
    ... First, is video available? Then, can someone come up with a use
    case for more detailed info?

    [more discussion on fingerprinting, if you know when the camera
    comes in, you can correlate the user on Facebook and Google+, for

    burn: general interest in something like this, except
    getCapabilities early on and then callbacks.

    anant: we can figure out later on if it's callback or event.
    ... we're going to try what Cullen suggests: simple audio/video,
    then if someone comes up with a use case for more, we'll add more.
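
    The agreed starting point could be sketched as a coarse audio/video
    capability object plus a change notification, as hta suggested. The
    CapabilityMonitor class and onChange() are hypothetical; the group left
    open whether this would be a callback or an event.

```typescript
// Hypothetical sketch: coarse capabilities with change notification.
interface Capabilities { audio: boolean; video: boolean; }
type CapabilitiesListener = (caps: Capabilities) => void;

class CapabilityMonitor {
  private caps: Capabilities = { audio: false, video: false };
  private listeners: CapabilitiesListener[] = [];

  getCapabilities(): Capabilities {
    return { audio: this.caps.audio, video: this.caps.video };
  }

  onChange(listener: CapabilitiesListener): void {
    this.listeners.push(listener);
  }

  // Called by the (simulated) browser when devices come or go.
  update(next: Capabilities): void {
    if (next.audio === this.caps.audio && next.video === this.caps.video) {
      return; // nothing changed, so nobody is notified
    }
    this.caps = { audio: next.audio, video: next.video };
    const snapshot = this.getCapabilities();
    this.listeners.forEach(l => l(snapshot));
  }
}

const monitor = new CapabilityMonitor();
monitor.onChange(caps => {
  // e.g. grey out the video button when no camera is available
  console.log(caps.video ? "enable video button" : "grey out video button");
});
monitor.update({ audio: true, video: false }); // grey out video button
monitor.update({ audio: true, video: true });  // enable video button
```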

    DanD: good, but let's not restrict. Extensibility would be good, not
    to change the spec afterwards.

    burn: suggests that the browser simply lies about more specific
    ... 3 APIs presented here. Who's gonna do this?

    cullen: happy to work on the callback, with Anant's help.

    burn: will work on the hints API

    cullen: all three of them assigned to editors spec.

Data Streams

    Slides: [29]WebRTC Data Streams

      [29] https://docs.google.com/presentation/pub?id=10OpPqGB2hhXxMFLeqok5wrwL10oDzUK4Vq7hqy_N5pc&start=false

    juberti: There are use cases for unreliable data
    ... Need for the datachannel for mesh apps
    ... Encryption should be required for the data channel
    ... Design for DataStream should be similar to MediaStream
    ... there is no need for inheritance between DataStream and
    MediaStream
    ... We'll use the same flow as in MediaStream to attach it to the
    PeerConnection instead of an atomic flow

    fluffy: I like this proposal. I think the priority needs to be
    addressed as people tend to set priority high.

    juberti: We can keep it very high level with specific enumerations

    fluffy: Trying to come up with some other prioritization ideas

    anant: What is the use case for the readyToSend?

    juberti: Application should have some notion of the flow stage
    ... You need to know if you have buffer available

    anant: we should align this with webSockets

    fluffy: we need flow control for a large transfer

    hta: the JS app has the concept of blocking

    anant: What if the developer wants to block?

    Adam: It can't
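
    The flow-control exchange above can be sketched as a non-blocking
    sender that checks readiness before each chunk of a large transfer.
    readyToSend follows the slide's name (modeled here as a method);
    bufferedAmount echoes the WebSocket attribute; SimulatedDataStream,
    drain() and sendLarge() are hypothetical, and characters stand in for
    bytes.

```typescript
// Hypothetical sketch: app-level flow control over a DataStream-like object.
class SimulatedDataStream {
  bufferedAmount = 0;  // characters queued but not yet sent on the wire
  sent: string[] = []; // chunks accepted so far, in order
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  readyToSend(): boolean {
    return this.bufferedAmount < this.capacity;
  }

  // Never blocks: the app is expected to check readyToSend() first.
  send(chunk: string): void {
    if (!this.readyToSend()) {
      throw new Error("buffer full: wait until readyToSend() is true");
    }
    this.bufferedAmount += chunk.length;
    this.sent.push(chunk);
  }

  // Simulates the transport draining up to n characters from the buffer.
  drain(n: number): void {
    this.bufferedAmount = Math.max(0, this.bufferedAmount - n);
  }
}

function sendLarge(stream: SimulatedDataStream, payload: string, chunkSize: number): number {
  let offset = 0;
  let waits = 0;
  while (offset < payload.length) {
    if (stream.readyToSend()) {
      stream.send(payload.substring(offset, offset + chunkSize));
      offset += chunkSize;
    } else {
      // A real app would return to the event loop and resume on a "ready"
      // notification; this synchronous simulation just drains directly.
      waits++;
      stream.drain(chunkSize);
    }
  }
  return waits;
}

const stream = new SimulatedDataStream(10);
const payload = "abcdefghijklmnopqrstuvwxy"; // 25 characters
const waits = sendLarge(stream, payload, 5);
console.log(stream.sent.length, waits); // 5 3
```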

    anant: API looks good
    ... How about security considerations?
    ... how do you know who's on the other side

    fluffy: You would have been able to send this anyway

    anant: what are the different attack possibilities? Should be

    juberti: What's unique is that you can send it in peer to peer way.
    No server involved

    hta: You said data must be encrypted
    ... being encrypted will take care of some concerns
    ... it would make more sense to have a constructor of itself and
    then be attached to a peerConnection

    Milan: Question about ack

    juberti: The choices considered for the wire protocol make it useful

    Milan: Protocol has an ack and it doesn't need to be exposed
    ... an example with the ack would be useful to understand

    juberti: I'll take it as an action point

    Stefan: we can conclude this session

    juberti: I'll have it updated and sent to the mailing list for

    fluffy: this is just the API proposal not the actual implementation,
    ... We're moving along with this until we figure out the

    juberti: Requirements came from the wire protocol

    fluffy: looks good. Can we build it?
    ... That's what I'm concerned about, and maybe we should relax our

    <francois> [ref possible alignment with Websockets, perhaps change
    "sendMessage" to "send"]

    francois: there's a process called feature at risk

MediaStream

    Slides: [30]MediaStream slides (odp format)

      [30] http://www.w3.org/2011/04/webrtc/wiki/images/1/1c/MediaStream_TPAC_2011.odp

    [going through slides]

    cullen: why do audio tracks precede?

    adam: if the last track is not a video track, you can assume there's
    no video in there.
    ... there used to be 2 lists.

    anant: the order doesn't have to correspond to anything.

    cullen: there's another ordering in SDP.

    anant: not related.

    cullen: wondering whether that ordering could be the same.
    ... just strikes me as something weird.

    DanD: think we should be explicit that the order does not have to
    match that of SDP

    anant: the only people who have to worry about that are browser
    vendors, no need to expose it to users.

    stefan: I liked it better when there were two different lists.

    adam: it was easier to query whether there is audio or video.
    ... Moving on to definitions.
    ... MediaStream represents stream of media data. Do I need to go
    through it?

    cullen: find this definition fascinating. Can you have stereo audio
    in two tracks? Is voice and video one track? audio and DTMF? No

    anant: a track is lowest you can go. Having 5.1 audio in one track
    looks weird.

    <juberti> what about comfort noise?

    <juberti> is that the same track as audio?

    cullen: need some group for synchronization, but separate thing.

    anant: getObjectURL function is on the MediaStream, right? When you
    assign a stream to a video element.

    cullen: presumably, if I have a stream with 3 video streams, I want
    to send it to 3 different video elements.

    anant: media fragment could be used to select the track you're
    interested in.

    DanD: as long as we all agree on what's inside, we're in good shape.
    ... This is a good start for a glossary.

    cullen: let's say that graphic card has VP8 support. You can't
    assume that the clone happens before the decoding happens.

    [discussion on gstreamer and tracks]

    anant: I think gstreamer has two separate track-like things for stereo

    tim: surely, a 5.1 audio is one source for gstreamer.

    adam: the motivation to remove the parallel between MediaStreamTrack
    and media track is that audio was a multiple list whereas video was
    an exclusive track.

    hta: basically one MediaStreamTrack is one stream of audio.

    cullen: stereo is two tracks, 5.1 is 6 tracks. That's very easy to
    deal with.

    anant: you want to be able to disable audio tracks.

    tim: how do I know which track is the rear right and so on?

    DanD: technically, with 3D video, you'll want to sync those two

    francois: 6 tracks for 5.1 audio means disabling audio is disabling
    6 tracks.

    anant: we can add a layer at MediaStream level.
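
    cullen's "5.1 is 6 tracks" model together with anant's stream-level
    layer might be sketched as below, so that muting audio is one call
    rather than six. kind, label and enabled are modeled loosely on the
    draft's MediaStreamTrack; the Stream class, setKindEnabled() and the
    channel labels (one possible answer to tim's question) are
    illustrative assumptions.

```typescript
// Hypothetical sketch: per-channel audio tracks with a stream-level toggle.
type Kind = "audio" | "video";
interface Track { kind: Kind; label: string; enabled: boolean; }

class Stream {
  tracks: Track[];
  constructor(tracks: Track[]) { this.tracks = tracks; }

  // Stream-level convenience: toggle every track of one kind at once.
  setKindEnabled(kind: Kind, enabled: boolean): void {
    this.tracks.forEach(t => {
      if (t.kind === kind) {
        t.enabled = enabled;
      }
    });
  }
}

// 5.1 audio modeled as six mono tracks, plus one video track.
const channelLabels = ["front-left", "front-right", "center", "lfe", "rear-left", "rear-right"];
const tracks: Track[] = channelLabels.map((label): Track => ({ kind: "audio", label: label, enabled: true }));
tracks.push({ kind: "video", label: "camera", enabled: true });

const surround = new Stream(tracks);
surround.setKindEnabled("audio", false); // one call mutes all six channels
console.log(surround.tracks.filter(t => !t.enabled).length); // 6
```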

    burn: the real world allows both, combined or not.

    cullen: question is does something that is jointly coded with
    multiple channels, is that one track?
    ... If that's one track with a bunch of channels, the fact that it
    could be represented as two tracks sounds like a complete disaster.
    ... We need some abstraction layer to ease the life of Web developers

    hta: in the case of 4 microphones, you want to send 4 tracks. With
    6, you want to send 6 tracks.

    anant: I think early implementations will only support one or two
    channels at most.

    tim: there are plenty of places where we can get audio that is not
    one channel.

    anant: right, from files, for instance.
    ... my preference is to stick to a MediaStreamTrack as the lowest level

    adam: moving on. An instance of a MediaStreamTrack can only belong
    to one MediaStream.
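
    The single-owner rule adam describes can be sketched as follows: adding
    a track that already belongs to another stream is rejected, so an app
    would copy the track instead. OwnedTrack, OwningStream and clone() are
    hypothetical names for this illustration.

```typescript
// Hypothetical sketch: a track instance belongs to at most one stream.
class OwnedTrack {
  owner: OwningStream | null = null;
  kind: string;
  constructor(kind: string) { this.kind = kind; }
  // A copy starts out unowned, so it can join a different stream.
  clone(): OwnedTrack { return new OwnedTrack(this.kind); }
}

class OwningStream {
  tracks: OwnedTrack[] = [];

  add(track: OwnedTrack): void {
    if (track.owner !== null && track.owner !== this) {
      throw new Error("track already belongs to another MediaStream");
    }
    if (track.owner === null) {
      track.owner = this;
      this.tracks.push(track);
    }
  }
}

const local = new OwningStream();
const forwarded = new OwningStream();
const mic = new OwnedTrack("audio");
local.add(mic);
forwarded.add(mic.clone()); // adding mic itself would throw
console.log(local.tracks.length, forwarded.tracks.length); // 1 1
```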

    anant: noting that "track" is really not the same thing as a track
    in container formats, etc., so we need to be explicit in the doc
    about that, not to create additional confusion.

    [meeting adjourned, discussion on MediaStream to be continued on
    [31]day 2]

      [31] http://www.w3.org/2011/11/01-webrtc-minutes.html

Summary of Action Items

    [NEW] ACTION: anant to check how to split getUserMedia from the spec
    [NEW] ACTION: rich to send new use cases on getUserMedia to webRTC
    mailing-list
    [NEW] ACTION: robin to draft a proposal for joint deliverable.

    [End of minutes]
Received on Tuesday, 8 November 2011 15:38:27 UTC
