[minutes] W3C WebRTC WG F2F in Santa Clara - day 2/2 - 2011-11-01

Hi again,

The minutes of the second day of last week's F2F meeting are available at:
  http://www.w3.org/2011/11/01-webrtc-minutes.html

... and copied as raw text below.

See the previous disclaimer about possibly inaccurate or incomplete exchanges. I'll send a summary of the meeting later on.

Thanks,
Francois.


-----
WebRTC WG F2F Santa Clara - Day 2/2
01 Nov 2011

    [2]Agenda

       [2] http://www.w3.org/2011/04/webrtc/wiki/October_31_-_November_1_2011

    See also: [3]IRC log

       [3] http://www.w3.org/2011/11/01-webrtc-irc

Attendees

    Present - group participants
           Harald_Alvestrand, Adam_Bergkvist, Dan_Burnett,
           Francois_Daoust, Dan_Druta, Christophe_Eyrignoux,
           Narm_Gadiraju, Vidhya_Gholkar, Stefan_Hakansson,
           Cullen_Jennings, Kangchan_Lee, Wonsuk_Lee, Kepeng_Li,
           Gang_Liang, Mahalingam_Mani, Anant_Narayanan, Youngsun_Ryu,
           Youngwan_So, Timothy_Terriberry, Rich_Tibbett, Justin_Uberti,
           Milan_Young

    Present - observers
           Mauro_Cabuto, Suresh_Chitturi, Mohammed_Dadas, Steve_Dennett,
           Shunan_Fan, David_Yushin_Kim, John_S_Lee, Ileana_Leuca,
           Alistair_MacDonald, Chris_Rogers, Doug_Schepers, Junya_Yamada
           (a few other observers attended the meeting)

    Chair
           Harald_Alvestrand, Stefan_Hakansson

    Scribe
           francois, burn, Milan, fluffy, anant, derf

Contents

      * [4]Topics
          1. [5]MediaStream (continued)
          2. [6]PeerConnection (Cullen)
          3. [7]Status of Audio WG
          4. [8]Implementation status
          5. [9]Incoming notifications
          6. [10]Any other business?
      * [11]Summary of Action Items

    See also: [12]Minutes of day 1/2
      _________________________________________________________

      [12] http://www.w3.org/2011/10/31-webrtc-minutes.html

MediaStream (continued)

    [resuming discussion from yesterday's meeting]

    adam: some confusion on whether 5.1 audio should be 6 tracks, or 6
    parallel channels of a single track. I tend to prefer the second
    approach, but suggest moving this to the mailing list.

    cullen: it sounds good to come up with a decision

    hta: 2 alternatives: 1) a track is whatever has one sensible
    encoding and 2) a track is whatever cannot be subdivided

    adam: cannot be subdivided in JavaScript.

    hta: in the case of 5.1, there are encodings that will take 5.1 as
    one entity that JavaScript will have trouble decoding.

    [cullen draws a diagram with stereo channels on the whiteboard
    ([13]Fig 1)]
    Media channels (e.g. stereo channels) may appear in a track.
    Multiple tracks may take part in a Stream. Streams may be
    synchronized. The question is: is WebRTC MediaTrack at the 'track'
    level or 'channel' level?

    Fig 1 - Stream, track, and channel

    anant: question is does JavaScript need access to the channels?

    cullen: it has to be addressable e.g. for one video to go to one
    monitor and the other to the second one.

    adam: why do they need to be addressable?
    ... they're two different things

    cullen: no, they're not.
    ... [taking example of audio with multiple channels]
    ... I was assuming they need to be addressable.

    anant: only problem is that track means something else in other
    contexts.

    burn: if we only need to come up with a different name for track,
    not a big deal. It makes more sense to focus on the definition of
    the term than on the term itself.

    anant: I think JavaScript needs to have access to different channels
    e.g. to decode things itself.

    hta: provided it should be able to do that

    [more discussion on audio tracks]

    cullen: you need one more name, English won't be encoded with
    Russian.

    [Fig 1 completed]

    burn: 3 levels now. Is there a possibility that we'll end up with
    another, and another?

    adam: I don't think we need to go any deeper than channel.

    cullen: some possibility to group tracks in pseudo-tracks in some
    contexts

    burn: if we agree on the top level, and the bottom level.

    hta: for the moment, we have stream, track and channel.
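
    [Illustration: a minimal JavaScript sketch of the three-level model
    as drawn in Fig 1. Only stream and track existed in the draft; the
    per-channel accessor below is purely hypothetical.]

        // A MediaStream (e.g. from getUserMedia) holds a list of tracks.
        var track = stream.tracks[0];  // MediaStreamTrack, per the then-draft
        track.enabled = false;         // disable the whole track
        // Channels were not exposed as JS objects in the draft. A
        // hypothetical per-channel handle, had the group exposed one:
        // var rightChannel = track.channels[1];  // purely illustrative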

    adam: question is whether channel has to be a JavaScript object.

    hta: to say that the right channel should go to the left speaker,
    that's useful.

    cullen: if you have polycom setup, you can end up with 4 different
    channels, that may be encoded together.

    stefan: we have tracks mainly for legacy with the media element.

    cullen: the grouping is going to go to the other side through SDP, I
    think.

    tim: that square box is a single SSRC. MediaStream is a single
    CNAME.

    cullen: ignoring these points for the time being.

    stefan: I think synchronization should be on an endpoint basis

    hta: you can't do that.

    stefan: that's my view, we'll discuss that later on.

    cullen: we're all confused so need clarification.

    anant: for video, if you have the context of channels, what would
    that be useful for?

    cullen: stereo video.

    adam: the channels are dependent on each other.

    anant: I don't understand why channels have to be exposed in the
    context of tracks.

    [discussion on relationship with ISO base file format]

    <Travis_MSFT> Can a "track" be a text caption too?

    francois: wondering about mapping to media element. MediaStreamTrack
    (or channel) will have to be mapped to the track in an audio/video
    element for rendering in the end. Needs to be clarified.

    hta: seems we need some audio expertise. Perhaps discuss during
    joint session with Audio WG.

    adam: moving on. We removed parent-child relationship within a
    stream. Good opportunity to make MediaStreamTrackList immutable.
    ... I don't have a strong opinion, but it simplifies things.

    anant: if you clone a MediaStream and disable a track on cloned
    object, it will disable, not remove the track.
    ... The right way to fix this is not to make MediaStreamTrack list
    immutable.
    ... If someone decides to disable the parent track, the child should
    receive an event that the track is disabled. Same as if the track
    was received from a remote peer.

    [adam draws [14]Fig 2 on the flip chart]
    Streams are independent without parent-child relationship. A child
    stream's enabled status depends on that of its parent otherwise.

    Fig 2 - Cloned streams with and without parent-child relationship

    [exchanges on parent-child relationship, enable-disable events]

    tim: the worst situation is starting up the call, adding a video
    stream, then an audio stream, then something else.

    cullen: then unplug a camera.

    hta: it's disabling tracks. addStream is the one thing you cannot do
    with enabling/disabling.

    anant: we need some kind of grouping for synchronization.

    adam: difference between mute and disabled. If I disable on my side,
    track is muted on your side.
    ... Should the track be removed from your side?
    ... I might re-enable it later on.

    anant: should we have the concept of who owns the MediaStream?
    ... I think the WhatWG thought about it, leading to the concept of
    local MediaStream.
    ... You are in control of local MediaStreams, not on remote
    MediaStreams.

    hta: it makes a lot of sense to disable things received from a
    remote peer.
    ... When you're changing from disabled to enabled, you're telling
    the other side that you'll do something with the data if you receive
    it.

    <juberti> francois: would you mind dialing the polycom when you get
    a chance?

    anant: It's possible for a track to be enabled without receiving any
    data. That makes sense.

    <juberti> i can hear the room well, thanks

    anant: no real difference between mute and disabled.

    cullen: I was assuming you were sending data in the case of mute,
    and not in the case of disabled.

    anant: that's just a waste of bandwidth.

    <juberti> there are good reasons to send zeroes.

    anant: any sensible scenario?

    cullen: no idea.

    <juberti> some gateways will disconnect if data is not provided.

    richt: perhaps if re-negotiation is not possible.

    ileana: there are cases when you press mute and send music.

    anant: does that level of semantic need to be exposed to JS?
    ... does it have to be a change of state in the object?
    ... Use cases?

    <juberti> Call waiting is the main use case for "hold".

    adam: we need to have some sense of "mute" for streams you don't
    have control over. Not sure about the best word.

    anant: I would assume that you can remove tracks from a stream.

    <juberti> Local muting (sending zeroes) has obvious use cases.

    [discussion on adding/removing tracks in SDP]

    adam: for removing tracks, disconnecting a camera is a good use
    case.
    ... Symmetry for adding tracks sounds important.

    francois: same discussion in HTML WG through Web and TV IG concerns
    on adding/removing, enabling/disabling, tracks to the video element.
    It seems important to end up with the same interface for both
    contexts.

    stefan: in the end, the first bullet point is removed here
    ('immutable MediaStream objects')

    adam: and we'll make updates for the remaining points.
    ... moving on to getUserMedia
    ... only difference between local and not is you can stop on
    localMediaStream.

    anant: why do you need to have a stop method?

    tim: if I cloned a track in different streams, you have to stop each
    track individually. With stop, that's taken care of.

    hta: now you're introducing a dependency that I thought we had
    agreed to drop.

    adam: I think "stop" on a local MediaStream shouldn't affect other
    MediaStreams.
    ... you need "stop" to say you're not going to use the object
    anymore.

    anant: I don't understand why it's not possible with remote media
    streams.

    hta: I guess for the time being, the problem is that "stop" is not
    clearly defined.

    adam: moving on to recording.
    ... MediaStream.record(). You get the data via an async callback.
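
    [Illustration: a minimal sketch of the recording API as it stood in
    the draft, which the discussion below ends up scrapping pending
    requirements.]

        var recorder = stream.record();  // MediaStreamRecorder, per the draft
        // Later, fetch everything recorded so far. The callback receives a
        // Blob whose format the draft left unspecified - the sticking point
        // in the exchange that follows.
        recorder.getRecordedData(function (blob) {
          // e.g. save or upload the blob
        });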

    cullen: what format does the recorded data follow?

    adam: that's an interesting question.
    ... when that got specified, I think there was this romantic idea
    about a simple unique format.

    anant: getRecordedData means stop the recording and give me the
    data?

    adam: not that much.

    cullen: it's awful.

    hta: yes, I suggested to delete it while we find something working.

    burn: what does it have to do with WebRTC, sending data from one
    peer to another?

    anant: we need to define getUserMedia, so that's one use case we
    need to tackle.

    cullen: we should start from scratch and design that thing. Need to
    specify format, resolution, etc.
    ... Why don't we start by sending some good list of requirements to
    this working group?

    hta: At a minimum, a recorder that returns blobs has to specify what
    the format of the blob is.

    anant: Additionally, we could use hints here, so that caller
    suggests what he wants.

    hta: given the reality of memory, we need chunks, so that running a
    recorder for two hours doesn't mean it will try to buffer
    everything in memory.

    anant: I don't get the use case for getting the recorded data back.
    It should go to a file.

    cullen: somebody else might want to access the data to do other
    things such as barcode recognition.

    anant: not the recording API, that's the MediaStream API.

    hta: you might want to move backward, e.g. in the hockey game to
    move back to an action.
    ... My suggestion is to nuke it from the draft for the time being.
    Once we have MediaStream, we can figure out how to record.

    richt: you can call record multiple times. Does that reset?

    adam: you get the delta from last time

    tim: possibly with some overlap depending on the codec

    burn: maybe check with Audio WG, because they are also doing
    processing with audio streams. Recording is just one example of what
    you can do with a media (audio) stream.

    richt: we're going to record to formats we can playback in our video
    implementation today, and I guess other browsers will probably do
    the same, so no common format.

    [summary of the discussion: scrap the part on recording for the time
    being, trying to gather requirements for this, to be addressed later
    on]

    adam: on to "cloning".
    ... With the MediaStream constructor, you can clone and create new
    MediaStream objects without prompting user.
    ... If you have one MediaStream object, you can only control it in
    one way. Muting mutes tracks everywhere.
    ... With cloning, you can mute tracks individually. Also used for
    self-view ("hair check screen")
    ... Pretty nice thing.
    ... Cloning is creating a new MediaStream from existing tracks.

    hta: if you have one MediaStream and another one that you want to
    send over the same PeerConnection, you can create a MediaStream that
    includes both sets of tracks.

    adam: composition: same approach. Used to pick tracks from different
    MediaStreams.
    ... e.g. to record local and remote audio tracks in a conversation.
    ... Discussion about this: obvious question is synchronization when
    you combine tracks from different MediaStream. Combining local voice
    track with remote voice track, how do you do that?
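
    [Illustration: a sketch of cloning and composition via the
    MediaStream constructor, assuming it accepts a list of existing
    tracks as in the then-draft.]

        // Clone for a self-view: same sources, independent enabled flags.
        var selfView = new MediaStream(localStream.tracks);

        // Composition: pick the local and remote audio tracks into one
        // stream, e.g. to record both sides of a conversation.
        var both = new MediaStream([localStream.tracks[0],
                                    remoteStream.tracks[0]]);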

    cullen: if we have grouping for sync, can't we reuse it? If same
    group, sync'ed, if not, no sync.
    ... We're synchronizing at global times, so no big deal if local and
    remote streams.

    hta: works if different clocks?

    cullen: that's what RTP does. Works as well as in RTP.
    ... I think it's taken care of by whatever mechanism we have for
    synchronization.
    ... another thing on cloning and authorization. If authorization on
    identity, we might need to re-authorize on the clone if destination
    is not the same.
    ... May not bring changes to the API.

    hta: that is one of the reasons why authorization on destination is
    problematic.
    ... If it's irritating, people will work around it.

    tim: If I want to authorize sending to Bob, I don't want JS to be
    able to clone.

    hta: what you care about is that it doesn't send it to Mary.

    anant: the data can go anywhere, there's nothing preventing that.
    ... getUserMedia allows the domain to access camera/microphone, not
    tied up to a particular destination.

    hta: I think ekr's security draft goes to some details on what we
    can do and limitations. Identity providers can mint names that look
    close to the names the user might expect to see.

    adam: we need to solve authorization without cloning anyway, because
    of addStream.

    hta: if you want to do that, you have to link your implementation of
    PeerConnection with the list of authorized destinations.

    [more exchanges on destinations]

    DanD: depending on where you're attaching to, you can go on
    different levels of authorizations. If I have a level of trust, then
    I can say I'm ok with giving full permissions.

    anant: problem is where you do permissions. PeerConnection level?

    cullen: implementation may check that crypto identity matches the
    one on authorized lists of identities that can access the camera.

    Travis_MSFT: we still need authorization for getUserMedia.

    richt: this authorization stuff on PeerConnection is quite similar
    to submitting a form where you don't know where the submitted form
    is going.
    ... Not convinced about this destination authorization.

    <burn> Scribe: burn

    adambe: (missed this)

    anant: how do you correlate different media streams

    hta: with only local media it's easy, but mixer has media from
    multiple sources and they will be unsynced
    ... a single media stream, everything in it is by definition synced.
    but across media streams no guarantee of sync

    fluffy: makes sense. could get cname from them, but doesn't
    necessarily tell you how they sync

    <Milan> scribe: Milan

PeerConnection (Cullen)

    Slides: [15]WebRTC PeerConnection

      [15] http://www.w3.org/2011/04/webrtc/wiki/images/e/ea/PeerConnection_v3.pptx

    fluffy: Split work into interface types
    ... current spec isn't very extensible
    ... w.r.t. SDP
    ... propose wrapping SDP in a JSON object
    ... need to differentiate offer from answer
    ... so add this to the SDP JSON representation

    fluffy: offer answer pair contribute bits to caller session id and
    callee session id
    ... an answer always has an implicit offer
    ... offers can result in more than one answer
    ... each media addition results in a new offer/answer pair
    ... replaces previous
    ... increment sequence id to note replacement without setting up
    entirely new call

    Anant: If the SDP doesn't result in a new call, does it require user
    consent?

    Fluffy: Yes, depends upon the class of change, e.g. a new video
    camera?
    ... requires an OK to the offer/answer
    ... can only have one offer outstanding at a time
    ... need to know when can re-offer (timeout)
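
    [Illustration: roughly what the wrapped-SDP object looks like; field
    names approximate the ROAP draft and all values are hypothetical.]

        var offer = {
          messageType: "OFFER",
          offererSessionId: "13456789ABCDEF",  // caller's contribution
          answererSessionId: "",               // filled in by the callee
          seq: 1,                              // bumped for each replacing offer
          sdp: "v=0\r\no=- 2890844526 ..."     // the usual SDP blob
        };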

    Stefan: Also need to know when offer accepted for rendering RTP
    streams

    Anant: Why sequence ids?

    Fluffy: need to suppress duplicates

    Anant: Isn't WebSockets a reliable protocol?

    Harald: Not really
    ... because http can close connection at any time

    Stefan: If A makes new offer, when would B side be able to make new
    offer?

    Fluffy: Depends on order of delivery
    ... so OK required

    Suresh: Are these formats in scope for peer discussion?

    Fluffy: Yes, APIs will define how to process these messages. Work
    performed by browser vendors.
    ... ICE is slow
    ... it is a strategy for finding optimal paths (uses pinholes)
    ... as such can't send packets too fast
    ... start processing as early as possible. But need to wait for user
    authorization on IP address
    ... amount of time deciding to accept call should mask ICE delay
    ... so need a tentative answer followed by final answer which was
    confirmed by user
    ... this is implemented by adding another flag to sdp/json object.
    "morecoming" (see [16]Fig 3)
    Answerer may partially answer an offer with a first ROAP response,
    provided it sets the flag 'morecoming' in the first answer.

    Fig 3 - "More coming" temp answer flow in ROAP

    DanD: Why not implement as a state machine?

    Anant: Why can't OK contain SDP?

    DanD: How much time does it buy you?

    Fluffy: Time is a function of user input
    ... and also whether the IP address is protected information

    DanD: Maybe browser should delay response

    Fluffy: This flag is optional
    ... this should be part of the hint structure

    Harald: User interaction must be between offer/answer regardless

    RichT: Maybe we could use the 180 in its place

    Fluffy: Might confuse forking issue and early media

    Harald: If you get back a callee sessionid that is different, how do
    you know whether that should result in the same call or a
    replacement?

    Fluffy: This is the same question as whether to allow forking

    Fluffy: Don't like passing STUN and TURN credentials
    ... would rather get an official API from the IETF

    Eric: Proposed format is not strongly typed
    ... makes extensibility difficult

    Harald: For example whether to provide the IP address quickly
    ... Action: add a hint structure to the PeerConnection constructor

    Fluffy: TURN servers require credentials
    ... so need to pass them around in JS

    Anant: username/password@domain

    Richt: Should this be a mandatory parameter, or can the browser
    substitute?

    Anant: Browser should provide default

    Fluffy: Common to force all media traffic through a single TURN
    server
    ... option for the browser to be configured with a TURN server
    ... ICE will choose the best
    ... Also need the application to provide preferred TURN servers

    Harald: Google does an HTTP transaction before setting up the TURN
    service
    ... need to be able to stop serving dead users
    ... so need a provisioning step
    ... so this scheme is consistent
    ... must pass password at start of peer connection

    Anant: If JS object, then it's in developer control
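
    [Illustration: one hypothetical way to hand TURN credentials to the
    browser at construction time, along the lines of anant's
    username/password@domain suggestion. The configuration-string
    syntax is invented; only the constructor shape follows the
    then-draft.]

        var pc = new PeerConnection(
            "TURN alice:s3cret@turn.example.net",  // hypothetical syntax
            function (signalingMessage) {
              // ship the message to the peer over the app's own channel
            });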

    Fluffy: Label usage is broken

    Harold: Thought was that each label was a new RTP session?
    ... Have need for sending information peer-to-peer for nominating
    tracks as being in focus

    Stefan: Metadata must also be passed as well as the label
    ... for example the meaning of the microphone channel
    ... need to interop with legacy equipment

    DanB: How many things do we want to label?

    Fluffy: I want identifiers on the channels

    Tim: Should the list of channels in a given track be mutable?

    <juberti> label usage is broken

    Harald: Track is defined by its properties

    DanB: Haven't decided on the definition of a track

    Fluffy: ROAP deals with glare
    ... SIP resolves it with waits
    ... but that can take too much time
    ... better solution based on random timeouts where the larger wins

    Fluffy: need DTMF for legacy systems

    Anant: Should live in a separate track in the same stream

    Harald: Use case for receiving DTMF?

    Fluffy: Yes

    Harald: What about analog DTMF?

    Fluffy: That is translated by the gateway

    Milan: What about SIT tones?

    Fluffy: Seems reasonable if easy

    RichT: Could this fit into the proposal where we inject additional
    media into the stream?

    Fluffy: Injecting crappy sound is cool, but not a general solution
    for DTMF

    Anant: Way to handle that is to define alternate track types

    Francois: How do we integrate ROAP into the current API?

    Harald: Easy. Just replace sendSDP with sendROAP

Status of Audio WG

    Have Al & Chris from Audio WG with us

    Al: A few proposals: roc's work and the Web Audio API

    Chris: Web Audio API deals with arbitrary media graphs

    … we may be interested in basic convolution, equalization, analysis,
    etc., and other effects

    Chris: Robert's work provided other cases including remote peer to
    peer

    … Chris proposed some code examples of how to integrate with rtcweb

    spec at
    [17]https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification
    .html

      [17] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html

    … examples at
    [18]https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/webrtc-integr
    ation.html

      [18] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/webrtc-integration.html

    Chris: Audio API is implemented, but the connection with RTCWeb is
    not done yet; confident we could implement this

    Anant: What does createMediaStreamSource do?

    Chris: It creates a media stream that can be added to the peer (more
    to the answer that I missed)
    ... the stream can have one or more channels but defaults to two
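
    [Illustration: hooking a MediaStream into the Web Audio graph, after
    Chris's integration examples; names follow the then-current WebKit
    implementation and prefixes varied.]

        var context = new webkitAudioContext();
        var source = context.createMediaStreamSource(stream);  // stream as a node
        source.connect(context.destination);                   // render locally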

    Harald: If you have a bunch of microphones in room - are these 3
    channels in a stream or ...?

    Chris: Can process each stream through a separate filter.

    anant: Would it have any idea this is 4 channels?

    Chris: probably need something more like createMediaTrack

    anant: We see it as we have streams, that have a set of tracks, that
    have one or more channels

    Chris: The connections between the AudioNodes are a bit like a WebRTC
    track
    ... have direct ways to get the channels / tracks and combine them
    into a single stream
    ... Makes more sense to do these operations on tracks than on
    streams

    anant: Do you have code showing operating on channels instead of
    tracks

    Chris: See
    [19]https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification
    .html#AudioChannelSplitter-section
    ... The splitter and merger can be used to extract the channels from
    a track or recombine

      [19] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioChannelSplitter-section
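
    [Illustration: extracting and recombining channels with the
    splitter/merger nodes from the section linked above; the filter in
    the middle stands in for any per-channel processing.]

        var splitter = context.createChannelSplitter();  // AudioChannelSplitter
        var merger = context.createChannelMerger();      // AudioChannelMerger
        var filter = context.createBiquadFilter();       // example effect

        source.connect(splitter);
        splitter.connect(filter, 0);   // channel 0, addressed by index
        filter.connect(merger, 0, 0);  // back into channel 0 of the output
        merger.connect(context.destination);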

    anant: Can the channels be represented by JS objects

    Chris: no - (liberally interpreted by scribe) the processing is
    represented by objects, and the connections between them are more
    what RTCWeb thinks of as tracks / channels. The channels are accessed
    by index number
    ... can combine with other media, mix, and send to far side
    ... Dynamo (sp?) did apps that send information over websockets
    that allow people to create music locally using this

    anant: how do you deal with codecs / representation

    Chris: linear PCM with floating point using typed array
    ... want clean integration with graphics and webrtc
    ... most expensive operation is convolution to simulate sound of a
    concert hall

    … this is implemented multi threaded

    … multiple implementations of different FFTs and such

    anant: Processing in JS?

    Chris: much of the processing done in native code

    … Can create node with custom processing with JS

    Cullen: have you looked at echo cancellation

    Chris: not yet but have code from the GIPS stuff
    ... can use audio tag or video tag and connect with this

    Stefan: Have 4 requirements. F13 to F15 and F17

    Chris: F13 can do with placement node

    … F14: there's the RealtimeAnalyserNode class, but probably better to
    make a new AudioLevel node that just does this simple function. This
    is easy and we plan to do it.

    … F15: can use the AudioGainNode

    … F17: yes, can be dealt with by the mixer

    … that happens automatically when you connect multiple things into
    an input
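
    [Illustration: F15 (adjusting the level of a stream) with the gain
    node Chris mentions; the factory method name is per the then-draft.]

        var gainNode = context.createGainNode();  // AudioGainNode in 2011
        source.connect(gainNode);
        gainNode.connect(context.destination);
        gainNode.gain.value = 0.5;                // halve the level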

    fluffy: Question about mixing spoken audio to bit 3 of top speakers

    Chris: could do this

    Harald: Can you create events in JS from detecting things in audio

    Chris: could build something to add an event to a level filter or
    something
    ... no gate, but have compression that is similar

    Doug: How does this compare to roc's proposal with respect to the
    library?

    Chris: roc's proposal has all audio processing in JS

    Chris: there is a JS node that allows you to create processing
    effects in just JS, you can associate an event this way by writing
    some code to detect a level and trigger an event

    Chris: simple common audio effects, EQ, dynamics, etc. are very common
    … the library provides nodes for these
    … for specialized needs, have a custom JS node

    Harald: concern with JS is worry about latency

    Chris: agree and very concerned about latency in games
    … for something like guitar to midi to audio, need very low latency

    Doug: with voice, don't want that lag effect where you have talk-over

    Chris: JS would add to the latency
    … for some things, such as triggering on signal level, latency
    would not matter as much
    … could have latency in the 100 to 150 ms range

    fluffy: what sort of latency would we typically see on some EQ

    Chris: working on 3.3 ms frames of latency

    anant: which API is likely to be accepted?

    Doug: plan to publish both as first public drafts (a very mature one
    and a less mature one)
    … It is shipping in nightlies of WebKit
    … If Doug had to bet: since we have a mature spec, something that
    works well, have specs, have an implementation, Doug would guess lots
    of interest in the Web Audio API
    … sounds like the existing spec can easily be made to meet the needs
    of RTCWeb

    Tim: roc's code runs on a worker thread so latency is far less than
    100 ms in most cases

    Eric: In resource constrained environment, better to have processing
    done in native where it can take advantage of hardware resources of
    the host

    Doug: the difference at an overly simplified level is do you do
    mostly JS with some extensions for native speedup or do you do
    mostly native with an escape hatch to do things in JS

    Chris: processing in a web worker still has dangers, such as garbage
    collection stalls
    … garbage collection stalls can be in excess of 100 ms. Depends on
    the JS engine

    Harald: Are web workers stable and widely implemented?

    MikeSmith: in all major browsers and has not changed in a long time
    … does not mean Hixie won't make changes.

    Stefan: should we bring over some requirements to the WG

    Doug: Send us the minutes

    anant: Need to make sure the representations of the audio can be the
    same and we can make this work together (media streams in one need to
    match up with the other)

    Chris: have similar problem linking to audio tag

    Doug: Good to have a doc on how to bridge the gaps between the two
    and make it easy for someone authoring against these two things

    Chris: hopefully have concrete implementation where we can link this
    together with peer connection over upcoming months
    ... have working code in WebKit. The GNOME GTK port has working code.
    We can share code with Mozilla to help move the code in
    … Bindings of IDL and such will be some tedious work but tractable
    problem

    Chris: the DSP code is C++ code
    … on Mac can use the Accelerate framework
    … Convolution can use Intel MKL; there's an FFT under GPL, FFmpeg's
    FFT, and Apple's can be called.

    Harald: Have a fair picture of where we are. Thank you. When the
    specs settle down, we can use this. Hope things finish on the right
    schedule

    Doug: having a concrete proposal from roc will help resolve that
    … pace will accelerate dramatically
    … will keep coordinating

Implementation status

    Adam (Ericsson): Prototype implementation has been out for a while

    Adam: We have a MediaStream implementation, getUserMedia with a
    dialog, PeerConnection which currently opens a new port for every
    new media (no multiplexing)
    ... We could do a demo, perhaps

    anant: any particular challenges so far?

    Adam: with the GStreamer backend, hooked up directly with the video
    source in WebKit. This means we cannot branch out from this pipeline
    to go in other directions

    fluffy: has anyone done ICE? caps negotiation; DTLS, SRTP - how far
    along on these?

    adam: using libnice for ICE, based on GObject, part of the Collabora
    project
    ... no encryption or renegotiation
    ... prioritized codecs that we need to test with. Using Motion JPEG,
    H.264, Theora

    anant: are you looking to implement your prototype from a browser
    perspective?

    adam: yes

    stefan: also looking to interoperate with internal SIP systems

    harald: public initial release for Chrome in November
    ... main missing pieces were a few patches waiting to land in
    WebKit, and some audio hacks to connect a media stream from the
    PeerConnection into an audio element (little cheating here and
    there)

    <fluffy> Public estimate of release is in November. Main bits
    missing were some patches in WebKit. Still have an audio hack to
    simply put output from PeerConnection to the speakers. Have ICE, have
    encryption with libjingle, negotiating crypto keys. Have an
    implementation of DTLS-SRTP but not yet hooked up

    harald: we do have ICE, we do have encryption, using libjingle and
    can negotiate crypto keys. Have an implementation of DTLS-SRTP but
    have not hooked it up into our WebRTC implementation yet
    ... we are using VP8 for the video codec, and using tons of audio
    codecs
    ... not hooking up Opus until we know what the IPR implications are

    fluffy: SDP?

    harald: using the old version of the somewhat-SDP format proposed
    earlier

    <fluffy> using VP8, waiting for IPR clarification before adding OPUS

    harald: chrome on desktop only, not working on android yet

    ???: prefixing?

    harald: yes, -webkit prefix
    ... patches are landing on the central webkit repository, as well as
    chrome
    ... interesting trying to keep versioning straight amongst all these
    repos

    <juberti> i think the bulk of the webrtc code is in the embedder
    section of webkit, so not directly included in what apple would use

    On to tim, for mozilla

    Tim's slides: [20]WebRTC implementation status in Firefox

      [20] http://www.w3.org/2011/04/webrtc/wiki/images/4/42/Webrtc-mozilla-impl-status-20111101.pdf

    tim: slides are similar to those presented in Quebec
    ... waiting for reviews before we land new libcubeb backend
    ... roc has an implementation of an audio API that plays silence,
    which is progress since last update
    ... made more progress on the actual code
    ... list of open bugs on slide, using google webrtc code drop. work
    on integration into that codebase
    ... no longer doing an add-on, doing a branch of firefox instead
    ... camera support working on Linux and Mac, a build system issue
    preventing Windows for now
    ... DTLS, SRTP are all getting started, moving echo cancellation to
    hardware etc. No patches but hopefully soon
    ... Q1 2012 for a functional test build

    ???: thought about mobile?

    tim: we have, number of challenges for mobile. no great audio API on
    mobile, not low latency. need help from google before we can build
    something people can actually use

    harald: we (google) can't have a low latency API without low latency
    support from android either

    tim: we will probably hook up with a sound API initially (a demo) to
    show that it's possible

    opera: different timeline. We are pushing getUserMedia separately
    from PeerConnection. A labs build that implements getUserMedia is out
    there; the timeline to implement PeerConnection is not in the next 6
    months

Incoming notifications

    DanD: New agenda topic, seeking feedback on an open issue
    ... boils down to notifications.

    DanD: You want to be called on your browser but your browser is not
    up and running.
    ... Want to understand how big of a need this is.
    ... Need a platform-independent commonly supported push notification
    framework.
    ... How does such a push notification reach a browser that isn't
    running, and how does it start the browser and the correct web
    application.
    ... If this is solved somewhere else in W3C, we need to have a note
    that references it.
    ... May need something in the API to indicate if you want to send
    such a notification when you try to start a session.

    [DanD draws a diagram on the whiteboard ([21]Fig 4).]
    4 possible scenarios for incoming calls: 1) Web app is running, 2)
    Web app is running but out of focus, 3) Web app is not running but
    browser is running, 4) Web browser is not running.

    Fig 4 - Incoming notifications

    anant: Essentially related to the open question from yesterday: do
    we allow incoming calls even when the user is not on a particular
    website, but is online?
    ... Can we require the web app to be running in some form or
    another?

    DanD: Want it to work even if it's not running.

    anant: Actually, I would advise against that.
    ... I want to be in a state where I don't want to receive calls.

    DanD: But you set up your browser to receive notifications.

    anant: But that's the same as always being signed into gmail. It's
    an explicit user action.

    fluffy: When you have a cellphone and you turn it on, it will always
    accept incoming calls.
    ... I'm not suggesting the model of the past is the model of the
    future, but I think we're moving more and more towards devices that
    are always on.

    anant: It's easy to build automated stuff on the web, so you're
    going to get in trouble with spam.
    ... Just a note of caution.

    francois: I don't get the difference between having a setting in the
    browser and running a web app.

    DanD: This is why I was drawing this diagram.
    ... If none of these things are running and I send a notification, I
    want to be able to launch the browser and have contextual
    information to say that a user of this browser wants to be able to
    talk to you on app A.

    francois: So you're outside of the context of the browser.

    DanD: The browser may be in the background, but it might be not
    running altogether.
    ... If I'm truly going to implement an incoming call use case,
    there's no point if I can't answer that call.

    RichT: There are four scenarios
    ... 1) The web app is running in the browser and focused
    ... 2) The web app is running but not focused
    ... 3) The browser is open but not the web app
    ... 4) The browser is not running.

    anant: You always have to have something, a small daemon listening
    for notifications.

    fluffy: I want to talk about the first one first.
    ... I think if you have 10 apps all running at the same time, and
    they all have a TCP connection back to the webserver and they're on
    a mobile connection
    ... you have to do a liveness check fairly frequently
    ... so your battery life is abysmal.
    ... If you do that in an app on the Apple app store, they will ban
    your app and make you use their single global connection.

    DanD: I captured that by saying that case 1 is solved, but
    inefficient.

    RichT: We have web notifications come out of a different working
    group.
    ... if an app is not focused there should be an event that an
    incoming call is received that can trigger a web notification.

    anant: That part is missing from the API right now, and we should
    add that.
    ... We need an API to register a callback when an incoming call
    occurs

    francois: It could be an event.

    anant: Should it receive a PeerConnection object in that callback?
    ... Depends on whether we want to start doing ICE and reveal IP
    address before answering.
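
    [Illustration: nothing like this existed in the draft; a purely
    hypothetical shape for the incoming-call callback anant describes.]

        // Hypothetical API, for discussion only.
        navigator.onincomingcall = function (evt) {
          // evt might identify the caller; whether it also carries a
          // PeerConnection (and thus starts ICE and reveals the IP
          // address) is exactly the open question above.
        };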

    RichT: We could trigger an OS level notification, but require the
    user to click on it to take them to the webapp.

    anant: Could we apply the same thing for case 3?
    ... I was thinking that when the OS notification is generated, you
    click on it, and the web app is opened in a new tab?

    hta: Is that in the spec?

    anant: I'm just brainstorming ideas.

    hta: I was wondering whether the notification API was part of the
    callback API you were describing or if it was new things that you
    think we should put into it.

    anant: I think the notification API is open-ended.

    hta: That kind of open-endedness is a null specification. It doesn't
    tell me whether it's going to work or not.

    DanD: The moment you rely on something that is OS dependent, how do
    you know that it works interoperably?
    ... What is the level of support of this web notification?

    RichT: It's still a draft, but it's part of the web, and should work
    on all devices.

    francois: In the 3rd case, the app is not running?

    RichT: It's headless, so there's no UI.

    DanD: There is still the situation where you don't have any headless
    thing running.

    francois: If nothing is running, how do you know where to dispatch
    the incoming notification?

    DanD: It has to be dispatched to the default runtime to kick start
    the process of running the web app.

    francois: Which app?

    DanD: That has to be part of the notification.

    hta: An incoming notification to my computer has no business knowing
    which browser I'm running.

    DanD: I agree with you. It should invoke the default web browser.

    anant: What if that browser doesn't support WebRTC?

    fluffy: Then you need to uninstall it :).

    DanD: But I said in the notification I need to know what application
    to launch to get back into focus.
    ... otherwise I don't know what JS to load.

    hta: Basically you need a URL.

    anant: But to generate that notification you're relying on non-web
    technology.

    francois: It means you're listening to notifications that could come
    from anywhere.
    ... Maybe it could be restricted to known domain names.
    ... Otherwise you could flood the user.

    Stefan: I think you could register your web application with web
    intents to be notified.
    ... something is left running even if you close the browser.
    ... To get from the server side to the browser we have something to
    send events and web sockets, so that part is solved.
    ... Some part of the device must be awake and listening to pushed
    events.

    hta: Some devices like Chromebooks don't have the concept of
    browsers not running.

    fluffy: If you work in a web framework that's generic, you want to
    be able to optimize it for lower power situations.

    Stefan: I think web intents is similar to this notification idea.
    You actually rely on the OS to wake up your platform, but don't
    specify how it's done.

    DanD: The only reason I wanted to have this discussion was not to
    solve it.
    ... but to understand the reasoning, if it's a technology we have to
    support, what the options are, and what the dependencies are for
    future work.

    Vidhya: Is one of the conclusions we're coming to, do we need an
    application ID associated with the notification?

    francois: No.

    Stefan: We haven't gone into that level of detail.

    DanD: We don't need an application ID, we need a way for the end
    parties to register with a web server.
    ... There is some sort of a presence notion.
    ... How do you convey, "I know where you were registered last time,
    how do I send you an alert?"
    ... If I go to Facebook, I know who my friends are, but I don't care
    what application they use, as long as they're registered with a
    service that will do the handshake.
    ... It's not that webapp A will send the notification, but there
    will be a mediation framework that optimizes the notifications like
    fluffy said and packages them to send the results.
    ... to determine if you need the user or the application or you need
    both.

    Vidhya: I feel uneasy because that ties you to a specific server.
    This should be open.

    DanD: I'll give you an exact use case. I called my mom from my
    browser.
    ... She was on a physical phone, and she hangs up, and then wants to
    call me back.
    ... Where is she going to call?

    hta: It's not at all clear that she should be able to.

    fluffy: We have this odd tradeoff between one connection for
    everything you'd ever want to connect to.
    ... There's different levels you can aggregate on, from one
    connection per site to only one connection total.
    ... They all make me very uncomfortable.

    Vidhya: Once I got my iPad I noticed certain things that I couldn't
    do without an iPad or without being connected to Apple, and this was
    supposed to be web technology.

    RichT: I don't think it's tied to a particular server.
    ... Browser B has established a connection to a provider, and the
    provider knows where the browser is.
    ... the message is received with a particular context, and the
    context decides what to do.
    ... I think it's just going to be a process of that out-of-band
    channel.

    DanD: It has one big assumption: that the browser is up and running.

    RichT: If the browser isn't running, but we're still running a
    context, we're fine.
    ... A JS context would provide the same functionality.

    DanD: I'd like to learn more about this headless thing.
    ... From what I'm hearing you're saying it covers all the scenarios.
    ... It goes back to fluffy's point that these things may not be
    efficient.

    francois: What is not efficient may be the way that you push the
    notification.
    ... Which is why I was talking about the server-sent event, which is
    an ongoing draft.
    ... It's precisely meant for mobile devices.

    DanD: The conclusion is we have a need, we may have solutions, and
    we need to capture how we can deal with these scenarios.
    ... I'm trying to understand if there's anything we need to put in
    the API to indicate if I want to send notifications.

    anant: I was saying we need a callback for receiving them.

    francois: Is that up to the app?

    DanD: I'm done.

    Stefan: I think we need to describe the use case.

    DanD: I'd like to see some links to some of the works on IRC.

    anant: Web notifications is still in editor's draft, so they're even
    farther behind than we are.

    <francois> [22]Server-Sent Events

      [22] http://dev.w3.org/html5/eventsource/

    <francois> [23]Web Notifications (exists both as public working
    draft and editor's draft)

      [23] http://www.w3.org/TR/notifications/
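
    [Illustration: the two drafts francois cites, combined - wake on a
    server-sent event, then surface a notification. EventSource is per
    the Server-Sent Events draft; the notification call uses WebKit's
    experimental interface of the time, and '/call-alerts' is a made-up
    endpoint.]

        var alerts = new EventSource('/call-alerts');
        alerts.onmessage = function (e) {
          var n = window.webkitNotifications;
          if (n && n.checkPermission() === 0) {  // 0 == permission granted
            n.createNotification('', 'Incoming call', e.data).show();
          }
        };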

    RichT: Should notifications be explicitly asked for, or something
    that is always done?
    ... At the moment it's going to be an extra thing that developers do
    voluntarily, but perhaps it has to be implicit.

    DanD: From what I understand there's nothing that has to be built
    into the browser to receive notifications.

    anant: Web notifications needs to be implemented in the UA to be
    useful.

    DanD: So the work that is planned for web notifications is the hook
    that is needed.

    anant: I think they are a fairly good match.

    hta: I think that the next thing that needs to happen is that
    someone needs to look at these mechanisms and try to build what we
    need.
    ... if that covers the use cases then we're done.
    ... if not we have to take notes of that.

    Stefan: Are you volunteering DanD?

    DanD: I don't have a working demo.

    hta: I don't think you need a working demo of WebRTC, just to be
    able to get web notifications going.

    Stefan: Or just look at the specs and see if they do what we need.

    DanD: I can definitely look into that.

    RichT: If web notifications were to work, you're going to click this
    thing and get back into the tab, which then shows the normal accept
    UI.
    ... should we have a way to do this in one step?

    hta: That depends on how much richness you're going to have in the
    notification.

    anant: You may need to be able to refuse, pick cameras, check your
    hair, etc.

    fluffy: When you're trying to filter spam, the consent message
    itself can deliver the spam.

    anant: You definitely have to think about spam.

    fluffy: I don't know the answer, but it's a great point.

    RichT: It's not perfect, but it does allow the focus to go back
    naturally for the webapp.

    anant: For the good calls, but we want to separate them from the bad
    calls.

Any other business?

    hta: We've gotten through some fairly hefty discussions.
    ... we have a few action items that are reflected in the minutes,
    and some that are not.
    ... including an action item to come up with a specific proposal for
    what a track is.

    anant: One of the editors.

    hta: We did not get around to saying how tracks, MediaStreams map
    onto RTP sessions and all that.

    Stefan: That's for the IETF.

    hta: As long as they know what things we think they should be
    mapping on to, they might have a better chance of hitting them.
    ... In two weeks it's Taipei, and we'll revisit some of these issues
    at the IETF meeting.
    ... What other things have we forgotten?

    fluffy: Just putting my IETF chair hat on, what things are useful
    that came out of this meeting?
    ... We're okay with this ROAP model?
    ... We seem to be okay with DTLS-SRTP, we're still discussing
    permission models.
    ... we talked about codecs.
    ... The long-term permission model.
    ... IETF is working on the assumption that we need the long-term
    permission model, so that doesn't change.

    anant: There's also the issue of identity.

    hta: We have raised the concept that buying into a specific identity
    scheme is probably premature.

    anant: Agree.

    hta: We seem to think that interfaces that allow us to tie into
    identity schemes are a good thing when we need them.

    fluffy: If that's possible. Did that make it into the minutes?

    derf: Yes.

    anant: I think the right answer is a generic framework to which you
    can hook up any identity service; it was just motivated by BrowserID.

    fluffy: And ekr's draft gives two examples.

    RichT: I was thinking that this could be done with DNS.

    fluffy: So you call fluffy@cisco.com?
    ... I have some browser code for you...

    anant: ikran. It's just SIP.
    ... Facebook may not have their user names in the form of
    username@facebook.com.
    ... If we can build a more generic thing, then we can get buy-in
    from more services.
    ... It's a balance between short-term how many people can we get to
    use this service vs. having something interoperable, where the
    e-mail address format is much better.

    fluffy: I don't think it's the form of the identifier, but it's
    clear identity systems are rapidly evolving.
    ... ekr has not claimed that we can have this pluggable identity
    model.
    ... but he thinks it's possible and wants to go do the work, and
    he's already shown it works for two.

    anant: But that is very much an IETF issue, so definitely worth
    discussing in Taipei.
    ... though there's some overlap with W3C.

    fluffy: IETF may say we're very happy to do the comsec security
    issues, but we don't want to touch how you display a prompt to ask
    for a password, etc.

    hta: I thought it totally destroyed that distinction with OAuth.

    fluffy: I think it took the power of Lisa to make OAuth happen at
    the IETF.

    burn: It also needs to blend well with what web developers do.
    ... if they have to track too many identities it's a mess.

    anant: If I want to write chatroulette.com, I don't want any
    identity.
    ... and that should work, but if you want identity, you should be
    able to get some assurance.

    burn: But it should still be able to play well with existing methods
    of identity on the web.

    DanD: But if users say I want to have the identity of the other
    party verified, they'll go with whatever solution is most
    trustworthy.

    anant: It depends on how much trust the user has.
    ... The places where it may not be possible to use existing identity
    schemes is when both parties don't trust the signaling service.
    ... Which is where we may not be able to use the existing identity
    systems.

    fluffy: That model is common (see OpenID, etc.).
    ... I think it's also interesting that in OAuth most of the things
    you're trying to solve can be solved with SAML, but the web hates
    SAML.

    DanD: Most of the problems can be solved with long and short-term
    permissions.

    DanD: you might actually use oauth as a method of permission
    granting.

    Stefan: Any other final business?

    hta: Thank you all for coming, and we'll meet again at our next
    meeting.

    DanD: When is the next meeting?

    hta: After Taipei.
    ... is it reasonable to aim for a phone meeting in December?

    fluffy: That sounds reasonable. I will be in Australia.
    ... I'm going to be up very, very late, aren't I.

    hta: Somehow I doubt a doodle will end up with a convenient time for
    Australia.

    burn: Shall we take a poll to see how many people are sympathetic?

    hta: Okay, so next meeting sometime in December timeframe.
    ... at that time we might have more input from the IETF and new
    public WG drafts.
    ... we'll continue to make forward progress.

    burn: They might not be public drafts, but that's more for
    convenience and simplicity.

    francois: Now that you've published your first public working draft,
    you'll want to publish public drafts regularly.

    DanD: Will the meeting notes be published?

    francois: I'll clean them up and send them to the mailing list.
    ... I took pictures of the diagrams.

    anant: I noticed the Audio WG was using hg.
    ... I'd like to switch to that, if it's not a big deal.

    burn: We could put the draft on github.

    fluffy: Are you okay with github, Adam?

    AdamB: Yeah, yeah. I'm basically in favor of the systems in reverse
    order of age.

    RichT: You'll publish details on the list?

    fluffy: This is just where we merge things together before putting
    them in CVS, but we'll send details.

    burn: Public drafts still need to use CVS, right?

    francois: Yes, but that's not on your plate. That's internal W3C
    business, because you don't have access to that CVS area.

    hta: The editors will have to sort that out.

    [meeting adjourned]

Summary of Action Items

    [End of minutes]
