Re: Integration of Stats API v2 from Harald Alvestrand on 2013-01-27 (public-webrtc@w3.org from January 2013)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Sun, 27 Jan 2013 23:30:43 +0100
To: public-webrtc@w3.org
Message-ID: <5105AA93.20108@alvestrand.no>
On 01/23/2013 11:44 AM, Adam Bergkvist wrote:
> Hi
>
> I'm currently working on putting version 2 of the stats into the spec 
> as decided on the last telco. Overall I think this works, but always 
> getting objects by id makes the code more complicated. There are 
> situations where ids are needed (e.g., to correlate stats objects 
> between getStats() calls), but I think we should get objects directly 
> when possible.
>
> Some comments:
>
> The current approach to gathering stats is to gather all possible 
> kinds of stats for a chosen selector and then, after you get the 
> report and the stats objects, pick the interesting parts. (I know that 
> Ekr commented this on the last telco.)
>
> I propose we do the filtering at getStats()-time instead of after the 
> report is created. The result would be more conservative stats 
> gathering and smaller reports.

This is why the selector argument was specified as open-ended.
There are two reasons for allowing a "get all" function:

- Exploratory stats that want to know "what's there"
- Stats processors that may need info from multiple parts of the stats 
structure, in a not easily expressed pattern.

An example of the latter is a stats collector that throws up some 
indicator whenever an RTP stream moves to a different candidate and 
records stats on how much data was passed on the now-unused candidate; 
it needs all the candidates and the relevant RTP streams snapshotted at 
the same time.

My thought is that this is better thought of as more, perhaps orthogonal 
directions in which to expand the selector argument than as a separate 
argument. And, as much as possible, something that we can do when we 
know what we need.


>
> void getStats (MediaStreamTrack? selector,
>                RTCStatsCategory[]? categories,
>                RTCStatsCallback successCallback,
>                RTCPeerConnectionErrorCallback failureCallback);
>
> // names taken from the stats dictionaries in the proposal
> enum RTCStatsCategory {
>     "rtp-stream",
>     "outgoing-rtp-stream",
>     "incoming-rtp-stream",
>     "incoming-local-rtp-stream",
>     "rtp-session",
>     "candidate",
>     "candidate-pair"
>     // ...
> };
>
>
> The same enum would also be reused as argument values the objects() 
> method.
>
> > interface RTCStatsReport {
> >   sequence<DOMString> objects(DOMString type?);
> >   RTCStatsObject get(DOMString id);
> > }
>
> > interface RTCStatsObject {
> >    ...
> >    readonly DOMString id;
> >    ...
> > };
>
> It's not obvious what the objects() method returns; the name suggests 
> stat objcets but the return value is a sequence of DOMStrings. From 
> the example it seems like it returns ids. This seems a bit 
> complicated. Why not let it return a sequence of RTCStatObjects (of a 
> certain type) directly as the name implies instead of requiring detour 
> with ids and get().

I don't know if ids make life more complicated. If you let objects() 
(which could have had a better name) return objects, not ids, it means 
that it has to put all the stats objects into the RTCStatsObject form 
whether or not it is ever going to be accessed.

I could be argued either way on this one; in the current WebKit 
implementation, I don't think it makes a huge amount of difference.

>
> var statsObjects = report.objects("rtp-stream");
>
> A corresponding stats object from a later report (for the first 
> object) would then be retrieved with:
>
> secondReport.get(statsObjects[0].id);
>
> So from what I understand, the id links stats objects that correspond 
> to the same selector sub-component for a specific stats category 
> between reports. For example, sub component=SSRC when 
> selector=MediaStreamTrack and category=rtp-stream. This makes sense 
> since it isn't practical to use indexes if the number of stats objects 
> differs between the first and the second report.
>
> Could we have a name that emphasizes the difference between this id 
> and ids on MediaStream and MediaStreamTrack? I.e. that two objects 
> with the same id aren't the same object, instedad they are results of 
> reporting stats from the same underlying component. E.g. sourceId, 
> componentId, providerId or something.

Well, in the case of a MediaStream, the id will identify the same 
MediaStream, so it's indeed an id for that mediastream, it's just not 
the id of the mediastream.... statsObjectId is fine with me, just more 
typing.

>
> > In order to have a well known syntax for RTCStats objects, we use
> > Dictionary declarations. This does NOT mean that an RTCStatsObject is
> > or contains a dictionary; due to implementation issues, we still have
> > only the names() and getValue functions as interfaces to the
> > RTCStatsObject.
>
> I'm not sure it's good idea to take a syntax that is well defined is 
> this context and say that it means something else. I think the 
> described dictionaries looks good and it would be more straight 
> forward to use them as dictionaries (and update implementations to 
> support this). RTCStatsObject could also be a dictionary since the two 
> methods would become redundant (names() is replaced by property 
> enumeration and getValue() by direct property access). This would also 
> flatten each level of the stats structure a bit and make it easier to 
> use.

Are you sure you want that?

Some properties of dictionaries:

"Dictionaries are always passed by value. In language bindings where a 
dictionary is represented by an object of some kind, passing a 
dictionary to a platform object will not result in a reference to the 
dictionary being kept by that object. Similarly, any dictionary returned 
from a platform object will be a copy and modifications made to it will 
not be visible to the platform object."

"The order of the dictionary members on a given dictionary is such that 
inherited dictionary members are ordered before non-inherited members, 
and the dictionary members on the one dictionary definition (including 
any partial dictionary definitions) are ordered lexicographically by the 
Unicode codepoints that comprise their identifiers."

"Dictionaries must not be used as the type of an attribute, constant or 
exception field."

My main reason not to suggest using dictionaries is that I don't know 
the full cost of that choice; with the pseudo-dictionary I suggested, I 
know what the cost of implementing it is.
Someone who's actually worked with an implementation that had 
dictionaries as return values is in a better position to comment on the 
cost of using "real" dictionaries".

>
> dictionary RTCStatsObject {
>     readonly attribute long timestamp;
>     readonly DOMString type;
>     readonly DOMString id;
> };
>
> dictionary RTPStream : RTCStatsObject {
>     int ssrc;
>     int? maxBandwidth; // as specified by user, if present
>     ...
> }
> ...
>
> > dictionary RTPStream {
> >     ...
> >     StatsObjectID? otherEndStats; // Reference to the stats signalled
> >                                   // from our partner.
> >     ...
> > }
>
> What's the reason for having an id here and not referencing the 
> otherEndStats directly? Can these stats be found via the object() 
> method as well?

My general dislike for reference-by-object. If you look through the 
defs, there are a number of StatsObjectID and StatsObjectIdList members; 
those return strings or sequence<string>, respectively.

>
> > interface RTCStatsObject {
> >     readonly attribute long timestamp;
> >     ...
>
> We should use DOMTimeStamp here as defined by WebIDL [1].

I prefer just plain old Date (implementation: a Double), but that's a 
debate we've had earlier.
If you want DOMTimeStamp (implementation: an int64), you get to propose 
language saying what epoch we relate it to, and why.

>
> ============================================================
>
> The changes proposed above would make the API a bit easier to use by 
> working directly with objcets and with less "get by id" operations.
>
> (new example)
> pc.getStats(selector, ["rtp-stream"], ...);
>
> // proccess stats
> var objects = secondReport.objects();
> for (i = 0; i < objects.length; i++) {
>     second = objects[i];
>     // get the corresponding stats object from the first report
>     first = firstReport.get(second.id);
>
>     if (first && first.remote) {
>         var packetsSent = first.packetsSsent - second.packetsSent;
>         var packetsReceived = first.remote.packetsReceived -
>                 second.remote.packetsReceived;
>
>         // if fractionLost is > 0.3, we have probably found the culprit
>         var fractionLost = (packetsSent - packetsReceived) /
>                 packetsSent;
>     }
> }
>
> ------------------------------------------------------------
>
> (existing example)
> pc.getStats(selector, ...);
>
> // proccess stats
> var ssrcIds = now.objects("RTPStream");
> for (i = 0; i < ssrcIds.length; i++) {
>     var ssrcStatsId = ssrcIds[i];
>     nowState = now.get(ssrcStatsId);
>     prevState = baseline.get(ssrcStatsId);
>     remoteStatsId = nowstate.get("otherEndStats");
>     remoteNowState = now.get(remoteStatsId);
>     remotePrevState = baseline.get(remoteStatsId);
>
>     if (prevstate && remotePrevState) {
>         var packetsSent = nowstate.get(packetsSent)
>                 - prevstate.get(packetsSent);
>         var packetsReceived = remoteNowState.get(packetsReceived)
>                 - remotePrevState.get(packetsReceived);
>
>         // if fractionLost is > 0.3, we have probably found the culprit
>         var fractionLost = (packetsSent - packetsReceived) /
>                 packetsSent;
>     }
> }
>
>
> /Adam
>
> [1] http://dev.w3.org/2006/webapi/WebIDL/#common-DOMTimeStamp
>
>
>
Received on Sunday, 27 January 2013 22:30:27 UTC