Re: Integration of Stats API v2

On 2013-01-27 23:30, Harald Alvestrand wrote:
> On 01/23/2013 11:44 AM, Adam Bergkvist wrote:
>> Hi
>>
>> I'm currently working on putting version 2 of the stats into the spec
>> as decided on the last telco. Overall I think this works, but always
>> getting objects by id makes the code more complicated. There are
>> situations where ids are needed (e.g., to correlate stats objects
>> between getStats() calls), but I think we should get objects directly
>> when possible.
>>
>> Some comments:
>>
>> The current approach to gathering stats is to gather all possible
>> kinds of stats for a chosen selector and then, after you get the
>> report and the stats objects, pick the interesting parts. (I know that
>> Ekr commented this on the last telco.)
>>
>> I propose we do the filtering at getStats()-time instead of after the
>> report is created. The result would be more conservative stats
>> gathering and smaller reports.
>
> This is why the selector argument was specified as open-ended.
> There are two reasons for allowing a "get all" function:
>
> - Exploratory stats that want to know "what's there"
> - Stats processors that may need info from multiple parts of the stats
> structure, in a not easily expressed pattern.
>

I'm not arguing for removing the "get all" functionality, but rather for 
making stats gathering more precise if the app knows what it needs. 
Having a more predictable results from the stats gathering would be more 
efficient use of resources compared to the "see what you get and hope 
there's code to handle it"-approach.

> An example of the latter is a stats collector that throws up some
> indicator whenever an RTP stream moves to a different candidate and
> records stats on how much data was passed on the now-unused candidate;
> it needs all the candidates and the relevant RTP streams snapshotted at
> the same time.
>
> My thought is that this is better thought of as more, perhaps orthogonal
> directions in which to expand the selector argument than as a separate
> argument. And, as much as possible, something that we can do when we
> know what we need.

Yes, I think it can be seen as extra precision to the selector.

In your example, you filter the stats with "RTPStream" after the stats 
have been gathered so the filtering functionality is already there in 
some sense.

>
>>
>> void getStats (MediaStreamTrack? selector,
>>                RTCStatsCategory[]? categories,
>>                RTCStatsCallback successCallback,
>>                RTCPeerConnectionErrorCallback failureCallback);
>>
>> // names taken from the stats dictionaries in the proposal
>> enum RTCStatsCategory {
>>     "rtp-stream",
>>     "outgoing-rtp-stream",
>>     "incoming-rtp-stream",
>>     "incoming-local-rtp-stream",
>>     "rtp-session",
>>     "candidate",
>>     "candidate-pair"
>>     // ...
>> };
>>
>>
>> The same enum would also be reused as argument values the objects()
>> method.
>>
>> > interface RTCStatsReport {
>> >   sequence<DOMString> objects(DOMString type?);
>> >   RTCStatsObject get(DOMString id);
>> > }
>>
>> > interface RTCStatsObject {
>> >    ...
>> >    readonly DOMString id;
>> >    ...
>> > };
>>
>> It's not obvious what the objects() method returns; the name suggests
>> stat objcets but the return value is a sequence of DOMStrings. From
>> the example it seems like it returns ids. This seems a bit
>> complicated. Why not let it return a sequence of RTCStatObjects (of a
>> certain type) directly as the name implies instead of requiring detour
>> with ids and get().
>
> I don't know if ids make life more complicated. If you let objects()
> (which could have had a better name) return objects, not ids, it means
> that it has to put all the stats objects into the RTCStatsObject form
> whether or not it is ever going to be accessed.

I don't know if saving resources by delaying creation of the 
RTCStatsObjects is a good enough reason to complicate the API. If we 
want to save resources we shouldn't gather stats that the application 
doesn't have code to handle (as discussed above).

>
> I could be argued either way on this one; in the current WebKit
> implementation, I don't think it makes a huge amount of difference.
>

The difference is not primarily in the implementation, but when you use 
the API. See the examples in the end.

>>
>> var statsObjects = report.objects("rtp-stream");
>>
>> A corresponding stats object from a later report (for the first
>> object) would then be retrieved with:
>>
>> secondReport.get(statsObjects[0].id);
>>
>> So from what I understand, the id links stats objects that correspond
>> to the same selector sub-component for a specific stats category
>> between reports. For example, sub component=SSRC when
>> selector=MediaStreamTrack and category=rtp-stream. This makes sense
>> since it isn't practical to use indexes if the number of stats objects
>> differs between the first and the second report.
>>
>> Could we have a name that emphasizes the difference between this id
>> and ids on MediaStream and MediaStreamTrack? I.e. that two objects
>> with the same id aren't the same object, instedad they are results of
>> reporting stats from the same underlying component. E.g. sourceId,
>> componentId, providerId or something.
>
> Well, in the case of a MediaStream, the id will identify the same
> MediaStream, so it's indeed an id for that mediastream, it's just not
> the id of the mediastream.... statsObjectId is fine with me, just more
> typing.

But I wanted to get away from the id being something that identified the 
JavaScript stats object itself and clarify that the id identifies the 
underlying stats provider. But, perhaps it's just a matter of documentation.

>
>>
>> > In order to have a well known syntax for RTCStats objects, we use
>> > Dictionary declarations. This does NOT mean that an RTCStatsObject is
>> > or contains a dictionary; due to implementation issues, we still have
>> > only the names() and getValue functions as interfaces to the
>> > RTCStatsObject.
>>
>> I'm not sure it's good idea to take a syntax that is well defined is
>> this context and say that it means something else. I think the
>> described dictionaries looks good and it would be more straight
>> forward to use them as dictionaries (and update implementations to
>> support this). RTCStatsObject could also be a dictionary since the two
>> methods would become redundant (names() is replaced by property
>> enumeration and getValue() by direct property access). This would also
>> flatten each level of the stats structure a bit and make it easier to
>> use.
>
> Are you sure you want that?
>
> Some properties of dictionaries:
>
> "Dictionaries are always passed by value. In language bindings where a
> dictionary is represented by an object of some kind, passing a
> dictionary to a platform object will not result in a reference to the
> dictionary being kept by that object. Similarly, any dictionary returned
> from a platform object will be a copy and modifications made to it will
> not be visible to the platform object."

I think we want this. The stats should be a snapshot of what's going on 
in the platform. Nothing should be communicated in the opposite direction.

>
> "The order of the dictionary members on a given dictionary is such that
> inherited dictionary members are ordered before non-inherited members,
> and the dictionary members on the one dictionary definition (including
> any partial dictionary definitions) are ordered lexicographically by the
> Unicode codepoints that comprise their identifiers."

The order is irrelevant for the stats case isn't it?

>
> "Dictionaries must not be used as the type of an attribute, constant or
> exception field."

Works for our case.

>
> My main reason not to suggest using dictionaries is that I don't know
> the full cost of that choice; with the pseudo-dictionary I suggested, I
> know what the cost of implementing it is.
> Someone who's actually worked with an implementation that had
> dictionaries as return values is in a better position to comment on the
> cost of using "real" dictionaries".
>

Yes, it would be interesting to have that kind of feedback.

The implementation would need to be able to construct a JS object out of 
a set of key-value pairs and return that object to JS. The reverse of 
parsing an incoming dictionary argument.

>>
>> dictionary RTCStatsObject {
>>     readonly attribute long timestamp;
>>     readonly DOMString type;
>>     readonly DOMString id;
>> };
>>
>> dictionary RTPStream : RTCStatsObject {
>>     int ssrc;
>>     int? maxBandwidth; // as specified by user, if present
>>     ...
>> }
>> ...
>>
>> > dictionary RTPStream {
>> >     ...
>> >     StatsObjectID? otherEndStats; // Reference to the stats signalled
>> >                                   // from our partner.
>> >     ...
>> > }
>>
>> What's the reason for having an id here and not referencing the
>> otherEndStats directly? Can these stats be found via the object()
>> method as well?
>
> My general dislike for reference-by-object. If you look through the
> defs, there are a number of StatsObjectID and StatsObjectIdList members;
> those return strings or sequence<string>, respectively.
>

Can you elaborate on the reference-by-object thing? I get the 
id-approach for correlating stats objects between different reports 
(created at different times), but not for directly reading "local" 
properties.

>>
>> > interface RTCStatsObject {
>> >     readonly attribute long timestamp;
>> >     ...
>>
>> We should use DOMTimeStamp here as defined by WebIDL [1].
>
> I prefer just plain old Date (implementation: a Double), but that's a
> debate we've had earlier.
> If you want DOMTimeStamp (implementation: an int64), you get to propose
> language saying what epoch we relate it to, and why.
>

This is not a big deal. The predefined DOMTimeStamp type just seems 
suitable for our purposes. We could simply chose the same epoch as Date 
so a Date object can be constructed with new Date(timestamp) if needed.

>>
>> ============================================================
>>
>> The changes proposed above would make the API a bit easier to use by
>> working directly with objcets and with less "get by id" operations.
>>
>> (new example)
>> pc.getStats(selector, ["rtp-stream"], ...);
>>
>> // proccess stats
>> var objects = secondReport.objects();
>> for (i = 0; i < objects.length; i++) {
>>     second = objects[i];
>>     // get the corresponding stats object from the first report
>>     first = firstReport.get(second.id);
>>
>>     if (first && first.remote) {
>>         var packetsSent = first.packetsSsent - second.packetsSent;
>>         var packetsReceived = first.remote.packetsReceived -
>>                 second.remote.packetsReceived;
>>
>>         // if fractionLost is > 0.3, we have probably found the culprit
>>         var fractionLost = (packetsSent - packetsReceived) /
>>                 packetsSent;
>>     }
>> }
>>
>> ------------------------------------------------------------
>>
>> (existing example)
>> pc.getStats(selector, ...);
>>
>> // proccess stats
>> var ssrcIds = now.objects("RTPStream");
>> for (i = 0; i < ssrcIds.length; i++) {
>>     var ssrcStatsId = ssrcIds[i];
>>     nowState = now.get(ssrcStatsId);
>>     prevState = baseline.get(ssrcStatsId);
>>     remoteStatsId = nowstate.get("otherEndStats");
>>     remoteNowState = now.get(remoteStatsId);
>>     remotePrevState = baseline.get(remoteStatsId);
>>
>>     if (prevstate && remotePrevState) {
>>         var packetsSent = nowstate.get(packetsSent)
>>                 - prevstate.get(packetsSent);
>>         var packetsReceived = remoteNowState.get(packetsReceived)
>>                 - remotePrevState.get(packetsReceived);
>>
>>         // if fractionLost is > 0.3, we have probably found the culprit
>>         var fractionLost = (packetsSent - packetsReceived) /
>>                 packetsSent;
>>     }
>> }
>>
>>
>> /Adam
>>
>> [1] http://dev.w3.org/2006/webapi/WebIDL/#common-DOMTimeStamp
>>
>>
>>
>

Received on Monday, 28 January 2013 12:09:35 UTC