Re: Generalizing the stats hierarchy from Harald Alvestrand on 2012-09-28 (public-webrtc@w3.org from September 2012)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Fri, 28 Sep 2012 11:54:39 +0200
To: Martin Thomson <martin.thomson@gmail.com>
CC: public-webrtc@w3.org
Message-ID: <506573DF.4000504@alvestrand.no>
On 09/27/2012 06:35 PM, Martin Thomson wrote:
> On 27 September 2012 07:49, Harald Alvestrand <harald@alvestrand.no> wrote:
>> (back to being serious on this thread, and top-posting because I want to
>> talk about general principles rather than details of a proposal..)
>>
>> My experience with stats systems (mostly SNMP, but also mrtg, nagios and a
>> couple of Google-internal systems) is that they generally descend to work on
>> basic items, and offer powerful graphing, summarization and alarming
>> features that work from these primitive values - but rarely, if ever, have I
>> found an interface where it's comfortable to work with structured values
>> directly.
> That is a concern that I can understand.  The only systems that I've
> seen work for any amount of structure are in reality tightly coupled,
> even if the interface is MIB.  The problem as I see it arises from
> having too large a surface area, which inevitably results in a
> mismatch between what the two sides need.  Compromises follow.
>
>> I *like* structured objects. They are extremely useful in many ways,
>> especially when thinking about things, but they are hard to handle. In
>> particular, in the WebIDL type system, having functions that return "either
>> an object or an array or a dictionary" is tricky.
> Actually that's untrue:
>    any getValue(DOMString key);
Except that WebKit's IDL doesn't support "any" in any meaningful sense...
>
> Or with union types:
>    (DOMString or double) getValue(DOMString key);
>
> That might have consequences for a strongly typed language that
> implements the IDL.  But let's be honest, we're talking about
> Javascript here - it's just not a problem.
>
>> Martin's mention of "JSON pointer" (I assume that this is
>> draft-ietf-appsawg-json-pointer-03.txt) makes me think that we might be able
>> to have our cake and eat it... if we define just 2 operations:
>>
>> - getValue("identifier") -> primitive object
>> - getNames("identifier") -> list of names (sequence<DOMString>) that are
>> valid for the next level down
> I assume that getValue("") would enumerate the top of the tree (""
> being the empty JSON pointer that identifies the root, as opposed to
> "/" which identifies a node with a key of the empty string)?
Are you assuming that getValue("") would function like getNames("")?
I'm not sure what you mean here.
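The sketch below makes the two proposed operations concrete. It is a minimal JavaScript version that uses JSON Pointer syntax (RFC 6901) over a plain nested object. The class name `PointerStatsReport` and the sample data are hypothetical and are not part of any WebRTC API. The sketch also shows Martin's distinction: "" addresses the root, while "/" addresses a key that is the empty string.

```javascript
// Hypothetical stats report addressed with JSON Pointer (RFC 6901).
// Sketch only: the class and method names are illustrative assumptions.
class PointerStatsReport {
  constructor(tree) {
    this.tree = tree;
  }

  // Split a pointer into unescaped reference tokens.
  // "" is the root; every other pointer must start with "/".
  static parse(pointer) {
    if (pointer === "") return [];
    if (!pointer.startsWith("/")) {
      throw new SyntaxError(`Invalid JSON Pointer: ${pointer}`);
    }
    // RFC 6901: decode "~1" to "/" before decoding "~0" to "~".
    return pointer
      .slice(1)
      .split("/")
      .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
  }

  // Walk the tree; returns undefined if any step is missing.
  resolve(pointer) {
    let node = this.tree;
    for (const token of PointerStatsReport.parse(pointer)) {
      if (node === null || typeof node !== "object" ||
          !Object.prototype.hasOwnProperty.call(node, token)) {
        return undefined;
      }
      node = node[token];
    }
    return node;
  }

  // getValue: returns primitives only; compound nodes are rejected.
  getValue(pointer) {
    const node = this.resolve(pointer);
    if (node !== null && typeof node === "object") {
      throw new TypeError(`${pointer} is not a primitive value`);
    }
    return node;
  }

  // getNames: the names that are valid one level below the given node.
  getNames(pointer) {
    const node = this.resolve(pointer);
    if (node === null || typeof node !== "object") return [];
    return Object.keys(node);
  }
}

const pointerReport = new PointerStatsReport({ a: { b: ["A", "b", "C", "d"] }, "": { x: 1 } });
pointerReport.getValue("/a/b/3"); // "d"
pointerReport.getNames("");       // ["a", ""]
pointerReport.getValue("//x");    // 1 (key "" under the root, then "x")
// Enumerating one level takes two steps: list the names, then fetch each.
pointerReport.getNames("/a/b").map((n) => pointerReport.getValue(`/a/b/${n}`));
```

Names that themselves contain "/" or "~" would have to be escaped before being appended to a pointer. This is part of the extra enumeration cost Martin points out.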
>
> The problem that I have with this is that - at a practical level - I
> can't distinguish between:
>
>     object.getValue("/a/b/3"); // === 12
>     object.getNames("/a/b").map(object.getValue.bind(object)); // === [ "A", "b", "C", "d" ]
> and
>     object["a"]["b"][3]; // === 12
>     object["a"]["b"]; // === [ "A", "b", "C", "d" ]
>
> Note the added complexity involved in enumerating values in your proposal.
Adam Barth told me that having attributes as arrays implies that you can 
modify those attributes. They're pass-by-reference, not pass-by-value. 
This isn't what we want; it makes no semantic sense to have stats be 
modifiable.

So my current implementation has gone from arrays to functions returning 
sequence<object> instead.
(The fact that array-valued attributes of complex types don't seem to 
work with the current WebKit is a different matter. I won't explore that 
just now.)
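The pass-by-value behaviour described above can be sketched in plain JavaScript. Stats are exposed through a method that returns a fresh copy on each call, instead of through an attribute that holds the live array. This mirrors how a WebIDL function returning sequence<object> hands back a new array. The class name `HypotheticalStatsReport` and the method `candidates()` are illustrative assumptions, not WebKit's API.

```javascript
// Sketch: stats exposed through a method that returns a copy, so that
// callers cannot modify the report's internal state (pass-by-value).
// Assumes the stats are JSON-serialisable (numbers, strings, booleans,
// arrays and plain objects).
const deepCopy = (value) => JSON.parse(JSON.stringify(value));

class HypotheticalStatsReport {
  #candidates; // private field: not reachable from outside the class

  constructor(candidates) {
    this.#candidates = deepCopy(candidates); // keep our own copy
  }

  // Analogous to a function returning sequence<object>: every call
  // returns a new copy rather than a reference to the stored array.
  candidates() {
    return deepCopy(this.#candidates);
  }
}

const valueReport = new HypotheticalStatsReport([{ State: "Succeeded", Used: true }]);
const mine = valueReport.candidates();
mine[0].State = "Failed"; // modifies only the caller's copy
console.log(valueReport.candidates()[0].State); // Succeeded
```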

The other thing I don't like about this is


>
> I'm less convinced of the need for JSON pointer in this case.  Most
> uses for JSON pointer involve JSON documents where the pointer exists
> outside the document, or there is a need within the document to have
> DRY (don't repeat yourself) complex relationships (multiple owners for
> the same structure, etc...).  In an object model, you can easily build
> those complex relationships if it is really necessary.
>
> But this is a theoretical discussion, we need the object model before
> we can determine which of the available options are appropriate.  We
> may well disagree on the finer points, but without having the big
> picture laid out, we're really just guessing.
>
>> we can represent any level of complexity, and allow navigation through it,
>> without having to deal with compound values.
>> (Of course, we can also allow getValue to return compound values - if anyone
>> cares enough to get WebKit and friends to understand that...)
>>
>> The objects I see are:
>>
>> - SSRCs (one or more per MediaStreamTrack - that is, an N:1 mapping)
>> - Transports (1:N mapping to SSRCs)
>> - Components (RTP / RTCP ... where does DTLS fit?)
>> - Candidate pairs (N:1 mapping to transports)
>> - Candidates (N:N mapping to candidate pairs??)
> Building an object model is the first and most important step in any
> operational task.  If the model is poor, then nothing you produce will
> ever really fit and your users will end up having to write more code
> to fix any shortcomings.  This looks like a good start.
>
> DTLS is 1:1 with components.  You missed the mapping from transports
> to components (1:1..2).
Yes, I was thinking of the DTLS data channel, which is multiplexed onto 
a transport.
>
> A candidate pair can only map to a single candidate on each end, so
> your mapping is really just 2 lots of a 1:N mapping (one local, one
> remote).
>
> I'm reluctant to suggest it, but this is a situation where UML
> really does help.
I'll leave it to you to supply the drawing ....
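In lieu of a drawing, here is a minimal sketch of the object model as corrected in the exchange above. It uses plain JavaScript objects plus a check of the stated cardinalities:

- each SSRC belongs to one MediaStreamTrack;
- a transport carries one or more SSRCs and has one or two components (RTP, optionally RTCP);
- a candidate pair belongs to one transport and references exactly one local and one remote candidate.

All field and function names are hypothetical. DTLS is left out because its placement was still open.

```javascript
// Hypothetical object model for the entities listed in the thread,
// plus a check of the cardinalities discussed. Names are illustrative.
function checkModel(model) {
  const errors = [];
  const candidates = new Map(model.candidates.map((c) => [c.id, c]));
  const transportIds = new Set(model.transports.map((t) => t.id));

  // SSRC N:1 MediaStreamTrack: each SSRC names exactly one track.
  for (const s of model.ssrcs) {
    if (typeof s.trackId !== "string") errors.push(`SSRC ${s.ssrc} has no track`);
  }

  for (const t of model.transports) {
    // Transport 1:N SSRCs.
    if (t.ssrcs.length === 0) errors.push(`transport ${t.id} carries no SSRCs`);
    // Transport 1:1..2 components (RTP, optionally RTCP).
    if (t.components.length < 1 || t.components.length > 2) {
      errors.push(`transport ${t.id} has ${t.components.length} components`);
    }
  }

  for (const p of model.candidatePairs) {
    // Candidate pair N:1 transport.
    if (!transportIds.has(p.transportId)) errors.push(`pair ${p.id}: unknown transport`);
    // Exactly one local and one remote candidate per pair.
    if (candidates.get(p.localId)?.side !== "local") errors.push(`pair ${p.id}: bad local candidate`);
    if (candidates.get(p.remoteId)?.side !== "remote") errors.push(`pair ${p.id}: bad remote candidate`);
  }
  return errors;
}

const model = {
  ssrcs: [{ ssrc: 1111, trackId: "audio0" }, { ssrc: 2222, trackId: "video0" }],
  transports: [{ id: "T1", ssrcs: [1111, 2222], components: ["RTP", "RTCP"] }],
  candidates: [
    { id: "L1", side: "local", ip: "10.0.0.1" },
    { id: "R1", side: "remote", ip: "10.0.1.24" },
  ],
  candidatePairs: [{ id: "P1", transportId: "T1", localId: "L1", remoteId: "R1" }],
};

console.log(checkModel(model)); // []
```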
>
>> Offhand, I don't see a need to have complex stats on any of these, but
>> others might....
> So far, I've seen mention of the fact that this interface will be used
> for more than just statistics.  Certificate data might be presented.
Not sure certificate handling is a stats operation. But if it is....

A certificate is a primitive object, isn't it? The moment you start 
taking it apart, it loses its certificate-ness, since you can't verify 
the signature any more. Can you point to other APIs that handle 
certificates in a sensible manner? I don't want to reinvent anything I 
can avoid reinventing.
> Though everything could ultimately be mapped down to primitives, that
> can end up suffering a lack of expressiveness, or result in a rough
> user experience.
The first iteration of stats will be a rough user experience, because we 
don't know the user yet.
I want to make sure we can get some useful numbers out in the first 
iteration, and then iterate.
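Monitoring tools work best on primitive values, as noted at the top of this thread. So whatever structure the API ends up exposing, a consumer can still flatten it into primitive-valued leaves for graphing or alarming. The sketch below assumes the stats arrive as a plain nested object. The function name and the use of JSON Pointer strings as keys are illustrative choices, not a proposal from the thread.

```javascript
// Flatten a nested stats tree into [pointer, primitive] pairs, in the
// spirit of SNMP-style tools that operate only on primitive values.
// Keys are JSON Pointers (RFC 6901): "~" becomes "~0", then "/" becomes "~1".
function flattenStats(node, prefix = "", out = []) {
  if (node !== null && typeof node === "object") {
    for (const [name, child] of Object.entries(node)) {
      const token = String(name).replace(/~/g, "~0").replace(/\//g, "~1");
      flattenStats(child, `${prefix}/${token}`, out);
    }
  } else {
    out.push([prefix, node]); // leaf: number, string, boolean or null
  }
  return out;
}

const flat = flattenStats({ SentPackets: 47, ICE: [{ State: "Succeeded", Used: true }] });
// flat: [["/SentPackets", 47], ["/ICE/0/State", "Succeeded"], ["/ICE/0/Used", true]]
```

Empty arrays and dictionaries produce no leaves, so a consumer that needs to know they exist would have to record them separately.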

>
>> On 09/24/2012 06:05 PM, Martin Thomson wrote:
>>> This makes sense to me.  Far more so than the flat structure.  It
>>> seems clear that the cost of managing structure warrants the
>>> (marginal) extra complexity that this results in.
>>>
>>> Did you consider JSON Pointer as a way to identify nodes in a tree?
>>> Alternatively, you could expose the entire tree in the report without
>>> any need for getValue():
>>>
>>> Your example:
>>>     report.getValue("ICE.0")
>>> JSON pointer:
>>>     report.getValue("/ICE/0")
>>> Direct:
>>>     report.value.ICE[0]
>>>
>>> These are, after all, just dictionaries and sequences of things.  A
>>> direct approach allows for inspection with things like
>>> hasOwnProperty(), the "in" operator, forEach(), and so forth.  Much
>>> easier to program to.
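The dotted and JSON Pointer styles differ mainly in how names are delimited and escaped. The sketch below converts between the dot-separated identifiers of Ekr's strawman (quoted below) and JSON Pointers. It also shows a limitation of the dotted form: a name that itself contains "." cannot be expressed, whereas JSON Pointer escapes "/" and "~". The function names are hypothetical.

```javascript
// Convert a dot-separated stats identifier ("ICE.0.State") into a
// JSON Pointer ("/ICE/0/State"), escaping "~" and "/" per RFC 6901.
function dottedToPointer(identifier) {
  if (identifier === "") return ""; // empty identifier: the whole report
  return identifier
    .split(".")
    .map((name) => "/" + name.replace(/~/g, "~0").replace(/\//g, "~1"))
    .join("");
}

// The reverse direction fails for names containing ".", which the
// dotted convention cannot represent unambiguously.
function pointerToDotted(pointer) {
  if (pointer === "") return "";
  const names = pointer
    .slice(1)
    .split("/")
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
  if (names.some((n) => n.includes("."))) {
    throw new RangeError(`cannot express ${pointer} with dot separators`);
  }
  return names.join(".");
}
```

The direct style (`report.value.ICE[0]`) needs neither conversion, since the nesting is carried by the objects themselves.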
>>>
>>> On 24 September 2012 08:13, Eric Rescorla <ekr@rtfm.com> wrote:
>>>> Harald,
>>>>
>>>> In draft-alvestrand-rtcweb-stats-registry-00.txt, you observe that
>>>> there are times when a single named statistics value actually
>>>> corresponds to a number of elements and you would like to be able to
>>>> address them individually. You suggest handling this case with the
>>>> convention of appending a ".X" to the stat in question, but
>>>> I think this actually points to the need for genuinely
>>>> hierarchical stats.
>>>>
>>>> Consider the case where you want to examine every aspect of ICE,
>>>> which I think there is general consensus we need. At this point
>>>> we have the following containment hierarchy:
>>>>
>>>>     - Media Stream  [W3C name: track]
>>>>       - Component     [RTP or RTCP]
>>>>         - Local candidate
>>>>           - State
>>>>           - Check history
>>>>           - Estimated RTT
>>>>
>>>> This seems pretty deep to represent cleanly in the existing hierarchy
>>>> but would fit well into a more generic structure.
>>>>
>>>> Here's a strawman to give you an idea of what I have in mind:
>>>>
>>>> - Instead of being just opaque strings, stats identifiers
>>>>     should be dot-separated strings, with dots separating
>>>>     levels in the hierarchy.
>>>>
>>>> - When registered, each stats identifier must be one of:
>>>>
>>>>     * value -- the value is in the stat itself
>>>>     * array -- the stat contains a list of values in an array
>>>>       (i.e., [])
>>>>     * dictionary -- the stat contains a list of values in a
>>>>       dictionary (i.e., {})
>>>>
>>>> - You can call getValue() at any level in the hierarchy
>>>>     and what you get depends on the identifier type. You
>>>>     can subaddress arrays and dictionaries by including
>>>>     the index/key in the identifier (as shown below).
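The strawman registers each identifier as one of three kinds. The sketch below shows one way a registry entry's declared kind (value, array, or dictionary) might be checked against what getValue() actually returns. The registry contents and function names are hypothetical; the strawman itself does not say how kinds would be enforced.

```javascript
// Classify a stats node as one of the three kinds in the strawman.
function statKind(node) {
  if (Array.isArray(node)) return "array";
  if (node !== null && typeof node === "object") return "dictionary";
  return "value";
}

// Hypothetical registry: identifier -> declared kind.
const registry = { SentPackets: "value", ICE: "array" };

// Returns null if the node matches its registration, else a message.
function checkAgainstRegistry(identifier, node) {
  if (!Object.prototype.hasOwnProperty.call(registry, identifier)) {
    return `unregistered identifier ${identifier}`;
  }
  const declared = registry[identifier];
  const actual = statKind(node);
  return declared === actual ? null : `${identifier}: declared ${declared}, got ${actual}`;
}
```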
>>>>
>>>>
>>>> Reworking your ICE example in this fashion would give us something like
>>>> this:
>>>>
>>>> { local: { timestamp: 12345, stats: {
>>>>            SentPackets: 47,
>>>>            SentOctets: 4444,
>>>>            ReceivedPackets: 33,
>>>>            ReceivedOctets: 2346,
>>>>            ICE: [
>>>>              {
>>>>                State: 'Succeeded',
>>>>                Used: true,
>>>>                LocalIpAddr: '129.241.1.99',
>>>>                RemoteIpAddr: '234.978.4.3'
>>>>              },
>>>>              {
>>>>                LocalIpAddr: '10.0.0.1',
>>>>                RemoteIpAddr: '10.0.1.24',
>>>>                State: 'Succeeded',
>>>>                Used: false
>>>>              }
>>>>            ]
>>>> }}}
>>>>
>>>> ISTM that this groups things that naturally belong together,
>>>> and also makes it easier to build processing engines without a lot
>>>> of string manipulation.
>>>>
>>>>
>>>> If I am reading the current API correctly, the only way to actually
>>>> get at a statistics value is to do .getValue() on an RTCStatsReport.
>>>> In this case, the code would then be something like this:
>>>>
>>>>      report.getValue('SentPackets') --> 47
>>>>      report.getValue('ICE') -->
>>>>            [
>>>>              {
>>>>                State: 'Succeeded',
>>>>                Used: true,
>>>>                LocalIpAddr: '129.241.1.99',
>>>>                RemoteIpAddr: '234.978.4.3'
>>>>              },
>>>>              {
>>>>                LocalIpAddr: '10.0.0.1',
>>>>                RemoteIpAddr: '10.0.1.24',
>>>>                State: 'Succeeded',
>>>>                Used: false
>>>>              }
>>>>            ]
>>>>
>>>>
>>>>      report.getValue('ICE.0') -->
>>>>              {
>>>>                State: 'Succeeded',
>>>>                Used: true,
>>>>                LocalIpAddr: '129.241.1.99',
>>>>                RemoteIpAddr:'234.978.4.3'
>>>>              }
>>>>
>>>>      report.getValue('ICE.0.State') --> 'Succeeded'
>>>>
>>>> Thoughts?
>>>> -Ekr
>>>>
>>
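For comparison, here is a minimal sketch of the dot-separated getValue() described in the strawman. It runs against the reworked ICE example, made valid JavaScript, with the IP address fields omitted for brevity. The function is a hypothetical stand-in, not the RTCStatsReport implementation.

```javascript
// The reworked ICE example as a valid JavaScript object
// (IP address fields omitted for brevity).
const reportData = {
  local: {
    timestamp: 12345,
    stats: {
      SentPackets: 47,
      SentOctets: 4444,
      ReceivedPackets: 33,
      ReceivedOctets: 2346,
      ICE: [
        { State: "Succeeded", Used: true },
        { State: "Succeeded", Used: false },
      ],
    },
  },
};

// Dot-separated lookup: each dot descends one level, and numeric names
// index into arrays. Returns undefined for unknown identifiers.
function getValue(stats, identifier) {
  let node = stats;
  for (const name of identifier.split(".")) {
    if (node === null || typeof node !== "object" ||
        !Object.prototype.hasOwnProperty.call(node, name)) {
      return undefined;
    }
    node = node[name];
  }
  return node;
}

const stats = reportData.local.stats;
getValue(stats, "SentPackets"); // 47
getValue(stats, "ICE.0");       // the first dictionary in the ICE array
getValue(stats, "ICE.0.State"); // "Succeeded"
getValue(stats, "ICE.1.Used");  // false
```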
Received on Friday, 28 September 2012 09:55:50 UTC