Re: Generalizing the stats hierarchy from Martin Thomson on 2012-09-27 (public-webrtc@w3.org from September 2012)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Thu, 27 Sep 2012 09:35:29 -0700
To: Harald Alvestrand <harald@alvestrand.no>
Cc: public-webrtc@w3.org
Message-ID: <CABkgnnX7tm+fp+iCifODzJkF00GC7OmZHB=28+48ttkRJp0hCg@mail.gmail.com>
On 27 September 2012 07:49, Harald Alvestrand <harald@alvestrand.no> wrote:
> (back to being serious on this thread, and top-posting because I want to
> talk about general principles rather than details of a proposal..)
>
> My experience with stats systems (mostly SNMP, but also mrtg, nagios and a
> couple of Google-internal systems) is that they generally descend to work on
> basic items, and offer powerful graphing, summarization and alarming
> features that work from these primitive values - but rarely, if ever, have I
> found an interface where it's comfortable to work with structured values
> directly.

That is a concern that I can understand.  The only systems that I've
seen work for any amount of structure are in reality tightly coupled,
even if the interface is MIB.  The problem as I see it arises from
having too large a surface area, which inevitably results in a
mismatch between what the two sides need.  Compromises follow.

> I *like* structured objects. They are extremely useful in many ways,
> especially when thinking about things, but they are hard to handle. In
> particular, in the WebIDL type system, having functions that return "either
> an object or an array or a dictionary" is tricky.

Actually that's untrue:
  any getValue(DOMString key);

Or with union types:
  (DOMString or double) getValue(DOMString key);

That might have consequences for a strongly typed language that
implements the IDL.  But let's be honest, we're talking about
JavaScript here - it's just not a problem.
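
As a rough illustration of why the union return type costs a script author very little, here is a minimal sketch. The `report` object, its keys, and the `describe` helper are hypothetical stand-ins, not part of any proposed API.

```javascript
// Hypothetical report whose getValue() returns either a string or a number,
// mirroring the (DOMString or double) union above. Keys are made up.
const report = {
  values: { State: 'Succeeded', SentPackets: 47 },
  getValue(key) {
    return this.values[key];
  }
};

// The caller branches on typeof only where the distinction matters.
function describe(value) {
  switch (typeof value) {
    case 'number': return 'number: ' + value;
    case 'string': return 'string: ' + value;
    default: return 'other: ' + String(value);
  }
}

const described = ['State', 'SentPackets', 'Missing'].map(
  (key) => describe(report.getValue(key))
);
```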

> Martin's mention of "JSON pointer" (I assume that this is
> draft-ietf-appsawg-json-pointer-03.txt) makes me think that we might be able
> to have our cake and eat it... if we define just 2 operations:
>
> - getValue("identifier") -> primitive object
> - getNames("identifier") -> list of names (sequence<DOMString>) that are
> valid for the next level down

I assume that getValue("") would enumerate the top of the tree (""
being the empty JSON pointer that identifies the root, as opposed to
"/" which identifies a node with a key of the empty string)?
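
To make the "" versus "/" distinction concrete, here is a minimal sketch of a pointer resolver. The function name `resolvePointer` and the sample document are assumptions for illustration, and escaping of "/" inside keys is deliberately left out.

```javascript
// Minimal JSON Pointer style resolver (sketch only; key escaping is omitted).
function resolvePointer(root, pointer) {
  if (pointer === '') return root; // empty pointer: the whole document
  if (pointer[0] !== '/') {
    throw new SyntaxError('a non-empty pointer must start with "/"');
  }
  return pointer
    .slice(1)
    .split('/') // "/" alone yields a single empty-string token
    .reduce((node, token) => (node == null ? undefined : node[token]), root);
}

// Hypothetical document with a key that is the empty string.
const doc = { '': 'value under the empty key', a: { b: [10, 11, 12] } };

const wholeDocument = resolvePointer(doc, '');
const emptyKeyValue = resolvePointer(doc, '/');
const nestedValue = resolvePointer(doc, '/a/b/2');
```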

The problem that I have with this is that - at a practical level - I
can't distinguish between:

   object.getValue("/a/b/3"); // === 12
   object.getNames("/a/b").map(object.getValue.bind(object)); // === [ "A", "b", "C", "d" ]
and
   object["a"]["b"][3]; // === 12
   object["a"]["b"]; // === [ "A", "b", "C", "d" ]

Note the added complexity involved in enumerating values in your proposal.
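
A minimal sketch of that extra work follows. It assumes that getNames() returns child names relative to the identifier it was given, so the caller has to rebuild each full pointer before calling getValue(). The `makeReport` wrapper and the sample tree are hypothetical.

```javascript
// Hypothetical wrapper exposing the two proposed operations over a plain tree.
// Assumption: getNames() returns child names relative to the given pointer.
function makeReport(tree) {
  const walk = (pointer) =>
    pointer === ''
      ? tree
      : pointer.slice(1).split('/').reduce((node, key) => node[key], tree);
  return {
    getValue: (pointer) => walk(pointer),
    getNames: (pointer) => Object.keys(walk(pointer))
  };
}

const tree = { a: { b: ['A', 'b', 'C', 'd'] } };
const report = makeReport(tree);

// Enumerating through the two-call interface: rebuild each child pointer.
const viaPointers = report
  .getNames('/a/b')
  .map((name) => report.getValue('/a/b/' + name));

// Enumerating the same values with direct property access.
const direct = tree.a.b.slice();
```

Both arrays hold the same four values; the difference is only in how much the caller has to write to get them.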

I'm less convinced of the need for JSON pointer in this case.  Most
uses for JSON pointer involve JSON documents where the pointer exists
outside the document, or there is a need within the document to have
DRY (don't repeat yourself) complex relationships (multiple owners for
the same structure, etc...).  In an object model, you can easily build
those complex relationships if it is really necessary.

But this is a theoretical discussion; we need the object model before
we can determine which of the available options are appropriate.  We
may well disagree on the finer points, but without having the big
picture laid out, we're really just guessing.

> we can represent any level of complexity, and allow navigation through it,
> without having to deal with compound values.
> (Of course, we can also allow getValue to return compound values - if anyone
> cares enough to get Webkit and friends to understand that.....)
>
> The objects I see are:
>
> - SSRCs (one or more per MediaStreamTrack - that is, an N:1 mapping)
> - Transports (1:N mapping to SSRCs)
> - Components (RTP / RTCP ... where does DTLS fit?)
> - Candidate pairs (N:1 mapping to transports)
> - Candidates (N:N mapping to candidate pairs??)

Building an object model is the first and most important step in any
operational task.  If the model is poor, then nothing you produce will
ever really fit and your users will end up having to write more code
to fix any shortcomings.  This looks like a good start.

DTLS is 1:1 with components.  You missed the mapping from transports
to components (1:1..2).

A candidate pair can only map to a single candidate on each end, so
your mapping is really just 2 lots of a 1:N mapping (one local, one
remote).
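
The relationships described above can be sketched with plain objects. Every property name here, the `addPair` helper, and the third address are illustrative assumptions rather than a proposed API.

```javascript
// Sketch of the mappings discussed above. All names are illustrative.
// DTLS is 1:1 with a component; a transport has one or two components.
const rtp = { kind: 'RTP', dtls: { label: 'dtls-for-rtp' } };
const rtcp = { kind: 'RTCP', dtls: { label: 'dtls-for-rtcp' } };
const transport = { components: [rtp, rtcp], candidatePairs: [] };

const local = { address: '10.0.0.1', pairs: [] };
const remoteA = { address: '10.0.1.24', pairs: [] };
const remoteB = { address: '10.0.1.25', pairs: [] }; // made-up address

// A pair holds exactly one local and one remote candidate; each candidate
// may appear in many pairs (a 1:N mapping on each side).
function addPair(owner, localCandidate, remoteCandidate) {
  const pair = {
    transport: owner,
    local: localCandidate,
    remote: remoteCandidate
  };
  localCandidate.pairs.push(pair);
  remoteCandidate.pairs.push(pair);
  owner.candidatePairs.push(pair);
  return pair;
}

addPair(transport, local, remoteA);
addPair(transport, local, remoteB);
```

Because these are ordinary object references, the same local candidate is shared by both pairs without any pointer syntax.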

I'm reluctant to suggest it, but this is a situation where UML
really does help.

> Offhand, I don't see a need to have complex stats on any of these, but
> others might....

So far, I've seen mention of the fact that this interface will be used
for more than just statistics.  Certificate data might be presented.
Though everything could ultimately be mapped down to primitives, that
can end up suffering a lack of expressiveness, or result in a rough
user experience.

> On 09/24/2012 06:05 PM, Martin Thomson wrote:
>>
>> This makes sense to me.  Far more so than the flat structure.  It
>> seems clear that the cost of managing structure warrants the
>> (marginal) extra complexity that this results in.
>>
>> Did you consider JSON Pointer as a way to identify nodes in a tree?
>> Alternatively, you could expose the entire tree in the report without
>> any need for getValue():
>>
>> Your example:
>>    report.getValue("ICE.0")
>> JSON pointer:
>>    report.getValue("/ICE/0")
>> Direct:
>>    report.value.ICE[0]
>>
>> These are, after all, just dictionaries and sequences of things.  A
>> direct approach allows for inspection with things like
>> hasOwnProperty(), the "in" operator, forEach(), and so forth.  Much
>> easier to program to.
>>
>> On 24 September 2012 08:13, Eric Rescorla <ekr@rtfm.com> wrote:
>>>
>>> Harald,
>>>
>>> In draft-alvestrand-rtcweb-stats-registry-00.txt, you observe that
>>> there are times when a single named statistics value actually
>>> corresponds to a number of elements and you would like to be able to
>>> address them individually. You suggest handling this case with the
>>> convention of appending a ".X" to the stat in question, but
>>> I think this actually points to the need towards genuinely
>>> hierarchical stats.
>>>
>>> Consider the case where you want to examine every aspect of ICE,
>>> which I think there is general consensus we need. At this point
>>> we have the following containment hierarchy:
>>>
>>>    - Media Stream  [W3C name: track]
>>>    - Component     [RTP or RTCP]
>>>    - Local candidate
>>>      - State
>>>      - Check history
>>>      - Estimated RTT
>>>
>>> This seems pretty deep to represent cleanly in the existing hierarchy
>>> but would fit well into a more generic structure.
>>>
>>> Here's a strawman to give you an idea of what I have in mind:
>>>
>>> - Instead of being just opaque strings, stats identifiers
>>>    should be dot-separated strings, with dots separating
>>>    levels in the hierarchy.
>>>
>>> - When registered, each stats identifier must be one of:
>>>
>>>    * value -- the value is in the stat itself
>>>    * array -- the stat contains a list of values in an array
>>>      (i.e., [])
>>>    * dictionary -- the stat contains a list of values in a
>>>      dictionary (i.e., {})
>>>
>>> - You can call getValue() at any level in the hierarchy
>>>    and what you get depends on the identifier type. You
>>>    can subaddress arrays and dictionaries by including
>>>    the index/key in the identifier (as shown below).
>>>
>>>
>>> Reworking your ICE example in this fashion would give us something like
>>> this:
>>>
>>> { local: { timestamp: 12345, stats: {
>>>           SentPackets: 47,
>>>           SentOctets: 4444,
>>>           ReceivedPackets: 33,
>>>           ReceivedOctets: 2346,
>>>           ICE: [
>>>             {
>>>               State: 'Succeeded',
>>>               Used: true,
>>>               LocalIpAddr: '129.241.1.99',
>>>               RemoteIpAddr: '234.978.4.3'
>>>             },
>>>             {
>>>               LocalIPAddr: '10.0.0.1',
>>>               RemoteIPAddr: '10.0.1.24',
>>>               State: 'Succeeded',
>>>               Used: false
>>>             }
>>>           ]
>>> }}}
>>>
>>> ISTM that this places things that naturally go together together,
>>> and also makes it easier to build processing engines without a lot
>>> of string manipulation.
>>>
>>>
>>> If I am reading the current API correctly, the only way to actually
>>> get at a statistics value is to do .getValue() on an RTCStatsReport.
>>> In this case, the code would then be something like this:
>>>
>>>     report.getValue('SentPackets') --> 47
>>>     report.getValue('ICE') -->
>>>           ICE: [
>>>             {
>>>               State: 'Succeeded',
>>>               Used: true,
>>>               LocalIpAddr: '129.241.1.99',
>>>               RemoteIpAddr: '234.978.4.3'
>>>             },
>>>             {
>>>               LocalIPAddr: '10.0.0.1',
>>>               RemoteIPAddr: '10.0.1.24',
>>>               State: 'Succeeded',
>>>               Used: false
>>>             }
>>>           ]
>>>
>>>
>>>     report.getValue('ICE.0') -->
>>>             {
>>>               State: 'Succeeded',
>>>               Used: true,
>>>               LocalIpAddr: '129.241.1.99',
>>>               RemoteIpAddr: '234.978.4.3'
>>>             }
>>>
>>>     report.getValue('ICE.0.State') --> 'Succeeded'
>>>
>>> Thoughts?
>>> -Ekr
>>>
>
>
Received on Thursday, 27 September 2012 16:35:58 UTC