RE: [UserTiming] Unifying marks and measures from Nic Jansma on 2011-05-24 (public-web-perf@w3.org from May 2011)

From: Nic Jansma <Nic.Jansma@microsoft.com>
Date: Tue, 24 May 2011 20:10:41 +0000
To: Tony Gentilcore <tonyg@google.com>, "public-web-perf@w3.org" <public-web-perf@w3.org>
Message-ID: <F677C405AAD11B45963EEAE5202813BD19DEC8FE@TK5EX14MBXW651.wingroup.windeploy.ntde>
Hi Tony,

Some of our initial thoughts on the advantages of your proposed interface:

* We like the simplicity of it.

* Having measures keep track of the associated mark timestamps is a great point (maybe we should add it to the getMeasures() results of the current draft?)

Thoughts on some potential downsides:

* We think it will be common practice and useful to be able to separately grab only marks, or only measures, or only measure timestamps.  When they're combined, you need to iterate over the full array to get these smaller subsets of data.

* Having the mark name also be the measure name and no differentiation between them may be constraining for describing multiple-phase scenarios.

Our approach was to try your proposed interface in different scenarios to see whether it has any limitations relative to the current draft.

Here are three scenarios that we came up with where the proposed interface leads to more work for developers or potential confusion:

1. Multi-Phase scenarios

2. Retrieving only marks or measures

3. Finding the durations of a specific scenario

1. Multi-Phase scenarios

Some scenarios might not be just "start" and "end" events.  If a scenario can be broken up into multiple phases, it may be nice to have marks describing different milestones and be able to "link" them together later via measure() in a standard way.

Take an email scenario, where there is a start, middle and end milestone:

Current Draft
mark("emailStart");
...
mark("emailMiddle");
...
mark("emailEnd");

// can be done at any time after the mark has been logged
measure("emailFirstHalf", "emailStart", "emailMiddle");
measure("emailSecondHalf", "emailMiddle", "emailEnd");
measure("emailTotal", "emailStart", "emailEnd");

// marks contains a descriptive mark name and timestamps
var marks = getMarks(); // { "emailStart": [0], "emailMiddle": [1], "emailEnd": [2] }

// measures contains all 3 phases nicely described and their durations
var measures = getMeasures(); // { "emailFirstHalf": [1], "emailSecondHalf": [1], "emailTotal": [2] }
var emailScenarioTime = getMeasures("emailTotal")[0];

New Proposal approach #1
mark("email"); // start of email
...
mark("email"); // middle of email
...
var emailEnd = markEnd("email"); // end of email

var marksAndMeasures = getMarks(); // {"email": [{"t": 1}, {"t": 2, "dur": 2}]}
// Constrained by mark and measure name being the same.
// Current array is not as easy to understand - it looks like there were only two timestamps
// (which are the start and middle ones).
// Is "email" a milestone or a duration?
// Analytics scripts won't know how to deal with this.
// The second "mark" also contains the "measure", and to calculate its timestamp, you need to do math (2+2)
// To get the whole duration, you need to do more math (2-1)+2;
var emailScenarioTime = (getMarks("email")[1].t - getMarks("email")[0].t) + getMarks("email")[1].dur;

// now try to do this for all email scenarios:
// for each pair of entries (n = 1, 3, 5, ...) in getMarks("email"):
// var emailScenarioTime = (getMarks("email")[n].t - getMarks("email")[n-1].t) + getMarks("email")[n].dur;
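
The per-scenario arithmetic above can be sketched as a loop over pairs of entries. This is only an illustration: scenarioTotals is a hypothetical helper that is not part of either proposal, and it assumes every scenario contributes exactly one start entry followed by one middle entry carrying "dur".

```javascript
// Hypothetical helper (not part of either proposal).
// `entries` is assumed to be the array stored under one name, e.g. "email",
// where each scenario adds a start entry {t} followed by a middle entry
// {t, dur} whose "dur" runs from the middle mark to markEnd().
function scenarioTotals(entries) {
  var totals = [];
  for (var n = 1; n < entries.length; n += 2) {
    var start = entries[n - 1];
    var middle = entries[n];
    // (middle - start) covers the first half, middle.dur covers the second
    totals.push((middle.t - start.t) + middle.dur);
  }
  return totals;
}

var sampleEntries = [{ t: 1 }, { t: 2, dur: 2 }, { t: 10 }, { t: 11, dur: 3 }];
console.log(scenarioTotals(sampleEntries)); // [ 3, 4 ]
```

The strict pairing is the fragile part: a single extra mark("email") shifts every later pair, which is the kind of bookkeeping an analytics script would have to guess at.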

New Proposal approach #2
//
// OR an alternate approach, marking two distinct firstHalf and secondHalf phases and trying to combine them
// You need to log additional whole-scenario marks
//
mark("email"); // to keep track of whole period
mark("emailFirstHalf"); // start of email
...
markEnd("emailFirstHalf"); // middle of email
mark("emailSecondHalf");
...
markEnd("emailSecondHalf");
markEnd("email"); // to keep track of whole period

// Scenario is better described but required extra instrumentation - might not scale
var marksAndMeasures = getMarks();
// = {"emailFirstHalf": [{"t": 1, "dur": 1}], "emailSecondHalf": [{"t": 2, "dur": 2}], "email": [{"t": 1, "dur": 3}]}

2. Retrieving only marks or measures

To reduce the upload size to their backend server, a developer may care only about measures (or only about marks).

For example, to grab all of the measures in the page (and not the marks):

Current Draft
// one simple API call for marks and/or measures
var marks = getMarks(); // { "emailStart": [0], "emailMiddle": [1], "emailEnd": [2] }
var measures = getMeasures(); // { "emailFirstHalf": [1], "emailSecondHalf": [1], "emailTotal": [2] }

New Proposal
var marksAndMeasures = getMarks(); // {"email": [{"t": 1}, {"t": 2, "dur": 2}]}
var measures = {}; // keyed by mark name, so use an object rather than an Array

// need to do two loops to find just measures
for (var markName in marksAndMeasures) {
   var markData = marksAndMeasures[markName];
   for (var i = 0; i < markData.length; i++) {
      // an entry is a measure if it has a duration (which may be 0)
      if (typeof markData[i].dur !== "undefined") {
         if (!measures[markName]) {
            measures[markName] = [];
         }
         measures[markName].push(markData[i]);
      }
   }
}

// now, measures = {"email": [{"t": 2, "dur": 2}]}

3. Finding the durations of a specific scenario

This is a variation of #2.

To reduce the upload size to their backend server even further, a developer may care only about the durations of a specific scenario ("email").

That is, they want just "[1,1,2,1,1,15,1,1,1,1]":

Current Draft
// pretend this has occurred 10 times (the user has sent 10 emails)
mark("startEmail");
mark("endEmail");
measure("email", "startEmail", "endEmail");

// now they want to send back how long it took to send emails to their backend server
var measures = getMeasures("email"); // { "email": [1,1,2,1,1,15,1,1,1,1] }

New Proposal
// pretend this has occurred 10 times (the user has sent 10 emails)
mark("email");
markEnd("email");

// now they want to send back how long it took to send emails to their backend server
var emailMeasures = [];
var marksAndMeasuresForEmail = getMarks("email");

// need to iterate over entire array and only add measure times
for (var i = 0; i < marksAndMeasuresForEmail.length; i++) {
   // only find measures (entries with a duration, which may be 0)
   if (typeof marksAndMeasuresForEmail[i].dur !== "undefined") {
     emailMeasures.push(marksAndMeasuresForEmail[i].dur);
   }
}

// now, emailMeasures = [1,1,2,1,1,15,1,1,1,1]

---
The multi-phase scenario may or may not be a concern or common practice - I don't know.  What I do like about the current draft is the separation of getting marks and measures.  I'd like to allow developers to easily get to one or the other without having to iterate over everything.

Maybe a hybrid of our two drafts?

* Add the timestamps of the two marks to getMeasures() in the current draft?

* OR go with your proposal and allow for efficient access of just marks/measures (getMarks(), getMeasures(), getMeasureDurations())?
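
To make the second option more concrete, here is a rough sketch of how the three views could be derived from the unified data. Everything in it is an assumption for illustration: splitMarksAndMeasures is a hypothetical helper, and it treats any entry with a numeric "dur" as a measure and everything else as a plain mark. A real implementation would presumably keep these views separately instead of recomputing them.

```javascript
// Hypothetical helper: derives marks-only, measures-only and durations-only
// views from the unified {"name": [{t, dur?}, ...]} shape of the new proposal.
function splitMarksAndMeasures(all) {
  var marks = {};
  var measures = {};
  var durations = {};
  for (var name in all) {
    if (!Object.prototype.hasOwnProperty.call(all, name)) continue;
    for (var i = 0; i < all[name].length; i++) {
      var entry = all[name][i];
      if (typeof entry.dur === "number") {
        // assumption: an entry with a duration counts as a measure
        (measures[name] = measures[name] || []).push(entry);
        (durations[name] = durations[name] || []).push(entry.dur);
      } else {
        (marks[name] = marks[name] || []).push(entry);
      }
    }
  }
  return { marks: marks, measures: measures, durations: durations };
}

var views = splitMarksAndMeasures({ email: [{ t: 1 }, { t: 2, dur: 2 }] });
console.log(JSON.stringify(views.durations)); // {"email":[2]}
```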

I feel like we're getting close.  I wonder if others on this list have any feedback on the API?

- Nic

From: public-web-perf-request@w3.org [mailto:public-web-perf-request@w3.org] On Behalf Of Tony Gentilcore
Sent: Wednesday, May 18, 2011 4:45 AM
To: public-web-perf@w3.org
Subject: [UserTiming] Unifying marks and measures

As discussed on last week's call, here's one way to tie marks and measures into the same concept while still preserving the functionality of both. It largely boils down to semantics.

Overview

void mark(in DOMString name);
unsigned long long markEnd(in DOMString name);
Object getMarks(in optional DOMString name);
void clearMarks(in optional DOMString name);

Examples to illustrate behavior

1. Measure something:

mark("sleep3");
sleep(3);
markEnd("sleep3");
> 3
getMarks()
> { "sleep3": [ { "t": 1305712151745, "dur": 3 } ] }

2. Clear everything:

clearMarks()
getMarks()
> { }

3. Create a new mark:

mark(performance.MARK_FULLY_LOADED)
getMarks()
> { "fullyLoaded": [ { "t": 1305712151745 } ] }

4. Improper usage (no start):

clearMarks()
markEnd("doesNotExist")
> 0
getMarks()
> { }

5. Improper usage (end twice):

mark("doubleEnd")
sleep(2);
markEnd("doubleEnd")
> 2
sleep(2);
markEnd("doubleEnd")
> 4
getMarks()
> { "doubleEnd": [ { "t": 1305712151745, "dur": 4 } ] }
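
The behavior in the five examples above can be approximated with a small in-memory model. This is a toy sketch, not an implementation of the proposal: createUserTiming and its injected now() clock are hypothetical, and getMarks(name) returning the array for that name is an assumption (the interface above only says it returns an Object).

```javascript
// Toy model of mark / markEnd / getMarks / clearMarks.
// `now` is an injected clock function (an assumption, to keep runs repeatable).
function createUserTiming(now) {
  var store = {};
  return {
    mark: function (name) {
      (store[name] = store[name] || []).push({ t: now() });
    },
    markEnd: function (name) {
      var entries = store[name];
      if (!entries || entries.length === 0) return 0; // no start: example 4
      var last = entries[entries.length - 1];
      last.dur = now() - last.t; // ending twice overwrites "dur": example 5
      return last.dur;
    },
    getMarks: function (name) {
      if (name === undefined) return store;
      return store[name] || [];
    },
    clearMarks: function (name) {
      if (name === undefined) store = {};
      else delete store[name];
    }
  };
}

var clock = 100;
var timing = createUserTiming(function () { return clock; });
timing.mark("doubleEnd");
clock += 2;
console.log(timing.markEnd("doubleEnd")); // 2
clock += 2;
console.log(timing.markEnd("doubleEnd")); // 4
console.log(JSON.stringify(timing.getMarks())); // {"doubleEnd":[{"t":100,"dur":4}]}
```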

Advantages over current draft

1. To get all the data, analytics scripts only need to call one method (getMarks) rather than two (getMarks+getMeasures).

2. Previously measures weren't strongly tied to marks so a timeline couldn't be reconstructed without knowledge of the page.

Consider the old case:
mark("foo");
mark("foo");
measure("bar", "foo");
getMarks();
> { "foo": [1305712151745, 1305712151747] }
getMeasures();
> { "bar": [1] }

Based only on the getMarks+getMeasures data, the "bar" measure cannot be placed on a timeline because it isn't known which "foo" it is associated with (or even that it is associated with "foo" at all).

Now the new case:
mark("foo");
mark("foo");
markEnd("foo");
getMarks();
> { "foo": [ { "t": 1305712151745 }, { "t": 1305712151747, "dur": 1 } ] }

The getMarks() data now allows complete reconstruction of the timeline.
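
As a rough illustration of that reconstruction, the sketch below flattens the getMarks() data into a list of events sorted by start time. buildTimeline is a hypothetical helper; it assumes that an entry without "dur" is a point in time and that "dur" is in the same units as "t".

```javascript
// Hypothetical helper: turns {"name": [{t, dur?}, ...]} into a sorted event list.
function buildTimeline(all) {
  var events = [];
  for (var name in all) {
    if (!Object.prototype.hasOwnProperty.call(all, name)) continue;
    for (var i = 0; i < all[name].length; i++) {
      var entry = all[name][i];
      var hasDur = typeof entry.dur === "number";
      events.push({
        name: name,
        start: entry.t,
        end: hasDur ? entry.t + entry.dur : entry.t // point marks have start === end
      });
    }
  }
  events.sort(function (a, b) { return a.start - b.start; });
  return events;
}

var timeline = buildTimeline({
  foo: [{ t: 1305712151745 }, { t: 1305712151747, dur: 1 }]
});
// two "foo" events: a point at ...745 and a span from ...747 to ...748
console.log(timeline);
```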

3. There is no ambiguity about clearing marks vs clearing measures.

Consider the old case:
mark("foo")
measure("bar", "foo")
clearMarks("foo")
// At this point, it may be unclear to the user what getMeasures() should return. Since bar is based on foo and foo was cleared: does that mean bar is now associated with fetchStart or has it been cleared or is it still associated with foo even though foo is gone? I believe we intend the 3rd, but I'm not sure that would be obvious to users.

4. Simpler to use as there is only one verb "mark" and fewer methods to understand. Each method now takes just one argument that is the same across all methods. Previously, it wasn't at all obvious what to pass to measure() and in what order without looking it up.
Thoughts?

-Tony
Received on Tuesday, 24 May 2011 20:11:14 UTC