Recording Telemetry scalars from add-ons

The Go Faster initiative is important as it enables us to ship code faster, using special add-ons, without being strictly tied to the Firefox train schedule. As Georg Fritzsche pointed out in his article, we have two options for instrumenting these add-ons: having probe definitions ride the trains (waiting a few weeks!) or implementing and sending a new custom ping (doing some pipeline work!).

Neither solution is very appealing when the goal is to ship code faster. But hey... we have a plan!

Our current work is focused on extending Telemetry to fill this gap. The first step consisted of enabling add-on event recording in Firefox 56 (bug), and we recently enabled add-on scalar recording as well (bug)!

…

Getting Firefox data faster: introducing the ‘new-profile’ ping

Let me state this clearly, again: data latency sucks. This is especially true when working on Firefox: a nicely crafted piece of software that ships worldwide to many people. When something affects the experience of our users we need to know and react fast.

The story so far…

Over the previous quarters, we started improving the latency of the data coming from Firefox, and got to the point where the majority of pings reach our servers within 1 hour instead of days (latest Beta only): there’s an extremely satisfying plot by :chutten about that!

However, this change does not help much with the data latency of users who just installed Firefox (or created a new profile), don’t trigger a subsession split, and usually suspend their computer instead of shutting Firefox down. Their first chunk of data would arrive either at their local midnight or after they wake their computer again, and this could take hours or even days (on weekends).

…

Getting Firefox data faster: the shutdown pingsender

The data our Firefox users share with us is key to identifying and fixing performance issues that lead to a poor browsing experience. Collecting it is not enough if we don’t receive it within an acceptable time frame. My esteemed colleague Chris already wrote about this a couple of times: data latency sucks. But we can fix that.

Why is there latency, anyway?

The bulk of the measurements we collect (histograms, scalars, events, …) are sent through the main-ping. This ping is generated at different times during the browsing session, including shutdown. The “shutdown” main-ping, which accounts for about 80% of all the pings we receive, is not sent to our servers until the next Firefox restart. Depending on the user’s habits and the day of the week, the delay could be anywhere from a few minutes to a few days (see the CDF plot below): way too much! One of my team’s goals for this year is to reduce this latency, allowing developers to make decisions and iterate quickly.

…