WIP for collecting send-tab telemetry. #3308

mhammond · 2020-07-01T00:25:01Z

[Lina and I chatted about this and decided to collaborate - I can't work out how to change the branch in the other PR, so here we are...]

This is a WIP just to get feedback on the shape of this.

The main thing I'm trying to avoid here is to have the telemetry "pollute" the
public API - the public API should not be aware where and when we send
telemetry.

In this case, the "public API" in question is send-tab, where we can probably
suck up a public API change, but we shouldn't!

The whole "what telemetry API should we use/when can we use glean everywhere"
question is a good one, but one we aren't going to resolve here. An ideal world
probably has our rust components directly using glean - in which case the
telemetry data also doesn't pollute the other public APIs - so let's try and
get a little closer to that world.

So the summary of how this patch hangs together is:

* There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry
  we collect (which isn't much - even desktop today doesn't gather much). This
  struct is stored in a `RefCell<>` inside `FirefoxAccount`.
* There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that
  android-components *telemetry* code will call this, and translate what it
  gets back into the glean calls it needs to make (ie, android-components will
  also add a couple of new events to `sync-telemetry/metrics.yaml`). Note that
  the *send-tab* code doesn't really get involved here.
* Except that the *send-tab* code will need to tell the telemetry code that
  there's probably telemetry to gather. So while it's not magic or fully
  automatic, most of the responsibilities are in the right place and there are
  only a few leaky abstractions.
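
As a rough illustration of the shape described above (not the code in this patch), the sketch below keeps a telemetry grab-bag in a `RefCell` inside the account object, so recording can happen behind `&self` without changing the send-tab signature, and a separate gather call drains it into a JSON string. The `SentCommand` type, its `flow_id`/`stream_id` fields, the `send_tab` and `gather_telemetry` method names, and the hand-rolled JSON are all assumptions made for the example; real code would more likely serialize with a library such as serde.

```rust
use std::cell::RefCell;

/// Hypothetical record of one sent tab; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct SentCommand {
    flow_id: String,
    stream_id: String,
}

/// Grab-bag of everything collected since the last gather.
#[derive(Debug, Default)]
struct FxaTelemetry {
    commands_sent: Vec<SentCommand>,
}

impl FxaTelemetry {
    /// Hand-rolled JSON to keep the sketch dependency-free.
    /// Assumes the values contain no characters that need escaping.
    fn to_json(&self) -> String {
        let items: Vec<String> = self
            .commands_sent
            .iter()
            .map(|c| {
                format!(
                    "{{\"flow_id\":\"{}\",\"stream_id\":\"{}\"}}",
                    c.flow_id, c.stream_id
                )
            })
            .collect();
        format!("{{\"commands_sent\":[{}]}}", items.join(","))
    }
}

struct FirefoxAccount {
    // Interior mutability: telemetry is recorded without `&mut self`.
    telemetry: RefCell<FxaTelemetry>,
}

impl FirefoxAccount {
    fn new() -> Self {
        FirefoxAccount {
            telemetry: RefCell::new(FxaTelemetry::default()),
        }
    }

    /// Stand-in for the public send-tab call: its signature says nothing
    /// about telemetry, it just records an event as a side effect.
    fn send_tab(&self, flow_id: &str, stream_id: &str) {
        self.telemetry.borrow_mut().commands_sent.push(SentCommand {
            flow_id: flow_id.to_string(),
            stream_id: stream_id.to_string(),
        });
    }

    /// Stand-in for what the FFI gather function could delegate to:
    /// swap in an empty grab-bag and return the old one as JSON.
    fn gather_telemetry(&self) -> String {
        let taken = self.telemetry.replace(FxaTelemetry::default());
        taken.to_json()
    }
}
```

In this sketch a second call to `gather_telemetry()` with no intervening `send_tab()` returns an empty `commands_sent` array, which matches the idea that the telemetry code drains whatever is pending.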

Assuming the "big picture" of this is OK, some "small picture" issues:

* I'm not sure if it's OK that `fxa_gather_telemetry()` returns a JSON string.
  The idea is that it should ideally be dynamic - stuff it doesn't understand
  should be ignored. Eg, you can imagine a world where some things are only
  understood/recorded by, say, iOS. Adding new telemetry shouldn't be a breaking
  change to platforms that don't want to explicitly record it.
* No tests!
* Um - I'm sure I had more than that :) I'm sure you'll find some in the patch.
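
To make the "ignore what you don't understand" point about the JSON payload concrete, here is a minimal sketch of a consumer that records only the keys it knows and silently skips the rest. It assumes the JSON has already been parsed into a map from key to a list of string values; the `Payload` alias, the `record_known` function, and the key names are hypothetical and not taken from the patch.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-parsed form of the gathered payload:
/// top-level key -> list of values (e.g. flow ids).
type Payload = BTreeMap<String, Vec<String>>;

/// Returns the events this platform chose to record. Keys it does not
/// recognise are skipped, so adding a new key on the Rust side is not a
/// breaking change for this consumer.
fn record_known(payload: &Payload) -> Vec<String> {
    let mut recorded = Vec::new();
    for (key, values) in payload {
        match key.as_str() {
            // Key names assumed for illustration.
            "commands_sent" | "commands_received" => {
                for v in values {
                    recorded.push(format!("{}:{}", key, v));
                }
            }
            // Anything else (say, a metric only iOS understands) is ignored.
            _ => {}
        }
    }
    recorded
}
```

A typed struct crossing the FFI would instead force every platform to update whenever a field is added, which is the trade-off the question above is weighing.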

Pull Request checklist

Quality: This PR builds and tests run cleanly
- automation/all_tests.sh runs to completion and produces no failures
- Note: For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
Tests: This PR includes thorough tests or an explanation of why it does not
Changelog: This PR includes a changelog entry in CHANGES_UNRELEASED.md or an explanation of why it does not need one
- Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
Dependencies: This PR follows our dependency management guidelines
- Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

rfk

Overall, I like the shape of this and the way that it starts to head in the direction of gleanifying things. We risk dropping metrics if they're not submitted by the calling app, but we have that problem anyway with the "return telemetry over the FFI" approach, so I don't think we'll make that any worse.

components/fxa-client/ffi/src/lib.rs

components/fxa-client/src/commands/send_tab.rs

components/fxa-client/src/send_tab.rs

components/fxa-client/src/telemetry.rs

eoger

This makes sense to me. As in a previous comment, I think introducing "send_tab to multiple devices" might be a bit more work, but nothing crazy.

mhammond · 2020-07-06T07:32:07Z

I added capturing of the 'reason' like bug 1639843. I also renamed fetch_device_command to ios_fetch_device_command to try and discourage it being used anywhere other than iOS, because when I first found that function I assumed it was the preferred way of handling a push message, which it isn't.

There are a few comments above I still need to address and need to add some tests, but this seems to record telemetry correctly in Fenix with mozilla-mobile/android-components#7618

mhammond · 2020-07-21T04:27:00Z

I think this is ready to roll (although I'll get feedback from @grigoryk on mozilla-mobile/android-components#7618 before I actually land it).

linabutler

Looks great to me @mhammond! 🚀

components/fxa-client/src/lib.rs

components/fxa-client/src/commands/send_tab.rs

components/fxa-client/src/device.rs

components/fxa-client/src/send_tab.rs

linabutler · 2020-07-28T01:58:04Z

components/fxa-client/src/telemetry.rs

+// We have a naive strategy to avoid unbounded memory growth - the intention
+// is that if any platform lets things grow to hit these limits, it's probably
+// never going to consume anything - so it doesn't matter what we discard (ie,
+// there's no good reason to have a smarter circular buffer etc)


Should we record an event for when the buffer overflows, to see how often this happens? Glean has special invalid_value and invalid_overflow buckets that it uses when the app records weird stuff (string too long, negative or overflowing counter or enum, unknown discriminant), but I don't know if it's worth doing that here.

I don't think it's worth it TBH (at least not now :) Android will grab this immediately after the tab operation, so it should never happen. Is this likely on iOS? IOW, do you think we should get an issue opened for this?
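
For readers following this thread, the naive bound described in the diff comment could look roughly like the sketch below: once the buffer reaches a fixed limit, new events are simply discarded. The `TabEvents` type, the `record_sent` method, and the limit of 200 are assumptions for illustration. The `dropped` counter is not part of the patch; it only shows what an overflow event like the one suggested above might count.

```rust
/// Assumed limit for the example; the patch's actual value is not shown here.
const MAX_TAB_EVENTS: usize = 200;

#[derive(Debug, Default)]
struct TabEvents {
    sent: Vec<String>,
    /// Hypothetical: how many events were discarded after hitting the limit.
    dropped: usize,
}

impl TabEvents {
    /// Records a sent-tab flow id unless the buffer is already full.
    /// Returns true if the event was stored, false if it was discarded.
    fn record_sent(&mut self, flow_id: String) -> bool {
        if self.sent.len() < MAX_TAB_EVENTS {
            self.sent.push(flow_id);
            true
        } else {
            // No circular buffer: if a platform never drains the events,
            // it does not matter which ones are lost.
            self.dropped += 1;
            false
        }
    }
}
```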

The intent here is that this gets us closer to a glean world, where the telemetry isn't part of the public API.

* There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry we collect (which isn't much - even desktop today doesn't gather much). This struct is stored in a `RefCell<>` inside `FirefoxAccount`.
* There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that android-components *telemetry* code will call this, and translate what it gets back into the glean calls it needs to make (ie, android-components will also add a couple of new events to `sync-telemetry/metrics.yaml`). Note that the *send-tab* code doesn't really get involved here.
* Except that the *send-tab* code will need to tell the telemetry code that there's probably telemetry to gather. So while it's not magic or fully automatic, most of the responsibilities are in the right place and there are only a few leaky abstractions.

codecov-commenter · 2020-07-28T07:25:36Z

Codecov Report

Merging #3308 into main will decrease coverage by 0.04%.
The diff coverage is 31.37%.

@@            Coverage Diff             @@
##             main    #3308      +/-   ##
==========================================
- Coverage   57.28%   57.24%   -0.05%     
==========================================
  Files         230      230              
  Lines       30336    30416      +80     
  Branches     7339     7356      +17     
==========================================
+ Hits        17379    17412      +33     
- Misses       7282     7320      +38     
- Partials     5675     5684       +9

| Impacted Files | Coverage Δ | |
|---|---|---|
| components/fxa-client/ffi/src/lib.rs | `0.00% <0.00%> (ø)` | |
| components/fxa-client/src/device.rs | `69.45% <0.00%> (-1.15%)` | ⬇️ |
| components/fxa-client/src/push.rs | `60.40% <0.00%> (-0.41%)` | ⬇️ |
| components/fxa-client/src/send_tab.rs | `14.08% <0.00%> (-1.79%)` | ⬇️ |
| examples/fxa-client/src/devices-api.rs | `0.00% <0.00%> (ø)` | |
| components/fxa-client/src/telemetry.rs | `77.65% <33.33%> (-10.02%)` | ⬇️ |
| components/fxa-client/src/commands/send_tab.rs | `40.29% <67.85%> (+9.66%)` | ⬆️ |
| components/fxa-client/src/lib.rs | `82.78% <100.00%> (+0.11%)` | ⬆️ |
| components/fxa-client/src/ffi.rs | `0.00% <0.00%> (ø)` | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2506046...2f1f21a. Read the comment docs.

7618: Record send-tab telemetry for Fenix. r=grigoryk a=mhammond

application-services needs to be closely involved in creating the data to be recorded in telemetry - however, we are trying to avoid exposing this telemetry directly to the android-components "concepts" or "features" - for example, the part of android-components that sends a tab ideally wouldn't have to deal with any interface changes just because telemetry is being sent or changed - otherwise new/changed telemetry is always a "breaking change".

In the future, we expect the rust components to interact directly with Glean, and consumers like android-components wouldn't need to get involved at all. However, until then...

The idea with this PR is that there's one new public account function, `processPendingTelemetry()`, which should be called whenever something is done which might record telemetry. However, it doesn't need to know what is actually recorded - that knowledge is only in the telemetry code - the code that is tightly bound to glean.

This hasn't been tested yet - but it builds :) I'm soliciting feedback on the general shape of this before I invest too much more in tests etc.

I also considered trying to do with Observers(), but that didn't seem any easier. I'm obviously open to alternative approaches!

See also mozilla/application-services#3308

Co-authored-by: Mark Hammond <[email protected]>

mhammond requested review from rfk, eoger and linabutler July 1, 2020 00:25

mhammond mentioned this pull request Jul 1, 2020

WIP for collecting send-tab telemetry. #3302

Closed

4 tasks

rfk reviewed Jul 1, 2020

mhammond marked this pull request as draft July 2, 2020 05:54

eoger approved these changes Jul 2, 2020

mhammond mentioned this pull request Jul 4, 2020

Record send-tab telemetry for Fenix. mozilla-mobile/android-components#7618

Merged

4 tasks

mhammond mentioned this pull request Jul 20, 2020

FXIOS-723 ⁃ Generate and record telemetry flow IDs for Sent Tab mozilla-mobile/firefox-ios#6825

Merged

mhammond force-pushed the sendtab-metrics branch 2 times, most recently from 5122745 to 5630b29 Compare July 21, 2020 04:24

mhammond marked this pull request as ready for review July 21, 2020 04:27

mhammond force-pushed the sendtab-metrics branch from 5630b29 to 9ce0a9e Compare July 23, 2020 05:50

linabutler approved these changes Jul 28, 2020

mhammond mentioned this pull request Jul 28, 2020

send-tab needs a "bulk" api to send multiple tabs to multiple clients #3402

Closed

mhammond force-pushed the sendtab-metrics branch from 9ce0a9e to 2f1f21a Compare July 28, 2020 07:14

mhammond merged commit eb92919 into main Jul 28, 2020

mhammond deleted the sendtab-metrics branch July 28, 2020 07:36

linabutler mentioned this pull request Jul 30, 2020

Add Swift bindings for fxa_gather_telemetry #3442

Merged

mhammond added a commit to mhammond/application-services that referenced this pull request Aug 4, 2020

Add notes about mozilla#3308 (send-tab telemetry) to changelog

bf07f31

mhammond added a commit that referenced this pull request Aug 4, 2020

Add notes about #3308 (send-tab telemetry) to changelog (#3452)

5a1c51e

mhammond added a commit that referenced this pull request Aug 4, 2020

Add notes about #3308 (send-tab telemetry) to changelog (#3452)

ebd9ea4
