Skip to content

WIP for collecting send-tab telemetry. #3308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 28, 2020
Merged

WIP for collecting send-tab telemetry. #3308

merged 1 commit into from
Jul 28, 2020

Conversation

mhammond
Copy link
Member

@mhammond mhammond commented Jul 1, 2020

[Lina and I chatted about this and decided to collaborate - I can't work out how to change the branch in the other PR, so here we are...]

This is a WIP just to get feedback on the shape of this.

The main thing I'm trying to avoid here is to have the telemetry "pollute" the
public API - the public API should not be aware where and when we send
telemetry.

In this case, the "public API" in question is send-tab, where we can probably
suck up a public API change, but we shouldn't!

The whole "what telemetry API should we use/when can we use glean everywhere"
question is a good one, but one we aren't going resolve here. An ideal world
probably has our rust components directly using glean - in which case the
telemetry data also doesn't pollute the other public APIs - so let's try and
get a little closer to that world.

So the summuary of how this patch hangs together is:

  • There's a new FxaTelemetry struct which is a grab-bag of all the telemetry
    we collect (which isn't much - even desktop today doesn't gather much). This
    struct is stored in a RefCell<> inside FirefoxAccount.

  • There's a new fxa_gather_telemetry() function in the FFI. The idea is that
    android-components telemetry code will call this, and translate what it
    gets back into the glean calls it needs to make (ie, android-components will
    also add a couple of new events to sync-telemetry/metrics.yaml - note that
    the send-tab code doesn't really get involved here.

  • Except that the send-tab code will need to tell the telemetry code that
    there's probably telemetry to gather. So while it's not magic or fully
    automatic, most of the responsibilities are in the right place and there's
    only a few leaky abstractions.

Assuming the "big picture" of this is OK, some "small picture" issues:

  • I'm not sure if it's OK that fxa_gather_telemetry() returns a JSON string.
    The idea is that it should ideally be dynamic - stuff it doesn't understand
    should be ignored. Eg, you can imagine a world where some things are only
    understood/recorded by, say, iOS. Adding new telemetry shouldn't be a breaking
    change to platforms that don't want to explicitly record it.

  • No tests!

  • Um - I'm sure I had more than that :) I'm sure you'll find some in the patch.

Pull Request checklist

  • Quality: This PR builds and tests run cleanly
    • automation/all_tests.sh runs to completion and produces no failures
    • Note: For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGES_UNRELEASED.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due dilligence applied in selecting them.

@mhammond mhammond requested review from rfk, eoger and linabutler July 1, 2020 00:25
Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the shape of this and the way that it starts to head in the direction of gleanifying things. We risk dropping metrics if they're not submitted by the calling app, but we have that problem anyway with the "return telemetry over the FFI" approach, so I don't think we'll make that any worse.

@mhammond mhammond marked this pull request as draft July 2, 2020 05:54
Copy link
Contributor

@eoger eoger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, same as a previous comment I think introducing the "send_tab to multiple devices" might be a bit more work, but nothing crazy.

@mhammond
Copy link
Member Author

mhammond commented Jul 6, 2020

I added capturing of the 'reason' like bug 1639843. I also renamed fetch_device_command to ios_fetch_device_command to try and discourage it being used anywhere other than iOS, because when I first found that function I assumed it was the preferred way of handling a push message, which it isn't.

There are a few comments above I still need to address and need to add some tests, but this seems to record telemetry correctly in Fenix with mozilla-mobile/android-components#7618

@mhammond
Copy link
Member Author

I think this is ready to roll (although I'll get feedback from @grigoryk on mozilla-mobile/android-components#7618 before I actually land it.

@mhammond mhammond marked this pull request as ready for review July 21, 2020 04:27
Copy link
Contributor

@linabutler linabutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @mhammond! 🚀

// We have a naive strategy to avoid unbounded memory growth - the intention
// is that if any platform lets things grow to hit these limits, it's probably
// never going to consume anything - so it doesn't matter what we discard (ie,
// there's no good reason to have a smarter circular buffer etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we record an event for when the buffer overflows, to see how often this happens? Glean has special invalid_value and invalid_overflow buckets that it uses when the app records weird stuff (string too long, negative or overflowing counter or enum, unknown discriminant), but I don't know if it's worth doing that here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth it TBH (at least not now :) Android will grab this immediately after the tab operation, so should never happen. Is this likely on iOS? IOW, do you think we should get an issue opened for this?

The intent here is that this gets us closer to a glean world, where the
telemetry isn't part of the public API.

* There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry
  we collect (which isn't much - even desktop today doesn't gather much). This
  struct is stored in a `RefCell<>` inside `FirefoxAccount`.

* There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that
  android-components *telemetry* code will call this, and translate what it
  gets back into the glean calls it needs to make (ie, android-components will
  also add a couple of new events to `sync-telemetry/metrics.yaml` - note that
  the *send-tab* code doesn't really get involved here.

* Except that the *send-tab* code will need to tell the telemetry code that
  there's probably telemetry to gather. So while it's not magic or fully
  automatic, most of the responsibilities are in the right place and there's
  only a few leaky abstractions.
@codecov-commenter
Copy link

Codecov Report

Merging #3308 into main will decrease coverage by 0.04%.
The diff coverage is 31.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3308      +/-   ##
==========================================
- Coverage   57.28%   57.24%   -0.05%     
==========================================
  Files         230      230              
  Lines       30336    30416      +80     
  Branches     7339     7356      +17     
==========================================
+ Hits        17379    17412      +33     
- Misses       7282     7320      +38     
- Partials     5675     5684       +9     
Impacted Files Coverage Δ
components/fxa-client/ffi/src/lib.rs 0.00% <0.00%> (ø)
components/fxa-client/src/device.rs 69.45% <0.00%> (-1.15%) ⬇️
components/fxa-client/src/push.rs 60.40% <0.00%> (-0.41%) ⬇️
components/fxa-client/src/send_tab.rs 14.08% <0.00%> (-1.79%) ⬇️
examples/fxa-client/src/devices-api.rs 0.00% <0.00%> (ø)
components/fxa-client/src/telemetry.rs 77.65% <33.33%> (-10.02%) ⬇️
components/fxa-client/src/commands/send_tab.rs 40.29% <67.85%> (+9.66%) ⬆️
components/fxa-client/src/lib.rs 82.78% <100.00%> (+0.11%) ⬆️
components/fxa-client/src/ffi.rs 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2506046...2f1f21a. Read the comment docs.

@mhammond mhammond merged commit eb92919 into main Jul 28, 2020
@mhammond mhammond deleted the sendtab-metrics branch July 28, 2020 07:36
mhammond added a commit to mhammond/application-services that referenced this pull request Aug 4, 2020
mhammond added a commit that referenced this pull request Aug 4, 2020
The intent here is that this gets us closer to a glean world, where the
telemetry isn't part of the public API.

* There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry
  we collect (which isn't much - even desktop today doesn't gather much). This
  struct is stored in a `RefCell<>` inside `FirefoxAccount`.

* There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that
  android-components *telemetry* code will call this, and translate what it
  gets back into the glean calls it needs to make (ie, android-components will
  also add a couple of new events to `sync-telemetry/metrics.yaml` - note that
  the *send-tab* code doesn't really get involved here.

* Except that the *send-tab* code will need to tell the telemetry code that
  there's probably telemetry to gather. So while it's not magic or fully
  automatic, most of the responsibilities are in the right place and there's
  only a few leaky abstractions.
bors bot pushed a commit to mozilla-mobile/android-components that referenced this pull request Aug 20, 2020
7618: Record send-tab telemetry for Fenix. r=grigoryk a=mhammond

application-services needs to be closely involved in creating the
data to be recorded in telemetry - however, we are trying to avoid
exposing this telemetry directly to the android-components "concepts"
or "features" - for example, the part of android-components that sends
a tab ideally wouldn't have to deal with any interface changes just because
telemetry is being sent or changed - otherwise new/changed telemetry is always
a "breaking change"

In the future, we expect the rust components to interact directly with Glean,
and consumers like android-components wouldn't need to get involved at all.
However, until then...

The idea with this PR is that there's one new public account function,
`processPendingTelemetry()`, which should be called whenever something is
done which might record telemetry. However, it doesn't need to know
what is actually recorded - that knowledge is only in the telemetry
code - the code that is tightly bound to glean.

This hasn't been tested yet - but it builds :) I'm soliciting feedback on
the general shape of this before I invest too much more in tests etc.

I also considered trying to do with Observers(), but that didn't seem
any easier. I'm obviously open to alternative approaches!

See also mozilla/application-services#3308



Co-authored-by: Mark Hammond <[email protected]>
bors bot pushed a commit to mozilla-mobile/android-components that referenced this pull request Aug 20, 2020
7618: Record send-tab telemetry for Fenix. r=grigoryk a=mhammond

application-services needs to be closely involved in creating the
data to be recorded in telemetry - however, we are trying to avoid
exposing this telemetry directly to the android-components "concepts"
or "features" - for example, the part of android-components that sends
a tab ideally wouldn't have to deal with any interface changes just because
telemetry is being sent or changed - otherwise new/changed telemetry is always
a "breaking change"

In the future, we expect the rust components to interact directly with Glean,
and consumers like android-components wouldn't need to get involved at all.
However, until then...

The idea with this PR is that there's one new public account function,
`processPendingTelemetry()`, which should be called whenever something is
done which might record telemetry. However, it doesn't need to know
what is actually recorded - that knowledge is only in the telemetry
code - the code that is tightly bound to glean.

This hasn't been tested yet - but it builds :) I'm soliciting feedback on
the general shape of this before I invest too much more in tests etc.

I also considered trying to do with Observers(), but that didn't seem
any easier. I'm obviously open to alternative approaches!

See also mozilla/application-services#3308



Co-authored-by: Mark Hammond <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants