-
Notifications
You must be signed in to change notification settings - Fork 240
WIP for collecting send-tab telemetry. #3308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like the shape of this and the way that it starts to head in the direction of gleanifying things. We risk dropping metrics if they're not submitted by the calling app, but we have that problem anyway with the "return telemetry over the FFI" approach, so I don't think we'll make that any worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, same as a previous comment I think introducing the "send_tab to multiple devices" might be a bit more work, but nothing crazy.
I added capturing of the 'reason' like bug 1639843. I also renamed There are a few comments above I still need to address and need to add some tests, but this seems to record telemetry correctly in Fenix with mozilla-mobile/android-components#7618 |
5122745
to
5630b29
Compare
I think this is ready to roll (although I'll get feedback from @grigoryk on mozilla-mobile/android-components#7618 before I actually land it. |
5630b29
to
9ce0a9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me @mhammond! 🚀
// We have a naive strategy to avoid unbounded memory growth - the intention | ||
// is that if any platform lets things grow to hit these limits, it's probably | ||
// never going to consume anything - so it doesn't matter what we discard (ie, | ||
// there's no good reason to have a smarter circular buffer etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we record an event for when the buffer overflows, to see how often this happens? Glean has special invalid_value
and invalid_overflow
buckets that it uses when the app records weird stuff (string too long, negative or overflowing counter or enum, unknown discriminant), but I don't know if it's worth doing that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's worth it TBH (at least not now :) Android will grab this immediately after the tab operation, so should never happen. Is this likely on iOS? IOW, do you think we should get an issue opened for this?
The intent here is that this gets us closer to a glean world, where the telemetry isn't part of the public API. * There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry we collect (which isn't much - even desktop today doesn't gather much). This struct is stored in a `RefCell<>` inside `FirefoxAccount`. * There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that android-components *telemetry* code will call this, and translate what it gets back into the glean calls it needs to make (ie, android-components will also add a couple of new events to `sync-telemetry/metrics.yaml` - note that the *send-tab* code doesn't really get involved here. * Except that the *send-tab* code will need to tell the telemetry code that there's probably telemetry to gather. So while it's not magic or fully automatic, most of the responsibilities are in the right place and there's only a few leaky abstractions.
9ce0a9e
to
2f1f21a
Compare
Codecov Report
@@ Coverage Diff @@
## main #3308 +/- ##
==========================================
- Coverage 57.28% 57.24% -0.05%
==========================================
Files 230 230
Lines 30336 30416 +80
Branches 7339 7356 +17
==========================================
+ Hits 17379 17412 +33
- Misses 7282 7320 +38
- Partials 5675 5684 +9
Continue to review full report at Codecov.
|
The intent here is that this gets us closer to a glean world, where the telemetry isn't part of the public API. * There's a new `FxaTelemetry` struct which is a grab-bag of all the telemetry we collect (which isn't much - even desktop today doesn't gather much). This struct is stored in a `RefCell<>` inside `FirefoxAccount`. * There's a new `fxa_gather_telemetry()` function in the FFI. The idea is that android-components *telemetry* code will call this, and translate what it gets back into the glean calls it needs to make (ie, android-components will also add a couple of new events to `sync-telemetry/metrics.yaml` - note that the *send-tab* code doesn't really get involved here. * Except that the *send-tab* code will need to tell the telemetry code that there's probably telemetry to gather. So while it's not magic or fully automatic, most of the responsibilities are in the right place and there's only a few leaky abstractions.
7618: Record send-tab telemetry for Fenix. r=grigoryk a=mhammond application-services needs to be closely involved in creating the data to be recorded in telemetry - however, we are trying to avoid exposing this telemetry directly to the android-components "concepts" or "features" - for example, the part of android-components that sends a tab ideally wouldn't have to deal with any interface changes just because telemetry is being sent or changed - otherwise new/changed telemetry is always a "breaking change" In the future, we expect the rust components to interact directly with Glean, and consumers like android-components wouldn't need to get involved at all. However, until then... The idea with this PR is that there's one new public account function, `processPendingTelemetry()`, which should be called whenever something is done which might record telemetry. However, it doesn't need to know what is actually recorded - that knowledge is only in the telemetry code - the code that is tightly bound to glean. This hasn't been tested yet - but it builds :) I'm soliciting feedback on the general shape of this before I invest too much more in tests etc. I also considered trying to do with Observers(), but that didn't seem any easier. I'm obviously open to alternative approaches! See also mozilla/application-services#3308 Co-authored-by: Mark Hammond <[email protected]>
7618: Record send-tab telemetry for Fenix. r=grigoryk a=mhammond application-services needs to be closely involved in creating the data to be recorded in telemetry - however, we are trying to avoid exposing this telemetry directly to the android-components "concepts" or "features" - for example, the part of android-components that sends a tab ideally wouldn't have to deal with any interface changes just because telemetry is being sent or changed - otherwise new/changed telemetry is always a "breaking change" In the future, we expect the rust components to interact directly with Glean, and consumers like android-components wouldn't need to get involved at all. However, until then... The idea with this PR is that there's one new public account function, `processPendingTelemetry()`, which should be called whenever something is done which might record telemetry. However, it doesn't need to know what is actually recorded - that knowledge is only in the telemetry code - the code that is tightly bound to glean. This hasn't been tested yet - but it builds :) I'm soliciting feedback on the general shape of this before I invest too much more in tests etc. I also considered trying to do with Observers(), but that didn't seem any easier. I'm obviously open to alternative approaches! See also mozilla/application-services#3308 Co-authored-by: Mark Hammond <[email protected]>
[Lina and I chatted about this and decided to collaborate - I can't work out how to change the branch in the other PR, so here we are...]
This is a WIP just to get feedback on the shape of this.
The main thing I'm trying to avoid here is to have the telemetry "pollute" the
public API - the public API should not be aware where and when we send
telemetry.
In this case, the "public API" in question is send-tab, where we can probably
suck up a public API change, but we shouldn't!
The whole "what telemetry API should we use/when can we use glean everywhere"
question is a good one, but one we aren't going resolve here. An ideal world
probably has our rust components directly using glean - in which case the
telemetry data also doesn't pollute the other public APIs - so let's try and
get a little closer to that world.
So the summuary of how this patch hangs together is:
There's a new
FxaTelemetry
struct which is a grab-bag of all the telemetrywe collect (which isn't much - even desktop today doesn't gather much). This
struct is stored in a
RefCell<>
insideFirefoxAccount
.There's a new
fxa_gather_telemetry()
function in the FFI. The idea is thatandroid-components telemetry code will call this, and translate what it
gets back into the glean calls it needs to make (ie, android-components will
also add a couple of new events to
sync-telemetry/metrics.yaml
- note thatthe send-tab code doesn't really get involved here.
Except that the send-tab code will need to tell the telemetry code that
there's probably telemetry to gather. So while it's not magic or fully
automatic, most of the responsibilities are in the right place and there's
only a few leaky abstractions.
Assuming the "big picture" of this is OK, some "small picture" issues:
I'm not sure if it's OK that
fxa_gather_telemetry()
returns a JSON string.The idea is that it should ideally be dynamic - stuff it doesn't understand
should be ignored. Eg, you can imagine a world where some things are only
understood/recorded by, say, iOS. Adding new telemetry shouldn't be a breaking
change to platforms that don't want to explicitly record it.
No tests!
Um - I'm sure I had more than that :) I'm sure you'll find some in the patch.
Pull Request checklist
automation/all_tests.sh
runs to completion and produces no failures[ci full]
to the PR title.