minifront: retry status stream #2080

Open
wants to merge 5 commits into
base: main
from

Conversation

turbocrime
Collaborator

@turbocrime turbocrime commented Feb 25, 2025

Description of Changes

Retry the status stream in minifront, when it fails due to #2074

zquery doesn't really have a stream-retry feature, so state management in the slice and in the zquery stream callbacks will collect the stream status for interested components and schedule re-fetch attempts.

Checklist Before Requesting Review

  • I have ensured that any relevant minifront changes do not cause the existing extension to break.

@turbocrime turbocrime requested a review from TalDerei February 25, 2025 01:28

changeset-bot bot commented Feb 25, 2025

🦋 Changeset detected

Latest commit: 2da7a35

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
minifront Minor


@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from ff631c6 to 6205c07 Compare February 25, 2025 18:03
@turbocrime turbocrime marked this pull request as ready for review February 25, 2025 18:03
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch 3 times, most recently from a218fb9 to 0e136ec Compare February 25, 2025 20:00
@turbocrime turbocrime changed the title minifront retry status stream minifront: retry status stream Feb 25, 2025
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from 41b4184 to b46e852 Compare February 25, 2025 20:20
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch from 155418d to fa86717 Compare February 25, 2025 22:07
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from b46e852 to daecbfa Compare February 25, 2025 22:09
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch from fa86717 to d68e5fe Compare February 26, 2025 02:33
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch 2 times, most recently from 98cdcea to a024bb1 Compare February 26, 2025 03:27
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from daecbfa to f95bbbe Compare February 26, 2025 03:27
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch from a024bb1 to 4fd6ac3 Compare February 26, 2025 03:58
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from f95bbbe to 6e16b5a Compare February 26, 2025 03:58
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch from 4fd6ac3 to de77c7c Compare February 26, 2025 04:19
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from 6e16b5a to f7fff02 Compare February 26, 2025 04:22
@turbocrime turbocrime force-pushed the transport-dom-stream-timeout branch from de77c7c to 1952085 Compare February 26, 2025 05:14
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from f7fff02 to f358734 Compare February 26, 2025 05:16
Base automatically changed from transport-dom-stream-timeout to main February 26, 2025 05:26
@turbocrime turbocrime force-pushed the minifront-retry-status-stream branch from f358734 to 2da7a35 Compare February 26, 2025 05:29
Collaborator Author

initial status is now called independently of the status stream, to disambiguate initial (local) status returns and remote status returns.

): AsyncGenerator<PlainMessage<StatusStreamResponse>> {
  for await (const item of penumbra
    .service(ViewService)
    .statusStream({}, { timeoutMs: 15_000, ...opt })) {
Collaborator Author

a timeout of 15 seconds is applied, unless the caller overrides with a different value.
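
To illustrate the override behavior described above, here is a minimal sketch. `CallOpts`, `DEFAULT_TIMEOUT_MS`, and `withDefaultTimeout` are hypothetical names used only for illustration and are not part of the PR; only the `{ timeoutMs: 15_000, ...opt }` spread comes from the diff.

```typescript
// Hypothetical, reduced option type; the real call options carry more fields.
interface CallOpts {
  timeoutMs?: number;
}

// Same value as the default in the diff.
const DEFAULT_TIMEOUT_MS = 15_000;

// Entries spread later win, so a caller-supplied timeoutMs replaces the default.
function withDefaultTimeout(opt?: CallOpts): CallOpts {
  return { timeoutMs: DEFAULT_TIMEOUT_MS, ...opt };
}
```

One consequence of this spread order: a caller that explicitly passes `timeoutMs: undefined` replaces the default with `undefined` rather than keeping 15 seconds.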

Contributor

suggestion: can we add a comment here to the effect of "the timeout specifies how long the client will wait for a response before considering the streaming request failed"?

Collaborator Author

dialog is significantly modified

Contributor

comment: the dialogue feels too cramped – can we revert to the previous dialogue (with the sync percentage) while keeping the message "Streaming new blocks to update private state"?


before

Screenshot 2025-02-27 at 12 33 09 PM

after

Screenshot 2025-02-27 at 12 33 15 PM

Contributor

thoughts? cc @smmhrdmn

Collaborator Author

sure, no aesthetic preference here, i was primarily seeking to have the displayed status reflect the new state information

Contributor

@TalDerei TalDerei left a comment

noticeable zquery improvements! I’ve left some initial observations

const [chainId, setChainId] = useState<string | undefined>();
const initialStatus = useInitialStatus();
const status = useStatus();
const { error: streamError } = useStore(statusStreamStateSelector);
Contributor

suggestion: we should abstract away these "stream stalled" and "signal timeout" errors rather than surfacing them in the modal.


Untitled.mov

Collaborator Author

it's important to display some kind of failure to the user, when the request has failed.

if you want to conceal the failure message, can you suggest a replacement?

should different failures use different replacements? this quickly gets complex.

Contributor

maybe a more general "connection issue detected" message, rather than anything specific to streams?

Comment on lines +33 to +40
if (streamError) {
  setDialogText('Retrying...');
} else if (!initialStatus.data) {
  setDialogText('Querying local block height...');
} else if (!status.data) {
  setDialogText('Fetching remote block height...');
} else if (!isSynced) {
  setDialogText('Streaming new blocks to update private state...');
Contributor

comment: It switches quickly between "querying local block height" and "fetching remote block height", so I think we should synthesize the messaging for these states rather than distinguishing between them.


Screen.Recording.2025-02-25.at.5.53.11.PM.mov

Collaborator Author

these are distinct states.

the cases you've tested involve a quick transition, and it is often a quick transition, but the lack of distinction between these states is at the root of the issue in #2081.

specifically, the 'Querying local block height' state is the 'initial status' unary request, which will pend and possibly time out if the extension is connected but slow to respond to RPC queries.

i don't think making these states ambiguous is a good idea. there is no downside to making these states distinct. the upside is that users get information that is highly useful for issue reports and investigation.

Collaborator Author

i'm disinclined to make this change. please suggest a different dialog message if you think it could be better

{dialogText}
{!!streamError && (
  <Text technical color={theme => theme.caution.main} as='div'>
    {streamError instanceof Error ? streamError.message : String(streamError)}
Contributor

comment: clearing the cache surfaces some interesting errors


Screen.Recording.2025-02-26.at.6.18.15.PM.mov

Collaborator Author

@turbocrime turbocrime Feb 27, 2025

clearing the cache through this menu will restart the extension service worker. once the extension service worker is restarted, the page content scripts lose context and it's not possible to recover from that without a reload.

ideally this loss of context would be transmitted to the page. the data clone error you see is the #2070 issue resolved by #2071, so unless your prax build in this review included those changes, a data clone error is expected.

Collaborator Author

eventually this should be addressed with a more comprehensive attempt at addressing context loss.

i would like to consider this case out of scope for this PR, because no part of minifront handles this case well, and it's a significant effort to address, with many considerations.

fullSyncHeight?: bigint;
latestKnownBlockHeight?: bigint;
// Time in milliseconds to wait before attempting to reconnect the status stream
const RECONNECT_TIMEOUT = 5_000;
Contributor

comment: do we want a fixed connection timeout rather than an exponential backoff? I think the former may be preferable here as implemented.

Collaborator Author

attempting an exponential backoff implementation here would become significantly more complex, due to the fact that the query is not a single awaited promise over which we have direct control, but a zquery stream.

i agree backoff would be better, but this PR is essentially a spike to land a single case of stream re-request without getting deep into zquery. any more than this becomes significant zquery work.

i would like to consider new zquery features to be out of scope for this PR.

these are low-cost queries, so i don't think it's a big deal.
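
For reference, the delay calculation for an exponential backoff is small on its own; the complexity discussed above lies in wiring it into a zquery stream, which this sketch does not attempt. `backoffDelay` and `MAX_DELAY_MS` are hypothetical and not part of the PR; the 5-second base matches `RECONNECT_TIMEOUT` in the diff.

```typescript
// Base delay, matching RECONNECT_TIMEOUT in the diff.
const BASE_DELAY_MS = 5_000;
// Hypothetical upper bound so the delay does not grow without limit.
const MAX_DELAY_MS = 60_000;

// Delay before retry number `attempt` (0-based): doubles each time, capped.
function backoffDelay(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}
```

A caller would need to track the attempt count and reset it to zero once the stream delivers data again, which is the extra state the fixed timeout avoids.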

});
},

scheduleRefetch: () => {
Contributor

comment: potential race condition here? scheduleRefetch checks if streamState.running is false before attempting reconnection, however what if the stream status changes between when the timeout is set and when it executes?

Collaborator Author

if the stream is running when the timeout executes, a reconnect is not necessary.
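
The point above can be sketched as a check made when the timer fires rather than when it is set. This is a simplified stand-in, not the slice code from the PR: `StreamState`, `Timer`, and this `scheduleRefetch` signature are assumptions for illustration, and the timer is injectable only so the behavior can be exercised without waiting.

```typescript
// Hypothetical, reduced view of the stream state held in the slice.
interface StreamState {
  running: boolean;
}

// Injectable timer so the sketch can be driven without real delays.
type Timer = (fn: () => void, ms: number) => unknown;

function scheduleRefetch(
  state: StreamState,
  refetch: () => void,
  delayMs: number,
  setTimer: Timer = (fn, ms) => setTimeout(fn, ms),
): void {
  setTimer(() => {
    // Read the flag at execution time: if the stream came back while the
    // timer was pending, no reconnect is needed and nothing happens.
    if (!state.running) {
      refetch();
    }
  }, delayMs);
}
```

Because the flag is read inside the callback, a status change between scheduling and execution is handled by the same check.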

...item,
};
},
onError: (prevData, error) => {
Contributor

comment: what's the difference here between onError versus onEnd in terms of reconnections?

Collaborator Author

according to zquery documentation, onEnd executes on every stream end. onError executes before onEnd, if there is an error from the fetch.

Collaborator Author

there is another method, onAbort, which i investigated but it does not seem to be involved in the cases we care about here.
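
The callback ordering described in this thread (onError first when the fetch fails, then onEnd on every stream end) can be modeled with a small consumer. This is a hypothetical model of that ordering only, not zquery's implementation; `StreamCallbacks` and `consumeStream` are invented names, and `onAbort` is left out.

```typescript
// Hypothetical callback set, mirroring the names used in the thread.
interface StreamCallbacks<T> {
  onValue: (item: T) => void;
  onError: (error: unknown) => void;
  onEnd: () => void;
}

// Drains a stream and reports to the callbacks in the order described above.
async function consumeStream<T>(
  stream: AsyncIterable<T>,
  callbacks: StreamCallbacks<T>,
): Promise<void> {
  try {
    for await (const item of stream) {
      callbacks.onValue(item);
    }
  } catch (error) {
    // Runs only when the stream fails.
    callbacks.onError(error);
  } finally {
    // Runs on every stream end, after onError if there was a failure.
    callbacks.onEnd();
  }
}
```

Under this model, a reconnect scheduled from onEnd covers both clean ends and failures, while onError is the place to record the failure for display.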

2 participants