-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
minifront: retry status stream #2080
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 2da7a35 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
ff631c6
to
6205c07
Compare
a218fb9
to
0e136ec
Compare
41b4184
to
b46e852
Compare
155418d
to
fa86717
Compare
b46e852
to
daecbfa
Compare
fa86717
to
d68e5fe
Compare
98cdcea
to
a024bb1
Compare
daecbfa
to
f95bbbe
Compare
a024bb1
to
4fd6ac3
Compare
f95bbbe
to
6e16b5a
Compare
4fd6ac3
to
de77c7c
Compare
6e16b5a
to
f7fff02
Compare
de77c7c
to
1952085
Compare
f7fff02
to
f358734
Compare
f358734
to
2da7a35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial status is now called independently of the status stream, to disambiguate initial (local) status returns and remote status returns.
): AsyncGenerator<PlainMessage<StatusStreamResponse>> { | ||
for await (const item of penumbra | ||
.service(ViewService) | ||
.statusStream({}, { timeoutMs: 15_000, ...opt })) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a timeout of 15 seconds is applied, unless the caller overrides with a different value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can we add a comment here to the effect of "the timeout specifies how long the client will wait for a response before considering the streaming request failed"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dialog is significantly modified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts? cc @smmhrdmn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, no aesthetic preference here, i was primarily seeking to have the displayed status reflect the new state information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticeable zquery improvements! I’ve left some initial observations
const [chainId, setChainId] = useState<string | undefined>(); | ||
const initialStatus = useInitialStatus(); | ||
const status = useStatus(); | ||
const { error: streamError } = useStore(statusStreamStateSelector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: we should abstract these stream stalled and signal timeout errors from surfacing in the modal.
Untitled.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's important to display some kind of failure to the user, when the request has failed.
if you want to conceal the failure message, can you suggest a replacement?
should different failures use different replacements? this quickly gets complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a more general "connection issue detected" message, rather than anything specific to streams?
if (streamError) { | ||
setDialogText('Retrying...'); | ||
} else if (!initialStatus.data) { | ||
setDialogText('Querying local block height...'); | ||
} else if (!status.data) { | ||
setDialogText('Fetching remote block height...'); | ||
} else if (!isSynced) { | ||
setDialogText('Streaming new blocks to update private state...'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: It switches quickly between "querying local block height" and "fetching remote block height", so I think we should synthesize the messaging for these states rather than distinguishing between them.
Screen.Recording.2025-02-25.at.5.53.11.PM.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are distinct states.
the cases you've tested involve a quick transition, and it is often a quick transition, but the lack of distinction between these states is at the root of the issue in #2081.
specifically, the 'Querying local block height' state is the 'initial status' unary request that will pend and possibly time out if the extension is connected, but taking a long time to respond to RPC queries.
i don't think making these states ambiguous is a good idea. there is no downside to making these states distinct. the upside is that users are more informed of information that is highly useful for issue reports and investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm disinclined to make this change. please suggest a different dialog message if you think it could be better
{dialogText} | ||
{!!streamError && ( | ||
<Text technical color={theme => theme.caution.main} as='div'> | ||
{streamError instanceof Error ? streamError.message : String(streamError)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: clearing the cache surfaces some interesting errors
Screen.Recording.2025-02-26.at.6.18.15.PM.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearing the cache through this menu will restart the extension service worker. once the extension service worker is restarted, the page content scripts lose context and it's not possible to recover from that without a reload.
ideally this loss of context would be transmitted to the page, but the data clone error you see is the #2070 issue resolved by #2071 so unless your prax build in this review included those changes, that is expected to be present as a data clone error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually this should be addressed with a more comprehensive attempt at addressing context loss.
i would like to consider this case out of scope for this PR, because no part of minifront handles this case well, and it's a significant effort to address, with many considerations.
fullSyncHeight?: bigint; | ||
latestKnownBlockHeight?: bigint; | ||
// Time in milliseconds to wait before attempting to reconnect the status stream | ||
const RECONNECT_TIMEOUT = 5_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: do we wants a fixed connection timeout rather than an exponential backoff? I think the former may be more preferable here as implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attempting an exponential backoff implementation here would become significantly more complex, due to the fact that the query is not a single awaited promise over which we have direct control, but a zquery stream.
i agree backoff would be better, but this PR is essentially a spike to land a single case of stream re-request without getting deep into zquery. any more than this becomes significant zquery work.
i would like to consider new zquery features to be out of scope for this PR.
these are low-cost queries, so i don't think it's a big deal.
): AsyncGenerator<PlainMessage<StatusStreamResponse>> { | ||
for await (const item of penumbra | ||
.service(ViewService) | ||
.statusStream({}, { timeoutMs: 15_000, ...opt })) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can we add a comment here to the effect of "the timeout specifies how long the client will wait for a response before considering the streaming request failed"?
}); | ||
}, | ||
|
||
scheduleRefetch: () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: potential race condition here? scheduleRefetch
checks if streamState.running
is false before attempting reconnection, however what if the stream status changes between when the timeout is set and when it executes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the stream is running when the timeout executes, a reconnect is not necessary.
...item, | ||
}; | ||
}, | ||
onError: (prevData, error) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: what's the difference here between onError
versus onEnd
in terms of reconnections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to zquery documentation, onEnd
executes on every stream end. onError
executes before onEnd
, if there is an error from the fetch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is another method, onAbort
, which i investigated but it does not seem to be involved in the cases we care about here.
Description of Changes
Retry the status stream in minifront, when it fails due to #2074
zquery doesn't really have a stream-retry this feature, so state management in the slice and in the zquery stream callbacks will collect the stream status for interested components, and schedule re-fetch attempts.
Checklist Before Requesting Review