Codex crashes when stream is closed during download #986

Closed
benbierens opened this issue Nov 4, 2024 · 4 comments · Fixed by #1151
Labels: bug (Something isn't working), Client (See https://miro.com/app/board/uXjVNZ03E-c=/ for details)

Comments

@benbierens
Contributor

Seen in SwarmTest[20,20] (https://github.com/codex-storage/cs-codex-dist-tests/blob/master/Tests/CodexTests/DownloadConnectivityTests/SwarmTests.cs) on v0.1.8

Modification: Logging for nodes was turned up:

            var nodes = StartCodex(numNodes, s => s.WithLogLevel(CodexPlugin.CodexLogLevel.Trace,
                new CodexPlugin.CodexLogCustomTopics(CodexPlugin.CodexLogLevel.Warn, CodexPlugin.CodexLogLevel.Trace)));

Test runs 20 nodes with a 20MB file.
15 nodes successfully download the file.
4 nodes crash with the following error:

WRN 2024-11-04 08:52:59.762+00:00 Excepting streaming blocks                 topics="codex restapi" tid=1 exc="Unable to send response data, reason: Stream finished or remote side dropped connection" count=646927
INF 2024-11-04 08:52:59.762+00:00 Sent bytes                                 topics="codex restapi" tid=1 cid=zDv*Q223mw bytes=7667712 count=646928
ERR 2024-11-04 08:52:59.762+00:00 Unhandled exception in async proc, aborting topics="codex" tid=1 msg="/src/vendor/nim-chronos/chronos/apps/http/httpserver.nim(1422, 15) `currentState`gensym835 == HttpResponseState.Empty` Response body was already sent [Failed]" count=646929

It's possible, but not likely, that this is caused by the download stream being interrupted. Either way, the node should not crash.
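One possible reading of the log above, offered as a guess and not a diagnosis: the streaming write fails first ("Unable to send response data"), and then something tries to send a fresh response on an object whose state is no longer `Empty`, which trips the assertion. The toy model below only illustrates that pattern. `ToyResponse`, `ResponseState`, `stream_blocks`, and `stream_blocks_unguarded` are hypothetical names invented for this sketch; none of this is the chronos or presto API.

```python
from enum import Enum, auto


class ResponseState(Enum):
    EMPTY = auto()     # nothing has been written yet
    SENDING = auto()   # at least one body chunk has been written
    FINISHED = auto()  # a complete response has been sent


class ToyResponse:
    """Hypothetical stand-in for an HTTP response object (not the chronos API)."""

    def __init__(self, fail_after_chunks=None):
        self.state = ResponseState.EMPTY
        self.sent = 0
        # Simulates the remote side dropping the connection after N chunks.
        self._fail_after = fail_after_chunks

    def send_chunk(self, data: bytes) -> None:
        if self._fail_after is not None and self.sent >= self._fail_after:
            raise ConnectionError("remote side dropped connection")
        self.state = ResponseState.SENDING
        self.sent += 1

    def send_error(self, status: int, body: str) -> None:
        # Same kind of guard as the one in the log: a full response may only
        # be sent while nothing has been written yet.
        if self.state is not ResponseState.EMPTY:
            raise AssertionError("Response body was already sent")
        self.state = ResponseState.FINISHED


def stream_blocks_unguarded(resp: ToyResponse, blocks) -> str:
    """Always answers a failure with an error response; trips the guard mid-stream."""
    try:
        for block in blocks:
            resp.send_chunk(block)
        return "complete"
    except ConnectionError:
        resp.send_error(500, "stream failed")
        return "error response sent"


def stream_blocks(resp: ToyResponse, blocks) -> str:
    """Only sends an error response if nothing has been written yet."""
    try:
        for block in blocks:
            resp.send_chunk(block)
        return "complete"
    except ConnectionError:
        if resp.state is ResponseState.EMPTY:
            resp.send_error(500, "stream failed")
            return "error response sent"
        # Body already started: nothing sensible can be sent, so just stop.
        return "aborted mid-stream"
```

In this model, `stream_blocks_unguarded` raises as soon as the connection drops after the first chunk, while `stream_blocks` gives up quietly. Whether the real handler follows this shape is an assumption.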

@gmega
Member

gmega commented Feb 19, 2025

I have the same problem. I was closing the download stream because of a misconfigured socket read timeout, and this does cause Codex to crash pretty consistently.
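For anyone who wants to trigger the same client-side condition deliberately, here is a minimal sketch that starts a download over a raw socket, reads a small amount, and then drops the connection while the server is still sending. The function name `abort_download_midstream` and all of its parameters are hypothetical; the download path is left as a parameter because it is not stated in this issue.

```python
import socket


def abort_download_midstream(host: str, port: int, path: str,
                             read_bytes: int = 4096) -> int:
    """Send an HTTP GET, read at most `read_bytes` bytes, then drop the connection.

    Returns the number of bytes actually read before closing.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

    received = 0
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while received < read_bytes:
            chunk = sock.recv(min(4096, read_bytes - received))
            if not chunk:
                break  # server closed the connection first
            received += len(chunk)
    # Leaving the with-block closes the socket, typically while the server
    # still has response data left to write.
    return received
```

Pointing this at a node's download endpoint for a large file, with `read_bytes` much smaller than the file, should approximate what the misconfigured read timeout was doing.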

gmega changed the title from "Crash - httpserver can't send response" to "Codex crashes when stream is closed during download" on Feb 19, 2025
gmega added the bug and Client labels on Feb 19, 2025
@2-towns
Contributor

2-towns commented Feb 19, 2025

That could be related to the error "Connection was closed before full request has been made" seen in CI: https://github.com/codex-storage/nim-codex/actions/runs/13388233002/job/37389790340.

2-towns pinned this issue on Feb 19, 2025
gmega self-assigned this on Feb 21, 2025
@gmega
Member

gmega commented Mar 6, 2025

OK, I think I know why this is happening. Will get a PR up soon.

gmega marked this as a duplicate of #1149 on Mar 7, 2025
@gmega
Member

gmega commented Mar 7, 2025

OK, looks like this might be a bug in presto, or in the way we're using the API. Will need to dig deeper.
