fix: reject non-200 download responses #9085

Open
VectorPeak wants to merge 2 commits into AstrBotDevs:master from VectorPeak:fix/download-file-http-errors

Conversation

@VectorPeak (Contributor) commented Jun 30, 2026

This PR fixes download_file() treating unsuccessful HTTP responses as successful local downloads.

Modifications

download_file() is a shared helper for downloading a remote response into a caller-provided local path. Before this PR, the helper noticed non-200 HTTP responses but did not stop the write path. The old control flow was effectively:

aiohttp session.get(url)
  -> receive HTTP response
  -> if status != 200: log an error
  -> still open(path, "wb")
  -> stream resp.content into the target file
  -> return normally to the caller

That means a 404, 403, 500, or other error response could be persisted as a successful downloaded file. The caller would then continue with a local path that exists, but whose bytes are actually an error page/body rather than the requested file.
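
The old flow can be reproduced in a few lines. This is a minimal sketch, not the code from astrbot/core/utils/io.py: `FakeResponse`, `iter_chunks`, and `old_download` are hypothetical names used only for illustration, and no real HTTP request is made.

```python
import asyncio
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("download_sketch")


class FakeResponse:
    """Hypothetical stand-in for an HTTP response (not the aiohttp API)."""

    def __init__(self, status: int, chunks: list[bytes]) -> None:
        self.status = status
        self._chunks = chunks

    async def iter_chunks(self):
        for chunk in self._chunks:
            yield chunk


async def old_download(resp: FakeResponse, path: Path) -> None:
    # Pre-fix flow as described above: the status is noticed but not enforced.
    if resp.status != 200:
        logger.error("download failed, HTTP status code: %s", resp.status)
    # The write path still runs, so the error body is persisted.
    with open(path, "wb") as f:
        async for chunk in resp.iter_chunks():
            f.write(chunk)


def reproduce_old_behaviour() -> bytes:
    """Return the bytes that end up in missing.bin after a 404."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "missing.bin"
        asyncio.run(old_download(FakeResponse(404, [b"not found"]), target))
        return target.read_bytes()
```

Calling `reproduce_old_behaviour()` returns the error body, which is exactly what a caller would later try to parse as the requested file.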

Changes

This change moves the status check in front of the file write in both download branches:

  • normal TLS download path: session.get(url, timeout=1800) now calls _raise_for_download_status(resp, url) before open(path, "wb")
  • insecure SSL fallback path: session.get(url, ssl=ssl_context, timeout=120) uses the same status check before opening the target file
  • successful 200 responses continue through the shared _download_response_to_file() writer, preserving chunked writes, progress output, and progress callbacks
  • non-200 responses raise DownloadFileHTTPError, a RuntimeError subclass, so existing RuntimeError handling remains compatible while callers can still distinguish HTTP download failures if needed

Successful downloads keep the existing behavior. The changed behavior is only that failed HTTP responses are rejected before they can create or overwrite the destination file.
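
A rough sketch of the reordered flow follows. The names `DownloadFileHTTPError`, `_raise_for_download_status`, and `_download_response_to_file` come from this PR, but their bodies here are assumptions; `FakeResponse`, `iter_chunks`, `download`, and `demo` are hypothetical, and the exact error message is guessed apart from the "HTTP status code: 404" fragment that the tests match on.

```python
import asyncio
import tempfile
from pathlib import Path


class DownloadFileHTTPError(RuntimeError):
    """HTTP download failure; subclassing RuntimeError keeps old handlers working."""

    def __init__(self, url: str, status: int) -> None:
        super().__init__(f"Failed to download {url}, HTTP status code: {status}")
        self.url = url
        self.status = status


class FakeResponse:
    """Hypothetical stand-in for an HTTP response (not the aiohttp API)."""

    def __init__(self, status: int, chunks: list[bytes]) -> None:
        self.status = status
        self._chunks = chunks

    async def iter_chunks(self):
        for chunk in self._chunks:
            yield chunk


def _raise_for_download_status(resp: FakeResponse, url: str) -> None:
    # Assumed body: anything other than 200 is rejected.
    if resp.status != 200:
        raise DownloadFileHTTPError(url, resp.status)


async def _download_response_to_file(resp: FakeResponse, path: Path) -> None:
    # Assumed body: chunked write only (progress reporting omitted here).
    with open(path, "wb") as f:
        async for chunk in resp.iter_chunks():
            f.write(chunk)


async def download(resp: FakeResponse, url: str, path: Path) -> None:
    _raise_for_download_status(resp, url)  # status check comes first
    await _download_response_to_file(resp, path)  # file is opened only for 200


def demo() -> tuple[bool, bool]:
    url = "https://example.test/file"
    with tempfile.TemporaryDirectory() as tmp:
        ok_path = Path(tmp) / "ok.bin"
        missing_path = Path(tmp) / "missing.bin"
        asyncio.run(download(FakeResponse(200, [b"da", b"ta"]), url, ok_path))
        try:
            asyncio.run(download(FakeResponse(404, [b"not found"]), url, missing_path))
        except RuntimeError as exc:  # an existing RuntimeError handler still catches it
            rejected = isinstance(exc, DownloadFileHTTPError)
        else:
            rejected = False
        return ok_path.read_bytes() == b"data", rejected and not missing_path.exists()
```

Because the check runs before `open(path, "wb")`, a rejected response can neither create nor truncate the destination file.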

  • This is NOT a breaking change.

Evidence

Focused reproduction covered by test_download_file_rejects_non_200_response:

Fake HTTP response:
- status: 404
- body: not found
- destination: missing.bin

Before:
- an error was logged
- missing.bin could still be created
- missing.bin contained: not found
- the caller saw a normal return path

After:
- the non-200 response raises a download failure
- the failed response body is not written as a successful downloaded file

Successful download behavior is covered by test_download_file_writes_successful_response. The SSL fallback rejection path is covered by test_download_file_rejects_non_200_response_after_ssl_fallback.

Validation run locally after addressing review comments:

uv run pytest tests/unit/test_io_download_file.py -q
3 passed in 0.67s

uv run pytest tests/unit/test_file_message_component.py -q
1 passed in 0.73s

uv run ruff check astrbot/core/utils/io.py tests/unit/test_io_download_file.py
All checks passed!

uv run ruff format --check astrbot/core/utils/io.py tests/unit/test_io_download_file.py
2 files already formatted

git diff --check
passed

Local focused pytest evidence:

[Screenshot: Focused pytest evidence for PR 9085]

Additional related check attempted before the review updates:

uv run pytest tests/unit/test_file_message_component.py tests/test_media_utils.py -q

This command failed in an existing Windows file URI assertion unrelated to this change:

tests/test_media_utils.py::test_file_uri_to_path_supports_localhost_and_encoded_paths
expected: C:\Users\...\voice note.wav
actual  : \\localhostC:\Users\...\voice note.wav

Affected call chain / impact

download_file() sits below several user-facing and maintenance flows. Representative callers include:

File message component with remote URL
  -> astrbot/core/message/components.py::File._download_file(...)
  -> download_file(self.url, temp_file_path)
  -> _raise_for_download_status(resp, url)
  -> _download_response_to_file(...) only for HTTP 200

Remote media resolution
  -> astrbot/core/utils/media_utils.py::resolve_media_source(...)
  -> download_file(media_ref, target_path)
  -> media parsing / MIME detection uses the downloaded bytes

Platform attachment downloads
  -> DingTalk / Telegram adapter receives a platform file URL
  -> download_file(download_url or file.file_path, temp_path)
  -> downstream message handling reads the local file

Core/plugin update downloads
  -> updater selects a release or plugin archive URL
  -> _download_file(...)
  -> download_file(url, archive_path)
  -> unzip / validation uses the downloaded archive

Before this fix, any of those paths could receive a real local file path after an HTTP error response and fail later while parsing, registering, converting, or unzipping the error body. After this fix, the failure stays at the download boundary: non-200 responses raise before the destination file is opened, while successful 200 responses keep the same write and progress behavior.
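
To make the "fail later" case concrete, here is a small sketch of what an updater-style caller would have hit before the fix when the saved "archive" was really a 404 body. The file name `plugin.zip` and the function `unzip_error_body` are hypothetical; only the standard-library `zipfile` behavior is relied on.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_error_body() -> str:
    """Try to unzip a file whose bytes are an HTTP error body."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "plugin.zip"
        # Stand-in for the pre-fix download result: the path exists,
        # but its contents are the error response.
        archive.write_bytes(b"not found")
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(Path(tmp) / "out")
        except zipfile.BadZipFile as exc:
            # The failure surfaces here, far from the HTTP error that caused it.
            return type(exc).__name__
        return "ok"
```

The caller sees a `BadZipFile` error with no mention of the HTTP status, which is why moving the failure to the download boundary makes the root cause easier to diagnose.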

Boundary note: this PR intentionally keeps the behavior change inside download_file(). It only makes that shared helper reject non-200 HTTP responses before opening or writing the destination file, including the SSL fallback path. It does not change callers, caller retry/error handling, media/file message flows, updater logic, or other download/media helpers; those paths are mentioned only to explain the impact of fixing the shared download boundary.

download_image_by_url() is a sibling download helper in astrbot/core/utils/io.py, but it is not changed here. Any equivalent status-handling change for that image-specific helper should be evaluated separately with its own GET/POST behavior and callers.


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / No new feature is added; this is a bug fix.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / Focused pytest, related file message component test, Ruff check, Ruff format check, and git diff --check were run locally after addressing review comments. One broader related media test command was also attempted and the unrelated failure is documented above.

  • 🧐 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / No new dependencies are introduced.

  • 😇 My changes do not introduce malicious code.

@dosubot (bot) added the size:S (This PR changes 10-29 lines, ignoring generated files.) and area:core (The bug / feature is about astrbot's core, backend) labels on Jun 30, 2026

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds HTTP status checks to raise a RuntimeError when downloading files fails (non-200 status codes) in both the standard and SSL-disabled fallback paths of download_file. It also introduces unit tests to verify this behavior. The reviewer suggests refactoring the duplicated response processing and file-writing logic between the two download paths into a shared helper function to improve maintainability.

Comment thread astrbot/core/utils/io.py Outdated

@sourcery-ai (bot) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The non-200 handling logic is duplicated in both the normal TLS and insecure SSL branches of download_file; consider extracting a small helper or shared block to keep behavior in sync and reduce future drift.
  • Raising a bare RuntimeError for download failures makes it harder for callers to distinguish this case from other runtime issues; consider introducing or reusing a more specific exception type for HTTP download errors.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/unit/test_io_download_file.py" line_range="60-63" />
<code_context>
+    )
+
+
+@pytest.mark.asyncio
+async def test_download_file_rejects_non_200_response(monkeypatch, tmp_path):
+    target_path = tmp_path / "missing.bin"
+    _patch_download_session(
+        monkeypatch,
+        _FakeResponse(status=404, chunks=[b"not found"]),
+    )
+
+    with pytest.raises(RuntimeError, match="HTTP status code: 404"):
+        await io.download_file("https://example.test/missing", str(target_path))
+
+    assert not target_path.exists()
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that exercises the `insecure_ssl=True` branch to ensure non-200 responses are rejected consistently there as well.

Right now we only exercise the default `insecure_ssl=False` path. Since this change is meant to apply the same non-200 rejection to the insecure SSL fallback, please add a test that calls `download_file(..., insecure_ssl=True)` and asserts that a non-200 (e.g., 404) raises `RuntimeError` and does not create the destination file, using the branch that configures `ssl.create_default_context()` with `ssl.CERT_NONE`. You can reuse the existing fake response/session setup.

```suggestion
    with pytest.raises(RuntimeError, match="HTTP status code: 404"):
        await io.download_file("https://example.test/missing", str(target_path))

    assert not target_path.exists()


@pytest.mark.asyncio
async def test_download_file_rejects_non_200_response_insecure_ssl(monkeypatch, tmp_path):
    target_path = tmp_path / "missing.bin"
    _patch_download_session(
        monkeypatch,
        _FakeResponse(status=404, chunks=[b"not found"]),
    )

    with pytest.raises(RuntimeError, match="HTTP status code: 404"):
        await io.download_file(
            "https://example.test/missing",
            str(target_path),
            insecure_ssl=True,
        )

    assert not target_path.exists()
```
</issue_to_address>

Comment thread tests/unit/test_io_download_file.py Outdated
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:S (This PR changes 10-29 lines, ignoring generated files.) label on Jun 30, 2026
Comment thread astrbot/core/utils/io.py Fixed
@VectorPeak VectorPeak force-pushed the fix/download-file-http-errors branch from 317d29e to 403bbb6 Compare June 30, 2026 06:52
Labels

area:core: The bug / feature is about astrbot's core, backend
size:L: This PR changes 100-499 lines, ignoring generated files.

2 participants