
Make --progress and --checkpoint strictly by-statement #1753

Open
rolandwalker wants to merge 1 commit into main from RW/progress-checkpoint-by-statement


@rolandwalker

Description

Previously, --progress and --checkpoint were partly driven by linebreaks: multiline queries were correctly joined and counted, dispatched, and checkpointed as one query, but multiple queries on a single line were dispatched together.

That means progress estimation could be thrown off somewhat, depending on the file contents. More importantly, a statement that was part of a line with more than one statement might fail to be written to the line-influenced checkpoint file if that statement succeeded but a subsequent statement on the same line failed.

This subtlety is important if we are to use the checkpoint file to resume scripts, though in general it would be best when running scripts to avoid all of these corner cases by having one statement per line.
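To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the project's code: `fake_execute`, `naive_split`, `run_by_line`, and `run_by_statement` are hypothetical names invented for illustration, and the toy splitter only stands in for a real SQL parser.

```python
def fake_execute(statement, executed):
    # Hypothetical executor: any statement containing "FAIL" raises.
    if "FAIL" in statement:
        raise RuntimeError(f"statement failed: {statement}")
    executed.append(statement)


def naive_split(line):
    # Toy splitter for this demo only; real SQL needs a proper parser.
    return [part.strip() + ";" for part in line.split(";") if part.strip()]


def run_by_line(lines):
    # Line-influenced behavior: a line is checkpointed only if
    # every statement on it succeeded.
    executed, checkpoint = [], []
    for line in lines:
        try:
            for statement in naive_split(line):
                fake_execute(statement, executed)
        except RuntimeError:
            break
        checkpoint.append(line)
    return executed, checkpoint


def run_by_statement(lines):
    # By-statement behavior: each statement is checkpointed
    # as soon as it succeeds.
    executed, checkpoint = [], []
    statements = [s for line in lines for s in naive_split(line)]
    for statement in statements:
        try:
            fake_execute(statement, executed)
        except RuntimeError:
            break
        checkpoint.append(statement)
    return executed, checkpoint


script = ["INSERT INTO t VALUES (1); FAIL INTO t VALUES (2);"]
line_executed, line_checkpoint = run_by_line(script)
stmt_executed, stmt_checkpoint = run_by_statement(script)
```

In both runs the first statement is executed, but only the by-statement run records it in the checkpoint. Resuming from the line-based checkpoint in this toy model would run the first statement a second time.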

We pull in sqlparse in addition to sqlglot because sqlparse preserves the input literally when splitting multi-statement lines.

This also fixes a bug: the generator named batch_gen was recreated inside the --progress loop. That didn't matter before this change, because iterating over a filehandle masked the issue.
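One way such a bug can stay hidden, sketched under assumptions: a filehandle remembers its read position, so a freshly created generator over the same handle continues where the last one stopped, while a fresh generator over a list starts again from the top. The names `batch_gen_from` and `first_of_each_recreated` are hypothetical and do not come from the project.

```python
import io


def batch_gen_from(source):
    # Hypothetical stand-in for a batch generator over some iterable.
    for item in source:
        yield item.strip()


def first_of_each_recreated(source, rounds):
    # Recreate the generator on every pass and take one item,
    # mimicking a generator that is rebuilt inside a loop.
    results = []
    for _ in range(rounds):
        gen = batch_gen_from(source)
        results.append(next(gen))
    return results


# A filehandle keeps its position, so recreation goes unnoticed.
handle = io.StringIO("select 1;\nselect 2;\nselect 3;\n")
from_handle = first_of_each_recreated(handle, 3)

# A list restarts from the beginning each time, exposing the bug.
statements = ["select 1;", "select 2;", "select 3;"]
from_list = first_of_each_recreated(statements, 3)
```

Here `from_handle` walks through all three statements, while `from_list` yields the first statement three times. Creating the generator once, before the loop, gives the same result for both kinds of source.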

Tests are added for statements_from_filehandle(), which had no coverage before.

Incidentally

  • fix missing changelog entry
  • fix whitespace in a comment
  • remove a backslash by double-quoting a string which contains a single quote

Checklist

  • I added this contribution to the changelog.md file.
  • I added my name to the AUTHORS file (or it's already there).
  • To lint and format the code, I ran
    uv run ruff check && uv run ruff format && uv run mypy --install-types .

@rolandwalker rolandwalker self-assigned this Mar 28, 2026
@github-actions

No correctness or security issues stood out in the PR-scoped changes.

Residual risk / test gap:

  1. I could not run the targeted tests locally because uv is not available in this environment (uv: command not found), so this review is static-only.
  2. Coverage is improved for statement splitting and --progress, but there is still no end-to-end main.py test for --batch (without --progress) with multiple statements on one line; adding one would better guard against regressions in the non-progress path.

@rolandwalker rolandwalker force-pushed the RW/progress-checkpoint-by-statement branch from 0ef70d1 to a0c50b8 Compare March 28, 2026 19:17