
update from Gluejar#94

Open
eshellman wants to merge 1122 commits into EbookFoundation:master from Gluejar:master


@eshellman


No description provided.

and don't call distinct unless needed
update tests, fix slow OPDS, optimize queryset access
rdhyee and others added 30 commits May 13, 2026 17:26
Switches the pyoai pin from infrae/pyoai (last release March 2022,
explicitly unmaintained per its own README) to our fork carrying the
RateLimitedError patch:

  EbookFoundation/pyoai @ bf709d26ae6e4b34b9b0ca726e0f032d97f0bd38
  Branch: fix/expose-429-as-rate-limited-error

The fork raises a structured RateLimitedError on HTTP 429 with
Retry-After parsed per RFC 9110, instead of leaving callers to parse
HTTPError.headers themselves. An upstream PR to eth-library/oaipmh
(the maintained fork; infrae/pyoai is no longer accepting changes)
is pending; we'll un-pin to that fork when it lands.

load_doab_oai now catches RateLimitedError first; the existing
HTTPError-with-code-429 path is kept as a runtime fallback for
deployments that revert the pin or run against stock pyoai. A
never-raised placeholder class handles the ImportError case so the
except clause is always syntactically valid.
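The catch ordering above can be sketched as follows. This assumes the fork exposes `RateLimitedError` in `oaipmh.error` and keeps the parsed delay on a `retry_after` attribute; both names are guesses, not confirmed by the fork. `parse_retry_after` and `call_oai` are hypothetical helpers showing RFC 9110 Retry-After handling (delay-seconds or HTTP-date) on the stock-pyoai fallback path.

```python
import datetime as dt
from email.utils import parsedate_to_datetime
from urllib.error import HTTPError

try:
    # Assumed location in the EbookFoundation fork; stock pyoai lacks it.
    from oaipmh.error import RateLimitedError
except ImportError:
    class RateLimitedError(Exception):
        """Never-raised placeholder so the except clause below stays valid."""


def parse_retry_after(value, now=None):
    """Seconds to wait per RFC 9110 Retry-After, or None if missing/unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return int(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=dt.timezone.utc)
    now = now or dt.datetime.now(dt.timezone.utc)
    # A date already in the past means "retry now".
    return max(0, int((when - now).total_seconds()))


def call_oai(fetch):
    """Run one OAI request; return (result, None) or (None, delay_seconds)."""
    try:
        return fetch(), None
    except RateLimitedError as err:
        # Preferred path: the fork has already parsed Retry-After (hypothetical attribute).
        return None, getattr(err, "retry_after", None)
    except HTTPError as err:
        if err.code != 429:
            raise
        # Runtime fallback for stock pyoai: parse the header ourselves.
        return None, parse_retry_after(err.headers.get("Retry-After"))
```

Catching the specific class first means a reverted pin only loses the pre-parsed delay, not the 429 handling itself.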

Step C of today's DOAB rate-limit plan (after #1143/#1144 OAI
sentinel and #1147 bitstream breaker).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin pyoai to EbookFoundation fork; consume RateLimitedError
Resolve settings/dev.py conflict: keep Celery 5.x setting renames from this
branch (CELERY_BEAT_SCHEDULE, CELERY_WORKER_HIJACK_ROOT_LOGGER) plus master's
ADMINS email update; both sides drop the send_test_email schedule line.

Brings in 16 master commits since fork (DOAB 429 handling, pyoai EbookFoundation
fork pin, notarobot int-guard, SEND_TEST_EMAIL_JOB removal, admin-email rate limit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Compute the publication year range from a single evaluated edition
queryset instead of two separate (asc + desc) queries. The previous
code initialized `latest_publication` to None and set it only from the
second query; when the dated-edition set changed between the two reads
(e.g. concurrent edition loading) the second query could find no
truthy dates, leaving `latest_publication` None and crashing on
`earliest_publication + "-" + latest_publication`.

Also switch to save(update_fields=['publication_range']) so this
nominally-read property doesn't persist other (possibly stale)
in-memory Work fields.

Adds WorkTests regression coverage: mixed null/blank/valid dates,
single-year vs range, all-blank, and a query-count guard proving the
single-query shape.
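A minimal pure-Python sketch of the single-read shape. It assumes each date is a string whose first four characters are the year, and that a single year is reported without a range; both are assumptions, and `publication_range` is a hypothetical name. The list argument stands in for one evaluated queryset, so earliest and latest always come from the same snapshot.

```python
def publication_range(dates):
    """Return "YYYY" or "YYYY-YYYY" from publication dates, or None if none are usable."""
    # Keep only truthy strings (drops None and ""), then take the year prefix.
    years = sorted(date[:4] for date in dates if date)
    if not years:
        return None
    # Both ends come from the same sorted list, so neither can be None.
    earliest, latest = years[0], years[-1]
    return earliest if earliest == latest else earliest + "-" + latest
```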

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The api `widget` view treated any non-"featured", non-10/13-char token
as a work id and called safe_get_work() without catching
Work.DoesNotExist, so /api/widget/<bad-id>/ returned 500 instead of the
existing empty-widget response. Two related defects fixed at the same
time:

- convert_10_to_13() returns None for an invalid ISBN-10, after which
  `len(isbn)` raised TypeError. Guard with `if isbn and len(isbn)==13`.
- widget.html renders "...ISBN {{isbn}}..." but the work=None paths did
  not pass `isbn`. Pass it into every render path.

Wrap safe_get_work() in try/except Work.DoesNotExist -> render empty
widget, matching the existing Identifier.DoesNotExist handling.

Adds api ApiTests regression coverage: non-numeric token, numeric
unknown id, and invalid ISBN-10 all return 200.
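The dispatch after the fix can be sketched as follows, with every path ending in something renderable. This is illustrative only: `convert_10_to_13` here is our own checksum-validating version (the project's helper may differ), `widget_target` is hypothetical, and `int()` stands in for `safe_get_work()` raising `Work.DoesNotExist`.

```python
def convert_10_to_13(isbn10):
    """Convert a valid ISBN-10 to ISBN-13; return None if it is invalid."""
    isbn10 = isbn10.upper()
    if len(isbn10) != 10 or not isbn10[:9].isdigit() or not (isbn10[9].isdigit() or isbn10[9] == "X"):
        return None
    digits = [int(c) for c in isbn10[:9]] + [10 if isbn10[9] == "X" else int(isbn10[9])]
    # ISBN-10 checksum: weights 10..1, weighted sum must be divisible by 11.
    if sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11:
        return None
    core = "978" + isbn10[:9]
    # EAN-13 check digit: alternating weights 1 and 3.
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def widget_target(token):
    """Classify a widget token into (kind, value); never raises on bad input."""
    if token == "featured":
        return ("featured", None)
    if len(token) in (10, 13):
        isbn = convert_10_to_13(token) if len(token) == 10 else token
        # The guard from the fix: the conversion may return None.
        if isbn and len(isbn) == 13:
            return ("isbn", isbn)
        return ("empty", None)
    try:
        # Stand-in for safe_get_work(); an unknown id renders the empty widget.
        return ("work", int(token))
    except ValueError:
        return ("empty", None)
```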

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keeps the Python 'if date' guard as belt-and-suspenders: the structural
invariant (years contains only truthy strings) stays enforced locally,
independent of the queryset.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fix #1155: TypeError in Work.publication_date from two-query race
Fix #1156: widget endpoint 500s on unknown/non-numeric/invalid ids
Staging boxes restored from a prod snapshot keep prod's Site row
(domain='unglue.it'), so every emailed link (password-reset, notices,
etc.) points at prod instead of the staging box's own host.

This command updates Site.objects.get_current() (the SITE_ID row) to
the supplied domain (and optional name).  It is idempotent: if the row
already matches, it prints a no-op message and exits cleanly.

Used by the provisioning repo's post-deploy Ansible task to localise
the Site to the box's own server_name on every deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mechanical, no-meaning-change corrections to the FAQ page surfaced by the
CC+Codex copy review on #1165. Typos, a broken URL, proper-noun/acronym
fixes, subject-verb agreement, and site-name casing — nothing factual.

- "that why" → "that's why"; "an non-profit" → "a non-profit"
- "do well be selling" → "by selling"; "the the copyright" → "the copyright"
- "right holder tools" → "rights holder tools"; "some interested" → "some interest"
- broken Facebook URL "facebook/com" → "facebook.com"
- "Wikisources/Hathi Trust/Github" → "Wikisource/HathiTrust/GitHub"
- "page.You'll" → "page. You'll"; mid-sentence "Let" → "let"
- "cannot not be obtained" → "cannot be obtained"; "They does" → "They do"
- "Authors' Guild" → "Authors Guild"; CC license styling "NoDerivatives, NonCommercial"
- site-name casing unglue.it → Unglue.it; "Thanks for Ungluing" → "Thanks-for-Ungluing"

Factual/staleness/voice items (fees, payouts, sender email, campaign
retirement, etc.) are handled separately in the judgment-call PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FAQ copy: objective typo & grammar fixes (#1165)
Mechanical, no-meaning-change corrections to the FAQ page surfaced by the
CC+Codex copy review on #1165. Typos, a broken URL, proper-noun/acronym
fixes, subject-verb agreement, and site-name casing — nothing factual.

- "that why" → "that's why"; "an non-profit" → "a non-profit"
- "do well be selling" → "by selling"; "the the copyright" → "the copyright"
- "right holder tools" → "rights holder tools"; "some interested" → "some interest"
- broken Facebook URL "facebook/com" → "facebook.com"
- "Wikisources/Hathi Trust/Github" → "Wikisource/HathiTrust/GitHub"
- "page.You'll" → "page. You'll"; mid-sentence "Let" → "let"
- "cannot not be obtained" → "cannot be obtained"; "They does" → "They do"
- "Authors' Guild" → "Authors Guild"; CC license styling "NoDerivatives, NonCommercial"
- site-name casing unglue.it → Unglue.it; "Thanks for Ungluing" → "Thanks-for-Ungluing"

Factual/staleness/voice items (fees, payouts, sender email, campaign
retirement, etc.) are handled separately in the judgment-call PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 0c43233)
… not a hard fail

Codex review of #1164 fix: a fresh/scrubbed DB without a Site row for SITE_ID
would crash the post-deploy task with Site.DoesNotExist. get_or_create makes it
self-healing while staying idempotent on existing rows.
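The command's create-or-update logic can be modelled in plain Python. A dict stands in for the `django.contrib.sites` table (in Django itself this would be `Site.objects.get_or_create(pk=settings.SITE_ID, defaults=...)`); keeping the existing name when none is given, and defaulting a new row's name to the domain, are assumptions.

```python
def set_site_domain(sites, site_id, domain, name=None):
    """Point the SITE_ID row at `domain`; safe to run on every deploy."""
    site = sites.get(site_id)
    if site is None:
        # get_or_create behaviour: a scrubbed DB gets a fresh row, not an error.
        sites[site_id] = {"domain": domain, "name": name or domain}
        return "created"
    new_name = site["name"] if name is None else name
    if site["domain"] == domain and site["name"] == new_name:
        return "unchanged"  # idempotent no-op on repeat deploys
    site["domain"], site["name"] = domain, new_name
    return "updated"
```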

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add set_site_domain management command (fix #1164)
…1170 (#1171)

Align master with deployed Django 4.2 (prod-green @ 751f781) — refs #1170
libraryauth defined its AppConfig (with ready() -> `from . import signals`) in
__init__.py and relied on `default_app_config`. Django 4.1 REMOVED
default_app_config, and an AppConfig in __init__.py is not auto-discovered
(Django only scans <app>/apps.py). So since the 2026-06-17 Django 4.2 cutover,
LibraryAuthConfig.ready() never ran, signals.py was never imported, and the
`@receiver(user_activated) handle_same_email_account` (same-email account dedup
on registration activation) was silently disconnected in production.

Fix: move LibraryAuthConfig to libraryauth/apps.py (auto-discovered) and drop the
dead default_app_config. Backward-compatible (valid on 4.2 and 5.2).

Proven empirically (minimal repro): with the config in __init__.py + no apps.py,
ready() does NOT fire on either Django 4.2.21 or 5.2.15; adding apps.py restores
it on both.

Scope: swept all first-party apps — only `core` and `libraryauth` define ready();
core already has apps.py (fine). libraryauth was the sole casualty.
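For reference, the layout the fix relies on looks roughly like this. It is a sketch: the `name` value is an assumption and must match the app's entry in INSTALLED_APPS. Since Django 3.2, a single AppConfig subclass in `<app>/apps.py` is picked up automatically, which is why no `default_app_config` is needed.

```python
# libraryauth/apps.py: Django auto-discovers AppConfig subclasses in this module.
from django.apps import AppConfig


class LibraryAuthConfig(AppConfig):
    name = "libraryauth"  # assumed dotted app path; must match INSTALLED_APPS

    def ready(self):
        # Importing the module runs its @receiver decorators, connecting
        # handle_same_email_account to django-registration's user_activated.
        from . import signals  # noqa: F401
```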

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
Per Codex review of #1176: assert LibraryAuthConfig is the active app config
(so ready() runs) and that handle_same_email_account is connected to the
user_activated signal. Guards against the config drifting back out of apps.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
Fix libraryauth signals: move AppConfig to apps.py so ready() fires
recommended_user (frontend/views/__init__.py:637) is a QuerySet
(User.objects.filter(...)). Passing it to the exact related lookup
wishlists__user=recommended_user raised, since Django 4.x:
  ValueError: The QuerySet value for an exact lookup must be limited to
  one result using slicing.
Django 1.11 tolerated a QuerySet here, so /lists/recommended has returned
HTTP 500 since the 1.11->4.2 cutover (2026-06-17).

Fix: wishlists__user__in=recommended_user. Behavior-preserving for the
intended single 'unglueit' user, and degrades gracefully (empty result,
not 500) if that user is absent. Valid on both Django 4.2 and 5.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
…et-lookup

Fix /lists/recommended 500: use __in for QuerySet-valued lookup (fixes #1179)
…lean_email in django-registration 3.x)

RegistrationFormNoDisposableEmail.clean_email called
super().clean_email(), but django-registration 3.x removed clean_email
from RegistrationForm/RegistrationFormUniqueEmail (unique-email check is
now a field validator added in __init__). So every POST to
/accounts/register/ raised:
  AttributeError: 'super' object has no attribute 'clean_email'
Registration has been fully broken since the 1.11->4.2 cutover.

Fix: read self.cleaned_data['email'] directly. Django's _clean_fields
populates cleaned_data[name] (running field validators, incl. the unique
and confusable-email checks) BEFORE calling clean_<name>, so the disposable
check still runs on the already-validated value. Behavior-preserving;
valid on django-registration 3.4 under both Django 4.2 and 5.2.
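Why reading `cleaned_data` is safe can be shown with a toy model of Django's per-field cleaning order. This is not Django's code: `MiniForm`, `NoDisposableEmailForm`, the domain list and the registered-address set are all hypothetical, and `ValueError` stands in for `ValidationError`.

```python
DISPOSABLE_DOMAINS = {"mailinator.com"}  # hypothetical blocklist
REGISTERED = {"taken@example.org"}       # hypothetical existing accounts


def unique_email(value):
    # Stand-in for the unique-email field validator django-registration 3.x adds.
    if value in REGISTERED:
        raise ValueError("already registered")


class MiniForm:
    """Toy model of Django's per-field cleaning order."""

    field_validators = {}

    def __init__(self, data):
        self.data = data
        self.cleaned_data = {}
        self.errors = {}

    def is_valid(self):
        for name, checks in self.field_validators.items():
            value = self.data.get(name)
            try:
                for check in checks:
                    check(value)
                # 1. Field validators passed: the value lands in cleaned_data...
                self.cleaned_data[name] = value
                hook = getattr(self, "clean_" + name, None)
                if hook is not None:
                    # 2. ...and only then is clean_<name>() called.
                    self.cleaned_data[name] = hook()
            except ValueError as exc:
                # Like add_error(): record the error and drop the field's value.
                self.errors[name] = str(exc)
                self.cleaned_data.pop(name, None)
        return not self.errors


class NoDisposableEmailForm(MiniForm):
    field_validators = {"email": [unique_email]}

    def clean_email(self):
        # No super().clean_email(): read the already-validated value instead.
        email = self.cleaned_data["email"]
        if email.rsplit("@", 1)[-1].lower() in DISPOSABLE_DOMAINS:
            raise ValueError("disposable address")
        return email
```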

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
…-super

Fix /accounts/register/ 500: read cleaned_data['email'] (django-registration 3.x) — fixes #1182
…1185)

Acq/Campaign/UserProfile .objects.get(<int>) raised
'TypeError: cannot unpack non-iterable int object' (verified live) on every
call, and the except DoesNotExist could not catch it. These tasks are actively
invoked, so the features failed silently:
- watermark_acq  -> ebook watermarking on borrow/acquire
- process_ebfs   -> rights-holder ebook processing
- ml_subscribe_task -> mailing-list subscribe

Fix: .get(id=...). Pre-existing (not cutover-specific); surfaced by the
post-cutover sweep.
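Why the positional call fails can be shown with a simplified model (hypothetical `get`/`build_filter` helpers, not Django's code). `QuerySet.get(*args, **kwargs)` wraps its arguments in a `Q` object, and each non-Q child is later unpacked as a `(lookup, value)` pair, which a bare int cannot satisfy.

```python
def build_filter(filter_expr):
    """Toy version of the step that unpacks a filter expression into (lookup, value)."""
    lookup, value = filter_expr  # an int here raises TypeError
    return {lookup: value}


def get(*args, **kwargs):
    """Toy QuerySet.get(): positional args and sorted keyword lookups become
    filter expressions, in the same order Q(*args, **kwargs) builds its children."""
    return [build_filter(expr) for expr in (*args, *sorted(kwargs.items()))]
```

Because the failure is a `TypeError` rather than `DoesNotExist`, the tasks' existing `except ...DoesNotExist` clauses never saw it.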

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
…(refs #1185)

A1 MsgForm.full_clean: (1) used bare ValidationError, not imported in this module
-> NameError -> 500; AND (2) it raised from an overridden full_clean(), which
propagates out of is_valid() as a 500 even with a proper ValidationError (verified
empirically). Fixed by using self.add_error(None, ...) so the form is marked
invalid cleanly, and by catching ValueError/TypeError so a non-numeric supporter/
work id from POST doesn't crash the int lookup. Triggers on the 'message a
supporter' POST (frontend/views:1722) with missing/invalid supporter or work.

A2 EbookForm.set_provider: read self.cleaned_data['url'] unconditionally; when
clean_url() raises (e.g. duplicate URL) that key is removed -> KeyError -> 500 on
ebook add/edit with a duplicate URL. Use .get('url') and skip provider inference
when url is absent so the field error is reported normally.
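The two guard patterns can be sketched in isolation. Both helpers are hypothetical: `parse_id` mirrors the ValueError/TypeError catch in MsgForm, and `infer_provider` uses the URL host as a stand-in for the project's real provider inference.

```python
from urllib.parse import urlparse


def parse_id(raw):
    """int() a POSTed id; None instead of a 500 for missing or junk input."""
    try:
        return int(raw)
    except (ValueError, TypeError):  # "abc" -> ValueError, None -> TypeError
        return None


def infer_provider(cleaned_data):
    """Provider from the ebook URL, skipped when clean_url() rejected it.

    Django drops a field's key from cleaned_data when its clean_<field>()
    raises, so use .get() rather than ['url'].
    """
    url = cleaned_data.get("url")
    if not url:
        return None
    return urlparse(url).netloc  # hypothetical stand-in for provider inference
```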

Behavior-preserving for valid input; valid on Django 4.2 and 5.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
core/tasks.py: positional .get() -> get(id=...) (3 tasks) — refs #1185
frontend/forms: stop 500s on invalid input (MsgForm, EbookForm) — refs #1185
