update from Gluejar by eshellman · Pull Request #94 · EbookFoundation/regluit

eshellman · 2018-11-03T20:50:14Z

No description provided.

2023 final

add liege

update handling of DOAB covers

fix all the null doab covers!

and don't call distinct unless needed

update tests, fix slow OPDS, optimize queryset access

muse, ubiquity hosts

tecnum, update de gruyter

springer, scielo and cmp

Maintenance 2024

Switches the pyoai pin from infrae/pyoai (last release March 2022, explicitly unmaintained per its own README) to our fork carrying the RateLimitedError patch:

EbookFoundation/pyoai @ bf709d26ae6e4b34b9b0ca726e0f032d97f0bd38
Branch: fix/expose-429-as-rate-limited-error

The fork raises a structured RateLimitedError on HTTP 429 with Retry-After parsed per RFC 9110, instead of leaving callers to parse HTTPError.headers themselves. An upstream PR to eth-library/oaipmh (the maintained fork; infrae/pyoai is no longer accepting changes) is pending; we'll un-pin to that fork when it lands.

load_doab_oai now catches RateLimitedError first; the existing HTTPError-with-code-429 path is kept as a runtime fallback for deployments that revert the pin or run against stock pyoai. A never-raised placeholder class handles the ImportError case so the except clause is always syntactically valid.

Step C of today's DOAB rate-limit plan (after #1143/#1144 OAI sentinel and #1147 bitstream breaker).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
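As a rough illustration of the two ideas in this commit message (the placeholder class for the ImportError case and Retry-After handling on the fallback path), here is a minimal sketch. It is not the code from load_doab_oai: the import path `oaipmh.error`, the `retry_after` attribute, and the helper names `parse_retry_after` and `retry_delay` are all assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.error import HTTPError

try:
    # Assumed import path; the fork's actual module layout is not shown in this PR.
    from oaipmh.error import RateLimitedError
except ImportError:
    class RateLimitedError(Exception):
        """Never-raised placeholder so `except RateLimitedError` always resolves."""


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if it cannot be parsed."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(exc, default=60.0):
    """Seconds to wait for a rate-limit error, or None for any other error."""
    if isinstance(exc, RateLimitedError):
        # `retry_after` is an assumed attribute name on the fork's exception.
        delay = getattr(exc, "retry_after", None)
        return default if delay is None else delay
    if isinstance(exc, HTTPError) and exc.code == 429:
        # Fallback for stock pyoai: read the header ourselves.
        header = exc.headers.get("Retry-After") if exc.headers else None
        delay = parse_retry_after(header)
        return default if delay is None else delay
    return None
```

Checking the structured exception before the generic HTTPError mirrors the ordering described above; the 60-second default is an arbitrary value chosen for the sketch.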

Pin pyoai to EbookFoundation fork; consume RateLimitedError

Resolve settings/dev.py conflict: keep Celery 5.x setting renames from this branch (CELERY_BEAT_SCHEDULE, CELERY_WORKER_HIJACK_ROOT_LOGGER) plus master's ADMINS email update; both sides drop the send_test_email schedule line. Brings in 16 master commits since fork (DOAB 429 handling, pyoai EbookFoundation fork pin, notarobot int-guard, SEND_TEST_EMAIL_JOB removal, admin-email rate limit). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Compute the publication year range from a single evaluated edition queryset instead of two separate (asc + desc) queries. The previous code initialized `latest_publication` to None and set it only from the second query; when the dated-edition set changed between the two reads (e.g. concurrent edition loading) the second query could find no truthy dates, leaving `latest_publication` None and crashing on `earliest_publication + "-" + latest_publication`. Also switch to save(update_fields=['publication_range']) so this nominally-read property doesn't persist other (possibly stale) in-memory Work fields. Adds WorkTests regression coverage: mixed null/blank/valid dates, single-year vs range, all-blank, and a query-count guard proving the single-query shape. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
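The single-pass shape described above can be sketched in plain Python. This is only an illustration, not the Work model code: the function name, the assumption that each date string starts with a four-digit year, and the empty-string result for the all-blank case are all hypothetical.

```python
def publication_range(dates):
    """Build a year range from one already-evaluated list of date strings.

    Assumes each truthy date starts with a four-digit year (e.g. "2001-05").
    Returns "" when no usable date exists (an assumed convention).
    """
    # One pass over one snapshot: earliest and latest come from the same data,
    # so they cannot disagree the way two separate queries could.
    years = sorted(date[:4] for date in dates if date)
    if not years:
        return ""
    earliest, latest = years[0], years[-1]
    if earliest == latest:
        return earliest
    return earliest + "-" + latest
```

Because `years` only ever contains truthy strings, the concatenation on the last line cannot see None, which is the failure the original two-query version hit.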

The api `widget` view treated any non-"featured", non-10/13-char token as a work id and called safe_get_work() without catching Work.DoesNotExist, so /api/widget/<bad-id>/ returned 500 instead of the existing empty-widget response.

Two related defects fixed at the same time:
- convert_10_to_13() returns None for an invalid ISBN-10, after which `len(isbn)` raised TypeError. Guard with `if isbn and len(isbn)==13`.
- widget.html renders "...ISBN {{isbn}}..." but the work=None paths did not pass `isbn`. Pass it into every render path.

Wrap safe_get_work() in try/except Work.DoesNotExist -> render empty widget, matching the existing Identifier.DoesNotExist handling.

Adds api ApiTests regression coverage: non-numeric token, numeric unknown id, and invalid ISBN-10 all return 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
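The ISBN guard mentioned above can be illustrated in isolation. In this sketch `convert_10_to_13` is a hypothetical stand-in written for the example (the project's own helper is not shown in this PR), and `normalize_isbn` is an invented name; only the `if isbn and len(isbn) == 13` check comes from the commit message.

```python
def convert_10_to_13(isbn10):
    """Hypothetical stand-in: return the ISBN-13 for a valid ISBN-10, else None."""
    if len(isbn10) != 10 or not isbn10[:9].isdigit():
        return None
    check = isbn10[9].upper()
    if not (check.isdigit() or check == "X"):
        return None
    # ISBN-10 checksum: weights 10..1, total must be divisible by 11.
    total = sum((10 - i) * int(c) for i, c in enumerate(isbn10[:9]))
    total += 10 if check == "X" else int(check)
    if total % 11 != 0:
        return None
    core = "978" + isbn10[:9]
    # ISBN-13 checksum: alternating weights 1 and 3.
    weighted = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - weighted % 10) % 10)


def normalize_isbn(token):
    """Return a 13-character ISBN for the token, or None if it is unusable."""
    isbn = convert_10_to_13(token) if len(token) == 10 else token
    # The truthiness check comes first, so len() never sees None.
    if isbn and len(isbn) == 13:
        return isbn
    return None
```

A caller that receives None can fall through to the empty-widget response instead of raising.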

Keeps the Python 'if date' guard as belt-and-suspenders: the structural invariant (years contains only truthy strings) stays enforced locally, independent of the queryset. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…into dj42.unglue.it

…to dj42.unglue.it

Fix #1155: TypeError in Work.publication_date from two-query race

Fix #1156: widget endpoint 500s on unknown/non-numeric/invalid ids

FAQ copy: objective typo & grammar fixes (#1165)

Mechanical, no-meaning-change corrections to the FAQ page surfaced by the CC+Codex copy review on #1165. Typos, a broken URL, proper-noun/acronym fixes, subject-verb agreement, and site-name casing; nothing factual.

- "that why" → "that's why"; "an non-profit" → "a non-profit"
- "do well be selling" → "by selling"; "the the copyright" → "the copyright"
- "right holder tools" → "rights holder tools"; "some interested" → "some interest"
- broken Facebook URL "facebook/com" → "facebook.com"
- "Wikisources/Hathi Trust/Github" → "Wikisource/HathiTrust/GitHub"
- "page.You'll" → "page. You'll"; mid-sentence "Let" → "let"
- "cannot not be obtained" → "cannot be obtained"; "They does" → "They do"
- "Authors' Guild" → "Authors Guild"; CC license styling "NoDerivatives, NonCommercial"
- site-name casing unglue.it → Unglue.it; "Thanks for Ungluing" → "Thanks-for-Ungluing"

Factual/staleness/voice items (fees, payouts, sender email, campaign retirement, etc.) are handled separately in the judgment-call PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 0c43233)

Add set_site_domain management command (fix #1164)

Staging boxes restored from a prod snapshot keep prod's Site row (domain='unglue.it'), so every emailed link (password-reset, notices, etc.) points at prod instead of the staging box's own host. This command updates Site.objects.get_current() (the SITE_ID row) to the supplied domain (and optional name). It is idempotent: if the row already matches, it prints a no-op message and exits cleanly. Used by the provisioning repo's post-deploy Ansible task to localise the Site to the box's own server_name on every deploy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… not a hard fail Codex review of #1164 fix: a fresh/scrubbed DB without a Site row for SITE_ID would crash the post-deploy task with Site.DoesNotExist. get_or_create makes it self-healing while staying idempotent on existing rows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…1170 (#1171) Align master with deployed Django 4.2 (prod-green @ 751f781) — refs #1170

libraryauth defined its AppConfig (with ready() -> `from . import signals`) in __init__.py and relied on `default_app_config`. Django 4.1 REMOVED default_app_config, and an AppConfig in __init__.py is not auto-discovered (Django only scans <app>/apps.py). So since the 2026-06-17 Django 4.2 cutover, LibraryAuthConfig.ready() never ran, signals.py was never imported, and the `@receiver(user_activated) handle_same_email_account` (same-email account dedup on registration activation) was silently disconnected in production.

Fix: move LibraryAuthConfig to libraryauth/apps.py (auto-discovered) and drop the dead default_app_config. Backward-compatible (valid on 4.2 and 5.2).

Proven empirically (minimal repro): with the config in __init__.py + no apps.py, ready() does NOT fire on either Django 4.2.21 or 5.2.15; adding apps.py restores it on both.

Scope: swept all first-party apps; only `core` and `libraryauth` define ready(); core already has apps.py (fine). libraryauth was the sole casualty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ

Per Codex review of #1176: assert LibraryAuthConfig is the active app config (so ready() runs) and that handle_same_email_account is connected to the user_activated signal. Guards against the config drifting back out of apps.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ

Fix libraryauth signals: move AppConfig to apps.py so ready() fires

recommended_user (frontend/views/__init__.py:637) is a QuerySet (User.objects.filter(...)). Passing it to the exact related lookup wishlists__user=recommended_user raised, since Django 4.x: ValueError: The QuerySet value for an exact lookup must be limited to one result using slicing. Django 1.11 tolerated a QuerySet here, so /lists/recommended has returned HTTP 500 since the 1.11->4.2 cutover (2026-06-17). Fix: wishlists__user__in=recommended_user. Behavior-preserving for the intended single 'unglueit' user, and degrades gracefully (empty result, not 500) if that user is absent. Valid on both Django 4.2 and 5.2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ

…et-lookup Fix /lists/recommended 500: use __in for QuerySet-valued lookup (fixes #1179)

…lean_email in django-registration 3.x) RegistrationFormNoDisposableEmail.clean_email called super().clean_email(), but django-registration 3.x removed clean_email from RegistrationForm/RegistrationFormUniqueEmail (unique-email check is now a field validator added in __init__). So every POST to /accounts/register/ raised: AttributeError: 'super' object has no attribute 'clean_email' Registration has been fully broken since the 1.11->4.2 cutover. Fix: read self.cleaned_data['email'] directly. Django's _clean_fields populates cleaned_data[name] (running field validators, incl. the unique and confusable-email checks) BEFORE calling clean_<name>, so the disposable check still runs on the already-validated value. Behavior-preserving; valid on django-registration 3.4 under both Django 4.2 and 5.2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ

…-super Fix /accounts/register/ 500: read cleaned_data['email'] (django-registration 3.x) — fixes #1182

…1185) Acq/Campaign/UserProfile .objects.get(<int>) raised 'TypeError: cannot unpack non-iterable int object' (verified live) on every call, and the except DoesNotExist could not catch it. These tasks are actively invoked, so the features failed silently:

- watermark_acq -> ebook watermarking on borrow/acquire
- process_ebfs -> rights-holder ebook processing
- ml_subscribe_task -> mailing-list subscribe

Fix: .get(id=...). Pre-existing (not cutover-specific); surfaced by the post-cutover sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ

…(refs #1185)

A1 MsgForm.full_clean: (1) used bare ValidationError, not imported in this module -> NameError -> 500; AND (2) it raised from an overridden full_clean(), which propagates out of is_valid() as a 500 even with a proper ValidationError (verified empirically). Fixed by using self.add_error(None, ...) so the form is marked invalid cleanly, and by catching ValueError/TypeError so a non-numeric supporter/work id from POST doesn't crash the int lookup. Triggers on the 'message a supporter' POST (frontend/views:1722) with missing/invalid supporter or work.

A2 EbookForm.set_provider: read self.cleaned_data['url'] unconditionally; when clean_url() raises (e.g. duplicate URL) that key is removed -> KeyError -> 500 on ebook add/edit with a duplicate URL. Use .get('url') and skip provider inference when url is absent so the field error is reported normally.

Behavior-preserving for valid input; valid on Django 4.2 and 5.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011dumTGwdDfpJJMGC4ThisJ
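The two guards described in A1 and A2 can be sketched without Django. Both helpers below are hypothetical names invented for the example, `cleaned_data` is modelled as a plain dict, and extracting the URL host is only a stand-in for whatever provider inference the real form performs.

```python
from urllib.parse import urlparse


def parse_id(raw):
    """Return raw as an int, or None when it is missing or non-numeric."""
    try:
        return int(raw)
    except (ValueError, TypeError):
        # ValueError: non-numeric string; TypeError: None or another bad type.
        return None


def provider_host(cleaned_data):
    """Return the lower-cased URL host, or None when 'url' failed validation."""
    # .get() tolerates the key having been removed after a field error.
    url = cleaned_data.get("url")
    if not url:
        return None
    return urlparse(url).netloc.lower()
```

In both cases the caller gets None for bad input and can report a normal form error instead of letting an exception escape as a 500.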

core/tasks.py: positional .get() -> get(id=...) (3 tasks) — refs #1185

frontend/forms: stop 500s on invalid input (MsgForm, EbookForm) — refs #1185

eshellman force-pushed the master branch from 3d40651 to 6535505 on November 3, 2018 21:29

eshellman added 29 commits December 28, 2023 14:12

Merge branch 'master' into 2023-final

1b3a580

remove travis specific stuff

11a1a01

Merge pull request #1023 from Gluejar/2023-final

24578d9

2023 final

fix rare scraper bug

b29cb42

add liege (pressbooks)

bcf760a

new chrome UA

e42e845

Merge branch 'master' into maintenance-2023

28e4060

Merge pull request #1024 from Gluejar/maintenance-2023

c94e233

add liege

update handling of DOAB covers

023345f

Merge pull request #1025 from Gluejar/maintenance-2023

2fb5457

update handling of DOAB covers

fix all the null doab covers!

2cd1e28

Merge pull request #1026 from Gluejar/maintenance-2024

7edddde

fix all the null doab covers!

don't check truth of querysets

7fca123

and don't call distinct unless needed

optimize getting first entry of a queryset

444e851

fix tests with old data

5889d09

more optimized queryset access

9351f2b

Merge pull request #1027 from Gluejar/maintenance-2024

e661216

update tests, fix slow OPDS, optimize queryset access

add ubiquity hosts

9de45ca

muse

c1e97e3

Merge pull request #1028 from Gluejar/maintenance-2024

027da7c

muse, ubiquity hosts

tecnum, update de gruyter

408aabb

Merge pull request #1029 from Gluejar/maintenance-2024

b31f660

tecnum, update de gruyter

simplify springer

2624299

scielo

010fa25

improve cmp

c38ceb6

Merge pull request #1030 from Gluejar/maintenance-2024

a94a12d

springer, scielo and cmp

more useragent params

dd512cf

useragent

2bdc75e

Merge pull request #1031 from Gluejar/maintenance-2024

28051bd

Maintenance 2024

rdhyee and others added 30 commits May 13, 2026 17:26

Merge pull request #1149 from Gluejar/fix/use-rate-limited-error

42a984c

Pin pyoai to EbookFoundation fork; consume RateLimitedError

Merge remote-tracking branch 'origin/fix/1155-publication-date-race' …

a8759dc

…into dj42.unglue.it

Merge remote-tracking branch 'origin/fix/1156-widget-doesnotexist' in…

13150c9

…to dj42.unglue.it

Merge pull request #1157 from Gluejar/fix/1155-publication-date-race

6f6f528

Fix #1155: TypeError in Work.publication_date from two-query race

Merge pull request #1158 from Gluejar/fix/1156-widget-doesnotexist

481d9ac

Fix #1156: widget endpoint 500s on unknown/non-numeric/invalid ids

Merge pull request #1167 from Gluejar/fix/faq-copy-typos

f2e3ec3

FAQ copy: objective typo & grammar fixes (#1165)

Merge pull request #1166 from Gluejar/fix/1164-set-site-domain

6ebe539

Add set_site_domain management command (fix #1164)

Align master with deployed Django 4.2 (prod-green @ 751f781) — refs #…

cf97ab3

…1170 (#1171) Align master with deployed Django 4.2 (prod-green @ 751f781) — refs #1170

Merge pull request #1176 from Gluejar/hotfix/libraryauth-appconfig-ready

3df41af

Fix libraryauth signals: move AppConfig to apps.py so ready() fires

Merge pull request #1180 from Gluejar/hotfix/lists-recommended-querys…

c6c583e

…et-lookup Fix /lists/recommended 500: use __in for QuerySet-valued lookup (fixes #1179)

Merge pull request #1183 from Gluejar/hotfix/registration-clean-email…

c99d0f9

…-super Fix /accounts/register/ 500: read cleaned_data['email'] (django-registration 3.x) — fixes #1182

Merge pull request #1186 from Gluejar/hotfix/tasks-get-by-id

c8b8635

core/tasks.py: positional .get() -> get(id=...) (3 tasks) — refs #1185

Merge pull request #1187 from Gluejar/hotfix/form-clean-guards

3aaafb5

frontend/forms: stop 500s on invalid input (MsgForm, EbookForm) — refs #1185

eshellman wants to merge 1122 commits into EbookFoundation:master from Gluejar:master
