Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/more fields for bulk add by party form #2496

Merged
merged 12 commits into from
Mar 31, 2025

Conversation

awdem
Copy link
Contributor

@awdem awdem commented Mar 10, 2025

This PR adds more fields the bulk add by party form. For person identifiers, I ended up created a some custom MultiValueFields and corresponding MultiWidgets because it made more sense to me than a huge and confusing copy-paste. They still duplicate a lot of validation methods in a way that could be much cleaner, but as we've discussed, it's too much work to do before SOPN day

There are a few outstanding issues that I'm not sure how to address yet:

  1. Elections with lots of ballots raise an error. This is a big one as I don't think the form will be that useful otherwise:

    • The number of GET/POST parameters exceeded settings.DATA_UPLOAD_MAX_NUMBER_FIELDS.
    • an example: localhost8080:/bulk_adding/party/local.buckinghamshire.2025-05-01/PP52/
      • Looking at the source html, there are 792 input attributes on the page. I thought the max was 1000, so I'm not sure what's going here.
  2. The way I've saving person identifiers is not captured by the person versioning system. I tried to adapt the method to record the person version but haven't figured out a way to make it work.

There are also a few things I haven't implemented yet. However, I think there's already enough here that I could use a review even before I add these:

  • Adding a logged action for when a valid form is submitted
  • I still need to check how need to check how elections that use party lists appear. Not for the upcoming elections, but for future use.

Some things that I'd like to improve in the future:

  • The error messages could appear closer to their respective fields
  • Elections with lots of ballots have very long forms. It would be nice to write some javascript to make the per ballot forms collapsible so they don't appear as large initially.

Useful elections for local testing:

local.swansea.2025-03-27 - one ballot, one seat
local.glasgow-city.2025-03-20 - two ballots, one seat each (needs to be unlocked)
local.city-of-london.2025-03-20 - many ballots with more than one seat


@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch 2 times, most recently from b7e79e0 to 26be521 Compare March 17, 2025 14:53
@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch 2 times, most recently from 058d08e to c8fc8ea Compare March 25, 2025 18:16
@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch 3 times, most recently from 6ba7224 to 7ca84df Compare March 26, 2025 15:07
@@ -115,20 +115,154 @@ def test_submit_name_for_area(self):

form = response.forms[1]
# Now submit the valid form
with self.assertNumQueries(FuzzyInt(49, 54)):
with self.assertNumQueries(FuzzyInt(49, 72)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know what this assertion is for, I just made the numbers go up until the tests wouldn't fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because in Django it's quite easy to write an N+1 problem. The assertion is making sure we're not calling the database loads of times when we show a page.

I've not checked fully here, but I suspect we're going to see more DB activity when writing more data to the database.

It would be good to raise the other side of the fuzzy int though. That class is a range of min/max to deal with e.g tests running in a different order. I'd expect them to be within about 5 queries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bit of tweaking in 61fb1aa

Comment on lines 266 to 283
def save_person_identifiers(self, pids_dict, person):
# TODO: Saving PIDs like this doesn't record a person version
for pid_type, pid in pids_dict.items():
PersonIdentifier.objects.update_or_create(
person=person,
value_type=pid_type,
defaults={"value": pid},
)
Copy link
Contributor Author

@awdem awdem Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried:

    def update_person_identifiers(self, pids_dict, person, request, source):
        for pid_type, pid in pids_dict.items():
            PersonIdentifier.objects.update_or_create(
                person=person,
                value_type=pid_type,
                defaults={"value": pid},
            )
        change_metadata = get_change_metadata(request, source)
        person.record_version(change_metadata)
        person.save()

but 'record_version' wasn't finding the newly created/updated person_identifiers. I couldn't quite figure out why; I think it has something to do with the fact that get_person_as_version_data prefetches the person_identifiers? When I commented out the following line:

prefetch_related_objects([person], "tmp_person_identifiers")

Then it successfully created a record. However, it meant if you made a new person. there would be 3 initial version records (one from add_person, one from save_person_identifiers, and one from update_person) so I'm not convinced that this is the best way to do it anyway.

I thought about just passing the pids_dict to update person instead of adding the save_person_identifiers method but the update person is already doing a bunch of stuff that I'm wary of adding more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a read more, but I think this needs to be done right at the end of the process. We'll need to save a new person first, add the PIs then get the version data and save again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have implemented what you suggested in Slack in 9ef60a1 with one tweak. calling refresh_from_db() after set_person_fields() was undoing the latter because those those fields aren't saved until the person object is saved in update_person(). So I just moved set_person_fields() to be after the refresh_from_db().

Comment on lines +219 to +221
self.fields["previous_party_affiliations"] = (
PreviousPartyAffiliationsField(membership=self.instance)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snuck in from a ruff format that I ran.

Comment on lines +112 to +131
URLValidator()(pid)

# Validate pid:
try:
if pid_type == "instagram_url":
clean_instagram_url(pid)
elif pid_type == "linkedin_url":
clean_linkedin_url(pid)
elif pid_type == "mastodon_url":
clean_mastodon_username(pid)
elif pid_type == "twitter_url":
clean_twitter_username(pid)
elif pid_type == "wikidata_id":
clean_wikidata_id(pid)
elif pid_type == "email":
clean_email(pid)
except ValueError as e:
raise ValidationError(e)
Copy link
Contributor Author

@awdem awdem Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to port all the existing validation to this field, but I may have missed a few so it would be good to know if you see any obvious ones. Facebook, for example?

@awdem awdem requested a review from symroe March 26, 2025 16:52
@awdem awdem marked this pull request as ready for review March 26, 2025 16:52
@awdem
Copy link
Contributor Author

awdem commented Mar 27, 2025

Update to the The number of GET/POST parameters exceeded settings.DATA_UPLOAD_MAX_NUMBER_FIELDS error:

I tried setting DATA_UPLOAD_MAX_NUMBER_FIELDS to None in my local settings and I was able to submit the form for local.buckinghamshire.2025-05-01.

I checked the length of the request object and it was 1168.

From looking at the POST request in the debug toolbar:

fields_per_ballot = 10 * n + 4 where n is the number of seats. The 4 extra fields are from the formset management form.

I can't think of a clever way to cut this down, short of removing some fields. I could only have two person_identifier fields instead of 3? Or I could remove the some/all of the demographic fields.

Or we could set DATA_UPLOAD_MAX_NUMBER_FIELDS to a number that is higher than the upcoming election with the most amount of fields?

fields_per_election = 10 * n + 4 * b
n = total seats contested
b = total # ballots

However, neither of these choices are futureproof since their may be future elections with an amount of ballots/seats greater than whatever 'DATA_UPLOAD_MAX_NUMBER_FIELDS' is set to (unless it's set to None).

@chris48s
Copy link
Member

Drive-by comment..

Just another quick thought on request size: If we're making forms that allow you to fill in 50 candidates worth of data or whatever, we should check submitting those with long data in every single field. I am pretty sure that as well as the number of form fields in a single post request, there will also be some arbitrary limits on things like the size (in bytes) of a single POST request and the execution time we can run for creating all those DB objects (and related objects).

Being able to create in bulk is a big time-saver if it works, but imagine how frustrating it would be to fill in data for ~50 ballots and then get a 500 at the end of it.

@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch from 3c355fd to 90d3d86 Compare March 27, 2025 16:54
@awdem
Copy link
Contributor Author

awdem commented Mar 27, 2025

Drive-by comment..

Just another quick thought on request size: If we're making forms that allow you to fill in 50 candidates worth of data or whatever, we should check submitting those with long data in every single field. I am pretty sure that as well as the number of form fields in a single post request, there will also be some arbitrary limits on things like the size (in bytes) of a single POST request and the execution time we can run for creating all those DB objects (and related objects).

Being able to create in bulk is a big time-saver if it works, but imagine how frustrating it would be to fill in data for ~50 ballots and then get a 500 at the end of it.

Yeah that's a fear of mine ha. I've tried to create a stress test in 90d3d86. It creates an election with 100 ballots each with a post then it fills in the all the fields in the form and submits it. Currently it's failing, but I did get it to pass when I added DATA_UPLOAD_MAX_NUMBER_FIELDS = None to settings/testing.py. It was very slow, though.

I'm also unsure if using WebTest like I am is the best way to do this type of test?

@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch 3 times, most recently from 7f189c0 to 1d290e0 Compare March 31, 2025 15:27
awdem added 4 commits March 31, 2025 16:40
because we now iterate over all the form fields wehn we render them, the party list field was showing up twice
This is so we don't error on bulk add by party forms with lots of ballots
@awdem awdem force-pushed the feat/more-fields-for-bulk-add-by-party-form branch from 1d290e0 to 7eac450 Compare March 31, 2025 15:42
@awdem awdem merged commit 8caca57 into master Mar 31, 2025
6 checks passed
@awdem awdem deleted the feat/more-fields-for-bulk-add-by-party-form branch March 31, 2025 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants