Strip whitespace from beginning and end of all records #31
Comments
@dlebauer et al.: If you want normalized white space to be a database invariant for any particular columns, I strongly recommend implementing this at the database level with a trigger function. Then, no matter from what pathway the value of a column is inserted or updated, one can be assured that the white space adheres to some normal form. For an example of how this might be done, you can look at the PL/pgSQL trigger function A major motivation for this was to give the uniqueness constraint |
Hi @dlebauer, |
@shivanshu1086 that would be great - step 1:
@gsrohde - based on your comment above, my understanding is that if you try to insert a cultivars.csv with white space added, the white space should automatically be stripped when it is inserted (is that correct?). I am pretty sure that this didn't happen, which is why I wrote up this issue. Please let me know if the following sounds reasonable:
@shivanshu1086 you could start by creating a reproducible error: see if you can use curl to insert this file cultivars_whitespace_test.csv into the database. If the strip whitespace function is working, it should fail because it violates a uniqueness constraint. If it works, you can query the cultivars table to ensure that the whitespace was stripped (e.g. the record should have the trailing whitespace removed and be converted from
If that works, you can also check that the uniqueness constraint works by trying to upload this file cultivars_uniqueness_test.csv and expect that it will fail with an error that says it violates the uniqueness constraint.
All of that just confirms the current behavior. The next step would be to follow the pattern here: to implement normalize_name_whitespace() on additional fields. If you can submit a pull request that applies this pattern to the name field of the treatments table as a first step, then during the pull request review we can discuss internally which fields and tables we want to apply this function to.
Also, I've removed the "good first issue" tag since this was originally conceived of as work in the Python / Flask API rather than as a database trigger function. |
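To illustrate why the whitespace test file is expected to trip the uniqueness constraint once names are normalized, here is a minimal Python sketch. It assumes rows are keyed on (name, specie_id), as the unique_name_per_species constraint elsewhere in this thread suggests, and that normalization means trimming plus collapsing internal runs of whitespace. The helper name `find_collisions` is hypothetical and not part of the project.

```python
from collections import defaultdict


def find_collisions(rows):
    """Group (name, specie_id) rows by their whitespace-normalized key.

    Returns only the keys that more than one input row maps to, i.e. the
    rows that would collide under a uniqueness constraint once names are
    normalized. Hypothetical helper, for illustration only.
    """
    groups = defaultdict(list)
    for name, specie_id in rows:
        # Assumed normalization: trim the ends, collapse internal runs.
        key = (" ".join(name.split()), specie_id)
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    rows = [("ABC DEF ", 1), ("ABC DEF", 1), ("ABC DEF", 2)]
    print(find_collisions(rows))
```

The same name under a different species does not collide, which matches a constraint defined on both columns.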
@dlebauer I tested this out on the bety7 machine with
update cultivars set name = ' Caddo lots of spaces more spaces ' where id = 7000000503;
(I probably don't have access to whatever machine you use.) The resulting row looked like this
7000000503 | 2588 | Caddo lots of spaces more spaces | | | 2020-03-02 20:03:57.120819 | 2020-03-02 20:19:39.908616 |
You might want to try doing
Indexes:
...
"unique_name_per_species" UNIQUE CONSTRAINT, btree (name, specie_id)
...
Check constraints:
"normalized_names" CHECK (is_whitespace_normalized(name::text))
...
Triggers:
normalize_cultivar_names BEFORE INSERT OR UPDATE ON cultivars FOR EACH ROW EXECUTE PROCEDURE normalize_name_whitespace()
... |
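For readers who want to reason about the check constraint and trigger outside the database, the following is a rough Python analogue. The real normalize_name_whitespace() and is_whitespace_normalized() are PL/pgSQL functions whose source is not shown in this thread; this sketch only assumes that "normalized" means no leading or trailing whitespace and single spaces between words, which is consistent with the row shown above. The Python function names simply mirror the database ones and are hypothetical.

```python
import re

# One or more whitespace characters in a row.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(value: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip()


def is_whitespace_normalized(value: str) -> bool:
    """True when normalizing the value would leave it unchanged."""
    return value == normalize_whitespace(value)


if __name__ == "__main__":
    raw = "  Caddo   lots of    spaces  "
    print(repr(normalize_whitespace(raw)))
    print(is_whitespace_normalized(raw))
```

A BEFORE INSERT OR UPDATE trigger that applies the first function guarantees that the second one, used as a check constraint, never rejects a row.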
ok @dlebauer , working on that! |
Hey @dlebauer @gsrohde ,
|
I am done with that test @dlebauer, please have a look at what I got. It works: cultivars_whitespace_test.csv was successfully inserted into the database. Then I tried to check the uniqueness constraint by adding the rows from cultivars_uniqueness_test.csv to cultivars_whitespace_test.csv. This is what I got. |
@gsrohde the database I was using has those triggers installed. Very mysterious. One could try debugging it by putting output statements in the function called by the trigger function, making sure it gets called upon a qualifying event and making sure it does what it is supposed to do. |
remove leading and trailing whitespace,
e.g. this:
ABC DEF ,Lactuca sativa,,
should become
ABC DEF,Lactuca sativa,,
for all fields
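The requested behavior could be sketched on the Python side roughly as follows, using only the standard library csv module. This is an illustration under assumptions, not the project's implementation: the function name `strip_csv_fields` is hypothetical, and it treats every field in every row the same way, header included.

```python
import csv
import io


def strip_csv_fields(text: str) -> str:
    """Return CSV text with leading/trailing whitespace removed from every field."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # str.strip() with no arguments removes whitespace at both ends only;
        # whitespace inside a field is left untouched.
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


if __name__ == "__main__":
    print(strip_csv_fields("ABC DEF ,Lactuca sativa,,\n"), end="")
```

Doing this only in the upload path leaves other insert pathways uncovered, which is the argument made in the comments for enforcing it with a database trigger instead.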