-
-
Notifications
You must be signed in to change notification settings - Fork 30
Add converters benchmark and add Bitarray column test for votable #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks! Note: Follow-up of #141 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems ok but comment below on filename - I'm also not sure it should even be a separate file
benchmarks/converters.py
Outdated
@@ -0,0 +1,36 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should probably be called votable_converters if we want to have it as a separate file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I've renamed it to votable_converters
. I had it as part of votable
initially, but I felt like the votable parsing is a different test from the more targeted method testing done here. Would it perhaps make to either move them to a sub-directory:
/votable
/converters.py
/parsing.py
Or shall I leave it as is and perhaps also rename vot
able to votable_parsing.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astrofrog does this look good now? Please advise. Thanks!
85781c5
to
f76268c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about astropy.io.votable
to review the big picture, but I can review implementation details.
benchmarks/votable.py
Outdated
@@ -20,21 +17,33 @@ | |||
id_data = np.arange(LARGE_SIZE, dtype=np.int64) | |||
flag_data = np.random.choice([True, False], LARGE_SIZE) | |||
quality_data = np.random.randint(0, 256, LARGE_SIZE, dtype=np.uint8) | |||
bool_data = np.random.randint(0, 2, LARGE_SIZE).astype(bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code using legacy random generation? I am only highlighting it here, but it is a problem throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I change this for all instances of this in votable.py
for this PR, or shall I do so in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not bikeshed this PR. Benchmark code is old and lack of maintenance. We cannot expect it to look all shiny. For this PR, if you want to modernize just the stuff you touch, that is fine. But I wouldn't start cleaning the entire code base. Thanks for your patience!
benchmarks/votable.py
Outdated
first_table = votable.get_first_table() | ||
for field in first_table.fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first_table = votable.get_first_table() | |
for field in first_table.fields: | |
for field in votable.get_first_table().fields: |
benchmarks/votable.py
Outdated
|
||
short_names = np.array([f"OBJ_{i:08d}" for i in range(LARGE_SIZE)]) | ||
filter_names = np.random.choice(['u', 'g', 'r', 'i', 'z', 'Y'], LARGE_SIZE) | ||
filter_names = np.random.choice(["u", "g", "r", "i", "z", "Y"], LARGE_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the pull request cluttered with unrelated formatting changes? I am only highlighting it here, but it is a problem throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted those changes, I can put in a separate PR to address the PEP8 issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be simplest to enforce Ruff formatting everywhere, but that deserves a separate pull request. In this pull request it would be best to apply Ruff formatting only to the lines that are being edited anyways.
benchmarks/votable.py
Outdated
[ | ||
ra_data[:LARGE_SIZE], | ||
dec_data[:LARGE_SIZE], | ||
mag_data[:LARGE_SIZE], | ||
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | ||
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | ||
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | ||
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | ||
], | ||
names=[ | ||
"ra", | ||
"dec", | ||
"mag", | ||
"detected", | ||
"saturated", | ||
"edge_pixel", | ||
"cosmic_ray", | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ | |
ra_data[:LARGE_SIZE], | |
dec_data[:LARGE_SIZE], | |
mag_data[:LARGE_SIZE], | |
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | |
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | |
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | |
np.random.randint(0, 2, LARGE_SIZE).astype(bool), | |
], | |
names=[ | |
"ra", | |
"dec", | |
"mag", | |
"detected", | |
"saturated", | |
"edge_pixel", | |
"cosmic_ray", | |
], | |
{ | |
"ra": ra_data[:LARGE_SIZE], | |
"dec": dec_data[:LARGE_SIZE], | |
"mag": mag_data[:LARGE_SIZE], | |
"detected": rng.randint(0, 2, LARGE_SIZE).astype(bool), | |
"saturated": rng.randint(0, 2, LARGE_SIZE).astype(bool), | |
"edge_pixel": rng.randint(0, 2, LARGE_SIZE).astype(bool), | |
"cosmic_ray": rng.randint(0, 2, LARGE_SIZE).astype(bool), | |
} |
where rng
is a numpy
random number generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be rng.integers
if we use random.default_rng()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check what the function name is.
92bcf89
to
1b89d4c
Compare
1b89d4c
to
ad449b0
Compare
ad449b0
to
3247d53
Compare
) | ||
|
||
self.binary_bitarray_8_data = create_votable_bytes( | ||
table, "binary", "8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some new code here that hasn't been formatted with Ruff. New code should be formatted with Ruff so that when Ruff is adopted in this repository then the patch to update the formatting would be smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't core library. We never even discussed PEP 8 here, much less ruff. So I wouldn't suddenly enforce that now as a rule that would block merge here.
Style stuff aside, I see a comment from astrofrog above awaiting his reply. Anything else not related to style blocking merge here? |
Any updates on this? Do we think the PR looks good as it is or is there anything else I should change? |
Since @astrofrog requested changes, would be nice if he can re-review and approve. Thanks, all! |
Description
In the context of a couple of PRs in astropy which are looking into improving the parsing performance of VOTables
io.votable
Boolean array conversions astropy#18455This PR does the following in attempt to help with benchmarking the changes:
bool_to_bitarray
andbitarray_to_bool
methods.