Skip to content

Add converters benchmark and add Bitarray column test for votable #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stvoutsin
Copy link
Contributor

@stvoutsin stvoutsin commented Jul 29, 2025

Description

In the context of a couple of PRs in astropy which are looking into improving the parsing performance of VOTables

This PR does the following in attempt to help with benchmarking the changes:

  • Adds a separate converters.py module which has a test for benchmarking performance of the bool_to_bitarray and bitarray_to_bool methods.
  • Adds a test to the votable.py test module for benchmarking BitArray columns
  • Fixes a couple of PEP8 issues.

@pllim pllim requested a review from mhvk July 31, 2025 17:12
@pllim
Copy link
Member

pllim commented Jul 31, 2025

Thanks!

Note: Follow-up of #141

@pllim pllim requested a review from eerovaher July 31, 2025 17:13
Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems ok but comment below on filename - I'm also not sure it should even be a separate file

@@ -0,0 +1,36 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should probably be called votable_converters if we want to have it as a separate file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've renamed it to votable_converters. I had it as part of votable initially, but I felt like the votable parsing is a different test from the more targeted method testing done here. Would it perhaps make to either move them to a sub-directory:

/votable 
    /converters.py
    /parsing.py

Or shall I leave it as is and perhaps also rename votable to votable_parsing.py ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrofrog does this look good now? Please advise. Thanks!

@stvoutsin stvoutsin force-pushed the votable-benchmarks-bitarray branch from 85781c5 to f76268c Compare July 31, 2025 20:50
Copy link
Member

@eerovaher eerovaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about astropy.io.votable to review the big picture, but I can review implementation details.

@@ -20,21 +17,33 @@
id_data = np.arange(LARGE_SIZE, dtype=np.int64)
flag_data = np.random.choice([True, False], LARGE_SIZE)
quality_data = np.random.randint(0, 256, LARGE_SIZE, dtype=np.uint8)
bool_data = np.random.randint(0, 2, LARGE_SIZE).astype(bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code using legacy random generation? I am only highlighting it here, but it is a problem throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I change this for all instances of this in votable.py for this PR, or shall I do so in a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not bikeshed this PR. Benchmark code is old and lack of maintenance. We cannot expect it to look all shiny. For this PR, if you want to modernize just the stuff you touch, that is fine. But I wouldn't start cleaning the entire code base. Thanks for your patience!

Comment on lines 42 to 43
first_table = votable.get_first_table()
for field in first_table.fields:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
first_table = votable.get_first_table()
for field in first_table.fields:
for field in votable.get_first_table().fields:


short_names = np.array([f"OBJ_{i:08d}" for i in range(LARGE_SIZE)])
filter_names = np.random.choice(['u', 'g', 'r', 'i', 'z', 'Y'], LARGE_SIZE)
filter_names = np.random.choice(["u", "g", "r", "i", "z", "Y"], LARGE_SIZE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the pull request cluttered with unrelated formatting changes? I am only highlighting it here, but it is a problem throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted those changes, I can put in a separate PR to address the PEP8 issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simplest to enforce Ruff formatting everywhere, but that deserves a separate pull request. In this pull request it would be best to apply Ruff formatting only to the lines that are being edited anyways.

Comment on lines 210 to 227
[
ra_data[:LARGE_SIZE],
dec_data[:LARGE_SIZE],
mag_data[:LARGE_SIZE],
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
],
names=[
"ra",
"dec",
"mag",
"detected",
"saturated",
"edge_pixel",
"cosmic_ray",
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[
ra_data[:LARGE_SIZE],
dec_data[:LARGE_SIZE],
mag_data[:LARGE_SIZE],
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
np.random.randint(0, 2, LARGE_SIZE).astype(bool),
],
names=[
"ra",
"dec",
"mag",
"detected",
"saturated",
"edge_pixel",
"cosmic_ray",
],
{
"ra": ra_data[:LARGE_SIZE],
"dec": dec_data[:LARGE_SIZE],
"mag": mag_data[:LARGE_SIZE],
"detected": rng.randint(0, 2, LARGE_SIZE).astype(bool),
"saturated": rng.randint(0, 2, LARGE_SIZE).astype(bool),
"edge_pixel": rng.randint(0, 2, LARGE_SIZE).astype(bool),
"cosmic_ray": rng.randint(0, 2, LARGE_SIZE).astype(bool),
}

where rng is a numpy random number generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be rng.integers if we use random.default_rng() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check what the function name is.

@stvoutsin stvoutsin force-pushed the votable-benchmarks-bitarray branch 2 times, most recently from 92bcf89 to 1b89d4c Compare July 31, 2025 22:09
@stvoutsin stvoutsin marked this pull request as draft July 31, 2025 22:13
@stvoutsin stvoutsin force-pushed the votable-benchmarks-bitarray branch from 1b89d4c to ad449b0 Compare July 31, 2025 22:20
@stvoutsin stvoutsin marked this pull request as ready for review July 31, 2025 22:31
@stvoutsin stvoutsin force-pushed the votable-benchmarks-bitarray branch from ad449b0 to 3247d53 Compare July 31, 2025 22:35
)

self.binary_bitarray_8_data = create_votable_bytes(
table, "binary", "8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some new code here that hasn't been formatted with Ruff. New code should be formatted with Ruff so that when Ruff is adopted in this repository then the patch to update the formatting would be smaller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't core library. We never even discussed PEP 8 here, much less ruff. So I wouldn't suddenly enforce that now as a rule that would block merge here.

@pllim
Copy link
Member

pllim commented Aug 4, 2025

Style stuff aside, I see a comment from astrofrog above awaiting his reply. Anything else not related to style blocking merge here?

@stvoutsin
Copy link
Contributor Author

Style stuff aside, I see a comment from astrofrog above awaiting his reply. Anything else not related to style blocking merge here?

Any updates on this? Do we think the PR looks good as it is or is there anything else I should change?

@pllim
Copy link
Member

pllim commented Aug 12, 2025

Since @astrofrog requested changes, would be nice if he can re-review and approve. Thanks, all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants