Add converters benchmark and add Bitarray column test for votable #142

stvoutsin · 2025-07-29T17:54:30Z

Description

A couple of PRs in astropy are looking into improving the parsing performance of VOTables.

To help with benchmarking those changes, this PR does the following:

Adds a separate converters.py module which has a test for benchmarking performance of the bool_to_bitarray and bitarray_to_bool methods.
Adds a test to the votable.py test module for benchmarking BitArray columns
Fixes a couple of PEP8 issues.
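
For readers unfamiliar with the benchmark layout, here is a minimal sketch of what an asv-style benchmark for a bool/bit-array round trip might look like. It is not the code added in this PR: `pack_bools` and `unpack_bools` are hypothetical stand-ins built on NumPy so the example runs without astropy, and `SIZE` and the class name are assumptions chosen for illustration.

```python
import numpy as np

SIZE = 10_000  # hypothetical input size, not taken from the PR


def pack_bools(values):
    """Stand-in for a bool-to-bit-array converter: pack booleans into bytes."""
    return np.packbits(values).tobytes()


def unpack_bools(data, length):
    """Stand-in for a bit-array-to-bool converter: recover `length` booleans."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    return bits[:length].astype(bool)


class TimeBitArrayConverters:
    """asv-style benchmark: setup() builds the inputs, time_* methods are timed."""

    def setup(self):
        rng = np.random.default_rng(0)  # fixed seed so every run sees the same data
        self.bools = rng.integers(0, 2, SIZE).astype(bool)
        self.packed = pack_bools(self.bools)

    def time_pack(self):
        pack_bools(self.bools)

    def time_unpack(self):
        unpack_bools(self.packed, SIZE)
```

Building the inputs in `setup` keeps data generation out of the timed region, so only the conversion itself is measured.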

pllim · 2025-07-31T17:13:45Z

Thanks!

Note: Follow-up of #141

astrofrog

This seems OK, but see the comment below on the filename. I'm also not sure it should even be a separate file.

astrofrog · 2025-07-31T18:34:27Z

benchmarks/converters.py

@@ -0,0 +1,36 @@
+import numpy as np


This file should probably be called votable_converters if we want to have it as a separate file?

OK, I've renamed it to votable_converters. I had it as part of votable initially, but I felt that the votable parsing is a different test from the more targeted method testing done here. Would it perhaps make sense to either move them to a sub-directory:

/votable
    /converters.py
    /parsing.py

Or shall I leave it as is and perhaps also rename votable to votable_parsing.py?

@astrofrog does this look good now? Please advise. Thanks!

eerovaher

I don't know enough about astropy.io.votable to review the big picture, but I can review implementation details.

eerovaher · 2025-07-31T21:07:28Z

benchmarks/votable.py

@@ -20,21 +17,33 @@
 id_data = np.arange(LARGE_SIZE, dtype=np.int64)
 flag_data = np.random.choice([True, False], LARGE_SIZE)
 quality_data = np.random.randint(0, 256, LARGE_SIZE, dtype=np.uint8)
+bool_data = np.random.randint(0, 2, LARGE_SIZE).astype(bool)


Why is this code using legacy random generation? I am only highlighting it here, but it is a problem throughout.

Shall I change this for all instances of this in votable.py for this PR, or shall I do so in a separate PR?

Let's not bikeshed this PR. The benchmark code is old and lacks maintenance, so we cannot expect it all to look shiny. For this PR, if you want to modernize just the stuff you touch, that is fine. But I wouldn't start cleaning the entire code base. Thanks for your patience!
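
For reference, a minimal sketch of how the two legacy calls in the hunk above could be written with NumPy's `Generator` API. The seed and the value of `LARGE_SIZE` are assumptions for illustration; the benchmark module defines its own size.

```python
import numpy as np

LARGE_SIZE = 1_000  # illustrative only; the benchmark defines its own value

# One generator shared by all data-creation calls; the seed is arbitrary.
rng = np.random.default_rng(42)

# Legacy: np.random.choice([True, False], LARGE_SIZE)
flag_data = rng.choice([True, False], LARGE_SIZE)

# Legacy: np.random.randint(0, 256, LARGE_SIZE, dtype=np.uint8)
quality_data = rng.integers(0, 256, LARGE_SIZE, dtype=np.uint8)
```

Seeding the generator also makes the benchmark inputs repeatable between runs, which the module-level legacy functions only offer through global state.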

eerovaher · 2025-07-31T21:08:49Z

benchmarks/votable.py

+        first_table = votable.get_first_table()
+        for field in first_table.fields:


Suggested change

-        first_table = votable.get_first_table()
-        for field in first_table.fields:
+        for field in votable.get_first_table().fields:

eerovaher · 2025-07-31T21:10:25Z

benchmarks/votable.py


 short_names = np.array([f"OBJ_{i:08d}" for i in range(LARGE_SIZE)])
-filter_names = np.random.choice(['u', 'g', 'r', 'i', 'z', 'Y'], LARGE_SIZE)
+filter_names = np.random.choice(["u", "g", "r", "i", "z", "Y"], LARGE_SIZE)


Why is the pull request cluttered with unrelated formatting changes? I am only highlighting it here, but it is a problem throughout.

I've reverted those changes. I can open a separate PR to address the PEP8 issues.

It would be simplest to enforce Ruff formatting everywhere, but that deserves a separate pull request. In this pull request it would be best to apply Ruff formatting only to the lines that are being edited anyways.

eerovaher · 2025-07-31T21:12:53Z

benchmarks/votable.py

+            [
+                ra_data[:LARGE_SIZE],
+                dec_data[:LARGE_SIZE],
+                mag_data[:LARGE_SIZE],
+                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
+                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
+                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
+                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
+            ],
+            names=[
+                "ra",
+                "dec",
+                "mag",
+                "detected",
+                "saturated",
+                "edge_pixel",
+                "cosmic_ray",
+            ],


Suggested change

-            [
-                ra_data[:LARGE_SIZE],
-                dec_data[:LARGE_SIZE],
-                mag_data[:LARGE_SIZE],
-                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
-                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
-                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
-                np.random.randint(0, 2, LARGE_SIZE).astype(bool),
-            ],
-            names=[
-                "ra",
-                "dec",
-                "mag",
-                "detected",
-                "saturated",
-                "edge_pixel",
-                "cosmic_ray",
-            ],
+            {
+                "ra": ra_data[:LARGE_SIZE],
+                "dec": dec_data[:LARGE_SIZE],
+                "mag": mag_data[:LARGE_SIZE],
+                "detected": rng.randint(0, 2, LARGE_SIZE).astype(bool),
+                "saturated": rng.randint(0, 2, LARGE_SIZE).astype(bool),
+                "edge_pixel": rng.randint(0, 2, LARGE_SIZE).astype(bool),
+                "cosmic_ray": rng.randint(0, 2, LARGE_SIZE).astype(bool),
+            }

where rng is a numpy random number generator.

Should that be rng.integers if we use random.default_rng() ?

I didn't check what the function name is.
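
On the naming question: the object returned by `np.random.default_rng()` is a `numpy.random.Generator`, which provides `integers` and has no `randint` method. A minimal sketch of the boolean flag columns from the suggestion under that API, where `LARGE_SIZE` and the `flag_names` helper list are assumptions for illustration:

```python
import numpy as np

LARGE_SIZE = 1_000  # illustrative only; the benchmark defines its own value
rng = np.random.default_rng()

# Generator exposes integers(); randint() exists only on the legacy API.
has_randint = hasattr(rng, "randint")

# Column names taken from the suggestion; built as a name -> array mapping.
flag_names = ["detected", "saturated", "edge_pixel", "cosmic_ray"]
columns = {
    name: rng.integers(0, 2, LARGE_SIZE).astype(bool) for name in flag_names
}
```

As in the legacy `randint`, the upper bound of `integers` is exclusive by default, so `integers(0, 2, ...)` yields only 0 and 1.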

eerovaher · 2025-08-04T13:19:47Z

benchmarks/votable.py

+        )
+
+        self.binary_bitarray_8_data = create_votable_bytes(
+            table, "binary", "8")


There's some new code here that hasn't been formatted with Ruff. New code should be formatted with Ruff so that when Ruff is adopted in this repository then the patch to update the formatting would be smaller.

This isn't the core library. We never even discussed PEP 8 here, much less Ruff. So I wouldn't suddenly enforce that now as a rule that would block merge here.

pllim · 2025-08-04T15:27:40Z

Style stuff aside, I see a comment from astrofrog above awaiting his reply. Anything else not related to style blocking merge here?

stvoutsin · 2025-08-12T16:33:04Z

> Style stuff aside, I see a comment from astrofrog above awaiting his reply. Anything else not related to style blocking merge here?

Any updates on this? Do we think the PR looks good as it is or is there anything else I should change?

pllim · 2025-08-12T17:23:26Z

Since @astrofrog requested changes, would be nice if he can re-review and approve. Thanks, all!

pllim requested a review from mhvk July 31, 2025 17:12

pllim added the new-benchmark label Jul 31, 2025

pllim requested a review from eerovaher July 31, 2025 17:13

astrofrog requested changes Jul 31, 2025

Add converters benchmark and add Bitarray column test for votable

f76268c

stvoutsin force-pushed the votable-benchmarks-bitarray branch from 85781c5 to f76268c Compare July 31, 2025 20:50

eerovaher reviewed Jul 31, 2025

stvoutsin force-pushed the votable-benchmarks-bitarray branch 2 times, most recently from 92bcf89 to 1b89d4c Compare July 31, 2025 22:09

stvoutsin marked this pull request as draft July 31, 2025 22:13

stvoutsin force-pushed the votable-benchmarks-bitarray branch from 1b89d4c to ad449b0 Compare July 31, 2025 22:20

stvoutsin marked this pull request as ready for review July 31, 2025 22:31

PR Review changes - Remove formatting changes, change to use np rng

3247d53

stvoutsin force-pushed the votable-benchmarks-bitarray branch from ad449b0 to 3247d53 Compare July 31, 2025 22:35

eerovaher reviewed Aug 4, 2025
