Faster FragmentAnnotation equality check #73

Comments
Please don't hijack issues with unrelated questions. We are busily preparing additional information for interested GSoC contributors. |
Hi @bittremieux, I have understood your points and it will not be repeated in the future. However, for the time being, can I contribute to this issue? |
Yes, this would certainly be a good issue to start contributing. |
Hi @bittremieux, I have changed the code from comparing `str` representations to comparing the relevant attributes directly:

```python
import timeit

from spectrum_utils.fragment_annotation import FragmentAnnotation

fa1 = FragmentAnnotation(ion_type="y", neutral_loss="-H2O", isotope=1, charge=2, adduct="[M+H+Na]", analyte_number=4, mz_delta=(0.1, "Da"))
fa2 = FragmentAnnotation(ion_type="y", neutral_loss="-H2O", isotope=1, charge=2, adduct="[M+H+Na]", analyte_number=4, mz_delta=(0.1, "Da"))

def old_eq(fa1, fa2):
    return str(fa1) == str(fa2)

def new_eq(fa1, fa2):
    if not isinstance(fa2, FragmentAnnotation):
        return False
    return (
        fa1.ion_type == fa2.ion_type
        and fa1.neutral_loss == fa2.neutral_loss
        and fa1.isotope == fa2.isotope
        and fa1.charge == fa2.charge
        and fa1.adduct == fa2.adduct
        and fa1.analyte_number == fa2.analyte_number
        and fa1.mz_delta == fa2.mz_delta
    )

old_time = timeit.timeit(lambda: old_eq(fa1, fa2), number=100000)
new_time = timeit.timeit(lambda: new_eq(fa1, fa2), number=100000)

print(f"Old time: {old_time:.4f} seconds")
print(f"New time: {new_time:.4f} seconds")
```

Here are the results. Please let me know if this is proper and I will open a PR. |
Hi @bittremieux, I have also benchmarked this in the past using the pytest-benchmark fixture, but those results are a bit different from what @mandeepnh5 is getting with the timeit module; in my results the old implementation performed better than the new one (direct equality check). |
@mandeepnh5 Yes, that looks appropriate for a PR, please go ahead. 🙂 @dikshant182004 The discrepancy is indeed weird. I'm not familiar with pytest-benchmark; is it possible that there is some overhead influencing the results? What is your specific implementation? Ultimately we want to use the implementation that gives us the best performance, which also needs to be benchmarked. |
Thanks, I will create a PR. |
Hi @bittremieux, I was not able to find my previous implementation, so I tried it once again by fixing some parts of @mandeepnh5's code and using the pytest-benchmark fixture. Here is the code:

```python
import pytest

from spectrum_utils.fragment_annotation import FragmentAnnotation

def new_equality_check(fa1, fa2):
    if not isinstance(fa2, FragmentAnnotation):
        return False
    return (fa1.ion_type == fa2.ion_type
            and fa1.neutral_loss == fa2.neutral_loss
            and fa1.isotope == fa2.isotope
            and fa1.charge == fa2.charge
            and fa1.adduct == fa2.adduct
            and fa1.analyte_number == fa2.analyte_number
            and fa1.mz_delta == fa2.mz_delta)

def create_fragments():
    frag1 = FragmentAnnotation(
        ion_type='b',
        neutral_loss=18,
        isotope=0,
        charge=1,
        adduct='',
        analyte_number=1,
        mz_delta=(0.0, "Da"),
    )
    frag2 = FragmentAnnotation(
        ion_type='b',
        neutral_loss=18,
        isotope=0,
        charge=1,
        adduct='',
        analyte_number=1,
        mz_delta=(0.0, "Da"),
    )
    return frag1, frag2

@pytest.mark.parametrize("equality_fn", [
    # Old implementation: the current str-based __eq__.
    lambda f1, f2: f1 == f2,
    # New implementation: direct attribute comparison.
    lambda f1, f2: new_equality_check(f1, f2),
])
def test_eq_runtime(benchmark, equality_fn):
    frag1, frag2 = create_fragments()

    def run_eq():
        for _ in range(1000):
            equality_fn(frag1, frag2)

    benchmark(run_eq)
```

So the benchmarking results are expected to be the same as I discussed earlier:
|
I feel that using the pytest-benchmark fixture would be more appropriate than the timeit module, because it integrates with the testing framework and provides statistical insights that can help decide whether the performance improvement is significant and consistent across multiple runs. |
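As an aside, a minimal sketch of how the fixture is usually invoked: passing the comparison directly to `benchmark()` lets pytest-benchmark choose the number of rounds and report mean, stddev, and outliers per call, instead of timing a hand-rolled 1000-iteration loop. The constructor arguments are simply reused from the first snippet above; whether they are the most representative test case is an open question.

```python
import pytest

from spectrum_utils.fragment_annotation import FragmentAnnotation

@pytest.fixture
def fragments():
    # Two identical annotations, reusing the arguments from the timeit snippet above.
    kwargs = dict(ion_type="y", neutral_loss="-H2O", isotope=1, charge=2,
                  adduct="[M+H+Na]", analyte_number=4, mz_delta=(0.1, "Da"))
    return FragmentAnnotation(**kwargs), FragmentAnnotation(**kwargs)

def test_eq_current(benchmark, fragments):
    fa1, fa2 = fragments
    # pytest-benchmark calls this repeatedly and reports per-call statistics.
    benchmark(lambda: fa1 == fa2)
```

Timing the bare comparison also makes the per-call numbers directly comparable to the timeit results reported earlier in the thread.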
Yes, I agree that pytest-benchmark seems like a more principled approach. I'm not really familiar with it though. Does it also support running timing comparisons between successive versions of the code to flag excessive slowdowns that would be newly introduced? The discrepancy in results is very weird though. Intuitively I'd expect that the new implementation should be faster because it's less complex (e.g. no regex matching, etc.) and because of the eager evaluation of the `and` conditions. |
However, I am also not that familiar with pytest-benchmark, but after doing some research I found that it does support tracking performance changes over time, using options like --benchmark-save and --benchmark-compare to persist benchmark results across runs and to detect and flag any excessive slowdowns introduced in new versions. |
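As a rough illustration of that workflow (the test file name here is hypothetical, and the exact flag behavior should be checked against the pytest-benchmark documentation), the comparison between runs could look something like this:

```bash
# Run the benchmarks and save the results as a named baseline (stored under .benchmarks/).
pytest tests/test_fragment_annotation_benchmark.py --benchmark-save=baseline

# After changing __eq__, compare the new run against the most recently saved one,
# and fail if the mean time regresses by more than 10%.
pytest tests/test_fragment_annotation_benchmark.py \
    --benchmark-compare \
    --benchmark-compare-fail=mean:10%
```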
That could be interesting to set up for all time-sensitive functions at some point, to ensure that newer versions don't introduce (excessive) slowdowns. Direct comparison should always be faster than creating a string, let alone if a regex is used. In contrast, in your test it's not just a bit slower, but slower by up to an order of magnitude. Which doesn't make sense to me at all. We'll have to evaluate that in more detail, and @Janne98 will do an independent evaluation of #80. |
Okay @bittremieux, I will try to learn more about pytest-benchmark. Additionally, I am planning to use CPython. |
What do you mean by CPython? That's the default Python backend. Or do you mean Cython? In which case I don't think it's such a good idea, because it will make installation on different systems much more challenging. If we want to compile anything, Numba is a better approach. |
Yeah, Numba can be a good option to try; I will give it a try. And Cython is maybe not a good idea, as you said it makes installation more challenging. |
Fixed in #80. |
Checking whether two `FragmentAnnotation`s are equal is slow due to the regex comparison in `__str__`. The regex can be compiled, or more direct equality checking can be implemented. It's important to benchmark the runtime of the old implementation versus any optimized implementation.
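For the first option mentioned above, compiling the regex once at module level is the standard pattern. The pattern and function below are purely illustrative and do not reproduce the actual regex used by spectrum_utils.

```python
import re

# Hypothetical pattern, compiled once at import time instead of on every call.
_MZ_DELTA_PATTERN = re.compile(r"^([+-]?\d+\.?\d*)(ppm|Da)$")

def parse_mz_delta(text: str):
    """Illustrative helper: parse an m/z delta string such as '0.1Da'."""
    match = _MZ_DELTA_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Invalid m/z delta: {text}")
    return float(match.group(1)), match.group(2)
```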