Commit

update of plots
HDembinski committed May 9, 2023
1 parent fc04d9c commit f90e5ff
Showing 25 changed files with 18,766 additions and 16,547 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml

@@ -29,7 +29,7 @@ repos:
       - id: sort-simple-yaml
       - id: file-contents-sorter
       - id: trailing-whitespace
-        exclude: ^doc/_static/.*.svg
+        exclude: .*\.svg
 
   # Python linter (Flake8)
   - repo: https://github.com/PyCQA/flake8
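The widened `exclude` pattern above is worth a second look: the removed regex was anchored to `doc/_static/`, while the SVG files referenced elsewhere in this commit live under `docs/_static/`, so the old pattern never matched them. A quick sketch with Python's `re` module (to my understanding, pre-commit applies these patterns with `re.search` semantics; the sample path is taken from the README links):

```python
import re

old = re.compile(r"^doc/_static/.*.svg")  # removed pattern
new = re.compile(r".*\.svg")              # added pattern

path = "docs/_static/norm.pdf.svg"  # an SVG path from the README links

# The old pattern is anchored to "doc/_static/" (singular), so it never
# matches files under "docs/_static/"; the new one excludes SVGs anywhere.
print(old.search(path) is None)       # True
print(new.search(path) is not None)   # True
```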
10 changes: 6 additions & 4 deletions README.md

@@ -67,6 +67,10 @@ Note that this is only faster if `x` has sufficient length (about 1000 elements
 
 The following benchmarks were produced on an Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz against SciPy-1.10.1. The dotted line on the right-hand figure shows the expected speedup (4x) from parallelization on a CPU with four physical cores.
 
+We see large speed-ups with respect to `scipy` for almost all distributions. Calls with short arrays also profit from `numba_stats`, thanks to the reduced call overhead. The functions `voigt.pdf` and `t.ppf` do not run faster than the `scipy` versions, because we call the respective `scipy` implementations written in FORTRAN. The advantage provided by `numba_stats` here is that you can call these functions from other `numba`-JIT'ed functions, which is not possible with the `scipy` implementations; `voigt.pdf` still profits from auto-parallelization.
+
+`bernstein.density` does not profit from auto-parallelization; on the contrary, it becomes much slower, so this should be avoided. This is a known issue: the internal implementation cannot easily be auto-parallelized.
+
 ![](docs/_static/norm.pdf.svg)
 ![](docs/_static/norm.cdf.svg)
 ![](docs/_static/norm.ppf.svg)
@@ -87,10 +91,8 @@
 ![](docs/_static/truncexpon.ppf.svg)
 ![](docs/_static/voigt.pdf.svg)
-
-The functions `voigt.pdf`, `t.cdf`, and `t.ppf` do not run faster than the `scipy` versions, because we call the respective `scipy` implementation written in FORTRAN. The advantage provided by `numba_stats` here is that you can call these functions from other `numba`-JIT'ed functions, which is not possible with the `scipy` implementations.
-
-The `bernstein.density` does not profit from auto-parallelization, on the contrary it becomes much slower. This is under investigation.
-![](docs/_static/bernstein.density.svg)
+![](docs/_static/bernstein.density.svg)
+![](docs/_static/truncexpon.pdf.plus.norm.pdf.svg)
 
 ## Documentation
 
238 changes: 11 additions & 227 deletions bench/plot.ipynb

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions bench/test_stats.py
@@ -159,3 +159,38 @@ def method(x, beta, xmin, xmax):
     # warm-up JIT
     method(x, beta, xmin, xmax)
     benchmark(method, x, beta, xmin, xmax)
+
+
+@pytest.mark.parametrize("n", N)
+@pytest.mark.parametrize("lib", ("scipy", "ours", "ours:parallel,fastmath"))
+def test_speed_truncexpon_pdf_plus_norm_pdf(benchmark, lib, n):
+    x = np.linspace(0, 1, n)
+    rng = np.random.default_rng(1)
+    rng.shuffle(x)
+
+    xmin = np.min(x)
+    xmax = np.max(x)
+
+    if lib == "scipy":
+        from scipy.stats import norm, truncexpon
+
+        def method(x, z, mu, sigma, slope):
+            p1 = truncexpon.pdf(x, xmax, xmin, slope)
+            p2 = norm.pdf(x, mu, sigma)
+            return (1 - z) * p1 + z * p2
+
+    else:
+        from numba_stats import norm, truncexpon
+
+        def method(x, z, mu, sigma, slope):
+            p1 = truncexpon.pdf(x, xmin, xmax, 0.0, slope)
+            p2 = norm.pdf(x, mu, sigma)
+            return (1 - z) * p1 + z * p2
+
+    if lib == "ours:parallel,fastmath":
+        method = nb.njit(parallel=True, fastmath=True)(method)
+
+    # warm-up JIT
+    args = 0.5, 0.5, 0.1, 1.0
+    method(x, *args)
+    benchmark(method, x, *args)
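The benchmark above mixes the two densities as `(1 - z) * p1 + z * p2`; since both components are normalized on the interval, the mixture is itself a normalized density for any `z` in [0, 1]. A numpy-only sanity check of this construction, using two simple stand-in densities on [0, 1] (chosen for this illustration, not taken from the benchmark):

```python
import numpy as np

x = np.linspace(0, 1, 10001)
p1 = np.ones_like(x)  # uniform density on [0, 1], integrates to 1
p2 = 2 * x            # triangular density on [0, 1], integrates to 1
z = 0.3
mix = (1 - z) * p1 + z * p2

# Trapezoidal integration; both stand-ins are (piecewise) linear,
# so the result is exact up to floating-point rounding.
integral = np.sum(0.5 * (mix[1:] + mix[:-1]) * np.diff(x))
print(integral)  # ≈ 1.0
```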