This library provides a class for storing values with errors and printing them nicely. It also facilitates working with confidence/credible intervals (CI).
from ValueWithError import ImplValueWithError, ValueWithErrorVec
a = ImplValueWithError(1.0, 0.1)
print(a)
# Prints: 1.00 ± 0.10
print(a.get_CI95()) # 95% confidence interval calculated assuming normal distribution.
# Prints: CI_95%: (0.80, 1.20)
# N is the number of samples used to calculate the value. It is used by the CI calculation.
a = ImplValueWithError(1.0, 0.1, N=5)
print(a.get_CI95())
# Prints: CI_95%: (0.72, 1.28) - a bigger interval because of the smaller N
import numpy as np
vec = np.random.normal(123456, 10, 100)
b = ValueWithErrorVec(vec)
print(b)
# Prints: 123456 ± 11
print(b.get_CI95())
# Prints: CI_95%: (123436, 123476)
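
The two intervals printed for a = ImplValueWithError(1.0, 0.1) above can be reproduced by hand. The sketch below is an illustration only, not the library's code. It uses just the standard library, the helper names normal_ci and scaled_ci are hypothetical, and it assumes that N=5 corresponds to a Student t interval with N - 1 = 4 degrees of freedom, which is consistent with the printed (0.72, 1.28).

```python
from statistics import NormalDist


def normal_ci(value, se, level=0.95):
    """Two-sided interval value ± z * se under a normal assumption."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return value - z * se, value + z * se


def scaled_ci(value, se, critical):
    """Same interval shape, with an explicitly supplied critical value."""
    return value - critical * se, value + critical * se


# Tabulated two-sided 95% Student t critical value for 4 degrees of freedom.
# Assumption: N=5 samples map to N - 1 = 4 degrees of freedom.
T_975_DF4 = 2.776

lo, hi = normal_ci(1.0, 0.1)
print(f"normal:  ({lo:.2f}, {hi:.2f})")      # normal:  (0.80, 1.20)

lo_t, hi_t = scaled_ci(1.0, 0.1, T_975_DF4)
print(f"student: ({lo_t:.2f}, {hi_t:.2f})")  # student: (0.72, 1.28)
```

The Student t critical value is larger than the normal one for small samples, which is why the interval widens when N is small.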
ValueWithErrorVec is a class that calculates the value with error of a vector of values and stores that vector for potential later use.
Confidence intervals for ValueWithErrorVec are calculated using percentiles of the stored vector, not the normal distribution.
Its standard error is defined as the standard deviation of the vector.
To get the standard error of the mean, use the .estimateMean() method, which returns a ValueWithError that represents the estimate of the mean. This estimator is assumed to follow a normal/Student distribution.
Similarly, .estimateSE() returns a ValueWithError that represents the estimate of the standard error, which is, for simplicity, also assumed to follow a normal/Student distribution.
import numpy as np
from ValueWithError import ValueWithErrorVec
vec = np.random.normal(123456, 10, 100)
b = ValueWithErrorVec(vec)
print(b.estimateMean())
# Prints: 123456.0 ± 1.1
print(b.estimateSE())
# Prints: 11.39 ± 0.81
print(b.estimateMean().get_CI95())
# Prints: CI_95%: (123453.8, 123458.2)
print(b.estimateMean().get_CI(0.995))
# Prints: CI_99.5%: (123452.8, 123459.2)
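
The difference between the spread of the vector and the uncertainty of the estimators can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the percentile interval uses statistics.quantiles with linear interpolation, and the error of the standard deviation uses the normal-theory approximation sd / sqrt(2 (N - 1)), which is consistent with the 11.39 ± 0.81 printed above but is not confirmed as the library's formula.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same sample
sample = [random.gauss(123456, 10) for _ in range(100)]
n = len(sample)

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)  # spread of the individual values

# Percentile interval: 2.5th and 97.5th percentiles of the stored values.
# quantiles(n=40) returns 39 cut points spaced 2.5% apart.
cuts = statistics.quantiles(sample, n=40, method="inclusive")
ci_low, ci_high = cuts[0], cuts[-1]

# Uncertainty of the estimators, which shrinks as the sample grows.
se_mean = sd / math.sqrt(n)          # standard error of the mean
se_sd = sd / math.sqrt(2 * (n - 1))  # normal-theory approximation (assumption)

print(f"values:    {mean:.0f} ± {sd:.0f}, percentile CI ({ci_low:.0f}, {ci_high:.0f})")
print(f"mean:      {mean:.1f} ± {se_mean:.1f}")
print(f"std. dev.: {sd:.2f} ± {se_sd:.2f}")
```

With 100 samples the standard error of the mean is ten times smaller than the standard deviation of the values, so the interval for the mean is much narrower than the percentile interval of the vector.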
If a small memory footprint is needed, there is an online version of the calculation that never stores the vector. At the moment this method is not faster than the vector version, but it is more memory efficient.
import numpy as np
from timeit import timeit
from ValueWithError import ValueWithErrorVec, make_ValueWithError_from_generator
def random_generator(mean, std, size):
    # Yield one normally distributed sample at a time, without building a list.
    for i in range(size):
        yield np.random.normal(mean, std)
method1 = lambda : ValueWithErrorVec([v for v in random_generator(123456, 10, 10000)], estimate_mean=True)
method2 = lambda : make_ValueWithError_from_generator(random_generator(123456, 10, 10000), estimate_mean=True)
print(timeit(method1, number=10))
# Prints: 0.17308157600928098
print(timeit(method2, number=10))
# Prints: 0.16921406201436184
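
One common way to compute a mean and standard deviation in a single pass, without storing the values, is Welford's algorithm. The sketch below shows that technique. It is an assumption that something similar sits behind make_ValueWithError_from_generator, and the function name online_mean_and_sd is hypothetical.

```python
import math


def online_mean_and_sd(values):
    """Return (mean, sample standard deviation, count) in one pass."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already updated mean
    if n < 2:
        # The sample standard deviation is undefined for fewer than two values.
        return (mean if n else float("nan")), float("nan"), n
    return mean, math.sqrt(m2 / (n - 1)), n


# Any iterable works, including a generator that is consumed only once.
mean, sd, n = online_mean_and_sd(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(mean, sd, n)
```

Because only three numbers are kept between iterations, memory use does not grow with the length of the stream.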
The ValueWithError class is designed to handle edge cases gracefully. For example, if the error is zero, the value is printed without the error:
from ValueWithError import ImplValueWithError
a = ImplValueWithError(1.0, 0.0)
print(a)
# Prints: 1.00
print(a.get_CI95())
# Prints: CI_95%: (1.0, 1.0)
It also handles NaN and Inf values:
from ValueWithError import ImplValueWithError
import numpy as np
a = ImplValueWithError(np.nan, 0.1)
print(a)
# Prints: NaN
print(a.get_CI95())
# Prints: CI_95%: (NaN, NaN)
b = ImplValueWithError(np.inf, 0.1)
print(b)
# Prints: ∞
print(b.get_CI95())
# Prints: CI_95%: (∞, ∞)
If one does not want to print the error, it can be suppressed:
from ValueWithError import ImplValueWithError
a = ImplValueWithError(1.0, None)
print(a)
# Prints: 1.0
print(a.get_CI95() is None)
# Prints: True
By default, all values are rounded to two significant digits. This default cannot be changed for the ValueWithError class, but it can be customized by calling the value_with_error_repr function (or CI_repr for intervals) directly:
from ValueWithError import value_with_error_repr, CI_repr
print(value_with_error_repr(1.0, 0.1, significant_digit_se=3))
# Prints: 1.000 ± 0.100
print(CI_repr(0.123456789, 0.1987654321, significant_digit=3))
# Prints: (0.1235, 0.1988)
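
The idea of rounding to the significant digits of the error can be sketched as follows. This is a simplified illustration, not the code behind value_with_error_repr: the function format_value_with_error is hypothetical, and it does not handle NaN, infinite values, or errors of 100 and above (where rounding to tens would be needed).

```python
import math


def format_value_with_error(value, error, significant_digits=2):
    """Format value ± error, rounding both to the error's leading digits."""
    if error <= 0 or not math.isfinite(error):
        return f"{value}"
    # Decimal position of the error's first significant digit:
    # 1 for 10..99, 0 for 1..9, -1 for 0.1..0.9, and so on.
    exponent = math.floor(math.log10(error))
    # Simplification: never round to the left of the decimal point.
    decimals = max(significant_digits - 1 - exponent, 0)
    return f"{value:.{decimals}f} ± {error:.{decimals}f}"


print(format_value_with_error(1.0, 0.1))                        # 1.00 ± 0.10
print(format_value_with_error(1.0, 0.1, significant_digits=3))  # 1.000 ± 0.100
print(format_value_with_error(123456.3, 11.39))                 # 123456 ± 11
```

The value is printed with the same number of decimals as the error, so the two numbers always line up.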
Internally, ValueWithError stores the value, the error and, optionally, N, the number of samples used to calculate this statistic. ValueWithErrorCI additionally stores a single confidence interval. It is useful when one wants to store not only a value ± error, but also a single CI, e.g. when the distribution is not really normal.
If one does not care about the memory footprint, the ValueWithErrorVec should be used instead, as it stores the whole vector and can calculate any statistic on demand, including percentile CI.
from ValueWithError import ValueWithErrorCI
a = ValueWithErrorCI(1.0, 0.1, 0.9, 1.1)
print(a)
# Prints: "1.00 ± 0.10 CI_95%: (0.90, 1.10)"
b = ValueWithErrorCI(1.0, 0.1, 0.9, 1.1, ci_level=0.99)
print(b)
# Prints: "1.00 ± 0.10 CI_99%: (0.90, 1.10)"