Based on discussions in #113, #3, and #128, I would like to propose the following addition to `stdlib_experimental_stats`:
`var` - variance of array elements
Description
Returns the variance of all the elements of `array`, or of the elements of `array` along dimension `dim` if provided, and if the corresponding element in `mask` is `true`.
The variance is defined as the best unbiased estimator and is computed as:
var(array) = 1/(n-1) sum_i (array(i) - mean(array))^2
where n is the number of elements.
Syntax
result = var(array [, mask])
result = var(array, dim [, mask])
Arguments
`array`: Shall be an array of type `integer` or `real`.
`dim`: Shall be a scalar of type `integer` with a value in the range from 1 to n, where n is the rank of `array`.
`mask` (optional): Shall be of type `logical` and either be a scalar or an array of the same shape as `array`.
Return value
If `array` is of type `real`, the result is of the same type as `array`.
If `array` is of type `integer`, the result is of type `double precision`.
If `dim` is absent, a scalar with the variance of all elements in `array` is returned. Otherwise, an array of rank n-1, where n equals the rank of `array`, and a shape similar to that of `array` with dimension `dim` dropped is returned.
If `mask` is specified, the result is the variance of all elements of `array` corresponding to `true` elements of `mask`. If every element of `mask` is `false`, the result is IEEE `NaN`.
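To make the `mask` semantics above concrete, here is a minimal sketch for a rank-1 `real` array. The module name `var_mask_sketch` and the function name `var_mask_1d` are hypothetical and not part of the proposal. The sketch also returns NaN when only one element is selected, since the formula would give 0/0 there; that choice is an assumption, as the text only specifies the all-false case.

```fortran
module var_mask_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: var_mask_1d   ! hypothetical name, for illustration only

contains

  ! Two-pass sample variance of the elements of array selected by mask.
  function var_mask_1d(array, mask) result(res)
    real, intent(in) :: array(:)
    logical, intent(in) :: mask(:)   ! assumed to have the same shape as array
    real :: res
    real :: mean
    integer :: n

    n = count(mask)
    if (n < 2) then
      ! n == 0 is the all-false case described in the text.
      ! n == 1 would evaluate 0/0; returning NaN here is an assumption.
      res = ieee_value(res, ieee_quiet_nan)
      return
    end if

    mean = sum(array, mask) / real(n)                 ! first pass: masked mean
    res = sum((array - mean)**2, mask) / real(n - 1)  ! second pass: squared deviations
  end function var_mask_1d

end module var_mask_sketch
```

A call such as `var_mask_1d(x, x > 4.)` then uses only the elements of `x` larger than 4.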
Example
program demo_var
    use stdlib_experimental_stats, only: var
    implicit none
    real :: x(1:6) = [ 1., 2., 3., 4., 5., 6. ]
    print *, var(x)                            !returns 3.5
    print *, var( reshape(x, [ 2, 3 ] ))       !returns 3.5
    print *, var( reshape(x, [ 2, 3 ] ), 1)    !returns [0.5, 0.5, 0.5]
    print *, var( reshape(x, [ 2, 3 ] ), 1,&
                  reshape(x, [ 2, 3 ] ) > 3.)  !returns [NaN, NaN, 0.5]
end program demo_var
To be discussed (not exhaustive):
- Based on the discussions in Style guide #3, I suggest first implementing a two-pass algorithm. Other algorithms can be implemented later, as proposed in Trade-off between efficiency and robustness/accuracy #134. Allowing `dim` and `mask` in the API means the function will not be as simple as the one in the #3 comment.
- The centering of an array along a dimension (e.g., `x(:, i) - mean(x, 2)`) will most likely require a loop. To keep the implementation of the function `var` clean, I propose adding a function `center` that performs the different centerings of an array `x`; `var` would call it for the centering. However, I am concerned about the efficiency of this proposal, especially memory usage, since the function `center` could need an additional temporary array.
- The proposed name for the variance function is `var`. But what about `variance` (or other suggestions)?
Others:
Octave var
R var
Julia var
Numpy var
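As one possible illustration of the two-pass approach and the centering concern raised in the discussion points above, the sketch below computes the variance of a rank-2 `real` array along `dim` by looping over slices, so no full-size centered copy of the array is stored. The names `var_dim_sketch` and `var_2d_dim` are hypothetical, `dim` is assumed to be 1 or 2 and is not checked, and `mask` is left out for brevity. This is not the proposed implementation, only a sketch under these assumptions.

```fortran
module var_dim_sketch
  implicit none
  private
  public :: var_2d_dim   ! hypothetical name, for illustration only

contains

  ! Two-pass sample variance of a rank-2 array along dimension dim.
  function var_2d_dim(array, dim) result(res)
    real, intent(in) :: array(:, :)
    integer, intent(in) :: dim              ! assumed to be 1 or 2 (not checked)
    real :: res(size(array, 3 - dim))       ! shape of array with dimension dim dropped
    real :: mean(size(array, 3 - dim))
    integer :: i, n

    n = size(array, dim)
    mean = sum(array, dim) / real(n)        ! first pass: mean along dim

    ! Second pass: accumulate squared deviations one slice at a time,
    ! instead of building a centered copy of the whole array.
    res = 0.
    do i = 1, n
      if (dim == 1) then
        res = res + (array(i, :) - mean)**2
      else
        res = res + (array(:, i) - mean)**2
      end if
    end do

    res = res / real(n - 1)
  end function var_2d_dim

end module var_dim_sketch
```

The loop keeps extra storage at the size of the result rather than the size of `array`, at the cost of a branch on `dim`; whether a separate `center` function would be cleaner is the open question in the list above.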
Requesting feedback from (at least) @certik @milancurcic @ivan-pi @aradi @leonfoks