Proposal for variance and centering functions #137

Closed
@jvdp1

Based on discussions in #113, #3, #128, I would like to propose the following addition to stdlib_experimental_stats:

var - variance of array elements

Description

Returns the variance of all the elements of array, or of the elements of array along dimension dim if provided. If mask is provided, only the elements for which the corresponding element of mask is true are taken into account.

The variance is defined as the best unbiased estimator and is computed as:

 var(x) = 1/(n-1) sum_i (array(i) - mean(array))^2

where n is the number of elements of array taken into account.

Syntax

result = var(array [, mask])

result = var(array, dim [, mask])

Arguments

array: Shall be an array of type integer or real.

dim: Shall be a scalar of type integer with a value in the range from 1 to n, where n is the rank of array.

mask (optional): Shall be of type logical and either be a scalar or an array of the same shape as array.

Return value

If array is of type real, the result is of the same type as array.
If array is of type integer, the result is of type double precision.

If dim is absent, a scalar with the variance of all elements in array is returned. Otherwise, an array of rank n-1 is returned, where n is the rank of array; its shape is that of array with dimension dim dropped.

If mask is specified, the result is the variance of all elements of array corresponding to true elements of mask. If every element of mask is false, the result is IEEE NaN.
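
The mask semantics described above can be sketched for the simplest case, a rank-1 real array without dim. This is only an illustration, not the proposed stdlib implementation: the module name var_mask_sketch and the function var_all are hypothetical, and returning NaN when a single element is selected (where 1/(n-1) is undefined) is an assumption that goes beyond the all-false case stated above.

```fortran
module var_mask_sketch
    use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
    implicit none
    private
    public :: var_all

contains

    ! Hypothetical helper, not the stdlib implementation: two-pass variance
    ! of a rank-1 real array, restricted to the elements selected by mask.
    function var_all(x, mask) result(res)
        real, intent(in) :: x(:)
        logical, intent(in), optional :: mask(:)
        real :: res
        logical :: m(size(x))
        real :: mean
        integer :: n

        ! Without a mask, every element is selected.
        m = .true.
        if (present(mask)) m = mask
        n = count(m)

        ! Assumption: fewer than two selected elements gives NaN, which
        ! covers the all-false mask described in the proposal.
        if (n < 2) then
            res = ieee_value(1.0, ieee_quiet_nan)
            return
        end if

        mean = sum(x, mask=m) / n                    ! first pass: mean
        res = sum((x - mean)**2, mask=m) / (n - 1)   ! second pass: squared deviations
    end function var_all

end module var_mask_sketch
```

Under these assumptions, var_all applied to [1., 2., 3., 4., 5., 6.] gives 3.5, the same array with the mask x > 3. gives 1.0, and a mask that selects nothing gives NaN.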

Example

program demo_var
    use stdlib_experimental_stats, only: var
    implicit none
    real :: x(1:6) = [ 1., 2., 3., 4., 5., 6. ]
    print *, var(x)                            !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ))       !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ), 1)    !returns [__TOBECOMPLETED__]
    print *, var( reshape(x, [ 2, 3 ] ), 1,&
                  reshape(x, [ 2, 3 ] ) > 3.)  !returns [__TOBECOMPLETED__]
end program demo_var

To be discussed (not exhaustive):

  • Based on discussions in Style guide #3, I suggest implementing a two-pass algorithm first. Other algorithms can be implemented later, as proposed in Trade-off between efficiency and robustness/accuracy #134. Supporting dim and mask in the API means the function will not be as simple as the one in #3 comment.

  • The centering of an array along a dimension (e.g., x(:, i) - mean(x, 2)) will most likely require a loop. To keep the implementation of var clean, I propose adding a function center that performs the different kinds of centering of an array x; var would then call it for the centering. However, I am concerned about the efficiency of this approach, especially memory usage, since the function center could need an additional temporary array.

  • The proposed name for the variance function is var. But what about variance (or other propositions)?
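
To make the centering question concrete, here is a minimal sketch of what a center function and a two-pass var built on top of it could look like for a rank-2 real array. The module name center_sketch, the function var_dim, and both interfaces are hypothetical, mask is left out for brevity, and dim is assumed to be 1 or 2 without validation.

```fortran
module center_sketch
    implicit none
    private
    public :: center, var_dim

contains

    ! Hypothetical center: subtracts from each element of x the mean
    ! computed along dimension dim (assumed to be 1 or 2, not checked).
    function center(x, dim) result(xc)
        real, intent(in) :: x(:, :)
        integer, intent(in) :: dim
        real :: xc(size(x, 1), size(x, 2))
        real :: mu(size(x, 3 - dim))   ! one mean per slice along dim
        integer :: i

        mu = sum(x, dim) / size(x, dim)
        if (dim == 1) then
            ! one mean per column
            do i = 1, size(x, 2)
                xc(:, i) = x(:, i) - mu(i)
            end do
        else
            ! one mean per row, i.e. x(:, i) - mean(x, 2)
            do i = 1, size(x, 2)
                xc(:, i) = x(:, i) - mu
            end do
        end if
    end function center

    ! Hypothetical two-pass variance along dim, built on center.
    function var_dim(x, dim) result(res)
        real, intent(in) :: x(:, :)
        integer, intent(in) :: dim
        real :: res(size(x, 3 - dim))

        ! The result of center is a temporary with the same shape as x.
        res = sum(center(x, dim)**2, dim) / (size(x, dim) - 1)
    end function var_dim

end module center_sketch
```

The call to center inside var_dim produces a temporary as large as x, which is the memory cost raised above; fusing the centering loop into var would avoid it at the price of a less modular implementation.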

Others:
Octave var
R var
Julia var
Numpy var

Requesting feedback from (at least) @certik @milancurcic @ivan-pi @aradi @leonfoks

Activity

added the labels "topic: mathematics" (linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...) and "idea" (Proposition of an idea and opening an issue to discuss it) on Feb 2, 2020
added the label "implementation" (Implementation in experimental and submission of a PR) and removed the label "idea" on Feb 18, 2020
          Proposal for variance and centering functions · Issue #137 · fortran-lang/stdlib