29 Jan 09:26

tschuelia

58136a8

v2.0.1 Latest

Minor bug fixes:

- Fix an issue that prevented bootstrapping from being terminated properly once convergence was detected
- Fix an issue with dataclass export
- Use re.split instead of str.split for more robust behavior
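The notes do not say which strings Pandora splits. As a hedged illustration, assuming whitespace-separated input such as a line from a sample file (the example lines below are made up), this shows where `str.split(" ")` falls short and what `re.split` handles:

```python
import re

line = "sample1  M\tpopA"  # hypothetical line: two spaces, then a tab

# str.split(" ") splits on every single space: runs of spaces yield empty
# strings, and tabs are not treated as separators at all.
naive = line.split(" ")                   # ['sample1', '', 'M\tpopA']

# re.split on a whitespace run handles any mix of spaces and tabs.
robust = re.split(r"\s+", line.strip())  # ['sample1', 'M', 'popA']

# Unlike str.split(), re.split also accepts richer separator patterns,
# e.g. commas and whitespace combined.
mixed = re.split(r"[\s,]+", "popA, popB\tpopC")  # ['popA', 'popB', 'popC']
```

Note that `str.split()` without an argument also collapses whitespace runs; `re.split` becomes necessary once the separator is a pattern rather than plain whitespace.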

19 Apr 11:33

tschuelia

2.0.0

ad4b6f5

v2.0.0

Major changes

MDS implementation

One of our beta testers discovered some issues with the MDS implementation in Pandora. Until now, we used the scikit-learn MDS implementation, whose solver (SMACOF) is best suited for non-metric MDS and mostly for small matrices. However, scikit-learn also uses this solver for metric MDS, regardless of the size of the data. This leads to unexpected results, and sometimes the solver does not find a solution at all, resulting in circular MDS embeddings. These are known issues in scikit-learn (scikit-learn/scikit-learn#18933, scikit-learn/scikit-learn#16846, scikit-learn/scikit-learn#11381, scikit-learn/scikit-learn#15272), and a PR implementing an alternative standard SVD solver for MDS has remained unmerged for about 1.5 years (scikit-learn/scikit-learn#22330).
To prevent these issues in Pandora, we switched to the PCoA implementation in scikit-allel (which implements a standard SVD solver for metric MDS). This resulted in the following additional changes:

- The MDS and PCA classes in embedding.py have been unified into a single class called Embedding.
- MDS results no longer have a stress attribute. Like PCA results, they now report the explained variance per dimension (scikit-allel's PCoA provides explained variance ratios rather than a stress value, which is more informative anyway).
- All MDS plots now show the explained variance per dimension, like PCA plots, instead of the stress.
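To illustrate what a PCoA (classical metric MDS) computes and where the explained variance ratios come from, here is a minimal NumPy sketch. It is not scikit-allel's or Pandora's code: the function name `pcoa_sketch`, the tolerance for discarding non-positive eigenvalues, and normalizing by the sum of positive eigenvalues are our own assumptions. Because the double-centred matrix is symmetric, an eigendecomposition plays the role of the SVD here.

```python
import numpy as np


def pcoa_sketch(dist, n_components=2):
    """Classical (metric) MDS / PCoA of a square, symmetric distance matrix.

    Returns (coords, explained_variance_ratio). Dimensions with non-positive
    eigenvalues are dropped, so fewer than n_components columns may be returned.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - 11^T / n
    j = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only clearly positive eigenvalues (the rest are numerical noise
    # or stem from non-Euclidean distances)
    tol = 1e-10 * np.abs(eigvals).max()
    positive = eigvals > tol
    k = min(n_components, int(positive.sum()))
    coords = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    explained_ratio = eigvals[:k] / eigvals[positive].sum()
    return coords, explained_ratio
```

Unlike an iterative stress-minimizing solver, this is a closed-form computation: no random initialization, and no risk of the optimizer failing to converge.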

CLI flag + variable naming

We renamed bootstrap_convergence_confidence_level to bootstrap_convergence_tolerance to match the terminology used in our paper.

Minor changes

We improved the implementation of missing_corrected_hamming_distance, resulting in a 100x speedup.
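The notes do not describe the new implementation. One common way to get speedups of this magnitude for pairwise distances is to replace Python loops over sample pairs with matrix products. The sketch below assumes a hypothetical definition of the missing-corrected distance: count genotype mismatches over sites where both samples are non-missing, then rescale by (total sites / shared sites). The function name and the missing-value sentinel `3` are also assumptions, not Pandora's API.

```python
import numpy as np


def missing_corrected_hamming_matrix(genotypes, missing_value=3):
    """Pairwise Hamming distances between rows, corrected for missing data.

    Hypothetical definition: mismatches over shared non-missing sites,
    rescaled to the full number of sites. Pairs without shared sites are NaN.
    """
    g = np.asarray(genotypes)
    n_snps = g.shape[1]
    present = (g != missing_value).astype(np.int64)
    # shared[i, j]: number of sites where both samples i and j are non-missing
    shared = present @ present.T
    # matches[i, j]: number of shared sites where i and j carry the same genotype
    matches = np.zeros_like(shared)
    for value in np.unique(g[g != missing_value]):
        is_value = (g == value).astype(np.int64)
        matches += is_value @ is_value.T
    mismatches = shared - matches
    # Rescale to the full number of sites; leave NaN where nothing is shared
    dist = np.full(shared.shape, np.nan)
    np.divide(mismatches * n_snps, shared, out=dist, where=shared > 0)
    return dist
```

Each pairwise count becomes one matrix product over all samples at once, which NumPy executes in optimized native code instead of a Python loop per pair.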

Bug fixes

We fixed a bug that caused the FST distance matrix of an EigenDataset to be recomputed regardless of the redo flag.
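The intended behavior of a redo flag can be sketched as a simple cache: reuse a stored result unless it is missing or the user explicitly asks for recomputation. This is a hedged illustration only. The function name `load_or_compute_matrix` and storing the matrix as a `.npy` file are assumptions; Pandora obtains the FST matrix via smartpca and may cache it differently.

```python
from pathlib import Path
from typing import Callable

import numpy as np


def load_or_compute_matrix(
    path: Path, compute: Callable[[], np.ndarray], redo: bool = False
) -> np.ndarray:
    """Return the cached matrix at `path` unless `redo` is set or no cache exists."""
    if path.exists() and not redo:
        return np.load(path)
    matrix = compute()
    # Write through a file handle so np.save does not append a ".npy" suffix
    with path.open("wb") as fh:
        np.save(fh, matrix)
    return matrix
```

The bug described above corresponds to skipping the `path.exists() and not redo` check, so the expensive `compute()` call ran every time.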

Contributions

Thanks to Lucas for testing Pandora and reporting all of these issues 🙂

12 Dec 13:26

tschuelia

1.0.8

33dc63e

v1.0.8

Bug Fixes:

- Fix an issue when checking for string occurrences in a pandas Series
- Set the random state of the scikit-learn MDS computation to make results reproducible
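The notes do not say what exactly went wrong. A common pitfall when checking for strings in a pandas Series is that the `in` operator tests the Series *index*, not its values. The sketch below (with made-up population labels) shows the pitfall and two reliable alternatives:

```python
import pandas as pd

populations = pd.Series(["Han", "Yoruba", "Ignore"])  # made-up labels

# `in` on a Series checks the index labels (here 0, 1, 2), not the values.
in_index = "Han" in populations          # False
zero_in_index = 0 in populations         # True

# To test whether a value occurs, check the values explicitly.
via_isin = bool(populations.isin(["Han"]).any())  # True
via_values = "Han" in populations.values          # True
```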

Improvements:

- New documentation: a Jupyter notebook demonstrating a more thorough inspection of Pandora results
- Set the default smartpca path to 'smartpca' in EigenDataset.run_pca
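A default of plain `'smartpca'` means the executable is looked up on the PATH, as with an EIGENSOFT installation from bioconda. Below is a minimal sketch of resolving such a default with the standard library; the helper name `resolve_executable` is hypothetical and not part of Pandora.

```python
import shutil
from pathlib import Path


def resolve_executable(name: str = "smartpca") -> Path:
    """Resolve an executable name (e.g. the default 'smartpca') via PATH."""
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(
            f"{name!r} not found on PATH; pass an explicit path instead."
        )
    return Path(found)
```

`shutil.which` also accepts an absolute path and returns it if the file is executable, so explicit user-provided paths go through the same check.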

New Feature:

HTML export of all plots when plot_results: true is set in the Pandora config file. The exported HTML files can be opened in any browser and allow interactive exploration of the plots, leveraging the full power of Plotly 🙂

23 Nov 21:20

tschuelia

1.0.7

2bbd5d0

v1.0.7

Fix a bug in the MDS computation for NumpyDatasets with missing data.

22 Nov 16:47

tschuelia

1.0.6

2da791e

v1.0.6

Fix a bug in the parallel bootstrap code: in some cases not all processes were terminated properly.
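A minimal sketch of the pattern behind this fix and the v2.0.1 convergence fix: submit bootstrap replicates to an executor, stop collecting once a convergence check passes, and cancel whatever has not started yet. The names `bootstrap_until_converged`, `has_converged` and `demo_replicate` are hypothetical. Pandora runs replicates in separate processes. The sketch accepts any `concurrent.futures.Executor`, so it works with `ProcessPoolExecutor` as well as with the `ThreadPoolExecutor` used in the demo.

```python
import time
from concurrent.futures import FIRST_COMPLETED, Executor, ThreadPoolExecutor, wait
from typing import Callable, List


def bootstrap_until_converged(
    executor: Executor,
    run_replicate: Callable[[int], float],
    has_converged: Callable[[List[float]], bool],
    n_bootstraps: int,
) -> List[float]:
    """Run replicates in parallel and stop once has_converged(results) is True."""
    pending = {executor.submit(run_replicate, seed) for seed in range(n_bootstraps)}
    results: List[float] = []
    try:
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
            if has_converged(results):
                break
    finally:
        # Cancel replicates that have not started; already running ones finish
        # and are awaited when the executor shuts down.
        for future in pending:
            future.cancel()
    return results


def demo_replicate(seed: int) -> float:
    """Stand-in for one bootstrap replicate (hypothetical)."""
    time.sleep(0.01)
    return float(seed)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as executor:
        results = bootstrap_until_converged(
            executor, demo_replicate, lambda r: len(r) >= 5, n_bootstraps=100
        )
    print(len(results))
```

On Python 3.9+, `executor.shutdown(cancel_futures=True)` is an alternative to cancelling the pending futures one by one.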

20 Nov 16:10

tschuelia

1.0.5

d895a14

v1.0.5

Changes:

We changed the default dtype of the NumpyDataset input matrix from float64 to uint8. Genotype data usually comprises only four values: 0, 1, 2, and missing (np.nan). Storing such data as numpy's standard float64 causes a huge memory overhead during bootstrapping, since each float64 entry takes 8 bytes compared to 1 byte for uint8. We therefore made uint8 the default, but users can change the dtype if their input data requires another type, e.g. because it contains values that do not fit into uint8.
Note that with missing data and a non-float dtype, missing values cannot be represented as np.nan. For details, see the documentation.
To make the Pandora library easier to use, we added default values for several bootstrap and PCA methods:
- bootstrap_and_embed_multiple
  - embedding = EmbeddingAlgorithm.PCA
  - n_components = 10
  - n_bootstraps = 100
  - smartpca = "smartpca"
  - result_dir = same directory as the input data
- bootstrap_and_embed_multiple_numpy
  - embedding = EmbeddingAlgorithm.PCA
  - n_components = 10
  - n_bootstraps = 100
- EigenDataset.bootstrap and NumpyDataset.bootstrap: seed = None
- EigenDataset.run_pca : smartpca = "smartpca"
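The memory argument behind the switch to uint8, and the consequence for missing data, can be sketched as follows. The dataset size is made up, and the sentinel `255` is a hypothetical choice for illustration, not Pandora's convention (see the documentation for how Pandora represents missing values).

```python
import numpy as np


def matrix_size_mib(n_samples: int, n_snps: int, dtype) -> float:
    """Memory needed for a dense genotype matrix, computed without allocating it."""
    return n_samples * n_snps * np.dtype(dtype).itemsize / 2**20


# Hypothetical dataset: 1,000 samples x 100,000 SNPs
float64_mib = matrix_size_mib(1_000, 100_000, np.float64)  # ~763 MiB
uint8_mib = matrix_size_mib(1_000, 100_000, np.uint8)      # ~95 MiB

# np.nan cannot be stored in an integer array, so missing genotypes need a
# sentinel value (255 is a hypothetical choice here).
MISSING = 255
genotypes = np.array([[0, 1, 2], [2, MISSING, 0]], dtype=np.uint8)
# Where NaN semantics are needed, convert to a float64 copy
as_float = np.where(genotypes == MISSING, np.nan, genotypes)
```

Every bootstrap replicate that resamples the matrix pays this factor of 8 again, which is why the default dtype matters most during bootstrapping.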

Bug Fixes:

- When computing the FST matrix for the MDS analysis of an EigenDataset, smartpca ignores all samples whose population is Ignore. Previously, this caused the MDS computation to fail because Pandora was not aware of this behavior. Pandora now adopts smartpca's Ignore logic and removes samples with population Ignore.
- Dashes in population names caused an issue when computing the FST distance matrix for an EigenDataset with smartpca; this is now fixed.
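The Ignore filtering can be sketched as follows, assuming the samples come from an EIGENSTRAT-style .ind file with whitespace-separated sample ID, sex and population columns. The function name `read_ind_without_ignore` is hypothetical, and this is not Pandora's implementation.

```python
from typing import Iterable, List, Tuple


def read_ind_without_ignore(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse .ind lines (sample ID, sex, population) and drop samples whose
    population label is 'Ignore', as smartpca does."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id, sex, population = fields[:3]
        if population == "Ignore":
            continue
        samples.append((sample_id, sex, population))
    return samples
```

Filtering before computing the distance matrix keeps the sample list consistent with the samples smartpca actually uses.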

07 Nov 13:00

tschuelia

1.0.4

0ba8215

v1.0.4

- A race condition in Python's multiprocessing library sometimes caused unexpected AttributeErrors when creating the bootstrap processes. The bug is fixed in Python 3.12; we added the fix as a backport for older Python versions.
- We replaced np.empty with np.zeros in the Hamming distance computation (np.empty does not initialize the allocated memory).
- We replaced np.sum calls on generators with calls on explicit lists (calling np.sum on a generator is deprecated).
- Documentation updates
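Both NumPy changes can be illustrated with a small accumulation example. The data is made up and this is not Pandora's Hamming distance code:

```python
import numpy as np

rows = [np.array([1, 0, 2]), np.array([0, 0, 1]), np.array([2, 1, 0])]

# Accumulate into np.zeros: np.empty would start from whatever happened to be
# in memory, so the running total could contain garbage.
total = np.zeros(3, dtype=np.int64)
for row in rows:
    total += row

# np.sum on a generator is deprecated; pass an explicit list instead.
n_nonzero = np.sum([np.count_nonzero(row) for row in rows])
```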

03 Nov 21:00

tschuelia

1.0.3

6df846b

v1.0.3

Removed test_config.py and the manual setting of the smartpca and convertf paths.

The updated bioconda eigensoft recipe now also supports installation on osx-arm64, so setting the paths explicitly should no longer be necessary (bioconda/bioconda-recipes#44082). This also allows me to run the tests in the conda-forge recipe.

03 Nov 20:41

tschuelia

1.0.2

2d1ea13

v1.0.2

Forgot to bump the version number in the previous release...

(I don't always deploy on Fridays, but when I do, it takes me three attempts... well, hopefully only three.)

03 Nov 20:33

tschuelia

1.0.1

ed46c84

v1.0.1

Add License

Releases: tschuelia/Pandora
