Commit 538fec6

Release 0.1.2. Minor updates / bug fixes + Adding containers.
1 parent e868877 commit 538fec6

11 files changed (+328 / -62 lines)

Diff for: CHANGELOG.md

+14

```diff
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.1.2] - 2024-04-24
+
+### Changed
+
+- Fixed `manhattan` plot implementation to support various new features.
+- Added a warning when accessing `csr_matrix` property of `LDMatrix` when it hasn't been loaded
+previously.
+
+### Added
+
+- `reset_mask` method for magenpy `LDMatrix`.
+- `Dockerfile`s for both `cli` and `jupyter` modes.
+- A helper script to convert LD matrices from old format to new format.
+
 ## [0.1.1] - 2024-04-12

 ### Changed
```

Diff for: containers/cli.Dockerfile

+43 (new file, `@@ -0,0 +1,43 @@`)

```dockerfile
# Usage:
# ** Step 1 ** Build the docker image:
# docker build -f cli.Dockerfile -t magenpy-cli .
# ** Step 2 ** Run the docker container in interactive shell mode:
# docker run -it magenpy-cli /bin/bash
# ** Step 3 ** Test magenpy_ld:
# magenpy_ld -h

FROM python:3.11-slim-buster

LABEL authors="Shadi Zabad"
LABEL version="0.1"
LABEL description="Docker image containing all requirements to run the commandline scripts in the magenpy package"

# Install system dependencies
RUN apt-get update && apt-get install -y \
    unzip \
    wget \
    pkg-config \
    g++ gcc \
    libopenblas-dev \
    libomp-dev

# Download and setup plink2:
RUN mkdir -p /software && \
    wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_avx2_20240105.zip -O /software/plink2.zip && \
    unzip /software/plink2.zip -d /software && \
    rm /software/plink2.zip

# Download and setup plink1.9:
RUN mkdir -p /software && \
    wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231211.zip -O /software/plink.zip && \
    unzip /software/plink.zip -d /software && \
    rm /software/plink.zip

# Add plink1.9 and plink2 to PATH for every process (~/.bashrc is only read by interactive bash):
ENV PATH="${PATH}:/software"

# Install magenpy package from PyPI
RUN pip install --upgrade pip magenpy

# Test the installation
RUN magenpy_ld -h
```
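Once the image is built, a quick way to confirm that the `plink` and `plink2` binaries unpacked into `/software` are actually discoverable is to query the search path from Python inside the container. This is a minimal sketch; `find_missing_tools` is a hypothetical helper, not part of magenpy, and it only assumes the binary names `plink` and `plink2` used in the Dockerfile. It relies on the standard-library `shutil.which`.

```python
import os
import shutil


def find_missing_tools(names=("plink", "plink2"), search_path=None):
    """Return the subset of `names` that cannot be found as executables.

    `search_path` defaults to the PATH of the current process, i.e. what a
    non-interactive command (such as a Jupyter kernel) would see.
    """
    path = search_path if search_path is not None else os.environ.get("PATH", "")
    # shutil.which returns None when no executable with that name is on `path`
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("plink and plink2 found on PATH")
```

Running it through `docker run magenpy-cli python check_tools.py` (the file name is an assumption) exercises the non-interactive PATH rather than the one set up for an interactive shell.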

Diff for: containers/jupyter.Dockerfile

+53 (new file, `@@ -0,0 +1,53 @@`)

```dockerfile
# Usage:
# ** Step 1 ** Build the docker image:
# docker build -f /path/to/jupyter.Dockerfile -t magenpy-jupyter .
# ** Step 2 ** Run the docker container (pass the appropriate port):
# docker run -p 8888:8888 magenpy-jupyter
# ** Step 3 ** Open the link in your browser:
# http://localhost:8888


FROM python:3.11-slim-buster

LABEL authors="Shadi Zabad"
LABEL version="0.1"
LABEL description="Docker image containing all requirements to run the magenpy package in a Jupyter Notebook"

# Install system dependencies
RUN apt-get update && apt-get install -y \
    unzip \
    wget \
    pkg-config \
    g++ gcc \
    libopenblas-dev \
    libomp-dev

# Download and setup plink2:
RUN mkdir -p /software && \
    wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_avx2_20240105.zip -O /software/plink2.zip && \
    unzip /software/plink2.zip -d /software && \
    rm /software/plink2.zip

# Download and setup plink1.9:
RUN mkdir -p /software && \
    wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231211.zip -O /software/plink.zip && \
    unzip /software/plink.zip -d /software && \
    rm /software/plink.zip

# Add plink1.9 and plink2 to PATH for every process (~/.bashrc is not read by the Jupyter server):
ENV PATH="${PATH}:/software"

# Install magenpy and JupyterLab from PyPI
RUN pip install --upgrade pip magenpy jupyterlab

# Expose the port Jupyter Lab will be served on
EXPOSE 8888

# Set the working directory
WORKDIR /magenpy_dir

# Copy the current directory (the build context) into the container at /magenpy_dir
COPY . /magenpy_dir

# Run Jupyter Lab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--NotebookApp.token=''"]
```

Diff for: docs/installation.md

+18

````diff
@@ -58,3 +58,21 @@ source magenpy_env/bin/activate
 python -m pip install --upgrade pip
 python -m pip install "magenpy>=0.1"
 ```
+
+### Using `Docker` containers
+
+If you are using `Docker` containers, you can build a container with the `magenpy` package
+and all its dependencies by downloading the relevant `Dockerfile` from the
+[repository](https://github.com/shz9/magenpy/tree/master/containers) and building it
+as follows:
+
+```bash
+# Build the docker image:
+docker build -f cli.Dockerfile -t magenpy-cli .
+# Run the container in interactive mode:
+docker run -it magenpy-cli /bin/bash
+# Test that the package installed successfully:
+magenpy_ld -h
+```
+
+We plan to publish pre-built `Docker` images on `DockerHub` in the future.
````

Diff for: examples/convert_old_ld_matrices.py

+69 (new file, `@@ -0,0 +1,69 @@`)

```python
"""
This is a utility script that converts the old-style published LD matrices (magenpy 0.0.X) to the new
format deployed since magenpy>=0.1. The old LD matrix format used ragged Zarr arrays, while the new format
uses flattened Zarr arrays that are more efficient and easier to work with. The script takes the path to the
old LD matrices and converts them to the new format with the desired precision (e.g. float32).

The user may also specify the compressor name and compression level for the new LD matrices.
The script will validate the conversion by checking the integrity of the new LD matrices.

Usage (quote the wildcard so that the shell does not expand it before the script sees it):

    python convert_old_ld_matrices.py --old-matrix-path "/path/to/old/ld_matrices/chr_*" \
                                      --new-path /path/to/new/ld_matrices/ \
                                      --dtype float32

"""

import magenpy as mgp
from magenpy.utils.system_utils import makedir
import zarr
import os.path as osp
import glob
import argparse


parser = argparse.ArgumentParser(description="""
    Convert old-style LD matrices (magenpy 0.0.X) to the new format (magenpy >=0.1).
""")

parser.add_argument('--old-matrix-path', dest='old_path', type=str, required=True,
                    help='The path to the old LD matrix. Can be a wild card of the form "path/to/chr_*"')
parser.add_argument('--new-path', dest='new_path', type=str, required=True,
                    help='The path where to store the new LD matrix.')
parser.add_argument('--dtype', dest='dtype', type=str, default='int16',
                    choices=['int8', 'int16', 'float32', 'float64'],
                    help='The desired data type for the entries of the new LD matrix.')
parser.add_argument('--compressor', dest='compressor', type=str, default='zstd',
                    help='The compressor name for the new LD matrix.')
parser.add_argument('--compression-level', dest='compression_level', type=int, default=9,
                    help='The compression level for the new LD matrix.')

args = parser.parse_args()

for f in glob.glob(args.old_path):

    # Read the chromosome label from the old (ragged) Zarr array; skip unreadable paths
    try:
        z_arr = zarr.open(f, 'r')
        chrom = z_arr.attrs['Chromosome']
    except Exception as e:
        print(f"Error: {e}")
        continue

    print(f"> Converting LD matrix for chromosome: {chrom}")

    # Store each chromosome in its own chr_{chrom} sub-directory of --new-path
    new_path_suffix = f'chr_{chrom}'
    if new_path_suffix not in args.new_path:
        new_path = osp.join(args.new_path, new_path_suffix)
    else:
        new_path = args.new_path

    makedir(new_path)

    ld_mat = mgp.LDMatrix.from_ragged_zarr_matrix(f,
                                                  new_path,
                                                  overwrite=True,
                                                  dtype=args.dtype,
                                                  compressor_name=args.compressor,
                                                  compression_level=args.compression_level)
    print("Valid conversion:", ld_mat.validate_ld_matrix())
```
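The script picks each output directory with a substring test (`new_path_suffix not in args.new_path`). A substring match can misfire: if `--new-path` already ends in `chr_10` and chromosome `1` is being converted, `'chr_1'` is found inside `'chr_10'` and the output lands in the wrong directory. One possible alternative is sketched below with a hypothetical `resolve_output_dir` helper (not part of the script or of magenpy). It compares the final path component instead of searching the whole string.

```python
import os.path as osp


def resolve_output_dir(base_path, chrom):
    """Return the output directory for one chromosome's converted LD matrix.

    If `base_path` already ends in the `chr_{chrom}` directory, it is used as-is;
    otherwise `chr_{chrom}` is appended, mirroring the script's intent.
    """
    suffix = f"chr_{chrom}"
    # normpath drops a trailing separator so basename sees the last directory name
    if osp.basename(osp.normpath(base_path)) == suffix:
        return base_path
    return osp.join(base_path, suffix)
```

For the documented usage, where `--new-path` is a base directory such as `/path/to/new/ld_matrices/`, both approaches give the same result; they differ only when the base path itself names a chromosome.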

Diff for: magenpy/LDMatrix.py

+27 -8

```diff
@@ -2,6 +2,7 @@
 import os.path as osp
 import numpy as np
 import pandas as pd
+import warnings
 from scipy.sparse import csr_matrix, identity, triu, diags
 from .utils.model_utils import quantize, dequantize

```
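The `int8` and `int16` options of the conversion script store LD (correlation) values as integers, and `LDMatrix` imports `quantize`/`dequantize` helpers for that purpose. Their exact scheme is not shown in this diff. The sketch below assumes a simple symmetric linear mapping of values in [-1, 1] onto the signed integer range, which is one common way to do this; the function names are hypothetical and are not magenpy's API.

```python
def linear_quantize(values, bits=16):
    """Map floats in [-1, 1] to signed integers with `bits` bits (assumed linear scheme)."""
    scale = 2 ** (bits - 1) - 1  # 32767 for int16, 127 for int8
    codes = []
    for v in values:
        v = max(-1.0, min(1.0, v))  # clip to the valid correlation range
        codes.append(int(round(v * scale)))
    return codes


def linear_dequantize(codes, bits=16):
    """Invert `linear_quantize`, up to a rounding error of at most 0.5 / scale."""
    scale = 2 ** (bits - 1) - 1
    return [c / scale for c in codes]
```

Under this scheme the trade-off is storage versus precision: the step between representable values is 1/127 (about 0.0079) for `int8` and 1/32767 (about 3.05e-5) for `int16`.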

```diff
@@ -712,27 +713,32 @@ def window_size(self):
     @property
     def n_neighbors(self):
         """
-        The number of variants in the LD window for each SNP.
-
         !!! seealso "See Also"
             * [window_size][magenpy.LDMatrix.LDMatrix.window_size]

         !!! note
             This includes the variant itself if the matrix is in memory and is symmetric.

+        :return: The number of variants in the LD window for each SNP.
+
         """
         return self.window_size()

     @property
     def csr_matrix(self):
         """
-        :return: The in-memory CSR matrix object.
-
         ..note ::
             If the LD matrix is not in-memory, then it'll be loaded using default settings.
+            This means that the matrix will be loaded as upper-triangular matrix with
+            default data type. To customize the loading, call the `.load(...)` method before
+            accessing the CSR matrix in this way.

+        :return: The in-memory CSR matrix object.
         """
         if self._mat is None:
+            warnings.warn("> Warning: Loading LD matrix with default settings. "
+                          "To customize, call the `.load(...)` method before invoking `.csr_matrix`.",
+                          stacklevel=2)
             self.load()
         return self._mat

```
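The new warning in `csr_matrix` follows a common lazy-loading pattern: the property loads data on first access with default settings and warns the caller, using `stacklevel=2` so that the warning points at the caller's line rather than at the inside of the property. The toy class below is a hypothetical stand-in (not magenpy's `LDMatrix`; the `"int16"` default is an assumption for illustration) that isolates just this behaviour.

```python
import warnings


class LazyMatrix:
    """Hypothetical stand-in illustrating warn-then-load-defaults on property access."""

    def __init__(self):
        self._mat = None

    def load(self, dtype="int16"):
        # A real implementation would read from disk; here we store a marker string.
        self._mat = f"matrix<{dtype}>"

    @property
    def csr_matrix(self):
        if self._mat is None:
            # stacklevel=2 attributes the warning to the code that accessed the property
            warnings.warn("Loading matrix with default settings. "
                          "Call `.load(...)` first to customize.",
                          stacklevel=2)
            self.load()
        return self._mat
```

Calling `load(...)` explicitly before touching the property avoids both the warning and the default settings.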

```diff
@@ -833,7 +839,20 @@ def set_mask(self, mask):
         if self.in_memory:
             self.load(force_reload=True,
                       return_symmetric=self.is_symmetric,
-                      fill_diag=self.is_symmetric)
+                      fill_diag=self.is_symmetric,
+                      dtype=self.dtype)
+
+    def reset_mask(self):
+        """
+        Reset the mask to its default value (None).
+        """
+        self._mask = None
+
+        if self.in_memory:
+            self.load(force_reload=True,
+                      return_symmetric=self.is_symmetric,
+                      fill_diag=self.is_symmetric,
+                      dtype=self.dtype)

     def to_snp_table(self, col_subset=None):
         """
```
```diff
@@ -1409,11 +1428,11 @@ def validate_ld_matrix(self):
         return True

     def __getstate__(self):
-        return self.store.path, self.in_memory, self.is_symmetric, self._mask
+        return self.store.path, self.in_memory, self.is_symmetric, self._mask, self.dtype

     def __setstate__(self, state):

-        path, in_mem, is_symmetric, mask = state
+        path, in_mem, is_symmetric, mask, dtype = state

         self._zg = zarr.open_group(path, mode='r')
         self.in_memory = in_mem
@@ -1426,7 +1445,7 @@ def __setstate__(self, state):
         self.set_mask(mask)

         if in_mem:
-            self.load(return_symmetric=is_symmetric, fill_diag=is_symmetric)
+            self.load(return_symmetric=is_symmetric, fill_diag=is_symmetric, dtype=dtype)

     def __len__(self):
         return self.n_snps
```
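Adding `dtype` to the tuple returned by `__getstate__` is what lets a pickled in-memory matrix (for example, one sent to a worker process) be reloaded with the same data type. The hypothetical class below mimics that round trip with plain attributes. It also shows one possible way, not taken from this commit, to keep accepting shorter states saved before the extra field existed; a strict unpacking such as `path, in_mem, is_symmetric, mask, dtype = state` raises `ValueError` on a four-element tuple.

```python
import pickle


class StatefulMatrix:
    """Hypothetical object whose pickled state includes its dtype."""

    def __init__(self, path, in_memory=False, dtype="float32"):
        self.path = path
        self.in_memory = in_memory
        self.dtype = dtype

    def __getstate__(self):
        # A plain tuple, like LDMatrix.__getstate__, with dtype as the last field
        return self.path, self.in_memory, self.dtype

    def __setstate__(self, state):
        if len(state) == 2:
            # Older state without dtype: fall back to an assumed default
            path, in_memory = state
            dtype = "float32"
        else:
            path, in_memory, dtype = state
        self.path = path
        self.in_memory = in_memory
        self.dtype = dtype


if __name__ == "__main__":
    restored = pickle.loads(pickle.dumps(StatefulMatrix("/tmp/ld", True, "int8")))
    print(restored.path, restored.in_memory, restored.dtype)
```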

Diff for: magenpy/SumstatsTable.py

+2 -2

```diff
@@ -310,7 +310,7 @@ def p_value(self):
         return self.pval

     @property
-    def log10_p_value(self):
+    def negative_log10_p_value(self):
         """
         :return: The negative log10 of the p-value (-log10(p_value)) of association
         test of each variant on the phenotype.
@@ -623,7 +623,7 @@ def to_table(self, col_subset=None):
             elif col == 'PVAL':
                 table['PVAL'] = self.p_value
             elif col == 'LOG10_PVAL':
-                table['LOG10_PVAL'] = self.log10_p_value
+                table['NLOG10_PVAL'] = self.negative_log10_p_value
             elif col == 'CHISQ':
                 table['CHISQ'] = self.get_chisq_statistic()
             elif col == 'MAF_VAR':
```
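The renamed property `negative_log10_p_value` returns `-log10(p)`, so smaller p-values map to larger values; this is the quantity conventionally plotted on the y-axis of a Manhattan plot. A minimal pure-Python sketch of the transform follows. The helper name is hypothetical, and magenpy's own implementation may treat edge cases such as `p == 0` differently.

```python
import math


def negative_log10(p_values):
    """Return -log10(p) for each p-value; p == 0 maps to +inf."""
    out = []
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # log10(0) is undefined, so treat an exact zero as infinitely significant
        out.append(math.inf if p == 0.0 else -math.log10(p))
    return out
```

For example, `p = 0.05` maps to about 1.301 and `p = 1e-8` maps to 8.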

Diff for: magenpy/__init__.py

+1 -1

```diff
@@ -16,7 +16,7 @@

 from .utils.data_utils import *

-__version__ = '0.1.1'
+__version__ = '0.1.2'
 __release_date__ = 'April 2024'


```