Statistical Processing of attributes via Recursive Cross Elimination

SPARCE

SPARCE is a statistical machine learning tool that automates feature selection in genomics data files. It was originally outfitted for general use with genetics, transcriptomics, methylomics, and ATAC-seq data.

Installation

conda create -n sparce pip
conda activate sparce

pip install sparce

HOW TO RUN

'''
Run inside script
'''


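import sparce
from sparce import feature_selection
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

def preprocess(file):
  # Load the data set; `file` is the path to a CSV file
  X = pd.read_csv(file)
  # OrdinalEncoder expects 2-D input, so select the column with double brackets
  enc = OrdinalEncoder()
  X['a column in X'] = enc.fit_transform(X[['a column in X']]).ravel()
  # Split the encoded target column from the feature matrix
  y = X['a column in X']
  X = X.drop('a column in X', axis = 1)

  return X,y

# Placeholder path; replace with the path to your own data file
file = 'path/to/data.csv'
X, y = preprocess(file)

nFeatures = 5
nJobs = 10

CV = feature_selection.grade_features(X = X, y = y, nFeatures = nFeatures , nJobs = nJobs)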

CLI

Clone the repository, re-invoke the main function in sparce.py, and import args_parse into sparce.py. The script is then ready to run from the CLI:

python sparce.py -x <file> -y <target> -nFeatures <int> -nJobs <int>

conda deactivate
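The flags of the python sparce.py command can be illustrated with a small argparse sketch. This is not the repository's args_parse module, whose contents are not shown here. The function name build_parser, the help strings, and the choice to make every flag required are assumptions made for illustration only.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the flags in the command line above.
    parser = argparse.ArgumentParser(prog="sparce.py")
    parser.add_argument("-x", required=True, help="path to the input data file")
    parser.add_argument("-y", required=True, help="name of the target column")
    parser.add_argument("-nFeatures", type=int, required=True,
                        help="value passed on as nFeatures")
    parser.add_argument("-nJobs", type=int, required=True,
                        help="value passed on as nJobs")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-x", "data.csv", "-y", "target", "-nFeatures", "5", "-nJobs", "10"]
)
print(args.x, args.y, args.nFeatures, args.nJobs)
```

Because the integer flags are declared with type=int, a non-numeric value such as -nJobs many is rejected by the parser before any feature selection starts.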

sparce assumptions

The data is in tidy format (features x samples) with a column labeled "target".

The features are continuous attributes in a classification problem.

The classes are mutually exclusive.

nFeatures > nSamples: you are attempting to reduce the dimensionality of the problem so that nSamples > nFeatures.
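These assumptions can be checked before running sparce. The sketch below is one possible pre-flight check and is not part of the sparce package. The function name check_sparce_assumptions is hypothetical. It assumes a CSV layout in which each row is a sample and each column is a feature, plus the "target" column, which matches the script example that drops the target column from X. It uses only the standard library, so it runs without pandas.

```python
import csv
import io

def check_sparce_assumptions(csv_text, target="target"):
    """Return a list of problems found; an empty list means every check passed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["no data rows found"]
    columns = list(rows[0].keys())
    if target not in columns:
        return [f'missing column "{target}"']
    features = [c for c in columns if c != target]
    problems = []

    # Continuous features: every value in a feature column must parse as a float.
    for name in features:
        try:
            for row in rows:
                float(row[name])
        except (TypeError, ValueError):
            problems.append(f'feature "{name}" is not numeric')

    # Every sample needs a class label. Mutual exclusivity itself cannot be
    # verified from a single label column, so only missing labels are flagged.
    if any(not (row[target] or "").strip() for row in rows):
        problems.append("some samples have no class label")

    # sparce is aimed at the case with more features than samples.
    if len(features) <= len(rows):
        problems.append(
            f"nFeatures ({len(features)}) is not greater than nSamples ({len(rows)})"
        )
    return problems

sample = "g1,g2,g3,target\n0.1,0.5,1.2,A\n0.3,0.2,0.9,B\n"
print(check_sparce_assumptions(sample))  # → []
```

Returning a list of messages instead of raising on the first failure lets all violated assumptions be reported in one pass.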
