Statistical Processing of Attributes via Recursive Cross Elimination

SPARCE

SPARCE is a statistical machine learning package that automates feature selection in genomics data files. It was originally built for general use with genetics, transcriptomics, methylomics, and ATAC-seq data.

Installation

```shell
conda create -n sparce pip
conda activate sparce
pip install sparce
```

HOW TO RUN

Run inside a Python script:


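```python
from sparce import feature_selection
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

def preprocess(file, target_col="target"):
    # Read the file passed in (not the literal string 'file')
    X = pd.read_csv(file)
    # Encode the class labels as numbers; OrdinalEncoder expects 2-D input,
    # so select the column with double brackets and flatten the result
    enc = OrdinalEncoder()
    X[target_col] = enc.fit_transform(X[[target_col]]).ravel()
    # Separate the labels from the feature matrix
    y = X[target_col]
    X = X.drop(target_col, axis=1)

    return X, y

file = "path/to/data.csv"  # replace with the path to your data file
X, y = preprocess(file)

nFeatures = 5
nJobs = 10

CV = feature_selection.grade_features(X=X, y=y, nFeatures=nFeatures, nJobs=nJobs)
```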
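To try the script without your own data, you can generate a small synthetic dataset in the layout sparce expects: one row per sample, continuous feature columns, and a "target" column. The helper `make_toy_data`, its column names, and the "case"/"control" labels are hypothetical and chosen only for illustration; the random values carry no real signal.

```python
import numpy as np
import pandas as pd

def make_toy_data(n_samples=20, n_features=100, seed=0):
    """Hypothetical helper: random wide data with a "target" column."""
    rng = np.random.default_rng(seed)
    # Rows are samples, columns are continuous (normally distributed) features
    df = pd.DataFrame(
        rng.normal(size=(n_samples, n_features)),
        columns=[f"feature_{i}" for i in range(n_features)],
    )
    # One label per sample, alternating between two mutually exclusive classes
    df["target"] = np.where(np.arange(n_samples) % 2 == 0, "case", "control")
    return df

df = make_toy_data()
print(df.shape)  # (20, 101)
```

Saving it with `df.to_csv("toy_data.csv", index=False)` gives a file that `preprocess` can read; with 100 features and 20 samples it also matches the "more features than samples" assumption.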

CLI

To use the command-line interface, clone the repository, import argparse in sparce.py, and restore the call to the main function. The script is then ready to run from the command line:

```shell
python sparce.py -x <file> -y <target> -nFeatures <int> -nJobs <int>
```
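The CLI steps above come down to parsing four flags and passing them to the main function. The sketch below shows one way to declare those flags with Python's standard `argparse` module. `build_parser` and `main` are hypothetical stand-ins, and treating `-y` as a target column name is an assumption; the repository's own sparce.py may define them differently.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the documented flags; argparse accepts
    # multi-letter options with a single dash, e.g. -nFeatures -> args.nFeatures
    parser = argparse.ArgumentParser(description="SPARCE feature selection (sketch)")
    parser.add_argument("-x", required=True, help="path to the input data file")
    parser.add_argument("-y", required=True,
                        help="target (assumed here to be the target column name)")
    parser.add_argument("-nFeatures", type=int, required=True,
                        help="number of features to select")
    parser.add_argument("-nJobs", type=int, required=True,
                        help="number of parallel jobs")
    return parser

def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; pass a list to test it
    args = build_parser().parse_args(argv)
    # A real sparce.py would load args.x here and hand the values to its
    # feature-selection routine; this sketch only returns the parsed values.
    return args
```

In a script, call `main()` under an `if __name__ == "__main__":` guard so that `argparse` reads the flags from the command line.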

When you are finished, deactivate the environment:

```shell
conda deactivate
```

sparce assumptions

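sparce assumes that:

- The data are in tidy format: one row per sample and one column per feature, plus a column labeled "target" that holds the class labels.
- The features are continuous attributes in a classification problem.
- The classes are mutually exclusive (each sample belongs to exactly one class).
- The data have more features than samples (number of features > number of samples), and the goal is to reduce the dimensionality until there are more samples than selected features.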
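Before running sparce, it can help to check a DataFrame against these assumptions. The sketch below is a hypothetical helper, not part of the sparce package. It assumes the samples-by-features layout with a "target" column and can only test what is observable in the data; for example, it treats numeric dtypes as a proxy for continuous features.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_assumptions(df, target_col="target"):
    """Hypothetical helper: which listed assumptions does `df` satisfy?"""
    if target_col not in df.columns:
        raise ValueError(f"expected a column labeled {target_col!r}")
    features = df.drop(columns=target_col)
    n_samples, n_features = features.shape
    return {
        # Numeric dtypes stand in for "continuous attributes"
        "numeric_features": all(is_numeric_dtype(t) for t in features.dtypes),
        # Every sample has a (non-missing) class label
        "labels_present": bool(df[target_col].notna().all()),
        # A classification problem needs at least two classes
        "multiple_classes": bool(df[target_col].nunique() >= 2),
        # The data should be wide: more features than samples
        "more_features_than_samples": bool(n_features > n_samples),
    }
```

Any `False` entry in the returned dict points to an assumption the data does not meet.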