DNAmetabarcoding

Data processing for DNA metabarcoding.

This repo currently consists of a single data processing script: contingency.py

General instructions that guided the development of the script are found in: contingency.table.flow.txt

The script currently takes about 8 minutes to run and could be optimized significantly if needed. It expects an input directory at the same level as the script, containing the following files:

listofclusters_AMD18S_swarm_fastidious_shortnames.txt
test_shortnames.tab
statistics_swarm_fastidious_shortnames.txt
ID_removal_list.txt
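
Before starting an 8-minute run, it can help to confirm that all four input files are in place. The following is a minimal sketch, not part of contingency.py: the helper name missing_inputs is hypothetical, and the directory name "input" used in the usage note is an assumption.

```python
from pathlib import Path

# File names copied from the list above.
REQUIRED_INPUTS = [
    "listofclusters_AMD18S_swarm_fastidious_shortnames.txt",
    "test_shortnames.tab",
    "statistics_swarm_fastidious_shortnames.txt",
    "ID_removal_list.txt",
]


def missing_inputs(input_dir):
    """Return the required file names that are not present in input_dir."""
    input_dir = Path(input_dir)
    return [name for name in REQUIRED_INPUTS if not (input_dir / name).is_file()]
```

Calling missing_inputs("input") from the repo root returns an empty list when everything is present, and the names of any absent files otherwise.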

The output directory will be created if it does not exist. The final output is written to output/final_contingency.txt, a tab-delimited text file.
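
The output behaviour described above (create the directory if needed, then write a tab-delimited file) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in contingency.py, which may well write its table with pandas instead. The function name write_tab_delimited and the example column names are hypothetical.

```python
import csv
from pathlib import Path


def write_tab_delimited(rows, header, out_path):
    """Write header and rows to out_path as tab-delimited text.

    The parent directory is created first if it does not exist yet.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

For example, write_tab_delimited([["c1", 3]], ["OTU", "sample_A"], "output/final_contingency.txt") would create output/ if necessary and write a two-line file. The column names here are placeholders, not the real columns of the contingency table.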

The output and input directories are listed in .gitignore, so they are not pushed to this repo.

The packages needed to run the code are listed in requirements.txt. These include numba and its dependencies, which could be used to speed up processing (a future task). To get started, all you should need is pandas, which can be installed with pip install pandas (numpy is installed as a dependency of pandas). If you use the full Anaconda distribution, pandas should already be available and you can run the code directly. A minimal Miniconda install may not include pandas, in which case install it first, for example with conda install pandas.
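
To check whether the current environment already provides the packages mentioned above without importing them, a small standard-library sketch such as the following may be useful. The helper name has_package is hypothetical and is not part of this repo.

```python
import importlib.util


def has_package(name):
    """Return True if a top-level package called name can be found."""
    return importlib.util.find_spec(name) is not None
```

has_package("pandas") tells you whether the script's one essential dependency is installed. has_package("numba") is only relevant for the planned speed-up work.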

GEUS-Glaciology-and-Climate/DNAmetabarcoding
