PATHOS: Predicting Variant Pathogenicity by Combining Protein Language Models and Biological Features
PATHOS (PATHOgenicity Scoring) is a deep learning tool that predicts the pathogenicity of protein variants directly from sequence data. It combines protein language models with biological features to assign each mutation a pathogenicity score. Scores at or above the 0.63 threshold indicate likely pathogenic variants; scores below it indicate likely benign ones. Multiple variants can be queried in a single run, for both clinical and research applications. Pre-computed scores for 139M+ variants across 17,574 human proteins are available.
# Query a specific mutation
python query_pathos.py --protein P16501 --mutation M1A
# Query a single protein (up to 10 results)
python query_pathos.py --protein P16501 --limit 10
# Query from input file
python query_pathos.py --file example_input.txt
# Export high-confidence pathogenic variants
python query_pathos.py --protein P16501 --min-score 0.9 --output results.csv
# Database statistics
python query_pathos.py --stats
Prerequisites: Python 3.8+ (no dependencies), database included (9.7 GB)
P16501 # All mutations
Q9Y6X3 M1A M1C # Specific mutations
P10635 R56V # Specific mutation
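The input format shown above can be read with a few lines of Python. The following is a minimal sketch, not the parser used by query_pathos.py: the function name `parse_input_line`, the treatment of `#` as an inline comment marker, and the mutation pattern (wild-type residue, position, substituted residue, as in `M1A`) are assumptions inferred from the examples.

```python
import re
from typing import List, Optional, Tuple

# Assumed notation: one-letter wild-type residue, position, one-letter substitute (e.g. M1A).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_input_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (protein_id, mutations) for one line, or None if the line is blank.

    An empty mutation list is read as "all mutations for this protein".
    """
    # Assumption: everything after '#' is a comment and can be ignored.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    protein_id, *mutations = content.split()
    for mutation in mutations:
        if MUTATION_RE.match(mutation) is None:
            raise ValueError(f"Unrecognized mutation: {mutation!r}")
    return protein_id, mutations


example = """\
P16501 # All mutations
Q9Y6X3 M1A M1C # Specific mutations
P10635 R56V # Specific mutation
"""

parsed = [
    entry
    for entry in map(parse_input_line, example.splitlines())
    if entry is not None
]
for protein_id, mutations in parsed:
    print(protein_id, mutations or "ALL")
```

Validating the mutation strings before querying makes typos fail early instead of silently returning no results.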
- < 0.63: BENIGN (likely harmless)
- ≥ 0.63: PATHOGENIC (likely disease-causing)
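The interpretation rule above amounts to a single comparison against the threshold. A minimal sketch, in which the function name `classify` is hypothetical and not part of PATHOS:

```python
PATHOGENICITY_THRESHOLD = 0.63  # threshold stated in this README


def classify(score: float) -> str:
    """Map a PATHOS score to the label used in the interpretation list."""
    # Scores equal to the threshold count as PATHOGENIC (the ">=" case).
    return "PATHOGENIC" if score >= PATHOGENICITY_THRESHOLD else "BENIGN"


for score in (0.10, 0.63, 0.95):
    print(f"{score:.2f} -> {classify(score)}")
```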
--protein, -p UniProt ID
--file, -f Input file path
--mutation, -m Specific mutation
--min-score Minimum score threshold
--max-score Maximum score threshold
--limit, -l Max number of results
--output, -o Export to CSV
--stats Database statistics
--list-proteins List all proteins
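For batch work it can be convenient to drive the command-line tool from a Python script. The sketch below only assembles an argument list from the options listed above; the helper `build_command` is hypothetical, and the script name `query_pathos.py` is taken from the usage examples. Passing the resulting list to `subprocess.run` would execute the query, assuming the script and database are in the working directory.

```python
from typing import List, Optional


def build_command(
    protein: str,
    mutation: Optional[str] = None,
    min_score: Optional[float] = None,
    max_score: Optional[float] = None,
    limit: Optional[int] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble a query_pathos.py invocation using only the documented options."""
    command = ["python", "query_pathos.py", "--protein", protein]
    # Each optional flag is appended only when a value was supplied.
    optional_flags = [
        ("--mutation", mutation),
        ("--min-score", min_score),
        ("--max-score", max_score),
        ("--limit", limit),
        ("--output", output),
    ]
    for flag, value in optional_flags:
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_command("P16501", min_score=0.9, output="results.csv")
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.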
[Citation information to be added]
For questions or support: [email protected]