Data • Infrastructure • ML Systems Engineer
HPC | Analytics Engineering | Automation | Research Computing
I build scalable data and compute systems that transform complex datasets into reliable, reproducible, and production-ready workflows.
My work spans GPU cloud infrastructure, high-performance computing, machine learning pipelines, and large-scale analytics systems across both research and production environments.
- GIS-linked Electronic Health Record (EHR) analytics pipelines
- Automation systems integrating APIs, structured logging, and alerts
- Reproducible SLURM & HPC data processing workflows
- Applied machine learning and causal inference analysis
- Production-style research infrastructure & validation systems
Enterprise-style automation platform integrating compliant web scraping, territory mapping, structured logging, and workflow alerting.
Tech: Python • APIs • ETL • Automation • Data Validation
Reusable Python/Bash wrappers enabling scalable execution of large data-processing workloads across HPC environments.
Tech: Python • Bash • SLURM • HPC
https://github.com/DCAN-Labs/SLURM_wrappers
FAIR-compliant container-linking system enabling reproducible scientific pipelines across heterogeneous compute environments.
Tech: Python • Docker • Singularity
https://github.com/DCAN-Labs/CABINET
Containerized deep learning application improving performance and accuracy for large-scale neuroimaging pipelines.
Tech: Python • Deep Learning • Containers • HPC
https://github.com/DCAN-Labs/BIBSnet
Google Apps Script tools automating workflow tracking, dynamic link generation, and real-time updates.
Tech: JavaScript • Google Apps Script • Node.js
Python • SQL • Bash • JavaScript • R • MATLAB
Pandas • NumPy • Statistical Modeling
Predictive Modeling • Causal Inference
Benchmarking • Data Validation
Kubernetes • SLURM • Docker • Singularity
AWS S3 • Ceph • Distributed Compute Systems
Grafana • Prometheus • Linux/Unix Systems
ETL Pipelines • API Integrations
Workflow Automation • CI/CD Concepts
GitHub Actions
- Provisioned and maintained thousands of NVIDIA GPU nodes in production Kubernetes clusters
- Built pipelines processing petabyte-scale datasets
- Achieved 600× performance improvements in imaging workflows
- Reduced runtimes 10–12× through HPC optimization
- Designed automation systems reducing manual operational overhead
- Led documentation & reproducibility initiatives across multi-team environments
- Data Engineering
- Machine Learning Infrastructure
- Research Engineering
- Analytics Engineering
- HPC & Distributed Systems
- Python Backend & Automation
- LinkedIn: https://linkedin.com/in/audreyhoughton
- Email: audreymhoughton@gmail.com


