This tool provides a method for retrieving figures from NCBI's PMC publications using the Entrez API. The tool systematically searches for publications related to specific plant species and downloads associated figures for research and analysis purposes.
- Automated Species Search: Searches for publications related to 30+ plant species
- Figure Extraction: Downloads high-quality figures from PMC articles
- Resume Capability: Caches processed PMC IDs to resume interrupted downloads
- Rate Limiting: Respects NCBI API limits (3 requests/second, 10 with API key)
- Batch Processing: Efficiently processes thousands of articles per species
- Organized Output: Structures downloaded figures by species and publication ID
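As a rough illustration of the rate limiting described above, the sketch below spaces requests so that no more than 3 per second (or 10 with an API key) are started. The helper names `minIntervalMs` and `createThrottle` are hypothetical and are not taken from this project's source; only the two limits come from the feature list.

```javascript
// Hypothetical helpers for illustration; not the project's actual implementation.

// Minimum spacing between requests: 10 requests/second with a key, 3 without.
function minIntervalMs(apiKey) {
  const requestsPerSecond = apiKey ? 10 : 3;
  return Math.ceil(1000 / requestsPerSecond);
}

// Returns a function that runs tasks no closer together than intervalMs.
function createThrottle(intervalMs) {
  let nextSlot = 0; // earliest timestamp (ms) at which the next task may start
  return async function throttle(task) {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + intervalMs; // reserve the following slot
    if (startAt > now) {
      await new Promise((resolve) => setTimeout(resolve, startAt - now));
    }
    return task();
  };
}

// Example usage (inside an async function):
//   const throttle = createThrottle(minIntervalMs(process.env.NCBI_API_KEY));
//   const response = await throttle(() => fetch(url));
```

Reserving time slots instead of sleeping a fixed amount after each call keeps the spacing correct even when several requests are queued at once.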
This code is maintained for educational and historical reference purposes only. The tool was originally developed for academic research. Please note that using this tool to retrieve figures from PMC publications is subject to NCBI's policies. Use at your own risk.
- Node.js: Version 20 or higher
- RAM: 4GB minimum
- Internet: Stable connection with >7MB/s download speed
Clone the repository and install dependencies:
git clone https://github.com/AlexJSully/Publication-Figure-Retrieval.git
cd Publication-Figure-Retrieval
npm ci
To increase API rate limits from 3 to 10 requests per second, obtain an NCBI API key:
- Visit NCBI API Key Documentation
- Create a .env file in the project root:
NCBI_API_KEY=your_api_key_here
Start the figure retrieval process:
npm run start
The tool will:
- Process each species from src/data/species.json
- Search PMC for related articles
- Download figures to build/output/[species_name]/
- Cache progress in build/output/cache/id.json
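The search step could look roughly like the sketch below, which builds an Entrez E-utilities `esearch` URL against the `pmc` database. The function name `buildSearchUrl`, the use of the plain species name as the search term, and the `retmax` default are assumptions made for illustration; the query the tool actually sends is not documented here.

```javascript
// Illustrative sketch only; the tool's real query construction may differ.
const ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

// Builds a PMC search URL for one species, adding the API key when one is set.
function buildSearchUrl(species, apiKey, retmax = 100) {
  const params = new URLSearchParams({
    db: "pmc", // search PubMed Central
    term: species, // assumed: the species name is used as the search term
    retmode: "json",
    retmax: String(retmax), // assumed page size
  });
  if (apiKey) params.set("api_key", apiKey);
  return `${ESEARCH_URL}?${params.toString()}`;
}

// Example usage:
//   buildSearchUrl("Arabidopsis thaliana", process.env.NCBI_API_KEY);
```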
If interrupted, simply run npm run start again. The tool will:
- Check the cache for already processed PMC IDs
- Resume from where it left off
- Skip duplicate downloads
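The resume behavior above can be pictured with the following minimal sketch. It assumes the cache file holds a flat JSON array of PMC ID strings; the real layout of id.json is not specified in this README, and the function names are hypothetical.

```javascript
// Sketch under the assumption that the cache is a JSON array of PMC ID strings.
const fs = require("node:fs");
const path = require("node:path");

// Reads the cached IDs, or returns an empty set when no cache exists yet.
function loadProcessedIds(cachePath) {
  if (!fs.existsSync(cachePath)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(cachePath, "utf8")));
}

// Writes the set of processed IDs, creating the cache directory if needed.
function saveProcessedIds(cachePath, ids) {
  fs.mkdirSync(path.dirname(cachePath), { recursive: true });
  fs.writeFileSync(cachePath, JSON.stringify([...ids]));
}

// Keeps only the IDs that have not been processed in an earlier run.
function pendingIds(allIds, processed) {
  return allIds.filter((id) => !processed.has(id));
}
```

Under this assumption, deleting the cache file makes `loadProcessedIds` return an empty set, so every article is treated as unprocessed on the next run.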
To reset and start fresh, delete the cache file:
rm build/output/cache/id.json
Downloaded figures are organized in a structured hierarchy:
build/output/
├── cache/
│   └── id.json              # Cached PMC IDs for resume capability
├── Arabidopsis_thaliana/
│   ├── PMC123456/
│   │   ├── figure1.jpg
│   │   ├── figure2.png
│   │   └── metadata.json    # Article metadata
│   └── PMC789012/
│       └── figure1.svg
├── Cannabis_sativa/
│   └── PMC345678/
│       ├── figure1.jpg
│       └── figure2.tiff
└── [other_species]/
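To make the layout concrete, a destination path for one figure might be derived as in the sketch below. The function name `figurePath` and the rule of replacing whitespace in the species name with underscores are assumptions inferred from the directory names in the tree, not confirmed behavior of the tool.

```javascript
// Hypothetical helper; the naming rule is inferred from the example tree.
const path = require("node:path");

// Maps (species, PMC ID, file name) to build/output/<Species_name>/<PMCID>/<file>.
function figurePath(species, pmcId, fileName, outputRoot = path.join("build", "output")) {
  const speciesDir = species.trim().replace(/\s+/g, "_"); // "Arabidopsis thaliana" -> "Arabidopsis_thaliana"
  return path.join(outputRoot, speciesDir, pmcId, fileName);
}

// Example usage:
//   figurePath("Arabidopsis thaliana", "PMC123456", "figure1.jpg");
```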
We aim to make this tool as reliable as possible, but there may still be unforeseen bugs. If you find one that is not listed here, feel free to create a bug report so we can fix it.
- None at the moment... Help us find some!
For comprehensive documentation, see the docs/ folder:
- Getting Started - Complete overview and setup guide
- Architecture - Technical architecture and design decisions
- Usage Examples - Detailed usage examples and troubleshooting
- API Reference - Complete API documentation
- Contributing - Development setup and contribution guidelines
This project is currently in maintenance mode. This means that:
- Only critical bug fixes and security updates will be addressed.
- New feature requests are unlikely to be implemented.
If you want to support my work, you can do so through the following methods: