Checks, standardizes, and upgrades .bib files automatically.
| Feature | Description |
|---|---|
| Venue standardization | Detects common spellings of conference/journal names and replaces them with canonical @String macros |
| arXiv → published upgrade | Searches Semantic Scholar, CrossRef, arXiv, Google Scholar, and Perplexity AI to find the formal publication venue for preprints |
| DuckDuckGo verification | Confirms every found publication exists on the web to guard against LLM hallucinations |
| Entry type inference | Fixes @misc → @article or @inproceedings based on available fields |
| Missing field detection | Warns about required fields absent from entries |
| Duplicate key detection | Errors on duplicate cite keys |
| Undefined @String detection | Errors on bare-word macro references not defined anywhere |
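The entry-type inference above can be sketched as a small field-based heuristic. This is an illustrative sketch, not the tool's actual rules; the exact conditions in `checker.py` may differ:

```python
def infer_entry_type(entry_type: str, fields: dict) -> str:
    """Upgrade @misc entries when their fields identify a journal or conference paper.

    Heuristic sketch (assumption): journal -> @article, booktitle -> @inproceedings.
    """
    if entry_type != "misc":
        return entry_type          # only @misc entries are reconsidered
    if "journal" in fields:
        return "article"           # a journal field implies @article
    if "booktitle" in fields:
        return "inproceedings"     # a booktitle implies conference proceedings
    return "misc"                  # not enough evidence; leave unchanged
```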
Requires Python ≥ 3.11 and uv.
```shell
uv sync
```

This installs the `bib-check` command into a local `.venv`.
```shell
uv run bib-check references.bib
```

Options:
| Flag | Description |
|---|---|
| `-o FILE` | Output .bib file (default: `<input>_fixed.bib`) |
| `-r FILE` | Output report file in Markdown (default: `<input>_report.md`) |
| `--offline` | Skip all network lookups |
| `--no-upgrade` | Skip arXiv → published upgrade (venue standardization still runs) |
| `--perplexity-key KEY` | Perplexity AI API key (overrides `PERPLEXITY_API_KEY` env var) |
| `--s2-key KEY` | Semantic Scholar API key for higher rate limits (overrides `SEMANTIC_SCHOLAR_API_KEY` env var) |
| `--no-scholar` | Disable the Google Scholar search backend |
| `--no-learn-venues` | Do not save newly discovered venues to `venues.json` |
| `-v` | Verbose: print search progress to stderr |
Semantic Scholar is the primary search backend. By default requests are unauthenticated (~1 req / 3.5 s). Providing an API key raises the limit to ~10 req / s:
- Get a free key at https://www.semanticscholar.org/product/api#api-key
- Pass it via environment variable or CLI flag:
```shell
export SEMANTIC_SCHOLAR_API_KEY=...
uv run bib-check references.bib
# or inline:
uv run bib-check references.bib --s2-key ...
```

Perplexity AI has web-search capabilities and can find publication venues that structured APIs (Semantic Scholar, CrossRef) may not index yet.
- Get a key at https://www.perplexity.ai/settings/api
- Pass it via environment variable or CLI flag:
```shell
export PERPLEXITY_API_KEY=pplx-...
uv run bib-check references.bib
# or inline:
uv run bib-check references.bib --perplexity-key pplx-...
```

Every result from Perplexity is verified via DuckDuckGo before being accepted, so hallucinated DOIs / venues are discarded automatically.
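The core of that verification step is a loose title match between the candidate and the web search results. A minimal sketch of the matching logic, assuming the real implementation may normalize and compare differently:

```python
import re

def _normalize(title: str) -> str:
    """Lowercase and strip punctuation so titles compare loosely."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def looks_confirmed(candidate_title: str, result_titles: list[str]) -> bool:
    """Accept a candidate venue/paper only if some search result echoes its title.

    Illustrative sketch of the hallucination guard, not the tool's exact logic.
    """
    target = _normalize(candidate_title)
    return any(
        target in _normalize(t) or _normalize(t) in target
        for t in result_titles
        if t
    )
```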
For each arXiv preprint the tool runs the following pipeline (in order, stopping at the first confirmed published result):
1. Semantic Scholar (structured API, most complete venue data)
2. CrossRef (DOI-indexed published works)
3. arXiv (journal_ref / DOI sometimes present)
4. Google Scholar (broad academic coverage; requires `scholarly` package)
5. Perplexity AI (web search + LLM; requires API key)
└→ DuckDuckGo (verifies result to catch hallucinations)
6. DuckDuckGo (soft-verify S2 / CrossRef results too)
Papers that cannot be confirmed as published are flagged for manual review in the report.
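The pipeline above amounts to a first-match loop over ordered backends. A sketch under assumed names (`find_published`, `backends`, `verify` are illustrative, not the tool's actual API):

```python
def find_published(preprint, backends, verify):
    """Try each search backend in priority order; return the first verified hit.

    `backends` is an ordered list of callables returning a candidate or None;
    `verify` is a DuckDuckGo-style check that guards against hallucinations.
    """
    for search in backends:
        candidate = search(preprint)
        if candidate is not None and verify(candidate):
            return candidate       # stop at the first confirmed published result
    return None                    # no hit: flagged for manual review in the report
```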
Google Scholar is used as a fallback search backend via the `scholarly` package.

```shell
uv sync --group scholar
```

Once installed, Google Scholar is enabled automatically. Disable it with `--no-scholar` if needed.
- `references_fixed.bib` – cleaned bib with canonical `@String` macros and upgraded entries
- `references_report.md` – Markdown report with all changes and issues
Only macros that are actually used in the output file are emitted. The canonical set includes: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, AAAI, IJCAI, AISTATS, SIGGRAPH, TOG, PAMI, IJCV, TIP, TVCG, TMM, TCSVT, WACV, ACMMM, BMVC, ICPR, CGF, EGSR, ARXIV, and more.
```
/
├── pyproject.toml
└── bib_checker/
    ├── cli.py        – argument parsing and entry point
    ├── checker.py    – main checking logic
    ├── parser.py     – BibTeX parser (no external deps)
    ├── writer.py     – BibTeX serializer
    ├── search.py     – arXiv / S2 / CrossRef / Google Scholar / Perplexity / DDG backends
    ├── strings.py    – canonical @String definitions and alias table
    ├── datatypes.py  – shared data classes
    └── report.py     – Markdown report generator
```