Citegeist: BibTeX Reference Checker

Checks, standardizes, and upgrades .bib files automatically.

Features

| Feature | Description |
|---|---|
| Venue standardization | Detects common spellings of conference and journal names and replaces them with canonical `@String` macros |
| arXiv → published upgrade | Searches Semantic Scholar, CrossRef, arXiv, Google Scholar, and Perplexity AI to find the formal publication venue of preprints |
| DuckDuckGo verification | Checks found publications against web search results (mandatory for Perplexity answers) to guard against LLM hallucinations |
| Entry type inference | Changes `@misc` to `@article` or `@inproceedings` based on the available fields |
| Missing field detection | Warns about required fields that are absent from entries |
| Duplicate key detection | Reports an error for duplicate cite keys |
| Undefined `@String` detection | Reports an error for bare-word macro references that are not defined anywhere |
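The three offline checks (duplicate keys, undefined `@String` references, and `@misc` type inference) can be illustrated with a small sketch. This is not the project's `checker.py`. The entry shape (a dict with `type`, `key`, and raw `fields`) and the helper names are hypothetical, and the type-inference rule (`journal` → `@article`, `booktitle` → `@inproceedings`) is an assumed heuristic.

```python
import re
from collections import Counter

# Hypothetical entry shape (not the project's datatypes.py):
# {"type": "misc", "key": "smith2020", "fields": {"journal": "PAMI", ...}}
# Field values are kept as raw BibTeX text, braces and quotes included.

MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}  # predefined by BibTeX styles
BARE_WORD = re.compile(r"^[A-Za-z][A-Za-z0-9_:-]*$")


def duplicate_keys(entries: list[dict]) -> list[str]:
    """Cite keys that occur more than once."""
    counts = Counter(e["key"] for e in entries)
    return sorted(key for key, n in counts.items() if n > 1)


def top_level_text(raw: str) -> str:
    """Drop braced and quoted parts, keeping only the text outside them."""
    out, depth, in_quote = [], 0, False
    for ch in raw:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quote = not in_quote
        elif depth == 0 and not in_quote:
            out.append(ch)
    return "".join(out)


def undefined_macros(entries: list[dict], defined: set[str]) -> set[str]:
    """Bare words (macro references) with no matching @String definition."""
    known = {name.lower() for name in defined} | MONTHS  # macro names are case-insensitive
    missing = set()
    for e in entries:
        for raw in e["fields"].values():
            for part in top_level_text(raw).split("#"):  # '#' concatenates pieces
                part = part.strip()
                if BARE_WORD.match(part) and part.lower() not in known:
                    missing.add(part)
    return missing


def infer_type(entry: dict) -> str:
    """Upgrade @misc based on which venue field is present (assumed heuristic)."""
    if entry["type"].lower() != "misc":
        return entry["type"]
    if "journal" in entry["fields"]:
        return "article"
    if "booktitle" in entry["fields"]:
        return "inproceedings"
    return entry["type"]
```

Numbers such as `2020` are skipped because a bare word must start with a letter. Braced or quoted text is ignored, so a `#` inside a title is never mistaken for concatenation.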

Installation

Requires Python ≥ 3.11 and uv.

```bash
uv sync
```

This installs the `bib-check` command into a local `.venv`.

Usage

```bash
uv run bib-check references.bib
```

Options:

| Flag | Description |
|---|---|
| `-o FILE` | Output `.bib` file (default: `<input>_fixed.bib`) |
| `-r FILE` | Output report file in Markdown (default: `<input>_report.md`) |
| `--offline` | Skip all network lookups |
| `--no-upgrade` | Skip the arXiv → published upgrade (venue standardization still runs) |
| `--perplexity-key KEY` | Perplexity AI API key (overrides the `PERPLEXITY_API_KEY` environment variable) |
| `--s2-key KEY` | Semantic Scholar API key for higher rate limits (overrides the `SEMANTIC_SCHOLAR_API_KEY` environment variable) |
| `--no-scholar` | Disable the Google Scholar search backend |
| `--no-learn-venues` | Do not save newly discovered venues to `venues.json` |
| `-v` | Verbose: print search progress to stderr |
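The option table implies two resolution rules: default output names derived from the input file, and CLI keys taking precedence over environment variables. Both are illustrated below with a hypothetical `argparse` mirror of the documented flags. It is not the project's `cli.py`. Placing the default outputs next to the input file, and the `resolve` helper itself, are assumptions.

```python
import argparse
import os
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Mirror of the documented flags (hypothetical reconstruction)."""
    p = argparse.ArgumentParser(prog="bib-check")
    p.add_argument("input")
    p.add_argument("-o", dest="output")
    p.add_argument("-r", dest="report")
    p.add_argument("--offline", action="store_true")
    p.add_argument("--no-upgrade", action="store_true")
    p.add_argument("--perplexity-key")
    p.add_argument("--s2-key")
    p.add_argument("--no-scholar", action="store_true")
    p.add_argument("--no-learn-venues", action="store_true")
    p.add_argument("-v", dest="verbose", action="store_true")
    return p


def resolve(argv: list[str], env=os.environ) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    src = Path(args.input)
    # Defaults <input>_fixed.bib / <input>_report.md, assumed to sit beside the input.
    args.output = args.output or str(src.with_name(src.stem + "_fixed.bib"))
    args.report = args.report or str(src.with_name(src.stem + "_report.md"))
    # A key given on the command line wins over the environment variable.
    args.perplexity_key = args.perplexity_key or env.get("PERPLEXITY_API_KEY")
    args.s2_key = args.s2_key or env.get("SEMANTIC_SCHOLAR_API_KEY")
    return args
```

For example, `resolve(["references.bib"])` yields `references_fixed.bib` and `references_report.md` as the output paths.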

Semantic Scholar API key

Semantic Scholar is the primary search backend. By default, requests are unauthenticated and limited to roughly one request every 3.5 s. Providing an API key raises the limit to roughly 10 requests per second.

Get a free key at https://www.semanticscholar.org/product/api#api-key
Pass it via the environment variable or the CLI flag:

```bash
export SEMANTIC_SCHOLAR_API_KEY=...
uv run bib-check references.bib

# or inline:
uv run bib-check references.bib --s2-key ...
```

Perplexity AI integration

Perplexity AI has web-search capabilities and can find publication venues that structured APIs (Semantic Scholar, CrossRef) may not have indexed yet.

Get a key at https://www.perplexity.ai/settings/api
Pass it via the environment variable or the CLI flag:

```bash
export PERPLEXITY_API_KEY=pplx-...
uv run bib-check references.bib

# or inline:
uv run bib-check references.bib --perplexity-key pplx-...
```

Every result from Perplexity is verified via DuckDuckGo before it is accepted, so hallucinated DOIs or venues that cannot be found on the web are discarded automatically.
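The README does not specify the matching rule used for verification. The sketch below assumes a simple one: accept a candidate only if some web search result contains both the normalized paper title and the claimed venue as whole words. Fetching the DuckDuckGo results is left out. The `snippets` argument and both function names are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_confirmed(title: str, venue: str, snippets: list[str]) -> bool:
    """True if one search snippet mentions both the title and the venue."""
    want_title, want_venue = normalize(title), normalize(venue)
    if not want_title or not want_venue:
        return False  # nothing concrete to verify
    for snippet in snippets:
        padded = f" {normalize(snippet)} "  # pad so matches are whole words
        if f" {want_title} " in padded and f" {want_venue} " in padded:
            return True
    return False
```

Requiring both strings in the same snippet is deliberately conservative. A fabricated venue attached to a real title would not pass.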

Search pipeline

For each arXiv preprint the tool runs the following pipeline (in order, stopping at the first confirmed published result):

```text
1. Semantic Scholar  (structured API, most complete venue data)
2. CrossRef          (DOI-indexed published works)
3. arXiv             (journal_ref / DOI sometimes present)
4. Google Scholar    (broad academic coverage; requires `scholarly` package)
5. Perplexity AI     (web search + LLM; requires API key)
   └→ DuckDuckGo     (verifies result to catch hallucinations)
6. DuckDuckGo        (soft-verifies S2 / CrossRef results too)
```

Papers that cannot be confirmed as published are flagged for manual review in the report.

Google Scholar support

Google Scholar is used as a fallback search backend via the optional `scholarly` package. Install it with:

```bash
uv sync --group scholar
```

Once the package is installed, Google Scholar is enabled automatically. Disable it with `--no-scholar` if needed.

Output

For an input file `references.bib`, the tool writes:

`references_fixed.bib` – cleaned bib with canonical `@String` macros and upgraded entries
`references_report.md` – Markdown report listing all changes and issues

@String macros emitted

Only macros that are actually used in the output file are emitted. The canonical set includes: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, AAAI, IJCAI, AISTATS, SIGGRAPH, TOG, PAMI, IJCV, TIP, TVCG, TMM, TCSVT, WACV, ACMMM, BMVC, ICPR, CGF, EGSR, ARXIV, and more.

Project structure

```text
/
├── pyproject.toml
└── bib_checker/
    ├── cli.py       – argument parsing and entry point
    ├── checker.py   – main checking logic
    ├── parser.py    – BibTeX parser (no external deps)
    ├── writer.py    – BibTeX serializer
    ├── search.py    – arXiv / S2 / CrossRef / Google Scholar / Perplexity / DDG backends
    ├── strings.py   – canonical @String definitions and alias table
    ├── datatypes.py – shared data classes
    └── report.py    – Markdown report generator
```
