Document parsing and knowledge extraction tools.
All d* projects are entirely AI-generated.
Extract structured data from unstructured documents. Parse PDFs, Word documents, HTML, video, and audio into formats that AI systems can consume. Capabilities include layout analysis, text extraction, and semantic chunking for retrieval-augmented generation (RAG).
| Project | Description | Status |
|---|---|---|
| docling_rs | Docling in Rust. Parse PDF, Word, HTML, complex layouts. | Preview |
| sg | SuperGrep with Warp. Semantic code search across repos. | Preview |
| chunker | State-of-the-art text chunking for RAG. Semantic splitting. | Preview |
| video_audio_extracts | Extract knowledge from video/audio. Uses ML models. | Preview |
| pdfium_fast | Multi-process PDFium. Fast PDF text extraction. | Preview |
| dashextract | Orchestration layer for all extraction. | Planned |
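
To give a rough idea of what chunking for RAG involves, here is a minimal Python sketch. It is not the `chunker` project's API: the function names `split_sentences` and `chunk_text` and the parameters `max_chars` and `overlap` are hypothetical, and sentence boundaries stand in for true semantic splitting, which would typically use embeddings or a model.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str, max_chars: int = 200, overlap: int = 1) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    The last `overlap` sentences of each chunk are repeated at the start of
    the next one so that context is not lost at chunk boundaries. A chunk can
    exceed max_chars when a single sentence (or the carried-over sentences
    plus the next one) is longer than the limit.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry over the overlap sentences.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One. Two. Three. Four."
    for chunk in chunk_text(sample, max_chars=10, overlap=1):
        print(repr(chunk))
```

Splitting on sentence boundaries rather than fixed character offsets keeps each chunk readable on its own, which is the basic motivation behind semantic chunking. A production chunker would likely also respect headings, tables, and other layout information.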
Apache 2.0 - See LICENSE for details.
See RELEASES.md for version history.