dropbox/dKNOW

dKNOW - Document Processing

Document parsing and knowledge extraction tools.

All d* projects are entirely AI generated.

Thesis

Extract structured data from unstructured documents. Parse PDFs, Word docs, HTML, video, and audio into formats AI can use. Covers layout analysis, text extraction, and semantic chunking for RAG.

Projects

| Project | Description | Status |
| --- | --- | --- |
| docling_rs | Docling in Rust. Parse PDF, Word, HTML, complex layouts. | Preview |
| sg | SuperGrep with Warp. Semantic code search across repos. | Preview |
| chunker | State-of-the-art text chunking for RAG. Semantic splitting. | Preview |
| video_audio_extracts | Extract knowledge from video/audio. Uses ML models. | Preview |
| pdfium_fast | Multi-process PDFium. Fast PDF text extraction. | Preview |
| dashextract | Orchestration layer for all extraction. | Planned |
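
To illustrate what text chunking for RAG involves, here is a minimal Python sketch. It is not the `chunker` project's API or algorithm. The function names `split_sentences` and `chunk_text` and the parameters `max_chars` and `overlap` are hypothetical, and simple sentence-boundary packing stands in for true semantic splitting, which would typically rely on embeddings or document structure.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str, max_chars: int = 200, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    max_chars is a soft limit: a single long sentence, or carried-over
    overlap sentences, can push a chunk past it. The last `overlap`
    sentences of each chunk are repeated at the start of the next one so
    that context is not lost at chunk boundaries.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry over the overlap sentences.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One. Two. Three. Four."
    for chunk in chunk_text(sample, max_chars=10, overlap=1):
        print(repr(chunk))
```

Splitting only at sentence boundaries keeps each chunk readable on its own, which is the property a retrieval step generally depends on. A semantic chunker would replace the fixed size rule with a measure of topical similarity between neighbouring sentences.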

License

Apache 2.0 - See LICENSE for details.

Release History

See RELEASES.md for version history.
