digitalplatdev-lab/Img-Vid_analyser
Image & Video Analyzer — Prototype

This prototype accepts an image or a video, detects human keypoints, detects objects, generates zoomed-in captions for the detected regions, and returns a 20-point descriptive summary.

Features

  • Accepts images or videos of any size
  • Extracts human keypoints (MediaPipe Holistic)
  • Runs object detection (Ultralytics YOLO)
  • Produces global caption and zoomed-in captions (BLIP)
  • Returns 20 numbered descriptive points
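The feature list above implies a pipeline that merges the global caption, detected objects, and per-region captions into the final numbered summary. The sketch below shows one way that last step could work; the function name `build_description_points` and the object fields are illustrative assumptions, not the actual API of `processing.py` (the real pipeline feeds in MediaPipe, YOLO, and BLIP outputs).

```python
# Illustrative sketch only -- hypothetical names, not the real processing.py API.
# The actual pipeline plugs MediaPipe Holistic, Ultralytics YOLO, and BLIP
# outputs into these slots.

def build_description_points(global_caption, objects, zoomed_captions, limit=20):
    """Assemble up to `limit` numbered description points from the global
    caption, detected objects, and zoomed-in region captions."""
    points = [global_caption]
    points += [f"Detected {obj['label']} ({obj['conf']:.2f} confidence)"
               for obj in objects]
    points += zoomed_captions
    return [f"{i}. {text}" for i, text in enumerate(points[:limit], start=1)]

# Example with stubbed model outputs:
points = build_description_points(
    global_caption="A person riding a bicycle on a city street",
    objects=[{"label": "person", "conf": 0.91}, {"label": "bicycle", "conf": 0.88}],
    zoomed_captions=["A helmeted rider gripping the handlebars"],
)
print(points[0])  # -> "1. A person riding a bicycle on a city street"
```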

Requirements

  • Python 3.9+
  • An NVIDIA GPU is recommended for performance, but not required

Quick start (local)

  1. Create and activate a virtualenv:

     ```
     python -m venv .venv
     source .venv/bin/activate    # macOS / Linux
     .venv\Scripts\activate       # Windows
     ```

  2. Install dependencies: pip install -r requirements.txt

  3. Run the app: uvicorn app:app --reload --host 0.0.0.0 --port 8000

  4. Open http://localhost:8000/ and upload an image or a video.

Notes

  • Models (BLIP, YOLO) download weights on first run; this may take time.
  • If you want to avoid heavy models, you can disable object detection or captioning in processing.py.
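One common way to make such components optional is a module-level feature flag that short-circuits the expensive model call. The flag and function names below are assumptions for illustration; check `processing.py` for the actual toggles.

```python
# Hypothetical feature-flag pattern (illustrative names, not the
# actual processing.py code). Stubs stand in for the heavy models.

ENABLE_OBJECT_DETECTION = False   # skip loading YOLO weights
ENABLE_CAPTIONING = False         # skip loading BLIP weights

def extract_keypoints(frame):
    return {}  # stand-in for MediaPipe Holistic

def detect_objects(frame):
    return [{"label": "person", "conf": 0.9}]  # stand-in for YOLO

def caption(frame):
    return "a street scene"  # stand-in for BLIP

def analyze(frame):
    result = {"keypoints": extract_keypoints(frame)}
    # Guarding the calls means the disabled model is never loaded.
    result["objects"] = detect_objects(frame) if ENABLE_OBJECT_DETECTION else []
    result["global_caption"] = caption(frame) if ENABLE_CAPTIONING else ""
    return result
```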

API

  • POST /analyze — multipart form with a file field containing an image or a video
  • Returns JSON: { "keypoints": {...}, "objects": [...], "global_caption": "...", "zoomed_captions": [...], "description_points": ["1. ...", "... up to 20"] }
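A minimal Python client for this endpoint might look like the sketch below. It assumes the server from the quick start is running on port 8000 and that the `requests` library is installed; the `analyze_file` helper name is illustrative. The `description_points` helper only relies on the documented response shape.

```python
import json

def description_points(response_json):
    """Pull the numbered summary points out of an /analyze response
    (field name taken from the documented JSON shape above)."""
    return response_json.get("description_points", [])

def analyze_file(path, url="http://localhost:8000/analyze"):
    """POST an image or video to the /analyze endpoint.

    Assumes `pip install requests` and a running server."""
    import requests  # imported lazily so the helper above works without it
    with open(path, "rb") as f:
        return requests.post(url, files={"file": f}).json()

# Parsing a sample response matching the documented shape:
sample = json.loads('{"keypoints": {}, "objects": [], '
                    '"global_caption": "a street scene", '
                    '"zoomed_captions": [], '
                    '"description_points": ["1. a street scene"]}')
print(description_points(sample))  # -> ['1. a street scene']
```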
