AI Vision CLI

A command-line interface for AI-powered image analysis with Google's Gemini and Vertex AI models. It includes reliability features such as retry logic, circuit breakers, and rate limiting.

Acknowledgements

This tool is based on the ai-vision-mcp repository. Special thanks to Tan Yong Sheng for the original implementation and inspiration.

Features

Image Analysis: Analyze single images with custom prompts
Image Comparison: Compare multiple images and identify differences
Object Detection: Detect and identify objects in images with bounding box annotations
Multiple AI Providers: Support for Google Gemini and Vertex AI
Flexible Output: JSON, text, and table output formats
Configuration Management: Easy setup and configuration management
Progress Tracking: Visual progress indicators for long-running operations
Advanced Error Handling: Intelligent retry logic and circuit breaker patterns
Rate Limiting: Built-in quota management and rate limiting
Health Monitoring: Provider connectivity and health checks

Installation

From npm (when published)

```
npm install -g ai-vision-cli
```

From source

```
git clone https://github.com/majormark/ai-vision-cli.git
cd ai-vision-cli
npm install
npm run build
npm link
```

Prerequisites

Node.js 18.0.0 or higher
Google Cloud Vertex AI credentials (for Vertex AI provider) or Google AI Studio API key (for Gemini provider)

Quick Start

Initialize the CLI:
```
ai-vision init
```

Analyze an image:
```
ai-vision analyze image ./path/to/image.jpg
```

Compare two images:
```
ai-vision compare images image1.jpg image2.jpg
```

Detect objects:
```
ai-vision detect objects ./path/to/image.jpg --prompt "Find all cars and people"
```

Commands

`init`

Initialize AI Vision CLI configuration.

```
ai-vision init [options]
```

Options:

-p, --provider <provider>: AI provider to use (google|vertex_ai)
-i, --interactive: Run interactive setup (default: true)
-c, --config <path>: Custom config file path
--defaults: Use default configuration without prompts

`analyze`

Analyze images.

Image Analysis

```
ai-vision analyze image <image> [options]
```

Arguments:

<image>: Image file path or URL

Options:

-p, --prompt <prompt>: Analysis prompt (default: "Analyze this image")
-o, --output <format>: Output format - json|text|table (default: "json")
-s, --save <path>: Save output to file
-t, --temperature <temp>: AI temperature (0-1)
--max-tokens <tokens>: Maximum output tokens
--top-p <value>: Top P value (0-1)
--top-k <value>: Top K value (1-100)
--system-instruction <instruction>: System instruction to guide model behavior
--provider <provider>: AI provider (google|vertex_ai)
--no-progress: Disable progress indicators
--verbose: Enable detailed debug output
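
The numeric ranges listed for `--temperature`, `--top-p`, and `--top-k` can be checked in a wrapper script before the CLI is called. The following is a minimal sketch, not the CLI's own validation: `validateSamplingOptions` is a hypothetical helper, and treating `--top-k` as an integer is an assumption.

```javascript
// Hypothetical helper (not part of ai-vision): checks the documented ranges
// for --temperature (0-1), --top-p (0-1), and --top-k (1-100).
// Returns a list of human-readable problems; an empty list means "valid".
function validateSamplingOptions({ temperature, topP, topK } = {}) {
  const problems = [];
  const inRange = (v, min, max) => Number.isFinite(v) && v >= min && v <= max;

  if (temperature !== undefined && !inRange(temperature, 0, 1)) {
    problems.push('temperature must be between 0 and 1');
  }
  if (topP !== undefined && !inRange(topP, 0, 1)) {
    problems.push('top-p must be between 0 and 1');
  }
  // Assumption: top-k is a whole number.
  if (topK !== undefined && !(Number.isInteger(topK) && inRange(topK, 1, 100))) {
    problems.push('top-k must be an integer between 1 and 100');
  }
  return problems;
}
```

Options that are left out are not checked, so the helper only reports values that were actually supplied.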

`compare`

Compare multiple images.

```
ai-vision compare images <images...> [options]
```

Arguments:

<images...>: Image file paths or URLs (2-4 images)

Options:

-p, --prompt <prompt>: Comparison prompt
-o, --output <format>: Output format (json|text|table)
-s, --save <path>: Save output to file
--provider <provider>: AI provider (google|vertex_ai)

`detect`

Detect objects in images.

```
ai-vision detect objects <image> [options]
```

Arguments:

<image>: Image file path or URL

Options:

-p, --prompt <prompt>: Detection prompt describing what to detect
-o, --output <path>: Output path for annotated image
--format <format>: Output format (json|image)
--confidence <threshold>: Confidence threshold (0-1)
--save-detections <path>: Save detection results to file

`config`

Manage configuration.

```
ai-vision config <subcommand> [options]
```

Subcommands:

show: Show current configuration
set <key> <value>: Set configuration value
get <key>: Get configuration value
reset: Reset configuration to defaults
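
Keys passed to `set` and `get` use dotted paths such as `performance.max_file_size`, which map onto the nested configuration file. The sketch below shows one way such a path could be resolved against a plain object. `setByPath` and `getByPath` are hypothetical names, and this is not necessarily how the CLI implements it.

```javascript
// Path segments that must never be written to, to avoid prototype pollution.
const UNSAFE_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Hypothetical helper: set a nested value from a dotted key,
// creating intermediate objects as needed.
function setByPath(config, key, value) {
  const parts = key.split('.');
  if (parts.some((p) => p === '' || UNSAFE_KEYS.has(p))) {
    throw new Error(`invalid config key: ${key}`);
  }
  let node = config;
  for (const part of parts.slice(0, -1)) {
    if (typeof node[part] !== 'object' || node[part] === null) {
      node[part] = {};
    }
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
  return config;
}

// Hypothetical helper: read a nested value, or undefined if any segment is missing.
function getByPath(config, key) {
  return key
    .split('.')
    .reduce((node, part) => (node == null ? undefined : node[part]), config);
}
```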

Configuration

The CLI uses a YAML configuration file stored at ~/.ai-vision/config.yaml by default.

Environment Variables

You can set these environment variables instead of using the config file:

GOOGLE_AI_API_KEY: Google AI Studio API key
VERTEX_AI_PROJECT_ID: Vertex AI project ID
VERTEX_AI_LOCATION: Vertex AI location (default: "us-central1")
VERTEX_AI_CREDENTIALS: Path to Vertex AI credentials JSON file
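
As an illustration of how these variables could combine with the config file, the sketch below resolves the Vertex AI settings. It assumes environment variables take precedence over file values, which this README does not state, and `resolveVertexSettings` is a hypothetical name.

```javascript
// Hypothetical resolver. Assumed precedence: environment variable first,
// then the config file, then the documented default for the location.
function resolveVertexSettings(env, fileConfig = {}) {
  const fromFile = fileConfig.vertex_ai || {};
  return {
    projectId: env.VERTEX_AI_PROJECT_ID || fromFile.project_id,
    location: env.VERTEX_AI_LOCATION || fromFile.location || 'us-central1',
    credentials: env.VERTEX_AI_CREDENTIALS || fromFile.credentials,
  };
}
```

In a script this would typically be called as `resolveVertexSettings(process.env, parsedConfig)`, where `parsedConfig` is the already-parsed YAML file.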

Example Configuration

```
provider: "google"  # or "vertex_ai"
google:
  api_key: "your-google-ai-api-key"
vertex_ai:
  project_id: "your-project-id"
  location: "us-central1"
  credentials: "/path/to/credentials.json"
output:
  format: "json"
  save_directory: "./output"
  file_prefix: "ai-vision-"
performance:
  max_file_size: 10485760  # 10MB
  timeout: 30000  # 30 seconds
  upload_threshold: 4194304  # 4MB
logging:
  level: "info"
  file: "~/.ai-vision/logs/ai-vision.log"
```

Usage Examples

Basic Image Analysis

```
ai-vision analyze image ./photo.jpg --prompt "Describe the main objects in this image"
```

Advanced Analysis with Custom Parameters

```
ai-vision analyze image ./photo.jpg \
  --prompt "Analyze the composition and lighting" \
  --temperature 0.7 \
  --max-tokens 500 \
  --output table \
  --save analysis.txt
```

Object Detection

```
ai-vision detect objects ./street.jpg \
  --prompt "Find all vehicles, pedestrians, and traffic signs" \
  --save-detections detections.json
```

Image Comparison

```
ai-vision compare images ./before.jpg ./after.jpg \
  --prompt "Compare the differences between these two images"
```

Batch Analysis

```
# Using glob patterns for multiple images
ai-vision analyze image "./images/*.jpg" \
  --prompt "Identify the main subject" \
  --save results.json
```

Output Formats

JSON

Structured JSON output with complete metadata:

```
{
  "success": true,
  "data": {
    "analysis": "The image shows...",
    "confidence": 0.95,
    "model": "gemini-1.0-pro-vision",
    "timestamp": "2024-01-01T12:00:00Z"
  }
}
```

Text

Human-readable text output:

```
Analysis Result:
The image shows a beautiful sunset over mountains with vibrant colors.
Confidence: 95%
Model: gemini-1.0-pro-vision
```
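
The text format carries the same fields as the JSON format. As a rough sketch of that mapping, the hypothetical `formatAsText` function below turns a parsed JSON result into the text layout shown above. It mirrors the two examples in this README and is not taken from the CLI's source.

```javascript
// Hypothetical formatter: renders a parsed JSON result in the text layout.
// Assumes the JSON shape shown in this README: { success, data: { ... } }.
function formatAsText(result) {
  if (!result || result.success !== true) {
    throw new Error('result does not describe a successful analysis');
  }
  const { analysis, confidence, model } = result.data;
  return [
    'Analysis Result:',
    analysis,
    `Confidence: ${Math.round(confidence * 100)}%`, // 0.95 -> "95%"
    `Model: ${model}`,
  ].join('\n');
}
```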

Table

Formatted table output for quick viewing:

```
┌─────────────────────┬──────────────────────┐
│ Analysis            │ The image shows...   │
│ Confidence          │ 95%                  │
│ Model               │ gemini-1.0-pro-vision│
└─────────────────────┴──────────────────────┘
```

File Support

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
GIF (.gif)
BMP (.bmp)
TIFF (.tiff)
HEIC (.heic, .heif)
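
A script that feeds files to the CLI can skip unsupported files up front. The sketch below checks a path against the extensions listed above. `isSupportedImage` is a hypothetical helper, and matching extensions case-insensitively is an assumption about how the CLI behaves.

```javascript
const path = require('path');

// Extensions taken from the list of supported image formats.
const SUPPORTED_EXTENSIONS = new Set([
  '.jpg', '.jpeg', '.png', '.webp', '.gif', '.bmp', '.tiff', '.heic', '.heif',
]);

// Hypothetical helper: true if the file extension is in the supported list.
// Only the extension is inspected, not the file contents.
function isSupportedImage(filePath) {
  return SUPPORTED_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}
```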

File Size Limits

Default maximum file size: 10MB
Files larger than 4MB are uploaded instead of sent inline
Configurable via performance.max_file_size setting
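
The two limits interact as follows: a file above the maximum size is rejected, a file above the upload threshold is uploaded, and anything smaller is sent inline. This can be sketched as a single decision function. `chooseTransfer` and `DEFAULT_LIMITS` are hypothetical names, and the byte values are the defaults from the example configuration.

```javascript
// Defaults from the example configuration: 10 MB maximum, 4 MB upload threshold.
const DEFAULT_LIMITS = { maxFileSize: 10485760, uploadThreshold: 4194304 };

// Hypothetical helper: decide how a file of the given size would be sent.
function chooseTransfer(sizeBytes, limits = DEFAULT_LIMITS) {
  if (sizeBytes > limits.maxFileSize) {
    throw new RangeError(
      `file is ${sizeBytes} bytes, above the ${limits.maxFileSize}-byte limit`
    );
  }
  // "Larger than" the threshold means strictly greater.
  return sizeBytes > limits.uploadThreshold ? 'upload' : 'inline';
}
```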

Troubleshooting

Common Issues

Authentication Errors:

```
ai-vision config set provider google
ai-vision config set google.api_key YOUR_API_KEY
```

File Size Errors:

```
ai-vision config set performance.max_file_size 20971520  # 20MB
```

Timeout Issues:

```
ai-vision config set performance.timeout 60000  # 60 seconds
```

Debug Mode

Enable verbose logging:

```
ai-vision analyze image ./photo.jpg --verbose
```

View Configuration

```
ai-vision config show
```

Development

Building from Source

```
git clone https://github.com/majormark/ai-vision-cli.git
cd ai-vision-cli
npm install
npm run build
```

Running Tests

```
npm test
```

Linting

```
npm run lint
npm run lint:fix
```

Development Mode

```
npm run dev
```

API Integration

The CLI can be integrated into other tools and scripts:

JavaScript/Node.js

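```
const { execSync } = require('child_process');

// Run the CLI and capture its JSON output as a string.
// execSync throws if the command exits with a non-zero status.
const result = execSync(
  'ai-vision analyze image ./photo.jpg --output json',
  { encoding: 'utf8' }
);

// Parse the output and print the analysis text.
const analysis = JSON.parse(result);
console.log(analysis.data.analysis);
```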
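
When the image path or prompt comes from user input, passing arguments as an array avoids shell quoting problems. The sketch below builds such an array from the `analyze image` options documented in this README. `buildAnalyzeArgs` is a hypothetical helper, not something the package exports.

```javascript
// Hypothetical helper: build the argument list for `ai-vision analyze image`.
// Only flags documented in this README are used.
function buildAnalyzeArgs(image, { prompt, output = 'json', temperature, maxTokens } = {}) {
  const args = ['analyze', 'image', image, '--output', output];
  if (prompt !== undefined) args.push('--prompt', prompt);
  if (temperature !== undefined) args.push('--temperature', String(temperature));
  if (maxTokens !== undefined) args.push('--max-tokens', String(maxTokens));
  return args;
}
```

The result can be passed to `execFileSync('ai-vision', args, { encoding: 'utf8' })` from Node's `child_process` module, which runs the command without a shell. On Windows, npm-installed commands are `.cmd` shims, so that call may need adjusting.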

Shell Script

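```
#!/bin/bash

# Make sure the output directory exists before saving results.
mkdir -p ./results

for image in ./images/*.jpg; do
  # Skip the literal pattern when no .jpg files match.
  [ -e "$image" ] || continue
  echo "Analyzing $image..."
  ai-vision analyze image "$image" \
    --prompt "Describe this image" \
    --save "./results/$(basename "$image" .jpg).json"
done
```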

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Support

GitHub Issues: https://github.com/majormark/ai-vision-cli/issues
Documentation: https://github.com/majormark/ai-vision-cli

Changelog

v1.0.0

Initial release
Image analysis with customizable prompts
Object detection with bounding box annotations
Image comparison (up to 4 images simultaneously)
Multiple AI providers support (Google Gemini, Vertex AI)
Configuration management with YAML support
Advanced error handling with intelligent retry logic
Rate limiting and quota management
Health monitoring and provider connectivity checks
Multiple output formats (JSON, text, table)
Progress indicators for long-running operations

Architecture Overview

The AI Vision CLI is built with enterprise-grade reliability features:

Circuit Breaker Pattern: Automatic provider switching during failures
Exponential Backoff: Intelligent retry with jitter for network issues
Structured Logging: Comprehensive debugging with correlation IDs
Health Checks: Proactive provider monitoring and validation
Rate Limiting: Token bucket algorithm to prevent quota exhaustion
Metrics Collection: Performance tracking and success/failure rates
