[BUG] improve the packaging of PIP in modular way #41

@DarshanKumar89

Description

Current Package Size Analysis

Package Structure

multimind-sdk/
├── setup.py                    # Main package configuration
├── requirements-base.txt       # Base dependencies (16 packages)
├── requirements.txt            # Full dependencies (150+ packages)
├── multimind/                  # Source code (~40 modules)
│   ├── __init__.py
│   ├── agents/
│   ├── rag/
│   ├── fine_tuning/
│   ├── compliance/
│   ├── gateway/
│   └── ... (40+ modules)
└── examples/                   # Example code

Current Installation Options

1. Basic Installation (pip install multimind-sdk)

# Installs: requirements-base.txt (16 packages)
# Size: ~50MB
# Dependencies:
- openai>=1.0.0
- anthropic>=0.5.0
- pydantic>=2.0.0
- python-dotenv>=1.0.0
- fastapi>=0.100.0
- python-jose[cryptography]>=3.3.0
- python-multipart>=0.0.6
- click>=8.1.0
- rich>=13.0.0
- requests>=2.26.0
- typing-extensions>=4.5.0
- pytest>=7.0.0
- pytest-asyncio>=0.21.0
- black>=23.0.0
- isort>=5.12.0
- mypy>=1.0.0
- ruff>=0.1.0

2. Full Installation (pip install multimind-sdk[full])

# Installs: requirements.txt (150+ packages)
# Size: ~3GB
# Major dependencies:
- torch==2.7.0 (2GB+)
- transformers==4.52.3 (500MB+)
- accelerate==1.7.0
- peft==0.15.2
- chromadb==1.0.10
- faiss-cpu==1.11.0
- sentence-transformers==4.1.0
- numpy==2.2.6
- pandas==2.2.3
- scikit-learn==1.6.1
- scipy==1.15.3
- onnxruntime==1.22.0
- opentelemetry-api==1.33.1
- pinecone-client==6.0.0
- ... (140+ more packages)

Package Size Breakdown

Source Code Size

multimind/ directory: ~2MB
├── __init__.py: 3.2KB
├── config.py: 3.2KB
├── agents/: ~500KB
├── rag/: ~300KB
├── fine_tuning/: ~800KB
├── compliance/: ~200KB
├── gateway/: ~400KB
└── other modules: ~1MB

Dependency Size Analysis

Heavy Dependencies (>100MB each)

  1. PyTorch (torch==2.7.0): ~2GB

    • Deep learning framework
    • Used for fine-tuning and model operations
    • CPU version: ~800MB, GPU version: ~2GB
  2. Transformers (transformers==4.52.3): ~500MB

    • Hugging Face transformers library
    • Model loading and inference
    • Model weights and tokenizer files are downloaded separately at runtime and are not bundled in the package
  3. Accelerate (accelerate==1.7.0): ~200MB

    • Hugging Face accelerate
    • Distributed training support

Medium Dependencies (10-100MB each)

  1. ChromaDB (chromadb==1.0.10): ~50MB
  2. FAISS (faiss-cpu==1.11.0): ~40MB
  3. Sentence Transformers (sentence-transformers==4.1.0): ~30MB
  4. NumPy (numpy==2.2.6): ~20MB
  5. Pandas (pandas==2.2.3): ~15MB

Light Dependencies (<10MB each)

  • OpenAI client: ~5MB
  • Anthropic client: ~3MB
  • FastAPI: ~8MB
  • Pydantic: ~2MB
  • Click: ~1MB
  • Rich: ~2MB
  • ... (100+ more packages)
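The size figures above are estimates, so it may help to measure them in a real environment. The following is a minimal sketch that sums the on-disk size of the files recorded for each installed distribution using the standard-library `importlib.metadata`. The helper names `installed_size_bytes` and `size_report` are hypothetical, and the result excludes anything a library downloads later (for example model weights fetched at runtime).

```python
from importlib import metadata
from pathlib import Path


def installed_size_bytes(dist_name):
    """Sum the on-disk size of every file recorded for an installed distribution."""
    dist = metadata.distribution(dist_name)
    total = 0
    # dist.files is None when the distribution has no file record
    for entry in dist.files or []:
        path = Path(dist.locate_file(entry))
        if path.is_file():
            total += path.stat().st_size
    return total


def size_report(dist_names):
    """Return (name, size in MB) pairs, largest first; skip packages that are not installed."""
    rows = []
    for name in dist_names:
        try:
            rows.append((name, installed_size_bytes(name) / 1_000_000))
        except metadata.PackageNotFoundError:
            continue
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Names taken from the dependency lists in this issue
    for name, mb in size_report(["torch", "transformers", "chromadb", "faiss-cpu", "numpy"]):
        print(f"{name:<24}{mb:>10.1f} MB")
```

Running this once in a basic environment and once in a full environment would give measured numbers to put in the documentation instead of estimates.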

Impact on Existing Users

Current User Base

  • 800+ downloads of multimind-sdk
  • Users expect current functionality to work
  • Cannot break backward compatibility

User Scenarios

Scenario 1: RAG-Only Users

# Current: the full install pulls in everything (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~200MB
- OpenAI/Anthropic clients
- Sentence transformers
- ChromaDB/FAISS
- NumPy/scikit-learn

Scenario 2: Agent-Only Users

# Current: the full install pulls in everything (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~10MB
- Click, Rich
- Async support
- Core utilities

Scenario 3: Fine-tuning Users

# Current: the full install pulls in everything (~3GB)
pip install multimind-sdk[full]

# What they actually need: ~2.5GB
- PyTorch, Transformers
- PEFT, Accelerate
- Datasets, Tokenizers
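The three scenarios map fairly directly onto setuptools extras. Below is a rough sketch of how `setup.py` could build its `extras_require` mapping so that the existing `full` extra keeps resolving to `requirements.txt` while feature extras are added next to it. The extra names `rag`, `agents`, and `ai-core` come from this issue; the exact package lists, the missing version pins, and the helpers `parse_requirements` and `build_extras` are assumptions for illustration only.

```python
def parse_requirements(text):
    """Parse requirements-file text, ignoring blank lines, comments, and pip options."""
    reqs = []
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop inline comments
        if line and not line.startswith(("#", "-")):
            reqs.append(line)
    return reqs


# Hypothetical grouping, based on the "what they actually need" lists above.
# Version pins are omitted here and would need to match requirements.txt.
FEATURE_EXTRAS = {
    "rag": ["sentence-transformers", "chromadb", "faiss-cpu", "numpy", "scikit-learn"],
    "agents": [],  # Click and Rich are already part of the base requirements
    "ai-core": ["torch", "transformers", "peft", "accelerate", "datasets", "tokenizers"],
}


def build_extras(full_requirements, feature_extras=FEATURE_EXTRAS):
    """Add feature extras while leaving the existing 'full' extra exactly as it is."""
    extras = {name: list(reqs) for name, reqs in feature_extras.items()}
    extras["full"] = list(full_requirements)
    return extras


# In setup.py this might be wired up roughly as:
#   full = parse_requirements(Path("requirements.txt").read_text())
#   setup(..., extras_require=build_extras(full))
```

Because `full` is still read from the existing file, `pip install multimind-sdk` and `pip install multimind-sdk[full]` would behave as they do today; the new extras are purely additive.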

Recommendations for Existing Users

Immediate Actions (Keep Current Package)

  1. Don't change current package - 800+ users depend on it
  2. Keep backward compatibility - All existing installations must work
  3. Add better documentation - Help users understand size implications

Short-term Improvements

  1. Add feature-based extras (optional for users)
  2. Improve documentation about package sizes
  3. Add size warnings for large installations
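Once heavy dependencies move into optional extras, modules that use them need to fail with a clear message when the extra is missing. One common pattern is a small import guard, sketched below. The function name `require` and the module-to-extra mapping are hypothetical and not part of the current multimind code.

```python
import importlib

# Hypothetical mapping from an importable module to the extra that provides it
EXTRA_FOR_MODULE = {
    "torch": "ai-core",
    "transformers": "ai-core",
    "chromadb": "rag",
    "faiss": "rag",
}


def require(module_name):
    """Import an optional dependency, or raise an ImportError naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fall back to the existing 'full' extra for modules not listed above
        extra = EXTRA_FOR_MODULE.get(module_name, "full")
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "multimind-sdk[{extra}]"'
        ) from exc
```

A fine-tuning module could then call `torch = require("torch")` inside the function that needs it rather than at import time, so that `import multimind` keeps working in a basic installation.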

Long-term Strategy

  1. Create modular packages alongside current package
  2. Encourage gradual migration to smaller packages
  3. Maintain legacy support for 1+ years

User Communication Strategy

1. Size Transparency

# README.md
## Package Sizes

### Current Installation
- `pip install multimind-sdk`: ~50MB (basic)
- `pip install multimind-sdk[full]`: ~3GB (complete)

### Recommended for New Users
- RAG only: `pip install multimind-sdk[rag]` (~200MB)
- Agents only: `pip install multimind-sdk[agents]` (~10MB)
- Full AI: `pip install multimind-sdk[ai-core]` (~2.5GB)

2. Backward Compatibility

## For Existing Users

Your current installation will continue to work:
```bash
pip install multimind-sdk  # Still works!
```

No breaking changes will be made to the current package.

Conclusion

Current State

  • Basic installation: ~50MB (reasonable)
  • Full installation: ~3GB (very large)
  • 800+ existing users: Must maintain compatibility

Recommended Actions

  1. Keep current package unchanged (critical)
  2. Add feature-based extras (improvement)
  3. Create modular packages (future)
  4. Maintain backward compatibility (long-term)

Benefits

  • ✅ No disruption to existing users
  • ✅ Better experience for new users
  • ✅ Path to true modular architecture
  • ✅ Sustainable development model
