@michaelfeil
Description
When a request is sent to a reranker model such as BAAI/bge-reranker-v2-m3, the worker processes the rerank request successfully but crashes when it tries to return the result, so the client times out after 60 seconds. The error indicates that RerankReturnType objects from the infinity-emb library are not JSON serializable.
Error Message
Error while returning job result. | Object of type RerankReturnType is not JSON serializable
Environment
- RunPod Serverless: Yes
- infinity-emb version: 0.0.76
- runpod version: ~1.7.0
- Base Image: nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
- Python: 3.11
Steps to Reproduce
- Deploy the worker-infinity-embedding to RunPod Serverless
- Send a rerank request with the following structure:
  {
    "input": {
      "query": "your search query",
      "docs": ["doc1", "doc2", "doc3"],
      "model": "your-rerank-model",
      "return_docs": true
    }
  }
- The worker processes the request successfully but crashes when returning the result
- The client times out after 60 seconds
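To reproduce the timeout from a script instead of the RunPod console, here is a minimal client sketch that uses only the standard library. It assumes RunPod's synchronous endpoint URL pattern `https://api.runpod.ai/v2/<endpoint_id>/runsync` and a bearer API key. The endpoint ID, API key and model name are placeholders, not values from this report, and `build_rerank_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"


def build_rerank_request(endpoint_id: str, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the rerank payload from this report."""
    payload = {
        "input": {
            "query": "your search query",
            "docs": ["doc1", "doc2", "doc3"],
            "model": "your-rerank-model",
            "return_docs": True,
        }
    }
    # Assumed RunPod serverless synchronous URL pattern.
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 90.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling `send(build_rerank_request(ENDPOINT_ID, API_KEY))` against an affected deployment should show the behavior described above instead of a rerank result.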
Root Cause
handler.py returns the result of embedding_service.infinity_rerank() directly, and that result is a RerankReturnType Pydantic model object. RunPod's serverless framework serializes the handler's return value to JSON, so the value must be made of plain Python types (dicts, lists, strings, numbers) rather than model objects.
The issue is in handler.py at this code block:
if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }
And later:
try:
    out = await call_fn(**kwargs)
    return out  # ❌ This returns a Pydantic model, not a dict
except Exception as e:
    return create_error_response(str(e)).model_dump()
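The error text matches what Python's `json` module raises for any object type its default encoder does not know. The stdlib-only sketch below reproduces it. The class is a hypothetical stand-in that only borrows the name `RerankReturnType` so the message matches and mimics a Pydantic-style `model_dump()`; it is not the real infinity-emb class. It also assumes, without confirming it, that the RunPod SDK encodes job results with `json` in a comparable way.

```python
import json
from typing import Optional


class RerankReturnType:
    """Hypothetical stand-in: same name as the infinity-emb class so the
    error text matches, plus a model_dump() imitating Pydantic's method."""

    def __init__(self, index: int, relevance_score: float, document: Optional[str]):
        self.index = index
        self.relevance_score = relevance_score
        self.document = document

    def model_dump(self) -> dict:
        return {
            "index": self.index,
            "relevance_score": self.relevance_score,
            "document": self.document,
        }


out = RerankReturnType(index=0, relevance_score=0.93, document="doc1")

try:
    # Encoding the raw object fails, as in the worker log.
    json.dumps(out)
except TypeError as exc:
    error_text = str(exc)
    print(error_text)  # Object of type RerankReturnType is not JSON serializable

# Converting to a plain dict first makes encoding succeed.
encoded = json.dumps(out.model_dump())
print(encoded)  # {"index": 0, "relevance_score": 0.93, "document": "doc1"}
```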
Proposed Solution
Convert all Pydantic model responses to dictionaries before returning:
try:
    out = await call_fn(**kwargs)
    # Convert Pydantic models to dicts (model_dump() in Pydantic v2, dict() in v1)
    if hasattr(out, 'model_dump'):
        return out.model_dump()
    elif hasattr(out, 'dict'):
        return out.dict()
    return out
except Exception as e:
    return create_error_response(str(e)).model_dump()
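The top-level `hasattr` check only helps when `call_fn` returns a single model object. If the result were instead a list or dict containing RerankReturnType instances, or if the installed infinity-emb version defined RerankReturnType as a dataclass rather than a Pydantic model (neither is confirmed here), the same error would persist. A more defensive sketch under those assumptions converts recursively. `to_jsonable` is a hypothetical helper name, not part of handler.py or infinity-emb.

```python
import dataclasses
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert Pydantic models, dataclass instances and
    containers into plain JSON-serializable Python objects."""
    # Skip classes themselves; only convert instances.
    if isinstance(value, type):
        return value
    # Pydantic v2 models expose model_dump().
    if hasattr(value, "model_dump"):
        return to_jsonable(value.model_dump())
    # Pydantic v1 models expose a callable dict().
    if hasattr(value, "dict") and callable(value.dict):
        return to_jsonable(value.dict())
    # Dataclass instances (asdict also recurses into nested dataclasses).
    if dataclasses.is_dataclass(value):
        return to_jsonable(dataclasses.asdict(value))
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    # Strings, numbers, booleans and None pass through unchanged.
    return value
```

With this helper, the success branch in handler.py would reduce to `return to_jsonable(out)`, and the same call could wrap every route's result, not only reranking.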
Alternatively, ensure each route handler explicitly converts its response:
if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }
    result = await call_fn(**kwargs)
    return result.model_dump() if hasattr(result, 'model_dump') else result
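To check the fix locally without deploying, the handler logic can be run against a stubbed rerank call and its output passed through `json.dumps`. This sketch assumes the handler is an async function that reads `job["input"]`. `StubEmbeddingService`, `FakeRerankResult` and their `scores`/`docs` fields are hypothetical and only imitate a model object with `model_dump()`. The real response shape and the error-handling branch are omitted.

```python
import asyncio
import json
from typing import Any, Optional


class FakeRerankResult:
    """Hypothetical stand-in exposing a Pydantic-style model_dump()."""

    def __init__(self, scores: list[float], docs: Optional[list[str]]):
        self.scores = scores
        self.docs = docs

    def model_dump(self) -> dict:
        return {"scores": self.scores, "docs": self.docs}


class StubEmbeddingService:
    """Hypothetical stub for embedding_service returning fixed scores."""

    async def infinity_rerank(self, query, docs, return_docs, model_name):
        scores = [1.0 / (i + 1) for i in range(len(docs))]
        return FakeRerankResult(scores, docs if return_docs else None)


embedding_service = StubEmbeddingService()


async def handler(job: dict) -> Any:
    job_input = job["input"]
    if job_input.get("query"):
        call_fn, kwargs = embedding_service.infinity_rerank, {
            "query": job_input.get("query"),
            "docs": job_input.get("docs"),
            "return_docs": job_input.get("return_docs"),
            "model_name": job_input.get("model"),
        }
    else:
        return {"error": "only the rerank route is stubbed in this sketch"}
    out = await call_fn(**kwargs)
    # The fix under test: return a plain dict, not the model object.
    return out.model_dump() if hasattr(out, "model_dump") else out


job = {"input": {"query": "q", "docs": ["doc1", "doc2"], "model": "m", "return_docs": True}}
result = asyncio.run(handler(job))
# json.dumps raises TypeError if a non-serializable object slipped through.
print(json.dumps(result))  # {"scores": [1.0, 0.5], "docs": ["doc1", "doc2"]}
```

Removing the `model_dump()` conversion from `handler` makes the final `json.dumps` call raise the same TypeError reported in this issue, which makes the check usable as a regression test.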
Additional Context
- Embedding requests work fine (possibly because they return serializable structures)
- Error responses work correctly (they use .model_dump())
- This affects all rerank operations