@noooop noooop commented Sep 1, 2025

TL;DR

Use --override_pooler_config to support mxbai-rerank's sigmoid_normalize:

vllm serve mixedbread-ai/mxbai-rerank-base-v2 \
  --runner pooling \
  --hf_overrides '{"architectures":["Qwen2ForSequenceClassification"],"classifier_from_token":["0","1"],"method":"from_2_way_softmax"}' \
  --override_pooler_config '{"logit_bias": 4.5}' 

Set logit_bias to half of the model's estimated_max, as defined in the mxbai-rerank sources:

https://github.com/mixedbread-ai/mxbai-rerank/blob/21d9e79f181298b8dd436bef20d7ac3d80643c9a/mxbai_rerank/mxbai_rerank_v2.py#L20-L25

https://github.com/mixedbread-ai/mxbai-rerank/blob/21d9e79f181298b8dd436bef20d7ac3d80643c9a/mxbai_rerank/utils.py#L8-L21

  • mixedbread-ai/mxbai-rerank-base-v2: {"logit_bias": 4.5}
  • mixedbread-ai/mxbai-rerank-large-v2: {"logit_bias": 6.0}
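As a rough illustration of how these values relate, here is a minimal sketch of sigmoid normalization with a bias. It assumes the bias is subtracted from the raw relevance logit before the sigmoid is applied, which is how the linked utils.py reads. The `ESTIMATED_MAX` table and both function names are hypothetical. The 9.0 and 12.0 values are inferred by doubling the logit_bias settings above, not read from the mxbai-rerank source.

```python
import math

# Assumption: estimated_max values inferred as 2 * logit_bias from the list above.
ESTIMATED_MAX = {
    "mixedbread-ai/mxbai-rerank-base-v2": 9.0,
    "mixedbread-ai/mxbai-rerank-large-v2": 12.0,
}


def logit_bias_for(model_name: str) -> float:
    """Return half of estimated_max, the value passed as logit_bias."""
    return ESTIMATED_MAX[model_name] / 2.0


def sigmoid_normalize(logit: float, logit_bias: float) -> float:
    """Shift the logit by the bias, then squash it into (0, 1).

    Assumption: the bias is subtracted before the sigmoid.
    """
    z = logit - logit_bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Numerically stable branch for negative inputs.
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    bias = logit_bias_for("mixedbread-ai/mxbai-rerank-base-v2")
    # A raw logit equal to the bias maps to exactly 0.5.
    print(bias, sigmoid_normalize(bias, bias))
```

Under this reading, a raw logit equal to the bias lands at the midpoint 0.5, so the bias acts as the decision threshold between "not relevant" and "relevant".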

Demo:

import requests

url = "http://127.0.0.1:8000/score"
MODEL_NAME = "mixedbread-ai/mxbai-rerank-base-v2"

# Please use the query_template and document_template to format the query and
# document for better reranker results.

prefix = "<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n"

query_template = "{prefix}query: {query}\n"
document_template = "document: {doc}\n{instruction}{suffix}"

instruction = "You are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:"

queries = [
    "Who wrote To Kill a Mockingbird?"
]

documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

queries = [
    query_template.format(prefix=prefix, query=query)
    for query in queries
]
documents = [
    document_template.format(doc=doc, suffix=suffix, instruction=instruction) for doc in documents
]


response = requests.post(url,
                         json={
                             "model": MODEL_NAME,
                             "text_1": queries,
                             "text_2": documents,
                             "truncate_prompt_tokens": -1,
                         }).json()
for i, r in enumerate(response["data"]):
    print(i, r["score"])

0 0.9945342540740967
1 0.0470464788377285
2 0.9746929407119751
3 0.12403740733861923
4 0.026046426966786385
5 0.02023334428668022

These scores are similar to those returned by model.rank(query, documents, normalize=True):


import torch
from mxbai_rerank import MxbaiRerankV2

model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2", torch_dtype=torch.float32)

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Let's get the scores
results = model.rank(query, documents, normalize=True)

# Sort by document index so the scores line up with the input order
results.sort(key=lambda x: x.index)

for i, r in enumerate(results):
    print(i, r.score)


"""
0 0.9941529631614685
1 0.0521107017993927
2 0.9704784750938416
3 0.2976154386997223
4 0.06989647448062897
5 0.028927486389875412
"""
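To put a number on "similar", the two score lists above can be compared directly. The sketch below is only an illustration using the values printed in this description. The variable and function names are hypothetical, and a single query is not a substitute for a proper evaluation.

```python
# Scores copied from the two outputs above, in document order.
vllm_scores = [
    0.9945342540740967, 0.0470464788377285, 0.9746929407119751,
    0.12403740733861923, 0.026046426966786385, 0.02023334428668022,
]
reference_scores = [
    0.9941529631614685, 0.0521107017993927, 0.9704784750938416,
    0.2976154386997223, 0.06989647448062897, 0.028927486389875412,
]


def ranking(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Largest per-document gap between the two implementations.
max_abs_diff = max(abs(a - b) for a, b in zip(vllm_scores, reference_scores))

if __name__ == "__main__":
    print("vLLM ranking:     ", ranking(vllm_scores))
    print("reference ranking:", ranking(reference_scores))
    print("max abs diff:     ", round(max_abs_diff, 4))
```

For this query the top three documents (0, 2, 3) are ranked identically. Documents 1 and 4 swap places further down, and the largest score gap, about 0.17, is on document 3.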

Purpose

Add logit_bias / sigmoid_normalize support for classification models.

Fixes #22983
Addresses #19675 (comment)

Test Plan

pytest -s -vvv tests/models/multimodal/pooling/test_jinavl_reranker.py

Test Result

pass



Signed-off-by: wang.yuqi <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for logit_bias in classification models, which is useful for models like mxbai-rerank. The implementation correctly adds the logit_bias parameter to the PoolerConfig and applies it within the ClassifierPooler. Additionally, this PR includes a significant and beneficial refactoring of JinaVLForSequenceClassification, correcting its integration with the pooling mechanism by properly using the ClassifierPooler and removing hardcoded logic. This makes the implementation cleaner and more robust. I've found one critical issue in the implementation that needs to be addressed.

noooop and others added 3 commits September 1, 2025 16:13
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>

noooop commented Sep 2, 2025

cc @DarkLight1337

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 2, 2025 11:55
@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Sep 2, 2025
@DarkLight1337 DarkLight1337 merged commit e0653f6 into vllm-project:main Sep 2, 2025
52 checks passed

devang-sifthub commented Sep 2, 2025

Thanks for the update, really appreciate it!
When will this be released?

rzabarazesh pushed a commit to rzabarazesh/vllm that referenced this pull request Sep 2, 2025
…llm-project#24031)

Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

noooop commented Sep 3, 2025

> Thanks for the update, really appreciate it! When will this be released?

vLLM provides wheels for every commit, built for Linux on x86 platforms with CUDA 12.

You can install the latest code at any time:

https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#install-the-latest-code_1

vLLM releases a new version approximately every four weeks.

@noooop noooop deleted the sigmoid_normalize branch September 3, 2025 00:25
akaihaoshuai pushed a commit to akaihaoshuai/vllm that referenced this pull request Sep 3, 2025
…llm-project#24031)

Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: 子悬 <[email protected]>
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Sep 3, 2025
…llm-project#24031)

Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Matthew Bonanni <[email protected]>
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Sep 3, 2025
…llm-project#24031)

Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
842974287 pushed a commit to 842974287/vllm that referenced this pull request Sep 3, 2025
…llm-project#24031)

Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Shiyan Deng <[email protected]>