We have released our model checkpoint on Hugging Face.
Our technical report is available on arXiv.
The WWW 2025 Multimodal CTR Prediction Challenge: https://www.codabench.org/competitions/5372/
The overall architecture of our solution is shown below.
Our model (without the future-work components) achieves an AUC of 0.9839 on the test set.
Our model is roughly based on the Transformer and DCNv2; two future-work directions are also shown in the right part of the figure.
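For background, the core of DCNv2 is its cross network, which computes x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l at each layer. Below is a minimal PyTorch sketch of this idea only; the actual layers in our model come from FuxiCTR and the config files, and the class names here are illustrative:

```python
import torch
import torch.nn as nn

class CrossLayerV2(nn.Module):
    """One DCNv2 cross layer: x_{l+1} = x_0 * (W x_l + b) + x_l."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # Element-wise interaction with the original input, plus a residual term.
        return x0 * self.linear(xl) + xl

class CrossNetV2(nn.Module):
    """A stack of DCNv2 cross layers."""

    def __init__(self, dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(CrossLayerV2(dim) for _ in range(num_layers))

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        x = x0
        for layer in self.layers:
            x = layer(x0, x)
        return x
```

The two future-work directions are: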
- Semantic embeddings with quantization (done; tuning in progress)
  - Motivated by the success of quantization in computer vision, we believe quantization can also be applied to multimodal RecSys. Vector Quantization (VQ) and Residual Quantization (RQ) are used to quantize the original multimodal embeddings, transforming the frozen multimodal embeddings into discrete, learnable semantic codes; a minimal RQ sketch is given after this list.
  - The quantization code is provided in src/Transformer_DCN_Quant.py, and the tuning work is still in progress.
- Semantic similarity scores as part of the Transformer input (in progress)
  - Multimodal embeddings contain rich semantic information, and users have specific preferences over different parts of it. Exploiting semantic similarity explicitly in the model should therefore improve performance.
  - We plan to feed semantic similarity scores into the Transformer as additional input features; a sketch is given further below.
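Below is a minimal sketch of the residual-quantization idea behind the first direction. It is illustrative only: the class and parameter names are hypothetical, and our actual implementation lives in src/Transformer_DCN_Quant.py and may differ.

```python
import torch
import torch.nn as nn

class ResidualQuantizer(nn.Module):
    """Minimal residual quantization (RQ): each stage quantizes the residual
    left over from the previous stage using its own VQ codebook."""

    def __init__(self, dim: int, codebook_size: int = 256, num_stages: int = 4):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_stages)
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, dim) frozen multimodal embeddings.
        residual = x
        quantized = torch.zeros_like(x)
        codes = []
        for codebook in self.codebooks:
            # Pick the nearest codeword for the current residual.
            dists = torch.cdist(residual, codebook.weight)  # (batch, codebook_size)
            idx = dists.argmin(dim=-1)                      # (batch,)
            selected = codebook(idx)                        # (batch, dim)
            quantized = quantized + selected
            residual = residual - selected
            codes.append(idx)
        # Straight-through estimator keeps the output differentiable w.r.t. x;
        # in practice a codebook/commitment loss (as in VQ-VAE) trains the codebooks.
        quantized = x + (quantized - x).detach()
        # `codes` are the discrete semantic codes used downstream.
        return quantized, torch.stack(codes, dim=-1)
```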
Both future directions aim to better exploit the semantic information in the multimodal representations.
We believe these directions have great potential, and work on them continues even after the challenge.
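As an illustration of the second direction, one simple formulation computes the cosine similarity between the candidate item's multimodal embedding and each item in the user's behavior sequence, then appends the scores to the Transformer input. The function below is a hypothetical sketch, not our final design:

```python
import torch
import torch.nn.functional as F

def semantic_similarity_features(target_emb: torch.Tensor,
                                 seq_embs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the candidate item and each behavior-sequence
    item, for use as extra Transformer input features.

    target_emb: (batch, dim) multimodal embedding of the candidate item
    seq_embs:   (batch, seq_len, dim) embeddings of behavior-sequence items
    returns:    (batch, seq_len, 1) similarity scores
    """
    scores = F.cosine_similarity(seq_embs, target_emb.unsqueeze(1), dim=-1)
    return scores.unsqueeze(-1)

# Example: concatenate the scores with the sequence token features before
# they enter the Transformer, e.g.
# tokens = torch.cat([seq_features, semantic_similarity_features(t, s)], dim=-1)
```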
We ran the experiments on a customized RTX 4080 Super GPU server with 32 GB of memory (vGPU-32G) from AutoDL.
Requirements:
- fuxictr==2.3.7
- numpy==1.26.4
- pandas==2.2.3
- scikit_learn==1.4.0
- torch==1.13.1+cu117
Environment setup:
conda create -n fuxictr_momo python=3.9
conda activate fuxictr_momo
pip install -r requirements.txt
sh ./run.sh
This script will run the whole pipeline, including model training and prediction.
- Train the model on the train and validation sets:
  python run_expid.py --config config/Transformer_DCN_microlens_mmctr_tuner_config_01 --expid Transformer_DCN_MicroLens_1M_x1_001_820c435c --gpu 0
  The best validation AUC we obtained was 0.976603.
- Make predictions on the test set:
  python prediction.py --config config/Transformer_DCN_microlens_mmctr_tuner_config_01 --expid Transformer_DCN_MicroLens_1M_x1_001_820c435c --gpu 0
- Submission results on the leaderboard:
  In particular, we obtained 0.9814 (Submission ID: 246140) with the un-tuned Residual Quantization (RQ) model (part of future work 1).