Official implementation of **TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing** (in collaboration with Google Cloud AI), accepted to AAAI 2026 (Main Technical Track).
🤗 TabFlash is an efficient and accurate multimodal LLM that achieves state-of-the-art performance, outperforming GPT-4o and Gemini 2.5 Pro at exceptionally low computational cost.

🏆 TabFlash (3B) achieves state-of-the-art performance while reducing FLOPs by 27% and memory usage by 30% compared to the second-best MLLM.

⚡ TabFlash (1B) outperforms most MLLMs with exceptionally low TFLOPs and just 11.2 GB of peak memory, enabling deployment on low-memory GPUs.
## Installation

This code is tested on Python 3.9, CUDA 12.4, PyTorch 2.4.1, and FlashAttention 2.7.3.

```bash
conda create -n tabflash python=3.9 -y
conda activate tabflash
```

Install the dependencies, following the official InternVL guide:

```bash
cd InternVL
pip install -r requirements.txt
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
git clone --branch v2.7.3 --single-branch https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install
cd ..
pip install wandb sacrebleu distance apted bitsandbytes --upgrade
pip install datasets==2.18.0
```
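To confirm the environment matches the tested versions, a quick check like the following can help (our suggestion, not part of the repository scripts):

```bash
# Sanity check: verify the tested PyTorch / CUDA / FlashAttention versions are installed.
python -c "import torch; print('torch', torch.__version__, '| CUDA', torch.version.cuda, '| GPU available:', torch.cuda.is_available())"
python -c "import flash_attn; print('flash-attn', flash_attn.__version__)"
```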
## Data Preparation

TabFlash uses MMTab from Table-LLaVA.

### Pretraining data

- Download `MMTab-instruct_table_images_82K.zip` and `MMTab-pre_table_images_part_2_16K.zip`.
- Place them under `data/LLaVA-Pretrain/images` and unzip them. Rename the `IID_train_image` directory to `table_pretrain_part_1` (see the sketch after this list).
- Download `table_only_pretrain_data_with_length.jsonl` and place it under `data/LLaVA-Pretrain`.
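For reference, the placement steps above amount to roughly the following (a sketch that assumes the zips and the JSONL were already downloaded into the repository root, and that the second zip unpacks directly to `table_pretrain_part_2/`):

```bash
mkdir -p data/LLaVA-Pretrain/images
unzip MMTab-instruct_table_images_82K.zip -d data/LLaVA-Pretrain/images
unzip MMTab-pre_table_images_part_2_16K.zip -d data/LLaVA-Pretrain/images
# The instruct zip unpacks to IID_train_image/; rename it as required.
mv data/LLaVA-Pretrain/images/IID_train_image data/LLaVA-Pretrain/images/table_pretrain_part_1
mv table_only_pretrain_data_with_length.jsonl data/LLaVA-Pretrain/
```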
### Finetuning data

- Download `MMTab-instruct_table_images_82K.zip`.
- Place it under `data/LLaVA-Finetune/images/table_instructV` and unzip it. Rename the resulting `IID_train_image` directory to `images` (see the sketch after this list).
- Download `table_only_sft_data_with_length.jsonl` and place it under `data/LLaVA-Finetune`.
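The analogous sketch for the finetuning data (same assumptions as above):

```bash
mkdir -p data/LLaVA-Finetune/images/table_instructV
unzip MMTab-instruct_table_images_82K.zip -d data/LLaVA-Finetune/images/table_instructV
# Rename the unpacked IID_train_image/ directory to images/.
mv data/LLaVA-Finetune/images/table_instructV/IID_train_image data/LLaVA-Finetune/images/table_instructV/images
mv table_only_sft_data_with_length.jsonl data/LLaVA-Finetune/
```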
### Inference data

- Download `MMTab-eval_test_data_49K_llava_jsonl_format.jsonl` and `MMTab-eval_table_images_23K.zip`.
- Place them under `data/LLaVA-Inference` and unzip the zip file.

### Evaluation data

- Download `MMTab-eval_test_data_49K.json` and `MMTab-eval_test_tables_23K.json`.
- Place them under `data/MMTab-eval_evaluation` (a combined sketch for both steps follows).
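A combined sketch for the inference and evaluation files (again assuming the downloads sit in the repository root, and that the image zip unpacks to `all_test_image/`):

```bash
mkdir -p data/LLaVA-Inference data/MMTab-eval_evaluation
mv MMTab-eval_test_data_49K_llava_jsonl_format.jsonl data/LLaVA-Inference/
unzip MMTab-eval_table_images_23K.zip -d data/LLaVA-Inference
mv MMTab-eval_test_data_49K.json MMTab-eval_test_tables_23K.json data/MMTab-eval_evaluation/
```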
## Directory Structure

```
TabFlash/
├── InternVL/
│   ├── internvl_chat/
│   │   ├── scripts/
│   │   ├── inference.py
│   │   ├── mmtab_eval.py
│   │   └── ...
│   └── ...
├── data/
│   ├── LLaVA-Pretrain/
│   │   ├── images/
│   │   │   ├── table_pretrain_part_1/
│   │   │   └── table_pretrain_part_2/
│   │   └── table_only_pretrain_data_with_length.jsonl
│   ├── LLaVA-Finetune/
│   │   ├── images/
│   │   │   └── table_instructV/
│   │   │       └── images/
│   │   └── table_only_sft_data_with_length.jsonl
│   ├── LLaVA-Inference/
│   │   ├── all_test_image/
│   │   └── MMTab-eval_test_data_49K_llava_jsonl_format.jsonl
│   └── MMTab-eval_evaluation/
│       ├── MMTab-eval_test_data_49K.json
│       └── MMTab-eval_test_tables_23K.json
├── assets/
│   ├── acc_tflops_plot.png
│   └── ...
└── README.md
```
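Before training, you can sanity-check that the data layout matches the tree above with a short loop like this (a convenience snippet, not part of the repo):

```bash
# Report which of the expected data paths are present.
for p in \
  data/LLaVA-Pretrain/images/table_pretrain_part_1 \
  data/LLaVA-Pretrain/table_only_pretrain_data_with_length.jsonl \
  data/LLaVA-Finetune/images/table_instructV/images \
  data/LLaVA-Finetune/table_only_sft_data_with_length.jsonl \
  data/LLaVA-Inference/all_test_image \
  data/MMTab-eval_evaluation/MMTab-eval_test_data_49K.json; do
  [ -e "$p" ] && echo "OK      $p" || echo "MISSING $p"
done
```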
Move into the directory below for training / inference / evaluation:

```bash
cd InternVL/internvl_chat/
```

## Model Checkpoints

If you only want to use the model, download `tabflash_stage2_4b.tar` and `tabflash_stage2_1b.tar` and extract them under `work_dirs/internvl_chat_v2_5/tabflash_4b` and `work_dirs/internvl_chat_v2_5/tabflash_1b`, respectively.
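Assuming the tarballs were downloaded into `InternVL/internvl_chat/`, extraction might look like the following (a sketch; adjust paths if the archives contain a top-level directory):

```bash
mkdir -p work_dirs/internvl_chat_v2_5/tabflash_4b work_dirs/internvl_chat_v2_5/tabflash_1b
tar -xf tabflash_stage2_4b.tar -C work_dirs/internvl_chat_v2_5/tabflash_4b
tar -xf tabflash_stage2_1b.tar -C work_dirs/internvl_chat_v2_5/tabflash_1b
```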
## Training

If you want to train the model from scratch, follow the instructions below. TabFlash training consists of two stages.

Stage 1:

```bash
bash scripts/4b_train_stage1.sh # For 4B model
bash scripts/1b_train_stage1.sh # For 1B model
```

Stage 2:

```bash
bash scripts/4b_train_stage2.sh # For 4B model
bash scripts/1b_train_stage2.sh # For 1B model
```

## Inference

Run inference on the test set:

```bash
bash scripts/4b_inference.sh # For 4B model
bash scripts/1b_inference.sh # For 1B model
```

## Evaluation

Evaluate the model predictions:
```bash
python mmtab_eval.py --pred_file results/{exp_name}/result.jsonl
```
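To spot-check a few predictions before scoring, you can pretty-print the first records of the output file (the record schema is whatever the inference script writes; this snippet only assumes one JSON object per line):

```bash
# Pretty-print the first three prediction records.
head -n 3 results/{exp_name}/result.jsonl | python -m json.tool --json-lines
```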
## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{kim2026tabflash,
  title={TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing},
  author={Kim, Jongha and Bae, Minseong and Lee, Sanghyeok and Yoon, Jinsung and Kim, Hyunwoo J},
  booktitle={AAAI},
  year={2026}
}
```

## Acknowledgements

This codebase is based on InternVL and Table-LLaVA.
