CommitSuite is a comprehensive benchmark designed for both commit classification and commit message generation tasks. By integrating CCS (Conventional Commits Specification) and high-quality commit messages, we construct a large-scale, clean dataset covering multiple languages and repositories. We propose a new binary evaluation metric system that enables automated assessment of commit messages without relying on human-written references. Experiments demonstrate that LLMs outperform existing tools in both tasks, especially in message generation.
Our dataset comprises 63,581 high-quality commits from 243 open-source repositories, all strictly following the CCS format and covering seven popular programming languages.
(counts only include commits modifying files in a single language)
| Language | Commit Count |
|---|---|
| Go | 14807 |
| JavaScript | 10903 |
| TypeScript | 18835 |
| Python | 12740 |
| C/C++ | 5612 |
| Java | 2009 |
- `url`
- `hash`
- `msg`
- `message_type`
- `description`
- `why`
- `what`
- `author`
- `email`
- `date`
- `modified_files` (array)
  - `old_path`
  - `new_path`
  - `filename`
  - `change_type`
  - `diff`
  - `added_lines`
  - `deleted_lines`
- `issues` (array)
  - `number`
  - `title`
  - `body`
  - `url`
- `prs` (array)
- `comments` (array)
- `repo_name`
- `strict_ccs`
- `ast_changes` (array)
  - `file`
  - `changes` (array)
    - `type`
    - `name`
    - `structure_type`
    - `location`
    - `details` (object)

Field descriptions:

- `url`: URL of the commit
- `hash`: Hash identifier of the commit
- `msg`: Full commit message (raw text)
- `message_type`: Commit type specified by the developer (e.g., `fix`, `feat`)
- `description`: Descriptive text following the commit type (use this field to hide type information from tools/LLMs)
- `why`: Indicates presence of "why" rationale in the commit message (`0` = absent, `1` = present)
- `what`: Indicates presence of "what" change information in the commit message (`0` = absent, `1` = present)
- `author`: Developer who authored the commit
- `email`: Author's email address
- `date`: Commit date (YYYY-MM-DD format)
- `modified_files`: Metadata for modified files, including diffs
- `issues`: Issues referenced in the commit message (via `#xxx` notation)
- `prs`: Pull requests referenced in the commit message (via `#xxx` notation)
- `comments`: Code review comments on the commit
- `repo_name`: Source repository name (format: `owner/repo`)
- `strict_ccs`: CCS compliance level (`0` = near-perfect adherence, `1` = majority adherence)
- `ast_changes`: AST changes parsed by tree-sitter
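The schema above can be illustrated with a minimal entry. All values below are invented for demonstration only; real entries come from the CommitSuite dataset files:

```python
# A hypothetical sample entry matching the dataset schema (values invented).
commit = {
    "url": "https://github.com/owner/repo/commit/abc123",
    "hash": "abc123",
    "msg": "fix: handle nil pointer in parser (#42)",
    "message_type": "fix",
    "description": "handle nil pointer in parser (#42)",
    "why": 1,
    "what": 1,
    "author": "Jane Doe",
    "email": "jane@example.com",
    "date": "2024-01-15",
    "modified_files": [
        {"old_path": "parser.go", "new_path": "parser.go",
         "filename": "parser.go", "change_type": "MODIFY",
         "diff": "@@ -10,3 +10,5 @@ ...", "added_lines": 3, "deleted_lines": 1}
    ],
    "issues": [{"number": 42, "title": "Crash on empty input",
                "body": "...", "url": "https://github.com/owner/repo/issues/42"}],
    "prs": [],
    "comments": [],
    "repo_name": "owner/repo",
    "strict_ccs": 0,
    "ast_changes": [],
}

# To hide the CCS type from a tool under test, feed it `description`
# rather than the full `msg` (which starts with the type prefix).
tool_input = commit["description"]
print(tool_input)
```

Note how `description` omits the `fix:` prefix present in `msg`, which is what makes it suitable as input for classification experiments.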
1. Test your target tool on our Ten-category-eval Dataset.
2. Save results in `Ten-category_result.json` with these required fields:
   - `"message_type"` (original commit type)
   - `"Ten-category_result"` (tool-predicted type)
3. Run the evaluation script: `python Ten-category_eval.py`
4. Generated reports will appear in `./Ten-category_reports`:
   - Classification metrics: `classification_metrics.csv`
   - Confusion matrix visualization: `confusion_matrix.png`
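As a sketch, the expected shape of `Ten-category_result.json` might be produced like this (the predictions below are hypothetical placeholders for your tool's output):

```python
import json
import os
import tempfile

# Hypothetical classification results: `message_type` comes from the
# Ten-category-eval dataset, `Ten-category_result` from your tool.
results = [
    {"message_type": "fix",  "Ten-category_result": "fix"},
    {"message_type": "feat", "Ten-category_result": "refactor"},
]

# The evaluation script expects both fields on every entry.
# Written to a temp directory here for illustration.
out_path = os.path.join(tempfile.mkdtemp(), "Ten-category_result.json")
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)
print(out_path)
```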
1. Test your target tool on our CMG-eval Dataset.
2. Save results in `CMG_result.json` containing:
   - All original dataset fields
   - An additional `"CMG_result"` field (tool-generated message)
3. Evaluate traditional metrics: `python CMG_eval_traditional_metrics.py`
   - Output: `./CMG_reports/CMG_eval_traditional_metrics.csv`
   - Metrics: BLEU, ROUGE-L, METEOR
4. Evaluate binary metrics: `python CMG_eval_binary_metrics.py`
   (Before running `CMG_eval_binary_metrics.py`, please read the Configuration Note first.)
   - Outputs in `./CMG_reports`:
     - `CMG_eval_binary_metrics.json` (augmented with binary scores per entry)
     - `CMG_eval_binary_metrics.csv` (aggregated binary metrics)
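A minimal sketch of building `CMG_result.json` entries: each entry keeps all original dataset fields and gains a `"CMG_result"` field. The `generate_message` function below is a hypothetical stand-in for your commit-message-generation tool, and the sample entry's values are invented:

```python
import json

# A hypothetical dataset entry (trimmed to a few fields for brevity;
# real entries carry the full schema described above).
dataset = [
    {"hash": "abc123",
     "msg": "fix: handle nil pointer in parser",
     "description": "handle nil pointer in parser"},
]

def generate_message(entry):
    # Placeholder: replace with a call to your CMG tool or LLM.
    return "fix: guard against nil input in parser"

# Keep every original field and attach the generated message.
results = [{**entry, "CMG_result": generate_message(entry)} for entry in dataset]
print(json.dumps(results, indent=2))
```

Serializing `results` to `CMG_result.json` then gives both evaluation scripts the fields they expect.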
Configuration Note: The binary evaluation script uses:
- API: https://api.deepseek.com
- Model: `deepseek-v3` (modify the prompts/script logic if changing the API or model)