
security-pride/CommitSuite


CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation

CommitSuite is a comprehensive benchmark for both commit classification and commit message generation. By combining the Conventional Commits Specification (CCS) with high-quality commit messages, we construct a large-scale, clean dataset spanning multiple languages and repositories. We also propose a binary evaluation metric system that enables automated assessment of commit messages without relying on human-written references. Experiments show that LLMs outperform existing tools on both tasks, especially message generation.

Dataset

Our dataset comprises 63,581 high-quality commits from 243 open-source repositories, all strictly following the CCS format and covering seven popular programming languages.

Supported Languages:

(counts only include commits modifying files in a single language)

| Language   | Commit Count |
| ---------- | ------------ |
| GO         | 14807        |
| JAVASCRIPT | 10903        |
| TYPESCRIPT | 18835        |
| PYTHON     | 12740        |
| C/C++      | 5612         |
| JAVA       | 2009         |

Data Format:

- url
- hash
- msg
- message_type
- description
- why
- what
- author
- email
- date
- modified_files (array)
  - old_path
  - new_path
  - filename
  - change_type
  - diff
  - added_lines
  - deleted_lines
- issues (array)
  - number
  - title
  - body
  - url
- prs (array)
- comments (array)
- repo_name
- strict_ccs
- ast_changes (array)
  - file
  - changes (array)
    - type
    - name
    - structure_type
    - location
    - details (object)

Data Field Descriptions

  • url: URL of the commit
  • hash: Hash identifier of the commit
  • msg: Full commit message (raw text)
  • message_type: Commit type specified by the developer (e.g., fix, feat)
  • description: Descriptive text following the commit type (use this field to hide type information from tools/LLMs)
  • why: Indicates presence of "why" rationale in commit message (0 = absent, 1 = present)
  • what: Indicates presence of "what" change information in commit message (0 = absent, 1 = present)
  • author: Developer who authored the commit
  • email: Author's email address
  • date: Commit date (YYYY-MM-DD format)
  • modified_files: Modified files metadata including diffs
  • issues: Issues referenced in commit message (via #xxx notation)
  • prs: Pull requests referenced in commit message (via #xxx notation)
  • comments: Code review comments on the commit
  • repo_name: Source repository name (format: owner/repo)
  • strict_ccs: CCS compliance level (0 = near-perfect adherence, 1 = majority adherence)
  • ast_changes: AST changes parsed by tree-sitter
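To make the record layout concrete, here is a minimal sketch of one dataset entry using the field names listed above. All values are hypothetical placeholders, not real data, and only a subset of fields is shown.

```python
import json

# Hypothetical example record; field names follow the schema above,
# values are placeholders.
record = {
    "url": "https://github.com/owner/repo/commit/abc123",
    "hash": "abc123",
    "msg": "fix: handle nil pointer in parser",
    "message_type": "fix",
    "description": "handle nil pointer in parser",
    "why": 1,
    "what": 1,
    "modified_files": [{
        "old_path": "parser.go",
        "new_path": "parser.go",
        "filename": "parser.go",
        "change_type": "MODIFY",
        "diff": "@@ -1 +1 @@ ...",
        "added_lines": 3,
        "deleted_lines": 1,
    }],
    "strict_ccs": 0,
}

# To hide type information from a tool under test, feed `description`
# (the text after the "type:" prefix) rather than the raw `msg`.
prompt_text = record["description"]
assert not prompt_text.startswith(record["message_type"] + ":")

# Records round-trip cleanly through JSON.
assert json.loads(json.dumps(record))["hash"] == "abc123"
```

The `description` field exists precisely so that the CCS type prefix never leaks into a classifier's input.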

Evaluation Metrics and Benchmark Suites

Ten-category Evaluation

  1. Test your target tool on our Ten-category-eval Dataset

  2. Save results in Ten-category_result.json with these required fields:

    • "message_type" (original commit type)
    • "Ten-category_result" (tool-predicted type)
  3. Run the evaluation script:

    python Ten-category_eval.py
    
  4. Generated reports will appear in ./Ten-category_reports:

    • Classification metrics: classification_metrics.csv
    • Confusion matrix visualization: confusion_matrix.png
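Steps 1–2 above can be sketched as follows. The predictions here are hypothetical placeholders for a tool's output; `Ten-category_eval.py` remains the authoritative scorer, this is only a quick sanity check of the result-file shape.

```python
import json
from collections import Counter

# Hypothetical tool predictions; each entry carries the two required
# fields from step 2.
results = [
    {"message_type": "fix",  "Ten-category_result": "fix"},
    {"message_type": "feat", "Ten-category_result": "feat"},
    {"message_type": "docs", "Ten-category_result": "fix"},
]

# Save in the expected format before running Ten-category_eval.py.
with open("Ten-category_result.json", "w") as f:
    json.dump(results, f, indent=2)

# Rough accuracy check; the official script produces full per-class
# metrics and the confusion matrix.
correct = sum(r["message_type"] == r["Ten-category_result"] for r in results)
accuracy = correct / len(results)
print(f"accuracy: {accuracy:.2f}")  # → accuracy: 0.67
print(Counter(r["message_type"] for r in results))
```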

CMG Evaluation

  1. Test your target tool on our CMG-eval Dataset

  2. Save results in CMG_result.json containing:

    • All original dataset fields
    • Additional "CMG_result" field (tool-generated message)
  3. Evaluate Traditional Metrics:

    python CMG_eval_traditional_metrics.py
    
    • Output: ./CMG_reports/CMG_eval_traditional_metrics.csv
    • Metrics: BLEU, ROUGE-L, METEOR
  4. Evaluate Binary Metrics:

    Before running CMG_eval_binary_metrics.py, please read Configuration Note first.

    python CMG_eval_binary_metrics.py
    
    • Outputs in ./CMG_reports:
      • CMG_eval_binary_metrics.json (augmented with binary scores per entry)
      • CMG_eval_binary_metrics.csv (aggregated binary metrics)

    Configuration Note: The binary evaluation script uses:
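Steps 1–2 of the CMG workflow can be sketched as below. `generate_message` stands in for the tool or LLM under test, and the dataset stub is a hypothetical minimal record; the key point is that each entry keeps all original fields and gains one `"CMG_result"` field.

```python
import json

def generate_message(diff_text: str) -> str:
    # Placeholder generator; a real tool/LLM call goes here.
    return "fix: placeholder message for " + diff_text[:20]

# Hypothetical minimal dataset stub with the fields used below.
dataset = [
    {"hash": "abc123", "msg": "fix: handle nil pointer",
     "modified_files": [{"diff": "@@ -1 +1 @@ example"}]},
]

for entry in dataset:
    diff = entry["modified_files"][0]["diff"]
    # Augment in place so every original field is preserved.
    entry["CMG_result"] = generate_message(diff)

with open("CMG_result.json", "w") as f:
    json.dump(dataset, f, indent=2)

assert "msg" in dataset[0] and "CMG_result" in dataset[0]
```

The resulting `CMG_result.json` is the input to both `CMG_eval_traditional_metrics.py` and `CMG_eval_binary_metrics.py`.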
