Save and output number of samples of each task #851

Open: itsmejul wants to merge 5 commits into main

itsmejul commented Jul 3, 2025

This PR closes #804.

What does this PR do?

This PR adds a num_samples field to both the results_dict that is saved as JSON and the final_dict that is passed to make_results_table(), as requested in the issue. All existing elements in these dicts are left unchanged.

  "results": {
    "lighteval|gsm8k|0": {
      "extractive_match": 0.6,
      "extractive_match_stderr": 0.1632993161855452
    },
    "all": {
      "extractive_match": 0.6,
      "extractive_match_stderr": 0.1632993161855452
    }
  },
  "num_samples": {
    "lighteval|gsm8k|0": 10,
    "all": 10
  }

The keys in num_samples are exactly the keys in results: there is one count for each individual task, one for each grouped task (the sum of its subtasks), and one for the "all" entry. This lets make_results_table() add the number of samples to the markdown table it creates, like so:
image
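
As a rough illustration of the counting described here, the sketch below is not the PR's calculate_num_samples() but a standalone approximation. It assumes that the details are a mapping from task name to a list of per-sample records, that grouped tasks are supplied through a hypothetical `groups` mapping, and that "all" is the sum over individual tasks. The function name `count_samples` is also made up for this example.

```python
from typing import Dict, Mapping, Optional, Sequence


def count_samples(
    details: Mapping[str, Sequence[object]],
    groups: Optional[Mapping[str, Sequence[str]]] = None,
) -> Dict[str, int]:
    """Return a sample count per task, per grouped task, and for "all"."""
    # One count per individual task: the number of logged detail records.
    counts = {task: len(records) for task, records in details.items()}
    # Assumption: "all" is the sum over individual tasks only.
    total = sum(counts.values())
    # Assumption: `groups` maps a grouped-task key to its subtask keys;
    # a group's count is the sum of its subtasks' counts.
    for group, subtasks in (groups or {}).items():
        counts[group] = sum(len(details[name]) for name in subtasks)
    counts["all"] = total
    return counts


# Ten hypothetical detail records for a single task.
details = {"lighteval|gsm8k|0": [{"doc": i} for i in range(10)]}
print(count_samples(details))
```

With a single ten-sample task this gives the same shape as the num_samples block in the example output: one entry for the task and one for "all".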

To keep make_results_table() backwards compatible, the "Number of Samples" cells are left empty when the result_dict does not contain num_samples.
Samples are counted as the length of each entry in details_logger.details.
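
The backwards-compatible behaviour can be sketched as below. This is not the real make_results_table(): the function name `results_table`, the simplified column layout, and the choice to skip `_stderr` rows are assumptions made for this example. The point is only the fallback to an empty cell when num_samples is missing.

```python
from typing import Any, Dict


def results_table(result_dict: Dict[str, Any]) -> str:
    """Render a simplified markdown table with a "Number of Samples" column."""
    # Older result files have no "num_samples" key; fall back to an empty dict.
    num_samples = result_dict.get("num_samples", {})
    lines = [
        "|Task|Metric|Value|Number of Samples|",
        "|---|---|---:|---:|",
    ]
    for task, metrics in result_dict["results"].items():
        # Task keys contain "|", which must be escaped inside a table cell.
        label = task.replace("|", "\\|")
        # An empty string leaves the cell blank when no count is available.
        count = num_samples.get(task, "")
        for metric, value in metrics.items():
            if metric.endswith("_stderr"):
                continue  # stderr rows are omitted to keep the sketch short
            lines.append(f"|{label}|{metric}|{value}|{count}|")
    return "\n".join(lines)


new_style = {
    "results": {"lighteval|gsm8k|0": {"extractive_match": 0.6}},
    "num_samples": {"lighteval|gsm8k|0": 10},
}
old_style = {"results": {"lighteval|gsm8k|0": {"extractive_match": 0.6}}}

print(results_table(new_style))
print(results_table(old_style))
```

Reading the counts with `.get()` means a result file written before this change renders without error, just with a blank last column.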

Changes

  • Added calculate_num_samples() method in EvaluationTracker
  • Added num_samples field to results_dict in EvaluationTracker.save()
  • Added num_samples field to final_dict in EvaluationTracker.generate_final_dict()
  • Added "Number of Samples" field to markdown table generated in make_results_table()
  • Modified example results.json in docs to include the new entry

Tests

All tests passed locally.

Development

Successfully merging this pull request may close these issues.

[FT] showing count in Markdown summary table