@Negiiiin Negiiiin commented Jan 8, 2026

PR Type

[Feature]
Added AutoBencher and SYNQUE metrics



@Negiiiin Negiiiin requested a review from afkanpour January 8, 2026 18:10
@afkanpour afkanpour left a comment


@afkanpour reviewed 5 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Negiiiin).


src/run_quality_evaluation.py line 41 at r1 (raw file):

    """
    Collect all accuracy values from JSON files in a directory (recursively).
    

Please remove all instances of trailing whitespace.

Code quote:

···
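One way to clean this up across the repository is sketched below. It is a minimal example, not part of the PR: the helper names `strip_trailing_whitespace` and `clean_file` are hypothetical, and it assumes the files are UTF-8 text. It strips trailing spaces and tabs from every line and leaves everything else untouched.

```python
import re
from pathlib import Path

# Spaces or tabs immediately before a line break (or the end of the text).
_TRAILING_WS = re.compile(r"[ \t]+$", flags=re.MULTILINE)


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and tabs removed from each line."""
    return _TRAILING_WS.sub("", text)


def clean_file(path: Path) -> bool:
    """Rewrite one file in place; return True if its content changed."""
    original = path.read_text(encoding="utf-8")
    cleaned = strip_trailing_whitespace(original)
    if cleaned != original:
        path.write_text(cleaned, encoding="utf-8")
        return True
    return False
```

Calling `clean_file` on each path from `Path("src").rglob("*.py")` would cover the files touched here; an editor setting or a pre-commit hook that trims whitespace would keep it from coming back.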

src/run_quality_evaluation.py line 60 at r1 (raw file):

def _load_model_accuracies_from_dir(base_dir: str) -> Dict[str, float]:

Since this function calculates the average accuracy (is it averaged over problems?), let's rename it to reflect that. For example, it could be _load_avg_model_accuracy_from_dir().

Code quote:

_load_model_accuracies_from_dir

src/run_quality_evaluation.py line 230 at r1 (raw file):

        return np.array([]), []
    
    return embeddings_array, []

Every return statement in this function returns an empty list as the second element. Should we change that?

Code quote:

return embeddings_array, []

src/utils/quality_evaluation_utils.py line 29 at r1 (raw file):

) -> float:
    """
    Compute benchmark difficulty given per-model accuracies.

For each metric, please add a comment stating the source paper in which the metric was introduced or proposed.

Code quote:

Compute benchmark difficulty given per-model accuracies.
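A docstring along the following lines would cover this. It is only a sketch: the function name `compute_benchmark_difficulty` is hypothetical (the quoted signature is cut off), the citation line is a placeholder to be filled in by the author, and the formula shown (one minus the best model's accuracy) is an assumption for illustration, not necessarily what the PR implements.

```python
from typing import Dict


def compute_benchmark_difficulty(accuracies: Dict[str, float]) -> float:
    """Compute benchmark difficulty given per-model accuracies.

    Source: <paper that introduced this metric, with section or equation>.

    Assumed definition for this sketch: 1 - max accuracy over all models,
    so a benchmark that even the best model fails often scores close to 1.
    """
    if not accuracies:
        raise ValueError("accuracies must contain at least one model")
    return 1.0 - max(accuracies.values())
```

Putting the reference in the docstring, rather than in a comment elsewhere in the file, keeps it visible in `help()` output and in generated documentation.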

src/utils/quality_evaluation_utils.py line 303 at r1 (raw file):

# ===========================
# ---- Diversity Metrics (PAD, MMD, MDM)

Do these use the implementation from the SynQue paper, or did you implement them from scratch?

Code quote:

# ---- Diversity Metrics (PAD, MMD, MDM)


3 participants