
Conversation


@kohankhaki kohankhaki commented Dec 12, 2025

PR Type

Fix

Short Description

This pull request refactors the base generation pipeline to use the schemas in src/schemas and the experimental "diverse task generator" for task generation and validation.
It also removes legacy unused files.

Tests Added

None



@afkanpour afkanpour self-requested a review December 15, 2025 14:56
@afkanpour afkanpour requested a review from Negiiiin January 8, 2026 15:15

@afkanpour afkanpour left a comment


@afkanpour partially reviewed 43 files and all commit messages, and made 12 comments.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @kohankhaki).


src/schemas/task_schemas.py line 22 at r2 (raw file):

    task: str
    capability: Capability
    generation_metadata: Optional[Dict] = field(default_factory=dict)

Let's use this sparingly. Please add at least the difficulty, Bloom's level, solution type (MCQ, open-ended, ...), and the solution attributes to this class.
In addition, since we're converging on multiple-choice questions, add another field, e.g., a list of (label, solution) pairs that holds the choices. "label" refers to "A", "B", ..., and "solution" is the actual answer text.

Code quote:

generation_metadata: Optional[Dict] = field(default_factory=dict)
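A minimal sketch of what the expanded class could look like. The field names (`difficulty`, `bloom_level`, `task_type`, `choices`) and the `TaskType` enum are hypothetical suggestions, not names from the PR, and `capability` is simplified to a string to keep the sketch self-contained:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class TaskType(Enum):
    OPEN_ENDED = "open_ended"
    MULTIPLE_CHOICE = "multiple_choice"


@dataclass
class Task:
    task: str
    capability: str  # `Capability` in the real schema; a str here for self-containment
    difficulty: str = "medium"
    bloom_level: str = "apply"
    task_type: TaskType = TaskType.MULTIPLE_CHOICE
    # (label, solution) pairs, e.g. ("A", "3"); the correct label is the solution
    choices: List[Tuple[str, str]] = field(default_factory=list)
    solution: Optional[str] = None
    # reserved for genuinely experimental, unstructured extras
    generation_metadata: Optional[Dict] = field(default_factory=dict)
```

With typed fields like these, readers and type checkers see the attributes directly instead of digging through a metadata dict.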

src/utils/embedding_utils.py line 170 at r1 (raw file):

def generate_schema_capabilities_embeddings(

We don't keep the old capability class anymore, right? If so, let's remove "schema" from the name here and from all the comments that follow.

Code quote:

generate_schema_capabilities_embeddings

src/utils/capability_utils.py line 17 at r1 (raw file):

from inspect_ai.scorer import CORRECT

# from langsmith import traceable, tracing_context  # COMMENTED OUT FOR DEBUGGING

Do we use langsmith at all? If not, let's remove all references to it.

Code quote:

# from langsmith import traceable, tracing_context  # COMMENTED OUT FOR DEBUGGING

src/utils/capability_utils.py line 252 at r1 (raw file):

    # @traceable(  # COMMENTED OUT FOR DEBUGGING
    #     run_type="llm",
    # )

Are these for langsmith? Please remove these and all similar instances.

Code quote:

    # @traceable(  # COMMENTED OUT FOR DEBUGGING
    #     run_type="llm",
    # )

src/base_stages/generate_tasks.py line 0 at r2 (raw file):
What's the difference between generate_tasks.py and generate_diverse_tasks.py? It seems the former implements the combination pipeline, whereas this one only generates the tasks. Let's use more descriptive names for modules and functions.


src/base_stages/generate_tasks.py line 116 at r2 (raw file):

                    "blueprint": blueprint.blueprint,
                    "subtopic": blueprint.subtopic,
                    "difficulty": blueprint.difficulty,

Please move difficulty and Bloom's level to the main task class. We'll use them regularly.

More generally, I think generation_metadata should be used sparingly and only for experimental purposes. It has some serious shortcomings: it significantly reduces the readability of the code (the reader has no idea what attributes are in the class), causes issues with type safety, and makes refactoring more difficult.

Code quote:

                    "blueprint": blueprint.blueprint,
                    "subtopic": blueprint.subtopic,
                    "difficulty": blueprint.difficulty,

src/base_stages/generate_tasks.py line 117 at r2 (raw file):

                    "subtopic": blueprint.subtopic,
                    "difficulty": blueprint.difficulty,
                    "reasoning": blueprint.reasoning,

What is reasoning? Does it refer to the Bloom's level? If so, let's use a self-explanatory name.

Code quote:

"reasoning": blueprint.reasoning

src/utils/prompts.py line 0 at r1 (raw file):
There is a prompts.py under base_stages/ and there's this module. What's the difference between them, and why don't we merge them?


src/utils/capability_management_utils.py line 195 at r1 (raw file):

def filter_schema_capabilities_by_embeddings(
    capabilities: List[Any],  # List of schema Capability objects
    embeddings: List[torch.Tensor],

Should we add embedding as an optional field to the capability class?

Code quote:

embeddings: List[torch.Tensor]
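One way the suggestion could look, as a sketch. The field name `embedding` is an assumption; a plain list of floats stands in for `torch.Tensor` to keep the sketch dependency-free:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Capability:
    name: str
    description: str = ""
    # Optional cached embedding; in the codebase this would likely be a
    # torch.Tensor. A plain float list keeps the sketch dependency-free.
    embedding: Optional[List[float]] = None


def filter_capabilities_by_embeddings(caps: List[Capability]) -> List[Capability]:
    # With the embedding stored on the class, callers no longer need to
    # thread a parallel list of tensors through the filter function.
    return [c for c in caps if c.embedding is not None]
```

This also removes the risk of the capabilities list and the embeddings list drifting out of sync.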

src/base_stages/task_dataclasses.py line 0 at r2 (raw file):
Does this define classes for the 5-stage generation pipeline? If so, let's merge it with the main schemas. We don't want to keep two sets of classes around. If there is experimental work, it should go in an experimental directory.


README.md line 273 at r1 (raw file):

1. **Follow Schema Guidelines**: All data objects must use the schema classes defined in `src/schemas/`:
   - Use `Domain`, `Area`, `Capability`, `Task`, `TaskSolution`, `ValidationResult` objects

TaskSolution should be part of Task. Let's add new fields for each of these: solution, multiple-choice options, and task_type (with values like OpenEnded, MultipleChoice, ...).

In addition, when a subject model solves a task, how do we store its answer?

Code quote:

`Task`, `TaskSolution`
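For the second question, one possible shape for recording a subject model's answer. The record name `SubjectAnswer` and its fields are hypothetical, not part of src/schemas:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectAnswer:
    """One subject model's attempt at a task (hypothetical record, not in src/schemas)."""
    task_id: str
    model_name: str
    raw_response: str
    selected_label: Optional[str] = None   # "A", "B", ... for multiple choice
    is_correct: Optional[bool] = None      # filled in later by the scorer


def score_mcq(answer: SubjectAnswer, correct_label: str) -> SubjectAnswer:
    # Compare the extracted choice label against the task's correct label.
    answer.is_correct = answer.selected_label == correct_label
    return answer
```

Keeping the raw response alongside the extracted label makes it possible to re-score later if the label-extraction logic changes.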

src/model.py line 13 at r1 (raw file):

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI
# from langsmith import traceable  # COMMENTED OUT FOR DEBUGGING

Let's remove all of the commented-out lines.

Code quote:

# from langsmith import traceable  # COMMENTED OUT FOR DEBUGGING

