Refactor generation pipeline #48
base: main
Conversation
afkanpour
left a comment
@afkanpour partially reviewed 43 files and all commit messages, and made 12 comments.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @kohankhaki).
src/schemas/task_schemas.py line 22 at r2 (raw file):
task: str
capability: Capability
generation_metadata: Optional[Dict] = field(default_factory=dict)
Let's use this sparingly. Please add at least the difficulty, Bloom's level, solution type (MCQ, open-ended, ...), and the solution attributes to this class.
In addition, since we're converging to multiple choice questions, add another field, e.g., a list of (label, solution) that holds the choices. "label" refers to "A", "B", ..., and "solution" is the actual solution.
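The suggested shape of the class could be sketched roughly as below (a sketch only; the field names `difficulty`, `blooms_level`, `solution_type`, and `choices` are placeholders, and `object` stands in for the project's `Capability` schema class):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class SolutionType(Enum):
    """Solution formats a task can carry (hypothetical enum)."""
    MULTIPLE_CHOICE = "mcq"
    OPEN_ENDED = "open_ended"


@dataclass
class Task:
    task: str
    capability: object  # stands in for the project's Capability schema class
    difficulty: str
    blooms_level: str
    solution_type: SolutionType
    solution: str
    # (label, solution) pairs for MCQ tasks: label is "A", "B", ...,
    # and solution is the actual answer text for that choice.
    choices: List[Tuple[str, str]] = field(default_factory=list)
    # Kept only for experimental, not-yet-promoted attributes.
    generation_metadata: Optional[Dict] = field(default_factory=dict)
```

Promoting the commonly used attributes to explicit fields keeps `generation_metadata` as a narrow escape hatch rather than a dumping ground.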
Code quote:
generation_metadata: Optional[Dict] = field(default_factory=dict)

src/utils/embedding_utils.py line 170 at r1 (raw file):
def generate_schema_capabilities_embeddings(
We don't keep the old capability class anymore, right? Then, let's remove schema from the name here and all comments that follow.
Code quote:
generate_schema_capabilities_embeddings

src/utils/capability_utils.py line 17 at r1 (raw file):
from inspect_ai.scorer import CORRECT
# from langsmith import traceable, tracing_context # COMMENTED OUT FOR DEBUGGING
Do we use langsmith at all? If not, let's remove all references to it.
Code quote:
# from langsmith import traceable, tracing_context # COMMENTED OUT FOR DEBUGGING

src/utils/capability_utils.py line 252 at r1 (raw file):
# @traceable( # COMMENTED OUT FOR DEBUGGING
# run_type="llm",
# )
Are these for langsmith? Please remove these and all similar instances.
Code quote:
# @traceable( # COMMENTED OUT FOR DEBUGGING
# run_type="llm",
# )

src/base_stages/generate_tasks.py line 0 at r2 (raw file):
What's the difference between generate_tasks.py and generate_diverse_tasks.py? It seems the previous one implements the combination pipeline whereas this one only generates the tasks. Let's use more descriptive names for modules and functions.
src/base_stages/generate_tasks.py line 116 at r2 (raw file):
"blueprint": blueprint.blueprint,
"subtopic": blueprint.subtopic,
"difficulty": blueprint.difficulty,
Please move difficulty and Bloom's level to the main task class. We'll use them regularly.
More generally, I think generation_metadata should be used sparingly and only for experimental purposes. It has some serious shortcomings. For example, it significantly reduces the readability of the code (the reader has no idea what attributes are in the class). It causes issues with type safety, and makes refactoring more difficult.
Code quote:
"blueprint": blueprint.blueprint,
"subtopic": blueprint.subtopic,
"difficulty": blueprint.difficulty,

src/base_stages/generate_tasks.py line 117 at r2 (raw file):
"subtopic": blueprint.subtopic,
"difficulty": blueprint.difficulty,
"reasoning": blueprint.reasoning,
What is reasoning? Does it refer to the Bloom's level? If so, let's use a self-explanatory name.
Code quote:
"reasoning": blueprint.reasoning

src/utils/prompts.py line 0 at r1 (raw file):
There is prompts.py under base_stages/ and there's this module. What's the difference between them? And why don't we merge them?
src/utils/capability_management_utils.py line 195 at r1 (raw file):
def filter_schema_capabilities_by_embeddings(
    capabilities: List[Any],  # List of schema Capability objects
    embeddings: List[torch.Tensor],
Should we add embedding as an optional field to the capability class?
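That suggestion could look roughly like this (a sketch only; the `embedding` field name is hypothetical, and the `torch` import is deferred to type checking so the schema module does not require torch at runtime):

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    import torch  # only needed for static type checking


@dataclass
class Capability:
    name: str
    description: str
    # Populated lazily by the embedding pipeline; stays None until computed,
    # so callers no longer pass a parallel List[torch.Tensor] around.
    embedding: Optional["torch.Tensor"] = None
```

Storing the embedding on the object would let filter functions take only `capabilities` and avoid keeping two parallel lists in sync.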
Code quote:
embeddings: List[torch.Tensor]

src/base_stages/task_dataclasses.py line 0 at r2 (raw file):
Does this define classes for the 5-stage generation pipeline? If so, let's merge it with the main schemas. We don't want to keep two sets of classes around. If there is experimental work, it should go in an experimental directory.
README.md line 273 at r1 (raw file):
1. **Follow Schema Guidelines**: All data objects must use the schema classes defined in `src/schemas/`:
   - Use `Domain`, `Area`, `Capability`, `Task`, `TaskSolution`, `ValidationResult` objects
TaskSolution should be part of Task. Let's add new fields for each of these: solution, multiple choices, task_type (with values like OpenEnded, MultipleChoice, ...)
In addition, when a subject model solves a task, how do we store its answer?
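One way to store a subject model's answer without overloading the task class itself is a separate attempt record (a sketch only; `TaskAttempt` and its field names are hypothetical, not part of the existing schemas):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskAttempt:
    """One subject model's answer to one task, kept separate from
    the gold solution stored on the task itself."""
    task_id: str
    model_name: str
    answer: str                        # raw answer from the subject model
    is_correct: Optional[bool] = None  # filled in later by the scorer
```

Keeping attempts separate means one task can accumulate answers from several subject models without mutating the task object.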
Code quote:
`Task`, `TaskSolution`

src/model.py line 13 at r1 (raw file):
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI
# from langsmith import traceable # COMMENTED OUT FOR DEBUGGING
Let's remove all of the commented-out lines.
Code quote:
# from langsmith import traceable # COMMENTED OUT FOR DEBUGGING
PR Type
Fix
Short Description
This pull request refactors the base generation pipeline to use the schema in `src/schemas` and uses the experimental "diverse task generator" for task generation and validation. It also removes legacy unused files.
Tests Added
None