Conversation

@vishnya (Contributor) commented Jun 15, 2025

PR: feat: Introduce Task-based workflow for all project operations

This PR introduces Taskfile as the new, unified entry point for all developer and user-facing operations, such as setup, testing, and running experiments. The motivation is to replace a collection of standalone scripts and manual command sequences with a single, self-documenting, and reproducible workflow. This simplifies onboarding and ensures consistency across all environments.

Summary of Notable Changes (File by File)

| File | Change | Rationale |
| --- | --- | --- |
| Taskfile.yaml | Added | The new heart of the workflow. Defines a set of clean, high-level tasks (setup, test, run, run_fisher, etc.) that orchestrate all necessary environment setup, downloads, and script executions. All configuration variables are documented with inline comments. (A sketch of its shape follows the table.) |
| run_compute_fisher.sh | Deleted | This script's logic (environment setup, file cleanup, Ray management, and Python execution) has been fully absorbed into the run_fisher task in Taskfile.yaml, removing redundancy. |
| replace_files.sh | Modified | The script now cleans up its temporary ld_path.txt and pl_path.txt files upon completion, keeping the project directory clean without needing .gitignore entries. |
| tests/test_taskfile.py | Added | A new test file that validates Taskfile.yaml. It ensures that critical tasks are defined and that the file contains no unresolved template placeholders. |
| pytest.ini | Added | Configures pytest to collect tests exclusively from the tests/ directory. This prevents it from discovering and running tests from downloaded dependency repositories (e.g., in data/raid/repos_new). |
| README.md | Modified | Updated to reflect the new Taskfile-based workflow, instructing users to run task <command> instead of individual scripts. |
| .gitignore | Modified | Removed ignore patterns for temporary files that are now cleaned up automatically by their generating scripts. |
| requirements.txt | No change | The final requirements are identical to main. |
| dynamic_database.py | Modified | Removed misplaced comment blocks that were breaking runs. |
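To give a feel for the structure, here is a minimal sketch of the shape Taskfile.yaml takes. The task names match this PR, but the variables, descriptions, and command bodies below are illustrative assumptions, not the file's actual contents:

```yaml
# Taskfile.yaml -- illustrative sketch only; the real commands, variables,
# and inline comments differ. Task names match the PR.
version: '3'

vars:
  CHECKPOINT_URL: "<google-drive-url>"  # placeholder, not the real URL

tasks:
  download_checkpoint_data:
    desc: Fetch large checkpoint artifacts from Google Drive via gdown
    cmds:
      - gdown --fuzzy "{{.CHECKPOINT_URL}}"

  setup:
    desc: Complete one-time project setup (environment, downloads, file patching)
    deps: [download_checkpoint_data]
    cmds:
      - pip install -r requirements.txt
      - ./replace_files.sh

  test:
    desc: Run the test suite
    cmds:
      - pytest

  run:
    desc: Launch the main training and proving process
    cmds:
      - python leanagent.py

  run_fisher:
    desc: Run the EWC workflow (absorbs the deleted run_compute_fisher.sh)
    cmds:
      - ray stop --force          # Ray management, as the old script did
      - python compute_fisher.py  # hypothetical script name
```

With a file of this shape in place, `task --list` prints every task alongside its desc line, which is what makes the workflow self-documenting.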

How to Test This PR

Reviewers can optionally validate these changes by checking out the branch and running the primary workflows, which now feel much cleaner:

```sh
# 1. Complete one-time project setup
task setup

# 2. Run the test suite
task test

# 3. Launch the main training and proving process
task run

# 4. Run the Elastic Weight Consolidation (EWC) workflow
task run_fisher
```
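Under the hood, task test simply invokes pytest, with the new pytest.ini restricting collection to tests/ (presumably via pytest's testpaths option). As a rough sketch of what tests/test_taskfile.py might assert, assuming unresolved placeholders look like <SOMETHING_IN_CAPS> (the PR's actual checks may differ):

```python
# tests/test_taskfile.py -- illustrative sketch; the PR's actual assertions may differ.
import re
from pathlib import Path

import yaml  # PyYAML

TASKFILE = Path(__file__).resolve().parents[1] / "Taskfile.yaml"
CRITICAL_TASKS = {"setup", "test", "run", "run_fisher"}

def test_critical_tasks_are_defined():
    # Parse the Taskfile and check that every critical task name is present.
    data = yaml.safe_load(TASKFILE.read_text())
    assert CRITICAL_TASKS <= set(data.get("tasks", {}))

def test_no_unresolved_template_placeholders():
    # Assumes placeholders look like <SOMETHING_IN_CAPS>; the real marker may differ.
    assert not re.search(r"<[A-Z_]+>", TASKFILE.read_text())
```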

Known Issues & Next Steps

  • Google Drive Quota: The download_checkpoint_data task, part of the setup workflow, relies on gdown to fetch large files from Google Drive. During heavy testing, it's possible to hit a download quota, which appears to last up to 24 hours. A future improvement would be to host these artifacts on a more robust platform (e.g., Hugging Face Hub, AWS S3).
  • Refactor replace_files.sh: The file-patching mechanism in replace_files.sh is effective but somewhat brittle. A more robust, Python-based solution for applying these patches would be a valuable next step; a rough sketch follows this list.
  • Central config file: Create a config file from which scripts can pull their settings (as noted in the TODOs).
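As a starting point for that refactor, here is a minimal sketch of a Python-based patcher, assuming the patches are whole-file replacements. The file mapping below is hypothetical; the real source/destination pairs live in replace_files.sh (or, eventually, the proposed central config file):

```python
# apply_patches.py -- illustrative sketch of a Python replacement for
# replace_files.sh; PATCHES is a hypothetical mapping, not the real pairs.
import shutil
from pathlib import Path

PATCHES = {
    "patches/example_module.py": "data/raid/repos_new/some_repo/example_module.py",
}

def apply_patches() -> None:
    for src, dst in PATCHES.items():
        src_path, dst_path = Path(src), Path(dst)
        if not src_path.exists():
            raise FileNotFoundError(f"missing patch source: {src_path}")
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_path, dst_path)  # overwrite the target with the patched copy
        print(f"patched {dst_path}")

if __name__ == "__main__":
    apply_patches()
```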

@vishnya (Contributor, Author) commented Jun 15, 2025

@Adarsh321123 @motiwari please review!

vishnya force-pushed the rp/taskification branch 2 times, most recently from 8ac5ec0 to ee8399a on June 16, 2025 at 01:51
vishnya force-pushed the rp/taskification branch from ee8399a to 1b6d85c on June 16, 2025 at 01:56
@motiwari (Collaborator) commented

This looks way, way better than having the user follow many different instructions (with many different ways things could go wrong) -- and I'm learning how to use Taskfiles effectively for the first time. Thank you for doing this!

I reviewed the code changes and they look good.

The only question I have is whether these changes preserve correctness. Does the code in this PR still produce the same results as in the original paper?

@vishnya (Contributor, Author) commented Jun 16, 2025

Hi @motiwari! That's a great point. The runs take a very long time for me, so it's hard to tell. Do you happen to have a benchmark toy dataset that we can use for mocking? If not, we should create one.

@Adarsh321123 (Collaborator) commented Jun 18, 2025

Hi @vishnya. Thanks for this contribution! The Taskfile-based workflow is a huge improvement for developer experience and onboarding. For sanity checking correctness, we can simply run the new task run workflow on a small set of repos (like just Compfiles and MIL) and compare key metrics/outputs against those in the paper. Moreover, to check that the entire workflow works, we can use a separate blank repo. You can quickly do these by following the README.md and then hardcoding those repositories in leanagent.py.
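For illustration only, the hardcoding might look roughly like this; the variable name is hypothetical and the URLs are my best guesses for Compfiles and MIL, so the actual structure in leanagent.py may differ:

```python
# Hypothetical edit to leanagent.py -- the real variable name and repo URLs may differ.
REPOS_TO_PROCESS = [
    "https://github.com/dwrensha/compfiles",                        # Compfiles
    "https://github.com/leanprover-community/mathematics_in_lean",  # MIL
]
```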

@vishnya (Contributor, Author) commented Jun 21, 2025

Hi folks! I was too busy with work last week and haven't had a chance to test. Running the new task run on a small set of repos, and separately on a blank repo, makes sense, although it seems fairer to compare the results against running the old flow on the same repos, rather than against the paper results.

@vishnya (Contributor, Author) commented Jun 23, 2025

@Adarsh321123 @motiwari

1/ I want to confirm that we want to test the following way, and whether or not you think the test will be straightforward to implement (i.e. you've done something similar before): for (a) a blank repo and (b) 1-2 repos, compare the results of the run with

  • the existing code, versus
  • the proposed Taskfile code.

2/ Which metrics should we focus on, and what range of values/margin of error is acceptable?

@Adarsh321123 (Collaborator) commented

@vishnya

1/ Yes, testing that way for (a) and (b) is straightforward.
2/ It would be easiest to focus on LeanAgent's accuracy during lifelong learning for PFR and compare that to the 2.7% reported in Table 5 in the paper. This is somewhat stochastic and may deserve a few runs.

@motiwari (Collaborator) commented

Hi @vishnya, my apologies for the delay in getting back to you after our 1:1 discussion.

The steps @Adarsh321123 mentioned seem good. Let us know if you need more details on how to run everything. @Adarsh321123 and I are also discussing setting up a lighter testing framework in #4 and #5.
