@chughtapan
Owner

  • Change --dataset to --datasets (comma-separated, default: train,dev)
  • Add --default-few-shot flag (default: zero-shot, no examples)
  • Update prompts.py to conditionally include few-shot examples
  • Update output_dir and experiment_name fixtures for multi-dataset
  • Preserve --start-from functionality from main branch
  • Update all documentation
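
A minimal sketch of how the two options above might be registered and parsed in a pytest `conftest.py`. The option names and defaults come from the list above; the help strings and the body of `parse_datasets` are assumptions for illustration, not the code in this PR.

```python
DEFAULT_DATASETS = "train,dev"  # default stated in the PR description


def parse_datasets(raw: str) -> list:
    """Split a comma-separated --datasets value into a list of names.

    Assumed behavior: surrounding whitespace is trimmed and empty
    entries are dropped.
    """
    names = [name.strip() for name in raw.split(",")]
    return [name for name in names if name]


def pytest_addoption(parser):
    """Register the options on pytest's command-line parser."""
    parser.addoption(
        "--datasets",
        action="store",
        default=DEFAULT_DATASETS,
        help="Comma-separated dataset names (illustrative help text).",
    )
    parser.addoption(
        "--default-few-shot",
        action="store_true",
        default=False,  # zero-shot unless the flag is passed
        help="Include few-shot examples in the prompt (illustrative help text).",
    )
```

With these defaults, running without flags would select both `train` and `dev` in zero-shot mode.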

🤖 Generated with Claude Code

Tapan Chugh and others added 6 commits December 4, 2025 15:52
- Change --dataset to --datasets (comma-separated, default: train,dev)
- Add --default-few-shot flag (default: zero-shot, no examples)
- Update prompts.py to conditionally include few-shot examples
- Update output_dir and experiment_name fixtures for multi-dataset
- Preserve --start-from functionality from main branch
- Update all documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Instead of trying to remove the "examples" intro line via string
replacement (which was fragile), remove it from the template and
only add it when few-shot mode is enabled.
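
The approach described in this commit message could look roughly like the sketch below: the template carries no examples intro, and the intro is appended only in few-shot mode. The template text, the intro wording, and the `build_prompt` name are hypothetical and are not taken from `prompts.py`.

```python
from typing import Optional, Sequence

# Hypothetical template: note that it contains no "examples" intro line.
BASE_TEMPLATE = "Solve the following task.\n\nTask: {task}\n"
# Hypothetical intro line, added only when few-shot mode is enabled.
EXAMPLES_INTRO = "Here are some examples:"


def build_prompt(
    task: str,
    examples: Optional[Sequence[str]] = None,
    few_shot: bool = False,
) -> str:
    """Build a prompt, adding the examples section only in few-shot mode."""
    prompt = BASE_TEMPLATE.format(task=task)
    if few_shot and examples:
        # Append rather than strip: nothing has to be removed for zero-shot.
        prompt += "\n" + EXAMPLES_INTRO + "\n" + "\n".join(examples) + "\n"
    return prompt
```

Appending on demand avoids depending on the exact wording of a line that string replacement would otherwise have to match.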

- Remove --experiment-dir from conftest
- Add --appworld-experiment-name for custom experiment names
- Auto-infer as {model}/{datasets} when not specified
- Fix zero-shot prompt (add examples intro only in few-shot mode)
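
One way the `{model}/{datasets}` inference could work is sketched here. How multiple datasets are combined into one path segment is not stated in this PR, so joining with an underscore is an assumption, as are the function signature and the model name in the usage example.

```python
from typing import Optional, Sequence


def get_experiment_name(
    model: str,
    datasets: Sequence[str],
    override: Optional[str] = None,
) -> str:
    """Return the custom experiment name if given, else infer {model}/{datasets}."""
    if override:
        # Corresponds to a value passed via --appworld-experiment-name.
        return override
    # Assumption: dataset names are joined with "_" to form one segment.
    return f"{model}/{'_'.join(datasets)}"
```

For example, `get_experiment_name("some-model", ["train", "dev"])` would give `some-model/train_dev` under these assumptions.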

- AppWorld output_dir now respects --output-dir from main conftest
- Auto-infers as results/{model}/{datasets}/outputs/ when default
- Add --appworld-experiment-name for custom AppWorld experiment names

- Add get_datasets_dir() helper to DRY up datasets parsing
- Fix validation mode to use --datasets instead of old --dataset
- Fix failure_report_dir to derive from output_dir.parent
- Import parse_datasets and get_datasets_dir at top of file
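
Deriving the failure report location from `output_dir.parent` might look like this sketch. The `failure_reports` subdirectory name and the function signature are hypothetical; only the use of `output_dir.parent` comes from the commit message.

```python
from pathlib import Path


def failure_report_dir(output_dir: Path) -> Path:
    """Place failure reports next to the outputs directory, not inside it."""
    # "failure_reports" is an assumed directory name for illustration.
    return output_dir.parent / "failure_reports"
```

Because the path is computed from `output_dir`, it follows any change to how `output_dir` is resolved.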

- output_dir now uses results/{experiment_name}/outputs/
- Added get_experiment_name() helper for use in fixtures and pytest_generate_tests
- Validation mode now uses experiment_name for consistent directory lookup
- Both output_dir and experiment_name respect --appworld-experiment-name
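
The `results/{experiment_name}/outputs/` layout can be sketched as a small path helper. The function name and the `results_root` parameter are assumptions for illustration; the actual fixture in this PR may be structured differently.

```python
from pathlib import Path


def resolve_output_dir(experiment_name: str, results_root: str = "results") -> Path:
    """Build results/{experiment_name}/outputs/ for a given experiment name."""
    # An experiment name such as "model/train_dev" expands to nested directories.
    return Path(results_root) / experiment_name / "outputs"
```

If validation mode resolves its directory from the same experiment name, it looks in the location that the run wrote to.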
