Initial checks cosmetic edits #6
Open: liranc6 wants to merge 20 commits into OPTML-Group:main from liranc6:initial-checks-cosmetic-edits
Conversation
… fast eval. Files changed: eval.py, metrics/knowmem.py, metrics/privleak.py, metrics/verbmem.py
The primary purpose is to improve evaluation robustness and flexibility when managing model outputs and debug workflows. The primary changes are:
- Updated `eval_model` to ensure `forget_data`, `retain_data`, and `holdout_data` are initialized consistently before use.
- Replaced hardcoded paths with `os.path.join` using `MUSE_DIR` in `eval_model` for improved path handling.
- Added a `kwargs` parameter to both `eval_model` and `load_then_eval_models` to support dynamic control over file creation and loading.
- Implemented conditional logic in `eval_model` for managing `privleak` file generation based on `kwargs['create_new_files']`.
- Removed unused imports and dynamic import logic from `eval.py`, replacing `importlib` with `sys.path.append` to streamline module loading.
- Improved debug visibility in `eval_model` with additional `print` statements for key file paths and parameter values.
- Increased `debug_subset_len` from 2 to 50 in `eval_model` for broader test coverage during debug mode.
- Updated `exp.ipynb` to align with changes in model handling and evaluation behavior in `eval_model`.
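A minimal sketch of the path handling and kwargs gating this commit describes. `MUSE_DIR`, `eval_model`, and `create_new_files` come from the commit text; the file names, default value, and surrounding structure are assumptions:

```python
import os
import sys

# Assumed project root; the commit replaces importlib-based dynamic imports
# with a plain sys.path.append like this.
MUSE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(MUSE_DIR)

def eval_model(model, forget_data=None, retain_data=None, holdout_data=None, **kwargs):
    # Initialize the three splits consistently before use.
    forget_data = forget_data or []
    retain_data = retain_data or []
    holdout_data = holdout_data or []

    # Hardcoded paths replaced with os.path.join on MUSE_DIR (file name assumed).
    privleak_path = os.path.join(MUSE_DIR, "results", "privleak.json")
    print(f"privleak path: {privleak_path}")  # added debug visibility, per the commit

    # Conditional privleak file generation, gated by kwargs.
    if kwargs.get("create_new_files", True) or not os.path.exists(privleak_path):
        ...  # recompute and write the privleak file
    else:
        ...  # load the existing file
```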
Purpose: Improve the clarity and depth of ILL evaluation, and introduce new tools for classifier-based analysis. Changes:
- Updated … to clean outputs, improve ROC curves, and set ….
- Improved structure and markdown clarity in …, with added analysis on loss distributions and unlearning.
- Added … and … for classifier-based ILL feature exploration.
- Added a script for reproducible, scriptable Random Forest analysis.
These updates improve reproducibility and interpretability, and support deeper ILL feature analysis.
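The Random Forest script itself is not visible in this view; the following is only a hedged sketch of what scriptable Random Forest analysis over ILL features typically looks like. The feature matrix, label encoding, and file names are all assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed inputs: rows are examples, columns are ILL features;
# labels mark forget (1) vs. retain (0) membership. File names are hypothetical.
X = np.load("ill_features.npy")
y = np.load("ill_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)

# ROC AUC from predicted probabilities, plus a per-class report.
proba = clf.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, proba))
print(classification_report(y_test, clf.predict(X_test)))

# Impurity-based feature importances for interpretability.
for rank, idx in enumerate(np.argsort(clf.feature_importances_)[::-1][:10]):
    print(f"{rank + 1}. feature {idx}: {clf.feature_importances_[idx]:.3f}")
```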
…ion in notebooks
The primary purpose is to fix broken imports and implement functional Input Loss Landscape feature computation for machine-learning interpretability analysis. The primary changes are:
- Enhanced the import structure in … with additional sklearn modules and a SHAP availability check.
- Replaced broken function calls with a working ILL feature computation pipeline.
- Added comprehensive logistic regression analysis with performance metrics, a confusion matrix, and feature importance analysis.
- Integrated permutation importance computation and visualization for feature interpretability.
- Fixed the execution flow by removing error-prone cells and replacing them with successful feature extraction results.
- Updated notebook outputs to show successful ILL feature computation for the forget/retain/holdout datasets.
- Added a baseline logistic regression performance evaluation with 74% accuracy and a detailed classification report.
- Modified … to align with the working implementation in ….
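A hedged sketch of the logistic-regression baseline with permutation importance this commit describes; the feature/label files and split are assumed as in the previous sketch:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical ILL feature matrix and forget/retain labels.
X = np.load("ill_features.npy")
y = np.load("ill_labels.npy")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: standardized features into logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # the commit reports ~74% accuracy

# Permutation importance: how much shuffling each feature hurts test score.
imp = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for idx in imp.importances_mean.argsort()[::-1][:10]:
    print(f"feature {idx}: {imp.importances_mean[idx]:.4f} +/- {imp.importances_std[idx]:.4f}")
```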
The primary purpose is to evaluate the loss landscape of first-neighbor sentences to understand the impact of unlearning. The primary changes are:
- Created a new notebook `MUSE/notebooks/1st_neighbor_classification.ipynb` to analyze the loss landscape of first-neighbor sentences.
- Modified `loss_landscape.py` to extract logic into `new_ILL_eval`, `get_features`, and `normalize_features`.
- Replaced dynamic imports with `sys.path` appends in `utils.py`.
- Added `transformers` to `requirements.txt`.
- Increased UMAP dimensionality from 2 to 10 in `embedding.py`.
- Added an AUC heatmap and a bar chart of top features in `visualization.py`.
- Modified `plotting.py` to return `matplotlib` figure objects instead of file paths.
- Updated `plotting.py` to align with changes made in `visualization.py`.
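The bodies of the extracted helpers are not shown here, so the following is only an assumed shape for `normalize_features` (per-feature standardization is a guess) plus the UMAP change the commit names explicitly:

```python
import numpy as np
import umap  # umap-learn

def normalize_features(features: np.ndarray) -> np.ndarray:
    """Assumed behavior: standardize each feature to zero mean, unit variance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True) + 1e-8  # avoid division by zero
    return (features - mean) / std

# embedding.py: UMAP dimensionality raised from 2 to 10.
reducer = umap.UMAP(n_components=10)
# embedding = reducer.fit_transform(normalize_features(features))
```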
The purpose of this change is to prevent errors when saving the statistical distances heatmap. The changes include: - Added `os.makedirs(plots_base_dir, exist_ok=True)` before saving the heatmap in `eval_with_ILL.py` to ensure the directory exists.
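In context, the fix looks roughly like this; only the `os.makedirs` call is from the commit, while the directory path, figure, and output file name are assumptions:

```python
import os
import matplotlib.pyplot as plt

plots_base_dir = "plots/statistical_distances"  # assumed path
os.makedirs(plots_base_dir, exist_ok=True)      # the fix: create the directory first

fig, ax = plt.subplots()
# ... draw the statistical-distances heatmap on ax ...
fig.savefig(os.path.join(plots_base_dir, "statistical_distances_heatmap.png"))
```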
The primary purpose is to provide a reproducible workflow for evaluating Input Loss Landscape (ILL) features on the TOFU dataset using a Llama-2-7b model. The primary changes are:
- Added `TOFU/notebooks/eval_with_ILL.ipynb` containing a step-by-step pipeline for:
  - Loading and preprocessing the TOFU dataset from Hugging Face.
  - Loading the model and tokenizer with correct prompt formatting.
  - Running ILL evaluation using project utilities and saving results.
  - Extracting and normalizing ILL feature tensors for analysis.
  - Visualizing loss landscape features with matplotlib plots.
- The notebook demonstrates integration between the TOFU, MUSE, and project source directories.
- Example code for prompt formatting, model inference, and loss calculation is included for clarity.
- The notebook serves as a reference for future ILL experiments and analysis on TOFU.
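A hedged sketch of the loading and per-example loss steps the notebook covers. The dataset and model identifiers follow the public TOFU and Llama-2 releases, but the exact split, checkpoint, and prompt format used by the notebook are assumptions:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# TOFU is distributed on the Hugging Face Hub; "forget10" is one of its configs.
dataset = load_dataset("locuslab/TOFU", "forget10")["train"]

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def example_loss(question: str, answer: str) -> float:
    """Per-example LM loss under a simple Q/A prompt format (assumed)."""
    text = f"Question: {question}\nAnswer: {answer}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

print(example_loss(dataset[0]["question"], dataset[0]["answer"]))
```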
…aluation
Detailed description:
- Introduced … to provide a full pipeline for running, analyzing, and visualizing unlearning experiments across multiple models and benchmarks.
- Added argument parsing, configuration, and directory management for reproducible experiments.
- Implemented data loading utilities for the TOFU, WMDP, and MUSE datasets.
- Integrated model loading, evaluation, and feature extraction using HuggingFace Transformers.
- Added baseline and custom metric computation (AUC, min-k, zlib, ROUGE-L, etc.).
- Created a class for robust saving/loading of results, tables, and visualizations.
- Automated table generation (aggregate, family, detailed) and summary statistics.
- Added plotting and visualization routines for performance comparison.
- Ensured compatibility with Weights & Biases logging.
- Updated … to return trained classifiers for downstream saving and analysis.
- Modified binary comparison training in … to return classifier objects.
These changes enable end-to-end experiment management, result analysis, and reporting for the project.
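A minimal sketch of two of the baseline metrics named above, min-k% prob and the zlib ratio, following their standard definitions in the membership-inference literature; the function names and inputs are assumptions, and per-token log-probabilities are assumed to be computed elsewhere:

```python
import zlib
import numpy as np

def min_k_prob(token_logprobs: np.ndarray, k: float = 0.2) -> float:
    """Min-k% prob: mean log-probability of the k% least likely tokens."""
    n = max(1, int(len(token_logprobs) * k))
    return float(np.sort(token_logprobs)[:n].mean())

def zlib_ratio(text: str, total_logprob: float) -> float:
    """zlib metric: model log-likelihood normalized by zlib-compressed length."""
    return total_logprob / len(zlib.compress(text.encode("utf-8")))

# Example usage with dummy token probabilities:
lp = np.log(np.array([0.9, 0.05, 0.6, 0.01, 0.8]))
print(min_k_prob(lp, k=0.4))
print(zlib_ratio("some candidate text", lp.sum()))
```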
…ults aggregation
Detailed description:
- Added `notbooks/ablations_results/create_commands.ipynb` to generate experiment command-line arguments for ablation studies across parameters such as `n_tokens`, `max_new_tokens`, `neighbor_dist`, and `max_neighbors`.
- Added `notbooks/ablations_results/max_new_tokens_read_results.ipynb` to load, aggregate, and visualize experiment results for different `max_new_tokens` values, including summary tables and TSV exports for further analysis.
- Both notebooks support reproducible experiment setup and results inspection, with code for classifier loading, dummy predictions, and formatted output for Google Sheets.
These changes enable systematic parameter sweeps and facilitate detailed ablation analysis of unlearning experiments.
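A sketch of the kind of command generation `create_commands.ipynb` performs, sweeping the parameters the commit names with `itertools.product`; the script name, flag spellings, and grid values are all hypothetical:

```python
from itertools import product

# Parameter grids named in the commit; the values here are placeholders.
n_tokens = [16, 32, 64]
max_new_tokens = [32, 64, 128]
neighbor_dist = [1, 2]
max_neighbors = [5, 10]

commands = []
for i, (nt, mnt, nd, mn) in enumerate(
    product(n_tokens, max_new_tokens, neighbor_dist, max_neighbors)
):
    commands.append(
        f"python run_experiment.py --job-idx {i} "   # hypothetical script/flags
        f"--n_tokens {nt} --max_new_tokens {mnt} "
        f"--neighbor_dist {nd} --max_neighbors {mn}"
    )

print("\n".join(commands[:3]))
```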
…ation and new result scraping tool
The primary purpose is to refine the setup for ablation experiments by adjusting parameter configurations in command generation and introducing a new utility to systematically extract and organize result file paths from terminal run outputs. The key changes are:
- Modified `notbooks/ablations_results/create_commands.ipynb` to update experiment parameter lists, including adjustments to the `n_tokens`, `max_new_tokens`, `neighbor_dist`, and `max_neighbors` values, and refined the command output structure for better indexing.
- Updated `notbooks/ablations_results/neighbor_dist_read_results.ipynb` to change the Python version from 3.11.13 to 3.11.12.
- Added a new `notbooks/ablations_results/scrap_results_file_names.ipynb` to scrape result file paths from terminal output files, build a DataFrame mapping experiment parameters to job indices and paths, and enable querying specific experiments.
Purpose: Provide a reproducible notebook to extract experiment result file paths from terminal outputs and synchronize experiment command lists across the ablations notebooks. What changed:
- Added `notbooks/ablations_results/scrap_results_file_names.ipynb` to build parameter lists, parse terminal outputs, extract JSON paths via regex, and create `results_df` with pandas.
- Updated `notbooks/ablations_results/create_commands.ipynb` to clean command listings and fix job indices.
- Updated `notbooks/ablations_results/generic_read_results.ipynb` to generate plots examining performance with respect to each parameter, summarize the results in tables, and flag outlier runs.
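A hedged sketch of the scraping step: parse terminal output files, pull JSON result paths via a regex, and build the queryable `results_df`. Only `results_df` and the regex-over-terminal-outputs idea come from the commit; the directory layout, regex, and column names are assumptions:

```python
import re
from pathlib import Path
import pandas as pd

JSON_PATH_RE = re.compile(r"\S+\.json")  # assumed: result paths end in .json

rows = []
for out_file in Path("terminal_outputs").glob("*.txt"):  # hypothetical directory
    for line in out_file.read_text().splitlines():
        m = JSON_PATH_RE.search(line)
        if m:
            rows.append({"job_file": out_file.name, "result_path": m.group(0)})

results_df = pd.DataFrame(rows, columns=["job_file", "result_path"])

# Query a specific experiment, e.g. by job file name.
print(results_df[results_df["job_file"] == "job_0.txt"])
```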