## Context

From e2e testing:

- Quality QA questions occasionally ask multiple things in one question, conflating multiple meaning units. This makes the compare step unfairly strict.
- Quality compare sometimes penalizes generalized answers that preserve core meaning but differ in specificity from the original QA answer.

## Scope

- Prompt-tune meaning unit extraction and/or quality QA generation to produce single-focus questions
- Guide the quality compare prompt to treat generalized-but-correct answers as matching
- Measure impact on utility scores across bio and legal datasets

## Files

- `src/anonymizer/engine/rewrite/qa_generation.py` (QA generation prompts)
- `src/anonymizer/engine/rewrite/evaluate.py` (compare prompt)
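As a rough illustration of the compare-prompt change in Scope, the sketch below shows one way to fold generalization tolerance into the compare instruction. All names here (`GENERALIZATION_GUIDANCE`, `build_compare_prompt`) are hypothetical; the real prompt lives in `src/anonymizer/engine/rewrite/evaluate.py` and may be structured differently.

```python
# Hypothetical sketch only -- names and prompt wording are illustrative,
# not taken from evaluate.py.

GENERALIZATION_GUIDANCE = (
    "Treat the candidate answer as MATCHING when it preserves the core "
    "meaning of the reference answer, even if it is more general or less "
    "specific. Mark it NOT_MATCHING only when it omits or contradicts the "
    "core fact."
)

def build_compare_prompt(question: str, reference: str, candidate: str) -> str:
    """Assemble a compare prompt that tolerates generalized-but-correct answers."""
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n\n"
        f"{GENERALIZATION_GUIDANCE}\n"
        "Respond with exactly one label: MATCHING or NOT_MATCHING."
    )
```

Keeping the guidance in a separate constant makes it easy to A/B the old and new compare prompts when measuring utility-score impact on the bio and legal datasets.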