- TRAINING_GUIDE_KO.md: A-to-Z guide for training reference-free QE models, covering 4 approaches (ReferencelessRegression from scratch, UnifiedMetric QE from scratch, COMETKiwi fine-tuning, QE model fine-tuning)
- scripts/prepare_data.py: data preprocessing for EN-KO patent QE data
- scripts/run_training.sh: training execution wrapper script
- scripts/download_checkpoint.py: pretrained checkpoint downloader
- scripts/evaluate_model.py: model evaluation with correlation metrics
- configs/models/en-ko-qe/: training configs for all 4 approaches, plus a mini test config

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
Major updates based on research into wasanx/ComeTH and the MTQE.en-he paper:
- Compare our approach with ComeTH (a COMETKiwi EN-TH fine-tune, +4.9%)
- Add MTQE.en-he findings: full fine-tuning degrades COMETKiwi with small data, but our 9.6M samples mitigate this risk
- Add approach 5 (FTHead) and approach 6 (LoRA) as safer alternatives
- scripts/finetune_lora.py: LoRA/BitFit/FTHead fine-tuning script
- Updated references with 7 new papers and HuggingFace models
- Revised recommendation order: FTHead first, then full fine-tuning, then LoRA

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…e update

Critical finding: the training data is heavily skewed (scores 0.7-1.0 = 59.4% of samples; 0.0-0.3 = 3.5%). Research confirms COMET is "highly susceptible to the distribution of scores in the training data" (Pitfalls paper, WMT 2024).
- scripts/analyze_and_rebalance.py: 3 rebalancing strategies (equal, soft/sqrt-inverse-frequency, weighted)
- TRAINING_GUIDE_KO.md: new section 4.5 on score distribution impact, with diagnosis, solutions, and step-by-step rebalancing instructions

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
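The soft (sqrt-inverse-frequency) strategy can be sketched as follows. This is a minimal illustration, not the actual API of analyze_and_rebalance.py: the function name, the fixed 10-bin histogram, and the weight normalization are all assumptions.

```python
import numpy as np

def sqrt_inverse_freq_weights(scores, n_bins=10):
    # "Soft" rebalancing sketch: per-sample sampling weight proportional
    # to 1/sqrt(score-bin frequency), so rare low-score bins are
    # oversampled without fully flattening the distribution.
    scores = np.asarray(scores, dtype=float)
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins).astype(float)
    bin_weight = 1.0 / np.sqrt(np.maximum(counts, 1.0))
    w = bin_weight[bins]
    return w / w.sum()  # normalized per-sample sampling probabilities
```

With the skew reported above, a sample in the rare 0.0-0.3 range would receive a noticeably larger sampling probability than one in the dominant 0.7-1.0 range, while preserving some of the original ordering between bin frequencies.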
UnifiedMetric.prepare_sample returns (tuple_of_dicts, targets), not (dict, targets). The training loop now correctly iterates over the input sequences, running one forward pass per input dict, matching UnifiedMetric.training_step. https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
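The corrected loop shape can be sketched as below. This is a stand-in, not COMET's actual code: `model` is any callable, the MSE and the final averaging are illustrative, and the real prepare_sample returns tokenized tensors rather than plain lists.

```python
def training_step(model, batch):
    # prepare_sample yields a *tuple* of input dicts plus targets, so we
    # run one forward pass per input dict and combine the per-pass
    # losses (mirroring the structure of UnifiedMetric.training_step).
    inputs_tuple, targets = batch
    total_loss = 0.0
    for inputs in inputs_tuple:  # one forward pass per input dict
        preds = model(inputs)
        total_loss += sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    return total_loss / len(inputs_tuple)
```

Treating the tuple as a single dict (the old bug) would instead feed the wrong structure into the model on the first forward pass.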
Includes:
- Training guide (TRAINING_GUIDE_KO.md)
- Data preparation, evaluation, and fine-tuning scripts
- 6 training approach configs (scratch, fine-tune, LoRA, FTHead)
- Score distribution analysis and rebalancing tools
- Initialize SummaryWriter (output_dir/tensorboard/)
- Log hyperparameters as text
- Log train/step_loss per step
- Log train/epoch_loss, val/pearson, val/spearman, val/kendall, val/mse per epoch
- Flush/close the writer when training completes

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…on-2iFB6 Claude/integrate comet evaluation 2i fb6
- prepare_data.py: removed the duplicate outputs (referenceless_*, unified_qe_*) and unified them into a single train.csv/val.csv used by all approaches
- With --include_pairwise, the pairwise-converted data is automatically merged into train.csv
- Unified path references to train.csv/val.csv across all files (configs, scripts, guide)

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
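The merge step can be sketched with pandas. The function name, signature, and column names are illustrative, not prepare_data.py's actual interface:

```python
import pandas as pd

def build_train_csv(base_df, pairwise_df=None):
    # Sketch of the unified output: every approach reads one train.csv;
    # pairwise-converted rows are appended only when --include_pairwise
    # was passed (here modeled as an optional second DataFrame).
    parts = [base_df] + ([pairwise_df] if pairwise_df is not None else [])
    return pd.concat(parts, ignore_index=True)
```

Writing one file and appending optional rows avoids the earlier situation where each approach read its own near-duplicate CSV.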
…on-2iFB6 Simplify data pipeline: unify output to train.csv/val.csv
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- train/grad_norm: gradient norm per step (for monitoring training stability)
- train/lr: learning rate per step
- val/pred_distribution, val/target_distribution: histograms (to detect score collapse)
- val/pred_std, val/pred_mean: prediction statistics
- --eval_interval N: mid-epoch validation (every N steps, logged under val_mid/*)
- Added grad_norm to the console log

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
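The quantity behind train/grad_norm is the global L2 norm over all parameter gradients, the same value torch.nn.utils.clip_grad_norm_ returns as its total norm. A pure-Python sketch (the helper name is illustrative):

```python
import math

def global_grad_norm(grads):
    # Global L2 norm across all parameter gradients: flatten every
    # gradient array and take sqrt of the sum of squares.  A sudden
    # spike in this value per step signals training instability.
    return math.sqrt(sum(float(g) ** 2 for arr in grads for g in arr))
```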
…on-2iFB6 Add enhanced TensorBoard logging to finetune_lora.py
- Step-level: train/step_loss, train/grad_norm, train/lr
- Epoch-level performance: val/pearson, spearman, kendall, mse, mae
- Collapse detection: collapse/pred_std, std_ratio, pred_range, pred_iqr
- Score bias: bias/pred_mean, target_mean, mean_diff, pred_skewness
- Distributions: histograms of predictions, targets, and errors
- Quantiles: pred_q25, q50, q75
- Mid-epoch validation: --eval_interval N (val_mid/* metrics)

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
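The collapse and bias diagnostics can be sketched as below. The tag names mirror the list above, but the exact formulas (epsilon guards, sample skewness) are illustrative assumptions, not the script's verbatim code:

```python
import numpy as np

def collapse_stats(preds, targets):
    # Score-collapse diagnostics: a pred_std (or std_ratio) near zero
    # means the model outputs nearly constant scores regardless of
    # input quality, which correlation metrics alone can hide.
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    std = preds.std()
    return {
        "collapse/pred_std": float(std),
        "collapse/std_ratio": float(std / max(targets.std(), 1e-8)),
        "collapse/pred_range": float(preds.max() - preds.min()),
        "collapse/pred_iqr": float(np.percentile(preds, 75) - np.percentile(preds, 25)),
        "bias/mean_diff": float(preds.mean() - targets.mean()),
        "bias/pred_skewness": float(((preds - preds.mean()) ** 3).mean() / max(std ** 3, 1e-12)),
    }
```

A fully collapsed model (constant predictions) drives pred_std, std_ratio, and pred_range to zero even while the loss keeps decreasing, which is exactly the failure mode these tags are meant to surface.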
…on-2iFB6 Add comprehensive TensorBoard monitoring for QE fine-tuning