Released fine-tuned weights vs. Reproduced SFT results

Hello, thank you very much for sharing this excellent research and providing both the code and datasets. 
I really appreciate the contribution to the community :)

While experimenting with the fine-tuning scripts, I noticed a significant discrepancy between the performance of the released fine-tuned weights and the weights I obtained by running the provided supervised fine-tuning (SFT) script.

For example, on ETTh1 (MAE averaged over 4 horizons, lower is better):
	•	Base (pretrained only): 0.231
	•	Chat (released fine-tuned weights): 0.144
	•	Chat-reproduced (pretrained + SFT using provided script & dataset): 0.408, which is worse than the Base model
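
For reference, here is a minimal sketch of the metric as described above: MAE is computed per horizon and then averaged over the four horizons. The horizon lengths and toy values are illustrative assumptions, not the actual evaluation data, and the function names are hypothetical rather than taken from the repository.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("sequences must be non-empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_mae(results):
    """Return (per-horizon MAE, unweighted mean of those MAEs).

    `results` maps a horizon length to a (y_true, y_pred) pair.
    """
    per_horizon = {h: mae(t, p) for h, (t, p) in results.items()}
    return per_horizon, sum(per_horizon.values()) / len(per_horizon)


# Toy placeholder values (assumed horizons; not real ETTh1 forecasts).
toy_results = {
    96: ([1.0, 2.0], [1.5, 2.5]),
    192: ([1.0, 2.0], [1.25, 2.25]),
    336: ([0.0, 4.0], [0.25, 3.75]),
    720: ([0.0, 0.0], [1.0, -1.0]),
}

per_horizon, avg = average_mae(toy_results)
print(per_horizon, avg)
```

Note that averaging the per-horizon MAEs is not necessarily the same as pooling all predicted points into a single MAE, so it may be worth confirming which convention the reported numbers use.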

I used the fine-tuning dataset exactly as provided, and I did not make any modifications to the fine-tuning script. Given this, I am wondering:
	•	Is there any difference between the released fine-tuned weights and the ones obtained by running the current script?
	•	Could there be additional training steps, hyperparameters, or preprocessing procedures that were applied when producing the released weights but are not reflected in the script?
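
One way to check the first question directly would be to compare the two checkpoints parameter by parameter. Below is a minimal sketch, assuming each checkpoint has already been loaded into a plain mapping from parameter name to a flat list of floats. The loading step depends on the repository's checkpoint format and is not shown, and all names here are hypothetical.

```python
def compare_checkpoints(released, reproduced, tol=1e-6):
    """Summarize differences between two {param_name: flat list of floats} maps."""
    report = {
        "only_in_released": sorted(set(released) - set(reproduced)),
        "only_in_reproduced": sorted(set(reproduced) - set(released)),
        "size_mismatch": [],
        "max_abs_diff": {},  # only parameters whose difference exceeds tol
    }
    for name in sorted(set(released) & set(reproduced)):
        a, b = released[name], reproduced[name]
        if len(a) != len(b):
            report["size_mismatch"].append(name)
            continue
        diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if diff > tol:
            report["max_abs_diff"][name] = diff
    return report


# Toy placeholder checkpoints (not real weights).
released_toy = {"w": [1.0, 2.0], "b": [0.0], "extra": [1.0]}
reproduced_toy = {"w": [1.0, 2.5], "b": [0.0], "new": [1.0, 2.0]}
print(compare_checkpoints(released_toy, reproduced_toy))
```

Missing or differently sized parameters would point to an architecture or config mismatch, while matching names with large value differences would be more consistent with a different training recipe.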

Any clarification on this would be greatly appreciated. Thank you again for your valuable work!

Released fine-tuned weights vs. Reproduced SFT results #20
