This is because llama_sample_token in common.cpp keeps the mu value for Mirostat v1 and v2 in a static variable. That one mu is shared by every sequence, so different sequences affect each other's sampling, and mu updates made by sequences that were already deleted still carry over.
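To make the failure mode concrete, here is a minimal sketch of how a function-local static leaks state between callers. It is not the actual code from common.cpp: update_mu_shared is a hypothetical name, and the initial value of 2 * tau and the update rule are assumptions based on the usual Mirostat formulation.

```cpp
#include <cassert>

// Hypothetical stand-in for a sampler that keeps mu in a function-local static.
// Assumption: mu starts at 2 * tau and moves by eta * (observed_surprise - tau).
float update_mu_shared(float tau, float eta, float observed_surprise) {
    assert(tau > 0.0f);
    // Initialized only on the first call, then shared by every later call,
    // no matter which sequence the call is made for.
    static float mu = 2.0f * tau;
    mu -= eta * (observed_surprise - tau);
    return mu;
}
```

If one sequence calls this with a surprise above tau, mu drops below its initial value, and the next call made for a completely different sequence starts from that lowered mu instead of a fresh 2 * tau.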
The fix for this doesn't seem that simple. I don't think it can be done only inside llama_sample_token. I think llama_sample_token will have to be changed to take something like a sequence-specific sampler state structure, where values like that sequence's mu can be stored. It would then be up to the app to reset mu when appropriate (for example, when a sequence ends and its slot is going to be reused).
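A rough sketch of what per-sequence state could look like, just to illustrate the idea. Everything here is hypothetical: seq_sampler_state, sampler_states, and update_mu are made-up names, not existing llama.cpp types or functions, and the mu initialization and update rule are assumed from the usual Mirostat formulation.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical per-sequence sampler state; more fields could be added later.
struct seq_sampler_state {
    float mirostat_mu;
};

// Hypothetical container owned by the app, keyed by sequence id.
struct sampler_states {
    std::unordered_map<int, seq_sampler_state> by_seq;

    // Return the state for a sequence, creating it with mu = 2 * tau on first use.
    seq_sampler_state & get(int seq_id, float tau) {
        assert(tau > 0.0f);
        auto it = by_seq.find(seq_id);
        if (it == by_seq.end()) {
            it = by_seq.emplace(seq_id, seq_sampler_state{2.0f * tau}).first;
        }
        return it->second;
    }

    // Called by the app when a sequence ends, so a reused slot starts fresh.
    void reset(int seq_id) { by_seq.erase(seq_id); }
};

// Stand-in for the mu update a Mirostat sampler performs after picking a token.
// It only touches the state it is handed, so sequences cannot affect each other.
void update_mu(seq_sampler_state & st, float tau, float eta, float observed_surprise) {
    st.mirostat_mu -= eta * (observed_surprise - tau);
}
```

With something like this, the sampling function would receive the state for the sequence being sampled instead of reaching for a static, and the app would call reset when it deletes a sequence or reuses its slot.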