As an initial test to verify the transformer works, I'm training it to sort a sequence of six 3-digit numbers in ascending order. It usually reaches ~99% accuracy in fewer than 10 epochs. It turns out that removing the MLP layer has almost no effect on the result, so we should devise a harder test to make sure we're exercising both the MLP and the attention layers.
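Roughly, the task looks like the sketch below. The function names, the exact sequence length defaults, and the harder "sum of neighbours" variant are illustrative assumptions on my part, not the actual training harness; the second task is just one candidate that should force the MLP to do real work, since attention alone mostly routes and compares tokens rather than computing arithmetic.

```python
import random

def make_sort_example(seq_len=6, digits=3):
    """One (input, target) pair for the current sanity check:
    `seq_len` random `digits`-digit numbers, paired with the
    same numbers sorted in ascending order."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    xs = [random.randint(lo, hi) for _ in range(seq_len)]
    return xs, sorted(xs)

# Hypothetical harder variant (an assumption, not something decided in
# this issue): predicting sums of adjacent elements requires arithmetic,
# which attention alone shouldn't handle well, so ablating the MLP here
# should visibly hurt accuracy.
def make_sum_example(seq_len=6, digits=3):
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    xs = [random.randint(lo, hi) for _ in range(seq_len)]
    return xs, [a + b for a, b in zip(xs, xs[1:])]

if __name__ == "__main__":
    print(make_sort_example())  # e.g. ([731, 205, ...], [205, 333, ...])
    print(make_sum_example())   # e.g. ([482, 913, ...], [1395, ...])
```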