The Deep Learning lecture notes by Francois Fleuret contain a very nice intuition about why and where the attention mechanism works better than convolutional networks.
In his example, he considers a toy sequence-to-sequence problem: the input contains triangular and rectangular shapes with random heights, and the target contains the same shapes but with their heights averaged, as in the figure below.
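Since the lecture does not come with data-generation code, the snippet below is only my guess at the setup. The sequence length, shape width, height range, and the exact pulse profiles (a ramp for triangles, a flat step for rectangles) are all assumptions, and I read "heights averaged" as averaging per shape kind, which is what makes long-range context necessary.

```python
import numpy as np

def generate_sample(seq_len=128, n_shapes=4, width=12, rng=None):
    """Hypothetical toy sample: one pulse per slot, either a triangle (ramp)
    or a rectangle (step), with a random height. The target keeps every pulse
    in place but rescales it to the mean height of pulses of the same kind."""
    rng = rng or np.random.default_rng()
    slots = np.array_split(np.arange(seq_len), n_shapes)
    kinds = rng.integers(0, 2, n_shapes)            # 0 = triangle, 1 = rectangle
    heights = rng.uniform(0.5, 2.0, n_shapes)
    mean_h = {k: heights[kinds == k].mean() for k in np.unique(kinds)}

    x, y = np.zeros(seq_len), np.zeros(seq_len)
    for slot, kind, h in zip(slots, kinds, heights):
        start = slot[0] + rng.integers(0, len(slot) - width)  # random offset inside the slot
        profile = np.linspace(0, 1, width) if kind == 0 else np.ones(width)
        x[start:start + width] = h * profile
        y[start:start + width] = mean_h[kind] * profile
    return x, y
```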
Since no source code was available with his lecture (as far as I know), I have tried to reproduce the same intuition in this notebook. As we can see, with the exact same training procedure, the attention mechanism learns the task much faster than the conv net. The conv net's poor performance is expected, since its limited receptive field prevents it from looking far across the input signal to solve the task. There are plenty of ways to make the conv net work better (more layers, fully connected layers, ...), but the attention mechanism is a simple and elegant solution to this problem; a sketch of the two kinds of models follows below.
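For concreteness, here is a minimal sketch (not the notebook's actual code) of the two kinds of models being compared: a plain 1D conv net, where each output value only sees a small window of the input, and the same front-end with a single self-attention layer, where every position can read from every other position regardless of distance. The channel counts, kernel sizes, and number of heads are arbitrary placeholder choices.

```python
import torch
from torch import nn

class ConvRegressor(nn.Module):
    """Small 1D conv net: the receptive field is only a few kernel widths."""
    def __init__(self, channels=64, kernel_size=5):
        super().__init__()
        pad = kernel_size // 2
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size, padding=pad),
        )

    def forward(self, x):                # x: (batch, 1, seq_len)
        return self.net(x)

class AttentionRegressor(nn.Module):
    """Same conv front-end plus one self-attention layer, so each position
    can attend to shapes anywhere else in the sequence."""
    def __init__(self, channels=64, kernel_size=5, heads=4):
        super().__init__()
        pad = kernel_size // 2
        self.embed = nn.Conv1d(1, channels, kernel_size, padding=pad)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.head = nn.Conv1d(channels, 1, kernel_size, padding=pad)

    def forward(self, x):                # x: (batch, 1, seq_len)
        h = torch.relu(self.embed(x)).transpose(1, 2)   # (batch, seq_len, channels)
        h, _ = self.attn(h, h, h)                       # global context in one step
        return self.head(h.transpose(1, 2))
```

The point of the comparison is that the conv net would need many stacked layers (or global pooling / fully connected layers) to cover the whole sequence, whereas a single attention layer already gives every output position access to the entire input.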
I hope you find this notebook useful, and that Professor Fleuret doesn't mind me using his example.