CU-8696nbm9j: Add module to convert vocab vectors #504
Adds a module that converts the vocab vectors from the default length (or really any length) to a smaller one.
The default vocab vector length is 300. However, we don't really make use of all of this information. Experiments show that we can use considerably shorter vectors and retain the same performance. See e.g.: https://gist.github.com/mart-r/e9db909cde1922464bcc753f54006994
Or (somewhat more comprehensively): https://gist.github.com/mart-r/21460286466d17b9f23719ba3f4dc938
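To illustrate the general idea of shrinking 300-length vectors to length 50, here is a minimal sketch that uses PCA (via NumPy's SVD). This is an assumption made for illustration and not necessarily the technique used by the module in this PR. The names `fit_pca_projection` and `shrink_vectors` are hypothetical, and the vocab is stood in for by a plain dict of random vectors.

```python
import numpy as np


def fit_pca_projection(vectors: np.ndarray, target_dim: int):
    """Fit a PCA projection (hypothetical helper, not the PR's API).

    Returns (mean, projection), where projection has shape
    (original_dim, target_dim).
    """
    if not 0 < target_dim <= min(vectors.shape):
        raise ValueError("target_dim must be in 1..min(n_vectors, original_dim)")
    mean = vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:target_dim].T


def shrink_vectors(word_vectors: dict, target_dim: int) -> dict:
    """Project every vector in a word -> vector mapping down to target_dim."""
    words = list(word_vectors)
    matrix = np.stack([word_vectors[w] for w in words])
    mean, projection = fit_pca_projection(matrix, target_dim)
    reduced = (matrix - mean) @ projection
    return {w: reduced[i] for i, w in enumerate(words)}


# Toy data standing in for a real vocab: 1000 random vectors of length 300.
rng = np.random.default_rng(0)
toy_vocab = {f"word{i}": rng.normal(size=300) for i in range(1000)}

small_vocab = shrink_vectors(toy_vocab, target_dim=50)
print(small_vocab["word0"].shape)  # (50,)
```

Random vectors carry no real structure, so this only demonstrates the shapes involved. Whether performance is retained on real vocab vectors is what the linked experiments address.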
The benefits of using a smaller vocab vector length mainly boil down to the following (examples at a vector length of 50):
NOTE:
There might be improvements we could do here: