Comparison of three approaches to the Word-in-Context (WiC) Natural Language Processing task.
The task and the tested models are described in report.pdf.
I developed this code on an Ubuntu 20.04.2 LTS machine.
Unless otherwise stated, all commands here are expected to be run from the root directory of this project.
To run test.sh, we need to perform two additional steps:
- Install Docker
- Set up a client
test.sh sets up a server that exposes the model through a REST API, then queries this server to evaluate the model. So first, you need to install Docker:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
rm get-docker.sh
sudo usermod -aG docker $USER
Unfortunately, for the last command to take effect, you need to log out and log back in. Do this before proceeding.
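If you want to confirm that the new group membership is active in your current session, a check along these lines may help. This is only a sketch, not part of the project: the helper name in_group is hypothetical, and it assumes a Unix system where the Docker installer created a group called "docker".

```python
import grp
import os


def in_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named Unix group."""
    # Consider the effective group as well as the supplementary groups.
    gids = set(os.getgroups()) | {os.getegid()}
    names = set()
    for gid in gids:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A GID without an entry in the group database has no name.
            continue
    return group_name in names


# "docker" is the group name assumed here; adjust it if your setup differs.
print("docker group active:", in_group("docker"))
```

If this prints False right after running usermod, the session most likely predates the change, and logging out and back in should fix it.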
The model is exposed through a REST server, so we need a client to call it. The client is already implemented in the evaluation script, but it needs some dependencies to run. We will use conda to create the environment for this client:
conda create -n nlp2021-hw1 python=3.7
conda activate nlp2021-hw1
pip install -r requirements.txt
test.sh is a simple bash script. To run it:
conda activate nlp2021-hw1
bash test.sh data/dev.jsonl
You can replace data/dev.jsonl with the path to a different file, as long as that file has the same format.
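One rough way to check that another file has the same format is to compare its field names with those of data/dev.jsonl. The sketch below is an illustration under assumptions, not part of the project: the function names jsonl_keys and same_format are hypothetical, and it only compares the union of field names found in each file, so it will not catch wrong value types or a field missing from a single record.

```python
import json
from pathlib import Path


def jsonl_keys(path):
    """Return the set of field names used across all records of a JSONL file."""
    keys = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_number} is not a JSON object")
            keys.update(record.keys())
    return keys


def same_format(reference_path, candidate_path):
    """Return True if both files use exactly the same set of field names."""
    return jsonl_keys(reference_path) == jsonl_keys(candidate_path)
```

For example, same_format("data/dev.jsonl", "data/my_file.jsonl") would compare a file of your own (the second path is a placeholder) against the provided dev set.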
The StudentModel class takes two parameters. The first is the device. The second selects which model is tested and defaults to "SE", the best of the three. The other two accepted values are "IDF-AVG" and "SIF", which test the other two implemented models. The stud folder contains three IPython notebooks, one per model, with the corresponding training code.
To run the code with the "SE" model, you need to download the glove.840B.300d vectors (also required by the other two models) and the trained model, and put both in the model/ folder.
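Before starting the server, it can be worth checking that the downloaded files are where the code expects them. The sketch below is a hypothetical helper, not part of the project. The file name glove.840B.300d.txt is an assumption based on the standard GloVe download, and the name of the model file is not given here, so you would need to add it to the list yourself.

```python
from pathlib import Path


def missing_files(folder, expected_names):
    """Return the expected file names that are not present in the folder."""
    folder = Path(folder)
    return [name for name in expected_names if not (folder / name).is_file()]


# Assumed file name from the standard GloVe archive; the project may expect
# a different one. Add the model file name once you know it.
expected = ["glove.840B.300d.txt"]
print("missing from model/:", missing_files("model", expected))
```

An empty list means every listed file was found in model/.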