2023-24 Part II Project, "Named Entity Recognition for Morphologically Rich Languages: a Modern Hebrew NLP Project"

Yu-val-weiss/hebrew-ner

Hebrew NER: Part II Project, How to Run The App

Requirements

  • A working Python 3.8 installation (the app may work with later versions, but this has not been tested)
  • The Docker daemon running
  • Docker Compose installed (the commands below use the `docker compose` form)

.env file

Users should create a .env file containing the following parameters, filled in accordingly. Defaults are shown where applicable. ABSOLUTE_PATH_HEBREW_NER is the absolute path to the current directory.

ABSOLUTE_PATH_HEBREW_NER=
YAP_HOST=127.0.0.1
YAP_PORT=8000
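
Before starting the app it can help to check that the .env file is complete. The following is a minimal sketch, not part of the repository: it assumes the file uses plain KEY=VALUE lines (no quoting or `export`), and the names `parse_env` and `find_problems` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# Keys listed in the .env section above.
REQUIRED_KEYS = ("ABSOLUTE_PATH_HEBREW_NER", "YAP_HOST", "YAP_PORT")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(values):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
    path = values.get("ABSOLUTE_PATH_HEBREW_NER", "")
    if path and not Path(path).is_absolute():
        problems.append("ABSOLUTE_PATH_HEBREW_NER is not an absolute path")
    port = values.get("YAP_PORT", "")
    if port and not port.isdigit():
        problems.append("YAP_PORT is not a number")
    return problems
```

For example, `find_problems(parse_env(Path(".env").read_text()))` returns an empty list when all three values are set and look plausible.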

Set up Python venv

To create the virtual environment

python3.8 -m venv venv

To activate it in the terminal

source venv/bin/activate

To install the library requirements

pip install -r requirements.txt

All further instructions assume that the virtual environment is active.

Deactivating the venv

Simply execute the following

deactivate

Installing fastText

It is now necessary to download the pre-trained fastText Hebrew model (the wiki.he.bin binary).

We will place it in a folder called fasttext.

mkdir fasttext && cd fasttext
wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.he.zip
unzip wiki.he.zip
cd ..

Testing the installation

To test that the installation worked, start a Python interpreter from the directory that contains the fasttext folder.

Now run the following:

import fasttext

ft = fasttext.load_model('fasttext/wiki.he.bin')

Note: the following warning may appear.

Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.

This can be safely ignored.
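
A slightly fuller check is sketched below. It confirms that the model file exists before loading it, then looks up one word vector. The helper names `load_hebrew_vectors` and `summarise_word` are hypothetical; `load_model`, `get_dimension` and `get_word_vector` are methods of the fastText Python package.

```python
from pathlib import Path


def load_hebrew_vectors(model_path="fasttext/wiki.he.bin"):
    """Load the fastText model, failing early with a clear error if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"fastText model not found at {path}")
    # Imported lazily so that the path check works even if fasttext is not installed.
    import fasttext
    return fasttext.load_model(str(path))


def summarise_word(model, word, n=5):
    """Return the vector dimension and the first n components of the word's vector."""
    vector = model.get_word_vector(word)
    return model.get_dimension(), list(vector[:n])
```

Calling `summarise_word(load_hebrew_vectors(), "שלום")` from the directory that contains the fasttext folder should return the dimension of the embeddings together with a few vector components.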

Installing the models

Now my extension models must be downloaded.

Run the following to download them (from Figshare)

wget -O trained_models.zip 'https://figshare.com/ndownloader/articles/25773039?private_link=ab195c4231927a669e0e'

Once this has downloaded, run

mkdir -p ncrf_hpc_configs/transformer &&
 unzip trained_models.zip -d ncrf_hpc_configs/transformer

Running

Use the following to run the app alongside YAP using Docker Compose

docker compose up

Shutting down

The app can be shut down using a keyboard interrupt in the same terminal in which the app was run, or alternatively using

docker compose down

Force rebuild

Use the following to force-rebuild

docker compose build --no-cache

then run

docker compose up

Running the app natively

Use the command

python ner_app.py

When making changes to the app, hot reload can be enabled by running

python ner_app.py --reload

The host and port can be customised by running

python ner_app.py --host HOST --port PORT

Alternatively, the values can be changed in the ENTRYPOINT line of the Dockerfile.

The app should be exited with Ctrl+C.

Running just YAP using Docker

Run the following command

docker compose up yap

To monitor its loading progress, run the following

docker ps

Then find the container whose image is called 'hebrew-ner_yap' and copy its container ID. Now run the following, substituting that ID for CONTAINER_ID

docker logs CONTAINER_ID

If the log says 'All models loaded!' then YAP is up and running (if not, just wait a little longer; it shouldn't take more than a minute).
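
The manual check above can also be scripted. The sketch below polls `docker logs` until the 'All models loaded!' message appears. The function names, the 120-second timeout and the 2-second polling interval are assumptions chosen for illustration, not values taken from the project.

```python
import subprocess
import time

# Message that YAP prints once it is ready, as described above.
READY_MARKER = "All models loaded!"


def models_loaded(log_text):
    """Return True if the ready message appears anywhere in the log text."""
    return READY_MARKER in log_text


def wait_for_yap(container_id, timeout=120.0, interval=2.0):
    """Poll `docker logs` until YAP reports readiness or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["docker", "logs", container_id],
            capture_output=True,
            text=True,
        )
        # Container output may arrive on either stream, so check both.
        if models_loaded(result.stdout + result.stderr):
            return True
        time.sleep(interval)
    return False
```

`wait_for_yap("CONTAINER_ID")` returns True as soon as the message is seen, and False if it has not appeared by the time the timeout expires.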

Making a request

Once the app is running, a request can be made.

Here is an example that can be run from the command line using curl.

curl --request POST \
  --url http://127.0.0.1:5000/predict \
  --header 'Content-Type: application/json' \
  --data '{
        "text": "גנן גידל דגן בגן.",
        "model": "token_single"
}'
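
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names are hypothetical, and it assumes that the app answers with a JSON body, whose exact shape is not documented here.

```python
import json
import urllib.request


def build_predict_request(text, model="token_single", base_url="http://127.0.0.1:5000"):
    """Build a POST request for /predict with a JSON body, mirroring the curl example."""
    body = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, model="token_single", base_url="http://127.0.0.1:5000"):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_predict_request(text, model, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, `predict("גנן גידל דגן בגן.")` sends the same payload as the curl command.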

Available endpoints

The app has two available endpoints, /tokenize and /predict, both of which expect POST requests.

/tokenize

Used to tokenize a string input into sentences. Requires a JSON body of the form:

{
    "text": "string"
}

/predict

Used to perform NER prediction. The model to run can be specified (token_single, token_multi, morph or hybrid). Requires a JSON body of the form:

{
    "text": "text",
    "model": "token_single"
}
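
A client can validate the payload before sending it, so that a misspelt model name is caught locally. The sketch below is illustrative only: `make_predict_payload` is a hypothetical helper, its default model is a client-side choice, and rejecting empty text is an assumption rather than documented server behaviour.

```python
# Model names listed in the /predict description above.
ALLOWED_MODELS = ("token_single", "token_multi", "morph", "hybrid")


def make_predict_payload(text, model="token_single"):
    """Return a /predict payload, raising ValueError on obviously invalid input."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {', '.join(ALLOWED_MODELS)}")
    return {"text": text, "model": model}
```

For example, `make_predict_payload("text", "morph")` returns a dictionary that can be serialised with `json.dumps` and posted to /predict.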

Documentation

Further documentation (auto-generated by FastAPI) is available while the app is running at http://127.0.0.1:5000/docs, or at the corresponding host and port if they have been modified.
