In this guide, we show how to train and evaluate your own models. We assume that you have created a corpus in the appropriate data format under data/corpus/<corpus_name>.
All training scripts are located under deidentify/methods/ and are prefixed with run_. For example, use deidentify/methods/bilstmcrf/run_bilstmcrf.py to train a BiLSTM-CRF model.
Each script takes a set of arguments that you can print as follows:
python deidentify/methods/bilstmcrf/run_bilstmcrf.py --help
Below is a list of available scripts:
> tree -P 'run_*.py' -I '__*' deidentify/methods/
deidentify/methods/
├── bilstmcrf
│   ├── run_bilstmcrf.py                  # Train a BiLSTM-CRF
│   └── run_bilstmcrf_training_sample.py  # Train a BiLSTM-CRF with a fraction of the training set
├── crf
│   ├── run_crf.py                  # Train a CRF model
│   ├── run_crf_hyperopt.py         # Perform a random search for a CRF model
│   ├── run_crf_learning_curve.py   # Print a learning curve for a CRF model
│   └── run_crf_training_sample.py  # Train a CRF with a fraction of the training set
└── deduce
    └── run_deduce.py  # Run the DEDUCE tagger on your dataset
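If the tree utility is not available, a similar listing can be produced from Python. This is a minimal sketch, assuming it is run from the repository root; find_run_scripts is a hypothetical helper and not part of deidentify:

```python
from pathlib import Path


def find_run_scripts(methods_dir):
    """Return run_*.py file names below methods_dir, grouped by parent directory name."""
    scripts = {}
    for path in sorted(Path(methods_dir).rglob("run_*.py")):
        scripts.setdefault(path.parent.name, []).append(path.name)
    return scripts


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for method, names in find_run_scripts("deidentify/methods").items():
        print(method, names)
```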
All scripts save their predictions and model artifacts (e.g., pickle files, training logs) to output/predictions/<corpus_name>/<script>_<run_id>/. This allows you to evaluate the predictions at a later stage.
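The naming convention can be expressed as a small path helper, for example when collecting runs in your own analysis code. This is only an illustration of the pattern described above; run_output_dir is a hypothetical function, not part of deidentify:

```python
from pathlib import Path


def run_output_dir(corpus_name, script, run_id, base="output/predictions"):
    """Build <base>/<corpus_name>/<script>_<run_id> following the documented pattern."""
    return Path(base) / corpus_name / f"{script}_{run_id}"


print(run_output_dir("ons", "bilstmcrf", "demo_run").as_posix())
# output/predictions/ons/bilstmcrf_demo_run
```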
Execute the command below to run the BiLSTM-CRF pipeline on the corpus ons (a.k.a. NUT) with run ID demo_run:
python deidentify/methods/bilstmcrf/run_bilstmcrf.py ons demo_run \
--pooled_contextual_embeddings \
--train_with_dev
The script saves the train/dev/test set predictions to output/predictions/ons/bilstmcrf_demo_run. We can use the script below to evaluate a single run:
python deidentify/evaluation/evaluate_run.py nl data/corpus/ons/test/ data/corpus/ons/test/ output/predictions/ons/bilstmcrf_demo_run/test/
It should print an evaluation report at the entity level, the token level, and the blind token level, broken down by PHI tag. Example:
> python deidentify/evaluation/evaluate_run.py nl data/corpus/ons/test/ data/corpus/ons/test/ output/predictions/ons/bilstmcrf_demo_run/test/
entity level tp: 3168 - fp: 288 - fn: 469 - tn: 0 - precision: 0.9167 - recall: 0.8710 - accuracy: 0.8071 - f1-score: 0.8933
Address tp: 132 - fp: 26 - fn: 24 - tn: 0 - precision: 0.8354 - recall: 0.8462 - accuracy: 0.7253 - f1-score: 0.8408
Age tp: 30 - fp: 8 - fn: 11 - tn: 0 - precision: 0.7895 - recall: 0.7317 - accuracy: 0.6122 - f1-score: 0.7595
Care_Institute tp: 142 - fp: 65 - fn: 74 - tn: 0 - precision: 0.6860 - recall: 0.6574 - accuracy: 0.5053 - f1-score: 0.6714
Date tp: 739 - fp: 59 - fn: 64 - tn: 0 - precision: 0.9261 - recall: 0.9203 - accuracy: 0.8573 - f1-score: 0.9232
Email tp: 10 - fp: 1 - fn: 0 - tn: 0 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 7 - fp: 2 - fn: 3 - tn: 0 - precision: 0.7778 - recall: 0.7000 - accuracy: 0.5833 - f1-score: 0.7369
ID tp: 12 - fp: 3 - fn: 13 - tn: 0 - precision: 0.8000 - recall: 0.4800 - accuracy: 0.4286 - f1-score: 0.6000
Initials tp: 111 - fp: 23 - fn: 67 - tn: 0 - precision: 0.8284 - recall: 0.6236 - accuracy: 0.5522 - f1-score: 0.7116
Internal_Location tp: 28 - fp: 10 - fn: 27 - tn: 0 - precision: 0.7368 - recall: 0.5091 - accuracy: 0.4308 - f1-score: 0.6021
Name tp: 1856 - fp: 67 - fn: 85 - tn: 0 - precision: 0.9652 - recall: 0.9562 - accuracy: 0.9243 - f1-score: 0.9607
Organization_Company tp: 71 - fp: 20 - fn: 65 - tn: 0 - precision: 0.7802 - recall: 0.5221 - accuracy: 0.4551 - f1-score: 0.6256
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 2 - fn: 0 - tn: 0 - precision: 0.8889 - recall: 1.0000 - accuracy: 0.8889 - f1-score: 0.9412
Profession tp: 11 - fp: 1 - fn: 31 - tn: 0 - precision: 0.9167 - recall: 0.2619 - accuracy: 0.2558 - f1-score: 0.4074
SSN tp: 0 - fp: 1 - fn: 0 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token level tp: 4894 - fp: 308 - fn: 500 - tn: 1810993 - precision: 0.9408 - recall: 0.9073 - accuracy: 0.8583 - f1-score: 0.9237
Address tp: 217 - fp: 22 - fn: 29 - tn: 120845 - precision: 0.9079 - recall: 0.8821 - accuracy: 0.8097 - f1-score: 0.8948
Age tp: 48 - fp: 11 - fn: 13 - tn: 121041 - precision: 0.8136 - recall: 0.7869 - accuracy: 0.6667 - f1-score: 0.8000
Care_Institute tp: 266 - fp: 78 - fn: 81 - tn: 120688 - precision: 0.7733 - recall: 0.7666 - accuracy: 0.6259 - f1-score: 0.7699
Date tp: 1835 - fp: 66 - fn: 36 - tn: 119176 - precision: 0.9653 - recall: 0.9808 - accuracy: 0.9473 - f1-score: 0.9730
Email tp: 10 - fp: 1 - fn: 0 - tn: 121102 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 11 - fp: 3 - fn: 3 - tn: 121096 - precision: 0.7857 - recall: 0.7857 - accuracy: 0.6471 - f1-score: 0.7857
ID tp: 12 - fp: 3 - fn: 12 - tn: 121086 - precision: 0.8000 - recall: 0.5000 - accuracy: 0.4444 - f1-score: 0.6154
Initials tp: 113 - fp: 21 - fn: 72 - tn: 120907 - precision: 0.8433 - recall: 0.6108 - accuracy: 0.5485 - f1-score: 0.7085
Internal_Location tp: 47 - fp: 11 - fn: 45 - tn: 121010 - precision: 0.8103 - recall: 0.5109 - accuracy: 0.4563 - f1-score: 0.6267
Name tp: 2135 - fp: 60 - fn: 80 - tn: 118838 - precision: 0.9727 - recall: 0.9639 - accuracy: 0.9385 - f1-score: 0.9683
Organization_Company tp: 119 - fp: 27 - fn: 89 - tn: 120878 - precision: 0.8151 - recall: 0.5721 - accuracy: 0.5064 - f1-score: 0.6723
Other tp: 0 - fp: 0 - fn: 5 - tn: 121108 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 38 - fp: 2 - fn: 0 - tn: 121073 - precision: 0.9500 - recall: 1.0000 - accuracy: 0.9500 - f1-score: 0.9744
Profession tp: 40 - fp: 3 - fn: 34 - tn: 121036 - precision: 0.9302 - recall: 0.5405 - accuracy: 0.5195 - f1-score: 0.6837
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 121109 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token (blind) tp: 5016 - fp: 187 - fn: 379 - tn: 115532 - precision: 0.9641 - recall: 0.9297 - accuracy: 0.8986 - f1-score: 0.9466
ENT tp: 5016 - fp: 187 - fn: 379 - tn: 115532 - precision: 0.9641 - recall: 0.9297 - accuracy: 0.8986 - f1-score: 0.9466
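The report columns can be reproduced from the tp/fp/fn counts. The sketch below is an assumption inferred from the numbers above, not the project's own implementation: precision, recall, and F1 use their standard definitions, and the accuracy column appears to equal tp / (tp + fp + fn), ignoring tn. prf_scores is a hypothetical helper name.

```python
def prf_scores(tp, fp, fn):
    """Compute precision, recall, accuracy (without tn) and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: accuracy ignores true negatives, as the example report suggests.
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Counts taken from the "entity level" row of the example report.
scores = prf_scores(tp=3168, fp=288, fn=469)
print({name: round(value, 4) for name, value in scores.items()})
# {'precision': 0.9167, 'recall': 0.871, 'accuracy': 0.8071, 'f1': 0.8933}
```

For some rows, the last digit of the reported F1 may differ by one from this computation, which is presumably a rounding detail.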
You can use the evaluate_corpus.py script to evaluate all runs for a given corpus. The script produces a CSV file with the evaluation measures for each corpus part (i.e., train/dev/test) that you can use for further analysis.
> python deidentify/evaluation/evaluate_corpus.py <corpus_name> <language>
[...]
> tree output/evaluation/<corpus_name>
output/evaluation/<corpus_name>
├── summary_dev.csv
├── summary_test.csv
└── summary_train.csv
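For further analysis, the summary files can be loaded with the standard library. This is a minimal sketch that makes no assumption about the column names and simply reports whatever header each file contains; load_summaries is a hypothetical helper, and the path in the usage block assumes the ons corpus from the example above:

```python
import csv
from pathlib import Path


def load_summaries(eval_dir):
    """Read every summary_<part>.csv in eval_dir into a dict: part -> list of row dicts."""
    summaries = {}
    for path in sorted(Path(eval_dir).glob("summary_*.csv")):
        part = path.stem[len("summary_"):]  # e.g. "dev", "test", "train"
        with path.open(newline="", encoding="utf-8") as handle:
            summaries[part] = list(csv.DictReader(handle))
    return summaries


if __name__ == "__main__":
    for part, rows in load_summaries("output/evaluation/ons").items():
        columns = list(rows[0].keys()) if rows else []
        print(part, len(rows), columns)
```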