# Evaluating Content-based Pre-Training Strategies for a Knowledge-aware Recommender System based on Graph Neural Networks
Required libraries:
- torch
- pandas
- pykeen
- SPARQLWrapper
- wikipedia2vec
- tqdm
You can find four scripts in this repository; each of them has tunable parameters that can be changed by editing the corresponding code section in the file.
The first script, data_to_wikiname.py, converts a list of entities, contained in a tsv file with two columns (id, url), where url is the DBpedia URL, to the corresponding Wikipedia entity name. In some cases the Wikipedia PageID extracted from DBpedia is faulty, so it has to be corrected manually by following the prompt instructions. A sketch of this kind of lookup follows the parameter list.
Parameters:
- input_file: tsv file with two columns (id, url)
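For reference, here is a minimal sketch of the kind of DBpedia lookup involved; the SPARQL query, the file names, and the wiki_page_id helper are illustrative assumptions, not the repository's exact code, and the final resolution of PageID to Wikipedia name (e.g. via the Wikipedia API) is omitted.

```python
# Hypothetical sketch: resolving a DBpedia URL to its Wikipedia PageID.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")

def wiki_page_id(dbpedia_url):
    """Query DBpedia for the Wikipedia PageID of an entity (assumed approach)."""
    sparql.setQuery(
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        f"SELECT ?pid WHERE {{ <{dbpedia_url}> dbo:wikiPageID ?pid }}"
    )
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return int(bindings[0]["pid"]["value"]) if bindings else None

entities = pd.read_csv("entities.tsv", sep="\t", names=["id", "url"])  # input_file
entities["page_id"] = entities["url"].map(wiki_page_id)
```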
The output of data_to_wikiname.py should be passed as input to the second script, together with a training file if you want to include user embeddings. This script outputs a dictionary of (id, embedding) pairs, which is dumped to the wiki2vec_embeddings.pkl file; a minimal sketch is shown after the parameter list.
Parameters:
- id2wikiname_file: file generated by the data_to_wikiname.py script
- training_file: training set used to compute user embeddings; set to None if you don't want to include them
- wiki2vec_dump: Wikipedia2Vec pre-trained embeddings file
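A minimal sketch of this step, assuming the id2wikiname file is a two-column tsv and leaving the user-embedding strategy as a commented assumption (the actual implementation may differ):

```python
import pickle
import pandas as pd
from wikipedia2vec import Wikipedia2Vec

wiki2vec = Wikipedia2Vec.load("enwiki_20180420_300d.pkl")        # wiki2vec_dump
id2name = dict(pd.read_csv("id2wikiname.tsv", sep="\t").values)  # id2wikiname_file

embeddings = {}
for ent_id, name in id2name.items():
    try:
        embeddings[ent_id] = wiki2vec.get_entity_vector(name)
    except KeyError:
        pass  # entity not covered by the pre-trained dump

# User embeddings (assumed strategy): e.g. average the embeddings of the items
# each user rated in the training file, then add the resulting
# (user_id, vector) pairs to the same dictionary.

with open("wiki2vec_embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```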
The third script, train_embeddings.py, learns the graph embeddings and exposes many adjustable parameters. The output of the previous script can optionally be used as pre-trained embeddings for the model; see the sketch after the parameter list.
Parameters:
- dataset: one of dbbook and movielens
- emb_dim: embedding dimension
- n_layers: number of CompGCN layers
- epochs: number of epochs used to learn the embeddings
- wiki2vec_embeddings_file: wiki2vec pre-trained embeddings (optional)
- output_path: output path
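As an illustration, graph embeddings of this kind can be learned with pykeen's CompGCN implementation. The snippet below is a hedged sketch: the file names, the split, and the hyperparameter plumbing (in particular encoder_kwargs for the number of layers) are assumptions, and the optional injection of the wiki2vec pre-trained vectors is omitted.

```python
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# CompGCN in pykeen expects inverse triples to be created.
tf = TriplesFactory.from_path("graph.tsv", create_inverse_triples=True)
training, testing = tf.split([0.95, 0.05], random_state=42)

result = pipeline(
    training=training,
    testing=testing,
    model="CompGCN",
    model_kwargs=dict(
        embedding_dim=200,                  # emb_dim
        encoder_kwargs=dict(num_layers=2),  # n_layers (assumed knob)
    ),
    training_kwargs=dict(num_epochs=25),    # epochs
)
result.save_to_directory("compgcn_output")  # output_path
```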
The fourth script, train_recommender.py, trains the recommender model and generates the predictions; many training parameters can be tuned. Among these, embeddings_file should be the file generated by the previous script. A purely illustrative sketch follows the parameter list.
Parameters:
- dataset: one of dbbook and movielens
- batch_size: training batch size
- epochs: number of epochs used to train the recommender
- learning_rate: learning rate value
- embeddings_file: embeddings learned by train_embeddings.py
- concat: True to concatenate the embeddings
- ent2id_file: ent2id file generated by train_embeddings.py
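The actual recommender architecture lives in train_recommender.py; purely as an illustration of how the learned embeddings can feed a torch model, here is a self-contained sketch in which the PairScorer class and all sizes are made up:

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Toy user-item scorer on top of pre-trained embeddings (illustrative)."""

    def __init__(self, pretrained):
        super().__init__()
        # Row indices are assumed to follow the ids in the ent2id file.
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        dim = pretrained.size(1)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, users, items):
        x = torch.cat([self.emb(users), self.emb(items)], dim=-1)
        return self.mlp(x).squeeze(-1)

model = PairScorer(torch.randn(1000, 200))                  # placeholder matrix
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning_rate
loss_fn = nn.BCEWithLogitsLoss()  # scores positive vs. negative pairs
```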
The output lists generated by train_recommender.py can be evaluated using Elliot. Elliot needs a test file (we provide test_elliot.tsv for each dataset) containing only the ground truth (positive ratings), which should be specified in the Elliot config file along with the folder generated by the recommender. We also provide the sample config file elliot/proxy_rec.yml, which should be edited and placed in Elliot's config_files/ folder; there you can specify the dataset name, the metrics to compute, and other parameters. A hedged configuration sketch is shown below.
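As a rough guide, an Elliot configuration for this setup might look like the following; the paths, dataset name, and model section (here Elliot's RecommendationFolder proxy) are assumptions to adapt to your own layout, not the contents of the provided proxy_rec.yml.

```yaml
experiment:
  dataset: dbbook                    # dataset name
  data_config:
    strategy: fixed
    train_path: ../data/dbbook/train.tsv
    test_path: ../data/dbbook/test_elliot.tsv
  top_k: 5
  evaluation:
    cutoffs: [5]
    simple_metrics: [Precision, Recall, F1, nDCG]  # metrics to compute
  models:
    RecommendationFolder:
      folder: ../results/dbbook      # folder generated by the recommender
```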
Then, Elliot can be executed through the following command:

```bash
python start_experiments.py --config=proxy_rec.yml
```
Note that we edited Elliot's code, specifically the F1 and Precision calculation: in the original code these metrics ignore the fact that a recommendation list for some users might be shorter than the cutoff value k. We therefore modified the files elliot/elliot/evaluation/metrics/accuracy/f1/f1.py and elliot/elliot/evaluation/metrics/accuracy/precision/precision.py to address this issue. Our custom files can be found in the elliot/ folder.
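For clarity, the gist of the fix is sketched below (illustrative code, not the verbatim patch): divide by the actual list length instead of always dividing by k.

```python
def precision_at_k(recs, relevant, k):
    """Precision@k that accounts for lists shorter than the cutoff."""
    top = recs[:k]
    if not top:
        return 0.0
    hits = sum(item in relevant for item in top)
    return hits / len(top)  # the original Elliot code divided by k regardless
```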