Pairwise ranking via SVM

Lorenzo Alfaro, David

Morote García, Carlos

ETSIINF, UPM

Introduction

In this work we describe a basic implementation of a retrieval function via linear support- vector machines, following the framework proposed in (Joachims, 2002), which lies within the pairwise ranking paradigm. To that end, we utilize a small sample of data from the LOINC database to extract a meaningful set of document descriptors both at training and inference time.

_{Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of
the eighth acm sigkdd international conference on knowledge discovery and data mining
(pp. 133–142).}

Run the code

First of all you have to set the information files in the data folder. This data will be used to train the model and consists of three files: docs.csv, queries.csv and query_doc.csv (The name does not have to be explicitly this one).

docs.csv is the collection of the documents and must have the following characteristics: loinc_num, long_common_name, component, system, property, scale, class, discourage, rank_loinc.
queries.csv is the collection of queries by which the model will learn. It consists of id_query, query.
query_doc.csv is the relationship between the documents and the queries. It consists of id_query, loinc_num, rank.

The main script corresponds to pairwise_ranking_retrieval.py. This script initializes the engine with the provided parameters and waits to a user's query in order to compute a proper ordering for that query. The script takes as parameters:

datapath_documents: Filepath to the file containing the collection of documents.
datapath_queries: Filepath to the file containing the collection of queries.
datapath_query_doc: Filepath to the file containing the per-query-per-document ranks.
--sentence_transformer: Name of a custom sentence transformer checkpoint. Default: "pritamdeka/S-Biomed-Roberta-snli-multinli-stsb"
--seed:Seed to use to grant reproducibility of the training process of the SVM model.Type: int. Default: 0
--separator: Data field separator, e.g., ',' in csv files. Default: ","
-s --similarity_measure: Default similarity score function. Values: "dot", "cos", "euclidean". Default: "cos"
-t --topk: Top k documents of the rank to show in CLI. Type:int . Default: 10

To execute the code you can use the following command.

python pairwise_ranking_retrieval.py *doc_path* *queries_path* *query_doc_path* [optional params]*

Folder sctructure

.
├── README.md
├── requirements.txt
├── data
│   ├── docs.csv
│   ├── queries.csv
│   └── query_doc.csv
├── dataset.py
├── SVM.py
└── pairwise_ranking_retrieval.py

Code dependences

numpy (1.20.3)
packaging (21.0)
pandas (1.3.4)
scikit_learn (1.0.2)
sentence_transformers (2.2.0)
torch (1.11.0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pairwise ranking via SVM

Lorenzo Alfaro, David

Morote García, Carlos

ETSIINF, UPM

Introduction

Run the code

Folder sctructure

Code dependences

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
data		data
.gitignore		.gitignore
README.md		README.md
Report.pdf		Report.pdf
SVM.py		SVM.py
dataset.py		dataset.py
pairwise_ranking_retrieval.py		pairwise_ranking_retrieval.py
requirements.txt		requirements.txt

CarlosMorote/RankingSVM

Folders and files

Latest commit

History

Repository files navigation

Pairwise ranking via SVM

Lorenzo Alfaro, David

Morote García, Carlos

ETSIINF, UPM

Introduction

Run the code

Folder sctructure

Code dependences

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages