poetry install --no-root
poetry shell
The download step (step1-download-vespa-database.py) works by connecting to the PostgreSQL database, retrieving the list of all document ids, and then downloading all chunks for each document id.
The database connection information is hardcoded in the script and may need adjusting for your environment.
python step1-download-vespa-database.py
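The download logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of step1-download-vespa-database.py: the table name `document`, the column `id`, the `fetch_chunks` callable, and the one-JSON-file-per-document output layout are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, List


def list_document_ids(conn: Any, query: str = "SELECT id FROM document ORDER BY id") -> List[Any]:
    """Return all document ids using any DB-API 2.0 connection.

    The default query is hypothetical; adjust table and column names to the real schema.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


def download_all(conn: Any, fetch_chunks: Callable[[Any], Iterable[Any]], out_dir: str) -> int:
    """For every document id, fetch its chunks and write them to one JSON file.

    fetch_chunks is a hypothetical callable that returns the chunks for one
    document id (for example, by querying the chunk store).
    Returns the number of documents written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for index, doc_id in enumerate(list_document_ids(conn)):
        chunks = list(fetch_chunks(doc_id))
        # Name files by index so unusual characters in ids cannot break paths.
        target = out / f"{index:06d}.json"
        target.write_text(
            json.dumps({"document_id": doc_id, "chunks": chunks}),
            encoding="utf-8",
        )
        count += 1
    return count
```

With a real database, `conn` could be created with a PostgreSQL driver such as psycopg2 (`psycopg2.connect(host=..., port=..., dbname=..., user=..., password=...)`), using whatever connection values match your environment.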
python step2.0-topic-generation.py data-download/GS_CEMS/ datasets/GS_CEMS-topics.json 200
python step2.1-filter-for-english.py datasets/GS_CEMS-topics.json datasets/GS_CEMS-topics-en.json
python step2.1-extract-primary-questions.py datasets/GS_CEMS-topics-en.json datasets/GS_CEMS-questions.txt
python step3.0-generate-danswer-dataset.py datasets/GS_CEMS-questions.txt datasets/GS_CEMS-goldenset.json
python step3.5-dataset-to-virtual-trulens.py dataset.json datasets/GS_CEMS-goldenset.json
python step4.0-dataset2xls.py datasets/GS_CEMS-goldenset.json datasets/GS_CEMS-goldenset.xls