Jim-at-SemEval-2025-Task-5

SemEval-2025 Task 5 calls for using LLM capabilities to assign controlled subject labels to record descriptions in the multilingual library collection of the German National Library of Science and Technology (TIB).

System Overview

The multilingual BERT ensemble system described here produces GND subject labels for several record types: articles, books, conference papers, reports, and theses. Given a title and abstract in German or English, it generates GND subject labels for the record.

Train

The AutoTrain Advanced software package was used to train BERT models for GND classification on examples from the TIB "All Subjects" dataset. A curated subset of that data, split into training and validation sets, is available from Hugging Face.
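As an illustration of what a training and validation split involves, here is a minimal sketch. It is not the procedure used to produce the curated dataset: the function name `split_rows`, the 10% validation fraction, and the seed are all assumptions chosen for the example.

```python
import random


def split_rows(rows, validation_fraction=0.1, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation).

    validation_fraction and seed are illustrative defaults, not values
    taken from the repository.
    """
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical rows standing in for (title, abstract, label) records.
rows = [f"record-{i}" for i in range(100)]
train_rows, validation_rows = split_rows(rows)
```

Fixing the seed means the same rows land in the validation set on every run, which keeps validation scores comparable across training runs.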

Test

This code tests which combination of models yields the highest scores, using 1000 rows of held-out data as the gold standard.
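The idea of comparing model combinations against a gold standard can be sketched as an exhaustive search over subsets. This is a simplified illustration, not the repository's test code: the model names, the choice of mean per-record F1 as the score, and merging predictions by set union are all assumptions made for the example.

```python
from itertools import combinations


def f1(predicted, gold):
    """F1 between a predicted label set and a gold label set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_subset(predictions, gold):
    """Return (model_names, mean_f1) for the best-scoring model subset.

    predictions maps a model name to a list of label sets, one per record.
    gold is the list of gold label sets in the same record order.
    """
    best_models, best_score = None, -1.0
    names = sorted(predictions)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            scores = []
            for i, gold_labels in enumerate(gold):
                # Assumption: an ensemble predicts the union of its members' labels.
                merged = set().union(*(predictions[name][i] for name in subset))
                scores.append(f1(merged, gold_labels))
            mean_score = sum(scores) / len(scores)
            if mean_score > best_score:
                best_models, best_score = subset, mean_score
    return best_models, best_score


# Hypothetical held-out gold labels and per-model predictions for two records.
gold = [{"a", "b"}, {"c"}]
predictions = {
    "bert_a": [{"a"}, {"c"}],
    "bert_b": [{"b"}, {"c"}],
    "bert_c": [{"x"}, {"c"}],
}
best_models, best_score = best_subset(predictions, gold)
```

An exhaustive search evaluates every non-empty subset, so it is only practical for a small number of candidate models.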

Inference

The inference code generates labels from each BERT model and aggregates the label confidence scores across models, so that the models work together as an ensemble.
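A minimal sketch of score aggregation follows. The source does not state how scores are combined, so summing per-label confidences is an assumption here, as are the function name `aggregate_scores`, the `top_k` parameter, and the sample labels and scores.

```python
from collections import defaultdict


def aggregate_scores(model_outputs, top_k=5):
    """Sum per-label confidence scores across models and rank the labels.

    model_outputs is a list of dicts, one per model, mapping label -> confidence.
    Summation is an illustrative choice; averaging or weighting would also work.
    """
    totals = defaultdict(float)
    for output in model_outputs:
        for label, score in output.items():
            totals[label] += score
    # Highest total first; ties are broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical confidence scores from three models for one record.
model_outputs = [
    {"Maschinelles Lernen": 0.80, "Informatik": 0.10},
    {"Maschinelles Lernen": 0.60, "Statistik": 0.30},
    {"Informatik": 0.50, "Statistik": 0.20},
]
ranked = aggregate_scores(model_outputs, top_k=2)
```

With summation, a label that several models agree on can outrank a label that a single model scores highly, which is the usual motivation for combining models this way.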

GitHub Copilot Attribution

Jim was assisted by GitHub Copilot in developing the inference and testing code.

jimfhahn/SemEval-2025-Task5

Folders and files

Latest commit

History

Repository files navigation

Jim-at-SemEval-2025-Task-5

System Overview

Train

Test

Inference

GitHub CoPilot Attribution

About

Resources

License

Stars

Watchers

Forks

Languages