This repository contains the code accompanying my master's thesis, LegSum: Legal Document Summarization.
| Notebook | Colab | Model checkpoint |
|---|---|---|
| T5 | | Frederick0291/t5-small-finetuned-billsum |
| BART billsum | | murali-admin/bart-billsum-1 |
| BART xsum | | sshleifer/distilbart-xsum-12-6 |
| Pegasus Legal | | nsi319/legal-pegasus |
| Pegasus billsum | | google/pegasus-billsum |
| BigBird | | google/bigbird-pegasus-large-bigpatent |
| LED | | allenai/led-large-16384-arxiv |
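The checkpoints listed above can be tried outside the notebooks. The following is a minimal sketch, not the notebooks' own code: it assumes `transformers` and `torch` are installed, and the names `CHECKPOINTS`, `summarize`, `max_input_tokens`, and `max_new_tokens` are illustrative. The 1024-token input budget is an assumption; the long-input models (BigBird, LED) accept more, and T5 checkpoints often expect a `summarize: ` prefix, which has not been verified for the one listed here.

```python
# Checkpoint names copied from the table above.
CHECKPOINTS = {
    "T5": "Frederick0291/t5-small-finetuned-billsum",
    "BART billsum": "murali-admin/bart-billsum-1",
    "BART xsum": "sshleifer/distilbart-xsum-12-6",
    "Pegasus Legal": "nsi319/legal-pegasus",
    "Pegasus billsum": "google/pegasus-billsum",
    "BigBird": "google/bigbird-pegasus-large-bigpatent",
    "LED": "allenai/led-large-16384-arxiv",
}


def summarize(text, checkpoint, max_input_tokens=1024, max_new_tokens=128):
    """Generate an abstractive summary of `text` with a seq2seq checkpoint."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Truncate long bills to the assumed input budget (hypothetical default).
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with a cap on the number of generated tokens.
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use; `bill_text` is a placeholder):
# summary = summarize(bill_text, CHECKPOINTS["Pegasus billsum"])
```

Loading the tokenizer and model on every call keeps the sketch short; for batch evaluation it would be cheaper to load them once and reuse them.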

| Notebook | Colab |
|---|---|
| Extractive | |
| Kmeans Bertsum | |
| Luhn's algorithm | |
| TF-IDF | |
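To give a feel for the extractive approach, here is a small TF-IDF sentence-ranking sketch in plain Python. It is an illustration under simplifying assumptions (naive sentence splitting, no stop-word removal or stemming), not the implementation used in the TF-IDF notebook; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_summary(text, num_sentences=2):
    """Return the `num_sentences` highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of TF-IDF weights, with TF normalized by sentence length.
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    # Pick the top-scoring sentences, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Each sentence is treated as a "document" for the IDF term, so sentences built from words that are rare elsewhere in the bill score highest. Legal text contains many abbreviations such as "Sec. 2", so a more robust sentence splitter would be needed in practice.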
- BillSum
  - Official GitHub repository and 🤗 Dataset loader
  - A processed and cleaned version of the data can be found here
The following results are on the BillSum dataset (ca_test split), obtained with pre-trained models and extractive methods.

| Algorithm / model | Rouge-1 | Rouge-2 | Rouge-L |
|---|---|---|---|
| Extractive | | | |
| KL | 24.44 | 9.74 | 21.98 |
| LSA | 30.85 | 12.45 | 27.64 |
| SumBasic | 31.01 | 12.61 | 27.83 |
| BERT | 33.29 | 15.17 | 29.67 |
| TF-IDF | 33.97 | 15.98 | 29.92 |
| LexRank | 36.83 | 18.98 | 32.95 |
| TextRank | 36.57 | 19.10 | 32.35 |
| Luhn's algorithm | 37.48 | 19.93 | 33.35 |
| Abstractive | | | |
| BART | 26.02 | 11.87 | 22.02 |
| Pegasus (small) | 28.61 | 12.19 | 25.88 |
| T5 (small) | 32.99 | 15.52 | 30.21 |
| BillPegasus | 34.25 | 16.63 | 30.22 |
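For readers unfamiliar with the metrics in the table, the sketch below shows how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed for a single candidate/reference pair. It is a simplified illustration (whitespace tokenization, no stemming, summary-level LCS), so its numbers will not exactly match a standard ROUGE package, and it is not necessarily how the scores above were produced.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 using clipped n-gram matches."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence (rolling-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))
```

These functions return values between 0 and 1; the table appears to report scores on a 0 to 100 scale, so multiply by 100 before comparing.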

