This dataset contains quality judgments for several summarization systems on the CNN/DailyMail dataset. The data was published in "The price of debiasing automatic metrics in natural language evaluation" (Chaganty et al., 2018).
sacrerouge setup-dataset chaganty2018 \
<output-dir>
The output files are the following:
documents.jsonl
: The CNN/DailyMail documents

summaries.jsonl
: The system summaries

metrics.jsonl
: The corresponding manual evaluation metrics for the system summaries
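Since the output files use the JSON Lines format (one JSON object per line), they can be read with the standard library alone. The following is a minimal sketch, not part of sacrerouge itself: `load_jsonl` and `load_dataset` are hypothetical helper names, and the sketch assumes all three files sit directly inside the output directory passed to the setup command.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_dataset(output_dir):
    """Load the three output files from `output_dir` (hypothetical helper).

    Assumes documents.jsonl, summaries.jsonl, and metrics.jsonl are located
    directly in `output_dir`.
    """
    output_dir = Path(output_dir)
    names = ("documents", "summaries", "metrics")
    return {name: load_jsonl(output_dir / f"{name}.jsonl") for name in names}
```

Calling `load_dataset("<output-dir>")` with the directory used above would return a dict with the keys `documents`, `summaries`, and `metrics`, each mapping to a list of records.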
Note: the ID 006588 appears twice for the ml+rl system.
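A repeated entry like this one can be detected, and optionally dropped, before computing any statistics. The sketch below is illustrative only: the field names `instance_id` and `summarizer_id` are assumptions about the record layout and should be checked against the actual files, and `find_duplicates` and `drop_duplicates` are hypothetical helpers rather than sacrerouge functions.

```python
from collections import Counter


def find_duplicates(records, id_key="instance_id", system_key="summarizer_id"):
    """Return {(id, system): count} for every pair that occurs more than once.

    The default key names are assumptions about the record layout.
    """
    counts = Counter((r[id_key], r[system_key]) for r in records)
    return {pair: n for pair, n in counts.items() if n > 1}


def drop_duplicates(records, id_key="instance_id", system_key="summarizer_id"):
    """Keep only the first record seen for each (id, system) pair."""
    seen = set()
    unique = []
    for record in records:
        pair = (record[id_key], record[system_key])
        if pair not in seen:
            seen.add(pair)
            unique.append(record)
    return unique
```

Keeping the first occurrence is an arbitrary choice; if the two copies carry different judgments, it may be preferable to inspect them by hand rather than discard one automatically.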