
Questions on Table 2 of BERT paper #19

Open
HaixiaChai opened this issue Oct 27, 2019 · 5 comments

@HaixiaChai
  1. Table 2 shows results for many systems on GAP. May I ask whether they are on the GAP dev set or the test set?
  2. I couldn't reproduce the c2f-coref result, and I'm not sure what's wrong with my files or parameters. Do you also use gap_to_jsonlines.py and to_gap_tsv.py for the c2f-coref system? Do you pass a tokenizer to gap_to_jsonlines.py or not? And what doc_key do you set for each sample in the JSON, given that it must encode one of the genres?

Thank you in advance.

@mandarjoshi90
Owner

mandarjoshi90 commented Oct 30, 2019

Sorry about the late response. Here's the pipeline. $gap_file_prefix is the path of the GAP file without the .tsv extension; $vocab_file is the cased BERT vocab file.

#!/bin/bash
gap_file_prefix=$1   # path to the GAP .tsv file, without the .tsv extension
vocab_file=$2        # cased BERT vocab file

# Convert the GAP TSV into the jsonlines format expected by the model.
python gap_to_jsonlines.py $gap_file_prefix.tsv $vocab_file
# Run coreference prediction with the bert_base configuration on GPU 0.
GPU=0 python predict.py bert_base $gap_file_prefix.jsonlines $gap_file_prefix.output.jsonlines
# Convert the predictions back into GAP TSV format.
python to_gap_tsv.py $gap_file_prefix.output.jsonlines
# Score against the gold TSV with the official (Python 2) GAP scorer.
python2 ../gap-coreference/gap_scorer.py --gold_tsv $gap_file_prefix.tsv --system_tsv $gap_file_prefix.output.tsv
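For example, assuming the script above is saved as run_gap.sh, the GAP files come from a checkout of the gap-coreference repo, and the vocab comes from a standard cased BERT-base checkpoint (both paths below are placeholders, not something confirmed in this thread):

# Placeholder paths; adjust to your gap-coreference checkout and BERT checkpoint.
bash run_gap.sh ../gap-coreference/gap-test cased_L-12_H-768_A-12/vocab.txt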
  1. Table 2 is on the test set.
  2. The results seem to be off by about 0.3 for BERT-base; I'm not sure what changed. The genre has very little effect on the number (up to 0.1, IIRC). I got 82.4 with the default genre (bc); a minimal sketch of forcing the default genre into doc_key follows below.
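In case it helps, here is a minimal sketch of rewriting each doc_key to start with the default genre before running predict.py. The doc_key field follows the usual coref jsonlines format; the filenames and the original ID scheme are assumptions, and gap_to_jsonlines.py may already do something equivalent:

import json

# Force every doc_key to start with the default genre "bc", which
# predict.py uses to look up a genre embedding. (Assumed ID scheme.)
with open("gap-test.jsonlines") as f_in, \
     open("gap-test.bc.jsonlines", "w") as f_out:
    for line in f_in:
        doc = json.loads(line)
        doc["doc_key"] = "bc/" + doc["doc_key"].split("/")[-1]
        f_out.write(json.dumps(doc) + "\n")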

@HaixiaChai
Author

HaixiaChai commented Oct 30, 2019

  1. I found that all four e2e-coref numbers in the first row are exactly the same as those in the last row of Table 4 of the paper "Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns". But that paper says its results are on the GAP development set, and the probability that dev-set and test-set results are exactly the same is very low. So could you please confirm whether the results in Table 2 really are on the GAP test set?
  2. Thank you for your pipeline and the bert_base result. I also got an Overall score of 82.4, which is fine. However, my question is about the c2f-coref model: the pipeline could be the same, but the code should differ slightly to adapt to c2f-coref. Can you reproduce the four c2f-coref numbers?

Thanks a lot.

@mandarjoshi90
Owner

mandarjoshi90 commented Nov 1, 2019

  1. I did not run the e2e-coref model; it looks like we copied from the wrong table for that row. I will amend the paper. We definitely evaluated BERT on the test set.
  2. I don't have that handy right now, and I'm traveling until mid-November. IIRC, the only change should be to make sure that each element of the sentences field is a natural-language sentence (as opposed to a paragraph, as with BERT), because c2f-coref contextualizes each sentence independently with LSTMs; see the sketch below.

If that doesn't work, I'll take a look after I'm back. Thanks for your patience.
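To make point 2 concrete, here is a minimal sketch of the difference in the sentences field. The field names follow the usual coref jsonlines format; the example text and the use of nltk.sent_tokenize for sentence splitting are illustrative assumptions, not necessarily what the repo does:

import json
import nltk  # assumes nltk is installed and its "punkt" sentence model is downloaded

paragraph = "Cheryl thought Alice was late. She waited at the station."

# BERT-style input: the whole paragraph goes into one long segment.
bert_style = {"doc_key": "bc/example-0",
              "sentences": [paragraph.split()]}

# c2f-coref-style input: one entry per natural-language sentence,
# because c2f-coref contextualizes each sentence independently with LSTMs.
c2f_style = {"doc_key": "bc/example-0",
             "sentences": [s.split() for s in nltk.sent_tokenize(paragraph)]}

print(json.dumps(bert_style))
print(json.dumps(c2f_style))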

@HaixiaChai
Author

  1. Because gap_to_jsonlines.py works with the tokenizer set to None, that is how I used it. The Overall F1 score I got is 68.5, not the 73.5 reported in your paper. If you could rerun it and check which code you used, I would appreciate it very much.

@Hafsa-Masroor

@HaixiaChai
Could you please share the detailed steps to test and evaluate this model on the GAP dataset? (I'd like to know what changes are needed in the environment setup, commands, data, etc.)
I am new to this research area and want to reproduce the results with both the GAP and OntoNotes datasets. Your help would be much appreciated.

Thanks!
