
Questions on Table 2 of BERT paper #19

Open
HaixiaChai opened this issue Oct 27, 2019 · 5 comments

@HaixiaChai
  1. Table 2 shows results for many systems on GAP. May I ask whether they are on the GAP dev set or the test set?
  2. I couldn't reproduce the c2f-coref result, and I'm not sure what's wrong with my files or parameters. Do you also use gap_to_jsonlines.py and to_gap_tsv.py for the c2f-coref system? Do you pass a tokenizer to gap_to_jsonlines.py or not? And what doc_key do you set for each sample in the JSON, given that it must encode one of the genres?

Thank you in advance.

@mandarjoshi90
Owner

mandarjoshi90 commented Oct 30, 2019

Sorry about the late response. Here's the pipeline. $gap_file_prefix is the path of the GAP file without the .tsv extension; $vocab_file is the cased BERT vocab file.

#!/bin/bash
gap_file_prefix=$1   # path to the GAP .tsv file, without the .tsv extension
vocab_file=$2        # cased BERT vocab file

# Convert the GAP TSV into the jsonlines format expected by the model.
python gap_to_jsonlines.py $gap_file_prefix.tsv $vocab_file
# Run coreference prediction with the bert_base configuration on GPU 0.
GPU=0 python predict.py bert_base $gap_file_prefix.jsonlines $gap_file_prefix.output.jsonlines
# Convert the predictions back into GAP TSV format.
python to_gap_tsv.py $gap_file_prefix.output.jsonlines
# Score against the gold TSV with the official (Python 2) GAP scorer.
python2 ../gap-coreference/gap_scorer.py --gold_tsv $gap_file_prefix.tsv --system_tsv $gap_file_prefix.output.tsv
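For example, assuming the script above is saved as run_gap.sh, the GAP files come from a checkout of the gap-coreference repo, and the vocab comes from a standard cased BERT-base checkpoint (both paths below are placeholders, not something confirmed in this thread):

# Placeholder paths; adjust to your gap-coreference checkout and BERT checkpoint.
bash run_gap.sh ../gap-coreference/gap-test cased_L-12_H-768_A-12/vocab.txt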
  1. Table 2 is on the test set.
  2. The results seem to be off by about 0.3 for BERT-base; I'm not sure what changed. The genre has very little effect on the number (up to 0.1, IIRC). I got 82.4 with the default genre (bc); a minimal sketch of forcing the default genre into doc_key follows below.
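In case it helps, here is a minimal sketch of rewriting each doc_key to start with the default genre before running predict.py. The doc_key field follows the usual coref jsonlines format; the filenames and the original ID scheme are assumptions, and gap_to_jsonlines.py may already do something equivalent:

import json

# Force every doc_key to start with the default genre "bc", which
# predict.py uses to look up a genre embedding. (Assumed ID scheme.)
with open("gap-test.jsonlines") as f_in, \
     open("gap-test.bc.jsonlines", "w") as f_out:
    for line in f_in:
        doc = json.loads(line)
        doc["doc_key"] = "bc/" + doc["doc_key"].split("/")[-1]
        f_out.write(json.dumps(doc) + "\n")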

@HaixiaChai
Author

HaixiaChai commented Oct 30, 2019

  1. I found that all four e2e-coref numbers in the first row are exactly the same as those in the last row of Table 4 of the paper "Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns". But that paper says its results are on the GAP development set, and the probability that dev-set and test-set results are exactly the same is very low. So could you please confirm whether the results in Table 2 really are on the GAP test set?
  2. Thank you for your pipeline and the bert_base result. I also got an Overall score of 82.4, which is fine. However, my question is about the c2f-coref model: the pipeline could be the same, but the code should differ slightly to adapt to c2f-coref. Can you reproduce the four c2f-coref numbers?

Thanks a lot.

@mandarjoshi90
Owner

mandarjoshi90 commented Nov 1, 2019

  1. I did not run the e2e-coref model; it looks like we copied from the wrong table for that row. I will amend the paper. We definitely evaluated BERT on the test set.
  2. I don't have that handy right now, and I'm traveling until mid-November. IIRC, the only change should be to make sure that each element of the sentences field is a natural-language sentence (as opposed to a paragraph, as with BERT), because c2f-coref contextualizes each sentence independently with LSTMs; see the sketch below.

If that doesn't work, I'll take a look after I'm back. Thanks for your patience.
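To make point 2 concrete, here is a minimal sketch of the difference in the sentences field. The field names follow the usual coref jsonlines format; the example text and the use of nltk.sent_tokenize for sentence splitting are illustrative assumptions, not necessarily what the repo does:

import json
import nltk  # assumes nltk is installed and its "punkt" sentence model is downloaded

paragraph = "Cheryl thought Alice was late. She waited at the station."

# BERT-style input: the whole paragraph goes into one long segment.
bert_style = {"doc_key": "bc/example-0",
              "sentences": [paragraph.split()]}

# c2f-coref-style input: one entry per natural-language sentence,
# because c2f-coref contextualizes each sentence independently with LSTMs.
c2f_style = {"doc_key": "bc/example-0",
             "sentences": [s.split() for s in nltk.sent_tokenize(paragraph)]}

print(json.dumps(bert_style))
print(json.dumps(c2f_style))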

@HaixiaChai
Author

  1. Because gap_to_jsonlines.py works with the tokenizer set to None, that is how I used it. The Overall F1 score I got is 68.5, not the 73.5 reported in your paper. If you could rerun it and check which code you used, I would appreciate it very much.

@Hafsa-Masroor

@HaixiaChai
Could you please share the detailed steps to test and evaluate this model on the GAP dataset? (I'd like to know what changes are needed in the environment setup, commands, data, etc.)
I am new to this research area and want to reproduce the results with both the GAP and OntoNotes datasets. Your help would be much appreciated.

Thanks!
