Commit f4ca763

remove underscores of wikipedia titles in aida-b.jsonl and correct numbers for demo-script of candidate lists
1 parent db0e468 commit f4ca763

3 files changed (+222 -221 lines changed)

Diff for: SCRIPTS.md

+4 -4
@@ -172,10 +172,10 @@ Wikipedia id: 9663 Wikipedia title: Electronics
 The script scripts/scripts_for_candidate_lists/demo_of_candidate_lists.py demonstrates how we used the candidate lists to achieve the numbers in our paper (add reference).
 Note that to use it you need to set the PATH_TO_REPOSITORY variable in the script. Executing it should output the following numbers.
 
-| | AIDA-B |TWEEKI | REDDIT-P |REDDIT-C |CWEB |WIKI |S-TAIL |S-SHADOW |S-TOP |
-| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
-| MFS | 0,635 | 0,723 | 0,834 | 0,81 | 0,612 | 0,651 | 0,994 | 0,149 | 0,413 |
-| CL-Recall | 0,911 | 0,94 | 0,984 | 0,983 | 0,924 | 0,988 | 0,988 | 0,567 | 0,731 |
+| | AIDA-B |TWEEKI | REDDIT-P | REDDIT-C | CWEB | WIKI | S-TAIL | S-SHADOW | S-TOP |
+| ------------- |-------| ------------- |----------|----------|-------|-------|--------|----------|------|
+| MFS | 0,634 | 0,723 | 0,832 | 0,809 | 0,611 | 0,651 | 0,991 | 0,149 | 0,41 |
+| CL-Recall | 0,91 | 0,94 | 0,983 | 0,981 | 0,924 | 0,986 | 0,994 | 0,565 | 0,728 |
 
 MFS ("most frequent sense") chooses, for each mention, the entity that we empirically counted the most often for that mention (assuming the mention is contained in our lists).
 CL-Recall (CL for "Candidate List") indicates whether the gold entity is actually contained in the candidate lists for all the mentions.
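Read as rates over all mentions of a dataset, the MFS row is the share of mentions where the most frequent candidate equals the gold entity, and the CL-Recall row is the share of mentions whose gold entity appears anywhere in their candidate list. The sketch below illustrates both metrics in Python. It is not the code of demo_of_candidate_lists.py: the dictionary-of-Counter candidate-list format, the function names most_frequent_sense and evaluate, the rule that unlisted mentions count as misses, and the toy data are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate-list format: mention string -> Counter mapping each
# entity title to how often it was observed for that mention.
CandidateLists = Dict[str, Counter]


def most_frequent_sense(mention: str, lists: CandidateLists) -> Optional[str]:
    """Return the most frequently counted entity for a mention,
    or None if the mention is not contained in the candidate lists."""
    counts = lists.get(mention)
    if not counts:
        return None
    # most_common(1) returns [(entity, count)] for the top-counted entity.
    return counts.most_common(1)[0][0]


def evaluate(gold: List[Tuple[str, str]], lists: CandidateLists) -> Tuple[float, float]:
    """Return (MFS accuracy, CL-Recall) over (mention, gold_entity) pairs.
    Assumption: mentions missing from the lists count as misses for both."""
    mfs_hits = 0
    recall_hits = 0
    for mention, gold_entity in gold:
        # MFS is correct only if its single top-counted pick is the gold entity.
        if most_frequent_sense(mention, lists) == gold_entity:
            mfs_hits += 1
        # CL-Recall only asks whether the gold entity is anywhere in the list.
        if gold_entity in lists.get(mention, Counter()):
            recall_hits += 1
    return mfs_hits / len(gold), recall_hits / len(gold)


# Toy data, made up for illustration only.
lists = {
    "Paris": Counter({"Paris": 90, "Paris Hilton": 10}),
    "Jordan": Counter({"Michael Jordan": 60, "Jordan": 40}),
}
gold = [("Paris", "Paris"), ("Jordan", "Jordan"), ("Berlin", "Berlin")]
mfs, recall = evaluate(gold, lists)
print(f"MFS: {mfs:.3f}  CL-Recall: {recall:.3f}")  # MFS: 0.333  CL-Recall: 0.667
```

Because MFS can only be right when the gold entity is in the candidate list, in this sketch the MFS value can never exceed the CL-Recall value computed on the same mentions.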

Diff for: scripts/scripts_for_test_data/aida-b_final.py

+1
@@ -192,6 +192,7 @@
     pass
 else:
     token, wiki_id, wiki_title = line.split('\t')
+    wiki_title = wiki_title.replace('_', ' ')
     # no annotation
     if wiki_id == 'O':
         text += token + ' '
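The single added line converts URL-style Wikipedia titles (with underscores) into their display form (with spaces), which is presumably how the underscores disappear from aida-b.jsonl as described in the commit message. Below is a minimal sketch of that normalization on one input line, assuming the three tab-separated columns seen in the split call. The function name parse_line, the newline stripping, and the sample line are hypothetical and not taken from aida-b_final.py.

```python
from typing import Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split a hypothetical 'token<TAB>wiki_id<TAB>wiki_title' line and
    normalize the title the same way the added line does."""
    # Assumption: the trailing newline is stripped here; the original
    # script may handle it elsewhere.
    token, wiki_id, wiki_title = line.rstrip("\n").split("\t")
    # Replace URL-style underscores with spaces, as in the commit's change.
    wiki_title = wiki_title.replace("_", " ")
    return token, wiki_id, wiki_title


# The id 12345 is a placeholder, not a real Wikipedia page id.
print(parse_line("Kennedy\t12345\tJohn_F._Kennedy\n"))
# ('Kennedy', '12345', 'John F. Kennedy')
```

MediaWiki treats underscores and spaces in page titles as equivalent, so this replacement changes only the surface form of a title, not which article it refers to.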
