forked from sebastianruder/NLP-progress
Added reading comprehension datasets for French and Russian
1 parent ca02d5a, commit fdaf509

Showing 3 changed files with 74 additions and 9 deletions.

@@ -0,0 +1,32 @@

# Question answering

Question answering is the task of answering a question.

### Table of contents

- [Reading comprehension](#reading-comprehension)
  - [FQuAD](#fquad)

## Reading comprehension

### FQuAD

The [French Question Answering dataset (FQuAD)](https://arxiv.org/abs/2002.06071) is a
reading comprehension dataset in the style of SQuAD. It consists of 25k questions on
Wikipedia articles. The dataset is available [here](https://fquad.illuin.tech/).

Example:

| Document | Question | Answer |
| ------------- | -----:| -----: |
| Des observations de 2015 par la sonde Dawn ont confirmé qu'elle possède une forme sphérique, à la différence des corps plus petits qui ont une forme irrégulière. [...] | A quand remonte les observations faites par la sonde Dawn ? | 2015 |

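
As a rough illustration of what "in the style of SQuAD" means for the example above, the sketch below builds one paragraph record using the SQuAD v1.1 field names (`context`, `qas`, `answers`, `answer_start`). That FQuAD files use exactly these names is an assumption here, the `id` value is hypothetical, and the context is cut at the point where the example above is elided.

```python
import json

# Context truncated where the example in the table is elided ("[...]").
context = (
    "Des observations de 2015 par la sonde Dawn ont confirmé qu'elle "
    "possède une forme sphérique, à la différence des corps plus petits "
    "qui ont une forme irrégulière."
)
question = "A quand remonte les observations faites par la sonde Dawn ?"
answer_text = "2015"

# SQuAD-style answers are character spans, so store the start offset.
answer_start = context.find(answer_text)

record = {
    "context": context,
    "qas": [
        {
            "id": "example-0",  # hypothetical identifier, not from the dataset
            "question": question,
            "answers": [{"text": answer_text, "answer_start": answer_start}],
        }
    ],
}

# The stored offset must recover the answer string from the context.
assert context[answer_start:answer_start + len(answer_text)] == answer_text

print(json.dumps(record, ensure_ascii=False, indent=2))
```

Storing a character offset rather than only the answer string keeps the answer unambiguous when the same text occurs more than once in the document.
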

| Model | F1 | EM | Paper |
| ------------- | :-----:| :-----:| --- |
| Human performance | 92.1 | 78.4 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
| CamemBERTQA (d'Hoffschmidt et al., 2020)* | 88.0 | 77.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
| CamemBERTQA (d'Hoffschmidt et al., 2020)† | 84.1 | 70.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |

*: trained on the FQuAD training set

†: trained on the SQuAD training set and zero-shot transferred to the FQuAD test set.
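
The F1 and EM columns are the usual SQuAD-style metrics: exact match (EM) and token-overlap F1 between a predicted answer and a gold answer. The following is a minimal sketch of both for a single prediction, assuming a simplified normalization (lowercasing and ASCII punctuation removal only). Official evaluation scripts typically apply further, language-specific normalization, so this is not the scorer used for the numbers in the table.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Two empty answers count as a match; one empty answer does not.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("2015", "2015"))
print(f1_score("en 2015", "2015"))
```

For the prediction "en 2015" against the gold answer "2015", EM is 0 while F1 is 2/3 (precision 1/2, recall 1), which is why F1 is always at least as high as EM. Reported scores are typically averaged over all test questions and given as percentages.
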

@@ -0,0 +1,25 @@

# Question answering

Question answering is the task of answering a question.

### Table of contents

- [Reading comprehension](#reading-comprehension)
  - [SberQuAD](#sberquad)

## Reading comprehension

### SberQuAD

The [Sberbank Question Answering dataset (SberQuAD)](https://arxiv.org/abs/1912.09723) is a reading comprehension dataset
in the style of SQuAD. It was created by Sberbank in 2017 as part of a competition. The data consists of around 50k
questions on Wikipedia.

Because the original SberQuAD development set is not available, the DeepPavlov team partitioned the original SberQuAD
training set into new training (45,328) and test (5,036) sets.

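
The two sizes above add up to 50,364, so the test set is roughly 10% of the original training data. A split of that shape could be sketched as below. The procedure DeepPavlov actually used is not described here, so the shuffling, the seed, and the function name `split_train_test` are all assumptions for illustration.

```python
import random


def split_train_test(examples, test_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 50,364 original training examples (integer ids only).
train_set, test_set = split_train_test(range(50_364))
print(len(train_set), len(test_set))  # 45328 5036
```

Because such a split is not an official one, results reported on it are only comparable when the same partition is reused.
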

| Model | F1 | EM | Paper |
| ------------- | :-----:| :-----:| --- |
| BERT (Efimov et al., 2019) | 84.8 | 66.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |
| DocQA (Efimov et al., 2019) | 79.5 | 59.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |