Added reading comprehension datasets for French and Russian
sebastianruder committed Feb 23, 2020
1 parent ca02d5a commit fdaf509
Showing 3 changed files with 74 additions and 9 deletions.
26 changes: 17 additions & 9 deletions README.md
@@ -41,24 +41,32 @@
- [Text classification](english/text_classification.md)
- [Word sense disambiguation](english/word_sense_disambiguation.md)

### Chinese
### Vietnamese

- [Entity linking](chinese/chinese.md#entity-linking)
- [Chinese word segmentation](chinese/chinese_word_segmentation.md)
- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing)
- [Machine translation](vietnamese/vietnamese.md#machine-translation)
- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition)
- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging)
- [Word segmentation](vietnamese/vietnamese.md#word-segmentation)

### Hindi

- [Chunking](hindi/hindi.md#chunking)
- [Part-of-speech tagging](hindi/hindi.md#part-of-speech-tagging)
- [Machine Translation](hindi/hindi.md#machine-translation)

### Vietnamese
### Chinese

- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing)
- [Machine translation](vietnamese/vietnamese.md#machine-translation)
- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition)
- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging)
- [Word segmentation](vietnamese/vietnamese.md#word-segmentation)
- [Entity linking](chinese/chinese.md#entity-linking)
- [Chinese word segmentation](chinese/chinese_word_segmentation.md)

### French

- [Question answering](french/question_answering.md)

### Russian

- [Question answering](russian/question_answering.md)

### Spanish

32 changes: 32 additions & 0 deletions french/question_answering.md
@@ -0,0 +1,32 @@
# Question answering

Question answering is the task of automatically answering questions posed in natural language.

### Table of contents

- [Reading comprehension](#reading-comprehension)
- [FQuAD](#fquad)

## Reading comprehension

### FQuAD

The [French Question Answering dataset (FQuAD)](https://arxiv.org/abs/2002.06071) is a
reading comprehension dataset in the style of SQuAD. It consists of 25k questions on
Wikipedia articles. The dataset is available [here](https://fquad.illuin.tech/).

Example:

| Document | Question | Answer |
| ------------- | -----:| -----: |
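| Des observations de 2015 par la sonde Dawn ont confirmé qu'elle possède une forme sphérique, à la différence des corps plus petits qui ont une forme irrégulière. [...] | A quand remonte les observations faites par la sonde Dawn ? | 2015 |

English translation of the example: document "Observations from 2015 by the Dawn probe confirmed that it has a spherical shape, unlike smaller bodies, which have an irregular shape. [...]"; question "When do the observations made by the Dawn probe date from?"; answer "2015".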
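In SQuAD-style extractive reading comprehension, the answer is a span copied verbatim from the document, typically stored as the answer text plus its character offset in the context. The sketch below builds such a record for the first sentence of the FQuAD example above. The `QAExample` class and `make_example` helper are hypothetical, and the field names only mirror the usual SQuAD JSON layout (`context`, `question`, `answer_start`); it is an assumption, not verified here, that the FQuAD release uses exactly this layout.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One extractive QA example: the answer must be a substring of the context."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer within the context


def make_example(context: str, question: str, answer_text: str) -> QAExample:
    # Locate the first occurrence of the answer; SQuAD-style data stores this offset.
    start = context.find(answer_text)
    if start == -1:
        raise ValueError("answer is not a span of the context")
    return QAExample(context, question, answer_text, start)


# First sentence of the FQuAD example document (the rest is elided in the source).
context = (
    "Des observations de 2015 par la sonde Dawn ont confirmé qu'elle possède "
    "une forme sphérique, à la différence des corps plus petits qui ont une "
    "forme irrégulière."
)
ex = make_example(
    context,
    "A quand remonte les observations faites par la sonde Dawn ?",
    "2015",
)
print(ex.answer_start)  # 20
print(context[ex.answer_start:ex.answer_start + len(ex.answer_text)])  # 2015
```

Storing the offset rather than only the answer string disambiguates answers that occur more than once in a document; this sketch simply takes the first occurrence.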

| Model | F1 | EM | Paper |
| ------------- | :-----:| :-----:| --- |
| Human performance | 92.1 | 78.4 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
| CamemBERTQA (d'Hoffschmidt et al., 2020)* | 88.0 | 77.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
| CamemBERTQA (d'Hoffschmidt et al., 2020)† | 84.1 | 70.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |

*: trained on the FQuAD training set.

†: trained on the SQuAD training set and evaluated zero-shot on the FQuAD test set.
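The F1 and EM columns are SQuAD-style metrics: EM (exact match) is the percentage of predictions that equal a reference answer after normalization, and F1 is the token-overlap F1 between prediction and reference, taking the best-scoring reference for each question. Below is a minimal sketch of how such scores can be computed, assuming a simplified normalization (lowercasing, removing ASCII punctuation, collapsing whitespace). The function names are illustrative; the official SQuAD v1.1 script additionally strips the English articles "a", "an" and "the", and published numbers come from each dataset's own evaluation script.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection: how many tokens the prediction shares with the gold answer.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, golds):
    """Return (EM, F1) as percentages; each gold entry is a list of acceptable answers."""
    em_total = f1_total = 0.0
    for pred, answers in zip(predictions, golds):
        # Score each prediction against its best-matching reference answer.
        em_total += max(exact_match(pred, a) for a in answers)
        f1_total += max(f1(pred, a) for a in answers)
    n = len(predictions)
    return 100 * em_total / n, 100 * f1_total / n


em, f1_avg = evaluate(["en 2015", "2015."], [["2015"], ["2015"]])
print(em, round(f1_avg, 2))  # 50.0 83.33
```

The first prediction gets partial F1 credit (one of its two tokens is correct) but no EM credit, which is why F1 is consistently higher than EM in the tables.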
25 changes: 25 additions & 0 deletions russian/question_answering.md
@@ -0,0 +1,25 @@
# Question answering

Question answering is the task of automatically answering questions posed in natural language.

### Table of contents

- [Reading comprehension](#reading-comprehension)
- [SberQuAD](#sberquad)

## Reading comprehension

### SberQuAD

The [Sberbank Question Answering dataset (SberQuAD)](https://arxiv.org/abs/1912.09723) is a reading comprehension dataset
in the style of SQuAD, created by Sberbank as part of a competition in 2017. It consists of around 50k
questions on Wikipedia articles.

Because the original SberQuAD development set is not available, the DeepPavlov team partitioned the original SberQuAD
training set into new training (45,328 questions) and test (5,036 questions) sets.
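The partition described above can be pictured as a seeded random split of the roughly 50k original training questions. This is a hypothetical sketch: the function name `split_train_test`, the seed, and the use of a uniform random split are assumptions for illustration, not the DeepPavlov team's documented procedure.

```python
import random


def split_train_test(examples, test_size, seed=0):
    """Shuffle a copy of `examples` with a fixed seed and hold out `test_size` items.

    Hypothetical procedure: a seeded uniform random split is assumed here;
    the actual DeepPavlov partition may have been made differently.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# 45,328 + 5,036 = 50,364 questions in the original SberQuAD training set.
ids = range(50_364)
train, test = split_train_test(ids, test_size=5_036)
print(len(train), len(test))  # 45328 5036
```

Questions about the same Wikipedia paragraph tend to be related, so splitting by article or paragraph rather than by individual question is a common way to keep training and test data from overlapping; the sketch splits individual items for simplicity.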

| Model | F1 | EM | Paper |
| ------------- | :-----:| :-----:| --- |
| BERT (Efimov et al., 2019) | 84.8 | 66.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |
| DocQA (Efimov et al., 2019) | 79.5 | 59.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |
