Added reading comprehension datasets for French and Russian

Zhong-Zhang · Feb 23, 2020 · fdaf509
1 parent ca02d5a

Showing 3 changed files with 74 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -41,24 +41,32 @@
 - [Text classification](english/text_classification.md)
 - [Word sense disambiguation](english/word_sense_disambiguation.md)
 
-### Chinese
+### Vietnamese
 
-- [Entity linking](chinese/chinese.md#entity-linking)
-- [Chinese word segmentation](chinese/chinese_word_segmentation.md)
+- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing)
+- [Machine translation](vietnamese/vietnamese.md#machine-translation)
+- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition)
+- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging)
+- [Word segmentation](vietnamese/vietnamese.md#word-segmentation)
 
 ### Hindi
 
 - [Chunking](hindi/hindi.md#chunking)
 - [Part-of-speech tagging](hindi/hindi.md#part-of-speech-tagging)
 - [Machine Translation](hindi/hindi.md#machine-translation)
 
-### Vietnamese
+### Chinese
 
-- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing)
-- [Machine translation](vietnamese/vietnamese.md#machine-translation)
-- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition)
-- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging)
-- [Word segmentation](vietnamese/vietnamese.md#word-segmentation)
+- [Entity linking](chinese/chinese.md#entity-linking)
+- [Chinese word segmentation](chinese/chinese_word_segmentation.md)
+
+### French
+
+- [Question answering](french/question_answering.md)
+
+### Russian
+
+- [Question answering](russian/question_answering.md)
 
 ### Spanish
 

diff --git a/french/question_answering.md b/french/question_answering.md
@@ -0,0 +1,32 @@
+# Question answering
+
+Question answering is the task of answering a question.
+
+### Table of contents
+
+- [Reading comprehension](#reading-comprehension)
+  - [FQuAD](#fquad)
+
+## Reading comprehension
+
+### FQuAD
+
+The [French Question Answering dataset (FQuAD)](https://arxiv.org/abs/2002.06071) is a 
+reading comprehension dataset in the style of SQuAD. It consists of 25k questions on 
+Wikipedia articles. The dataset is available [here](https://fquad.illuin.tech/).
+
+Example:
+
+| Document  | Question | Answer |
+| ------------- | -----:| -----: |
+| Des observations de 2015 par la sonde Dawn ont confirmé qu'elle possède une forme sphérique, à la différence des corps plus petits qui ont une forme irrégulière. [...] |A quand remonte les observations faites par la sonde Dawn ? | 2015 |
+
+| Model           | F1 | EM |  Paper |
+| ------------- | :-----:| :-----:| --- |
+| Human performance | 92.1 | 78.4 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
+| CamemBERTQA (d'Hoffschmidt et al., 2020)* | 88.0 | 77.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
+| CamemBERTQA (d'Hoffschmidt et al., 2020)† | 84.1 | 70.9 | [FQuAD: French Question Answering Dataset](https://arxiv.org/abs/2002.06071) |
+
+*: trained on the FQuAD training set.
+
+†: trained on the SQuAD training set and zero-shot transferred to the FQuAD test set.
diff --git a/russian/question_answering.md b/russian/question_answering.md
@@ -0,0 +1,25 @@
+# Question answering
+
+Question answering is the task of answering a question.
+
+### Table of contents
+
+- [Reading comprehension](#reading-comprehension)
+  - [SberQuAD](#sberquad)
+
+
+## Reading comprehension
+
+### SberQuAD
+
+The [Sberbank Question Answering dataset (SberQuAD)](https://arxiv.org/abs/1912.09723) is a reading comprehension dataset
+in the style of SQuAD, which was created as part of a competition in 2017 by Sberbank. The data consists of around 50k
+questions on Wikipedia.
+
+Because the original SberQuAD development set is not available, the DeepPavlov team partitioned the original
+SberQuAD training set into new training (45,328) and test (5,036) sets.
+
+| Model           | F1 | EM |  Paper |
+| ------------- | :-----:| :-----:| --- |
+| BERT (Efimov et al., 2019) | 84.8 | 66.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |
+| DocQA (Efimov et al., 2019) | 79.5 | 59.6 | [SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis](https://arxiv.org/abs/1912.09723) |