
Commit

Adding results from the Dialogue System Technology Challenge 8 (sebastianruder#420)

* Adding DSTC 8 Results

* Typo

* Formatting improvement

* Data entry error

* Fixing missing link
Jonathan Kummerfeld authored Feb 19, 2020
1 parent 9049e26 commit ca02d5a
Showing 1 changed file, english/dialogue.md, with 25 additions and 19 deletions.
@@ -85,20 +85,24 @@
There are several corpora based on the [Ubuntu IRC Channel Logs](https://irclogs.
- UDC v1, [Lowe et al. (2015)](https://arxiv.org/abs/1506.08909), available [here](http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/), the first version of the Ubuntu Dialogue Corpus.
- UDC v2, [Lowe et al. (2017)](http://dad.uni-bielefeld.de/index.php/dad/article/view/3698), available [here](https://arxiv.org/abs/1506.08909), the second version of the Ubuntu Dialogue Corpus.
- DSTC 7, [Gunasekara et al. (2019)](http://workshop.colips.org/dstc7/papers/dstc7_task1_final_report.pdf), available [here](https://ibm.github.io/dstc-noesis/public/index.html), the data from DSTC 7 track 1.
- DSTC 8, [Gunasekara et al. (2020)](http://jkk.name/pub/dstc20task2.pdf), available [here](https://github.com/dstc8-track2/NOESIS-II/), the data from DSTC 8 track 2.

Each version of the dataset contains a set of dialogues from the IRC channel, extracted by automatically disentangling conversations occurring simultaneously. See below for results on the disentanglement process.

The exact tasks used vary slightly, but all consider variations of Recall_N@K: how often the true answer appears in the top K ranked options when there are N total candidates.
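As a rough sketch of the metrics used in the tables below, Recall_N@K and MRR (mean reciprocal rank) can be computed as follows. The candidate lists and the `"gold"` label are invented purely for illustration; real systems would rank N retrieved responses per dialogue context.

```python
def recall_n_at_k(ranked_candidates, true_answer, k):
    """Recall_N@K for one example: 1 if the true answer appears in the
    top K of the N ranked candidates, else 0. Averaging over a test set
    gives the scores reported in the tables."""
    return 1.0 if true_answer in ranked_candidates[:k] else 0.0

def mean_reciprocal_rank(examples):
    """MRR: average of 1/rank of the true answer, with rank counted from 1."""
    total = 0.0
    for ranked_candidates, true_answer in examples:
        rank = ranked_candidates.index(true_answer) + 1
        total += 1.0 / rank
    return total / len(examples)

# Toy test set: 3 dialogues, each with candidates ranked by a
# hypothetical model; "gold" marks the true response.
examples = [
    (["a", "gold", "b", "c"], "gold"),  # true answer at rank 2
    (["gold", "x", "y", "z"], "gold"),  # true answer at rank 1
    (["p", "q", "r", "gold"], "gold"),  # true answer at rank 4
]
r_at_1 = sum(recall_n_at_k(c, t, 1) for c, t in examples) / len(examples)
print(r_at_1)                           # fraction with gold ranked first
print(mean_reciprocal_rank(examples))   # (1/2 + 1 + 1/4) / 3
```

Note that Recall_2@1 is much easier than Recall_100@1, which is why the UDC v2 numbers in the two columns differ so much for the same systems.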

| Data | Model | R_100@1 | R_100@10 | R_100@50 | MRR | Paper / Source |
| ------ | ------------- | :---------: | :---------: | :---------: | :---------: |---------------|
| DSTC 8 (main) | Wu et al. (2020) | 76.1 | 97.9 | - | 84.8 | Enhancing Response Selection with Advanced Context Modeling and Post-training |
| DSTC 8 (subtask 2) | Wu et al. (2020) | 70.6 | 95.7 | - | 79.9 | Enhancing Response Selection with Advanced Context Modeling and Post-training |
| DSTC 7 | Seq-Att-Network (Chen and Wang, 2019) | 64.5 | 90.2 | 99.4 | 73.5 | [Sequential Attention-based Network for Noetic End-to-End Response Selection](http://workshop.colips.org/dstc7/papers/07.pdf) |

| Data | Model | R_2@1 | R_10@1 | Paper / Source |
| ------ | ------------- | :---------: | :---------: |---------------|
| UDC v2 | DAM (Zhou et al. 2018) | 93.8 | 76.7 | [Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://www.aclweb.org/anthology/P18-1103) |
| UDC v2 | SMN (Wu et al. 2017) | 92.3 | 72.3 | [Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots](https://arxiv.org/pdf/1612.01627.pdf) |
| UDC v2 | Multi-View (Zhou et al. 2017) | 90.8 | 66.2 | [Multi-view Response Selection for Human-Computer Conversation](https://aclweb.org/anthology/D16-1036) |
| UDC v2 | Bi-LSTM (Kadlec et al. 2015) | 89.5 | 63.0 | [Improved Deep Learning Baselines for Ubuntu Corpus Dialogs](https://arxiv.org/pdf/1510.03753.pdf) |

Additional results can be found in the DSTC task reports linked above.

@@ -115,9 +119,10 @@
The [Reddit Corpus](https://arxiv.org/abs/1904.06472) contains 726 million multi-turn dialogues.
### Advising Corpus
The [Advising Corpus](http://workshop.colips.org/dstc7/papers/dstc7_task1_final_report.pdf), available [here](https://ibm.github.io/dstc-noesis/public/index.html), contains a collection of conversations between a student and an advisor at the University of Michigan. They were released as part of DSTC 7 track 1 and used again in DSTC 8 track 2.

| Model | R_100@1 | R_100@10 | R_100@50 | MRR | Paper / Source |
| ------------- | :---------: | :---------: | :---------: | :---------: |---------------|
| Yang et al. (2020) | 56.4 | 87.8 | - | 67.7 | Transformer-based Semantic Matching Model for Noetic Response Selection |
| Seq-Att-Network (Chen and Wang, 2019) | 21.4 | 63.0 | 94.8 | 33.9 | [Sequential Attention-based Network for Noetic End-to-End Response Selection](http://workshop.colips.org/dstc7/papers/07.pdf) |


## Generative-based Chatbots
@@ -149,13 +154,14 @@
This can be formulated as a clustering problem, with no clear best metric. Seve

Manually labeled by [Kummerfeld et al. (2019)](https://www.aclweb.org/anthology/P19-1374), this data is available [here](https://jkk.name/irc-disentanglement/).
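The precision, recall, and F-score columns in the table below score whole conversations: a predicted cluster of messages counts as correct only if it exactly matches a gold conversation. A minimal sketch of that scoring, with an invented toy example of six messages:

```python
def exact_match_prf(gold_clusters, pred_clusters):
    """Precision/recall/F over whole conversations: a predicted cluster
    is correct only if it exactly matches a gold cluster."""
    gold = {frozenset(c) for c in gold_clusters}
    pred = {frozenset(c) for c in pred_clusters}
    matched = len(gold & pred)
    p = matched / len(pred)
    r = matched / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: gold has two conversations; the prediction recovers one
# exactly and wrongly splits the other into two clusters.
gold = [[0, 1, 2], [3, 4, 5]]
pred = [[0, 1, 2], [3, 4], [5]]
p, r, f = exact_match_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.33 0.5 0.4
```

The all-or-nothing matching explains why these scores are so low even for strong systems: one misplaced message invalidates an entire conversation. VI and 1-1 are softer, overlap-based clustering metrics.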

| Model | VI | 1-1 | Precision | Recall | F-Score | Paper / Source | Code |
| ------------------------------------------------ | :--: | :--: | :-------: | :----: | :-----: | ---------------| --------- |
| BERT + BiLSTM | 93.3 | - | 44.3 | 49.6 | 46.8 | Pre-Trained and Attention-Based Neural Networks for Building Noetic Task-Oriented Dialogue Systems | - |
| FF ensemble: Vote (Kummerfeld et al., 2019) | 91.5 | 76.0 | 36.3 | 39.7 | 38.0 | [A Large-Scale Corpus for Conversation Disentanglement](https://www.aclweb.org/anthology/P19-1374/) | [Code](https://jkk.name/irc-disentanglement) |
| Feedforward (Kummerfeld et al., 2019) | 91.3 | 75.6 | 34.6 | 38.0 | 36.2 | [A Large-Scale Corpus for Conversation Disentanglement](https://www.aclweb.org/anthology/P19-1374/) | [Code](https://jkk.name/irc-disentanglement) |
| FF ensemble: Intersect (Kummerfeld et al., 2019) | 69.3 | 26.6 | 67.0 | 21.1 | 32.1 | [A Large-Scale Corpus for Conversation Disentanglement](https://www.aclweb.org/anthology/P19-1374/) | [Code](https://jkk.name/irc-disentanglement) |
| Linear (Elsner and Charniak, 2008) | 82.1 | 51.4 | 12.1 | 21.5 | 15.5 | [You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement](https://www.aclweb.org/anthology/P08-1095/) | [Code](https://www.asc.ohio-state.edu/elsner.14/resources/chat-distr.tgz) |
| Heuristic (Lowe et al., 2015) | 80.6 | 53.7 | 10.8 | 7.6 | 8.9 | [Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus](http://dad.uni-bielefeld.de/index.php/dad/article/view/3698) | [Code](https://github.com/npow/ubuntu-corpus) |

### Linux IRC


