- Jurafsky & Martin, Speech and Language Processing, 3rd ed., Chapter 23: Question Answering. Link
- Moldovan, D., and Surdeanu, M. (2002). On the role of information retrieval and information extraction in question answering systems. In International Summer School on Information Extraction (pp. 129-147). Springer, Berlin, Heidelberg. Link
- IR textbook: Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. Link
- Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. EMNLP. Link
Abstract Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system greatly by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA benchmarks.
TLDR In this paper the IR stage is done with a model called Dense Passage Retriever (DPR), which uses BERT for the encoders and the FAISS library for inference. Negative examples are added during training. The results improve on the state of the art as of 2019, both against BM25 and against other embedding models. The paper notes that FAISS's indexing time (8.5 h for 21 million passages) is much longer than what BM25 needs (30 min), although once indexed, querying is faster with FAISS. The time to compute the embeddings (8.8 h) must be added on top of the indexing time.
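A minimal sketch of the dual-encoder retrieval step with FAISS (the random vectors below are stand-ins for the BERT question/passage embeddings; DPR's actual trained encoders and its HNSW index are not reproduced here):

```python
import faiss
import numpy as np

d = 768                # dimensionality of BERT [CLS] embeddings
n_passages = 10_000    # toy corpus size (the paper indexes 21 million)

# Stand-in for the passage encoder E_P(p); in DPR this is a BERT encoder
# trained with a dual-encoder objective that includes negative examples.
passage_vecs = np.random.rand(n_passages, d).astype("float32")

# DPR ranks by inner product, so a flat inner-product index suffices here.
index = faiss.IndexFlatIP(d)
index.add(passage_vecs)

# Stand-in for the question encoder E_Q(q) applied to a single question.
question_vec = np.random.rand(1, d).astype("float32")

# Retrieve the top-20 passages, the cutoff used in the paper's accuracy metric.
scores, ids = index.search(question_vec, 20)
print(ids[0])          # indices of the 20 highest-scoring passages
```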
- Chen, D., Fisch, A., Weston, J., and Bordes, A. (2017). Reading Wikipedia to answer open-domain questions. ACL. Link
Abstract This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
TLDR In this paper the IR stage is done with a TF-IDF model that uses a hash structure over bigrams instead of plain terms. The results improve on Wikipedia's built-in search (implemented at the time with ElasticSearch). In the experiments this method beat TF-IDF with unigrams, BM25, and also mapping questions and documents to embeddings and applying cosine distance, although this last baseline does not appear in the results comparison table. The methods are evaluated by the proportion of questions for which the text of any of their associated answers appears in at least one of the 5 most relevant pages returned by each system.
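A rough sketch of this retrieval scheme using scikit-learn as a stand-in (Chen et al. hash unigrams and bigrams with murmur3 into 2^24 bins; `HashingVectorizer` approximates that, and the toy documents are hypothetical):

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-ins for Wikipedia paragraphs.
docs = [
    "Montevideo is the capital and largest city of Uruguay.",
    "The Eiffel Tower was completed in Paris in 1889.",
    "BM25 is a bag-of-words ranking function used by search engines.",
]

# Unigrams + bigrams hashed into a fixed-size feature space.
hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2**24,
                           alternate_sign=False, norm=None)
tfidf = TfidfTransformer()
doc_matrix = tfidf.fit_transform(hasher.transform(docs))

query_vec = tfidf.transform(hasher.transform(["capital city of Uruguay"]))

# TfidfTransformer L2-normalizes rows, so the inner product is cosine similarity.
scores = linear_kernel(query_vec, doc_matrix).ravel()
print(scores.argsort()[::-1])   # document indices, best match first
```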
- Lin, J., Nogueira, R., and Yates, A. (2020). Pretrained transformers for text ranking: BERT and beyond. arXiv preprint arXiv:2010.06467. Link
Abstract The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In the context of text ranking, these models produce high quality results across many domains, tasks, and settings. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are numerous examples that fall into the first category, including approaches based on relevance classification, evidence aggregation from multiple segments of text, corpus analysis, and sequence-to-sequence models. While the second category of approaches is less well studied, representation learning with transformers is an emerging and exciting direction that is bound to attract more attention moving forward. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
TLDR Like its abstract, the document is long (130 pages; it is not structured as a paper). Its goal is to present the state of the art in "text ranking", i.e., IR. According to the introduction, this task is done today with BERT, and the major search engines use it for IR. The introduction has a section on open problems in IR; one of them is that IR without embeddings requires exact term matches. The first solution proposed is query expansion (the idea we had planned to investigate), and for it the survey points to the following paper (Query Expansion Using Lexical-Semantic Relations, 1994).
- Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1994), pages 61–69, Dublin, Ireland. Link
Abstract Applications such as office automation, news filtering, help facilities in complex systems, and the like require the ability to retrieve documents from full-text databases where vocabulary problems can be particularly severe. Experiments performed on small collections with single-domain thesauri suggest that expanding query vectors with words that are lexically related to the original query words can ameliorate some of the problems of mismatched vocabularies. This paper examines the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results show this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought even when the concepts to be expanded are selected by hand. Less well developed queries can be significantly improved by expansion of hand-chosen concepts. However, an automatic procedure that can approximate the set of hand picked synonym sets has yet to be devised, and expanding by the synonym sets that are automatically generated can degrade retrieval performance.
TLDR This work expands queries by adding terms that are lexically-semantically related to the original query words, using WordNet. The results show no significant benefit.
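A minimal sketch of this kind of expansion with NLTK's WordNet interface (Voorhees also follows typed links such as hypernyms and weights the added terms, which is omitted here):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)   # one-time download of the WordNet data

def expand_query(query: str, max_synonyms: int = 3) -> list[str]:
    """Append WordNet synonyms of each query word to the query."""
    expanded = []
    for word in query.lower().split():
        expanded.append(word)
        synonyms = []
        for synset in wn.synsets(word):
            for lemma in synset.lemma_names():
                term = lemma.replace("_", " ")
                if term != word and term not in synonyms:
                    synonyms.append(term)
        expanded.extend(synonyms[:max_synonyms])
    return expanded

print(expand_query("car accident"))
# e.g. ['car', 'auto', 'automobile', 'machine', 'accident', ...]
```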
- Kamphuis, C., de Vries, A. P., Boytsov, L., and Lin, J. (2020). Which BM25 do you mean? A large-scale reproducibility study of scoring variants. European Conference on Information Retrieval. Link
Abstract When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity “matter”? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene’s often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work.
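For reference, one common BM25 formulation (Robertson-style IDF with the +1 smoothing that Lucene uses); the paper's point is precisely that several variants of this formula coexist and score about the same:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=0.9, b=0.4):
    """Score one document against a query with a common BM25 variant."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        n_t = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log(1 + (N - n_t + 0.5) / (n_t + 0.5))  # smoothed IDF
        f = tf[term]                                        # term frequency
        norm = f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * f * (k1 + 1) / norm
    return score

corpus = [doc.split() for doc in [
    "the quick brown fox",
    "information retrieval with bm25",
    "bm25 scoring variants compared",
]]
print(bm25_score("bm25 variants".split(), corpus[2], corpus))
```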
- Seo, M., Kembhavi, A., Farhadi, A., and Hajishirzi, H. (2018). Bidirectional attention flow for machine comprehension. Link
Abstract Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or often form a uni-directional attention. In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. Our experimental evaluations show that our model achieves the state-of-the-art results in Stanford Question Answering Dataset (SQuAD) and CNN/DailyMail cloze test.
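A toy numpy sketch of BIDAF's attention layer (random matrices stand in for the LSTM-contextualized context/query embeddings H and U; only the similarity matrix and the two attention directions are shown):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, J, d = 5, 3, 8           # context length, query length, hidden size (toy)
H = np.random.randn(T, d)   # stand-in contextual context embeddings
U = np.random.randn(J, d)   # stand-in contextual query embeddings
w = np.random.randn(3 * d)  # trainable weight vector in the real model

# Similarity matrix S[t, j] = w^T [h; u; h * u].
S = np.array([[w @ np.concatenate([H[t], U[j], H[t] * U[j]])
               for j in range(J)] for t in range(T)])

# Context-to-query: which query words matter for each context word.
a = softmax(S, axis=1)                  # (T, J)
U_tilde = a @ U                         # (T, d) attended query vectors

# Query-to-context: which context words matter most for any query word.
b = softmax(S.max(axis=1))              # (T,)
h_tilde = b @ H                         # (d,) attended context vector

# Query-aware representation fed to the modeling layer, with no early summary.
G = np.concatenate([H, U_tilde, H * U_tilde, H * h_tilde], axis=1)  # (T, 4d)
print(G.shape)
```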
- Seo, M., Kwiatkowski, T., Parikh, A., Farhadi, A., and Hajishirzi, H. (2018). Phrase-indexed question answering: A new challenge for scalable document comprehension.
Abstract We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder. This formulation addresses a key challenge in machine comprehension by requiring a standalone representation of the document discourse. It additionally leads to a significant scalability advantage since the encoding of the answer candidate phrases in the document can be pre-computed and indexed offline for efficient retrieval. We experiment with baseline models for the new task, which achieve a reasonable accuracy but significantly underperform unconstrained QA models. We invite the QA research community to engage in Phrase-Indexed Question Answering (PIQA, pika) for closing the gap. The leaderboard is at: nlp.cs.washington.edu/piqa.
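The scalability idea in a few lines (hypothetical phrase spans and random vectors in place of the learned encoders; the point is that the document side never sees the question):

```python
import numpy as np

d = 256
# Offline: enumerate candidate answer phrases per document and encode each
# with a document-side encoder that is fully independent of the question.
phrases = ["Barack Obama", "in 1889", "the Eiffel Tower"]   # hypothetical spans
phrase_index = np.random.randn(len(phrases), d)             # precomputed offline

# Online: encode only the question, then answer by nearest indexed phrase.
question_vec = np.random.randn(d)
scores = phrase_index @ question_vec
print(phrases[int(scores.argmax())])
```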
- Kassner, N., and Schütze, H. (2020). BERT-kNN: Adding a kNN search component to pretrained language models for better QA. Link
Abstract Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval step (IR) and a kNN search over a large datastore of an embedded text collection. Our contributions are as follows: i) BERT-kNN outperforms BERT on cloze-style QA by large margins without any further training. ii) We show that BERT often identifies the correct response category (e.g., US city), but only kNN recovers the factually correct answer (e.g., "Miami"). iii) Compared to BERT, BERT-kNN excels for rare facts. iv) BERT-kNN can easily handle facts not covered by BERT's training set, e.g., recent events.
TLDR Uses BERT to encode the contexts of words and then stores them in a key-value store. It then runs IR with Chen et al.'s bigram method to select the relevant entries from the datastore. From there it uses kNN to find the contexts closest to the query, but only among those returned by the IR step.
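A sketch of that last step under stated assumptions (random keys/values stand in for the BERT-encoded datastore, and `candidate_ids` pretends to be the entries selected by the IR step):

```python
import numpy as np

# Hypothetical datastore built offline: BERT context embeddings (keys)
# mapped to the token observed in each context (values).
keys = np.random.randn(100_000, 768).astype("float32")
values = np.array([f"tok_{i}" for i in range(100_000)])

def bert_knn_lookup(query_vec, candidate_ids, k=5):
    """kNN search restricted to the datastore entries whose source
    documents the IR step returned as relevant."""
    cand_keys = keys[candidate_ids]
    dists = np.linalg.norm(cand_keys - query_vec, axis=1)   # L2 distances
    nearest = np.argsort(dists)[:k]
    return values[candidate_ids[nearest]]

candidate_ids = np.arange(500)   # pretend the IR step returned these entries
query_vec = np.random.randn(768).astype("float32")
print(bert_knn_lookup(query_vec, candidate_ids))
```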
- Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L., and Lewis, M. (2019). Generalization through memorization: Nearest neighbor language models. Link
Abstract We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong WIKITEXT-103 LM, with neighbors drawn from the original training set, our kNN-LM achieves a new state-of-the-art perplexity of 15.79 – a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
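The interpolation itself is simple; a sketch assuming the neighbors have already been retrieved (lam = 0.25 is the value the paper tunes for WIKITEXT-103):

```python
import numpy as np

def knn_lm_interpolate(p_lm, knn_dists, knn_targets, vocab_size, lam=0.25):
    """p(w) = lam * p_kNN(w) + (1 - lam) * p_LM(w).

    p_kNN puts mass on the retrieved neighbors' target tokens, weighted
    by a softmax over negative distances, as in the paper.
    """
    weights = np.exp(-knn_dists)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, t in zip(weights, knn_targets):
        p_knn[t] += w
    return lam * p_knn + (1 - lam) * p_lm

vocab_size = 10
p_lm = np.full(vocab_size, 1 / vocab_size)   # stand-in LM distribution
knn_dists = np.array([0.1, 0.4, 0.9])        # distances of 3 retrieved neighbors
knn_targets = np.array([2, 2, 7])            # the tokens that followed them
p = knn_lm_interpolate(p_lm, knn_dists, knn_targets, vocab_size)
print(p.round(3), p.sum())
```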