Abstract
Extractive Machine Reading Comprehension (MRC) requires models to predict the start and end positions of answers within a given passage. MRC models tend to rely on position bias as a shortcut and thus fail to sufficiently learn multi-source knowledge from both passages and questions. Recent debiasing methods propose to exclude the position prior during inference, but they cannot distinguish the "good" position context from the "bad" position bias within the whole prior. In this paper, we propose CausalMRC, a novel MRC framework based on a causal graph, to mitigate position bias. Motivated by causal inference, we design a causal graph for MRC that formulates position bias as the direct causal effect of passages on answers, and we mitigate this bias by subtracting the direct position effect from the total causal effect. Experiments demonstrate that CausalMRC achieves competitive performance on the biased SQuAD dataset while remaining robust on the original SQuAD.
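The core debiasing step described in the abstract — subtracting the direct position effect from the total causal effect — can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the toy logits, and the scaling factor `alpha` are all assumptions chosen for illustration. The idea is that a position-only branch captures the direct effect of position on the answer, and removing it from the full model's logits leaves an approximation of the content-mediated effect:

```python
import numpy as np

def debias_start_logits(total_logits, position_only_logits, alpha=1.0):
    """Counterfactual-style debiasing sketch: subtract the direct position
    effect (scaled by alpha) from the total causal effect, keeping the
    effect mediated by the actual passage and question content."""
    return total_logits - alpha * position_only_logits

# Toy example: the position-only branch strongly prefers early positions,
# mimicking a model that shortcuts to "answers appear in the first sentence".
total = np.array([3.0, 2.5, 1.0, 0.5])      # full model's start-position logits
pos_only = np.array([2.8, 1.0, 0.2, 0.1])   # logits from the position prior alone

debiased = debias_start_logits(total, pos_only)
# The biased model picks position 0; after subtracting the direct
# position effect, the prediction shifts to position 1.
```

The same subtraction would apply to end-position logits; in practice the position-only branch would be a separate model (or head) trained to predict answers from position information alone.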
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, J., Wu, L., Wu, S., Zhang, X., Hou, Y., Feng, Z. (2023). Causal MRC: Mitigating Position Bias Based on Causal Graph. In: El Abbadi, A., et al. Database Systems for Advanced Applications. DASFAA 2023 International Workshops. DASFAA 2023. Lecture Notes in Computer Science, vol 13922. Springer, Cham. https://doi.org/10.1007/978-3-031-35415-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35414-4
Online ISBN: 978-3-031-35415-1