Tabular Representation Learning and Large Language Models have recently achieved promising results on the Semantic Parsing (SP) task. Given a question posed in natural language over a relational table, the goal is to return an executable SQL query to the end user. However, models struggle to produce the correct output when questions are ambiguously defined w.r.t. the table schema. Assessing robustness to data ambiguity can be particularly time-consuming, as it entails seeking ambiguous patterns across a large number of queries of varying complexity. To automate this process, we propose Data-Ambiguity Tester, a pipeline for data-ambiguity testing tailored to SP. It first automatically generates non-ambiguous natural language questions and SQL queries of varying complexity. Then, it injects ambiguous patterns, extracted from a human-annotated set of relational tables, into the natural language questions. Finally, it quantifies the level of ambiguity using customized performance metrics. Results show the strengths and limitations of existing models in coping with ambiguity between questions and tabular data.
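The three pipeline stages described above can be illustrated with a minimal Python sketch. All names, the pattern dictionary, and the question template are hypothetical placeholders, not the paper's actual implementation: an "ambiguous pattern" here is simply a term that could refer to more than one schema column.

```python
# Illustrative sketch of the Data-Ambiguity Tester stages.
# AMBIGUOUS_PATTERNS is a made-up example: each key is a term
# that could plausibly refer to any of the listed schema columns.
AMBIGUOUS_PATTERNS = {
    "name": ["first_name", "last_name"],
    "date": ["order_date", "ship_date"],
}

def generate_question(column: str, table: str) -> str:
    """Stage 1: produce a non-ambiguous NL question for one column."""
    return f"What is the {column} of each row in {table}?"

def inject_ambiguity(question: str) -> tuple[str, bool]:
    """Stage 2: swap a specific column mention for an ambiguous term.

    Returns the (possibly modified) question and whether a pattern applied.
    """
    for term, columns in AMBIGUOUS_PATTERNS.items():
        for col in columns:
            if col in question:
                return question.replace(col, term), True
    return question, False

def ambiguity_rate(applied: list[bool]) -> float:
    """Stage 3: one simple metric -- the fraction of questions
    into which an ambiguous pattern was successfully injected."""
    return sum(applied) / len(applied) if applied else 0.0
```

In a real run, stage 3 would instead compare model predictions on the ambiguous questions against the gold SQL; the rate above only measures how often injection succeeded.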
Evaluating ambiguous questions in semantic parsing
DBML 2024, 3rd International Workshop on Databases and Machine Learning, in conjunction with ICDE 2024, 13 May 2024, Delft, The Netherlands
Type:
Conference
City:
Delft
Date:
2024-05-13
Department:
Data Science
Eurecom Ref:
7663
Copyright:
© 2024 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
See also:
PERMALINK : https://www.eurecom.fr/publication/7663