Evaluating ambiguous questions in semantic parsing

Papicchio, Simone; Papotti, Paolo; Cagliero, Luca
DBML 2024, 3rd International Workshop on Databases and Machine Learning, in conjunction with ICDE 2024, 13 May 2024, Delft, The Netherlands

Tabular Representation Learning and Large Language Models have recently achieved promising results on the Semantic Parsing (SP) task. Given a question posed in natural language over a relational table, the goal is to return to the end users an executable SQL declaration. However, models struggle to produce the correct output when questions are ambiguously defined w.r.t. the table schema. Assessing robustness to data ambiguity can be particularly time-consuming, as it entails seeking ambiguous patterns in a large number of queries of varying complexity. To automate this process, we propose Data-Ambiguity Tester, a pipeline for data-ambiguity testing tailored to SP. It first automatically generates non-ambiguous natural language questions and SQL queries of varying complexity. Then, it injects ambiguous patterns, extracted from a human-annotated set of relational tables, into the natural language questions. Finally, it quantifies the level of ambiguity using customized performance metrics. The results show the strengths and limitations of existing models in coping with ambiguity between questions and tabular data.
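The three pipeline stages in the abstract (question/SQL generation, ambiguity injection, metric-based scoring) can be illustrated with a minimal sketch. All function names, the question template, the substitution-based injection, and the toy exact-match metric below are assumptions for illustration only, not the authors' actual Data-Ambiguity Tester implementation.

```python
# Illustrative sketch of a data-ambiguity testing pipeline for semantic
# parsing. Names, templates, and the metric are hypothetical.

def generate_question(column: str, table: str) -> tuple[str, str]:
    """Stage 1: produce a non-ambiguous NL question and its SQL query."""
    question = f"What is the average {column} in {table}?"
    sql = f"SELECT AVG({column}) FROM {table};"
    return question, sql

def inject_ambiguity(question: str, patterns: dict[str, str]) -> str:
    """Stage 2: replace a precise column mention with an ambiguous term
    (in the paper, patterns come from human-annotated relational tables)."""
    for precise, ambiguous in patterns.items():
        question = question.replace(precise, ambiguous)
    return question

def ambiguity_exact_match(predicted_sql: str, gold_sqls: list[str]) -> float:
    """Stage 3 (toy metric): fraction of acceptable gold queries the
    prediction matches exactly. A realistic metric would instead execute
    the queries and compare their result sets."""
    hits = sum(1 for gold in gold_sqls if predicted_sql.strip() == gold.strip())
    return hits / len(gold_sqls)

if __name__ == "__main__":
    q, sql = generate_question("salary", "employees")
    # "pay" is ambiguous if the table also has, e.g., a bonus column
    ambiguous_q = inject_ambiguity(q, {"salary": "pay"})
    print(ambiguous_q)
    print(ambiguity_exact_match(sql, [sql, "SELECT AVG(bonus) FROM employees;"]))
```

The sketch makes the testing loop concrete: a model under test would receive `ambiguous_q` and its prediction would be scored against the set of SQL queries that are acceptable under each reading of the ambiguity.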


Type:
Conference
City:
Delft
Date:
2024-05-13
Department:
Data Science
Eurecom Ref:
7663
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in DBML 2024, 3rd International Workshop on Databases and Machine Learning, in conjunction with ICDE 2024, 13 May 2024, Delft, The Netherlands and is available at:

PERMALINK : https://www.eurecom.fr/publication/7663