Table Representation Learning (TRL) models are commonly pre-trained on very large open-domain datasets comprising millions of tables and then used to address various downstream tasks. Choosing the right TRL model for proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our goal is to support end users in testing pre-trained TRL models on unseen proprietary data across different tasks. In this work, we present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models' strengths and weaknesses on unseen data. For an input dataset, QATCH automatically generates a testing checklist tailored to two established tasks, i.e., Question Answering and Semantic Parsing. Checklist generation is driven by a SQL query engine that crafts tests of increasing complexity that are inherently portable to alternative models and settings. QATCH also introduces a set of cross-task performance metrics that assess a TRL model with quality measures over its output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models, including Tapas, Tapex, and OpenAI ChatGPT.
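As an illustration of the query-aided checklist idea, the sketch below generates (question, SQL) test pairs of increasing complexity for a toy table and executes them with SQLite to obtain gold answers. This is a minimal, hypothetical sketch: the function and table names are invented for this example and do not reflect QATCH's actual API.

```python
import sqlite3

# Hypothetical sketch of query-aided checklist generation; QATCH's real
# interface differs. Tests are ordered from simple projections to
# aggregations, and their SQL is executed to produce gold answers.

def generate_tests(table, numeric_col, cat_col):
    """Return (nl_question, sql) pairs ordered from simple to complex."""
    return [
        (f"Show all rows of {table}.",
         f"SELECT * FROM {table}"),
        (f"Show the {cat_col} of each row.",
         f"SELECT {cat_col} FROM {table}"),
        (f"What is the maximum {numeric_col}?",
         f"SELECT MAX({numeric_col}) FROM {table}"),
        (f"Average {numeric_col} per {cat_col}.",
         f"SELECT {cat_col}, AVG({numeric_col}) FROM {table} GROUP BY {cat_col}"),
    ]

# Toy proprietary table to generate tests against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("pen", "office", 1.5), ("desk", "furniture", 120.0),
                  ("chair", "furniture", 80.0)])

for question, sql in generate_tests("products", "price", "category"):
    gold = conn.execute(sql).fetchall()  # gold answer for this checklist entry
    print(question, "->", gold)
```

A model under test would then answer each natural-language question (Question Answering) or produce the SQL itself (Semantic Parsing), and its output would be scored against the gold answer.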
QATCH: Benchmarking SQL-centric tasks with table representation learning models on your data
NeurIPS 2023, 37th Conference on Neural Information Processing Systems, 11-16 December 2023, New Orleans, USA
© NIST. Personal use of this material is permitted. The definitive version of this paper was published in NeurIPS 2023, 37th Conference on Neural Information Processing Systems, 11-16 December 2023, New Orleans, USA and is available at:
PERMALINK: https://www.eurecom.fr/publication/7463