An LLM-based approach for insight generation in data analysis

Sánchez Pérez, Alberto; Boukhary, Alaa; Papotti, Paolo; Castejón Lozano, Luis; Elwood, Adam
NAACL 2025, Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, April 29-May 4, 2025, Albuquerque, New Mexico

Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer such questions by generating SQL queries against a database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates insights that are rated as more insightful than those of other approaches while maintaining correctness.
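
The three stages named in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: every function name, the prompt format, and the `toy_llm` stub (which stands in for a real LLM call) are assumptions made for illustration, and the example uses a single-table in-memory SQLite database rather than a multi-table one.

```python
import sqlite3
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


def read_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so they can be shown to the model."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def generate_hypotheses(llm: LLM, schema: str) -> List[str]:
    """Hypothesis Generator: ask for questions, one per line."""
    reply = llm("QUESTIONS\n" + schema)
    return [line.strip() for line in reply.splitlines() if line.strip()]


def answer_question(llm: LLM, conn: sqlite3.Connection, schema: str, question: str):
    """Query Agent: ask for a SQL query, then run it against the database."""
    sql = llm("SQL\n" + schema + "\n" + question)
    return conn.execute(sql).fetchall()


def summarize(llm: LLM, question: str, rows) -> str:
    """Summarization: turn the question and its result rows into text."""
    return llm(f"SUMMARY\n{question}\n{rows}")


def generate_insights(llm: LLM, conn: sqlite3.Connection) -> List[str]:
    schema = read_schema(conn)
    insights = []
    for question in generate_hypotheses(llm, schema):
        rows = answer_question(llm, conn, schema, question)
        insights.append(summarize(llm, question, rows))
    return insights


def toy_llm(prompt: str) -> str:
    """Deterministic stub (assumption): replace with a real model client."""
    if prompt.startswith("QUESTIONS"):
        return "Which region has the highest total sales?"
    if prompt.startswith("SQL"):
        return (
            "SELECT region, SUM(amount) AS total FROM sales "
            "GROUP BY region ORDER BY total DESC LIMIT 1"
        )
    # For summaries, echo the result rows found on the last prompt line.
    return "Finding: " + prompt.splitlines()[-1]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 10.0), ('south', 25.0), ('north', 5.0);
    """
)
insights = generate_insights(toy_llm, conn)
print(insights)
```

Passing the model as a plain callable keeps the three stages independent of any particular LLM provider. A real system would also need to validate or sandbox the generated SQL before executing it, which this sketch omits.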

Type:
Conference
City:
Albuquerque
Date:
2025-04-29
Department:
Data Science
Eurecom Ref:
8154
Copyright:
Copyright ACL. Personal use of this material is permitted. The definitive version of this paper was published in NAACL 2025, Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, April 29-May 4, 2025, Albuquerque, New Mexico and is available at :

PERMALINK : https://www.eurecom.fr/publication/8154