Poster / Demo
Gharsallah, Sarra; Robaldo, Adele; Tokareva, Mariia; Gatti Pinheiro, Giovanni; Guendouz, Ilyana; Troncy, Raphaël; Papotti, Paolo; Michiardi, Pietro
Can we trust the judges? Validation of factuality evaluation methods via answer perturbation
EvalLLM 2025, Workshop on Evaluation Generative Models and Challenges, colocated with TALN, 30 June 2025, Marseille, France