Conference
Benechehab, Abdelhakim; El Hili, Youssef Attia; Thomas, Albert; Paolo, Giuseppe; Filippone, Maurizio
Embedding distance as a reward signal can replace verifiers for LLM reasoning
ICLR 2026, 14th International Conference on Learning Representations, Workshop LLM Reasoning, 23-27 April 2026, Rio de Janeiro, Brazil