10.1101/2025.07.18.665446

Correctness is its own reward: bootstrapping error signals in self-guided reinforcement learning

2025-07-23