Enable AI assistants to reflect on, critique, and continuously improve their own performance using Mandoline's evaluation framework.