LLM Digest

Tag: reasoning

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Zhuolin Yang

This 30B-parameter model, with only 3B active parameters, achieves frontier-level reasoning performance, demonstrating that efficient architectures can match much larger models. Its cascade reinforcement learning and multi-domain on-policy distillation techniques offer practical insights for teams building high-performance models under resource constraints.

Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought

Xinghao Zhao

You can predict whether an LLM's chain-of-thought reasoning will be correct by tracking whether uncertainty decreases at every step, a simple diagnostic that works better than raw confidence scores. This 'monotonicity' check gives you a practical way to catch reasoning failures before they impact your application.
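The monotonicity check described above can be sketched in a few lines: compute the entropy of the model's token distribution at each reasoning step, then verify that it never increases. This is an illustrative sketch, not the paper's implementation; the per-step distributions and the tolerance value are hypothetical stand-ins for whatever your inference stack exposes.

```python
import math

def step_entropy(token_probs):
    """Shannon entropy (in nats) of one reasoning step's token distribution."""
    return -sum(p * math.log(p) for p in token_probs if p > 0)

def is_monotonically_decreasing(entropies, tol=1e-9):
    """True if uncertainty shrinks (or holds steady) at every step."""
    return all(b <= a + tol for a, b in zip(entropies, entropies[1:]))

# Hypothetical per-step distributions from a chain-of-thought trace.
steps = [
    [0.4, 0.3, 0.2, 0.1],    # early step: high uncertainty
    [0.6, 0.25, 0.1, 0.05],  # model narrowing down
    [0.9, 0.05, 0.03, 0.02], # confident final step
]
entropies = [step_entropy(s) for s in steps]
print(is_monotonically_decreasing(entropies))  # True for this trace
```

A trace that fails this check (entropy rising mid-chain) would be flagged as unreliable before its answer is used downstream.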