funsung: QwQ

Thursday, March 06, 2025

QwQ

https://qwenlm.github.io/blog/qwq-32b/

https://ollama.com/library/qwq

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.

Rather than relying on traditional reward models, we used an accuracy verifier for math problems to ensure the correctness of final solutions, and a code execution server to assess whether the generated code passes predefined test cases.
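To make the idea of verifier-based rewards concrete, here is a minimal sketch of what such outcome checks could look like. It is not the Qwen team's implementation: the function names `math_reward` and `code_reward`, the binary 0/1 reward, and the stdin/stdout test format are all assumptions chosen for illustration, and the code runner has no sandboxing, which a real execution server would need.

```python
import subprocess
import sys
from fractions import Fraction


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical accuracy verifier: 1.0 if the final answer matches, else 0.0."""
    try:
        # Compare as exact rationals so that "0.5" and "1/2" count as equal.
        return float(Fraction(model_answer.strip()) == Fraction(reference.strip()))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to a plain string comparison.
        return float(model_answer.strip() == reference.strip())


def code_reward(source: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> float:
    """Hypothetical execution check: 1.0 only if every (stdin, expected stdout) test passes."""
    for stdin_text, expected in tests:
        try:
            # Run the generated program in a fresh interpreter (no sandbox here).
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return 0.0
    return 1.0
```

For example, `math_reward("0.5", "1/2")` scores a correct answer written in a different form, and `code_reward("print(int(input()) * 2)", [("21", "42")])` runs a one-line program against a single test case. The point of both checks is that the reward comes from a verifiable outcome rather than from a learned reward model's score.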
