The inefficiency of RL, and implications for RLVR progress

(dwarkesh.com)

118 points | by cubefox 6 days ago

48 comments