By ihsumlee, 16 February 2026 Training Objective (GRPO / PPO-style) and the Online Training Workflow Tags RL Online Learning
By ihsumlee, 15 February 2026 Expectation and Long-Run Average Convergence (Discrete/Continuous and RL) Tags RL Expectation