all tokens update

By ihsumlee, 2 March 2026

## What “all tokens” includes

Suppose a single rollout produces the following action tokens at each time step:

- time 1: tokens \((u_{1,1}, u_{1,2}, \ldots)\)
- time 2: tokens \((u_{2,1}, u_{2,2}, \ldots)\)
- …
- time T: tokens \((u_{T,1}, u_{T,2}, \ldots)\)

Then:

- if the rollout succeeds → every \(u_{t,k}\) gets reward 1
- if it fails → every \(u_{t,k}\) gets reward 0
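
The rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `broadcast_rollout_reward`, the nested-list layout (one inner list of tokens per time step), and the example token ids are all hypothetical and not taken from any particular training codebase.

```python
from typing import List


def broadcast_rollout_reward(
    tokens_per_step: List[List[int]], success: bool
) -> List[List[float]]:
    """Assign the rollout-level outcome to every token u_{t,k}.

    tokens_per_step[t][k] is the k-th action token emitted at time step t
    (hypothetical layout, chosen only for this sketch).
    """
    reward = 1.0 if success else 0.0
    # Every token in every time step receives the same scalar reward.
    return [[reward for _ in step_tokens] for step_tokens in tokens_per_step]


# Made-up rollout with T = 3 time steps and a varying number of tokens per step.
rollout = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]

rewards_success = broadcast_rollout_reward(rollout, success=True)
rewards_failure = broadcast_rollout_reward(rollout, success=False)
```

The output has the same shape as the input, so each token's reward can be looked up with the same \((t, k)\) index as the token itself.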
