Problem
During a multi-turn/multi-step trajectory (in Gym), any component invoked during the rollout can fail, for example a tool call or the generation backend. If the failure is not handled gracefully, the whole trajectory has to be recomputed. For long-context rollouts (e.g., 40 min per rollout), this is very inefficient, because a 40-min rollout has to be restarted from scratch.
Ideal solution
When one step or turn fails during trajectory collection, the collection should be able to resume from its partial state instead of starting over.
Proposal
Related to #2414
Introduce a data plane to store all the partial trajectories. In the event of a failure during trajectory collection, the collection can be restarted by pulling the last partial state from the data plane.
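To make the intended flow concrete, here is a minimal sketch of the checkpoint-and-resume loop. Everything in it is hypothetical and for illustration only: `PartialTrajectory`, `InMemoryDataPlane`, `collect`, and the `step` callback are not existing Gym APIs, and the in-memory dict stands in for whatever durable store the data plane ends up using.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PartialTrajectory:
    """Hypothetical snapshot of a rollout: the turns completed so far."""
    rollout_id: str
    turns: List[str] = field(default_factory=list)


class InMemoryDataPlane:
    """Stand-in for the proposed data plane (a real one would be durable)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def save(self, traj: PartialTrajectory) -> None:
        # Store a copy so later mutation of the live trajectory
        # cannot alter the checkpoint.
        self._store[traj.rollout_id] = list(traj.turns)

    def load(self, rollout_id: str) -> Optional[PartialTrajectory]:
        turns = self._store.get(rollout_id)
        if turns is None:
            return None
        return PartialTrajectory(rollout_id, list(turns))


def collect(
    rollout_id: str,
    num_turns: int,
    step: Callable[[int], str],
    plane: InMemoryDataPlane,
    max_restarts: int = 3,
) -> PartialTrajectory:
    """Run `step` for each turn, checkpointing after every completed turn.

    On a RuntimeError (assumed here to model a failed tool call or
    generation backend), reload the last checkpoint and continue from
    the first turn that has not completed.
    """
    restarts = 0
    while True:
        traj = plane.load(rollout_id)
        if traj is None:
            traj = PartialTrajectory(rollout_id)
        try:
            while len(traj.turns) < num_turns:
                traj.turns.append(step(len(traj.turns)))
                plane.save(traj)
            return traj
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise
```

With this shape, a failure at turn `k` costs only a retry of turn `k`; turns `0..k-1` are read back from the store rather than regenerated. Checkpointing after every turn is one possible granularity; a coarser interval would trade less write traffic for more recomputation.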
Alternatives
Opt 1
In the current implementation of Gym, we could also potentially enable trajectory checkpointing by storing the partial states in the Ray object store. This way, if the Gym agent has to be restarted, it can be passed a handle telling it where to fetch the data. @ananthsub