Hello, may I ask which version of Verl was used for training? I noticed that inference uses a continuation-based approach, as shown in /src/run_deep_agent.py. However, in the Verl 0.7 training framework, the agent_loop mode only supports a multi-turn dialogue format for inference.