This project implements the following classic policy gradient (PG) algorithms in PyTorch (v0.4.0):
- Vanilla Policy Gradient
- Truncated Natural Policy Gradient
- Trust Region Policy Optimization
- Proximal Policy Optimization
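To make the list above more concrete, here is a minimal sketch of two building blocks these algorithms commonly rely on: the discounted reward-to-go used to weight the vanilla policy gradient, and the per-sample clipped surrogate objective from PPO. It is plain Python for illustration only and is not taken from the repository; the function names `discounted_returns` and `ppo_clip_objective` and the default values `gamma=0.99` and `clip_eps=0.2` are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go for each step of one episode.

    Hypothetical helper: G_t = r_t + gamma * G_{t+1}, computed backwards.
    """
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate for one sample (hypothetical helper).

    ratio is pi_new(a|s) / pi_old(a|s); the objective is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

In a real PyTorch implementation the same expressions would typically be applied to batched tensors, and the negated mean of the objective would be minimized with an optimizer.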
The authors also trained the PG algorithms and models on the following benchmark environments:
- mujoco-py: https://github.com/openai/mujoco-py
- Unity ml-agent: https://github.com/Unity-Technologies/ml-agents
mujoco-py
Algorithm | Score | GIF |
---|---|---|
Vanilla PG | | |
NPG | | |
TRPO | | |
PPO | | |
Unity ml-agents
Env | GIF |
---|---|
Plane | |
Curved | |
GitHub repository
Repository: https://github.com/reinforcement-learning-kr/pg_travel