
This project implements the following classic policy gradient (PG) algorithms in PyTorch (v0.4.0):
- Vanilla Policy Gradient
- Truncated Natural Policy Gradient
- Trust Region Policy Optimization
- Proximal Policy Optimization
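
To make the difference between the first and last entries in this list concrete, here is a minimal sketch of the two loss functions. It is not taken from the repository: it uses plain Python instead of PyTorch so it runs without dependencies. The function names (`vanilla_pg_loss`, `ppo_clip_loss`) and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def vanilla_pg_loss(log_probs, advantages):
    """Vanilla PG loss: negative mean of log pi(a|s) * advantage.

    Minimizing this value performs gradient ascent on the expected return.
    """
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(log_probs)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO loss: negative mean of the clipped surrogate objective.

    clip_eps is an assumed default, not a value read from the repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With a positive advantage, the clipped term caps how much a large probability ratio can lower the loss, which removes the incentive to move far from the old policy in a single update. TRPO pursues the same goal through an explicit KL-divergence constraint instead of clipping.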
The authors also trained models with these PG algorithms on the following benchmark environments:
- mujoco-py: https://github.com/openai/mujoco-py
- Unity ml-agent: https://github.com/Unity-Technologies/ml-agents
mujoco-py
| Algorithm | Score | GIF |
|---|---|---|
| Vanilla PG | ![]() | ![]() |
| NPG | ![]() | ![]() |
| TRPO | ![]() | ![]() |
| PPO | ![]() | ![]() |
Unity ml-agents
| Env | GIF |
|---|---|
| Plane | ![]() |
| Curved | ![]() |
GitHub repository: https://github.com/reinforcement-learning-kr/pg_travel
















