PyTorch Implementations of Policy Gradient Algorithms (REINFORCE, NPG, TRPO, PPO)

PyTorch Beginner Hands-On Tutorials

This project uses PyTorch (v0.4.0) to implement the following classic policy gradient (PG) algorithms:

  • Vanilla Policy Gradient
  • Truncated Natural Policy Gradient
  • Trust Region Policy Optimization
  • Proximal Policy Optimization
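
To make the difference between the first and last items in this list concrete, here is a minimal sketch of the two surrogate losses. It is not taken from the repository: it uses plain Python rather than PyTorch tensors so that the arithmetic is visible, the function names are hypothetical, and the defaults (gamma = 0.99, clip_eps = 0.2) are commonly used values assumed for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def vanilla_pg_loss(log_probs, returns):
    """REINFORCE-style surrogate: -mean(log pi(a_t | s_t) * G_t)."""
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    r is the probability ratio pi_new(a | s) / pi_old(a | s), recovered here
    from the difference of log-probabilities.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With identical old and new log-probabilities the ratio is 1, so the PPO loss reduces to the negative mean advantage. Once the ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term takes over and the sample stops contributing gradient. NPG and TRPO instead constrain the update through the KL divergence between the old and new policies, which this sketch does not cover.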

The authors also trained the PG algorithms and models on the following benchmark environments:

mujoco-py

Algorithm     Score / GIF
Vanilla PG    [image: trpo]
NPG           [image: trpo]
TRPO          [image: trpo]
PPO           [image: ppo]

Unity ml-agents

Env       GIF
Plane     [image: plane]
Curved    [image: curved]

GitHub Repository

Repository: https://github.com/reinforcement-learning-kr/pg_travel
