This repository contains the project for the Deep learning class (course code: VITMAV45) at the Budapest University of Technology and Economics. Our project focuses on reinforcement learning with the aim of training an agent in a poker environment. After training, we can play against our pre-trained agent.

Team members: László Barak, Mónika Farsang, Ádám Szukics

The code presented for the first milestone is based on the example code in the RLCard GitHub repository. It serves as a demonstration that the chosen environment works and the agent is ready to train.

The code for the second milestone is a DQN agent in PyTorch. We used the RLCard DQN agent written in TensorFlow as a base and created more powerful, more manageable, and easier-to-use code in PyTorch. This implementation is an advanced Q-learning agent in two respects. First, it uses a replay buffer to store past experiences, from which we periodically sample training data. Second, to make the training more stable, a second Q-network is used as a target network: it supplies the target Q-values against which the policy Q-network is trained, while gradients are backpropagated only through the policy Q-network. These features are described in the Nature paper Human-level control through deep reinforcement learning.

Furthermore, as an extra component, we added the option of a more aggressive playing strategy. If a given action has the maximum Q-value, the agent chooses the Raise action instead, provided Raise is a valid action. The possible settings are displayed below:

Using the action with the maximum value (default in DQN)
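
The aggressive strategy described above can be illustrated with a minimal sketch. It is not the repository's actual code: the function name `select_action`, the `trigger_action` parameter, and the action ids are assumptions made for illustration, and the real action numbering depends on the environment. With `trigger_action=None` the function reduces to the default DQN behaviour of taking the legal action with the maximum Q-value.

```python
from typing import Optional, Sequence

# Hypothetical action ids, chosen for illustration only;
# the real numbering is defined by the poker environment.
CALL, RAISE, FOLD, CHECK = 0, 1, 2, 3


def select_action(
    q_values: Sequence[float],
    legal_actions: Sequence[int],
    trigger_action: Optional[int] = None,
    raise_action: int = RAISE,
) -> int:
    """Greedy action selection with an optional aggressive override.

    The legal action with the highest Q-value is picked first. If that
    action equals `trigger_action` and raising is legal, the raise action
    is returned instead.
    """
    if not legal_actions:
        raise ValueError("legal_actions must not be empty")

    # Default DQN behaviour: arg-max over the legal actions only.
    best = max(legal_actions, key=lambda a: q_values[a])

    # Aggressive override (assumed interpretation of the setting).
    if (
        trigger_action is not None
        and best == trigger_action
        and raise_action in legal_actions
    ):
        return raise_action
    return best
```

For example, with Q-values `[0.9, 0.2, 0.1, 0.0]` and legal actions `[CALL, RAISE, FOLD]`, the default setting returns `CALL`, while `trigger_action=CALL` returns `RAISE`. If Raise is not among the legal actions, the arg-max action is kept.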