Code the wheel poker bot

12/20/2023

This repository contains the project for the Deep Learning class (course code: VITMAV45) at the Budapest University of Technology and Economics. Our project focuses on reinforcement learning, with the aim of training an agent in a poker environment. After training, we can play against our pre-trained agent. Team members: László Barak, Mónika Farsang, Ádám Szukics.

The code presented for the first milestone is based on the example code in the RLCard GitHub repository. It serves as a demonstration that the chosen environment works and the agent is ready to train.

The code for the second milestone is a DQN agent in PyTorch. We used the RLCard DQN agent written in TensorFlow as a base and created more powerful, more manageable, and easier-to-use code in PyTorch. This implementation improves on basic Q-learning in two respects. First, it uses a replay buffer to store past experiences, from which training data is sampled periodically. Second, to make training more stable, a second Q-network is used as a target network: it supplies the target values for the loss, and that loss is backpropagated to train the policy Q-network. These features are described in the Nature paper Human-level control through deep reinforcement learning.

Furthermore, as an extra component, we added the option of a more aggressive playing strategy. If the given action has the maximum Q-value, the agent chooses the Raise action instead, provided that Raise is a valid action. The possible settings are displayed below:

Using action with maximum value (default in DQN)
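The three ideas in this project (the replay buffer, the target values taken from a separate target network, and the more aggressive action choice) can be illustrated with a minimal sketch in plain Python. This is not the project's actual code. The names ReplayBuffer, td_target, choose_action and the trigger_actions parameter are hypothetical, the Q-values are passed in as plain numbers rather than computed by a PyTorch network, and the action names ("call", "raise", "fold") are placeholders rather than RLCard's own action encoding.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest ones are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive steps of the same game.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, next_q_target, done, gamma=0.99):
    """Target value for one transition.

    next_q_target holds the Q-values of the next state as estimated by the
    target network (assumed here to be a plain list of floats).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def choose_action(q_values, legal_actions, trigger_actions=(), raise_action="raise"):
    """Greedy choice over legal actions, with an optional override to raise.

    q_values maps an action name to its Q-value. trigger_actions is a
    hypothetical setting: if the greedy action is in it and raising is legal,
    the agent raises instead. An empty tuple gives the default DQN behaviour.
    """
    best = max(legal_actions, key=lambda a: q_values[a])
    if best in trigger_actions and raise_action in legal_actions:
        return raise_action
    return best
```

With trigger_actions left empty, choose_action is the usual greedy rule. Passing, for example, trigger_actions=("call",) makes the agent raise whenever calling would have been the greedy choice and raising is allowed. In a full agent the target network's weights would be copied from the policy network at regular intervals, which this sketch leaves out.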
