|The knight’s tour problem is an instance of the Hamiltonian path problem that is a typical NP-hard problem. A knight makes L-shape moves on a chessboard and tries to visit all the squares exactly once. The tour is closed if a knight can finish a complete tour and end on a square that is a neighbourhood of its starting square; Otherwise, it is open. Many algorithms and heuristics have been proposed to solve this problem. The most well-known one is warnsdorff’s heuristic. Warnsdorff’s idea is to move to the square with the fewest possible moves in a greedy fashion. Although this heuristic is fast, it does not always return a closed tour. Also, it only works on boards of certain dimensions. Due to its greedy behaviour, it can get stuck into a local optimum easily. That is similar to the other existing approaches. Our goal in this thesis is to come up with a new strategy based on reinforcement learning. Ideally, it should be able to find a closed tour on chessboards of any size. We will consider several approaches: value-based methods, policy optimization and actor-critic methods. Compared to previous work, our approach is non-deterministic and sees the problem as a single-player game with a tradeoff between exploration and exploitation. We will evaluate the effectiveness and efficiency of the existing methods and new heuristics.