Research Article

Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach

Volume: 10 Number: 1 April 11, 2026

Abstract

This study systematically compares the performance of two deep reinforcement learning algorithms – Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) – across different game environments. To achieve this, eight distinct test environments from the OpenAI Gymnasium library (CartPole-v1, FrozenLake-v1, LunarLander-v3, Taxi-v3, MountainCar-v0, Blackjack-v1, CliffWalking-v0, and Acrobot-v1) were utilized. Both algorithms were trained for 1,000,000 timesteps in each environment. For each algorithm, key performance metrics such as average reward, training time, standard deviation, success rate, and the highest and lowest reward values were calculated and visualized through graphs. Additionally, the strengths and weaknesses of the algorithms in different environments were analyzed. The results indicate that PPO performs more consistently and effectively in tasks requiring continuous actions, whereas DQN achieves faster and more reliable outcomes in deterministic environments with discrete action spaces. Whereas most prior research has examined these algorithms separately, this study provides meaningful insights by comparing PPO and DQN under identical conditions.
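The evaluation metrics named in the abstract (average reward, standard deviation, success rate, and the highest and lowest rewards) can be sketched as a small helper over a list of per-episode returns. This is an illustrative sketch only: the function name `summarize_episodes`, the sample rewards, and the success threshold are assumptions for demonstration, not values taken from the study.

```python
import statistics

def summarize_episodes(rewards, success_threshold):
    """Summarize per-episode rewards with the metrics used in the study:
    mean, standard deviation, success rate, and best/worst reward."""
    n = len(rewards)
    return {
        "average_reward": sum(rewards) / n,
        # Population standard deviation over the evaluated episodes.
        "std_dev": statistics.pstdev(rewards),
        # Fraction of episodes whose return reached the threshold.
        "success_rate": sum(r >= success_threshold for r in rewards) / n,
        "highest_reward": max(rewards),
        "lowest_reward": min(rewards),
    }

# Hypothetical CartPole-v1 evaluation returns; the v1 task is commonly
# considered solved at an average return of 475 or more.
episode_rewards = [500.0, 475.0, 430.0, 500.0]
print(summarize_episodes(episode_rewards, success_threshold=475.0))
```

Whether "success" means reaching a reward threshold, as assumed here, or reaching a terminal goal state (as in FrozenLake-v1 or Taxi-v3) depends on the environment, so the threshold would be chosen per task.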


Details

Primary Language

English

Subjects

Reinforcement Learning

Journal Section

Research Article

Publication Date

April 11, 2026

Submission Date

July 29, 2025

Acceptance Date

November 8, 2025

Published in Issue

Year 2026 Volume: 10 Number: 1

APA
Yildiz, S. A., Şahinaslan, Ö., Borandag, E., & Yücalar, F. (2026). Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach. Journal of Innovative Science and Engineering, 10(1), 138-157. https://doi.org/10.38088/jise.1753889
AMA
1. Yildiz SA, Şahinaslan Ö, Borandag E, Yücalar F. Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach. JISE. 2026;10(1):138-157. doi:10.38088/jise.1753889
Chicago
Yildiz, Sedat Atakan, Önder Şahinaslan, Emin Borandag, and Fatih Yücalar. 2026. “Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach”. Journal of Innovative Science and Engineering 10 (1): 138-57. https://doi.org/10.38088/jise.1753889.
EndNote
Yildiz SA, Şahinaslan Ö, Borandag E, Yücalar F (April 1, 2026) Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach. Journal of Innovative Science and Engineering 10 1 138–157.
IEEE
[1] S. A. Yildiz, Ö. Şahinaslan, E. Borandag, and F. Yücalar, “Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach”, JISE, vol. 10, no. 1, pp. 138–157, Apr. 2026, doi: 10.38088/jise.1753889.
ISNAD
Yildiz, Sedat Atakan - Şahinaslan, Önder - Borandag, Emin - Yücalar, Fatih. “Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach”. Journal of Innovative Science and Engineering 10/1 (April 1, 2026): 138-157. https://doi.org/10.38088/jise.1753889.
JAMA
1. Yildiz SA, Şahinaslan Ö, Borandag E, Yücalar F. Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach. JISE. 2026;10:138–157.
MLA
Yildiz, Sedat Atakan, et al. “Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach”. Journal of Innovative Science and Engineering, vol. 10, no. 1, Apr. 2026, pp. 138-57, doi:10.38088/jise.1753889.
Vancouver
1. Sedat Atakan Yildiz, Önder Şahinaslan, Emin Borandag, Fatih Yücalar. Comparing the Performance of PPO and DQN Algorithms in Different Game Environments Using a Reinforcement Learning Approach. JISE. 2026 Apr. 1;10(1):138-57. doi:10.38088/jise.1753889


Creative Commons License

The works published in the Journal of Innovative Science and Engineering (JISE) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.