A parallel implementation of Q-learning based on communication with cache


  • Alicia Marcela Printista Departamento de Informática, Universidad Nacional de San Luis, 5700 San Luis, Argentina
  • Marcelo Luis Errecalde Departamento de Informática, Universidad Nacional de San Luis, 5700 San Luis, Argentina
  • Cecilia Inés Montoya Departamento de Informática, Universidad Nacional de San Luis, 5700 San Luis, Argentina


Parallel Programming, Communication based on cache, Reinforcement Learning, Asynchronous dynamic programming


Q-Learning is a Reinforcement Learning method for solving sequential decision problems, in which the utility of actions depends on a sequence of decisions and there is uncertainty about the dynamics of the environment in which the agent is situated. This general framework has allowed Q-Learning and other Reinforcement Learning methods to be applied to a broad spectrum of complex real-world problems such as robotics, industrial manufacturing, and games. Despite its interesting properties, Q-Learning is a very slow method that requires a long training period to learn an acceptable policy. In order to solve, or at least reduce, this problem, we propose a parallel implementation model of Q-Learning that uses a tabular representation and a cache-based communication scheme. This model is applied to a particular problem, and the results obtained with different processor configurations are reported. A brief discussion of the properties and current limitations of our approach is finally presented.
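The tabular Q-Learning method the abstract builds on can be sketched as follows. This is a minimal single-process illustration of the standard update rule Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) − Q(s,a)); it does not reproduce the paper's parallel, cache-based communication scheme, and the corridor environment, function names, and parameter values are hypothetical choices for illustration only.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy.

    `step(s, a)` must return (next_state, reward, done).
    """
    rng = random.Random(seed)
    # Q-table: one row of action values per state, initialized to zero.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # One-step Q-learning update toward the bootstrapped target.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical toy environment: a 1-D corridor of states 0..4.
# Action 1 moves right, action 0 moves left; reaching state 4
# yields reward 1 and ends the episode.
def corridor(s, a):
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = q_learning(5, 2, corridor)
# Greedy policy extracted from the learned table: move right everywhere.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(5)]
```

A parallel version along the lines described in the abstract would partition or replicate this Q-table across processors and exchange recently updated entries through a cache-style communication layer; the single-table loop above only shows the learning rule each worker would apply locally.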








How to Cite

Printista, A. M., Errecalde, M. L., & Montoya, C. I. (2002). A parallel implementation of Q-learning based on communication with cache. Journal of Computer Science and Technology, 1(06), 11 p. Retrieved from https://journal.info.unlp.edu.ar/JCST/article/view/969



Original Articles
