Using combination of actions in reinforcement learning

  • Marcelo J. Karanik, Artificial Intelligence Group, National Technological University Resistencia, Chaco, Argentina
  • Sergio D. Gramajo, Artificial Intelligence Group, National Technological University Resistencia, Chaco, Argentina
Keywords: SARSA, Reinforcement Learning, Optimal Policy, Action Combination

Abstract

Software agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases, the selection of a particular agent architecture determines the agent's behaviour in response to the different problem states. However, there are some problem domains in which it is desirable that the agent learns a good action execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning (RL) and it is useful in the process control area. Given a problem state, the agent selects an action to execute and receives an immediate reward; the estimates for every action are then updated and, after a certain period of time, the agent learns which action is the best one to execute. Most reinforcement learning algorithms execute a single action at each step, even when two or more actions could be used. This work uses RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism to combine actions of different types.
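The select-act-reward-update loop described in the abstract can be illustrated with a minimal sketch of tabular SARSA, the algorithm named in the keywords. It covers only the ordinary single-action case and does not attempt to reproduce the action-combination mechanism the paper proposes. The 4x4 grid, the reward of -1 per step (0 on reaching the goal), the hyperparameters and every function name below are assumptions made for illustration, not details taken from the article.

```python
import random

# Hypothetical 4x4 gridworld: start at top-left, goal at bottom-right.
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; moves off the grid leave the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    reward = 0.0 if next_state == GOAL else -1.0  # assumed reward scheme
    return next_state, reward


def choose(q, state, epsilon, rng):
    """Epsilon-greedy selection over the current action-value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return values.index(max(values))


def sarsa(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Learn action values with the on-policy SARSA update."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = (0, 0)
        action = choose(q, state, epsilon, rng)
        while state != GOAL:
            next_state, reward = step(state, ACTIONS[action])
            next_action = choose(q, next_state, epsilon, rng)
            # Bootstrap from the action actually selected in the next state.
            target = reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from the start state."""
    state, path = (0, 0), [(0, 0)]
    while state != GOAL and len(path) <= max_steps:
        best = q[state].index(max(q[state]))
        state, _ = step(state, ACTIONS[best])
        path.append(state)
    return path
```

Calling `greedy_path(sarsa())` returns the sequence of cells visited when the learned estimates are followed greedily from the start cell. Because SARSA is on-policy, the learned values reflect the exploratory epsilon-greedy behaviour rather than a purely greedy policy.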

Published
2010-04-01
How to Cite
Karanik, M. J., & Gramajo, S. D. (2010). Using combination of actions in reinforcement learning. Journal of Computer Science and Technology, 10(01), p. 19-23. Retrieved from http://journal.info.unlp.edu.ar/JCST/article/view/711
Section
Original Articles