“Model-Free All-Source-All-Destination Learning as a Model for Biological Reactive Control”

Authors: Martinius Knudsen, Sverre Hendseth, Gunnar Tufte and Axel Sandvig
Affiliation: NTNU, Department of Engineering Cybernetics and NTNU
Reference: 2021, Vol 42, No 4, pp. 197-204.

Keywords: Control, Reinforcement learning, Underactuated systems

Abstract: We present a model-free method for learning actions that lead to an all-source-all-destination shortest-path solution. We motivate our approach in the context of biological learning for reactive control. Our method involves an agent exploring an unknown world with the objective of learning how to get from any starting state to any goal state in the shortest time, without having to run a path-planning algorithm for each new goal selection. Using concepts from Lyapunov functions and Bellman's principle of optimality, our agent learns universal state-goal distances and the best actions that solve this problem.
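To make the idea concrete, the sketch below gives one plausible reading of the abstract: an agent wanders an unknown world and, for every experienced transition, applies a Bellman-style relaxation to a table of state-goal distances, recording the action that currently begins the shortest known route to each goal. Acting greedily on that table then reaches any goal reactively, with no per-goal planning. The grid world, the step function, and all names (ACTIONS, D, best) are illustrative assumptions, not the authors' implementation.

    import random
    from collections import defaultdict

    # Hypothetical deterministic grid world (an assumption for illustration;
    # the paper's environments are not specified in this abstract).
    SIZE = 5
    ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def step(state, action):
        # Environment dynamics, unknown to the learner (model-free setting).
        x, y = state
        dx, dy = action
        return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

    INF = float("inf")
    D = defaultdict(lambda: defaultdict(lambda: INF))  # D[s][g]: steps from s to goal g
    best = {}                                          # best[(s, g)]: first action toward g

    def relax(s, a, s_next):
        # Bellman relaxation over all goals, using only the observed transition (s, a, s').
        D[s][s] = 0
        D[s_next][s_next] = 0
        for g, d_next in list(D[s_next].items()):
            if 1 + d_next < D[s][g]:
                D[s][g] = 1 + d_next
                best[(s, g)] = a

    # Random exploration: every experienced transition tightens the distance table.
    state = (0, 0)
    for _ in range(20000):
        action = random.choice(ACTIONS)
        nxt = step(state, action)
        relax(state, action, nxt)
        state = nxt

    # Reactive control: from any state, the first action toward any goal is a table lookup.
    print(D[(0, 0)][(4, 4)], best.get(((0, 0), (4, 4))))

The relaxation mirrors all-pairs shortest-path updates in the Bellman-Ford/Floyd-Warshall family, but is driven by experienced transitions rather than a known graph; along the greedy policy the learned distance D(., g) decreases by one at every step, so it behaves like a Lyapunov function for reaching g, in the spirit of the abstract's appeal to Lyapunov functions and Bellman's principle of optimality.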

PDF (373 Kb)        DOI: 10.4173/mic.2021.4.5

BibTeX:
@article{MIC-2021-4-5,
  title={{Model-Free All-Source-All-Destination Learning as a Model for Biological Reactive Control}},
  author={Knudsen, Martinius and Hendseth, Sverre and Tufte, Gunnar and Sandvig, Axel},
  journal={Modeling, Identification and Control},
  volume={42},
  number={4},
  pages={197--204},
  year={2021},
  doi={10.4173/mic.2021.4.5},
  publisher={Norwegian Society of Automatic Control}
}