Creating reliable and cooperative intelligence with machine learning

When DeepMind’s AlphaGo defeated the world’s top Go player in 2017, artificial intelligence (AI) technology received much attention. The technology behind AlphaGo was a method called “deep reinforcement learning,” which combines deep learning and reinforcement learning.

Deep learning is a machine learning method that became a hot topic in 2012, when a Google system learned to recognize “cats” in images. Reinforcement learning is a method that strengthens behaviors leading to success in a given situation, much as people learn from their remembered experiences of success.
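To make the idea of “strengthening behaviors that lead to success” concrete, the following is a minimal sketch of tabular Q-learning, one standard reinforcement learning algorithm. The environment (a five-state chain with a reward only at the right end), the function names, and all parameter values are hypothetical choices made for illustration and are not taken from the laboratory's work.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts at state 0 and receives a reward of 1 only when it
    reaches the last state. Action 0 moves left, action 1 moves right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0

            # The goal is terminal, so no future value is added there.
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            # Move the estimate toward the observed reward plus future value.
            q[s][a] += alpha * (r + future - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best learned action for every non-terminal state."""
    return ["right" if row[1] > row[0] else "left" for row in q[:-1]]


q_table = q_learning_chain()
policy = greedy_policy(q_table)
print(policy)
```

After training, the greedy policy moves right in every state, because each successful episode raises the estimated value of the actions that led to the reward.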

Progress in artificial intelligence research is often discussed in terms of winning or losing against people in board games. However, a board game is an environment in which players can make exactly the moves they intend, free from outside disturbance, even if the search space is huge. Such an environment is called a closed world or a static environment.

On the other hand, consider autonomous driving, which has drawn much attention recently. Roads in Japan are narrow, have poor visibility, and carry mixed traffic in which bicycles and pedestrians come and go. In other words, the search space may be larger than that of Go. Moreover, unlike a Go player, a driver cannot always control the outcome as intended, because of factors such as weather-dependent road friction and obstructed vision. Instead of the discrete knowledge that traditional reinforcement learning has dealt with, we must handle complicated relationships among continuous data.

As AI enters human society, the technology must take human characteristics into account while ensuring safety and efficiency.

In our laboratory, we apply reinforcement learning that can explain its results to the design of various systems, such as automated driving, disaster prevention, and energy conservation management. Safety is of the utmost importance in such life-related technologies. It is crucial to be able to explain why a system produced the results it did.

In addition, my own academic interest lies in finding and implementing optimal solutions to problems based on collective knowledge when individuals with differing interests come together in a group. This is the situation described by the saying “two heads are better than one.” In general society, people sometimes say that ideal communication is not top-down. Instead, individual members of a community should seek a better solution for the whole community while resolving conflicts of interest among themselves. This ideal overlaps with the concept of multi-agent systems, which handle information integration in artificial intelligence.

For this reason, we are trying to apply this process of finding optimal solutions to the design of various systems, using machine learning techniques in which agents cooperate, such as reinforcement learning, inverse reinforcement learning, and imitation learning.

Figure 1: An experiment proposing an automated driving strategy using reinforcement learning. A road-to-vehicle communicator measured the number and type of automated and manual vehicles and shared the information upstream and downstream to calculate vehicle density, etc. [2].

In particular, inverse reinforcement learning can automatically estimate the reward design, which is a challenging task in reinforcement learning. It is useful for elucidating behavioral principles from the tracked behavior of skilled drivers and of organisms such as bees. Using the behavioral principles revealed by inverse reinforcement learning, we can apply reinforcement learning methods to design autonomous control systems that produce optimal strategies even in unknown environments.
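The idea of estimating rewards from observed behavior can be illustrated with a deliberately simplified sketch. It assumes a linear reward over one-hot state features and sets each state's reward weight to the difference between how often an expert and a non-expert baseline visit that state (with discounting), in the spirit of feature-expectation matching. This is a stand-in for illustration only, not the Bayesian method of [1]; the function names and the demonstration trajectories are hypothetical.

```python
def discounted_state_visits(trajectories, n_states, gamma=0.9):
    """Average discounted visit count per state over a set of trajectories."""
    visits = [0.0] * n_states
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            visits[state] += gamma ** t
    return [v / len(trajectories) for v in visits]


def estimate_reward_weights(expert_trajs, baseline_trajs, n_states, gamma=0.9):
    """Reward weight per state: expert visit frequency minus baseline frequency.

    States that the expert seeks out more than the baseline receive a
    positive weight; states the expert avoids receive a negative weight.
    """
    mu_expert = discounted_state_visits(expert_trajs, n_states, gamma)
    mu_baseline = discounted_state_visits(baseline_trajs, n_states, gamma)
    return [e - b for e, b in zip(mu_expert, mu_baseline)]


# Hypothetical demonstrations on a four-state chain (states 0 to 3).
# The expert heads straight for state 3; the baseline wanders near state 0.
expert = [[0, 1, 2, 3, 3, 3], [0, 1, 2, 3, 3, 3]]
baseline = [[0, 0, 1, 0, 1, 2], [0, 1, 1, 0, 0, 1]]

weights = estimate_reward_weights(expert, baseline, n_states=4)
best_state = max(range(4), key=lambda s: weights[s])
print(best_state, weights)
```

With these trajectories, state 3 receives the largest weight, so a reinforcement learning agent trained on the estimated reward would be drawn toward the state the expert preferred. Practical inverse reinforcement learning methods iterate between such reward estimates and policy optimization rather than stopping after one step.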

We have used machine learning methods to conduct collaborative research with companies on automated driving and autonomous unmanned submersibles. In the future, we will apply this knowledge to research on drones. Our goal is to build a platform for robots and humans to coexist and co-evolve in a wide range of water, land, and air spaces.

References

[1] Y. Nakata and S. Arai: Bayesian inverse reinforcement learning using expert trajectories in multiple environments, Proc. of the Japanese Society for Artificial Intelligence, Vol. 35, No. 1, p. G-J73_1-10 (2020.1.1), DOI: https://doi.org/10.1527/tjsai.G-J73
[2] Shota Ishikawa, S. Arai, and S. Arai: A Learning Method for Automatic Driving Strategies to Achieve Road-to-vehicle and Vehicle-to-Vehicle Coordination to Reduce Congestion, Transactions of the Japanese Society for Artificial Intelligence, Vol. 34, No. 1, p. D-155_1-9 (2019.1), DOI: https://doi.org/10.1527/tjsai.D-I55
[3] Daiko Kishikawa and Sachiyo Arai: Comfortable Driving by Using Deep Inverse Reinforcement Learning, The 4th IEEE International Conference on Agents (ICA 2019), 18-21 October 2019, Jinan, China (2019).

Profile

Sachiyo Arai

She received a PhD in Engineering from the Graduate School of Science and Engineering, Tokyo Institute of Technology, after working for Sony Corporation. After working at the University of California at Berkeley, Carnegie Mellon University, Kyoto University, Stanford University, and the National Institute of Informatics, she is currently a professor at the Graduate School of Engineering, Chiba University. She specializes in autonomous and distributed systems and machine learning for multi-agent systems.