

In recent years, reinforcement learning (RL) has been demonstrating performance beyond that of humans in the areas of games and robotics. However, there are still many hurdles that need to be overcome before we can apply RL to real-world systems. In this technotalk, Professor Aaron Courville from the Department of Computer Science and Operations Research at the Université de Montréal joins Takashi Watanabe from the Research & Development Group of Hitachi, Ltd. to discuss the challenges faced in applying AI to increasingly complex real-world systems, such as societal infrastructure and IT service operations, and the possibilities this opens up.

(Published 23 March 2023)

Reinforcement learning today, and future directions

Takashi
Thank you very much, Aaron, for joining me today to discuss the expectations and the challenges for the application of RL in society. I’d like to start the discussion with the kinds of problems RL technology is currently able to solve, and then move on to what direction we expect for its development.


In my view, the performance of RL in the fields of gaming and robotics is remarkable. Expectations are emerging that these technologies will enable decision-making not only in deterministic worlds such as games and robotics, but also in uncertain situations in the real world.


For example, in our factories and stores there are big fluctuations in demand every day, and our manufacturing and shipping targets and workloads change dynamically with those demand changes. These fluctuations are growing with globalization. Currently, experienced human leaders manage these situations, but as digitalization advances, we think machine learning techniques will have more opportunities to be applied to them.


Taking this one step further, we expect RL to be very promising in supporting humans in these situations. How do you expect machine learning to be used in real-world applications?

Aaron
Thank you very much for inviting me to this forum. Essentially, I think you're asking how we can use RL to take on a lot of our current challenges in business, industry, and society. It's a very interesting question. So, let's start with the case where we know that RL works: as you mentioned, in game scenarios and very simple controlled settings.

When we train these RL agents, we're training them almost always in a simulation setting where we run them over many, many, often millions, of frames of practice. We collect a lot of data and train them on that kind of data. Then we iterate this process, so they accumulate an amazing amount of what is essentially equivalent to real-world time in training. These are the settings where we think RL can work well: cases where the environment is simple enough, and we have enough data and enough experience with that environment, that these systems can figure out through a brute-force kind of search which strategies end up working well.
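
To make this loop concrete, here is a minimal sketch of simulation-driven training: a tabular Q-learning agent practicing over many episodes of a toy simulated environment via the open-source gymnasium library. The environment choice, the crude discretization, and the hyperparameters are illustrative assumptions for the example, not details from the conversation.

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")  # a simple, controlled simulation
alpha, gamma, epsilon = 0.1, 0.99, 0.1
q_table = {}  # discretized state -> action-value estimates

def discretize(obs):
    # crude rounding so a tabular method can be used on continuous states
    return tuple(np.round(obs, 1))

for episode in range(2000):  # "many, many frames of practice"
    obs, _ = env.reset()
    state = discretize(obs)
    done = False
    while not done:
        q = q_table.setdefault(state, np.zeros(env.action_space.n))
        # epsilon-greedy: mostly exploit, occasionally explore
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q))
        obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(obs)
        next_q = q_table.setdefault(next_state, np.zeros(env.action_space.n))
        # temporal-difference update: learn purely from simulated experience
        q[action] += alpha * (reward + gamma * np.max(next_q) * (not terminated) - q[action])
        state = next_state
        done = terminated or truncated
env.close()
```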

Hurdles in applying RL techniques to real-world systems

Takashi
There are many technical challenges in this field. As you explained, simplifying the real world is a very important starting point. Currently, I think machine learning systems learn from the past and from humans: we derive strategies from the real world and real processes, and the machine learns from these human policies and strategies. But the real world contains many different cases and processes, so we need to build a model for each process. That requires many researchers building different models to analyze and mimic real-world behavior, which is very difficult to do given the limited number of researchers in the world.


This leads us to have expectations for inverse RL, a kind of automatic learning from data about human activities. At the same time, the world is becoming more and more complex, so decision-making is becoming more difficult. There are therefore expectations that RL can be used to pre-verify or pre-learn a wide range of possibilities, enabling better decision-making that takes a large amount of information into account. As you mentioned, simulation will be a very good technique for analyzing what might happen in the world. For this, we need to think about the hurdles in applying RL techniques to existing real-world systems. Given the emergence of deep learning and its rapid expansion into real-world applications, why is RL needed?

Aaron
I think the real promise that RL gives us is to go beyond what humans can already do. As you mentioned, we can do inverse RL; that's one particular way of approaching the problem. But we can also do what's called behavioral cloning, or more generally imitation learning, where we take recordings of humans performing certain behaviors and then train systems to simply emulate them. In that case we're essentially turning an RL problem into what we're better at, which is a standard supervised learning problem. It's easier because we can actually tell the system to do this very particular thing in this context. That's a much easier learning problem than a general RL problem, because in general RL you have no idea whether the rewards you're getting are particularly high or particularly low; you only get a notion of that with more experience. It's a much weaker learning signal, which on one hand makes it a much harder problem, but on the other hand makes it a much more powerful framework, because it allows us to set up problem domains where we don't even know what the right solutions are.
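
As a rough illustration of how behavioral cloning turns an RL problem into supervised learning, the sketch below trains a small PyTorch policy network on recorded (observation, action) pairs. The network shape, the dimensions, and the placeholder demonstration data are assumptions made for the example.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 16, 4  # hypothetical observation and action sizes

# A small policy network mapping observations to action logits.
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder stand-ins for recorded human demonstrations:
# demo_obs is (N, obs_dim), demo_actions is (N,) with the action taken.
demo_obs = torch.randn(1024, obs_dim)
demo_actions = torch.randint(0, n_actions, (1024,))

for epoch in range(50):
    logits = policy(demo_obs)
    # Standard supervised objective: "do this very particular thing
    # in this context", exactly as the demonstrator did.
    loss = loss_fn(logits, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the target action is given for every observation, the learning signal here is the strong, per-example signal of supervised learning, in contrast to the weak reward signal of general RL.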


If we think about the well-known example of this in the game of Go, AI systems in fact discovered strategies that weren't even known to the best human experts in the world. And that's the promise that we see with RL: in potentially every domain, we can imagine discovering strategies that are simply better than what we've come up with until now. This can be in small ways or in very important, very large ways that could change the way business interactions happen or the way certain kinds of scheduling problems are solved. Discovering systems that come up with new ways of doing things is really where the promise of RL is. I think that's the reason why we want to work on it. In addition to being a challenging problem, I think that's the real promise it has.

Takashi
Yes, I believe that is a very important point. For example, at Hitachi, the person in charge of a distribution warehouse makes decisions based on an overall understanding of what the workers are working on and what their targets are. There are many fluctuations in customer demand, and we change the production plan according to what we expect to ship in the near future.

Currently a human leader controls this process, but we need to understand what they're thinking, or what their control points are. As you mentioned, RL could help us understand the thought process within the human brain so as to improve throughput or create a better world. Because of that, we want to increase our use of RL. But at the same time, we feel it is very difficult to define which aspects we need to measure or think about. It also depends on what kind of simulator we should build. What kinds of unexpected events do we need to incorporate into the simulators? This is currently very difficult to define. As you mentioned, generalization would be one way. Do you have any ideas or expectations in this area?

Aaron
You raised a really good point about the underlying importance of simulation. Right now we're in a paradigm where, if we expect our RL systems to do anything interesting at all, we're going to do it by training them with a large amount of simulation. And exactly as you say, we have to give them lots of different scenarios. For example, autonomous driving is one of the major potential applications of RL right now. We're in a situation where 99% of the time our vehicles can probably drive autonomously with no problems at all.

But it's that 1% or even 0.1% of the time that something unexpected happens that these vehicles will fail to do the right thing in that particular moment. One of the critical aspects, and this is true of machine learning in general, but in particular for RL, is that sometimes the way these systems fail can be pretty counterintuitive. That undermines our confidence in how they behave. So making sure that we simulate all possible scenarios, or at least the vast majority of possible scenarios, becomes important in those kinds of settings. That's really the challenge. In an open dynamic world, you really can't cover every possible scenario. So being able to generalize from the experience you have to a new experience becomes really important. But they go hand in hand, right? We need these rich simulations with a great deal of experience together with the ability to generalize from the experience in simulation to being out there in the real world. That's a critical combination that we're going to need to go forward to improve our RL systems.
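
One common way to pursue the scenario coverage Aaron describes is to randomize the simulation's conditions on every training episode, often called domain randomization. The sketch below is purely illustrative: the scenario parameters, their ranges, and the commented-out simulator factory are invented for the example.

```python
import random

def sample_scenario():
    # Each episode draws perturbed environment conditions so that
    # rare "1% cases" eventually appear in training.
    return {
        "weather": random.choice(["clear", "rain", "fog", "snow"]),
        "friction": random.uniform(0.3, 1.0),        # road surface grip
        "pedestrian_rate": random.expovariate(1.0),  # rare-event intensity
        "sensor_noise": abs(random.gauss(0.0, 0.05)),
    }

for episode in range(5):
    scenario = sample_scenario()
    # env = make_driving_sim(**scenario)  # hypothetical simulator factory
    print(f"episode {episode}: {scenario}")
```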

Reinforcement learning in society: the human-AI relationship

Takashi
There are two ways we may go. One is making an AI that can do everything; the other is making an AI that can cope with the fluctuations in any single task. We currently believe deep learning techniques can recognize the world or make certain kinds of decisions better than humans, and we use these techniques to automate our processes. But at the same time, we feel there is some difficulty in generalizing from individual tasks to bigger tasks. So we need to build a simulator at the upper layer to analyze the holistic processes automatically with RL. This is a very difficult question, but it is what we need to focus on.

Aaron
We're in an interesting scenario where AI is being used in these limited domains. It's still not that easy, but we find that we're able to achieve this kind of superhuman performance. But again, it's because these domains are so limited; in a game of Go or Atari, the environment is highly controlled. It's a fixed domain. As soon as we go out into the real world, things become much more complicated, both on the perception side and on the control side. On the perception side we can lean on a lot of the advances we've made in deep learning and machine learning in general. There, I'm very optimistic about our ability to make relatively rapid progress, because we can translate all of the work we've done on the computer vision side into perception for RL. We've already done that to a certain extent. You can see this in the rapid progress we've made going from what are called state-space systems, where we're encoding joint angles, to similar scenarios where we're using raw pixels as input from a vision system.
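
The shift from state-space inputs to raw pixels can be pictured as swapping the policy's input head for a small vision encoder. The PyTorch sketch below is illustrative only; the layer sizes, the seven joint angles, the four actions, and the 84x84 image are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

# State-space policy: input is a small vector, e.g. seven joint angles.
state_policy = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 4))

# Pixel-based policy: a small convolutional encoder borrowed from
# computer vision feeds the same kind of action head.
pixel_policy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(64), nn.ReLU(),
    nn.Linear(64, 4),
)

print(state_policy(torch.randn(1, 7)).shape)          # -> torch.Size([1, 4])
print(pixel_policy(torch.randn(1, 3, 84, 84)).shape)  # -> torch.Size([1, 4])
```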


On the control side, we're making good progress, but I think we still need to make quite a few more gains before we can start seeing really, truly practical large-scale applications of RL.

Takashi
Yes, we need to learn many things from AI applications. And I think the starting point is how RL will be used together with humans in the world. We are already using AI as a partner in the fields of games and robotics, such as shogi in Japan, where it is well known that human players have been learning moves from AI for several years. We learn from AI to get a better understanding of the game. This will be one area of collaboration between AI and humans. There may also be possibilities for teaching RL-based agents while people monitor their decisions and learn from the agents to accomplish many tasks. This is a very interesting and exciting direction for future development.

Aaron
Right. There's actually a really interesting interaction between learning systems and the behaviors they learn. This is something that, as a research community, we're starting to get into: the interaction between what amounts to game theory and RL. Some pretty interesting dynamics happen here. You can end up with different learning systems interacting with each other that might not actually share the same rewards; they each could have slightly different interests. This turns out to be the setting you have when humans potentially interact with these agents.

This is going to come up as a potential scenario, and you have to take these kinds of interactions into account, along with the fact that the two behaving systems might not necessarily share the same understanding of the task, or exactly the same goals. It turns out that these factors interact with RL in some pretty interesting ways.


In other words, if we consider different agents having different goals in mind, then we enter the world of game theory, where these two agents might not necessarily have aligned goals, which means that they are not necessarily always destined to cooperate. The interaction between RL and game theory is something that we've really not explored very much at all. And I think it's going to become extremely important once we start deploying many of these systems in the real world, because they'll be interacting with each other. As soon as you have that, you have the opportunity for some to start learning to exploit others in some ways. A whole new generation of issues comes up there. But I think we're still a little bit away from having to deal with those kinds of issues for now.
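
As a toy picture of this RL-and-game-theory interaction, the sketch below runs two independent learners in a general-sum matrix game whose payoffs are deliberately misaligned. The game, the payoff values, and the learning rule are all invented for illustration; nothing here comes from the conversation itself.

```python
import numpy as np

# payoffs[(a1, a2)] -> (reward to agent 1, reward to agent 2).
# Each agent prefers a different coordinated outcome.
payoffs = {
    (0, 0): (3, 2), (0, 1): (0, 0),
    (1, 0): (0, 0), (1, 1): (2, 3),
}

q1, q2 = np.zeros(2), np.zeros(2)  # per-agent action-value estimates
alpha, epsilon = 0.1, 0.1
rng = np.random.default_rng(0)

for step in range(5000):
    a1 = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q1))
    a2 = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q2))
    r1, r2 = payoffs[(a1, a2)]
    # Each agent learns only from its own reward and never observes
    # the other's goals, so coordination is not guaranteed.
    q1[a1] += alpha * (r1 - q1[a1])
    q2[a2] += alpha * (r2 - q2[a2])

print(q1, q2)
```

Whether the two learners settle into a coordinated outcome depends on the learning dynamics rather than on any shared objective, which is exactly the kind of effect this emerging line of research studies.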

Takashi
You have machines that perceive things differently from humans. One thing we found is that in logistics warehouses, the workload fluctuates every day because human demand changes. For example, when it rains, the next day's shipping workload increases: when it rains, customers buy from the stores, the stores run out of stock, and they ask us to ship new stock the next day. We believed the rain itself was very closely related to our shipping workload. But by using AI, we found that the rain itself was not important; it was the forecast that was important.
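
As a hedged sketch of how such a finding might surface in practice, one could compare feature importances in a workload-forecasting model, for example with scikit-learn's permutation importance. The data below is synthetic and is constructed so that workload follows the forecast rather than the actual rain; the feature names and numbers are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
forecast_rain = rng.random(n)  # yesterday's rain forecast (0..1)
actual_rain = (forecast_rain + 0.2 * rng.standard_normal(n)) > 0.5
# Synthetic assumption: customers react to the forecast, so next-day
# workload tracks the forecast far more than the weather itself.
workload = 100 + 50 * forecast_rain + 2 * actual_rain + rng.standard_normal(n)

X = np.column_stack([actual_rain, forecast_rain])
model = RandomForestRegressor(random_state=0).fit(X, workload)
result = permutation_importance(model, X, workload, random_state=0)
for name, imp in zip(["actual_rain", "forecast_rain"], result.importances_mean):
    print(f"{name}: {imp:.3f}")  # forecast_rain dominates by construction
```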

Aaron
That's a very interesting observation. That's a fun thing to discover.

Takashi
Yes, by introducing this kind of AI, our system is getting better and more intelligent. With the world becoming more complex, AI may be expected to solve complex decision-making problems that are difficult to solve by human judgment alone. We have a limited ability to think about everything in the world, but AI can notice more about the world because, in some cases, it has access to vast amounts of data. So we expect AI to solve these kinds of complex problems in collaboration with us. In future systems, this will be very important for social infrastructure and social innovation, I believe.

Aaron
I think that's a very interesting perspective on AI, and I think it's absolutely right that we want AI systems that can learn these complex predictions from very complex scenarios. There's reason to believe that we can get there; there's no real reason to imagine that there's going to be a limit to what we can predict in terms of complexity. One way I like to think about this idea of managing complexity is in terms of what the neural network systems that deep learning, and most of AI, are built on actually do. They manage this complexity. We see this best in perception systems. When people started working on object detection, they would write rules down about how to detect things, a kind of old-school AI. We thought it would be relatively simple to just write a few rules down: if you detect this kind of shape, then it's a cup, or something like that. But it turns out that if the lighting changes slightly, or the camera angle changes slightly, those rules don't apply, and you need to add more rules on top. If you try to build in rules that way, you quickly end up with a huge array of rules that contradict each other and become completely unmanageable.


These large deep learning perception systems can be based on convolutional neural networks or on more modern methods like transformers. These kinds of models are really about detecting little features, then features on top of those little features, then features on top of those. By managing the complexity of all the different kinds of inputs they receive, they're able to learn the invariances in the world and become robust to different lighting conditions and camera positions.


We can take that concept and apply it to other systems. In computer vision, one of the reasons we thought the problem was easy is that humans just naturally happen to have good vision systems in our brains. Our deep learning systems can now solve that problem. Maybe to us it doesn't seem like quite as impressive a problem as some of the others, but it's a really complex problem, and the way these neural systems solve it is by layering complex decisions on top of complex decisions. We can take that perspective, or that strategy, and apply it to other systems that our human brains have not necessarily evolved to deal with very well. Now there's a caveat, of course: the neural systems we're building are inspired by human brains, so there might be interesting interactions, in the sense that they might actually have a hard time with the more complex systems that we're not very good at either. But I agree with you that there is no real reason why these systems can't be extended beyond what we're able to do and make complex decisions that would be very difficult for at least a single human.

The need for explainability and systematic generalization

Takashi
Yes, the technology is getting better and better; it is surpassing our abilities and going beyond our limitations. Because of that, the operational quality assurance of these models is becoming very important. For example, how to assure the behavior and quality of RL models is still at the discussion and approval stage. Hitachi is also conducting research and development on the explainability of RL models. We think AI can help humans in the world, and we need to develop rule-based technology that supports humans in understanding and accepting AI.

Aaron
I agree. It's actually interesting because we have to be a bit careful when it comes to explainability. Some of the early explainability research work was along the lines of just trying to produce a result that sounds like an explanation. But of course, sometimes the actual decision that these methods come up with and the reason they make that decision do not necessarily correlate super well with the explanations they offer. So, coming up with true explainability is a very challenging problem. One of the things that my group concentrates on is what we call systematic generalization. This is trying to get these systems to behave a little bit more like how we believe humans behave in terms of those kinds of generalizations we make. And I think that is maybe a more robust path towards building confidence in these systems.


I mentioned before how some of these AI systems, when they fail, fail in counterintuitive ways. That's because the way most of these systems learn right now is really quite different from the way humans learn. We typically rely on rules and then apply those rules in new contexts. These systems are much more focused on the distribution of the training data, and they tend to degrade in prediction performance as they move away from that training data. Humans behave a little bit like this too, but the way AI systems degrade is much more catastrophic than the way humans degrade. For example, if a system has never been trained on a particular pattern of cat or dog, then when presented with that pattern it is likely far enough out of distribution that the system fails to recognize it as a cat or dog. The real issue for these kinds of systems is being able to learn what makes a cat a cat and what makes a dog a dog, based on more or less a set of rules, and then apply that to an example that is not from the distribution it was trained on. I don't think we're quite there yet.

It's an interesting question, and even in these very large-scale language models that we see coming out now, it’s a very open question as to whether or not those systems are able to do this kind of systematic generalization. In the context of these language models, we often call this compositional generalization where we see different individual elements in the training set, but unique combinations of them aren't necessarily in the training set.

And we're asking, do they generalize to those unique combinations under a test scenario? It's a challenging area to do research with these very large-scale models because they're trained on so much data. It's hard for us to devise scenarios to ensure that they actually haven't seen data similar to what we're testing them on in their training set. It's an area that we're actively engaged in, this question of what I call systematic generalization. I think once we get there, once we have these kinds of systems, then we're going to start to have systems that make mistakes that are closer to how humans make mistakes which are a bit more intuitive. And we're going to start to build trust in these systems a little bit more because of that. I think that is maybe a path towards a wider acceptance of these methods. First of all, they need to be more robust. We need to be able to trust that under these kinds of scenarios we can expect to see that they're going to behave in predictable ways and not fail because there's a weird shadow that occurred in one instance. Did the car drive off the road or hit a truck because of that weird shadow? That's the danger that we currently are struggling with. Different people are pursuing different pathways and one of the directions I'm really interested in is systematic generalization.
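
A minimal sketch of the compositional train/test split Aaron describes might look like the following, using an invented toy vocabulary: every individual element appears in training, but certain combinations are held out for evaluation.

```python
from itertools import product

adjectives = ["red", "green", "blue"]
nouns = ["cube", "sphere", "cylinder"]

all_pairs = list(product(adjectives, nouns))
held_out = {("blue", "cylinder"), ("red", "sphere")}  # unseen combinations

train = [p for p in all_pairs if p not in held_out]
test = list(held_out)

# Every adjective and every noun occurs somewhere in training...
assert {a for a, _ in train} == set(adjectives)
assert {n for _, n in train} == set(nouns)
# ...but the held-out combinations never do.
assert not held_out & set(train)

print("train:", train)
print("test :", test)
```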

Academia and industry collaboration on RL for societal innovation

Takashi
If we can make AI with better accuracy, we can rely more on its results. This is a difficult point, because, as you mentioned, machine learning cannot achieve 100% reliability or 100% correspondence with our understanding, and this can lead to misunderstandings about AI's output, which may cause errors or problems in the future. We cannot avoid this difficult discussion. Because of that, we need to discuss things from different viewpoints: from the business side, the engineering side, academia, and the philosophical side. This is very important if we are to advance these technologies and use them in the real world.


So, what direction do you think academia and industry should pursue to realize progress in implementing societal innovation through RL? In Hitachi's view, science and technology are the two wheels of social innovation, and we believe it is very important for academia and business to work together to turn these two wheels. From this perspective, I feel that the collaboration between Mila and Hitachi is very meaningful and important. Companies tend to get bogged down in the pressing issues at the actual work site. Because of that, we think that in order to innovate we need to step back, look at the whole picture, think more globally, and solve the problems of the future. Joint research greatly expands these perspectives, so this is a very important collaboration. As I mentioned, thinking from different viewpoints is very important. What is your opinion about this?

Aaron
I completely agree. I think it's extremely important that we have all these different viewpoints coming together. I very much enjoy working with and collaborating with Hitachi and other companies because they bring a certain perspective and real, challenging problems. One of the aspects that I like best about being in academia is that we can think about these problems, and sometimes what we do is abstract them away.

But it's in the interaction that I think we see the most progress being made, because we need to be grounded in real-world problems if we're going to make an impact on society. I think we're all interested in making that kind of impact, and so talking with people in companies about their real-world problems is really interesting; it fuels our curiosity about what the real problems out there are right now. In fact, a lot of my interest in the question of systematic generalization came about that way. We often study those kinds of questions with fairly toyish experiments, in an abstract way; to some extent we set up very artificial scenarios when studying them. But the field was really inspired by companies that would come in, look at their real data scenarios, and say, "OK, well, we train these models in a laboratory setting. We test them in a laboratory setting. We get really good performance, and then we deploy them out there, and suddenly the performance deteriorates dramatically." They become very frustrated with machine learning systems that were supposedly so accurate in the lab. And it's exactly that kind of real-world problem that we've become interested in solving, because there's something else going on there; there's something these systems aren't doing. It's in that interaction that these real problems come to light. So I think it's extremely important that we engage with the real challenges that arise when we try to deploy these systems in the real world.


And then we can take them into academia and chew on them for a little while. It's a slow process to make these kinds of advances. I found that you can chew on a problem for a while without a whole lot of luck or a whole lot of progress. And then, all of a sudden, things move very quickly. Something gets unlocked. And then there's rapid progress that's made. It's very difficult to predict what needs to be unlocked and when we can expect that kind of rapid progress. But that's why these kinds of long-term relationships are so valuable from my point of view. Because it's through those processes that we make real contributions, I think.

Takashi
I absolutely agree with your message. We start with a very simple question and a very simple model, a kind of abstraction of the real problems. That is a very important starting point. We sometimes bring our actual problems, and through discussion with you we cultivate new technologies and a deeper understanding of those problems. Getting back to the simple model again is very challenging, but it is very important to abstract things again in order to think about and understand the real problems, such as what real humans are thinking as they learn.


This is a very important aspect, I think, because we cannot do everything by ourselves. On the abstraction side, you and other academic teams are essential.

Aaron
Definitely. This kind of partnership is extremely important for that.

In order for reinforcement learning (RL) to take on complex, real-world problems, it must progress from simple, controlled environments and incorporate systematic generalization and manageability to deal with fluctuating conditions. What may seem a simple problem for humans cannot be solved satisfactorily by a rule-based system. Just as computer vision systems succeeded by layering features upon features, RL may be able to go beyond humans by layering complex decisions upon complex decisions. Services and systems are becoming increasingly large, complex, and sensitive to regional fluctuations due to the globalization of business. This will create situations that are difficult for humans alone to deal with intuitively, and AI-based solutions that layer decision upon decision will become more important. RL is expected to be deployed on real-world problems to support the realization of managed services. Collaboration between academia and industry allows diverse viewpoints to be incorporated into RL systems, making them more robust and useful to humans for societal innovation. Future challenges include the interaction of RL and game theory; application to social innovation; improving the reliability and trustworthiness of AI and RL; explainability; and improving predictive behavior.

Profiles

(As at the time of publication)

Aaron COURVILLE, Ph.D.

Associate Professor,
Department of Computer Science and Operations Research,
Université de Montréal

Aaron Courville is an Associate Professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal. He received his PhD from the Robotics Institute, Carnegie Mellon University. He is a founding member of Mila, a community of more than 900 researchers specializing in machine learning and dedicated to scientific excellence and innovation. Aaron is also a fellow of the CIFAR program on Learning in Machines and Brains. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote the seminal textbook Deep Learning.

His current research interests focus on the development of deep learning models and methods. He is particularly interested in the study of systematic generalization within neural networks, but he is also interested in deep generative models and multimodal ML, with applications such as computer vision and natural language processing. Aaron holds a Canada CIFAR AI Chair, and his research has been supported by Microsoft Research, Hitachi, Samsung, and a Google Focused Research Award.

WATANABE Takashi

Head of the Intelligent Information Research Department
Advanced Artificial Intelligence Innovation Center
Research & Development Group, Hitachi, Ltd.

Takashi Watanabe joined the Central Research Laboratory of Hitachi, Ltd. as a researcher in 1999 after completing his M.S. degree in the Interdisciplinary Graduate School of Engineering Science (Information Science), Kyushu University, Fukuoka, Japan.

His research career began with the development of smart card security systems, from the hardware to the software level, especially in the area of side-channel analysis. He has also published educational textbooks on the topic. His other research topics include image recognition for visual inspection, satellite imagery analysis for agricultural and environmental decision support, holistic operational process optimization including demand forecasting in logistics warehouses, industrial operation automation by individual robots or robot swarms, and business financial system innovation. These connect to current business innovation using AI and related technologies. He is also the recipient of the 2006 Young Researcher's Award from the Institute of Electronics, Information and Communication Engineers (IEICE), Japan.

Takashi is currently a member of IEEE and IEICE.