11 May 2021

It’s time for businesses to chart a course for reinforcement learning

As part of techUK's AI Week, Jacomo Corbo, Oliver Fleming and Nicolas Hohn from QuantumBlack, a McKinsey company, share how an advanced artificial intelligence technique is quickly becoming accessible to organizations as a tool for speeding innovation and solving complex business problems. #AIWeek2021

Leaders looking for new ways artificial intelligence (AI) can provide a competitive edge may have found the 2021 America’s Cup Match as exciting for one team’s groundbreaking use of reinforcement learning as for its radical boat designs and close races.

To remain competitive, sailing teams in the America’s Cup contest, like all businesses, must push the boundaries of what is possible. They also face similar constraints, including a steep development curve and a small window of opportunity, meaning teams can pursue only one or two big experiments to up their performance in the sport’s most important competition.

For the 2021 edition of the America’s Cup, reigning champion Emirates Team New Zealand ventured that reinforcement learning, an advanced AI technique, could optimize its design process. The technique delivered, enabling the team to test far more boat designs and achieve a performance advantage that helped it secure its fourth Cup victory.

Unlike other types of machine learning, reinforcement learning algorithms (which often train AI agents or bots) typically do not rely solely on historical data sets, labeled or unlabeled, to learn to make a prediction or perform a task. Instead, they learn as humans often do, through trial and error. In the last few years, the technology has matured in ways that make it highly scalable and able to optimize decision making in complex and dynamic environments.
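To make the idea of trial-and-error learning concrete, here is a minimal sketch in Python of an epsilon-greedy agent choosing among three options. It is a toy illustration only, not the approach used by Emirates Team New Zealand; the function name run_bandit, the payoff rates, and all parameter values are hypothetical. The agent starts with no historical data and improves its estimates purely from the rewards it observes.

```python
import random

def run_bandit(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: learns which option pays off best by trying them.

    true_rates are hypothetical success probabilities, hidden from the agent.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_rates)        # how often each option was tried
    estimates = [0.0] * len(true_rates)   # running average reward per option
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try an option at random.
            action = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the option that currently looks best.
            action = max(range(len(true_rates)), key=lambda a: estimates[a])
        # The environment returns a reward; the agent only sees this outcome.
        reward = 1.0 if rng.random() < true_rates[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for the chosen option.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("Option the agent learned to prefer:", best)
```

The small exploration rate is the design choice that matters here: without occasional random trials, the agent could settle on the first option that happens to pay off and never discover a better one.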

Besides accelerating and improving design, reinforcement learning is increasingly being incorporated into a broad range of complex applications: recommending products in systems where customer behaviors and preferences change rapidly; time-series forecasting in highly dynamic conditions; solving complex logistics problems that combine packing, routing, and scheduling; and even accelerating clinical trials and impact analysis of economic and health policies on consumers and patients.

We have seen how quickly the technological environment can shift. Only a few years ago, another AI technique, deep learning, vaulted onto the business scene. Today, 30 percent of high-tech and telecom companies and 16 percent of companies in other industries we surveyed have embedded deep-learning capabilities.

Executives who today understand the potential of reinforcement learning will, like Emirates Team New Zealand, be better positioned to find the edge in their industries (see sidebar “Notable examples of reinforcement learning applications”). Understanding the team’s experience can help leaders gauge where and when to use the technology because many organizations will travel a similar path: implementing more traditional technologies first to solve a problem and then applying reinforcement learning to ascend to a previously unattainable tier of performance. Thus, we begin by recounting Emirates Team New Zealand’s journey, after which we offer ideas for where and how businesses should consider applying reinforcement learning.

Emirates Team New Zealand’s journey to a 2021 victory

Emirates Team New Zealand designers were not new to advanced technologies. In 2010, the team had built a state-of-the-art digital simulator to test boat designs without physically building them. The simulator was key to the team’s 2017 America’s Cup win, but it had limitations. Operating it optimally required multiple sailors, which was a significant logistical challenge given the sailors’ scheduled practices, travel, and competitions. As a result, designers typically iterated on new designs without simulator performance data and then tested their best ideas in batches when they could carve out large blocks of time with the sailors. Moreover, the sailors’ performance could vary between tests, as human performance often does, making it difficult for designers to know whether a marginal improvement in boat response was due to a design tweak or to variation in the sailors’ performance.

As Emirates Team New Zealand prepared for the 2021 match, they knew if they could get an AI system to run the simulator, it would free the designers to test more design ideas faster and more consistently than they could with the digital simulator alone. The team was unsure at the outset if the idea was feasible, but as conversations about the technology swirled, team members agreed: the potential payoff was transformative and made trying worthwhile. Using reinforcement learning, experts from Emirates Team New Zealand, McKinsey, and QuantumBlack (a McKinsey company) successfully trained an AI agent to sail the boat in the simulator (see sidebar “Teaching an AI agent to sail” for details on how they did it).

While design rules for the America’s Cup specify most components of the boat, they leave enough freedom for designers to make radical choices on some key elements such as hydrofoils. These wing-like structures attach to the hull and lift the boat above the water, enabling the vessel to reach speeds of over 50 knots (roughly 58 miles or 93 kilometers per hour). Hydrofoils can be a significant factor in the race, but race rules allowed teams to build only six full-size hydrofoils in all.

Using the reinforcement learning–trained agent to control the simulator, Emirates Team New Zealand designers could evaluate thousands of hydrofoil design concepts instead of just hundreds in their quest for a winning design. This gave them valuable insight into how a boat might perform on the water before engaging in a costly build and, in the process, could dramatically reduce design costs for future races. In addition, as the Emirates Team New Zealand agents’ knowledge of sailing increased over time, the sailors began learning maneuvers from the agents that they had not considered, enabling them to improve their performance for a given design.

Where businesses can use reinforcement learning

The heart of Emirates Team New Zealand’s challenge was to solve a complex business problem in a dynamic environment where the variables change in unpredictable ways, the ideal end state is only loosely defined, and the only way a system can learn about its environment is to interact with it.

That situation is analogous to problems facing retailers, manufacturers, utilities, and companies in many other industries. For example, whereas once retailers could reasonably expect that past consumer behaviors would indicate future preferences, they now operate in a world where consumer purchase patterns and preferences evolve rapidly—all the more so as the COVID-19 pandemic repeatedly redefines life. Manufacturers and consumer-packaged-goods companies are under pressure to build dynamic supply chains that account for climate, political, and societal shifts anywhere in the world at a moment’s notice.

Each of these challenges represents a complex and highly dynamic optimization problem, which, with the right data and feedback loops, is well suited to reinforcement learning.

The appeal of reinforcement learning for problems with many possible actions and paths is that the AI agent does not need to be explicitly programmed. Because it learns from examples and teaches itself through trial and error, it can propose novel and adaptive solutions, often faster than humans could.
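As a rough illustration of an agent that is never told the right path, the sketch below applies tabular Q-learning, a standard textbook reinforcement learning method, to a made-up six-position corridor. Everything in it (the corridor, the reward values, the names step and train, and the parameter settings) is an assumption chosen for brevity and has no connection to the sailing simulator described above. The agent is only rewarded for reaching the last position, and it works out for itself that stepping right is the best move everywhere.

```python
import random

N_STATES = 6          # hypothetical corridor: positions 0..5, goal at position 5
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Toy environment: move along the corridor, small cost per move, reward at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01
    return nxt, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the value of each action in each position."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                       # explore
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit current estimate
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = train()
# Read off the learned policy for every non-goal position.
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]
print("Learned move at each position:", policy)
```

Nothing in the code states that moving right is correct; that behavior emerges from the reward signal alone, which is the property that makes the technique attractive when the number of possible actions and paths is too large to program by hand.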

