Getting it right: MLOps in energy and materials


Many energy and materials companies use artificial intelligence and machine learning (AI/ML) models to make difficult decisions about core processes or operations. In mining, for example, blast optimization models can help ensure safety while maximizing productivity and efficiency, and maintenance monitoring models can help predict and prevent equipment failure to ensure the safety of employees and minimize maintenance costs.

A unified approach to managing machine learning models can help companies not only control risk but also make operations more reliable, efficient, and easy to scale. MLOps establishes key practices across the application life cycle that increase productivity, speed, and reliability while reducing risk. Our research shows that when MLOps receives the right support and encompasses the entire AI life cycle (including data management, model development and deployment, and live model operations), it can dramatically enhance the performance of energy and materials companies (“Scaling AI like a tech native: The CEO’s role,” McKinsey, October 13, 2021).

Understanding risk: Quality and governance challenges

Governance practices can help reduce risk and ensure that the impact from analytical solutions is sustainable. But in our experience, many companies have limited or no AI or model management practices in place, and many lack company-wide standards for documenting, testing, implementing, and monitoring risk management (see sidebar “Differing levels of risk preparedness”).

Across industries, rigorous performance management is needed once AI systems are implemented, including frameworks for continuous monitoring.2 That said, implementing these frameworks can be challenging for companies that lack data science teams or the required digital tools. Although companies often have manual monitoring systems in place during the launch of new models, these can quickly become untenable as data sets grow in size and complexity, preventing systems from scaling and allowing warning signs to slip through the cracks.

Implementing model risk management practices

No matter how far along they are in digital implementation, companies that use AI systems grapple with a common set of risk issues.

  • Mining companies use AI to optimize decision making in haul cycle and dispatch. Inputs to the models include data from the fleet management system and from sensors (such as temperature and pressure) in haul trucks, shovels, and auxiliary equipment.
  • Utility companies use predictive models to prevent equipment failure by monitoring the health of assets. Failure to make timely decisions based on model output can lead to higher maintenance costs and insufficient availability or use of equipment. For example, running an overheated processor could result in failures that disrupt the operation.
  • Oil and gas companies use AI and digitalization to increase water treatment capacity and optimize the volumes of water discharge. Failure to make operational decisions in time would result in noncompliance with regulations and a negative environmental impact.

With these examples in mind, companies can implement four risk management practices to help ensure sustainable impact from digital tools: taking inventory, tiering, monitoring, and implementation.

Model inventory covers the main characteristics of the models

The model inventory serves as a summary of key information on all models, such as the relevant characteristics and other critical metadata. Thus, an inventory should be in place to begin the model life cycle management process.

As an example, an energy company with more than 200 advanced monitoring models and AI tools built a repository to save all relevant information about each model. For a predictive maintenance model, the inventory will likely capture the model type (scorecard, regression, ML, or AI), platform, status, and applicable locations. In addition, model dependency will cover the upstream models that feed into the model and the downstream models that consume its output. Key stakeholders can also be listed to trace vendor selection, developer, reviewer, user, implementer, and business sponsor.

Model inventory should be comprehensive and include all models running in production processes that are used in business decision making. For example, a mining company could rely on an optimized truck allocation model at the beginning of each shift to decide how many trucks to deploy and where. The model considers the geology of the mine, the road conditions of the day, and truck drivers’ driving patterns to optimize throughput.

Across all models there should be one document that is continually updated to account for new models and updates to existing models, such as when they are decommissioned. This could occur with a model that has proved to be ineffective, such as a road-quality model that lacks variables to capture dynamic changes at the mine. Model managers can decommission such ineffective models and build new ones that consider more relevant variables.

With these points in mind, the preliminary structure of the model inventory includes the following dimensions: model characteristics, tier, governance, and reference documentation. Each of these dimensions is then further split into subdimensions, including item description, model use, model dependency, and key stakeholders.
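The inventory structure described above can be sketched as a small data model. This is a minimal illustration, not a prescribed schema: the class names (`ModelRecord`, `ModelInventory`), the field names, and the status values are assumptions chosen to mirror the dimensions and subdimensions listed in the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical life cycle states; the article only mentions decommissioning.
    IN_DEVELOPMENT = "in_development"
    IN_PRODUCTION = "in_production"
    DECOMMISSIONED = "decommissioned"


@dataclass
class ModelRecord:
    # Model characteristics
    model_id: str
    description: str
    model_type: str  # e.g. "scorecard", "regression", "ML", "AI"
    platform: str
    status: Status
    locations: list[str]
    model_use: str
    # Model dependency
    upstream_models: list[str] = field(default_factory=list)
    downstream_models: list[str] = field(default_factory=list)
    # Key stakeholders, keyed by role (developer, reviewer, business sponsor, ...)
    stakeholders: dict[str, str] = field(default_factory=dict)
    # Tier and reference documentation
    tier: str = "unassigned"
    reference_docs: list[str] = field(default_factory=list)


class ModelInventory:
    """A single, continually updated register of all models."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Adding a record with an existing model_id updates that model's entry.
        self._records[record.model_id] = record

    def decommission(self, model_id):
        # Keep the record for traceability; only the status changes.
        self._records[model_id].status = Status.DECOMMISSIONED

    def in_production(self):
        return [r for r in self._records.values() if r.status is Status.IN_PRODUCTION]

    def dependents_of(self, model_id):
        # Models that list the given model as an upstream dependency.
        return [r.model_id for r in self._records.values() if model_id in r.upstream_models]
```

Keeping decommissioned models in the register, rather than deleting them, is one way to preserve the audit trail when an ineffective model is replaced by a new one.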

Model tiering provides structure to assign risk ratings

The model tier determines review frequency, depth, and escalation pathways (Exhibit 1). The primary level consists of two parts: model materiality and model risk, both of which are informed by a number of dimensions that are typically assigned low, medium, or high ratings.

Exhibit 1: Model materiality and risk indicate the severity of consequences and the probability of the model not performing as designed.

Model materiality serves as an indicator of the severity of consequences if the model does not perform as intended. It is based on the following three factors:

  • External impact assesses the impact of the model on third parties or potential reputational loss. It assesses the impact caused by the failure of a model from a critical external-party perspective.
  • Model criticality determines overall materiality, such as the size of the portfolio and the profit-and-loss (P&L) impact that the model addresses. It assesses how much exposure the model is applied to and the potential loss that a model failure would cause.
  • Model reliance checks the degree of reliance on the model output for the overall business decision. It assesses the extent to which business decision processes depend on the model’s outcomes.

Model risk indicates the probability that the model will not perform as designed. For example, models that typically require the manual input of temperature are prone to error and are more likely to make incorrect predictions. By contrast, models that rely on AI can get automatic readings from nearby weather stations. With this in mind, model risk is assigned based on the following four factors:

  • Model input assesses the quality of the input data and whether its source was automated or manual.
  • Model design assesses the complexity of the model methodology.
  • Model implementation assesses the stability and controls of the implementation—for instance, model complexity and data validation checks along the process.
  • Model use assesses the operational reliance on the model—such as whether it’s a closed loop or requires human interaction.

Both model materiality and risk tiering can be assessed using a questionnaire that covers all factors and aggregation rules (which define how a measure is integrated in relation to one or more dimensions). For example, a questionnaire could seek to assess external impact by asking, “Are there any regulatory requirements that are applicable to the operational or business decision based on the model output?”
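As a rough sketch of how questionnaire ratings could be aggregated into a tier, the example below rates each factor as low, medium, or high and combines them. The aggregation rule (the highest rating among the factors wins) and the mapping from combined score to tier number are illustrative assumptions; the article does not specify either, and a real questionnaire would define its own rules.

```python
RATING = {"low": 1, "medium": 2, "high": 3}

# Factor names follow the text; the dictionary keys themselves are hypothetical.
MATERIALITY_FACTORS = ("external_impact", "criticality", "reliance")
RISK_FACTORS = ("input", "design", "implementation", "use")


def aggregate(answers, factors):
    """Assumed aggregation rule: the highest rating among the factors wins."""
    missing = [f for f in factors if f not in answers]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return max((answers[f] for f in factors), key=RATING.__getitem__)


def assign_tier(answers):
    """Return (materiality, risk, tier), where tier 1 is the most critical."""
    materiality = aggregate(answers, MATERIALITY_FACTORS)
    risk = aggregate(answers, RISK_FACTORS)
    score = RATING[materiality] + RATING[risk]  # ranges from 2 to 6
    if score >= 5:
        tier = 1
    elif score == 4:
        tier = 2
    else:
        tier = 3
    return materiality, risk, tier


# Example: a model subject to regulatory requirements, with mostly automated inputs.
answers = {
    "external_impact": "high", "criticality": "low", "reliance": "low",
    "input": "medium", "design": "medium", "implementation": "medium", "use": "medium",
}
result = assign_tier(answers)
```

Taking the highest rating is a deliberately conservative choice: a single high-severity factor, such as a regulatory requirement, is enough to raise the overall materiality.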

Model monitoring enables risk tracking with metrics, frequency, and procedures to detect model performance issues

The life cycle of model development includes data management, development, deployment, and live model operations (Exhibit 2).

Exhibit 2: Key practices across the application life cycle can increase productivity, speed, and reliability and reduce risk.

As part of the final steps of the life cycle, companies can regularly monitor or review model performance and business engagement to ensure they reflect business goals. With this in mind, the following topics can help companies monitor and review different models:

  • Preproduction review. Models are reviewed before they are released to production. This covers dimensions such as data quality and backtesting model performance. Those that perform below standards go through more frequent periodic reviews. The review also includes validation of model implementation and review of model tier to ensure it is set appropriately.
  • Continuous monitoring and periodic review. A verification process assesses whether the algorithm is working as intended and whether its use continues to be appropriate in the current environment.

Automated monitoring checks can capture potential issues in models and trigger root-cause analyses. In addition, trigger event thresholds can be adjusted for each model tier so that the level of monitoring matches the needs of the risk tier. Depending on the findings of the analysis, the model can then be recalibrated. Manual monitoring can also keep track of potential changes to model use, tier, and pipeline as well as reflect the changes of anything related to the model inventory.

Finally, a periodic review aims to identify potential issues before they arise in production, such as any changes to model usage that might affect model tier. Review frequency is determined by model tier and any performance issues identified during development (for instance, models that are identified as having poor performance can be reviewed more frequently).
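The rule that review frequency depends on model tier and on known performance issues could be expressed along the following lines. The interval lengths, the tier numbering (1 as the most critical), and the halving of the interval for poorly performing models are all hypothetical values chosen for illustration.

```python
from datetime import date, timedelta

# Assumed review intervals in days per tier; tier 1 is the most critical.
REVIEW_INTERVAL_DAYS = {1: 90, 2: 180, 3: 365}


def next_review(last_review, tier, poor_performance=False):
    """Return the date of the next periodic review for a model."""
    days = REVIEW_INTERVAL_DAYS[tier]
    if poor_performance:
        # Models flagged during development are reviewed twice as often.
        days //= 2
    return last_review + timedelta(days=days)


last = date(2024, 1, 1)
regular = next_review(last, tier=1)
flagged = next_review(last, tier=1, poor_performance=True)
```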

Once the model is deployed and in use, it can be monitored and maintained. For example, after a predictive maintenance model is deployed, its input and output can be collected in real time, assumptions can be validated, and trigger warnings can let the user know when levels breach certain thresholds. Every recall of a piece of equipment to the factory for maintenance provides an opportunity to collect data and determine whether the model’s prediction was accurate. For instance, the model might say the tires of a vehicle have two months of use left and should be replaced to ensure safety. The observed outcomes can then be fed back into the machine learning tool to refresh the model and set the schedule for the next maintenance check. Finally, recalibration or retraining might be needed to address any issues found in model reviews.
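To make the feedback loop concrete, the sketch below compares predicted remaining life with what was observed at each maintenance recall and flags the model for recalibration when the average error grows too large. The error metric (mean absolute error), the 14-day tolerance, and the minimum sample count are assumptions for illustration, not values from the article.

```python
from statistics import mean


def mean_absolute_error(records):
    """records: list of (predicted_days, observed_days) pairs from maintenance recalls."""
    return mean(abs(predicted - observed) for predicted, observed in records)


def needs_recalibration(records, max_error_days=14.0, min_samples=3):
    """Flag the model when its average prediction error exceeds the tolerance."""
    if len(records) < min_samples:
        # Too few recalls to judge the model either way.
        return False
    return mean_absolute_error(records) > max_error_days


# Hypothetical tire-life predictions versus remaining life observed at recall.
recalls = [(60, 45), (60, 50), (30, 5)]
flag = needs_recalibration(recalls)
```

Requiring a minimum number of recalls before flagging avoids triggering recalibration on a single unusual observation.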

  • Based on monitoring results, business leaders can set up continuous monitoring and take four potential actions. The default set of monitors and triggers for all models would likely cover the following elements: model output, data output, user feedback, model use, and regulatory changes. The resulting actions include sending notifications to take the model out of production, initiating immediate root-cause analysis, sending warning notifications with conservative thresholds, and sending warning notifications with more relaxed thresholds.
  • Continuous-monitoring metrics are the same for all models, but the actions to be taken differ across materiality tiers. These metrics typically fall into two categories: automatic and manual. The former can track things such as output anomalies, missing data, and model acceptance rate, and the latter can help ensure that the model is used as intended and that regulatory changes are reflected in the model input, output, and assumptions.
  • Root-cause analysis aims to understand issues identified during continuous monitoring or periodic review and find ways to mitigate them. Product owners are responsible for sharing consolidated and prioritized issues with model developers and for providing an expected time frame for each issue. Model developers are responsible for performing the root-cause analysis to understand the underlying reason for identified issues (see sidebar “Analysis outcomes”).
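The idea that one metric is tracked for every model while the resulting action depends on the materiality tier can be sketched as follows. The example uses the missing-data rate as the shared metric. The threshold values and the assignment of actions to tiers are hypothetical; only the four action types come from the text.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    WARN_RELAXED = "warning notification (relaxed threshold)"
    WARN_CONSERVATIVE = "warning notification (conservative threshold)"
    ROOT_CAUSE_ANALYSIS = "initiate immediate root-cause analysis"
    REMOVE_FROM_PRODUCTION = "notify to take model out of production"


# Assumed policy: higher-materiality tiers warn earlier and escalate further.
TIER_POLICY = {
    "high": {"warn_at": 0.02, "warn": Action.WARN_CONSERVATIVE,
             "critical_at": 0.10, "critical": Action.REMOVE_FROM_PRODUCTION},
    "medium": {"warn_at": 0.05, "warn": Action.WARN_CONSERVATIVE,
               "critical_at": 0.15, "critical": Action.ROOT_CAUSE_ANALYSIS},
    "low": {"warn_at": 0.10, "warn": Action.WARN_RELAXED,
            "critical_at": 0.25, "critical": Action.ROOT_CAUSE_ANALYSIS},
}


def missing_rate(values):
    """Share of missing (None) readings; an empty batch counts as fully missing."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def decide_action(tier, rate):
    """Same metric for every model; the triggered action depends on the tier."""
    policy = TIER_POLICY[tier]
    if rate >= policy["critical_at"]:
        return policy["critical"]
    if rate >= policy["warn_at"]:
        return policy["warn"]
    return Action.NONE


# Hypothetical batch of sensor readings with two missing values out of ten.
readings = [71.2, None, 70.8, 69.9, None, 70.1, 70.4, 70.0, 69.7, 70.3]
rate = missing_rate(readings)
```

With this batch, the same 20 percent missing-data rate would lead to a different action for each tier, which is the behavior the second bullet describes.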

Model implementation enables ongoing monitoring

Once the model moves to the production environment, companies can adopt new tools and structures to support its use and ensure that results are accurate and up to date. Depending on the process, these tools can have complex interactions (Exhibit 3).

Exhibit 3: From data management to live operations, monitoring tools have complex interactions.

Monitoring processes—such as development and deployment, root-cause analysis, and even automated monitoring—require people to oversee them. These roles include a data science team and a data engineering team, as well as product owners, model inventory and tiering owners, and users. Current data science and engineering team members without a background in machine learning for operations can be upskilled to learn relevant practices.

Taking the first steps against risk, while getting the full potential from ML/AI models

Companies looking to implement model risk management can take a number of actions to improve practices and prevent risk.

They can start by cataloging, tiering, and monitoring their existing models to gain a better understanding of their technological strengths and weaknesses and their current risks.

Companies that already rely on advanced analytics to make product decisions can also upskill data science and data engineering teams to monitor models at scale using a unified set of tools.

More broadly, companies should also conduct continuous change management with users so that model managers can recognize early-warning signs of productivity degradation and other risks and then use model outputs as intended. Companies need to build awareness of risk across the organization so that more people feel empowered to notice and report instances in which the model has gone awry (Rewired: The McKinsey guide to outcompeting in the age of digital and AI, Hoboken, NJ: John Wiley & Sons, 2023). Continuous change management is increasingly important in the era of generative AI. The increasing complexity of the tools and models at companies’ disposal heightens the importance of responsibly managing them.


For highly technical industries that rely on complex data sets and highly optimized processes, correct predictions can mean the difference between smooth performance and complete shutdown. Understanding and managing model risk is a critical element in overall operations—and it will only become more important as the technologies continue to evolve.
