When seeking to enhance the resilience of businesses that rely on equipment and facilities, such as those in the manufacturing and energy sectors, managing the failure risk of these assets has an important role to play. This is an area where Hitachi is already helping customers improve the efficiency of maintenance practices through IoT-based techniques such as abnormality diagnosis for equipment. This article describes a risk management solution for such assets that was developed to help customer businesses become more resilient. The solution features probabilistic lifetime models, built from IoT data and a physical understanding of failure mechanisms, together with a service that uses these models to predict when equipment faults will occur. When combined with insurance, this service allows customers to manage asset risks appropriately.
In the context of business, resilience can be defined as the ability of an organization to manage disruption-related risk(1). Risks to businesses cover a wide range, from geopolitical risk through to disaster risk. In business sectors that rely on the use of equipment and facilities, such as manufacturing, energy, and freight and logistics, the risks also include failure or other problems with these assets.
Condition-based and predictive maintenance are based on the use of data to assess the condition of assets and these practices have become increasingly common over recent years in tandem with advances in areas such as sensing, the Internet of Things (IoT), and artificial intelligence (AI) and other forms of data analytics. They represent one of the ways in which the risk of asset failure can be managed, being recognized as a means of minimizing unexpected failures or problems as well as avoiding excessive maintenance work.
Another way in which businesses can improve their resilience is by taking out insurance. Hitachi has devised a new concept for insurance products in which the insurance is combined with abnormality portent diagnostic techniques that facilitate predictive maintenance(2). Development of this service has already commenced(3).
This article describes how Hitachi is seeking to boost the resilience of businesses with a newly developed data analytics technique in which the concept described above is augmented with risk management features, and a cloud-based IoT system that uses this new technique and is made available in the form of an insurance-linked failure prediction service.
The new risk management solution is provided in two stages, with lifetime-to-failure modeling and failure simulation being performed offline and failure prediction being performed online (see Figure 1).
To begin with, lifetime-to-failure modeling is performed to establish how information on equipment operation obtained from IoT data is related to the length of time the equipment or components of concern can operate before failing. The resulting model (the lifetime model) is then used to perform a medium- to long-term failure simulation that takes account of maintenance. As well as enabling appropriate risk thresholds to be established that can serve as criteria for when to perform preventive maintenance, this simulation can also estimate the benefits of adopting the failure prediction service in terms of key performance indicators (KPIs) for failure frequency and costs over specific time periods. These results can be used by customers to calculate the service’s cost-benefit and by insurers to design more fine-grained insurance products.
Figure 1: Overview of Risk Management Solution
With lifetime-to-failure modeling as its core technology, this solution combines failure simulation using this model with online failure prediction, also providing a high level of integration with insurance services.
The next step is to move failure prediction online. This informs the customer as to how much equipment life has already been used up and routinely updates the probability of a failure occurring within a given future timespan. This enables the customer to schedule preventive maintenance at an appropriate time based not only on the equipment operating conditions, but also on factors such as when maintenance staff are able to attend. Should a failure nevertheless occur before the preventive maintenance work can be done, the service also provides a quantitative assessment of how accidental the failure was, that is, how unlikely it was to happen at that time. The insurance company can make use of this information in its claim assessment.
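The probability of failure within a given future timespan can be illustrated with a short Python sketch. It assumes, purely for illustration, that the lifetime follows a Weibull distribution over operating hours; the shape and scale values, the 720-hour window, and the function names are hypothetical and are not taken from the service itself.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that the lifetime exceeds t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))

def conditional_failure_probability(age, horizon, shape, scale):
    """P(failure within `horizon` | equipment has survived to `age`)."""
    s_now = weibull_survival(age, shape, scale)
    s_later = weibull_survival(age + horizon, shape, scale)
    return 1.0 - s_later / s_now

# Hypothetical model parameters: shape 2.5, scale 10,000 operating hours.
for age in (1000.0, 5000.0, 9000.0):
    p = conditional_failure_probability(age, horizon=720.0, shape=2.5, scale=10000.0)
    print(f"age={age:>6.0f} h  P(failure in next 720 h)={p:.4f}")
```

With a shape parameter above 1, the probability rises as the equipment ages, which is what makes a threshold on this quantity usable as a trigger for preventive maintenance.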
This use of the solution enables the customer to manage the risk of equipment failure appropriately based on actual operating conditions while the insurance company is able to develop new products that are predicated on being able to visualize insured risks dynamically.
Lifetime-to-failure modeling is the core technology that enables this solution. The concept behind this modeling differs significantly from that of abnormality portent diagnosis, which relies on techniques such as unsupervised machine learning. Abnormality portent diagnosis tends to be based on the degree of divergence from the normal condition (the degree of abnormality), as indicated by the sensors that monitor equipment operation.
In contrast, the main form of lifetime-to-failure modeling used by Hitachi works on the basis that failures are the result of some physical mechanism. Based on the assumption from reliability engineering that failure occurs when the load (damage) imposed by operation exceeds what the equipment and facilities are inherently able to withstand, a damage model able to accurately express the lifetime-to-failure of the equipment is formulated in terms of both physical and empirical failure rules. The residual lifetime-to-failure for the equipment is predicted by estimating how much of its life has already been used up based on how much cumulative damage has already been inflicted during operation. However, many different physical phenomena can lead to failure depending on the type of equipment involved and the fault mode. Unfortunately, there are major practical constraints on the extent to which these can be addressed individually because of the need for access to the relevant design information and the involvement of people with expertise in the equipment and the types of failure being considered.
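As a rough illustration of the cumulative-damage idea, the following Python sketch accumulates damage over time steps and reports the fraction of life used up and a naive residual life. The power-law damage rule, its exponent, the load values, and the damage capacity are hypothetical placeholders chosen for illustration; they are not the damage model used by Hitachi.

```python
def damage_increment(load, dt_hours, exponent=3.0, reference_load=1.0):
    """Damage added in one time step (hypothetical power-law rule)."""
    return (load / reference_load) ** exponent * dt_hours

def cumulative_damage(loads, dt_hours):
    """Total damage inflicted over a sequence of equally spaced load readings."""
    return sum(damage_increment(load, dt_hours) for load in loads)

def life_consumed(loads, dt_hours, capacity):
    """Fraction of life used up: cumulative damage divided by damage capacity."""
    return cumulative_damage(loads, dt_hours) / capacity

def residual_hours(loads, dt_hours, capacity):
    """Remaining hours if the average damage rate observed so far continues."""
    damage = cumulative_damage(loads, dt_hours)
    rate = damage / (len(loads) * dt_hours)
    return (capacity - damage) / rate

# Hypothetical hourly load readings (relative to the reference load).
loads = [0.8, 1.0, 1.2, 1.0]
print(life_consumed(loads, dt_hours=1.0, capacity=1000.0))
print(residual_hours(loads, dt_hours=1.0, capacity=1000.0))
```

The point of the sketch is that the same number of operating hours can consume very different amounts of life depending on the load history, which is why a damage-based clock can predict failure better than elapsed time alone.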
Instead, Hitachi has adopted damage-based survival analysis (DbSA), a technique for estimating a damage model, as its own data-centric approach. This technique, the technical details of which have been published elsewhere(4), combines modeling techniques derived from machine learning with data analysis techniques from the field of reliability engineering to estimate a damage model from fault and maintenance records and from time-series measurement data. The output is a probabilistic lifetime model (see Figure 2).
Figure 2: Overview of DbSA
Machine learning is used on time-series measurement data and failure and maintenance records to estimate a damage-based lifetime model that minimizes lifetime variability. Use of this model improves the accuracy of lifetime prediction.
Because it is a form of supervised machine learning, the modeling technique requires a certain quantity of training data (failure and maintenance records). Measurements acquired from condition monitoring sensors, however, are not a prerequisite. This means that modeling can proceed even in cases where weather or other public data is all that is available. In one application where it was used for equipment lifetime-to-failure modeling at a chemical plant(4), the technique was able to reduce the variability of equipment life estimates to less than 50% of that achieved by time-based prediction. In this case, plant process data was used as the time-series measurement data.
Lifetime models generated by DbSA have other uses beyond equipment life prediction. Because modeling also indicates which of the time-series measurements have the greatest effect on how much equipment life has already been used up, it can provide insights into the cause of a failure when this is unclear. Moreover, as DbSA uses a technique from reliability engineering that is based on the Weibull distribution, failure mode analysis can also be performed. If, for example, the results of model building indicate that the failure mode is classified as a random failure, preventive maintenance that is based on cumulative damage or cumulative operating time is unlikely to do much good. In such cases, the technique can also be used in the revision of maintenance practices, such as lengthening routine replacement intervals.
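The link between the Weibull distribution and failure modes can be sketched as follows. In standard reliability practice, a Weibull shape parameter below 1 indicates a decreasing hazard (early failures), a value near 1 a constant hazard (random failures), and a value above 1 an increasing hazard (wear-out). The tolerance band around 1 and the function names below are illustrative assumptions, not part of DbSA.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull lifetime at age t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def classify_failure_mode(shape, tolerance=0.1):
    """Map a Weibull shape parameter to a failure mode (tolerance is hypothetical)."""
    if shape < 1.0 - tolerance:
        return "early failure (decreasing hazard)"
    if shape > 1.0 + tolerance:
        return "wear-out failure (increasing hazard)"
    return "random failure (roughly constant hazard)"

# Hypothetical shape values, with the scale fixed at 1,000 hours.
for shape in (0.5, 1.0, 2.5):
    young = weibull_hazard(100.0, shape, 1000.0)
    old = weibull_hazard(900.0, shape, 1000.0)
    print(f"shape={shape}: {classify_failure_mode(shape)}; "
          f"hazard at 100 h={young:.5f}, at 900 h={old:.5f}")
```

When the hazard is constant, old equipment is no more likely to fail than new equipment, so replacing it on the basis of age or accumulated damage does not lower the failure rate.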
Figure 3: Predicted Benefits of Using Failure Simulation Service
The lifetime model is used to perform failure simulations that predict the frequency of failures and preventive maintenance work, with the associated costs then being calculated to determine the optimal thresholds for when to perform preventive maintenance.
The next step is to perform a medium- to long-term failure simulation on the assumption that the customer will use the failure prediction service with the generated lifetime model. Figure 3 shows how this works. As the model expresses the lifetime to failure as a probability distribution, it can be used with the Monte Carlo* method to generate failures, replicating on a computer the occurrence of failures within given timeframes. This is done to estimate the extent to which the number of failures will be reduced when preventive maintenance is conducted on the basis of a specified threshold failure probability.
By progressively estimating whether or not a failure or maintenance work will occur during each time step (at hourly intervals, for example, or any other arbitrary duration), the simulation can provide quantitative predictions for indicators such as how often a failure will occur or maintenance work be performed during the period of interest. Moreover, being computer-based means the simulation can trial a wide variety of different preventive maintenance regimes.
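A step-by-step simulation of this kind might look like the following minimal Python sketch. It assumes a Weibull lifetime over operating hours, daily time steps, a 720-hour look-ahead window for the threshold check, and restoration to as-new condition after any failure or maintenance; all of these values and names are hypothetical simplifications rather than details of the actual service.

```python
import math
import random

SHAPE, SCALE = 2.5, 10_000.0   # hypothetical Weibull lifetime model (hours)
STEP = 24.0                    # hypothetical time step (hours)
HORIZON = 720.0                # hypothetical look-ahead window (hours)

def cond_fail_prob(age, window):
    """P(failure within `window` hours | survived to `age`) under the Weibull model."""
    return -math.expm1((age / SCALE) ** SHAPE - ((age + window) / SCALE) ** SHAPE)

def simulate(threshold, steps, rng):
    """Count failures and preventive maintenance events over `steps` time steps."""
    age = 0.0
    failures = maintenances = 0
    for _ in range(steps):
        # Draw whether a failure occurs during this time step.
        if rng.random() < cond_fail_prob(age, STEP):
            failures += 1
            age = 0.0          # assume repair restores as-new condition
            continue
        age += STEP
        # Preventive maintenance once the look-ahead risk reaches the threshold.
        if cond_fail_prob(age, HORIZON) >= threshold:
            maintenances += 1
            age = 0.0
    return failures, maintenances

steps = 365 * 100              # 100 simulated years of daily steps
for threshold in (2.0, 0.05):  # a threshold above 1 can never be reached
    failures, maintenances = simulate(threshold, steps, random.Random(1))
    print(f"threshold={threshold}: failures={failures}, maintenance={maintenances}")
```

Running the loop with a threshold that can never be reached gives a run-to-failure baseline, against which the failure counts for practical thresholds can be compared.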
First of all, the relationship between the threshold used as the basis for preventive maintenance and the frequencies of failures and maintenance work can be assessed by running the simulation for a range of different threshold values. Likewise, if information about the respective costs of failures and of maintenance work is incorporated, the relationship between costs and threshold can also be assessed, and the transition to the failure prediction service can be made using the threshold that minimizes cost. Moreover, by using the Monte Carlo method to make progressive assessments, the simulation is able to handle complex maintenance conditions. Because the simulation can consider detailed constraints, such as preventive maintenance being possible only on certain dates, at particular times of day, or on particular days of the week, its results can be used not only to determine the threshold to use as a basis for preventive maintenance, but also to assess different maintenance practices at the workplace. Hitachi also has plans to extend the simulation so that it can consider constraints on maintenance resources such as labor or equipment and materials.
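The threshold sweep can be sketched in Python as below. For brevity this version samples whole lifetimes rather than stepping through time, which is a shortcut that only works for the simple case of an age-based Weibull lifetime with as-new restoration and no calendar constraints. The model parameters, the cost figures, and the candidate thresholds are all hypothetical.

```python
import math
import random

SHAPE, SCALE = 2.5, 10_000.0           # hypothetical Weibull lifetime model (hours)
HORIZON = 720.0                        # hypothetical look-ahead window (hours)
COST_FAILURE, COST_PM = 100.0, 10.0    # hypothetical relative costs

def cond_fail_prob(age, window):
    """P(failure within `window` hours | survived to `age`)."""
    return -math.expm1((age / SCALE) ** SHAPE - ((age + window) / SCALE) ** SHAPE)

def maintenance_age(threshold, step=24.0, max_age=10 * SCALE):
    """First age on a daily grid at which the look-ahead risk reaches the threshold."""
    age = 0.0
    while age <= max_age:
        if cond_fail_prob(age, HORIZON) >= threshold:
            return age
        age += step
    return math.inf                    # threshold never reached: run to failure

def cost_per_hour(threshold, cycles, rng):
    """Monte Carlo estimate of long-run cost per operating hour."""
    pm_age = maintenance_age(threshold)
    if pm_age <= 0.0:
        raise ValueError("threshold is too low: maintenance would trigger at age 0")
    total_cost = total_time = 0.0
    for _ in range(cycles):
        life = rng.weibullvariate(SCALE, SHAPE)   # sampled lifetime-to-failure
        if life < pm_age:
            total_cost += COST_FAILURE            # failed before maintenance
            total_time += life
        else:
            total_cost += COST_PM                 # maintained before failure
            total_time += pm_age
    return total_cost / total_time

thresholds = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 2.0]
rng = random.Random(0)
results = {th: cost_per_hour(th, 20_000, rng) for th in thresholds}
best = min(results, key=results.get)
for th in thresholds:
    print(f"threshold={th}: cost per hour={results[th]:.5f}")
print("lowest-cost threshold:", best)
```

A very low threshold wastes useful life through frequent maintenance, while a very high one lets costly failures through, so the cost curve typically has a minimum in between. Constraints such as restricted maintenance dates would require the time-stepped form of the simulation.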
Once it is determined that performing preventive maintenance on the basis of the lifetime model will deliver adequate benefits, the next step is to start the operation of the failure prediction service in which asset failure predictions based on IoT data are provided as an ongoing diagnosis.
This service is delivered via a cloud-based IoT system. Time-series measurement data is collected along with failure and maintenance records on a public cloud via a web-based application programming interface (API) or e-mail interface. Current and future risk indicators are calculated for equipment based on the lifetime model in the cloud and provided to users as feedback. Along with the model generated by DbSA using its data-centric approach, the system can also run other models that have been created based on user knowledge or physical considerations. Moreover, it can also be used in the fleet management of construction machinery or other vehicles and wind power generation systems, providing medium- to long-term predictions for failure rates across the entire fleet as well as risk assessments for individual items of equipment.
Maintenance work or failures that occur while using the service may be covered by insurance. To allow for this, the system includes functions for providing the insurance company with quantitative information such as the probability of the failure and the remaining life of the equipment when the failure occurs. This enables the insurance company to initiate the claims process dynamically and reduces its claims assessment workload because it can be confident that appropriate risk management was in place when the event occurred. For service users, this can reduce the amount of work needed to make insurance claims and expedite their processing while also minimizing variations in maintenance costs over a shorter timeframe.
This article has described a risk management solution for assets in which modeling the lifetime-to-failure of equipment is a core technology. Through the ongoing development of risk analysis techniques that consider the associated supply chains and value chains as well as individual items of equipment, and by combining them with insurance and other financial services, Hitachi intends to help enhance the resilience of customer businesses.