Reliability Engineering

System performance analysis often demands more than traditional reliability block diagramming (RBD) tools can juggle. It’s critical to take into account the relationship between components. How does the reset of one component affect the whole system? How can you identify potential failures? How can you accurately predict and manage the risks around assets that could fail and cause unnecessary and expensive downtime? How can you resolve uncertainty in multi-stage manufacturing systems?

The Reliability module in ExtendSim Pro is the missing link, bridging reliability block diagrams with the pinpoint accuracy of simulation to mimic system behavior through dynamic reliability modeling. Maintenance reliability professionals, asset managers, and predictive maintenance teams are turning to simulating RBDs in ExtendSim to help manage their asset reliability programs, reduce failure rates, optimize alternate flow paths, deal with intermediate product storage, and improve the reliability of plant assets.

Procter & Gamble


Procter & Gamble (P&G) has used ExtendSim to "model all product lines - from soap to nuts" for equipment and production line design, scheduling, commerce, quality, and more. Their models serve as an interface for engineers who are not necessarily simulation experts but can use the models for analysis and design, and who, thanks to ExtendSim's design, ultimately become simulation experts through using it.

Military aircraft

Parts Failure

A consultant for a branch of the US Military is using ExtendSim to build inventory and reliability models of the frequency of failure of parts on military aircraft. Once a part has failed, how does it get replaced and will the replacement part be in inventory or not? And if the aircraft is out of service, will there be another one to replace it?

Smart Grid

Deployment of digital smart grid sensing, communication, and control technologies that improve electric grid security, reliability, and efficiency is growing rapidly. ExtendSim is being used to dynamically monitor grid operations: identifying appropriate security controls based on parameters and constraints, then simulating mission assurance indicators before and after defense actuation to gauge their effectiveness.

Dow Chemical

Failure Impact

Dow Chemical Company performs reliability modeling in ExtendSim to identify and understand the impact of different failures on overall production capabilities in its chemical plants. The model is used to identify the key equipment components that contribute most to production loss and to analyze the impact of policy changes, such as installing new equipment or increasing stock levels of failure-prone components. A Failure Summary Report provides information for further phases of the analysis.

Simulating Reliability-Based Systems

  • Accurately determine the MTTR and MTBF of systems with shutdown.
  • Model the relationship between components and how the behavior of one component affects the other.
  • Identify and document areas of potential failures.
  • Quantify and report changes in performance resulting from changes in design, capacity, operations, maintenance, or logistics.
  • Rate failures in terms of likelihood and consequence.
  • Maintain assets in a safe, efficient manner.
  • Consider system up and down times.
  • Measure and maintain system repeatability.
  • Perform dynamic reliability modeling.
  • Predict future performance.
  • Model complex manufacturing production lines.
  • Simulate intricate communication networks.
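Simulating up and down times, rather than reading a single number off a static diagram, is what lets a model report availability over time. As a minimal sketch, assuming a single repairable component with exponentially distributed times to failure and repair (an illustrative assumption, not a description of how the Reliability module works internally), the Python below simulates alternating up and down periods and compares the simulated availability with the steady-state value MTTF / (MTTF + MTTR). The function names are hypothetical.

```python
import random


def simulate_availability(mttf: float, mttr: float, horizon: float, seed: int = 1) -> float:
    """Fraction of [0, horizon] during which one repairable component is up.

    Assumption (illustrative only): times to failure and times to repair are
    exponentially distributed with means mttf and mttr.
    """
    rng = random.Random(seed)
    t = 0.0
    uptime = 0.0
    while t < horizon:
        up = rng.expovariate(1.0 / mttf)      # time until the next failure
        uptime += min(up, horizon - t)        # clip the last up period at the horizon
        t += up
        if t >= horizon:
            break
        t += rng.expovariate(1.0 / mttr)      # down time while the component is repaired
    return uptime / horizon


def analytic_availability(mttf: float, mttr: float) -> float:
    """Steady-state availability of an alternating up/down process."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    sim = simulate_availability(mttf=100.0, mttr=5.0, horizon=1_000_000.0)
    exact = analytic_availability(100.0, 5.0)
    print(f"simulated {sim:.4f} vs analytic {exact:.4f}")
```

With MTTF = 100 h and MTTR = 5 h, the simulated fraction should land close to 100/105 ≈ 0.952; replacing the exponential draws with other distributions, or coupling several components, is where simulation starts to pay off over closed-form formulas.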

Case Studies

System Safety Analysis


In the first pilot project of its kind, DNV GL identified all possible root causes of critical system failures directly from design documentation. They generated a list of cut sets from an ExtendSim model of the signal flow, so the fault tree is the result of the analysis rather than its input. The project produced a safety analysis of the system that was both more reliable and less expensive.
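To make the idea of a cut set concrete: a minimal cut set is a smallest group of components whose simultaneous failure stops the signal from reaching its destination, and the fault tree's top event is the OR of these AND-groups. The source does not describe how DNV GL derived its cut sets, so the following is only a brute-force Python sketch over a hypothetical three-component signal-flow graph (`IN`, `A`, `B`, `C`, `OUT` and the function names are invented); it is exponential in the number of components and suitable only for small examples.

```python
from itertools import combinations

# Hypothetical signal-flow graph: IN feeds A and B, both feed C, C feeds OUT.
EDGES = {"IN": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["OUT"]}
COMPONENTS = {"A", "B", "C"}


def signal_reaches_output(edges, failed, source="IN", sink="OUT"):
    """Depth-first search that only passes through non-failed nodes."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in failed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def minimal_cut_sets(edges, components):
    """Enumerate minimal sets of failed components that break every IN->OUT path.

    Subsets are tried in order of increasing size; a subset that contains an
    already-found cut set is skipped because it cannot be minimal.
    """
    cuts = []
    for k in range(1, len(components) + 1):
        for combo in combinations(sorted(components), k):
            failed = set(combo)
            if any(cut <= failed for cut in cuts):
                continue
            if not signal_reaches_output(edges, failed):
                cuts.append(frozenset(failed))
    return cuts


if __name__ == "__main__":
    for cut in minimal_cut_sets(EDGES, COMPONENTS):
        print(sorted(cut))
```

For this graph the sketch finds {C}, a single point of failure, and {A, B}, so the corresponding fault tree is "C fails" OR ("A fails" AND "B fails").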


Operational Reliability Assessment of a Remotely-controlled Siphon System for Draining Shallow Storage Ponds

Linlong Bian, Vivek Verma, Aditia Rojiali, Sumit R. Zanje, Dongukan Ozecik, & Arturo S. Leon - Florida International University

World Environmental and Water Resources Congress 2021

July 2021

Siphon flow can only start or stop when all the components work in a certain sequence. If any component of the siphon system fails to perform its function, the discharge of water from the storage units can be disrupted, potentially leaving insufficient storage capacity for flood mitigation or insufficient water volume to sustain aquatic life. Either scenario could threaten human life and property and/or damage the ecological environment. Because most of the components are exposed to the natural environment, they face hazards from unpredictable factors (such as short circuits, animal and insect bites, and chemical corrosion) that could cause siphon components to malfunction. It is therefore important to understand the operational reliability of the siphon over its life cycle.

The ExtendSim Reliability module simplified the modeling of the maintenance process for this complex system. An optimized siphon system architecture was modeled, providing accurate estimates of component reliability, mean time to failure, and operational reliability for both non-repairable and repairable systems. In addition, ExtendSim completed more than 10,000 simulation runs in a short amount of time.
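Because the siphon only works when every component performs its function, a non-repairable siphon behaves like a series system: it fails at the first component failure. A minimal Monte Carlo sketch in Python, assuming four hypothetical components with constant (exponential) failure rates that are not taken from the study, estimates the mean time to failure over 10,000 runs and checks it against the exact series result MTTF = 1 / (λ1 + λ2 + ... + λn).

```python
import random

# Hypothetical failure rates (per hour) for four components in series.
RATES = [1e-4, 2e-4, 5e-5, 1.5e-4]


def simulate_series_mttf(failure_rates, runs=10_000, seed=7):
    """Monte Carlo MTTF of a non-repairable series system.

    The system fails at the first component failure, so each run's system
    lifetime is the minimum of the sampled component lifetimes.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += min(rng.expovariate(lam) for lam in failure_rates)
    return total / runs


if __name__ == "__main__":
    analytic = 1.0 / sum(RATES)   # exact MTTF for exponential components in series
    print(f"simulated {simulate_series_mttf(RATES):.0f} h vs analytic {analytic:.0f} h")
```

The closed form only holds for constant failure rates; once components age, are repaired, or must act in sequence, the simulated estimate is the one that remains available.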

Bian Siphon Model
Bane NOR

Reliability Modelling of ERTMS/ETCS

Raja Gopal Kalvakunta

MSc in Reliability, Availability, Maintainability and Safety (RAMS), Norwegian University of Science & Technology

June 2017

The European railway industry is continuously advancing, and in recent years it has adopted a new system, the European Rail Traffic Management System/European Train Control System (ERTMS/ETCS), to make railways interoperable across European nations. It is increasingly used for both commuter and freight transport. The foremost requirement for such a transportation system is to operate reliably and punctually. In this context, Bane NOR (Norwegian National Rail Administration) is planning to convert its entire conventional signalling system to ERTMS signalling as part of its ERTMS National Implementation project.

ERTMS/ETCS is a complex infrastructure of trackside, lineside, and onboard train systems, each of which has subsystems comprising software, hardware, network, and signalling components. Because of this complexity, identifying and resolving failures is challenging. An existing Bane NOR line operated on ERTMS is taken as the case study for developing a reliability model.

A reliability block diagram approach is used to model the Østfoldbanen Østre Linje (ØØL) ERTMS pilot line in ExtendSim's Reliability module, combining single-station and bidirectional (BiDi) sections, and 1,000 simulation runs are conducted to assess the ØØL ERTMS infrastructure. The results indicate that the model can be used to determine the performance of the infrastructure, and that the predominant infrastructure failures causing delays are partial interlocking failures, maintenance, and track fractures, followed by failures of balises, axle counters, and points.
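A reliability block diagram is evaluated by combining blocks in series (all must work) and in parallel (at least one must work). The Python sketch below shows that arithmetic for a hypothetical layout that borrows block names from the failure list above; the arrangement and the reliability values are invented for illustration and are not the ØØL model, which the thesis evaluated by simulation rather than by this static calculation.

```python
from math import prod


def series(*rs: float) -> float:
    """All blocks must work: R = R1 * R2 * ... * Rn."""
    return prod(rs)


def parallel(*rs: float) -> float:
    """At least one block must work: R = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in rs)


# Hypothetical mission reliabilities for illustrative blocks (not ØØL data).
R_INTERLOCKING = 0.995
R_AXLE_COUNTER = 0.98
R_BALISE = 0.999
R_POINTS = 0.99

# Hypothetical layout: interlocking -> two redundant axle counters -> balise -> points.
SYSTEM_RELIABILITY = series(
    R_INTERLOCKING,
    parallel(R_AXLE_COUNTER, R_AXLE_COUNTER),
    R_BALISE,
    R_POINTS,
)

if __name__ == "__main__":
    print(f"system reliability: {SYSTEM_RELIABILITY:.5f}")
```

Duplicating the axle counter lifts that stage from 0.98 to 1 - 0.02² = 0.9996, so the system reliability is then limited mainly by the non-redundant blocks. A static calculation like this ignores repair, maintenance, and timing, which is why the thesis turned to simulation.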

Kalvakunta model
University of Oslo

Safety Instrumented Systems Operated in the Intermediate Demand Mode

Kristine Tveit

University of Oslo - Master of Science in Modeling and Data Science

December 2015

The frequency of demands is crucial when analyzing a safety instrumented system (SIS). IEC 61508 distinguishes between low-demand and high-demand mode when calculating risk for such a system. In reality, some systems cannot clearly be placed in either mode. These are called intermediate demand mode systems, and they are the subject of this thesis. Few published SIS reliability studies focus on the problems related to this borderline. Oliveira predicts somewhat strange behavior of the hazard rate in the intermediate demand mode and also focuses on the role of demand duration.

Analyses of a redundant system show that the standard Probability of Failure on Demand (PFD) formulae are usable for very low demand rates but become increasingly conservative as one moves into the intermediate mode, while the Probability of Failure per Hour (PFH) is non-conservative. This can have major consequences for the operator of a safety system, who may fail to find the optimal testing strategy or, even worse, encounter a hazard.

For more complex systems with several components, the Markov approach has its limits: the choice of distributions and the level of maintenance detail are also restricted. Discrete event simulation can deal with such complex systems, and it can also satisfactorily handle the rare-event problem that often challenges safety system analysis.

Using Harel statecharts and discrete event Monte Carlo simulations of different safety systems, the thesis shows that the intermediate demand mode depends on the relationship between proof tests, demands, and repair duration. When the demand rate increases to a significant level, demands can be used as tests. Harel statecharts make it possible to compute realistic models that go beyond what a Markov model is capable of.
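For reference, the kind of standard PFD formula the thesis compares against can be illustrated for a redundant 1oo2 system. A minimal Python sketch, assuming two identical independent channels with a constant dangerous undetected failure rate λ, perfect proof tests every τ hours, no common-cause failures, no repair time, and no demands, compares the widely used approximation PFDavg ≈ (λτ)²/3 with the exact average of (1 - e^(-λt))² over one test interval and with a Monte Carlo estimate. It does not model demands acting as tests, which is the thesis's actual subject; the function names and parameter values are hypothetical.

```python
import math
import random


def pfd_avg_1oo2_exact(lam: float, tau: float) -> float:
    """Exact time-averaged PFD of a 1oo2 system over one proof-test interval.

    Assumes independent identical channels, exponential dangerous undetected
    failures, perfect proof tests every tau hours, no CCF, and no repair time:
    PFD(t) = (1 - exp(-lam*t))**2, averaged over [0, tau].
    """
    x = lam * tau
    return 1.0 - 2.0 * (1.0 - math.exp(-x)) / x + (1.0 - math.exp(-2.0 * x)) / (2.0 * x)


def pfd_avg_1oo2_approx(lam: float, tau: float) -> float:
    """Simplified formula (lam*tau)**2 / 3, intended for small lam*tau."""
    return (lam * tau) ** 2 / 3.0


def pfd_avg_1oo2_monte_carlo(lam: float, tau: float, runs: int = 100_000, seed: int = 3) -> float:
    """Average fraction of the test interval during which both channels are failed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        both_failed_at = max(rng.expovariate(lam), rng.expovariate(lam))
        total += max(0.0, tau - both_failed_at) / tau
    return total / runs


if __name__ == "__main__":
    lam, tau = 1e-5, 10_000.0   # hypothetical values giving lam*tau = 0.1
    print("exact  ", pfd_avg_1oo2_exact(lam, tau))
    print("approx ", pfd_avg_1oo2_approx(lam, tau))
    print("MC     ", pfd_avg_1oo2_monte_carlo(lam, tau))
```

Because 1 - e^(-y) ≤ y, the approximation never underestimates the exact value under these assumptions; at λτ = 0.1 it is roughly 8 percent higher. That is a separate effect from the demand-driven conservatism the thesis reports.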

Tveit model
Winter Simulation Conference

Reliability Modeling with ExtendSim

David Krahl & Anthony Nastasi, Imagine That Inc.

Winter Simulation Conference 2014

This paper discusses general reliability concepts and introduces the basis upon which the Reliability module in ExtendSim was developed.

University of Calgary

Studying the Impact of Uncertainty in Operational Release Planning

An Integrated Method and its Initial Evaluation

Ahmed Al-Emran, Puneet Kapur, Dietmar Pfahl, & Guenther Ruhe

Information and Software Technology, April 2010

International Conference on Software Process (ICSP) 2010

Uncertainty is an unavoidable issue in software engineering and an important area of investigation. This paper studies how uncertainty in the following factors affects the total duration (i.e., make-span) for implementing all features in operational release planning:

  • the number of new features arriving during release construction
  • the estimated effort needed to implement features
  • the availability of developers
  • the productivity of developers.

An integrated method is presented that combines Monte Carlo simulation (to model uncertainty in the operational release planning (ORP) process) with process simulation (to model the ORP process steps and their dependencies, as well as an associated optimization heuristic representing an organization-specific staffing policy for make-span minimization). The method allows the impact of uncertainty on make-span to be evaluated. The impact of the uncertainty factors, both in isolation and in combination, is studied at three different pessimism levels through comparison with a baseline plan. The method was initially evaluated through an exploratory case study at Chartwell Technology Inc. to demonstrate its applicability and usefulness.

Results. The impact of uncertainty on release make-span increases, in both magnitude and variance, as the pessimism level rises and as the number of uncertainty factors grows. Among the four uncertainty factors, we found that the strongest impact stems from the number of new features arriving during release construction. We have also demonstrated that for any combination of uncertainty factors, their combined (i.e., simultaneous) impact is greater than the sum of their individual impacts.

The added value of the presented method is that managers can proactively study the impact of uncertainty on existing (i.e., baseline) operational release plans.
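The Monte Carlo half of such a method can be sketched very roughly in Python. The model below is hypothetical and much simpler than the paper's: it assumes perfectly divisible work, ignores process steps, dependencies, and the staffing heuristic, and treats each pessimism level as an upper bound on how much each of the four uncertainty factors can worsen the baseline. It only shows how sampled factors translate into a make-span distribution that can be compared with the baseline plan; the function name and effort figures are invented.

```python
import random
import statistics


def simulate_makespan(efforts, developers, pessimism, runs=2_000, seed=11):
    """Monte Carlo make-span samples (days) under pessimistic uncertainty.

    Hypothetical simplifications: work is perfectly divisible among the
    available developers, and every factor can only worsen the baseline.
    """
    rng = random.Random(seed)
    mean_effort = statistics.mean(efforts)
    samples = []
    for _ in range(runs):
        # Factor 1: number of new features arriving during release construction.
        new_features = rng.randint(0, round(pessimism * len(efforts)))
        # Factor 2: actual effort exceeds each estimate by up to `pessimism`.
        effort = sum(e * rng.uniform(1.0, 1.0 + pessimism) for e in efforts)
        effort += sum(mean_effort * rng.uniform(1.0, 1.0 + pessimism)
                      for _ in range(new_features))
        # Factor 3: availability of developers (some may be unavailable).
        available = max(1, developers - rng.randint(0, round(pessimism * developers)))
        # Factor 4: productivity of developers, reduced by up to `pessimism`.
        productivity = rng.uniform(1.0 - pessimism, 1.0)
        samples.append(effort / (available * productivity))
    return samples


if __name__ == "__main__":
    efforts = [5, 8, 3, 13, 8, 5, 2, 8]   # hypothetical estimates, person-days
    baseline = sum(efforts) / 4
    for level in (0.1, 0.2, 0.3):
        runs = simulate_makespan(efforts, developers=4, pessimism=level)
        print(f"pessimism {level}: mean {statistics.mean(runs):.1f} d "
              f"(baseline {baseline:.1f} d), stdev {statistics.stdev(runs):.1f} d")
```

Comparing the sample mean and spread at each pessimism level against the baseline `sum(efforts) / developers` mirrors, in miniature, the baseline comparison described above; running one factor at a time versus all four together is how isolated and combined impacts could be contrasted.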

Al-Emran model
Procter & Gamble

Improved Manufacturing Processes Save Company One Billion Dollars

October 12, 2011

Procter & Gamble partnered with the Energy Department's Los Alamos National Laboratory (LANL) in the 1990s. LANL scientists helped P&G engineers develop simulations to improve the reliability of P&G's complex production lines. P&G's 150 facilities worldwide saw a 44 percent increase in plant productivity and 30 percent increase in equipment reliability since they started using the software.

The pairing of the lab's and the corporation's data led to the creation of simulation software called Reliability Technology in 1993. With the software, engineers could configure both the machines and their maintenance schedules based on reliability. Engineers could also foresee, and possibly avoid, product jams, component breakage intervals, or variations in machine speeds. In other cases, engineers could triage the production line. Large-scale implementation of the technology helped save P&G $1 billion in manufacturing costs, according to Procter & Gamble. These cost-saving benefits are applicable to production lines across the manufacturing sector.



Modeling Reliability with ExtendSim

This ExtendSim SimCast describes the different methods used to model reliability in ExtendSim. It features examples that use both blocks and items to represent failures in a system or process, and it includes a first look at the Reliability module in ExtendSim Pro while it was still at an early stage of development.