Generalized Stochastic Petri Nets for Reliability Analysis of Lube Oil System with Common-Cause Failures

A very high level of availability is crucial to the economic operation of modern power plants, in view of the huge expenditure associated with their failures. This paper deals with the availability analysis of a lube oil system used in a combined cycle power plant. The system is modeled as a Generalized Stochastic Petri Net (GSPN) that takes into consideration partial failures of its subsystems and common-cause failures, and it is analyzed using a Monte Carlo simulation approach. The major benefit of the GSPN approach is that hardware, software and human behavior can be modeled using the same language, which makes it more suitable for modeling complex systems such as power plants. The superiority of this approach over others such as network, fault tree and Markov analysis is outlined. Numerical estimates of availability, the failure criticality indices of the various subsystems, and the components causing unavailability of the lube oil system are presented. The proposed GSPN is a promising tool that can be conveniently used to model and analyze any complex system.


Introduction
Modern process plants must be operated at high levels of availability in view of the huge cost of their installation, operation and maintenance. In this context, a reliability study should not only give an estimate of plant availability, but also propose a means of discovering potential combinations of events that might result in catastrophic failures and of evaluating the probabilities of their occurrence. The assessment procedure should also be able to evaluate other performance measures and include cost-related aspects. Some of the important modeling approaches in reliability analysis are network models, Fault Tree and Event Tree Analysis (FTA and ETA), state-transition diagrams and Petri Nets (PNs).
Network models are function-oriented. These models can tackle structural failures that lower the system performance. It is almost impossible to incorporate maintenance actions, software and human errors and cost-related aspects in network models.
Fault trees are event-oriented. The repair actions and the dependence between components cannot be easily incorporated in the model. Standby redundancies, time-delay conditions and other dynamic behavior cannot be easily modeled using fault trees, since they are static in nature.
The biggest drawback of Markov models is the explosion of state space. Though it is possible to capture the dynamic behavior and dependence among components in this formulation, state-space explosion limits its usage. When formulating a Markov model of a complex system, it is difficult to ensure that all the possible combinations of events in a subsystem have been considered. Moreover, it is very difficult to use state-transition diagrams for model validation.
Out of network, FTA and Markov models, only FTA has been widely used for safety and reliability studies of complex systems since the 1960s. A complete review of the literature pertaining to FTA is provided in reference [20]. However, the FTA modeling approach is not useful for systems whose components have interdependencies, and real-world systems rarely satisfy this independence requirement. Hence, there is a need for a better modeling technique that can take care of real-world complexities such as dependencies among components and the modeling of repair actions, software failures and human-related failures and events. Generalized Stochastic Petri Nets (GSPNs) are well suited to handling these complexities and are gaining acceptance from research to industrial applications [21].
In this study, we employ the Generalized Stochastic Petri Net, a graphical and mathematical modeling tool for studying complex systems that are concurrent, asynchronous, distributed, parallel and nondeterministic. The use of Petri Nets for reliability analysis simplifies the task of the modeler considerably. It involves drawing a net representing a model of the system and marking it with the corresponding firing times of the transitions. If algorithms to construct the set of all reachable markings of a PN are available and tools can be built to automate the computation of the marking probabilities, the analyst can concentrate more on reliability issues instead of writing and solving the equations for the underlying stochastic process. A systems approach is possible with PNs since hardware, software and human behavior can be modeled using the same language. It is also possible to incorporate safety and fault tolerance requirements.

Petri Nets
As per [1], [2], [3] and [4], Petri Nets have, over the last four decades, attracted the attention of researchers in several areas ranging from computer science to social sciences. A PN can be introduced either algebraically or graphically. It is defined in terms of the following elements.
A standard PN consists of a set of "places" P drawn as circles, a set of "transitions" T drawn as bars and a set of directed arcs A. An arc connects a transition to a place or a place to a transition. Places may contain "tokens", which are shown as dots. The "marking" or state of a PN is defined by the number of tokens contained in each place and is denoted by M. The construction of a PN model requires the specification of the "initial marking" M0.
A place is called an "input place" of a transition if an arc exists from it to the transition. A place is an "output place" if an arc exists from the transition to the place. A transition is said to be "enabled" when all its input places contain at least one token. When an enabled transition is "fired", it removes one token from each input place and deposits one token in each output place. The firing of a transition modifies the distribution of tokens in places and thus produces a new marking for the PN.
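The enabling and firing rules can be illustrated with a minimal Python sketch. It assumes that every arc has weight one and that a marking is stored as a dictionary from place name to token count; the function names `is_enabled` and `fire` are hypothetical and chosen only for this illustration. The example transition mirrors the partial-failure transition t0 of the lube oil model (input places P0 and P19, output places P6 and P17).

```python
from typing import Dict, List

Marking = Dict[str, int]  # place name -> number of tokens (assumed representation)


def is_enabled(marking: Marking, inputs: List[str]) -> bool:
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking: Marking, inputs: List[str], outputs: List[str]) -> Marking:
    """Fire an enabled transition and return the new marking.

    One token is removed from each input place and one token is
    deposited in each output place (arc weights of one are assumed).
    """
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled in this marking")
    new_marking = dict(marking)  # leave the original marking untouched
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# Illustrative fragment: subsystem 0 working (P0) and system working (P19).
m0 = {"P0": 1, "P19": 1, "P6": 0, "P17": 0}
m1 = fire(m0, inputs=["P0", "P19"], outputs=["P6", "P17"])
print(m1)
```

After the firing, the tokens have moved to P6 and P17, so the same transition is no longer enabled in the new marking.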
For a given initial marking M0, the "reachability set" S is defined as the set of all markings that can be reached from M0 by a sequence of transition firings. As per references [8] and [9], in a Stochastic Petri Net (SPN), the firing time is an exponentially distributed random variable. Thus the marking sequence of an SPN obtained from the firings is isomorphic to a continuous-time Markov chain. As per [7], in a Generalized Stochastic Petri Net (GSPN), a transition can fire either instantaneously or after a random firing time drawn from some distribution. Therefore the set of transitions can be partitioned into a set of timed transitions (with finite firing rates) and a set of immediate transitions. For any marking in which several immediate transitions are enabled, a probability distribution must be specified, according to which the transition to fire is selected.
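The selection among enabled transitions in a GSPN can be sketched as follows. This is only an illustration under stated assumptions: immediate transitions take priority over timed ones (the usual GSPN convention), the probability distribution over immediate transitions is given as a table of weights, and timed transitions have exponentially distributed firing times. The function name `select_transition` and all transition names and numbers are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def select_transition(
    enabled_immediate: List[str],
    weights: Dict[str, float],
    enabled_timed: Dict[str, float],
    rng: random.Random,
) -> Tuple[str, float]:
    """Return (transition to fire, delay before it fires).

    enabled_immediate: names of the enabled immediate transitions
    weights: relative firing weight of each immediate transition
    enabled_timed: firing rate of each enabled timed transition
    """
    if enabled_immediate:
        # Immediate transitions fire in zero time; pick one according to
        # the specified probability distribution (normalised weights).
        w = [weights[t] for t in enabled_immediate]
        chosen = rng.choices(enabled_immediate, weights=w, k=1)[0]
        return chosen, 0.0
    if not enabled_timed:
        raise ValueError("no transition is enabled (dead marking)")
    # Timed transitions race: sample a firing time for each, smallest wins.
    samples = {t: rng.expovariate(rate) for t, rate in enabled_timed.items()}
    chosen = min(samples, key=samples.get)
    return chosen, samples[chosen]


rng = random.Random(1)
print(select_transition(["i1", "i2"], {"i1": 3.0, "i2": 1.0}, {"t1": 0.5}, rng))
print(select_transition([], {}, {"t1": 0.5, "t2": 2.0}, rng))
```

With weights 3 and 1, the immediate transition i1 is selected about three times out of four over many draws.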

System Overview
The lubrication requirement of the combined cycle power plant is met by a single lubricating oil system. A separate, enclosed, forced-feed lubrication module provides the lubricating and hydraulic oil requirements for the turbine power plant. This lubrication module, complete with tank, pumps, coolers, filters, valves and various control and protection devices, supplies oil to the gas turbine, steam turbine and generator bearings and accessory equipment. The oil absorbs the heat rejected from the bearings and the shaft seal oil system. A portion of the pressurized fluid is diverted and filtered again for use as lift oil. The system has more than 36 components and has to operate during start-up, normal operation, normal shutdown and emergency shutdowns.
The following are the smaller subsystems associated with the lube oil system.
1. Lube oil tank assembly
2. Lube oil pump system
3. Lube oil cooler and filter assembly
4. Mist eliminator
5. Lift oil assembly
6. Lube oil clearance control
The construction of a functional block diagram for the lube oil subsystems, as in Figure 1, is the first step towards the availability analysis. First, the components that can cause unavailability of each subsystem are identified. The reliability data for these components are taken from published sources and from the in-house records of the plant. Each component of a subsystem is considered to be in one of two states: good or complete failure. The redundancies are taken into consideration in calculating subsystem reliability parameters such as MTBF and MTTR. The failure of a component may cause system failure, depending upon the functional configuration of the system. A common-cause failure may also occur due to a deficiency in equipment design, an operation and/or maintenance error, and/or an external catastrophe.

Literature Review
The literature survey reveals that the Petri Net is considered a powerful modeling tool with many applications in flexible manufacturing systems, communication protocols, and computer hardware and software systems. Reference [19] applied Timed Petri Net modeling and analysis techniques to safety-critical real-time systems. These procedures allow safety, recoverability and fault tolerance to be analyzed. A hierarchical model for system reliability, maintainability and availability using GSPNs was proposed by [10]. Reference [11] proposed reliability models using timed Petri nets for a variety of fault-tolerant software, including mechanisms such as recovery blocks. The availability analysis of the core veneer manufacturing system in a plywood manufacturing system was performed by [16]. Reference [17] evaluated the reliability parameters of a butter manufacturing system in a dairy plant, considering constant failure rates of the various components. Semi-Markov processes and the regenerative point technique have been used to analyze a three-unit standby system of water pumps for an ash handling plant, in which two units operate simultaneously and the third is a cold standby. The reliability and availability assessment of a pod propulsion system using FMEA, FTA and Markov analysis was carried out by [18]. Reference [5] analyzed a pulping system using Petri Nets. The modeling and performance evaluation of a thermal power plant using the Markov approach is provided in reference [6].
Most of the models discussed in the literature for estimating availability and other reliability measures are based on the Markov approach, and very little literature is available on complex systems modeled using Petri Nets. Reference [13] proposed a methodology based on GSPNs to evaluate the reliability parameters of a screening system in the paper industry. The effects of failures and courses of action on the system performance have also been investigated.
This paper deals with the availability analysis of a lube oil system used in a combined cycle power plant. The system is modeled as a Generalized Stochastic Petri Net (GSPN). The partial failures of the subsystems and common-cause failures are taken into consideration in the modeling and analysis, and hence this research is closer to reality in both respects.

GSPN Specification
The failure mechanism and repair process model of the lube oil system is given in Figure 2. The initial marking of the net contains tokens in the places P0 to P5 and P19. This indicates that subsystems 0 to 5 are working initially. The token in the place P19 indicates that the system is working normally. Tokens in the places P0 and P19 enable the transition t0, which corresponds to the partial failure of subsystem 0. If the transition t0 fires, it removes a token each from the places P0 and P19 and deposits a token each in the places P6 and P17. The token in the place P6 indicates that subsystem 0 is in the partial failure mode, and the one in the place P17 indicates that the system is in the partially failed state. The token at P6 can enable the transitions t7, t8 or t9. The transition t7 corresponds to the completion of repair of the partially failed subsystem 0, whereas t8 corresponds to the complete failure of subsystem 0. If the transition t7 fires, it removes a token each from the places P6 and P17 and deposits a token each in the places P0 and P19. This means that subsystem 0 is repaired and the system starts working normally. If instead the transition t8 fires, it removes a token each from the places P6 and P17 and deposits a token each in the places P12 and P18. The presence of tokens in these places enables the transition t17, which describes the repair of the complete failure of subsystem 0. If the transition t17 fires, it removes a token each from the places P12 and P18 and deposits a token each in the places P0 and P19. This means that subsystem 0 is restored and the system is working normally. The common-cause failure of subsystems 0 and 1 is described by the transition t1. If the transition t1 fires, it removes a token each from the places P0, P1 and P19 and deposits a token each in the places P14 and P18.
The common-cause repair action is depicted by the transition t19. The failure and repair actions for the other subsystems are represented in a similar manner.
In this model, the presence of a token in the place P19 indicates that the system is in the good state. Its complete failure is indicated by the presence of a token in the place P18, and the partial failure of the system is indicated by a token in the place P17. Let T_o be the mean time for which a token is available in the place P19, T_r the mean time for which a token is available in the place P17, and T_f the mean time for which a token is available in the place P18. Then the availability of the lube oil system is given by

$$A = \frac{T_o}{T_o + T_r + T_f}$$

Here, T_o is equivalent to the MTBF of the lube oil system and (T_r + T_f) is equivalent to its MTTR.
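As a small worked illustration, availability can be computed from the three mean times, assuming the usual ratio MTBF / (MTBF + MTTR) with T_o taken as the MTBF and (T_r + T_f) as the MTTR, as stated above. The function name `availability` and the numerical values are hypothetical and are not taken from the plant data.

```python
def availability(t_o: float, t_r: float, t_f: float) -> float:
    """Availability from mean times in the good (P19), partially failed (P17)
    and completely failed (P18) states, assuming A = MTBF / (MTBF + MTTR)."""
    if min(t_o, t_r, t_f) < 0:
        raise ValueError("mean times must be non-negative")
    total = t_o + t_r + t_f
    if total <= 0:
        raise ValueError("at least one mean time must be positive")
    return t_o / total


# Hypothetical mean times in hours, for illustration only.
a = availability(t_o=998.0, t_r=1.5, t_f=0.5)
print(f"availability = {a:.4f}")  # prints "availability = 0.9980"
```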

Generation of Reachability Tree
The first step in the analysis of PNs is the generation of the reachability tree, which represents the set of markings that are reachable from the initial marking. The nodes of the reachability tree represent the markings of the net, with the root representing the initial marking. A directed edge from one marking to another indicates the firing of the corresponding transition. The analysis of the reachability tree generates a lot of information about the system, and a close examination enables verification that the PN is a valid representation of the system being modeled. The reachability tree is generated as follows.
Beginning with the initial marking, the transitions that are enabled by this marking are identified and the new markings that result from the firing of each of the enabled transitions are generated. Each new marking is added to the tree and the directed edges from the markings are drawn. The algorithm for generating the reachability tree is given below. The set of reachable markings, along with its arc sets and the reachability graph generated using the algorithm for the lube oil system, are provided in Table 1.
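The procedure described above can be sketched as a breadth-first exploration of the markings. This is a minimal illustration and not the authors' implementation: it assumes arc weights of one, a bounded net, and a hypothetical function name `reachability_graph`. The example net contains only the subsystem 0 transitions t0, t7, t8 and t17 described in the GSPN specification; the remaining transitions of the full model are omitted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Net = Dict[str, Tuple[List[str], List[str]]]  # name -> (input places, output places)
MarkingT = Tuple[int, ...]                    # token counts in a fixed place order


def reachability_graph(
    places: List[str], net: Net, initial: Dict[str, int], limit: int = 10_000
) -> Tuple[Set[MarkingT], List[Tuple[MarkingT, str, MarkingT]]]:
    """Return the reachable markings and the labelled edges between them."""
    idx = {p: i for i, p in enumerate(places)}
    start = tuple(initial.get(p, 0) for p in places)
    seen: Set[MarkingT] = {start}
    edges: List[Tuple[MarkingT, str, MarkingT]] = []
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for name, (ins, outs) in net.items():
            if all(m[idx[p]] >= 1 for p in ins):  # transition enabled in m
                nxt = list(m)
                for p in ins:
                    nxt[idx[p]] -= 1
                for p in outs:
                    nxt[idx[p]] += 1
                new = tuple(nxt)
                edges.append((m, name, new))
                if new not in seen:
                    if len(seen) >= limit:  # guard against unbounded nets
                        raise RuntimeError("marking limit exceeded")
                    seen.add(new)
                    queue.append(new)
    return seen, edges


places = ["P0", "P6", "P12", "P17", "P18", "P19"]
net = {
    "t0": (["P0", "P19"], ["P6", "P17"]),    # partial failure of subsystem 0
    "t7": (["P6", "P17"], ["P0", "P19"]),    # repair of the partial failure
    "t8": (["P6", "P17"], ["P12", "P18"]),   # complete failure
    "t17": (["P12", "P18"], ["P0", "P19"]),  # repair of the complete failure
}
markings, edges = reachability_graph(places, net, {"P0": 1, "P19": 1})
for src, name, dst in edges:
    print(src, f"--{name}-->", dst)
```

For this fragment the exploration finds three markings (working, partially failed, completely failed) connected by four edges.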

GSPN Simulation
At the beginning of the simulation run, the algorithm identifies all the transitions enabled by the initial marking. The firing time for each enabled transition is determined by sampling from exponentially distributed firing intervals. The minimum firing time is selected and the corresponding transition is fired. The system moves to the next marking. The state of the system (good or complete failure) is ascertained. A failed subsystem, if any, undergoes repair. After repair the subsystem is as good as new. These events are simulated for thirty years. In order to reduce the standard deviation of the estimates of system down time and up time, a Variance Reduction Technique (VRT), viz., the antithetic variate, is used. The simulation is replicated a sufficient number of times to achieve convergence of the results. The reliability data used in the simulation experiments are given in Table 3. The entire program is written using GPSS/H. The algorithm for the simulation is given below, where t is the number of transitions:

    marking = initial marking
    clock = 0
    for j = 1 to t do
        firing_time(j) = -1
    endfor
    while (simulation run not ended) do
        for j = 1 to t do
            if transition j is enabled then
                if firing_time(j) < 0 then
                    generate firing_interval
                    firing_time(j) = clock + firing_interval
                endif
            else (if not enabled)
                firing_time(j) = -1
            endif
        endfor
        find the transition k with the minimum firing_time(k) >= 0
        clock = firing_time(k)
        fire transition k
        reset firing_time(k) = -1
    endwhile
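The simulation loop can also be expressed as a short Python sketch. The original program was written in GPSS/H, so this is only an illustration of the same next-event logic and not the authors' code. The net contains only the subsystem 0 transitions t0, t7, t8 and t17, and all firing rates are hypothetical values chosen for the example, not the plant data of Table 3.

```python
import random
from typing import Dict, List, Tuple

Net = Dict[str, Tuple[List[str], List[str]]]  # name -> (input places, output places)


def simulate(net: Net, rates: Dict[str, float], initial: Dict[str, int],
             horizon: float, rng: random.Random) -> Dict[str, float]:
    """Simulate the net up to `horizon`; return the time each place held a token."""
    marking = dict(initial)
    clock = 0.0
    firing_time = {t: -1.0 for t in net}  # -1 means "not scheduled"
    time_in = {p: 0.0 for p in marking}
    while clock < horizon:
        # Schedule newly enabled transitions; cancel disabled ones.
        for t, (ins, _) in net.items():
            if all(marking[p] >= 1 for p in ins):
                if firing_time[t] < 0:
                    firing_time[t] = clock + rng.expovariate(rates[t])
            else:
                firing_time[t] = -1.0
        scheduled = {t: ft for t, ft in firing_time.items() if ft >= 0}
        nxt = min(scheduled, key=scheduled.get) if scheduled else None
        # Advance the clock to the next event, or to the horizon.
        t_event = min(scheduled[nxt], horizon) if nxt is not None else horizon
        for p, n in marking.items():
            if n > 0:
                time_in[p] += t_event - clock
        clock = t_event
        if nxt is None or clock >= horizon:
            break
        ins, outs = net[nxt]
        for p in ins:
            marking[p] -= 1
        for p in outs:
            marking[p] += 1
        firing_time[nxt] = -1.0
    return time_in


net = {
    "t0": (["P0", "P19"], ["P6", "P17"]),    # partial failure
    "t7": (["P6", "P17"], ["P0", "P19"]),    # repair of partial failure
    "t8": (["P6", "P17"], ["P12", "P18"]),   # complete failure
    "t17": (["P12", "P18"], ["P0", "P19"]),  # repair of complete failure
}
rates = {"t0": 1 / 2000, "t7": 1 / 8, "t8": 1 / 500, "t17": 1 / 24}  # per hour, hypothetical
initial = {"P0": 1, "P6": 0, "P12": 0, "P17": 0, "P18": 0, "P19": 1}
horizon = 30 * 8760.0  # thirty years in hours
time_in = simulate(net, rates, initial, horizon, random.Random(42))
print("estimated availability:", time_in["P19"] / horizon)
```

Because exactly one of the places P19, P17 and P18 holds a token at any time in this fragment, the three accumulated times add up to the simulated horizon, and the fraction of time spent in P19 serves as the availability estimate.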

Results and Discussion
The results for system down time obtained from the simulation experiments are given in Table 4.
The first column is the replication number. The second column gives the simulation results using thetic random numbers and the third column gives the results using antithetic random numbers. The average value given in the fourth column is taken as the simulation result of that replication. In this way, 30 replications are carried out to reach steady state. The system availability graph is provided in Figure 4. The system availability was found to be very high, at 0.998825. An estimated 28.8 failures occur in 30 years. System downing events are calculated for the various subsystems and the failure criticality indices are assessed. These results are given in Table 5 and Figure 5. The GSPN model can now be used to study the effects of the various component failure rates on the availability of the system. The PCVs and DCV (components in the oil cooler and filter assembly) and pressure loss in the piping system were found to be the major reasons for unavailability. The failure modes of the PCVs are fail open and fail closed. The major failure modes of the DCV are stuck and fail to seal. Close monitoring and maintenance actions are required to minimize these failures. The proposed GSPN model has been successfully used for the estimation of the availability of the system. Any changes in the system configuration, such as redundancy or replacement of a component by a more reliable one, can easily be incorporated into the model and their effects analyzed. It is also possible to analyze the system when different maintenance strategies and repair policies are adopted.
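The thetic, antithetic and average columns can be illustrated with a short Python sketch of the antithetic variate technique. It is an assumption-laden illustration rather than the authors' procedure: each replication draws uniform numbers u for the thetic run, reuses 1 - u for the antithetic run, converts both to exponential intervals by inverse transform, and averages the two results. The function names, the number of events and the rate are hypothetical.

```python
import math
import random
import statistics
from typing import Tuple


def exp_sample(u: float, rate: float) -> float:
    """Inverse-transform sample of an exponential interval from a uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    return -math.log(1.0 - u) / rate


def replication(n_events: int, rate: float, rng: random.Random) -> Tuple[float, float, float]:
    """Return (thetic result, antithetic result, their average) for one replication."""
    us = [rng.random() for _ in range(n_events)]
    thetic = sum(exp_sample(u, rate) for u in us)
    antithetic = sum(exp_sample(1.0 - u, rate) for u in us)  # complementary numbers
    return thetic, antithetic, 0.5 * (thetic + antithetic)


rng = random.Random(7)
rows = [replication(n_events=20, rate=0.1, rng=rng) for _ in range(30)]
print("std. dev. of thetic column :", statistics.stdev(r[0] for r in rows))
print("std. dev. of average column:", statistics.stdev(r[2] for r in rows))
```

Because the thetic and antithetic results are negatively correlated, their average tends to vary less between replications than either run alone, which is the reason for pairing them.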

Qualitative Comparison of Various Modeling Methods Used in Availability Studies
Modeling is the process of constructing a representation of a real-world system, reflecting its properties to the desired degree of detail. The model may be physical or abstract. Physical models are largely useful for purposes of teaching or training. Abstract models are useful in design, implementation and operations. These models bridge the gap between the real system and theoretical analysis. A number of modeling approaches such as network, fault tree, Markov and Petri Nets have been developed for the computation of reliability characteristics of complex technical systems. These models are either structure-oriented or event-oriented.
The structure-oriented models allow us to tackle structural failures that cause undesirable deviation from the expected performance. Network models are the best examples of this category. Event-oriented models can represent not only hardware failures but also undesirable situations that may develop due to errors in software, operation or maintenance. The nature of the problem, the objectives and the size play a vital role in selecting a model. This study has been devoted to the estimation of the reliability and availability of complex systems. Models suitable for real-world complex problems have been proposed. Despite a lot of earlier work in this field, there is a scarcity of methods to tackle a complex problem with all hardware and software failures, human errors and other dynamic features such as standby redundancies, repair actions and operator corrective actions. It is very difficult to accommodate repair actions and dynamic features in network models. For complex systems, the fault tree, which is used in safety analysis in the chemical and nuclear industries, is chosen as the tool for analysis. However, it is very difficult to include repair actions in the fault tree representation. The need for an analytical model in this context led to the Markovian approach. Markov models are capable of including all the real-world complexities, but state-space explosion limits their usage. The Petri Net, a mathematical modeling tool, is adequate for the development of methodologies for the prediction and evaluation of the reliability, maintainability and availability (RMA) of a system. In this study, GSPNs are used to find the availability of the lube oil system. The GSPN is an effective modeling tool with immense potential for reliability studies. Using it, one can satisfy, or at least try to satisfy, all the reliability requirements.

Summary
The use of PNs for modeling complex systems for the purpose of availability assessment is demonstrated. The superiority of the GSPN over other approaches such as FTA and Markov models is brought out. Numerical estimates of the availability of the lube oil system are obtained by simulating the GSPN. In this study, the partial failure of subsystems, common-cause failures and repair actions are modeled using a GSPN and analyzed. The modeling approach also has the capability to incorporate software and human-related failures and events. Thus, the proposed model can be conveniently used for modeling, analyzing and evaluating any complex stochastic system.