Improving the Identification Performance of an Industrial Process Using Multiple Neural Networks

Modelling or identification of industrial plants is the first and most crucial step in their implementation process. In recent years, artificial neural networks (ANNs) have been offered as a powerful modelling tool. Industrial processes are often so complicated that using a single neural network (SNN) is not optimal: SNN models of complex processes may not be accurate enough, or may fail to represent the dynamic characteristics of the system adequately. SNNs are also generally non-robust and sometimes overfitted. In this paper, we therefore use multiple neural networks (MNNs) for modelling. Bagging and boosting are the two methods employed to construct the MNNs. We concentrate on the use of these two methods in modelling a continuous stirred tank reactor (CSTR) and compare the results against an SNN model. Simulation results show that the use of MNNs improves the model performance.


Introduction
In recent decades, artificial neural networks (ANNs) have been used extensively in numerous applications. One important application of ANNs is finding patterns or tendencies in data. ANNs are well suited to prediction and forecasting tasks such as sales forecasting, industrial process control, the oil and gas industry [1][2][3], hand-written word recognition, target marketing, and the pharmaceutical industry [4]. In this paper, we use ANNs to identify an industrial process. One problem with ANNs is their instability: small changes in the training data used to construct the model may result in a very dissimilar model [4]. Also, due to the high variance of SNNs, the model may exhibit quite different accuracy when facing unseen data (validation stage) [4]. Furthermore, in numerous cases an SNN lacks precision. Breiman [5] has shown that for unstable predictors, combining the outputs of a number of models reduces variance and gives more precise predictions.
However, the individual neural networks in the aggregation must be dissimilar; there is no advantage in aggregating networks that are all identical [4]. The purpose of this paper is to identify an industrial plant. For the reasons stated above, we use MNNs, or ensemble neural networks, for identification. There are several different ensemble techniques, but the most popular ones are elaborations of bagging [6][7][8][9] and boosting [10][11][12][13][14][15][16][17][18]. In this work, we apply the bagging and boosting methods in parallel to the modelling of a chemical plant (CSTR) and compare the results against the corresponding SNN model. This paper is organized as follows: in sections 1 and 2 a detailed study of bagging and boosting is presented. In section 3 the industrial process of interest is introduced, and the bagging and boosting identification algorithms are applied to it. Results and conclusions are presented in sections 4 and 5. Finally, references are given in section 6.

Bagging
Bagging (an abbreviation of bootstrap aggregation) is one of the most extensively used ANN ensemble methods. The main idea in bagged neural networks is to train a different base model instance on each bootstrap sample; the final output is the average of all base model outputs for a given input [19][20][21]. Some of the advantages of the bagging algorithm are as follows:
• Bagging reduces variance, or model inconsistency over diverse data sets drawn from a given distribution, without increasing bias. This results in a reduced overall generalization error and enhanced stability.
• Another benefit of bagging relates to model selection. Since bagging transforms a group of over-fitted neural networks into a better-than-perfectly-fitted ensemble, tedious and time-consuming model selection is no longer required. This can even offset the computational overhead of training the many neural networks that bagging involves.
• Bagging is very robust to noise.
• Parallel execution: although the boosting algorithm (discussed in the next section) has better generalization ability than the bagging algorithm, the bagging algorithm has the benefit of training the ensemble members independently, hence in parallel.
Presume the training dataset T is composed of N instances $(x_1, y_1), \dots, (x_N, y_N)$, where x and y are the input and output variables, respectively. It is required to acquire B bootstrap datasets. As a first step, each instance in T is assigned a probability of 1/N, and the training set for each bootstrap member $T_B$ is created by sampling with replacement N times from the original dataset T using these probabilities. Hence, each bootstrap dataset $T_B$ may contain some instances of T repeated several times, while other instances may be omitted. An individual neural network model is then trained on each $T_B$. Therefore, for any given input vector, the bootstrap algorithm offers B different outputs, and the bagging estimate is computed as the mean of the B model predictions (see Figure 1):
$$\hat{y}(x) = \frac{1}{B}\sum_{b=1}^{B} h_b(x) \qquad (1)$$
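The bagging procedure above maps directly to a few lines of code. The following is a minimal illustrative sketch (not the authors' original implementation) in Python, using scikit-learn's MLPRegressor as an assumed stand-in for the paper's networks; the ensemble size B, the hidden-layer size, and the arrays X and y are assumptions for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def bagged_ensemble(X, y, B=10, seed=None):
    """Train B networks, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for b in range(B):
        # Sample N indices with replacement (each instance has probability 1/N).
        idx = rng.integers(0, N, size=N)
        net = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                           max_iter=2000).fit(X[idx], y[idx])
        models.append(net)
    return models

def bagged_predict(models, X):
    # Bagging estimate: the mean of the B model predictions, as in Eq. (1).
    return np.mean([m.predict(X) for m in models], axis=0)
```

Because each member is trained on an independent resample, the loop body can be distributed across workers, which is the parallel-execution advantage noted above.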

Boosting
Contrary to bagging, boosting dynamically tries to generate complementary learners by training the next learner on the inaccuracies of the learner from the preceding iteration. At each iteration of the algorithm, the sampling distribution depends on the performance of the learner in the preceding iteration [22]. Unlike the bagging algorithm, which operates in parallel, the boosting algorithm is executed sequentially. In boosting, instead of a random sample of the training data, a weighted sample is used to focus learning on the most difficult examples.
There are numerous different versions of the boosting algorithm in the literature. The original boosting approach, boosting by filtering, is described by Schapire [23]. It requires a large amount of training data, which is not practicable in many cases. This restriction can be overcome by another boosting algorithm known as AdaBoost [10]. Initially, the boosting algorithm was developed for binary classification problems. Boosting algorithms such as AdaBoost.M1 and AdaBoost.M2 [22] were then developed for multi-class cases. To solve regression problems, Freund and Schapire [24] extended AdaBoost.M2 and called it AdaBoost.R. It solves regression problems by converting them to classification ones.
In this paper, we use AdaBoost.R2 for identification. This method is a modification of AdaBoost.R and is described in [25,26]. A description of this algorithm (shown in Figure 2) is as follows. Given that the training dataset T consists of N instances $(x_1, y_1), \dots, (x_N, y_N)$, where x and y are the input and output variables, respectively, each instance is initially allocated the same probability, so that every instance in the initial dataset has an equal likelihood of being sampled into the first training set; that is, the sampling distribution at step $b = 1$ is $D_1(i) = 1/N$ for all $i = 1, \dots, N$. The boosting algorithm can be summarized as:
1) Input the labelled target dataset T of size N, the maximum number of iterations B, and a base learning algorithm. Unless otherwise stated, set the initial weight vector $w^1$ such that $w_i^1 = 1/N$ for $1 \le i \le N$.
2) For $b = 1, \dots, B$ (B can be determined by trial and error):
2.1) Form the new training set by sampling with the distribution $D_b$, and obtain a hypothesis $h_b(x): X \to Y$.
2.2) Compute the adjusted error for each instance:
$$L_i = \frac{\lvert h_b(x_i) - y_i \rvert}{\max_j \lvert h_b(x_j) - y_j \rvert}, \qquad i = 1, \dots, N.$$
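The remaining steps of AdaBoost.R2 (the weighted average loss $\bar{L}$, the confidence measure $\beta$, the weight update, and the 0.5 stopping threshold that the experiments below rely on) follow the standard formulation of the algorithm in [25]. The following is a minimal Python sketch of the full loop, again using scikit-learn's MLPRegressor as an assumed stand-in for the paper's networks; B and the network size are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def adaboost_r2(X, y, B=20, seed=None):
    """AdaBoost.R2 with linear loss; stops early when average loss >= 0.5."""
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)            # initial weight vector, w_i = 1/N
    models, betas = [], []
    for b in range(B):
        p = w / w.sum()                # sampling distribution D_b
        idx = rng.choice(N, size=N, replace=True, p=p)
        h = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                         max_iter=2000).fit(X[idx], y[idx])
        err = np.abs(h.predict(X) - y)
        L = err / (err.max() + 1e-12)  # adjusted (linear) loss per instance
        L_bar = np.sum(p * L)          # weighted average loss
        if L_bar >= 0.5:               # stopping criterion used in the paper
            break
        beta = L_bar / (1.0 - L_bar)   # low beta => confident learner
        w *= beta ** (1.0 - L)         # shrink weights of well-predicted points
        models.append(h)
        betas.append(beta)
    return models, betas
```

At prediction time, AdaBoost.R2 combines the learners with a weighted median using $\log(1/\beta_b)$ as the weights; the sketch returns the learners and $\beta$ values so that any combination rule can be applied.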

Process Modelling Using Bagging and Boosting
In this section, the bagging and boosting (AdaBoost.R2) algorithms are used to identify a CSTR. Since this paper is aimed at a black-box model of the process, we need to collect data directly from the process or generate it by simulation. For this purpose, data is taken from the DaISy website [27], which is an identification database.
The process is a CSTR in which the reaction is exothermic and the concentration is controlled by adjusting the coolant flow. The input variable is the coolant flow rate (L/min) and the output variable is the product concentration (mol/L). The sampling time is 0.1 min and the number of samples is 7500, so the data form a 7500 × 3 matrix: the first column contains the time steps, and the second and third columns contain the input (coolant flow rate) and output (concentration) variables, respectively [27]. The input and output variables are shown in Figures 3 and 4. As shown in Figure 3, the input is constantly changing, so the system operates in dynamic mode.
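As an illustration of this layout, assuming the dataset has been downloaded from DaISy as a whitespace-delimited text file (the file name cstr.dat is a hypothetical choice for the example), the columns can be unpacked as follows:

```python
import numpy as np

# Hypothetical local file name; columns: time step,
# coolant flow rate (L/min), product concentration (mol/L).
data = np.loadtxt("cstr.dat")          # expected shape: (7500, 3)
t, u, y = data[:, 0], data[:, 1], data[:, 2]
```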
In this paper, the bootstrap method is used to form the subsystems in the bagging algorithm. The bootstrap procedure involves choosing random samples with replacement from a data set and analysing each sample in the same way.
For the bagging algorithm, we use ten independent networks (B = 10), and the weights of each network are initialized randomly. Each network has one hidden layer, and the activation function of the output layer is linear (purelin). However, the numbers of hidden neurons, their activation functions, and the learning algorithms differ between networks; these specifications are listed in Table 1. For the hidden layer, hyperbolic tangent sigmoid (tansig) and log-sigmoid (logsig) transfer functions are employed. For network training, Levenberg-Marquardt backpropagation (trainlm) and Bayesian regularization backpropagation (trainbr) are used. Each individual network is trained for 10 iterations. As mentioned before, the system is in dynamic mode, so the previous inputs and outputs affect the output $y(t)$ at the present time. In this case, the inputs to the network are the present input $u(t)$, the previous input $u(t-1)$, and the previous output $y(t-1)$. So the neural network has three inputs, $(u(t), u(t-1), y(t-1))$, and one output, $y(t)$, as constructed in the sketch below.
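Building this lagged input representation amounts to shifting the measured series by one sample. A minimal sketch, where u and y follow the notation above and everything else is illustrative:

```python
import numpy as np

def make_regressors(u, y):
    """Build lagged inputs [u(t), u(t-1), y(t-1)] and targets y(t).

    With a lag space of one, the first sample has no predecessor,
    so N samples yield N - 1 usable input-output pairs (7500 -> 7499).
    """
    X = np.column_stack([u[1:], u[:-1], y[:-1]])
    target = y[1:]
    return X, target
```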
To determine the final predicted output of the trained ensemble, an average is taken over the predictions from the individual networks. The results and conclusions are given in the following sections.
For system identification using the AdaBoost.R2 algorithm, the number of iterations B must be chosen; we select B = 20. All sequential networks are trained using the Levenberg-Marquardt backpropagation algorithm. Each network consists of two layers. The activation function in the output layer is purelin, and in the hidden layer it is tansig or logsig. The stopping goal for the single network and for every individual network is the mean squared error (mse = 0.000001). As described in the previous section, the lag space is equal to one; therefore, the number of sampled data points employed for modelling is 7499, and the input-output data points have the form $[(u(t), u(t-1), y(t-1)), y(t)]$. At the first iteration, all data points have an equal chance of being selected, so the probability of each data point being selected is $1/N = 1/7499 \approx 1.333 \times 10^{-4}$. Running the AdaBoost.R2 algorithm described in the previous section, the average loss $\bar{L}$ remains below the threshold value of 0.5 until the final iteration, at which $\bar{L} > 0.5$; at this point the algorithm stops.

Results
1) The performance measures of the SNN (trained on the complete data set), of the ten individual neural networks used to construct the bagged neural network, and of the final MNN are shown in Table 2. As the table shows, the accuracy of the MNN is comparable to that of the SNN; however, the variance of the error in the MNN is significantly reduced compared with the SNN. We can therefore conclude that the bagging algorithm succeeds in reducing the error variance. In addition, the tabulated values of the square of the correlation coefficient (R-squared) indicate that the regression of the MNN is better than that of the SNN.
2) The performance measures of the SNN and of the eleven sequential neural networks used to construct the boosted neural network, i.e. the final MNN, are shown in Table 3. Using the AdaBoost.R2 algorithm, the average loss $\bar{L}$ remains below the threshold value of 0.5 until the eleventh iteration, where $\bar{L} = 0.7356287$; at this point the algorithm stops. Details of all the steps and the final results are shown in this table. The second column of the table shows that the boosting algorithm reduces both the error variance and the modelling error of the MNN when compared against the SNN. It is also clear from the table that the regression of the MNN is better than that of the SNN (R-squared closer to 1).
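The performance measures quoted in Tables 2 and 3 (mean squared error, error variance, and R-squared) can be computed from a model's predictions as in the generic sketch below; this is an illustration, not the authors' exact evaluation script.

```python
import numpy as np

def performance_measures(y_true, y_pred):
    """Compute the measures reported in Tables 2 and 3."""
    e = y_true - y_pred
    mse = np.mean(e ** 2)            # mean squared error
    var_e = np.var(e)                # variance of the error
    # R-squared as the square of the correlation coefficient,
    # matching the paper's stated definition.
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2
    return mse, var_e, r2
```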

Conclusions
In this paper, identification of an industrial plant (a CSTR) was performed using SNN and MNN techniques. Industrial processes can be very complex and may have highly nonlinear properties; hence, a single neural network cannot identify them with sufficient accuracy. We used multiple neural networks instead of a single neural network, performing the modelling with the bagging and boosting algorithms. As shown in this work, the MNNs generated by these two algorithms outperformed the single neural network. When bagging is employed, the accuracy of the MNN is comparable to that of the SNN, but the variance of the error in the MNN is significantly reduced. The boosting algorithm reduces both the error variance and the bias of the MNN when compared against the SNN. Although the boosting algorithm offers better generalization capability than the bagging algorithm, the latter has the benefit of training the ensemble members independently, hence in parallel.