The Methodology on Statistical Analysis of Data Transformation for Model Development

This paper aims to guide researchers from different fields who want to build their own model using multiple regression with data transformation. Researchers often run into problems when they try to obtain statistically significant results with different forms of model equation (e.g. exponential or square root). By following the steps of the statistical methodology given in this paper, researchers should be able to calibrate and validate a model for their research purposes. The developed model can then be used in sensitivity analysis with the appropriate selected variables. Finally, charts, concepts or theory can be developed further from the final multiple regression of data transformation model.


Introduction
Data transformation has been used in a variety of fields (e.g. traffic, pavement and geological engineering). Linear regression is commonly used to develop models, as in [1], [3] and [5]. However, researchers such as [11], [13], [2], [4], [7] and [14] have used multiple regression with data transformation, in which a mathematical modification is applied to the values of a variable. The question of why data transformation is used arises from the difficulty of developing a model with other methods (e.g. linear regression) when the calibration and validation of the model are not successful. This research therefore adopted the data transformation method used by [14], with a slight modification in its application, to estimate results from the researcher's studies in different fields (i.e. developing a weaving section flow model for a conventional roundabout). The study in [14] explains thoroughly the basic theory and issues surrounding the use of data transformation, including applications such as the square root, log and inverse transformations. The analytical results of multiple regression with data transformation were then used to initiate new model development from various types of data. The step-by-step process of data transformation is presented in the following section.

Process of Data Transformation
The statistical analysis process using MiniTab is shown in Figure 1. This study identified eight steps for developing a new model based on multiple regression of data transformation.
Step 1: Retrieve the data from fieldwork (from either a primary or a secondary data source). For this study, the fieldwork data were obtained from several sources, including laboratory data retrieved by a video capture technique, geographical information systems (GIS) and questionnaires.
Step 2: After data reduction, the reduced data need to be checked thoroughly for outliers in MiniTab. Outliers can be checked in several ways, for example with a histogram, boxplot or stem-and-leaf plot (see Figure 2). Outliers normally occur when there are device or human errors during Step 1.
Step 3: Outliers can be omitted only when there is a strong reason to do so, such as data that do not follow the standard guidelines. Once there are no outliers, the data need to be summarized through descriptive statistics (see Figure 3) to obtain the mean, median, maximum and so on. It is important to know the minimum and maximum limits of each data variable. The range of a data variable can be determined from traffic theory, from studies by previous researchers or from technical guidelines. Otherwise, the reduced fieldwork data need to be screened thoroughly to establish the exact range of each data variable.
Step 4: Identify which variables are dependent and which are independent. A scatter plot or fitted line plot can normally be used for this (see Figures 4 and 5). The dependent and independent variables are normally determined by visual observation or from the values of the standard deviation S and the coefficient of determination R². The decision on which variables are dependent and independent is made according to the purpose of the research study. Correlation is a statistical tool used to describe the degree to which one variable is linearly related to another variable [10].
Step 5: Next, the significance of the relationship of each parameter with the reduced data is identified through scatter plots and regression analysis. In determining the dependent and independent variables, the multiple regression of data transformation is applied to make the relationship easier to visualize and to improve interpretability. Each coefficient is reported with its standard error, and the value of the coefficient relative to its standard error is very important for judging the reliability of the statistical analysis [6].
Step 6: Once the final dependent and independent variables have been selected, the multiple regression of data transformation model is fitted (see Figure 6). The data transformation is required to perform well in terms of the P, T and R² values.
Step 7: Two types of validity need to be checked: model development and model validation. At this stage, after the multiple regression of data transformation model has been fitted, the model needs to be calibrated and analyzed to make sure that the model developed for weaving section flow at a roundabout is valid. In model development, validity can be checked through the following tests (see Figure 7): the P value of the regression should be less than 0.05 (the null hypothesis H0 of no relationship is rejected), the Anderson-Darling and Kolmogorov-Smirnov normality tests should give P values greater than 0.150 (normality of the residuals is not rejected), and the Durbin-Watson statistic should fall within its range of 0 to 4, where a value near 2 indicates no autocorrelation in the residuals. If any one of these tests fails, the model needs to be re-checked starting from Step 2.
Step 8: The model needs to be validated with new fieldwork data. Normally the minimum amount of validation data is half of the model development sample [16]. The new fieldwork data are compared with the model development data through a paired t-test and should show no significant difference. The paired t-test is thus an indicator of model validation, for which the P value should be greater than 0.05 (see Figure 8). If the value is less than 0.05, the data need to be re-checked starting from Step 1; in particular, the minimum and maximum values should not fall outside the range of the model development data. That is why many model equations have limitations or ranges of values based on their fieldwork conditions. Finally, once all steps have been accepted, the final multiple regression of data transformation model can be regarded as successful. In this study, model validation was performed by comparing the developed model output with new fieldwork data. [11], [13], [2] and [15] explained that validation is the process whereby model outputs (e.g. from other fieldwork or simulation) are compared with actual field data to determine how well the model replicates real-world conditions.
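The calibration checks of Steps 6 and 7 can also be sketched outside MiniTab. The following minimal Python sketch is an illustration under stated assumptions, not the procedure used in this study: the data values, the choice of a square root transformation of the response, and the helper names solve_linear_system, fit_least_squares, r_squared and durbin_watson are all hypothetical. It fits a multiple regression to the transformed response by ordinary least squares and then reports R² and the Durbin-Watson statistic of the residuals.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, response):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, response)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    return coef, fitted


def r_squared(response, fitted):
    """Coefficient of determination of the fitted values."""
    mean = sum(response) / len(response)
    ss_res = sum((a - f) ** 2 for a, f in zip(response, fitted))
    ss_tot = sum((a - mean) ** 2 for a in response)
    return 1.0 - ss_res / ss_tot


def durbin_watson(residuals):
    """Durbin-Watson statistic; always between 0 and 4, near 2 is ideal."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)


# Hypothetical data (not from this study): two predictors and a response
# constructed so that its square root is roughly linear in x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.05, 0.02, -0.04, -0.03, 0.04, 0.01, -0.05, 0.00]
y = [(1.0 + 0.5 * a + 0.25 * b + e) ** 2 for a, b, e in zip(x1, x2, noise)]

# Step 6: transform the response, then fit the multiple regression.
sqrt_y = [math.sqrt(v) for v in y]
coef, fitted = fit_least_squares(list(zip(x1, x2)), sqrt_y)

# Step 7: diagnostics on the transformed scale.
residuals = [a - f for a, f in zip(sqrt_y, fitted)]
r2 = r_squared(sqrt_y, fitted)
dw = durbin_watson(residuals)
print("coefficients:", [round(c, 3) for c in coef])
print("R-squared:", round(r2, 4), "Durbin-Watson:", round(dw, 2))
```

The sketch covers only R² and the Durbin-Watson statistic; the P and T values and the normality tests would still come from a statistical package such as MiniTab.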

Assessing the Need of Transformation
Some equations need to be transformed in order to pass Step 7 (see Figure 1). The researcher therefore needs a thorough understanding of the concepts of previous researchers, or of their own fieldwork results, to succeed in developing the model equation. Even when Step 7 is passed (see Figure 1), validation with new fieldwork data is still needed through Step 8 (see Figure 1) to determine whether the developed model holds for other fieldwork data.
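One simple way to judge whether a transformation is needed is to compare how well a straight line fits the response under each candidate transformation. The Python sketch below is a hedged illustration only: the data values, the helper simple_r_squared and the TRANSFORMS dictionary are hypothetical and are not taken from this study. It compares the untransformed response with the square root, log and inverse transformations mentioned in the Introduction, and it assumes that all response values are strictly positive.

```python
import math


def simple_r_squared(x, y):
    """R-squared of a fitted straight line of y on x (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Candidate transformations of the response variable.
TRANSFORMS = {
    "none": lambda v: v,
    "square root": math.sqrt,
    "log": math.log,          # requires v > 0
    "inverse": lambda v: 1.0 / v,  # requires v != 0
}

# Hypothetical data (not from this study): the response grows roughly
# exponentially with the predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.4, 1.8, 2.5, 3.3, 4.5, 6.0, 8.2, 11.0]

# Fit a line to each transformed response and record its R-squared.
scores = {name: simple_r_squared(x, [f(v) for v in y])
          for name, f in TRANSFORMS.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R-squared = {score:.4f}")
print("best linear fit:", best)
```

R² values computed on different response scales are only a rough guide, so the residual checks of Step 7 and the validation of Step 8 still apply to whichever transformation is chosen.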

Numerical Results and Applications
Some of the models developed using multiple regression of data transformation in different fields are listed in Table 1. The results and analysis for three of these models, following the steps in Section 2, are elaborated below.
Step 1: Retrieve the data from fieldwork or laboratory (from either a primary or a secondary data source). See Table 2.
Step 2: After the data analysis, the data need to be checked thoroughly for outliers in MiniTab. Figure 9 shows sample results for the three models as histogram plots. The histograms show that the data for all models are normally distributed and therefore represent good data for developing the models. Once the regression model is judged reasonable, it is important to investigate whether the residuals satisfy the normal distribution assumption [12].
Step 3: The minimum and maximum limits of each data variable are shown in Table 3. The range of a data variable can be determined from related theory, from studies by previous researchers or from technical guidelines. Otherwise, the reduced fieldwork data need to be screened thoroughly to establish the exact range of each data variable.
Step 4: The parameters are identified as either dependent or independent variables. In Figure 10, scatter plots or fitted line plots are used to determine the dependent and independent variables. The dependent and independent variables are normally determined by visual observation or from the values of the standard deviation S and the coefficient of determination R². The decision on which variables are dependent and independent is made according to the purpose of the research study.
Step 5: Next, the significance of the relationship of each parameter with the reduced data is identified through fitted line plots and regression analysis. In determining the dependent and independent variables, the multiple regression of data transformation is applied to make the relationship easier to visualize and to improve interpretability (see Figure 11).
Step 6: Once the final dependent and independent variables have been selected, the multiple regression of data transformation model is fitted (see Table 4). The data transformation is required to perform well in terms of the P, T and R² values.
Step 7: Two types of validity need to be checked: model development and model validation. At this stage, after the multiple regression of data transformation models have been fitted, the models need to be calibrated and analyzed to make sure that the models developed are valid. In model development (see Figure 12), validity can be checked through the following tests: the P value of the regression should be less than 0.05 (the null hypothesis H0 of no relationship is rejected), the Anderson-Darling and Kolmogorov-Smirnov normality tests should give P values greater than 0.150 (normality of the residuals is not rejected), and the Durbin-Watson statistic should fall within its range of 0 to 4, where a value near 2 indicates no autocorrelation in the residuals. If any one of these tests fails, the model needs to be re-checked starting from Step 2. The Kolmogorov-Smirnov test is performed to assess the goodness of fit of the residuals [8].
Step 8: The model needs to be validated with new fieldwork data. Normally the minimum amount of validation data is half of the model development sample [16]. The new fieldwork data are compared with the model development data through a paired t-test and should show no significant difference. The paired t-test is thus an indicator of model validation, for which the P value should be greater than 0.05 (see Figure 13). If the value is less than 0.05, the data need to be re-checked starting from Step 1; in particular, the minimum and maximum values should not fall outside the range of the model development data. That is why many model equations have limitations or ranges of values based on their fieldwork conditions. For the purpose of comparative analysis, collecting a new data set in this way is the best approach for model development and validation [9]. Finally, once all steps have been accepted, the final multiple regression of data transformation model can be regarded as successful. In this study, model validation was performed by comparing the developed model output with new fieldwork data.
[11], [13], [2] and [15] explained that validation is the process whereby model outputs (e.g. from other fieldwork or simulation) are compared with actual field data to determine how well the model replicates real-world conditions.
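The validation check of Step 8 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the analysis performed in this study: the observed and predicted values, the development range and the helper names paired_t_statistic and within_development_range are hypothetical. Instead of computing a P value, the sketch compares the absolute paired t statistic with the standard two-sided 5% critical value of Student's t distribution, which is equivalent to checking that the P value is greater than 0.05.

```python
import math

# Two-sided 5% critical values of Student's t for 1 to 10 degrees of freedom.
T_CRITICAL_5PCT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
                   6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_t_statistic(observed, predicted):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


def within_development_range(values, lower, upper):
    """Check that no value falls outside the model development range."""
    return all(lower <= v <= upper for v in values)


# Hypothetical values (not from this study): limits of the model development
# data, new fieldwork observations, and the model output for the same cases.
dev_min, dev_max = 9.0, 13.0
observed = [10.2, 11.8, 9.5, 12.1, 10.9, 11.5]
predicted = [10.0, 12.0, 9.8, 11.9, 11.0, 11.2]

t_value, dof = paired_t_statistic(observed, predicted)
in_range = within_development_range(observed, dev_min, dev_max)

# |t| below the critical value means P > 0.05: no significant difference.
no_significant_difference = abs(t_value) < T_CRITICAL_5PCT[dof]
validated = in_range and no_significant_difference

print(f"t = {t_value:.3f} with {dof} degrees of freedom")
print("within development range:", in_range)
print("model validated:", validated)
```

The critical value table covers only small samples; for larger validation sets the P value reported by a statistical package would be used instead.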

Conclusions
This paper has set out the steps of the model development and validation process using data transformation. The parameters of the developed model need to be understood, including the relevant and significant concepts behind them, in order to develop the model (e.g. in the scatter plots, the minimum and maximum limits of each variable were determined and the independent variables were selected based on their relationship with the response variables). To determine the relationships between variables, stepwise regression with data transformation was applied in MiniTab software. The models in this paper have been successfully developed, and sensitivity analysis of the models will be investigated in the next stage of the research.