Entropy Applications for Customer Satisfaction Survey in Information Theory

This study presents an application of information theory to survey scales. Using a customer satisfaction scale, it shows that, according to the calculated entropy values, the intended information can be obtained through fewer questions.


Introduction
Information theory is the branch of mathematics that describes how uncertainty should be quantified, manipulated and represented. In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context the term usually refers to the Shannon entropy, which quantifies, as an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. Since the fundamental premises of information theory were laid down in [1], it has had far-reaching implications for almost every field of science and technology [2]. Information theory has also played an important role in survey scale studies [3].
Surveys are used to collect quantitative information about items in a population. Developing a survey is as much an art as it is a science. Just as an artist has a variety of colors to choose from on the palette, you have a variety of question formats with which to paint an accurate picture of your customers, your clients and the issues that are important to them. A good survey question should be short and straightforward [4]. The scale used in a survey is defined by a set of two or more survey items that cohere in terms of individuals' responses. A scale combines an individual's responses to a number of survey items into one score.
In this paper we apply the information-theoretic concept of entropy to determine the number of questions in a selected scale.

Basic Concepts of Information Theory
Shannon entropy is a quantitative measure of uncertainty in a data set. This section briefly defines Shannon entropy, relative entropy (Kullback-Leibler divergence), joint entropy and mutual information. Let $X$ be a discrete random variable taking a finite number of possible values $x_1, x_2, \ldots, x_n$ with respective probabilities $p_i \ge 0$ for $i = 1, 2, \ldots, n$ and $\sum_{i=1}^{n} p_i = 1$. The Shannon entropy of $X$, as given in the works [7] and [1], is

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

with the convention $0 \log_2 0 = 0$, so that $H(X)$ is measured in bits.

The joint entropy measures how much entropy is contained in a joint system of two random variables. If the random variables are $X$ and $Y$ with joint probabilities $p(x, y)$, the joint entropy $H(X, Y)$ given in [7] is

$$H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log_2 p(x, y).$$

The mutual information of two random variables is a quantity that measures their mutual dependence. When the mutual information is zero, the variables are independent and their marginal entropies add up to the joint (total) entropy. When the mutual information is positive, the variables are dependent, because some combinations occur relatively more often than others, and the sum of the marginal entropies exceeds the joint entropy by an amount equal to the mutual information. The mutual information $I$ is evaluated by the formula

$$I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X, Y).$$

The Kullback-Leibler (KL) divergence is a non-commutative measure of the divergence between two probability distributions $p$ and $q$. KL is also sometimes called the information gain about $X$ achieved if $p$ is used instead of $q$, and it is also called the relative entropy for using $q$ in place of $p$. The relative entropy is an appropriate measure of the similarity of the underlying distributions. It may be calculated from

$$D(p \,\|\, q) = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i}.$$

The relative entropy is non-negative, and it is zero if and only if the two distributions are equal, namely $p = q$. The smaller the relative entropy, the more similar the distributions of the two variables, and vice versa [8].
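The four quantities described above can be computed directly from a probability table. The following is a minimal Python sketch, not the study's own code: it assumes base-2 logarithms (so results are in bits, matching the units used later in the text) and the convention that zero-probability terms contribute nothing. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(X) in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def joint_entropy(pxy: Sequence[Sequence[float]]) -> float:
    """Joint entropy H(X, Y) in bits from a table pxy[i][j] = P(X=x_i, Y=y_j)."""
    return entropy([pij for row in pxy for pij in row])


def marginals(pxy: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Row (X) and column (Y) marginal distributions of a joint table."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return px, py


def mutual_information(pxy: Sequence[Sequence[float]]) -> float:
    """I(X; Y) = sum over cells of p(x,y) * log2(p(x,y) / (p(x) p(y))), in bits."""
    px, py = marginals(pxy)
    return sum(
        pij * math.log2(pij / (px[i] * py[j]))
        for i, row in enumerate(pxy)
        for j, pij in enumerate(row)
        if pij > 0
    )


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For an independent table such as `[[0.125, 0.375], [0.125, 0.375]]`, `mutual_information` returns 0 and `joint_entropy` equals the sum of the two marginal entropies; `kl_divergence` generally gives different values for D(p || q) and D(q || p), reflecting its non-commutativity.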

Application
This study deals with a scale that measures customer satisfaction. Using this scale together with entropy values, we investigated how many questions are needed to obtain the intended information. The usual measures of customer satisfaction involve a survey with a set of statements using a Likert technique or scale [10,11].
The scaling in this survey was examined under four subscale titles, named "Marketing Services Assessment (MSA)", "Operation Services Assessment (OSA)", "Accounting Service Assessment (ASA)" and "General Assessment (GA)" [9]. First, probability distribution tables were constructed from the customers' answers concerning the MSA, OSA, ASA and GA subscales. From these tables, the Shannon entropy, joint entropy, relative entropy and mutual information values were calculated.
The survey was administered to 60 customers in order to measure customer satisfaction. It consisted of 18 questions, and its Cronbach's coefficient α, computed as

$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} \sigma^2_{Y_i}}{\sigma^2_X}\right),$$

was found to be α = 0.77, where $n$ is the number of components, $\sigma^2_X$ is the variance of the observed total test scores and $\sigma^2_{Y_i}$ is the variance of component $i$. Each question was scored from 1 to 3, corresponding to the responses "bad", "not bad, not good" and "good". The attitude (information) scores of the respondents were summed separately and ordered. The 18 questions were also divided into subscales, under four subscale titles. The first subscale is referred to in the literature as "Marketing Service Assessments" (MSA), the second as "Operation Service Assessments" (OSA), the third as "Accounting Service Assessments" (ASA) and the last as "General Assessment" (GA). The MSA subscale comprised 5 questions, OSA 7 questions, ASA 3 questions and GA 3 questions.
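Cronbach's alpha, $\alpha = \frac{n}{n-1}\left(1 - \sum_i \sigma^2_{Y_i} / \sigma^2_X\right)$, can be computed from a respondents-by-items score matrix. This is a minimal sketch assuming a complete matrix with no missing answers and at least two respondents; the function name and the example data are hypothetical, not the study's data. Sample variances are used throughout; because the same denominator appears in both variances of the ratio, population variances would give the same α.

```python
from statistics import variance  # sample variance (n - 1 denominator)
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent and one column per item.

    alpha = n / (n - 1) * (1 - sum(var(item_i)) / var(total)),
    where n is the number of items (components).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*scores)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-3 scores: three respondents, two items.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Two items that always receive identical scores give α = 1, while weakly related items pull α down; for the hypothetical `example` above, α = 2/3.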

Results and Discussion
The questions representing each subscale were identified, and a probability distribution table was constructed for each subscale from the frequencies of the scores given to those questions. Using these probability distribution tables, Shannon entropy values were computed for MSA, OSA, ASA and GA. To examine the entropy values of the MSA, OSA, ASA and GA subscales jointly with gender, joint probability distribution tables were constructed separately from the frequencies obtained from the Gender-MSA, Gender-OSA, Gender-ASA and Gender-GA scores. The joint entropy values of each subscale with gender were calculated separately from these joint probability distribution tables, and mutual information values were computed separately for each subscale and gender from the same tables.
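Building the probability distribution tables described above amounts to counting relative frequencies. The sketch below assumes that each respondent contributes one categorical level per subscale (how the study derived that level from the individual question scores is not specified here); the function names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def probability_table(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each observed category."""
    counts = Counter(values)
    n = len(values)
    return {k: c / n for k, c in sorted(counts.items())}


def joint_probability_table(
    xs: Sequence[Hashable], ys: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Joint relative frequencies P(X=x, Y=y) from paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    counts = Counter(zip(xs, ys))
    n = len(xs)
    return {pair: c / n for pair, c in sorted(counts.items())}


# Hypothetical respondents: gender and an MSA level on the 1-3 scale.
gender = ["F", "M", "M", "F"]
msa = [3, 2, 3, 3]
p_gender = probability_table(gender)
p_gender_msa = joint_probability_table(gender, msa)
```

The probabilities in these tables are the inputs to the entropy, joint entropy and mutual information formulas; combinations that never occur are simply absent and contribute zero.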
Of the 60 customers who completed the questionnaire, 15 were female and 45 were male. The subscales MSA, OSA, ASA and GA were treated as random variables in order to compute the entropy values, and the Shannon entropy values were calculated from the probability distributions constructed for these random variables. The frequencies, probabilities and entropy values of these random variables are given in Table 1. Interpreting entropy in bits as the average number of yes/no questions needed to identify a variable's level, the entropy value of 1.26 for MSA rounds up to two, indicating that it is enough to ask two questions for MSA. Likewise, the entropy values found for OSA (1.34), ASA (1.11) and GA (1.31) indicate that two questions would be sufficient to obtain the corresponding information. In the scale applied, however, 5 questions were asked about MSA, 7 about OSA, 3 about ASA and 3 about GA. Consequently, two questions would have been sufficient to obtain the information about each of these variables.
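The question-count reasoning above can be reproduced in a few lines. This sketch rests on two assumptions: entropy is measured in bits, and the number of questions is taken as the entropy rounded up to a whole number (our reading of how the figures are obtained). The entropy values come from Table 1 and the gender counts (15 female, 45 male) from the text; the helper names are hypothetical.

```python
import math
from typing import Sequence


def entropy_from_counts(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def min_binary_questions(h_bits: float) -> int:
    """Whole number of yes/no questions suggested by an entropy value (rounded up)."""
    return math.ceil(h_bits)


# Entropy values reported in Table 1.
reported = {"MSA": 1.26, "OSA": 1.34, "ASA": 1.11, "GA": 1.31}
questions = {name: min_binary_questions(h) for name, h in reported.items()}

# Gender split reported in the text: 15 female, 45 male.
h_gender = entropy_from_counts([15, 45])
```

Every subscale yields two questions under this rule, and the gender variable on its own carries about 0.81 bits.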
To investigate the entropy values of the MSA, OSA, ASA and GA variables jointly with gender, joint probability distribution tables were constructed separately from the frequencies obtained from the Gender-MSA, Gender-OSA, Gender-ASA and Gender-GA scores. Tables 2 and 3 give the joint entropy values of all subscales and gender. The joint entropy H(X, Y) = 2.0644 bits with X = Gender and Y = MSA means that, on average, slightly more than two yes/no questions would be required to guess the levels of both variables. The same result holds for OSA, ASA and GA.

Of the 60 customers who completed the questionnaire, 29 worked in management and 31 in organization. To investigate the entropy values of the MSA, OSA, ASA and GA variables jointly with working position, joint probability distribution tables were constructed separately from the frequencies obtained from the Position-MSA, Position-OSA, Position-ASA and Position-GA scores. Table 4 shows the joint probability distributions of Position-MSA, Position-OSA, Position-ASA and Position-GA. The joint entropy H(X, Y) = 2.2529 bits with X = Position and Y = MSA means that, on average, slightly more than two yes/no questions would be required to guess the levels of both variables. The same result holds for OSA, ASA and GA.
Mutual information values were computed for gender with each subscale and for position with each subscale. These values are given in Table 6. In probability theory and information theory, the mutual information (also called transinformation) of two random variables is a quantity that measures their mutual dependence. If X and Y are independent, knowing X gives no information about Y and vice versa, so their mutual information is zero. The mutual information calculated for MSA and Gender, which are not independent, can be interpreted as follows: MSA and Gender share very little information, only 0.0066 bits. The mutual information values found for OSA-Gender, ASA-Gender and GA-Gender are interpreted in the same way. Table 6 shows the information shared between pairs of variables. Among the gender pairs, OSA-Gender shares the most information and ASA-Gender the least. Similarly, the mutual information calculated for MSA and Position, which are not independent, shows that MSA and Position share very little information, only 0.0063 bits. The mutual information values found for OSA-Position, ASA-Position and GA-Position are interpreted in the same way. Among the position pairs, GA-Position shares the most information and ASA-Position the least.
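The identity $I(X; Y) = H(X) + H(Y) - H(X, Y)$ offers a quick consistency check of the reported figures. The sketch below recomputes H(Gender) and H(Position) from the counts given in the text (15/45 and 29/31) and combines them with H(MSA) = 1.26 from Table 1 and the reported joint entropies 2.0644 and 2.2529 bits. Because H(MSA) is available only rounded to two decimals, small discrepancies are expected; the helper names are illustrative.

```python
import math
from typing import Sequence


def entropy_from_counts(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def mutual_information_from_entropies(h_x: float, h_y: float, h_xy: float) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), all in bits."""
    return h_x + h_y - h_xy


h_msa = 1.26                                # Table 1, rounded to two decimals
h_gender = entropy_from_counts([15, 45])    # 15 female, 45 male
h_position = entropy_from_counts([29, 31])  # 29 management, 31 organization

i_msa_gender = mutual_information_from_entropies(h_msa, h_gender, 2.0644)
i_msa_position = mutual_information_from_entropies(h_msa, h_position, 2.2529)
```

This gives about 0.0069 bits for MSA-Gender and 0.0063 bits for MSA-Position, in line with the reported 0.0066 and 0.0063 bits; the small gap for gender is consistent with the rounding of H(MSA).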
The relative entropy is an appropriate measure of the similarity of the underlying distributions. If the distributions f and g are similar, both D(f || g) and D(g || f) are small, and so is the difference between them. In this study, the marginal probability distributions of both genders were found for each subscale. The marginal probability distribution of both genders for the MSA subscale is given in Table 7. To investigate whether these distributions are similar, the relative entropy (Kullback-Leibler distance) values were computed.
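To illustrate the comparison of D(f || g) and D(g || f) described above, the sketch below computes both directions for two hypothetical three-level distributions over "bad", "not bad, not good" and "good". The numbers are invented for illustration and are not the Table 7 values; base-2 logarithms are assumed.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical marginal distributions over (bad, not bad-not good, good);
# these are NOT the study's Table 7 values.
f_female = [0.10, 0.30, 0.60]
g_male = [0.15, 0.25, 0.60]

d_fg = kl_divergence(f_female, g_male)  # D(f || g)
d_gf = kl_divergence(g_male, f_female)  # D(g || f)
```

For these inputs both directions are small (about 0.020 and 0.022 bits) and differ only slightly, which is the pattern expected for similar distributions; the two directions are not equal, since relative entropy is not symmetric.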
The fact that these values are close indicates that both genders show a similar distribution. The relative entropy values for the two genders were likewise computed for OSA, ASA and GA. The marginal probability distributions of both working positions were also found for each subscale. The marginal probability distribution of both positions for the MSA subscale is given in Table 8. To investigate whether these distributions are similar, the relative entropy values were computed.

Conclusions
The analyses performed in this study proved useful for finding the degree of uncertainty and for determining the number of questions in a selected scale with the entropy method. It was found that if only the level of customer satisfaction is of interest, the scale to be designed can contain fewer questions, whereas more questions are needed if the level of customer satisfaction is to be determined together with gender.
In further studies, the survey can be reorganized by designing the scale with the number of questions determined by the entropy method; the reliability analyses can then be repeated, and information on customer satisfaction can be obtained in a shorter time.
In addition, the entropy values were interpreted within the framework of information theory, and recommendations were made for researchers who may conduct such a study in the future regarding the number of questions in new scales designed to access information about customer satisfaction rapidly.