Positive Information vs. Negative Information: Whether Humans Can Learn from Mistakes

In this study, we analysed an experience-based learning process in which participants received immediate feedback information in response to their right choices and their mistakes. Information was measured as non-randomness in the distributions of right and wrong choices. The data obtained provide evidence that the majority of the participants could use both positive and negative information while learning. A small but conspicuous proportion of the participants could learn exclusively through the use of negative feedback information, i.e., through their own mistakes.


Introduction
The role of negative experience in learning has been the subject of numerous studies. One of the most studied areas in this field stems from the Rescorla-Wagner model by Robert Rescorla and Allan Wagner [1]. They studied Pavlovian conditioning and proposed a frequently cited model to describe experimental data:

$$\Delta V_A = \alpha_A \beta_1 (\lambda_1 - V_{AX}) \qquad (1)$$

where $\Delta V_A$ stands for the change in the associative strength of the response to stimulus A in the presence of a compound stimulus AX, $\lambda_1$ is the asymptote of associative strength, and $\alpha_A$ and $\beta_1$ are learning-rate parameters. Formula (1) is often referred to as a model of "error-driven learning". It should be noted, however, that Rescorla and Wagner themselves never interpreted the disparity between $\lambda_1$ and $V_{AX}$ as a measure of any error, never discussed the very concept of "error-driven learning", and never even used the notion of "error" in that work. The absence of this notion from their theory is quite reasonable because in Pavlovian conditioning, which was their research subject, there is no place for errors. It would be quite inappropriate to say that a dog that does not salivate upon hearing a bell "makes an error".
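To make formula (1) concrete, the Python sketch below iterates it over repeated reinforced presentations of the compound AX. It rests on assumptions not stated above: the compound's strength is taken as the sum $V_{AX} = V_A + V_X$ (the usual Rescorla-Wagner summation rule), X is updated by the same rule with its own hypothetical rate `alpha_x`, and all parameter values are arbitrary illustrations. The function name `rescorla_wagner` is our own.

```python
def rescorla_wagner(n_trials, alpha_a, alpha_x, beta_1, lambda_1):
    """Iterate formula (1) over n_trials reinforced presentations of compound AX.

    Assumption: V_AX = V_A + V_X, and X is updated like A with rate alpha_x.
    Returns the list of (V_A, V_X) after each trial.
    """
    v_a = v_x = 0.0
    history = []
    for _ in range(n_trials):
        discrepancy = lambda_1 - (v_a + v_x)  # lambda_1 - V_AX
        v_a += alpha_a * beta_1 * discrepancy  # Delta V_A, formula (1)
        v_x += alpha_x * beta_1 * discrepancy  # same rule applied to X
        history.append((v_a, v_x))
    return history


hist = rescorla_wagner(n_trials=100, alpha_a=0.3, alpha_x=0.2, beta_1=0.5, lambda_1=1.0)
v_a, v_x = hist[-1]
print(round(v_a, 3), round(v_x, 3))  # 0.6 0.4
```

The discrepancy shrinks geometrically and the combined strength approaches $\lambda_1$; note that nothing in the update refers to a "wrong" response.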
Many followers of Rescorla and Wagner (e.g., [2], [3]) have extrapolated their model to category learning, where the concept of "error-driven learning" looks quite relevant and fruitful. The concept is also used in mathematical linguistics [4], where the number of errors is a measure of how correctly information is captured in the course of natural language processing. Important achievements in the study of error-driven learning have been made using the modern techniques of EEG [5] and event-related potentials [6]. Some researchers speak of an "error-processing system" that is "involved in detecting the fact that an error has occurred in a given task and in using that error information to improve performance at the task" ([5], p. 680).
However, we failed to find in the literature a clear definition of what "learning driven by errors" actually means. There is a general understanding that errors (mistakes) may be noticed by subjects and used to improve their task performance, and there are equations relating errors to performance (for a review, see [7]), but how errors combine to produce progress has not been reported.
There is a large literature on machine learning exploring various methods of using mistakes to improve machine performance. There, however, the term is mostly associated by default with error minimisation, not with understanding of a given task. For example, as applied in adaptive network studies, error-driven learning implies "learning to minimise the difference (i.e. the error) between a desired outcome and what the network actually produced" [8]. Still, minimising errors and learning do not seem to be quite the same thing.
In the present study, we prefer to use the term "mistake" rather than "error". From our perspective, insufficient attention has been paid to the possibility that mistakes are not merely waste to be minimised but may play a more constructive role. Specifically, we address the question: can humans learn from mistakes? That is, can mistakes be used not just to avoid repeating them but to understand and recognise a hidden pattern?
Certainly, in simple contexts the answer would be "yes". For example, when it comes to learning not to touch sizzling hot surfaces or not to taste poisonous food, learning from mistakes is possible. In these examples, the consequences alone would be enough to minimise mistaken actions. But humans are often challenged by complex learning situations that demand abstract thinking and differentiation between what matters and what does not.
Beyond such simple cases, the very definition of "learning from mistakes" becomes less clear. Random trials and a purposeful search will both produce mistakes, but how can we distinguish the former from the latter and determine which mistakes will bring about a positive result?
Both earlier ([9], [10]) and more recent (e.g., [11]) studies of classical reinforcement learning, as well as studies of animal cognition, use an experimental design in which a subject gains positive and negative experiences in the course of learning. The subject receives some kind of reward for approaching a goal, and some sort of punishment is applied for straying from it. In terms of information theory, one may say that the subject receives positive or negative information. In other words, positive information is associated with correct actions of the learning individual, while negative information is associated with erroneous choices or behaviours, i.e., with mistakes made by the individual. The sum of positive and negative information makes up the total information flow of the learning process.
The term "information" first appeared in the mathematical and technical sciences in the first half of the 20th century. However, ideas about information penetrated rather quickly into fields closer to the humanities, such as theories of intelligence, learning theories, psychology, and the behavioural sciences. The successful use of the concept of information in these fields will eventually depend on how well researchers handle the problem of measuring information, as George Miller masterfully showed as early as 1955 [12]. If one were able to estimate the amount of information that comes with positive and negative experiences, it would be possible to compare the relative contributions of positive and negative information to learning and to the achievement of success.
A number of methods for measuring information are widely known, among them those of Ralph Hartley [13], Claude Shannon [14], and Andrey Kolmogorov [15]. One of the most popular is the probabilistic approach developed by Shannon, which is based on the idea that the quantity of information carried by an event is the negative logarithm of the probability of that event. The probabilistic approach allows one to estimate the quantity of information in complicated systems in which events occur with a variety of probabilities. It follows that a less probable event carries more information. To apply the approach, researchers have to identify the elementary events and estimate their probabilities.
In a paper by Gavrikov and Khlebopros [16], a kind of learning environment called a "research problem" was proposed. A typical research problem involves the need to understand the principle of functioning, or the logic, of something not yet known. The research problem in our previous work included a method of problem solving that we called a "semi-binary dialogue". Such a dialogue allows the complicated learning process to be divided into elementary events, the probabilities of these events to be estimated and, in principle, information to be measured. This learning environment stimulated what is referred to as experience-based learning.
To answer the main question of the study, a few sub-questions have to be considered. Does negative information carry significant value per se? Does negative information constitute a larger or a smaller share of the learning process than positive information? Is it possible that, while learning, individuals use only positive or only negative information? We hypothesise that at least some individuals are capable of using negative information to solve problems within an interactive environment.

Learning Environment
The computer-based technique used in this research was described in detail in [16]. Here we outline the method and describe the approach used to estimate information, which was not described in the previous work.
The learning environment was provided by an interactive computer program, RWR (right/wrong responder), available on the Internet. The program consecutively presented participants with sets of nine geometrical figures arranged in a three-by-three matrix (Figure 1). The figures were circles, squares, and triangles, each in one of three shades of grey (light, medium, and dark) and one of three sizes (small, medium, and large). Thus, there were 3 × 3 × 3 = 27 possible figures. Participants had to choose any of the displayed figures with a mouse click. In reply, the program responded with either "Right choice" or "Wrong choice". "Right" and "wrong" had a conventional meaning and were determined by a deterministic algorithm in the code. The algorithm was unknown to the participants and followed this sequence: "small light grey figure" → "medium-sized medium grey figure" → "large dark grey figure" → "medium-sized medium grey figure" → "small light grey figure", and so on. Participants were also not told that the position in the matrix and the shape of the figure were irrelevant. After a right choice, a new set of figures was displayed.
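The right/wrong rule described above can be sketched in Python as follows. This is a hypothetical re-creation, not the RWR source code: the names `Figure`, `Responder` and `TARGET_CYCLE` are illustrative, set generation and matrix position are not modelled, and we assume the cycle starts at the small light grey figure and advances one step after each right choice (a wrong choice leaves the expected figure unchanged).

```python
from dataclasses import dataclass
from itertools import product

SHAPES = ("circle", "square", "triangle")
SIZES = ("small", "medium", "large")
GREYS = ("light", "medium", "dark")


@dataclass(frozen=True)
class Figure:
    shape: str  # irrelevant to the rule
    size: str
    grey: str


# Every combination of shape, size and shade: 3 * 3 * 3 = 27 variants.
ALL_FIGURES = [Figure(sh, sz, gr) for sh, sz, gr in product(SHAPES, SIZES, GREYS)]

# Assumed order of expected (size, grey) pairs; the cycle has period 4.
TARGET_CYCLE = [("small", "light"), ("medium", "medium"),
                ("large", "dark"), ("medium", "medium")]


class Responder:
    """Hypothetical re-creation of the RWR right/wrong rule (not the original code)."""

    def __init__(self):
        self.step = 0  # number of right choices made so far

    def respond(self, fig):
        size, grey = TARGET_CYCLE[self.step % len(TARGET_CYCLE)]
        if fig.size == size and fig.grey == grey:  # only size and shade are checked
            self.step += 1  # a right choice moves the rule to the next figure
            return "right"
        return "wrong"


r = Responder()
print(r.respond(Figure("triangle", "small", "light")))  # right
print(r.respond(Figure("circle", "small", "light")))    # wrong: medium/medium expected now
print(r.respond(Figure("square", "medium", "medium")))  # right
```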
The only instruction given to the participants before they began was that they should try to get only "right" responses. The problem, therefore, was first to differentiate between significant (size and colour) and non-significant (position and shape) features and then to grasp the sequence in which the right figures alternated. Participants who made six right choices in succession were considered to have solved the problem, as indicated in the instructions. Our decision to use this particular technique was based on several considerations. First, it gives participants enough freedom to show their best performance. Second, preliminary trials showed that the technique had sufficient discriminating power: the problem was easy enough to allow a successful solution and hard enough to prevent success by chance. Third, the technique is flexible enough to allow further modifications, e.g., the use of words instead of graphic images.
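The success criterion (six right choices in succession) amounts to a simple streak check. A minimal Python sketch, assuming each click yields a "right" or "wrong" reply; the function name `solved` and its `run_needed` parameter are our own.

```python
def solved(feedback, run_needed=6):
    """Return True once `run_needed` consecutive "right" replies occur.

    `feedback` is an iterable of "right"/"wrong" strings, one per click.
    """
    streak = 0
    for reply in feedback:
        streak = streak + 1 if reply == "right" else 0  # reset on any mistake
        if streak >= run_needed:
            return True
    return False


print(solved(["wrong", "right"] + ["right"] * 5))  # True: six rights in a row
print(solved(["right"] * 5 + ["wrong", "right"]))  # False: the streak was broken
```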

Participants
The participants were students of the Institute of Economics, Management and Environmental Studies (Siberian Federal University, Russia). An assistant presented the program to the students in a regular computer class simply by giving them its Internet address. Students who did not solve the task on their first attempt could continue working with the program online at any convenient time. Participation in the study and solving the problem were entirely voluntary. As a reward for participation, all students received extra course credit.
The students solved the problem in the spring semester of 2011. Altogether, 102 students took part in the study, but many of the protocols proved unsuitable because they were too short: protocols shorter than about 90 to 100 clicks could not be treated statistically and were discarded. Sufficiently long protocols from 58 students were selected for further analysis. A successful solution was found in the protocols of 45 students; of these, 40 successful attempts were analysed and 5 were discarded because of inadequate length.
The age of the students ranged from 20 to 21 years. Seventeen of them were male and 41 were female.

Measuring Information
The elementary events of learning in this environment can be represented as a sequence of ones and zeros such as "...1011100101000...", where 1 stands for a wrong choice (a mistake) and 0 for a right choice. Each symbol in the sequence carries some information because it is a message from the program representing its reaction to a consciously made human choice. In this study, we analysed the positive information, i.e., the series of "right" messages, as well as the negative information, i.e., the series of "wrong" messages.
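Encoding the replies in this way and splitting the result into series of identical symbols can be sketched in Python with the standard `itertools.groupby`. The helper names `encode` and `series` are hypothetical.

```python
from itertools import groupby


def encode(feedback):
    """Map program replies to symbols: 1 = wrong choice, 0 = right choice."""
    return "".join("1" if reply == "wrong" else "0" for reply in feedback)


def series(bits):
    """Split a 0/1 string into maximal series of identical symbols.

    Returns (symbol, length) pairs in order of occurrence.
    """
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]


print(encode(["wrong", "right", "wrong", "wrong"]))  # 1011
print(series("1011100101000"))
# [('1', 1), ('0', 1), ('1', 3), ('0', 2), ('1', 1), ('0', 1), ('1', 1), ('0', 3)]
```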
The figures in each set were gathered by a random procedure; each set contained at least one right figure and sometimes more, although sets with several right figures were less probable. The probability of choosing a right figure by chance was theoretically estimated at 1.344/9 ≈ 0.149 on average (i.e., a set contained on average 1.344 right figures among its nine). Likewise, the probability of making a wrong choice by chance was (9 − 1.344)/9 ≈ 0.851.
Having determined the elementary events and their probabilities, we can apply Shannon's approach:

$$I_i = -\log_2 P_i \qquad (2)$$

where $I_i$ is the quantity of information resulting from the i-th event, whose probability is $P_i$. Using formula (2), one can estimate, for example, that a single "right" message carries $-\log_2(1.344/9) \approx 2.74$ bits, while a double "right" message carries $-\log_2\big((1.344/9)^2\big) \approx 5.49$ bits, and so on.
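The arithmetic above can be checked with a short Python sketch. The helper `series_information` is our own name; like the worked example, it assumes the messages in a series are independent, so a series of n identical messages has probability P^n and carries −n·log2 P bits.

```python
import math

P_RIGHT = 1.344 / 9   # chance of a right click by random choice
P_WRONG = 1 - P_RIGHT  # chance of a wrong click by random choice


def series_information(p, n):
    """Formula (2) for a series of n independent messages of probability p."""
    return -n * math.log2(p)


print(round(P_RIGHT, 3), round(P_WRONG, 3))      # 0.149 0.851
print(round(series_information(P_RIGHT, 1), 2))  # 2.74
print(round(series_information(P_RIGHT, 2), 2))  # 5.49
print(round(series_information(P_WRONG, 1), 2))  # 0.23
```

The last line shows why individual positive messages are far more informative than negative ones in this environment: a single wrong message carries only about 0.23 bits.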
If a participant were to choose figures at random, series of zeros and ones would still appear. It would not be reasonable, however, to regard them as carrying any useful information; random clicking would most probably produce informational noise. To differentiate between noise and useful information, we compared the frequency distributions produced by a random process with those resulting from the participants' activity. Let $P_r(N)$ be the distribution resulting from random clicking, where N is the length of a series of zeros or ones (e.g., "00" has length 2, and "000" and "111" have length 3), and let $P_p(N)$ be the distribution resulting from a real participant's attempt. The task is then to compare $P_r(N)$ and $P_p(N)$ to ascertain whether the participant's work differs from random clicking.
A simple way to get the answer is to build, for each N, a confidence interval for random clicking and check whether $LL(N) \le P_p(N) \le UL(N)$, where LL and UL stand for the lower and upper limits of the interval, respectively. To estimate LL and UL, we performed multiple computer simulations using the same algorithm that was implemented in the RWR program. For right choices we modelled random clicking by 100 individuals, and for wrong choices by 200 individuals; the latter was required because of the greater variety of wrong-choice series. This approach is commonly called Monte Carlo modelling.
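A minimal Python sketch of such a Monte Carlo band, under simplifying assumptions that differ from the study: each simulated click is an independent trial that is right with probability 1.344/9 (the authors instead reran the RWR algorithm itself), P(N) is taken as the share of series of a given symbol that have length N, and LL(N) and UL(N) are the minimum and maximum over the simulated individuals. All names and the protocol length of 500 clicks are hypothetical.

```python
import random
from collections import Counter
from itertools import groupby

P_RIGHT = 1.344 / 9  # chance of a right click by random choice


def length_distribution(bits, symbol):
    """Share of series of `symbol` ("0" right, "1" wrong) having each length N."""
    lengths = [len(list(g)) for s, g in groupby(bits) if s == symbol]
    if not lengths:
        return {}
    counts = Counter(lengths)
    return {n: c / len(lengths) for n, c in counts.items()}


def random_protocol(n_clicks, rng):
    """Simulated random clicker: independent clicks, right with prob. P_RIGHT."""
    return "".join("0" if rng.random() < P_RIGHT else "1" for _ in range(n_clicks))


def monte_carlo_band(n_clicks, symbol, n_individuals, seed=0):
    """Return {N: (LL(N), UL(N))} from n_individuals simulated random clickers."""
    rng = random.Random(seed)
    dists = [length_distribution(random_protocol(n_clicks, rng), symbol)
             for _ in range(n_individuals)]
    all_lengths = set().union(*dists)  # every N seen in any simulation
    band = {}
    for n in sorted(all_lengths):
        shares = [d.get(n, 0.0) for d in dists]  # an unseen length counts as 0
        band[n] = (min(shares), max(shares))
    return band


right_band = monte_carlo_band(n_clicks=500, symbol="0", n_individuals=100)
wrong_band = monte_carlo_band(n_clicks=500, symbol="1", n_individuals=200)
```

Minimum and maximum are only one way to set the limits; percentiles of the simulated shares would give a narrower band.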
If $P_p(N)$ was greater than $UL(N)$ for a particular N, we interpreted this as the participant having received useful information and calculated its amount using formula (2). A graphical explanation of the comparison is given in Figure 2. The classical entropy-based view of information may be summed up as "what is not known", because a message that is known beforehand carries no information. We may summarise the explanations above by defining useful information as "what is beyond random".
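The comparison step can be sketched in Python as follows. The function names are hypothetical, the distributions are toy values invented purely for illustration (not data from the study), and the block stops at the information carried by one series of each flagged length, since how these amounts were aggregated per attempt is not spelled out in this section.

```python
import math


def beyond_random(observed, band):
    """Series lengths N at which P_p(N) exceeds UL(N).

    `observed` maps N -> P_p(N); `band` maps N -> (LL(N), UL(N)).
    Assumption: a length never produced by the simulations has UL(N) = 0.
    """
    return sorted(n for n, share in observed.items()
                  if share > band.get(n, (0.0, 0.0))[1])


def series_information(p, n):
    """Formula (2) for a series of n independent messages of probability p."""
    return -n * math.log2(p)


# Toy numbers for illustration only, not data from the study.
P_WRONG = 1 - 1.344 / 9
observed_wrong = {1: 0.10, 2: 0.12, 3: 0.30, 4: 0.05}
band_wrong = {1: (0.08, 0.25), 2: (0.06, 0.20), 3: (0.04, 0.18), 4: (0.02, 0.15)}

for n in beyond_random(observed_wrong, band_wrong):
    print(n, round(series_information(P_WRONG, n), 2))  # 3 0.7
```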

Results and Discussion
The students were free to set their own schedule, including the number of attempts, which was not limited. In the data, 33 participants made only one attempt, and 21 participants made from two to ten attempts. We therefore had to treat the data of participants who made a single attempt and of those with multiple attempts jointly.
With regard to learning through multiple attempts, two extreme views are possible. The first is that an individual's attempts are isolated, so the individual does not remember much of what was done in previous attempts. The second is that the individual remembers all previous attempts and learns from all of them. It was not our aim to decide which view is closer to reality; each is only partially right. However, they provide useful reference points from which to evaluate the data obtained. Figure 3 shows the distribution of isolated attempts, both successful and unsuccessful, on the axes "sum of negative information" (SNI) versus "sum of positive information" (SPI). The data help to answer some of the questions posed above. First of all, the amount of negative information, i.e., the information above random frequencies of certain series of mistakes, does constitute a significant value in many cases: 63 of the 68 attempts shown in Figure 3 have a significant share of negative information. A further 37 attempts with neither significant positive nor significant negative information are not shown in the figure and are discussed below.
Compared with positive information, negative information is rather peculiar. With regard to the former, participants knew the goal: to get as many right clicks in sequence as possible. So we can reasonably suppose that each participant consciously tried to get not merely an isolated right click but longer series of right clicks. It was a series of right clicks that determined the end of the test. In contrast, nothing regulated wrong clicks: no rewards or punishments, apart from possible discomfort, were associated with them. Therefore, while wrong clicks were unlikely to be a goal, they may have been an instrument for getting right clicks. Every wrong click carried definite information, namely that the chosen combination of shape, colour, and size was wrong given the previous figure. If participants were able to put forward hypotheses and properly process the negative information received, they would be likely to produce series of wrong clicks of particular lengths before the next right one, which would result in a non-randomly higher share of such series in the overall distribution. That is what we would define as "learning from mistakes".
Figure 3 also shows that the share of positive information was clearly larger than that of negative information in the majority of successful attempts. Still, some successful attempts showed only a negative information flow. Most unsuccessful attempts showed a prevalence of negative information. Generally speaking, larger shares of positive information can be expected because positive information came to participants in larger portions: right clicks were less probable, so each right message carried more bits. Figure 4 shows the distribution of participants' results on the axes "sum of negative information" versus "sum of positive information". The inferences drawn from Figure 3 are partially valid here as well because the two data sets overlap considerably (participants with only one attempt). More importantly, because these data are based on individual people, they bear directly on the main question of the study. It is not very surprising that, for successful experience-based learning, individuals require positive information, i.e., in our case, multiple confirmations of right choices. Three successful participants used only positive information. Perhaps it is also not surprising that some individuals required a combination of positive and negative information for learning. The significant shares of negative information are noteworthy in our case: they mean that the great majority of participants did use negative information.
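The groupings used in this discussion, together with the zero-information case, amount to a classification by which of the two sums is non-zero. A Python sketch with a hypothetical function name and toy (SPI, SNI) inputs that are not data from the study:

```python
from collections import Counter


def classify(spi, sni):
    """Label an attempt or participant by which useful information was registered."""
    if spi > 0 and sni > 0:
        return "positive and negative"
    if spi > 0:
        return "positive only"
    if sni > 0:
        return "negative only"
    return "no information"


# Toy (SPI, SNI) pairs in bits, for illustration only.
toy = [(12.3, 4.1), (0.0, 6.2), (8.0, 0.0), (0.0, 0.0)]
print(Counter(classify(spi, sni) for spi, sni in toy))
```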
The surprise is the six successful participants who seemed to use solely negative information. Their right clicks did not go beyond random clicking until the final successful series of right clicks, whereas their wrong clicks were not random, which may mean that they were able to properly process the mistakes they made and so achieve the goal. Such unusual behaviour requires more attention, so the six protocols were investigated once again, and at least five of them aroused no suspicion of cheating. Therefore, 8 to 9% of the whole participant population could learn exclusively from mistakes, at least within the context of the research problem presented in this study.
Finally, we would like to mention the participants with zero-information attempts. In total, 5 unsuccessful and 11 successful participants made attempts in which neither positive nor negative non-randomness was registered.
There are two possible sources of such randomness. The first is true randomness: a participant may really click randomly, or nearly so, which could explain a failure but not a success. The second source is compensation. All participants with zero information made rather lengthy attempts, lasting from 300 to over 1000 clicks, and sometimes spent up to an hour on the work. It is likely that they showed one type of non-randomness in one part of the data and another type in another part. When the overall distribution was calculated, these parts could have balanced each other out, so that the resulting distribution fell completely within the confidence interval. All of this requires a more detailed analysis, which lies outside the scope of this study.

Conclusions
To conclude, the data provide evidence that useful negative information constitutes a significant value for many learning individuals. Most probably, negative information contributes less to the total information flow than positive information. However, a conspicuous proportion of the participants seemed to use solely negative information during experience-based learning.
We would also like to mention some advantages and limitations of the study.
We believe that the introduction of information-science approaches will benefit the development of the behavioural sciences. It is necessary, however, to remember that estimates of information values depend on the way information is measured. Still, it seems important to use the units accepted in information theory whenever possible in behavioural research. Psychological studies that use their own units, or no measurements at all, make it hard to compare their results with others in a broad scientific context, whereas information theory, as a solid natural science, provides a basis for such wide comparisons.
A certain limitation lies in the very design of the study. Because the conditions for the participants were very relaxed, the study is an observation of natural behaviour rather than an experiment in a strictly defined environment. The results obtained should be tested in a laboratory experiment.
Another limitation concerns the adopted treatment of the data. The model implies that results are available only after a participant has finished solving the problem, i.e., only post factum inferences are possible rather than real-time monitoring of problem solving. To allow the latter, a different model of information estimation would have to be developed.