Second-Order Separation by Frequency-Decomposition of Hyperspectral Data

In this paper, we consider the problem of blind image separation by taking advantage of the sparse representation of hyperspectral images in the DCT-domain. Blind Source Separation (BSS) is an important field of research in signal and image processing. Hyperspectral images are produced by sensors that provide hundreds of narrow, adjacent spectral bands. The idea behind working in a transform domain is that the signal/image values can be restructured into transform coefficients that are easier to separate. This work describes a novel approach based on Second-Order Separation by Frequency-Decomposition, termed SOSFD. This technique uses joint information from second-order statistics and sparse decomposition, and it combines the advantages of the DCT and of second-order statistics in order to select the optimum data information. In fact, representing the hyperspectral images with well-suited basis functions allows a good distinction between various types of objects. The results show the contribution of this new approach to hyperspectral image analysis and demonstrate the performance of the SOSFD algorithm for hyperspectral image classification. In contrast to the original images, which are represented along correlated axes, the source images extracted by the proposed approach are represented along mutually independent axes, which allows a more efficient representation of the information contained in each image. Each source can then specifically represent certain themes by exploiting the link between the frequency distribution and the structural composition of the image. This property is of utmost importance in the classification process and could increase the reliability of the analysis and interpretation of hyperspectral images.


Introduction
A fundamental problem in remote sensing, as well as in many other applications (biomedical signals, telecommunications, etc.), is to find a suitable representation of multivariate observed data in order to extract the useful information it contains [1]. Because this information is subject to several perturbations, it is in general not directly accessible. The main aims of this work are, firstly, to identify the transfer function linking the signals of interest (sources) to the observations and, secondly, to restore the valuable information [2]. To address these problems, we develop an approach based on Blind Source Separation (BSS), which covers techniques that aim at separating signals when no information is available about the original sources [3]. BSS is an important field of research in signal and image processing. It was introduced and formulated by Bernard Ans, Jeanny Hérault and Christian Jutten [4,5] in the 1980s, and it now attracts great interest. This situation is common to communication signals [6,7], biomedical signals [8,9] and astrophysical data analysis [10,11]. More recently, the technique has been adapted to remote sensing (multispectral and hyperspectral imaging) to obtain a more accurate representation of geological and vegetated ground surfaces [12,13]. Indeed, the large dimension of hyperspectral images and the heterogeneity of ground surfaces call for various methods to describe image features.
In order to solve the problem of source separation, we seek to maximize the statistical independence between the different components of the estimated sources. An alternative approach to the BSS problem is to assume that the sources have a sparse expansion with respect to some basis (or dictionary) [14]. Briefly, a signal is said to be sparse in a given basis if most of its entries (or elements) have no significant amplitude. To take advantage of hyperspectral images, we propose to explore a novel approach based on source separation in the frequency-domain. Thus, independence and sparsity, which are the main hypotheses of source separation techniques, are not required for the source images themselves, but rather for their spectra [14,15]. The proposed approach combines the advantages of the DCT and of second-order statistics: the former exploits the inter-pixel correlation and the latter exploits the inter-band redundancies. Both theoretical and algorithmic comparisons between separation in the spatial-domain and in the DCT-domain are given. We then associate the statistical methods of BSS with classification techniques to achieve a better classification of hyperspectral data. This is achieved by employing Second-Order Separation by Frequency-Decomposition. Termed SOSFD, this technique uses joint information from second-order statistics and sparse decomposition. This paper is organized as follows. First, we present a formulation of the BSS problem and clarify the related theoretical elements. Second, we establish the frequency-based approach on hyperspectral data. Finally, we study the results to show the contribution of this new approach to hyperspectral images and to demonstrate the performance of the SOSFD algorithm.

Source Separation Principle
The source separation method can be applied to hyperspectral imaging to separate the components and make them statistically independent. This method is the most appropriate for our study, since the observation images are strongly correlated with one another. The principle of the source separation technique consists in extracting unknown source signals from their instantaneous linear mixtures using a minimum of prior information: the mixture should be processed "blindly" [7,16-18] (Figure 1).
To ensure that the BSS problem is well posed, the generally accepted hypothesis is that the sources s[k], k ∈ N, are statistically independent [20][21][22][23][24]. This assumption is realistic in a number of real-world problems. However, a difficulty arises because the mixing matrix A is unknown: how can we invert a matrix that is unknown? This leads to determining $\hat{A}^{-1}$, an estimate of the inverse $A^{-1}$ of the mixing matrix. The observations are then passed through the system that performs $\hat{A}^{-1}$ to infer an estimate of the sources (Figure 2). This gives $\hat{s} = \hat{A}^{-1} x$. In the BSS approach, the instantaneous linear mixing hypothesis has stimulated, and continues to stimulate, great interest, both in terms of application and methodology [2]. Nowadays, a number of BSS tools perform well in theory, but their behaviour in practical situations, such as in the remote sensing domain, remains to be studied [25,26]. Therefore, in this work we focus on source separation using second-order statistics for hyperspectral images, in order to obtain a more accurate representation of the ground surfaces.
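As a minimal illustration of the instantaneous linear mixing model described in this section, the following Python sketch mixes two synthetic sources and recovers them with the pseudo-inverse of the mixing matrix. The dimensions, the Laplacian sources and the variable names are assumptions made for the example only; in a blind setting A is unknown and has to be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 2 sources, 3 observations, 1000 samples.
n_sources, n_obs, n_samples = 2, 3, 1000
s = rng.laplace(size=(n_sources, n_samples))   # source signals s[k]
A = rng.normal(size=(n_obs, n_sources))        # mixing matrix (unknown in practice)
x = A @ s                                      # noise-free observations x[k] = A s[k]

# If A were known, the sources would follow from its pseudo-inverse.
s_hat = np.linalg.pinv(A) @ x
max_error = np.max(np.abs(s_hat - s))
```

The whole difficulty of BSS lies in replacing `np.linalg.pinv(A)` by an estimate computed from `x` alone.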

Source Separation Methods
In the early 1980s, research on BSS was initiated by Bernard Ans, Jeanny Hérault and Christian Jutten while modelling the decoding of movement in vertebrates. The authors proposed an approach to separate sources based on neural networks [27,28]. Since this work, many BSS algorithms have been developed [4,5]. In this section, we present some blind source separation methods such as SOBI, JADE and FastICA.
• SOBI (Second-Order Blind Identification, Belouchrani et al. 1997) exploits not one but several covariance matrices of the observations. After whitening the observations, a joint diagonalization criterion is used to estimate the mixing matrix [18].
• FastICA (Fast Independent Component Analysis, Hyvärinen and Oja 1997) exploits the principle of negentropy, approximated by the absolute value of the kurtosis of the estimated sources [32,33].
Hyperspectral data [34][35][36] can be modelled as instantaneous physical mixtures. The required sources have a physical origin and their mixing coefficients are the unknown proportions. The intrinsic content of the sources is temporally or spatially correlated; moreover, the mixtures exhibit localized spectral information. This description affects the choice of the algorithm.
Therefore, one can use second-order statistics, which take the spatial or temporal correlation into account [18,37], by applying the SOBI algorithm, which is well adapted to this situation and provides a robust solution for source separation. In addition, data such as hyperspectral images motivate a further development derived from SOBI that combines second-order statistics with an invertible orthogonal transformation such as the DCT. This approach is explained in detail in the following sections.

Second-Order Separation Approach
Second-Order Blind Identification (SOBI) is one of the well-known second-order approaches for computing the separating matrix. The separation is considered complete when the estimated sources are as spatially independent as possible. Accordingly, the separation task is achieved in two steps: the first step consists of whitening the observed signal by applying a whitening matrix; the second step is the joint diagonalization of several covariance matrices of the whitened signal vector [20,37-40].

Whitening
This step consists of "whitening" the observed signal x[n]. This is achieved by multiplying x[n] by an n×m whitening matrix W which satisfies $W A A^H W^H = I$, where H denotes the conjugate transpose and $R_S(0) = I$. Being a linear transformation, the whitening step decorrelates the components of the vector x[n] and enforces unit variance on them. Consequently, after whitening, we only need to estimate the unitary mixing matrix WA = U, where U is an n×n matrix, instead of estimating the parameters of an m×n mixing matrix. The matrix A can then be taken as $A = W^{\#} U$, where # denotes the Moore-Penrose pseudo-inverse.
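The whitening step can be sketched as follows, assuming real-valued, noise-free observations and a whitening matrix built from the eigenvalue decomposition of the sample covariance (one common construction; the function name `whitening_matrix` and the toy data are illustrative assumptions).

```python
import numpy as np

def whitening_matrix(x, n_sources):
    """Return an n x m matrix W such that W @ x has identity sample covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    r0 = x @ x.T / x.shape[1]                    # sample covariance at zero lag
    eigvals, eigvecs = np.linalg.eigh(r0)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_sources]  # keep the n largest eigenvalues
    return (eigvecs[:, idx] / np.sqrt(eigvals[idx])).T

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))                  # n = 2 toy sources
A = rng.normal(size=(3, 2))                      # m = 3 observations
x = A @ s

W = whitening_matrix(x, n_sources=2)
z = W @ (x - x.mean(axis=1, keepdims=True))      # whitened process
cov_z = z @ z.T / z.shape[1]                     # identity up to rounding
```

After this step only a rotation (the unitary factor U) remains to be identified, which is the purpose of the joint diagonalization.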

Joint Diagonalization
The whitening operation described in the previous section consists in finding an affine transformation that associates with x[k] a vector process whose covariance matrix is the identity. The new process z(t) therefore has a unitary mixing matrix, as follows: $z(t) = W x(t) = U s(t)$ (8). W can be estimated from the covariance matrix of the signal x (the initial process). In fact, the covariance matrix of the whitened process at a delay τ is diagonalized by U, which certifies its existence: $R_z(\tau) = U R_s(\tau) U^H$ with $R_s(\tau) = \mathrm{diag}[\rho_1(\tau), \ldots, \rho_n(\tau)]$, where $\rho_i(\tau)$ is the auto-covariance of $s_i$ and diag[.] is the diagonal matrix formed by the elements of its argument. The question is then how to find the matrix U from the diagonalization of the covariance of the whitened process at a given delay τ.
The preferred solution to this problem is to jointly diagonalize several covariance matrices taken at several delays, which increases the robustness of the separation. The estimation of the sources then becomes possible once the matrix U has been estimated.
In this manner, source separation using second-order statistics exploits the statistical information available on the sources at any time lag.
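The joint diagonalization of several matrices can be illustrated with a Jacobi-type sweep of Givens rotations, a scheme commonly used for this task. The sketch below is restricted to real symmetric matrices and is not taken from the paper; the function names, tolerance and synthetic test matrices are assumptions. The quantity `off_criterion` (sum of squared off-diagonal entries) is one usual way to measure joint diagonality.

```python
import numpy as np

def off_criterion(mats):
    """Sum of squared off-diagonal entries over a set of matrices."""
    return sum(np.sum(m ** 2) - np.sum(np.diag(m) ** 2) for m in mats)

def joint_diagonalize(mats, tol=1e-10, max_sweeps=100):
    """Find an orthogonal V making every V.T @ M @ V as diagonal as possible."""
    mats = [np.array(m, dtype=float) for m in mats]
    n = mats[0].shape[0]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # 2 x K array collecting the (p, q) information of all matrices
                g = np.array([[m[p, p] - m[q, q] for m in mats],
                              [m[p, q] + m[q, p] for m in mats]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = gg[0, 1] + gg[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                if abs(theta) > tol:
                    rotated = True
                    c, s = np.cos(theta), np.sin(theta)
                    G = np.array([[c, -s], [s, c]])
                    V[:, [p, q]] = V[:, [p, q]] @ G
                    for m in mats:
                        m[[p, q], :] = G.T @ m[[p, q], :]
                        m[:, [p, q]] = m[:, [p, q]] @ G
        if not rotated:
            break
    return V, mats

# Synthetic check: matrices sharing the same orthogonal eigenvector matrix U.
rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))
covs = [U @ np.diag(rng.normal(size=3)) @ U.T for _ in range(4)]
V, diag_mats = joint_diagonalize(covs)
residual = off_criterion(diag_mats)
alignment = np.abs(V.T @ U)      # close to a permutation matrix
```

On exactly jointly diagonalizable matrices the recovered V equals U up to the order and sign of its columns, which is the sense of "essentially equal" used in this kind of method.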

Sparsity Representation
The sparse representation of signals and images has recently drawn considerable attention and has been widely studied in many applications, including remote sensing. In this paper, we propose a novel structure of such a basis for representing image content in order to select the optimum data information. In fact, representing the hyperspectral images with well-suited basis functions allows a good distinction between various types of objects. We apply a new source separation algorithm based on the sparse representation of real hyperspectral data and show that choosing an appropriate basis is a key step towards a good sparse decomposition that improves hyperspectral data analysis [14,41]. We therefore explore the sparse decomposition of hyperspectral data using the DCT, as well as the effect of the sparse basis on the dataset. Under the sparseness assumption, the following method illustrates how the mixing structure can be used to estimate the mixing matrix [42][43][44].
We now define the sparse representation model more formally. Assume that a signal x is a vector in a finite-dimensional subspace, x = [x[1], …, x[N]]. x is exactly sparse if most of its components are zero, i.e. if its support supp(x) = {i : 1 ≤ i ≤ N and x[i] ≠ 0} satisfies |supp(x)| = K << N; the signal x is then said to be K-sparse. In most applications, the signal is sparse in an appropriate transform domain but not in its original one, so x can be written in a suitable basis D as $x = \sum_{i=1}^{N} \alpha[i] \varphi_i$, where |supp(α)| = K << N and α[i] is the coefficient representing the contribution of the atom $\varphi_i$ of the dictionary D to x.
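To make the notion of K-sparsity in a transform domain concrete, the sketch below builds an orthonormal DCT-II matrix with NumPy and synthesizes a signal from K = 3 cosine atoms. The signal is dense in its original domain but has a support of size 3 in the DCT-domain. The helper names, the length N = 64 and the chosen atoms are illustrative assumptions, and a column-vector convention is used.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (C @ C.T = I); its rows are the cosine atoms."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def support_size(v, tol=1e-10):
    """Number of entries whose magnitude exceeds a small tolerance."""
    return int(np.sum(np.abs(v) > tol))

N = 64
C = dct_matrix(N)

alpha = np.zeros(N)
alpha[[1, 4, 9]] = [3.0, -2.0, 1.0]   # K = 3 active atoms
x = C.T @ alpha                       # synthesis: x is a sum of 3 weighted atoms
alpha_back = C @ x                    # analysis: DCT coefficients of x

k_original = support_size(x)          # dense in the original domain
k_dct = support_size(alpha_back)      # 3-sparse in the DCT-domain
```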
To estimate the sources, it is sufficient to find a representation in the form of a set of coefficients S such that s = SD, where S is an unknown sparse matrix. To simplify the problem, sparsity-based BSS methods exploit the fact that the matrix S contains few coefficients significantly different from zero [45][46][47]. By combining the representation s = SD with the instantaneous mixing model x = As, we find x = ASD. The objective of BSS in the transform domain is to compute a new representation x = XD with X = AS, following the structure of the chosen dictionary [47,48].

Data Description and Methodology
In this work, we use Compact Airborne Spectrographic Imager (CASI) data (Figure 3). The number of bands collected by CASI can be very large: this sensor can acquire up to 228 spectral bands at wavelengths between 400 and 1000 nanometres [49]. The proposed method, described in Figure 4, is based on two source separation techniques used to evaluate hyperspectral classification: the first operates in the spatial-domain and the second in a transformed domain. The latter shows good performance and should minimize the risk of misclassifying the dataset.
To describe the source separation approach and to illustrate the corresponding results, we use 9 observation images extracted from the CASI sensor, at wavelengths between 551.1 and 799.9 nanometres, selected by experts as the most pertinent to increase the reliability of the analysis of the study zone (Figure 5).
The source separation method produces source images represented along mutually independent axes. Therefore, the correlation between the source images decreases. At this stage, the decorrelation is achieved in the spatial-domain by the SOBI (Second-Order Blind Identification) algorithm [13,51]. A visual analysis shows the important contribution of the source separation method in discriminating natural themes compared to the original images (Figure 6). However, some sources, such as source 2, source 3 and source 4, have no physical meaning and we cannot identify a significant theme for them.

DCT-Domain Separation
To provide a valid decomposition of the hyperspectral images, we adopt a blind and automated procedure that relies on an optimal decomposition of the image spectra. The frequency approach used in this work is implemented by combining the DCT with second-order statistics. Since the DCT is a linear orthogonal transformation, it can be applied to either spatial or spectral data [52]. The criterion used should provide independent information associated with distinct spectra. The extracted independent components may lead to a meaningful data representation that makes it possible to extract information at a finer level of precision [53]. The positive effect of such a transformation is the removal of redundancy between neighbouring pixels in the first stage and the discrimination between the low and high frequencies of the bands in the second stage.
In this paper, we use the source separation criterion in the frequency-domain [46,54]. The particularity of the SOSFD approach is to use the DCT in order to extract independent spatial-frequency sources. The DCT exploits inter-pixel redundancies and provides excellent decorrelation for most natural images. The frequency source separation method can be modelled by the following form: $X_{dct}^{(T)} = A S_{dct}^{(T)}$. Hence, the source separation problem is transformed to the DCT-domain. The superscript (T) indicates that the related matrix has T columns. Furthermore, the DCT exhibits excellent energy compaction for highly correlated images such as hyperspectral images, and because the noise produces DCT-coefficients that are close to zero at lower frequencies, we can model our frequency-based approach by a noise-free form $X_{dct}^{(T')} = A S_{dct}^{(T')}$, where $X_{dct}^{(T')}$ is an m×T' matrix and $S_{dct}^{(T')}$ is an n×T' matrix with T' << T. T' is chosen to retain the most important coefficients, i.e. those with the largest energy in the transformed images. The separation complexity can thus be reduced by manipulating T' DCT-coefficients instead of T pixel values.
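Because the DCT is linear, the mixing model carries over unchanged to the DCT-domain, and it still holds on any subset of retained coefficients. The following sketch checks this on two synthetic 32×32 source images with a separable 2-D DCT and keeps the T' = 100 coefficients of largest energy. The image size, the synthetic sources, the value of T' and the energy-ranking rule (energy summed over bands) are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are cosine atoms)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

H = W = 32
yy, xx = np.mgrid[0:H, 0:W]
sources = np.stack([
    np.exp(-((xx - 10) ** 2 + (yy - 12) ** 2) / 50.0),   # smooth blob
    np.sin(2 * np.pi * xx / W),                           # horizontal wave
])                                                        # shape (n, H, W)

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 2))                               # m = 4 bands, n = 2 sources
X = np.tensordot(A, sources, axes=1)                      # mixed bands, shape (m, H, W)

Ch, Cw = dct_matrix(H), dct_matrix(W)

def dct2_flat(imgs):
    """Separable 2-D DCT of each image, flattened to one row per image."""
    return (Ch @ imgs @ Cw.T).reshape(imgs.shape[0], -1)

X_dct = dct2_flat(X)                                      # m x T with T = H * W
S_dct = dct2_flat(sources)                                # n x T

# Keep the T' coefficients carrying the largest energy across bands.
T_prime = 100
energy = np.sum(X_dct ** 2, axis=0)
keep = np.argsort(energy)[::-1][:T_prime]
X_kept, S_kept = X_dct[:, keep], S_dct[:, keep]
energy_fraction = energy[keep].sum() / energy.sum()
```

The separation then works on an m×T' matrix instead of an m×T one, which is where the reduction in complexity comes from.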
Then, to ensure the identification of the sources and to improve the statistical efficiency, we estimate the dominant independent orientation from only the most significant DCT-coefficients (Figure 7). In fact, we adopt in our work an independent component analysis algorithm in the frequency-domain. The frequency-separation criterion is based on the following steps:
• Determining the threshold from the histogram, obtained by computing K, which is the mean of all coefficients of a homogeneous DCT basis.
• Reducing the number of parameters to be estimated by whitening the observed process $X_{dct}^{(T)}$. The whitening step is based on the covariance matrix and is done by eigenvalue decomposition, which is equivalent to Principal Component Analysis (PCA). It consists of applying a whitening matrix W to the observed signal $X_{dct}^{(T)}$. The whitened process $Z_{dct}^{(T)}$ still obeys a linear model given by $Z_{dct}^{(T)} = W X_{dct}^{(T)} = U S_{dct}^{(T)}$, where U is an n×n unitary matrix. Hence, instead of estimating the parameters of the m×n mixing matrix, we only need to estimate the unitary mixing matrix, which has only n(n-1)/2 degrees of freedom.
• Determining the unitary factor U from a unitary diagonalization of a whitened covariance matrix $R_{Z_{dct}}(\nu)$ for any frequency shift ν ≠ 0: $R_{Z_{dct}}(\nu) = U D U^H$, where D is a diagonal matrix.
• The existence of a frequency shift ν such that $R_{Z_{dct}}(\nu)$ yields the relevant parameters is directly linked to the existence of distinct eigenvalues of $R_{Z_{dct}}(\nu)$. To increase the statistical efficiency of the estimation, we can consider a joint diagonalization of several covariance matrices $R_{Z_{dct}}(\nu_i)$, 1 ≤ i ≤ n, for n different frequency shifts $\nu_i$. From the spectral theorem, we can jointly diagonalize the set of covariance matrices by a unitary matrix V that is essentially equal to U [19]. This leads to minimizing the joint diagonality (Jd) criterion (18).
Then the source coefficients in the DCT-domain are estimated as $\hat{S}_{dct} = V^H W X_{dct}$.
Finally, by the inverse DCT, we determine an estimate $\hat{S}$ of the source matrix and an estimate $\hat{A}$ of the mixing matrix A such that $\hat{A} = W^{\#} V$ (20).
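The sequence of steps above can be sketched end to end on a toy problem. This is a deliberately simplified illustration, not the authors' implementation: it uses a 1-D DCT on flattened data, keeps all T coefficients (no thresholding), and diagonalizes a single symmetrized shifted covariance instead of jointly diagonalizing several. The synthetic sources are generated directly in the DCT-domain as first-order autoregressive sequences with hypothetical coefficients 0.9 and -0.6, so that their shifted covariances are clearly distinct.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are cosine atoms)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

rng = np.random.default_rng(3)
T, n, m = 1024, 2, 4
C = dct_matrix(T)

def ar1(coef, size):
    """First-order autoregressive sequence (toy DCT-domain source)."""
    y = np.zeros(size)
    e = rng.normal(size=size)
    for t in range(1, size):
        y[t] = coef * y[t - 1] + e[t]
    return y

S_dct_true = np.vstack([ar1(0.9, T), ar1(-0.6, T)])
S = S_dct_true @ C                 # inverse DCT: pixel-domain sources (one per row)
A = rng.normal(size=(m, n))
X = A @ S                          # observed bands

# Step 1: transform the observations to the DCT-domain.
X_dct = X @ C.T
# Step 2: whitening by eigenvalue decomposition of the zero-shift covariance.
R0 = X_dct @ X_dct.T / T
vals, vecs = np.linalg.eigh(R0)
top = np.argsort(vals)[::-1][:n]
Wh = (vecs[:, top] / np.sqrt(vals[top])).T
Z = Wh @ X_dct
# Step 3: covariance at frequency shift nu, symmetrized, then diagonalized.
nu = 1
R_nu = Z[:, nu:] @ Z[:, :-nu].T / (T - nu)
R_nu = 0.5 * (R_nu + R_nu.T)
_, V = np.linalg.eigh(R_nu)
# Step 4: source coefficients, inverse DCT and mixing matrix estimate.
S_hat = (V.T @ Z) @ C
A_hat = np.linalg.pinv(Wh) @ V

def unit_rows(mat):
    return mat / np.linalg.norm(mat, axis=1, keepdims=True)

match = np.abs(unit_rows(S_hat) @ unit_rows(S).T)   # |cosine| between estimates and truth
```

Each row of `match` should contain one entry close to 1, reflecting the usual permutation and sign indeterminacy of BSS.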

Experimental Results and Evaluations
First, we illustrate the benefit of blind source separation in the DCT-domain by comparing the performance of the SOSFD approach with the classical second-order source separation performed in the spatial-domain [18].

Joint Diagonalization Performance
The performance measure used to judge the quality of the separation is the Joint Diagonalization (JD) criterion defined by the relation (18). In Figure 8, the JD criterion is plotted in decibels against the sample size. The DCT-domain curve shows a performance gain reaching 5 dB compared to the image-domain curve. A sketch of the proof of the efficiency of the joint diagonality criterion, when applied in the DCT-domain rather than in the original spatial-domain, is given at the end of this paper.

Power Spectral Density Evaluation
In this section, we illustrate the performance of our approach on hyperspectral data, which are known to be sparse in the DCT-domain. We first consider the hyperspectral observations. Before processing, we show the power spectral density of these images (Figure 10-a). This figure illustrates the strong correlation between the power spectral densities of the hyperspectral images.
These spectral densities show a large number of spectral components with very weak amplitude. This reduces the computational complexity of source separation in the DCT-domain.
In Figure 10-b, the source power spectral densities obtained using second-order statistics in the spatial-domain look more separated.
The data resulting from the new source separation approach are presented in Figure 9. We can note a more effective discrimination between the different classes. The latter are represented more clearly by maximizing the contrast between them, which can improve the accuracy of the classification process. In fact, this new approach produces source images represented along independent axes, which significantly decreases the correlation between the extracted sources and allows a more efficient representation of the information contained in each image. Each source can then specifically represent certain themes. Let us note that the DCT is a linear transform used to represent the frequency content of image data in terms of amplitude or energy. This transformation is studied to establish the link between the frequency distribution and the structural composition of the image. The DCT decomposition of the data distributes the information over low, mid-range and high frequencies, the latter carrying the energy at the edges. Consequently, in comparison to Figure 10-a, the sources resulting from the new approach (Figure 10-c) are physically more meaningful; they maintain the spectral properties of the data while gaining the edge information.
The effect of our approach is also visible in the power spectral densities of the DCT components. The corresponding sources are identified reliably thanks to the distinct differences in their power spectra.
It is interesting to note that the most important spectral components of the new sources (Figure 10-d) are concentrated in the same frequency range, 0-15 Hz, as those of the original images (Figure 10-a), as opposed to the power spectra of the spatial-domain sources (Figure 10-b), which extend over a larger frequency range. This figure describes the distribution of the source energy over frequency. Classical BSS is a mathematical or statistical method, so its physical meaning is not obvious: we are simply attempting to make the estimated sources independent. The DCT decomposition of the images, in turn, provides a physical understanding of frequency-domain BSS.
When applied in the DCT-domain, second-order statistics make it possible to group and separate the different spectra around each dominant frequency, which gives a physical meaning to each generated source. The result of Figure 10-d is in excellent agreement with the previous results. The mean power spectrum of the DCT-domain sources is correlated with the mean power spectrum of the original bands, which enhances the physical interpretability of these sources (Figure 10-d). However, the mean power spectrum of the spatial-domain sources (Figure 10-d) spreads over a larger frequency range, particularly at high frequencies. Indeed, even if the hyperspectral data do not physically satisfy the independence test, BSS can find directions in which the components are independent. The estimated directions are less Gaussian and thus more asymmetric, which can improve the image classification. In fact, BSS, as a mathematical approach, better characterizes the relationship between components that are actually almost non-orthogonal. Large high-frequency power spectral values do not guarantee physical interpretability; this led to three non-significant extracted sources, namely sources 2, 3 and 4 of Figure 6.

Classification Method Evaluation
In order to evaluate the performance of the proposed approach, we use a traditional supervised method, the Maximum Likelihood (ML) classifier [55]. The ML classifier is a spectral parametric classifier that characterizes the pattern of each class in terms of its probability density function (pdf), the form of which is assumed to be known in advance. The pdfs are usually multivariate Gaussian functions, so one only needs to estimate the mean vector and the covariance matrix. The estimation accuracy of the ML method is generally high. This method makes it possible to design an optimal classifier given a statistical model describing the observations x ∈ X and the hidden state c ∈ C. This statistical model must be estimated from the training set. Characterized by its mean and covariance, the Gaussian distribution has simple analytical properties. Two parameters θ1 and θ2 have to be estimated, which are respectively the mean vector and the variance.
Before discussing the results, we must define the terms used to evaluate them. Firstly, the confusion matrix, or error matrix, displays the degree of misclassification among classes. The quality of the classification is expressed by the number of pixels correctly identified out of the total for the studied area. The confusion matrix is a square matrix whose size equals the number of classes and whose elements represent the number of pixels of each ground-truth class assigned to the corresponding classes. Among the indicators of relative accuracy, we cite the Omission Error (OE) and the Commission Error (CE), which are computed from X_ij, X_il and X_cj, respectively the elements of the confusion matrix, the sums of its row elements and the sums of its column elements.
From there, a global measure representing the average rate of correct classification can be obtained, such as the Kappa coefficient K (31), where N_P, M_c and X_Dii are respectively the total number of data pixels, the total number of existing classes and the diagonal elements of the confusion matrix. In this work, we use the classification Error Rate (ER) to test the performance of the classification.
Figure 11 provides the classification results for the initial bands (Figure 11-a), the sources in the spatial-domain (Figure 11-b) and the sources in the DCT-domain (Figure 11-c). This classification was done using sixteen input classes identified from a ground truth chosen by experts who are familiar with the terrain.
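The Gaussian ML decision rule used for this kind of classification can be sketched as follows: one mean vector and one covariance matrix are estimated per class from training pixels, and each pixel is assigned to the class with the highest Gaussian log-likelihood. The function names, the two-class toy data and the equal-prior assumption are illustrative choices, not details taken from the paper.

```python
import numpy as np

def fit_gaussian_ml(features, labels):
    """Estimate a mean vector and a covariance matrix for each class."""
    params = {}
    for c in np.unique(labels):
        xc = features[labels == c]
        params[int(c)] = (xc.mean(axis=0), np.cov(xc, rowvar=False))
    return params

def log_likelihood(x, mean, cov):
    """Multivariate Gaussian log-density of each row of x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2 * np.pi))

def predict(features, params):
    """Assign each sample to the class with the largest log-likelihood."""
    classes = sorted(params)
    scores = np.column_stack(
        [log_likelihood(features, *params[c]) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Toy training set: two well-separated classes described by 3 features each.
rng = np.random.default_rng(4)
train = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
                   rng.normal(4.0, 1.0, size=(200, 3))])
labels = np.repeat([0, 1], 200)

params = fit_gaussian_ml(train, labels)
predicted = predict(train, params)
accuracy = float(np.mean(predicted == labels))
```

In the setting of this paper, the feature vector of a pixel would hold its values in the initial bands or in the extracted source images.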
The ER is 14.54%, 12.14% and 11.97% for the initial bands, the spatial-domain sources and the DCT-domain sources, respectively.
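The accuracy indicators discussed in this section can be computed from a confusion matrix as in the sketch below. It uses the conventional definitions of the omission error, commission error, Cohen's Kappa coefficient and error rate, and assumes that rows hold the ground-truth classes and columns the assigned classes; both the convention and the 3-class matrix are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def accuracy_indicators(conf):
    """conf[i, j]: pixels of ground-truth class i assigned to class j (assumed convention)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()                       # total number of pixels
    diag = np.diag(conf)                     # correctly classified pixels per class
    row_sums = conf.sum(axis=1)
    col_sums = conf.sum(axis=0)
    omission = 1.0 - diag / row_sums         # truth pixels missed by their class
    commission = 1.0 - diag / col_sums       # pixels wrongly assigned to a class
    chance = np.sum(row_sums * col_sums)
    kappa = (total * diag.sum() - chance) / (total ** 2 - chance)
    error_rate = 1.0 - diag.sum() / total
    return omission, commission, kappa, error_rate

# Hypothetical confusion matrix for 3 classes and 150 pixels.
conf = [[50, 2, 3],
        [4, 40, 1],
        [6, 4, 40]]
omission, commission, kappa, error_rate = accuracy_indicators(conf)
```

For this matrix, 130 of the 150 pixels lie on the diagonal, so the error rate is 20/150, about 13.3%, and the Kappa coefficient is 11930/14930, about 0.80.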
In this context, it is important to assess the value and the potential of the SOSFD algorithm by applying it to another data set.
The proposed approach is then applied to Hyperspectral Mapper (HyMap) data (Figure 12). This sensor can acquire 126 spectral bands at wavelengths between 438 and 2483 nanometres [50]. Figure 13 provides the classification results for the initial bands (Figure 13-a), the sources in the spatial-domain (Figure 13-b) and the sources in the DCT-domain (Figure 13-c). The ER is 24.88%, 19.24% and 17.15% for the initial bands, the spatial-domain sources and the DCT-domain sources, respectively.
These experimental results show that the sources generated in the DCT-domain yield the lowest classification ER and can provide a reliable tool for hyperspectral image classification.

Conclusions
This study confirms the potential of the DCT for certain image-processing tasks. In this paper, we present a novel approach to separate hyperspectral data in the spectral-domain. Indeed, hyperspectral images are strongly correlated, which affects the extraction of significant information linked to the ground truth. The joint application of the source separation method and the DCT allows a more efficient representation of the spectral data and increases the reliability of the analysis of these images. The sources resulting from the new source separation approach are identified reliably thanks to the distinct differences in their power spectra. The main conclusion to be drawn from this study is that applying the second-order source separation approach in the DCT-domain reduces the classification ER of the hyperspectral images. The use of a supervised classification shows that the sources generated in the DCT-domain yield the lowest classification error and the strongest decorrelation between image themes. The ER is 14.54%, 12.14% and 11.97% for the initial bands, the spatial-domain sources and the DCT-domain sources, respectively. When the SOSFD algorithm is applied to another data set, the ER is 24.88%, 19.24% and 17.15% for the initial bands, the spatial-domain sources and the DCT-domain sources, respectively.
To take advantage of the new representation of hyperspectral data, we propose a novel classification approach based on Binary Partition Trees (BPT). The BPT is obtained by iteratively merging regions and provides a combined, hierarchical representation of the image as a tree of regions. The proposed strategy combines spatial information with spectral information by jointly using the adjacency information. Indeed, this methodology is based on taking spatial attributes into account in the model and in the region-merging criterion.

S S =
with the number 1 at the l_th position.
The source covariance matrices in the spatial-domain and in the DCT-domain, for any frequency shift ν_k, can take the following form. Because the inequality (38) is unchanged up to a permutation and a scalar factor, we have, for any k, a relation in which P is a permutation matrix and D is a diagonal matrix. Hence, there exists a unitary matrix V that is essentially equal to U. The form (41) proves the efficiency of the joint diagonality criterion when applied in the DCT-domain rather than in the original spatial-domain.