Multi-Resolution Mosaicing

Constructing high-resolution image mosaics is a significant research topic in computer vision, image processing, and computer graphics. Videos contain a great deal of valuable information, but redundancy between frames creates problems when analyzing, searching, or browsing a video. It is therefore more suitable to bring all the frames of a shot together into a single image called a mosaic. Multi-resolution mosaics are needed when large variations in image resolution must be handled. As the camera zooms in and out, the image resolution changes within the video sequence. Frames that are zoomed in are at higher resolution and contain more detail. If the mosaic is constructed at low resolution, high-frequency information is lost in the regions of the mosaic that correspond to high-resolution frames. If, on the other hand, the mosaic is constructed at the highest resolution, the low-resolution frames are oversampled. Varying image resolutions can be handled by a multi-resolution mosaic data structure in which the zooming information of each pixel is recorded. In a multi-resolution mosaic the user can view the actual zooming information and can distinguish the regions with and without zooming information.


Introduction
As information technology continues to develop, video image mosaics are becoming a focus of research in image processing, computer vision, photogrammetry, and computer graphics [1].
Image mosaicing is a well-known means of effectively increasing the field of view of a camera by allowing numerous views of the same scene to be combined into a single image [2].
Multi-resolution mosaics are mosaics that handle large variations in image resolution. Changes in image resolution occur within the sequence, e.g., as the camera zooms in and out. The zoomed-out frame would have to be magnified using some form of pixel replication or interpolation [3].
In a video that includes zooming camera motion, not all frames are captured at the same resolution. Frames in which objects are zoomed in usually contain more detail. When these frames are registered, they need to be resized so that they align on the single coordinate plane of the mosaic image. Resizing the zoomed images causes loss of spatial information: when the size of an image is reduced, a large number of pixels have to be either deleted or overwritten. Consider a video that zooms in on a particular panoramic scene. Although the total number of pixels per image, also known as the resolution, is the same for all frames of the video, the objects in the zoomed frames have more detail. All frames of the video sequence have to be mapped to a single reference frame, that is, the coordinate system of the final mosaic image.
A multi-resolution mosaic is a new kind of mosaic that preserves the actual resolution of each object in a video. The information loss that occurs when frames are transformed for alignment is overcome. The zooming information of each pixel is recorded in a multi-resolution data structure. In a multi-resolution mosaic the user can view the actual zooming information of every object.
Image mosaic technology is an important and efficient way to acquire more extensive scenes. The development of digital imaging is flourishing owing to the rapid growth of information technology [4].
Several techniques presented in the literature are robust against illumination variations, moving objects, image rotation, and image noise [5]. Researchers have focused on the parallax effect and object motion, which result in misalignment of frames [6]. Proper blending of image features at different resolutions, i.e., multi-resolution analysis, was introduced by Su, M.S., et al. [7]. Images that are misaligned due to subpixel translation, rotation, or shear are difficult to fully re-align, and stitching such images can result in a mosaic in which discontinuities are clearly visible. Li, J.S. and Randhawa, S. present a method for creating a seamless mosaic that reduces these discontinuities [8].
The high-resolution regions, which reflect the camera's varying focal length, need not be limited to a single area within the still image. If the camera zoomed in on an area and then panned, there would be a "stripe" of high resolution. If the camera zoomed in and out while panning, there would be several regions of high resolution. For such scenarios L.A. Teodosio and W. Bender have presented an approach in which the mosaic is constructed at the overall highest resolution, scaling the low-resolution frames by interpolation [9].
Interactive visualization of high-resolution, multi-resolution images in 2D has been addressed by I. Trotts et al., who describe a method for interactive visualization of multi-resolution image stacks. The technique relies on accessing image tiles from multi-resolution image stacks in such a way that, from the observer's view, all image tiles appear approximately the same size even if they are accessed from different tiers within the images comprising the stack. This technique enables efficient navigation of high-resolution image stacks [10].
The techniques described so far use a single-resolution compositing surface to blend all of the images together. In many applications, spatially varying amounts of resolution may be required, e.g., for zooming in on areas of interest [11].
The concept of multi-resolution mosaicing has been explored differently by different researchers, but the idea discussed and proposed here is new. When a zoomed frame is transformed, the original pixel intensity values are lost. The main challenge in forming a multi-resolution mosaic is to create a multi-resolution structure that can store the maximum zoomed intensity values for each pixel. Another important issue is the display of multiple resolutions (pixel intensities) to the user: when multiple resolutions exist in a mosaic, how can the user be shown which areas have zooming information and which do not?

Multi-Resolution Mosaicing Technique
The proposed technique creates a multi-resolution mosaic using a multi-resolution data structure. The idea is to develop a data structure that stores the indexes of the pixels that compose each pixel intensity of the transformed frame. A zoomed image is transformed using bilinear interpolation to align it with the unzoomed reference image. The technique begins by determining the SIFT (Scale Invariant Feature Transform) features of the two images to be stitched. The SIFT features are then matched and the best matches are extracted. Based on the matched features, the scaling factor is computed to determine whether the frame is zoomed or not. The image transformation phase is shown in the dotted block in Fig 1. The technique creates a multi-resolution data structure that maps mosaic image pixels to their maximum zoomed-in intensities. The user can select any area of interest, and the multi-resolution data structure provides the zooming details of the selected region. The suggested algorithm is simple and efficient in terms of computational complexity. The technique for multi-resolution mosaic creation is as follows:

A. Feature Detection
Feature points are detected using the Scale Invariant Feature Transform (SIFT) feature detector. Scale-invariant features are also called SIFT features. SIFT features are local image features that are invariant to rotation, scale, and illumination, and robust to viewpoint changes, affine changes, and noise. The SIFT algorithm is robust in detecting feature points, but it is also complex and its time complexity is high [1,12].
The major stages of computation used to generate the set of SIFT features are as follows. See [9] for details.
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination. Each keypoint is represented by a 128-element feature vector.
B. Correlation
The correspondence between the extracted SIFT features must then be found. The indexes of the two sets of SIFT descriptors that match according to a distance ratio are determined. The distance between all pairs of descriptors in the two corresponding sets of descriptors is computed. If the ratio for an entry is smaller than the given distance ratio, the match is kept; otherwise the index for this entry is set to zero.
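The matching step can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes the common nearest-neighbour ratio test, in which a match is kept only when the nearest distance is smaller than the distance ratio times the second-nearest distance. The function name `match_descriptors`, the default ratio of 0.6, and the use of `None` (rather than a zero index) for unmatched entries are assumptions made for this example.

```python
import math

def match_descriptors(desc_a, desc_b, dist_ratio=0.6):
    """For each descriptor in desc_a, return the index of its match in desc_b, or None.

    dist_ratio is an assumed threshold; the source does not state its value.
    """
    matches = []
    for da in desc_a:
        # Euclidean distance from this descriptor to every descriptor in desc_b,
        # sorted so that the nearest and second-nearest come first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < dist_ratio * dists[1][0]:
            matches.append(dists[0][1])   # unambiguous nearest neighbour
        else:
            matches.append(None)          # ambiguous or no match
    return matches

# Toy 2-element descriptors (real SIFT descriptors have 128 elements).
frame_a = [(0, 0), (10, 10), (5, 5)]
frame_b = [(0, 1), (10, 9), (20, 20)]
print(match_descriptors(frame_a, frame_b))
```

The third descriptor is equally far from two candidates, so the ratio test rejects it as ambiguous.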
C. Determine the Scaling Factor
After obtaining the points of correspondence between two successive frames, it is possible to generate correspondence maps between the two images. These correspondence maps can be used to combine the images into an aggregate structure. Using the MATLAB transformation function, a spatial transformation matrix of the following form is obtained:

$$T = \begin{bmatrix} sc & -ss \\ ss & sc \\ tx & ty \end{bmatrix}$$

$$sc = scale \cdot \cos(angle) \qquad (3)$$

with ss the corresponding sine term, scale*sin(angle). The scale is obtained from the transformation matrix as

$$Scale = \sqrt{ss^2 + sc^2} \qquad (4)$$

Here sc and ss are the cosine and sine components of the scaled rotation, tx is the translation along the x axis, and ty is the translation along the y axis. Linear conformal transformations can include a rotation, a scaling, and a translation. Shapes and angles are preserved; parallel lines remain parallel and straight lines remain straight.
Based on observations and experiments, the scale threshold is set to 0.9. If the scale factor is less than the threshold, the current frame is zoomed with respect to the reference frame.
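As a rough illustration of the scale test, the sketch below fits a rotation, scaling, and translation to matched points by least squares and then applies equation (4) and the 0.9 threshold. It is not the MATLAB routine used in the paper: the complex-number formulation, the function names, and the sign convention chosen for ss are assumptions made for this example.

```python
import math

def fit_similarity(src_pts, dst_pts):
    """Least-squares fit of dst = a*src + b, with points treated as complex numbers.

    The complex factor a encodes rotation and scale; b encodes translation.
    """
    src = [complex(x, y) for x, y in src_pts]
    dst = [complex(u, v) for u, v in dst_pts]
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((d - dst_mean) * (s - src_mean).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    b = dst_mean - a * src_mean
    return a, b

def scale_and_zoom_flag(src_pts, dst_pts, threshold=0.9):
    """Return (scale, is_zoomed) for points mapped from the current frame to the reference."""
    a, _ = fit_similarity(src_pts, dst_pts)
    sc, ss = a.real, -a.imag          # sign of ss depends on the matrix convention (assumed)
    scale = math.hypot(ss, sc)        # equation (4): sqrt(ss*ss + sc*sc)
    return scale, scale < threshold

# Hypothetical matched points: the current frame shows the object twice as large.
current = [(0, 0), (2, 0), (2, 2), (0, 2)]
reference = [(10, 5), (11, 5), (11, 6), (10, 6)]
print(scale_and_zoom_flag(current, reference))
```

For these points the fitted scale is 0.5, which is below the threshold, so the current frame is classified as zoomed.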

D. Resizing Zoomed Frame and Preserving Pixel Intensities
The high-resolution frames represent the regions in the video sequence where the camera zooms in to capture an object of interest at higher resolution. For a zoomed frame the scaling factor is observed to be less than the specified threshold. The zoomed-in frame is resized using bilinear interpolation so that it aligns with the reference image. Consider the sequence of successive frames in Fig 2, where every frame is at higher resolution than the previous one. As shown in Fig 2, when the camera zooms in, the resolution of the vehicle increases step by step in successive frames. Because the zoomed frame is resized for alignment, the number of pixels in the interpolated frame is reduced, resulting in information loss. The aim of the proposed technique is to preserve the information lost in resizing.
For bilinear interpolation, the output pixel value is a weighted average of the pixels in the nearest 2-by-2 neighborhood. The loss of original pixel intensities during interpolation is demonstrated in Fig 3. In the proposed technique, for each pixel P in the transformed image, the x and y indexes of P(1, 1) are saved; from this initial index the indexes of the other three neighbouring pixels can be computed. The scale determines the size of the transformed image. Let I be the zoomed image and I_t the transformed image after interpolation. Let r1 and c1 be the numbers of rows and columns in the zoomed frame. After resizing:

$$r_2 = scale \cdot r_1 \qquad (5)$$

$$c_2 = scale \cdot c_1 \qquad (6)$$

Here r2 and c2 are the numbers of rows and columns of the transformed image I_t. The row and column ratios of the zoomed image I to the transformed image I_t, x_ratio and y_ratio, are used to calculate the x and y coordinates of the pixels in the zoomed image I that contribute to the computation of the interpolated pixel intensities in I_t. As described earlier, if these initial x and y indexes are stored for each pixel P in the transformed image, the actual pixel intensities can be retrieved from the zoomed frame.
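The index bookkeeping described above might look like the following sketch. The exact definitions of x_ratio and y_ratio are not recoverable from the text, so the forms r1/r2 and c1/c2 used here are assumptions, as are the function name, the clamping at the image border, and the choice of 1-based indexes (which leaves 0 free to mean "no zooming information").

```python
def source_index_map(r1, c1, scale):
    """For every pixel of the resized image, return the 1-based (x, y) index of the
    first (top-left) pixel of the 2-by-2 block in the zoomed frame that feeds it."""
    r2, c2 = round(r1 * scale), round(c1 * scale)   # equations (5) and (6)
    x_ratio, y_ratio = r1 / r2, c1 / c2             # assumed form of the ratios
    index_map = []
    for i in range(r2):
        row = []
        for j in range(c2):
            # Clamp so the whole 2-by-2 block stays inside the zoomed frame.
            x = min(int(i * x_ratio), r1 - 2)
            y = min(int(j * y_ratio), c1 - 2)
            row.append((x + 1, y + 1))              # convert to 1-based indexes
        index_map.append(row)
    return index_map

# A hypothetical 4-by-6 zoomed frame reduced by a scale of 0.5.
for row in source_index_map(4, 6, 0.5):
    print(row)
```

Only one index pair is recorded per output pixel; the other three contributing pixels follow from it by adding one to the row, the column, or both.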
The total number of pixels in the resized image is r2*c2. It is therefore necessary to record r2*c2 indexes in order to create a mapping matrix from the mosaic image to the real zoomed-in intensities for each pixel in the image matrix. This requires a data structure of the same size as the mosaic image that keeps track of which pixels have zooming information and which do not.
In equation (18), A, B, C and D are the four closest pixel values used in bilinear interpolation. A single value is assigned to the output pixel by computing a weighted average of these pixels in the vicinity of the point, with weightings based on the distance of each pixel from the point.
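For reference, the textbook form of this weighted average is given below. It is the standard bilinear interpolation formula and not necessarily the exact notation of the original equation. The symbols d_x and d_y (the fractional offsets of the sample point from A along x and y, each between 0 and 1) are assumptions, as is the layout in which B is the neighbour of A along y, C the neighbour along x, and D the diagonal neighbour.

```latex
I_t = (1 - d_x)(1 - d_y)\,A + (1 - d_x)\,d_y\,B + d_x\,(1 - d_y)\,C + d_x\,d_y\,D
```

The four weights sum to one, so the result always lies between the smallest and largest of A, B, C and D; when d_x = d_y = 0 the output equals A exactly.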
Here I_t denotes the interpolated pixel intensities in the transformed image. The proposed technique saves indexes rather than the pixel intensities A, B, C and D that compose the interpolated intensities. Fig 4 shows the interpolated resized image and the corresponding zoomed frame, in which A, B, C and D are the four closest pixels. After interpolation the four actual intensities are lost and a single new intensity value is formed, so it is necessary to keep track of the four composing pixel intensities for each pixel. This idea can be implemented in two ways: by recording the indexes of the composing pixels or by recording their intensities. The first approach is preferable. It is efficient in terms of time and memory, as only the x and y index of one initial pixel needs to be recorded instead of four intensity values for each pixel in the interpolated image. Storing intensities would also save a lot of redundant information: for example, if pixel 1 is formed from pixels (2,1), (2,2), (3,1) and (3,2) and pixel 2 is formed from (3,1), (3,2), (4,1) and (4,2), the intensities at (3,1) and (3,2) are stored twice. For high-resolution images this would consume a lot of memory.
Hence, the x and y coordinates of 'A' must be recorded for each interpolated pixel in I_t. If the size of the transformed image I_t is r2*c2, a data structure that stores r2*c2 'x' and 'y' coordinates is required.
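A small sketch of why one stored index is enough: given the 1-based index of A, the intensities A, B, C and D can be read back from the zoomed frame. The function names, the list-of-lists image representation, and the neighbour layout (B to the right of A, C below A, D diagonal) are assumptions for illustration.

```python
def neighbours_from_index(zoomed, x, y):
    """Return the intensities (A, B, C, D) given the 1-based index (x, y) of A."""
    r, c = x - 1, y - 1                      # convert to 0-based list indexes
    return (zoomed[r][c], zoomed[r][c + 1],
            zoomed[r + 1][c], zoomed[r + 1][c + 1])

def storage_counts(r2, c2):
    """Values stored per image: one (x, y) index pair versus four intensities per pixel."""
    return 2 * r2 * c2, 4 * r2 * c2

# Hypothetical 3-by-3 zoomed frame; A is at index (2, 1).
zoomed_frame = [[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]]
print(neighbours_from_index(zoomed_frame, 2, 1))
print(storage_counts(240, 320))
```

For a 240*320 image, recording index pairs needs half as many stored values as recording four intensities per pixel, and it avoids storing shared neighbours more than once.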

E. Formation of Multi-Resolution Mapping Matrix
During image transformation many pixel intensities are lost, and the details in the zoomed image are no longer visible in the final mosaic image. To record the initial indexes of the composing pixels, a multi-resolution data structure is proposed that creates an efficient mapping from the mosaic to the original zoomed pixel intensities.
For example, consider a 240*320 mosaic image. Fig 5 shows a data structure, equal in size to the mosaic image, that illustrates how the maximum zoom intensities for each pixel in a 240*320 mosaic image are stored. A '0' against a pixel indicates that the pixel does not contain any zooming information, whereas non-zero values represent the indexes of the first closest pixel contributing to its interpolated intensity.
Whenever an image is zoomed, some pixels are zoomed in whereas others are not. The multi-resolution "mapping matrix" is used to keep track of zoomed and unzoomed pixels in the mosaic. For zoomed pixels, it keeps the starting index of the actual corresponding pixels in the original zoomed frame.
As shown in Fig 5, only the x and y index of pixel 'A' is stored in the multi-resolution matrix. This index represents the first closest pixel in the actual zoomed frame. The indexes of the other three composing pixels B, C and D can be obtained from the coordinates of A. In Fig 6, the indexes (2,1), (2,2), (3,1) and (3,2) are the four neighbouring indexes of the A, B, C and D intensity values in the maximum zoomed frame that have formed pixel 1.
As shown in Fig 6, four indexes are represented for each zoomed pixel in the mosaic image. Of these four indexes, the starting index, i.e., (2, 1) for pixel 1, is stored in the map, as shown in the multi-resolution mapping matrix. These indexes correspond to the four neighbouring pixel intensities in the actual zoomed frame.
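The mapping matrix described in this section can be sketched as a mosaic-sized grid that holds 0 where no zooming information exists and a starting index pair elsewhere. The function names, the `top` and `left` placement parameters, and the use of Python tuples for the stored indexes are assumptions made for this example; positions in the mosaic are 0-based here, while the stored indexes remain 1-based as in the text.

```python
def build_mapping_matrix(mosaic_rows, mosaic_cols, index_map, top, left):
    """Create a mosaic-sized matrix: 0 means no zooming information, otherwise
    the entry is the starting (x, y) index of pixel A in the zoomed frame."""
    mapping = [[0] * mosaic_cols for _ in range(mosaic_rows)]
    for i, row in enumerate(index_map):
        for j, start_index in enumerate(row):
            # Place the resized frame's indexes where the frame sits in the mosaic.
            mapping[top + i][left + j] = start_index
    return mapping

def has_zoom_info(mapping, row, col):
    """True when the mosaic pixel at (row, col) maps to zoomed intensities."""
    return mapping[row][col] != 0

# Hypothetical 4-by-5 mosaic with a 2-by-2 zoomed patch placed at row 1, column 2.
patch = [[(1, 1), (1, 3)],
         [(3, 1), (3, 3)]]
mapping = build_mapping_matrix(4, 5, patch, top=1, left=2)
for row in mapping:
    print(row)
```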

Viewing The Region Of Interest Selected By The User
When the final mosaic has been created, the user can move the mouse over the image and select a rectangular region of interest. If the selected area has any further resolution, it is displayed. Three different cases can occur. First, the selected region contains some pixels that have zooming information and some that do not. Second, none of the pixels in the region have zooming information. Third, all the pixels in the selected area have zooming information. The first and most complex case is considered first, followed by the other two. In the first case, a "multi-resolution view" must be created for the viewer.
First, the pixels between the starting and ending points, marked in Fig 7, are traced in the multi-resolution "mapping matrix". Pixels that have non-zero indexes in the matrix have zooming information. As Fig 8 shows, some pixels in the selected region are zoomed whereas others are not. For example, for the region selected in the mosaic in Fig 7, the starting pixel coordinates are (2, 10) and the ending pixel coordinates are (220, 133).
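Tracing a selected region in the mapping matrix, and deciding which of the three cases applies, could be sketched as below. The function name, the case labels "mixed", "none" and "all", and the 0-based inclusive corner coordinates are assumptions for illustration.

```python
def trace_region(mapping, start, end):
    """Scan the selected rectangle (inclusive corners, 0-based) in the mapping matrix.

    Returns (case, bbox): case is "mixed", "none" or "all"; bbox is the bounding
    box ((x1, y1), (x2, y2)) of the pixels with zooming information, or None.
    """
    (r0, c0), (r1, c1) = start, end
    zoomed = [(r, c)
              for r in range(r0, r1 + 1)
              for c in range(c0, c1 + 1)
              if mapping[r][c] != 0]          # non-zero entry: zooming information
    if not zoomed:
        return "none", None
    rows = [r for r, _ in zoomed]
    cols = [c for _, c in zoomed]
    bbox = ((min(rows), min(cols)), (max(rows), max(cols)))
    total = (r1 - r0 + 1) * (c1 - c0 + 1)
    return ("all" if len(zoomed) == total else "mixed"), bbox

# Hypothetical 4-by-4 mapping matrix with a zoomed 2-by-2 block in the middle.
mapping = [[0, 0, 0, 0],
           [0, (1, 1), (1, 3), 0],
           [0, (3, 1), (3, 3), 0],
           [0, 0, 0, 0]]
print(trace_region(mapping, (0, 0), (3, 3)))
```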
As shown in Fig 8, the sub-region containing zooming information and the sub-region without any zooming information are indicated. The pixels having zooming information start at (x1, y1), i.e., (17, 20), and end at (x2, y2), i.e., (150, 133). The "zooming factors" are then determined as follows:

Zooming_x = New_height / Old_height (24)
Zooming_y = New_width / Old_width (25)

Zooming_x and Zooming_y are the scaling or zooming factors along the x and y directions of the region having zooming information. Since part of the selected area has no zooming information, a "multi-resolution view" must be created for the user in which the region with actual zooming information and the region without it are distinguishable. Let 'rows' and 'cols' be the numbers of rows and columns of the whole region selected by the user. The region with zooming information has a greater number of rows and columns than the region without zooming information, which hinders the formation of a rectangular matrix that can be displayed using MATLAB display functions. To overcome this limitation, the region without zooming information is resized using the same zooming factors. The whole selected region is therefore first resized to new dimensions calculated as follows:

New_rows = Zooming_x * rows (26)
New_columns = Zooming_y * cols (27)

The resized image is shown in Fig 10.
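Equations (24) to (27) amount to the short calculation sketched below. The function name and the numeric values in the usage line are hypothetical, and rounding the new dimensions to whole pixels is an assumption.

```python
def resized_region_shape(rows, cols, old_height, old_width, new_height, new_width):
    """Apply equations (24)-(27): zooming factors and the new size of the selected region."""
    zooming_x = new_height / old_height      # equation (24)
    zooming_y = new_width / old_width        # equation (25)
    new_rows = round(zooming_x * rows)       # equation (26)
    new_columns = round(zooming_y * cols)    # equation (27)
    return zooming_x, zooming_y, new_rows, new_columns

# Hypothetical numbers: a 50-by-40 sub-region whose zoomed source spans 100-by-80
# pixels, inside a 100-by-80 selection.
print(resized_region_shape(100, 80, 50, 40, 100, 80))
```

Resizing the whole selection by the same factors gives a single rectangular matrix into which the actual zoomed intensities can be placed.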
In the resized image shown in Fig 10, part of the region has actual zoomed pixel intensities. These intensities are retrieved using the multi-resolution mapping data structure.
When all the pixels in the region selected by the user have zooming detail (the third case), the procedure is simpler than in the case described above. The zoomed intensities are obtained as described for the previous case and displayed to the viewer; no merging of zoomed and unzoomed pixels is required.
In the second case, none of the pixels in the selected region have zooming information, and no information exists in the multi-resolution mapping matrix for such pixels, as depicted by '0' in the matrix. The region is therefore simply resized by a factor of 1.25 and displayed to the user, which lets the user understand that there is no actual zooming information in the selected area.

Image Stitching
To create the mosaic image, the transformed image must be aligned with the reference image. The images are placed on identically sized canvases and then combined to create the final image. The images are aligned using the offsets of the transformed image with respect to the original image. The offset is calculated using an affine transformation. The offset of the initial point of the transformed image with respect to the initial point of the static image can be positive or negative. This information, along with the image sizes, allows not only the summation of the transformed and static images but also the creation of an appropriate canvas for the resultant image.
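How a signed offset and the two image sizes determine the canvas can be sketched as follows. The function name, the (row, column) ordering, and the example sizes are assumptions made for illustration; this is not the paper's MATLAB code.

```python
def canvas_layout(ref_shape, trans_shape, offset):
    """Compute the canvas size and where each image is placed on it.

    ref_shape, trans_shape: (rows, cols) of the reference and transformed images.
    offset: (row, col) of the transformed image's initial point relative to the
            reference image's initial point; either component may be negative.
    """
    (rh, rw), (th, tw) = ref_shape, trans_shape
    dr, dc = offset
    top, left = min(0, dr), min(0, dc)                 # extend upwards or leftwards if negative
    bottom, right = max(rh, dr + th), max(rw, dc + tw)
    canvas = (bottom - top, right - left)
    ref_pos = (-top, -left)                            # reference origin on the canvas
    trans_pos = (dr - top, dc - left)                  # transformed origin on the canvas
    return canvas, ref_pos, trans_pos

# Hypothetical case: the transformed frame starts 20 rows above and 300 columns
# to the right of the reference frame's initial point.
print(canvas_layout((240, 320), (120, 160), (-20, 300)))
```

A negative offset shifts the reference image on the canvas instead of the transformed one, so both images always land at non-negative positions.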

Results and Analysis
To evaluate the proposed technique, the algorithm was tested on various video sequences. Some parameters are defined for the analysis. The qualitative parameters are computational complexity, performance loss, and processing overhead.
The videos selected for experimentation are those that include single-object zooming. Since the proposed algorithm is new and no comparable multi-resolution mosaicing algorithm exists, no comparative study with other techniques was performed.

A. Computational Analysis
The algorithm was tested on various video sequences, and the execution times are shown in TABLE I. The results in TABLE I indicate that the algorithm is time efficient: despite the complexity and the large amount of computation involved, its execution time is acceptable. No execution time comparison is given because no other multi-resolution mosaicing technique of this kind has been presented so far. Compared with simple mosaicing techniques, the memory requirements of the presented multi-resolution technique are high. The execution times were recorded in MATLAB; an implementation in another appropriate tool such as C# is expected to be more efficient.
B. Visual Results
The visual results of the mosaic are most important. The visual quality of a panorama depends on the type of camera, the environment, and the weather. The panorama created from video sequence 1 is shown in Fig 12. It was taken with a handheld digital camera and shows noticeable intensity variation. The other two video sequences were taken with a Sony digital camcorder with automatic brightness control, which causes intensity variation.