A Model Software Reuse Repository with an Intelligent Classification and Retrieval Technique

The essence of software reuse is the use of engineering knowledge or artefacts from existing software components to build a new system. Software reuse can significantly improve the quality of software products and reduce the overall development cost. A software reuse repository must be designed and developed so that developers can easily locate the components that meet their requirements. This work proposes a new methodology for efficient classification and retrieval of multimedia software components based on user requirements, using an attribute classification scheme with a genetic algorithm. The genetic algorithm classifies the reusable software components in an intelligent manner, and the system retrieves components based on the requirements of the developers.


Introduction
Software reuse is the use of engineering knowledge or artefacts from existing software to build new systems [1]. The most commonly reused product is source code, but other work products such as the design, documentation, architecture, test data, tools and requirement specifications can also be reused.
Software reuse is an important area of software engineering research that promises significant improvements in software productivity and quality [4]. Reuse has the potential to reduce cost, increase the quality of the products and shorten the time of software development. Reuse makes sense because the similarity found across software systems is significant. It is usually found that 60% to 70% of one development activity is common to the next activity. From this point of view, software reuse can be promoted as a productivity and quality enhancement. Cost pressure is also increasing nowadays, for example in the telecommunication and banking domains [3].
The biggest problem of software reusability in many organisations is the ability to locate and retrieve existing software components. Overcoming this problem requires the ability to organize and catalogue collections of software components and to search a collection quickly to identify candidates for potential reuse, so that developers can incorporate these components into new applications that meet the requirements.
A high-quality reuse repository tool must hold a wide variety of high-quality components, organize them efficiently using a classification technique, and retrieve the components that best match user requirements. Effective software reuse requires that the users of the system have access to appropriate components. The user must be able to access these components accurately and quickly and, if necessary, to modify them [2,16]. This paper focuses on a new methodology for intelligent classification and retrieval of software components from the reuse repository. The method uses a genetic algorithm to classify the components in the repository and retrieves the best-fit components based on the user requirements.
This paper is organized into the following sections. Section 2 is the literature survey, which describes the existing classification techniques used to classify the components in a repository. Section 3 describes the architecture of the proposed system in detail. Section 4 describes the intelligent classification and retrieval technique in two phases, the component classification phase and the retrieval phase. Section 5 explains the genetic algorithm for identifying the classifiers. Section 6 deals with the experiments and the results. Section 7 explains the graphs of the experimental results in detail. Section 8 presents the conclusion and future work, followed by the references.

Literature Survey
Several schemes have been used to classify the components in a software library, including free text, enumerated, attribute and integrated classification [16]. Free text classification retrieves components by searching the text of their documentation for keywords; the UNIX manual system is an example of free text retrieval. A drawback of this method is its dependence on the keywords used, which may return many irrelevant components, and it carries large overheads in the time taken to make a query. Enumerated classification arranges classes in a hierarchy, as is done for books in a library, and is one-dimensional. The attribute classification scheme uses attributes to classify a component [6].

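Intelligent Classification and Retrieval
The intelligent classification and retrieval of software components is divided into two phases: the component classification phase, in which the classifiers are discovered, and the component retrieval phase.

Phase 1: Component Classification
In the classification phase, a genetic algorithm attempts to discover several different classifiers, each of which classifies a number of software components into a homogeneous set in terms of characteristics. The classifier sets may have common elements, because the classification process is based on component characteristics and attempts to find large groups of components with common values. A large number of components is therefore classified against a small number of classifiers.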
Searching for a component is performed by examining the user preferences against the classifiers rather than against the actual components, which results in a fast search. The threshold parameter specifies how similar a component must be to a classifier, measured as the number of perfectly matched characteristics.
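The threshold test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that components and classifiers are 36-bit strings (the length used in the experiments), that the 9 characteristics are encoded as equal 4-bit fields, and that a characteristic matches only when its field is identical. The function names are hypothetical.

```python
from typing import List

N_CHARACTERISTICS = 9          # number of characteristics stated in the paper
BITS_PER_CHARACTERISTIC = 4    # assumption: 36 bits split evenly over 9 fields


def split_characteristics(bits: str) -> List[str]:
    """Split a 36-character bit string into 9 equal-width fields."""
    if len(bits) != N_CHARACTERISTICS * BITS_PER_CHARACTERISTIC:
        raise ValueError("expected a 36-bit string")
    return [bits[i:i + BITS_PER_CHARACTERISTIC]
            for i in range(0, len(bits), BITS_PER_CHARACTERISTIC)]


def similarity(component: str, classifier: str) -> float:
    """Fraction of characteristics whose bit fields match exactly."""
    pairs = zip(split_characteristics(component),
                split_characteristics(classifier))
    matches = sum(1 for a, b in pairs if a == b)
    return matches / N_CHARACTERISTICS


def belongs_to(component: str, classifier: str, threshold: float) -> bool:
    """True if the component is similar enough to fall under the classifier."""
    return similarity(component, classifier) >= threshold
```

For example, a component that agrees with a classifier on 5 of the 9 characteristics has a similarity of 5/9 (about 56%), so it belongs to that classifier at a 50% threshold but not at a 60% threshold.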

Phase 2: Component Retrieval
In the component retrieval phase, the user searches for a specific component. First, the user enters, through an interface, the desired characteristics of the component to be retrieved from the reuse repository. Second, the user sets the matching threshold value: the lower the threshold, the more components are returned, while a higher threshold returns only the components that closely match the characteristics entered by the user.
The system encodes the user request as a bit string and compares it against all the classifiers that were discovered in the classifier discovery phase. The closest match signifies the "winning" classifier, and the components classified under the winning classifier are returned.
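The retrieval step can be sketched as follows. The sketch assumes the same 4-bit field encoding as the experiments suggest (36 bits for 9 characteristics), a hypothetical `classes` dictionary that maps each classifier bit string to the names of the components classified under it, and one possible reading of the threshold: the request must match the winning classifier on at least that fraction of characteristics, otherwise nothing is returned. The paper does not spell out this last detail.

```python
from typing import Dict, List, Optional, Tuple

FIELD_WIDTH = 4  # assumption: each characteristic occupies 4 bits


def matched_fields(a: str, b: str, width: int = FIELD_WIDTH) -> int:
    """Number of fixed-width fields that are identical in both bit strings."""
    return sum(1 for i in range(0, len(a), width)
               if a[i:i + width] == b[i:i + width])


def retrieve(request: str, classes: Dict[str, List[str]],
             threshold: float) -> Tuple[Optional[str], List[str]]:
    """Return the winning classifier and the components classified under it."""
    n_fields = len(request) // FIELD_WIDTH
    # The classifier with the most matching characteristics wins.
    best = max(classes, key=lambda c: matched_fields(request, c), default=None)
    if best is None or matched_fields(request, best) / n_fields < threshold:
        return None, []  # no classifier is close enough to the request
    return best, classes[best]
```

Because the request is compared only with the classifiers, the cost of a query grows with the number of classifiers rather than with the number of components in the repository.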

Genetic Algorithm for Identifying the Classifiers
A dedicated genetic algorithm [5] was developed to evolve candidate classifiers and select the optimal solution in terms of the number of components in the corresponding classes. It works in discrete steps as follows:
1. Create a random population of 100 chromosomes (potential classifiers).
2. For every generation of the genetic algorithm:
2.1 Apply the crossover operation to every pair of classifiers, where each pair is randomly selected according to the crossover probability.
2.2 Apply mutation to a randomly selected classifier according to the mutation probability.
2.3 If the average fitness of the current generation is greater than that of the previous generation, create a new population by selecting chromosomes according to their fitness; otherwise do not create a new population. In either case, repeat step 2.
The above algorithm is repeated until a termination condition is reached. In our case the algorithm terminates if no improvement in the average fitness of the population is observed for 100 generations. A very important parameter is the threshold value, which determines whether a component belongs to a certain classifier. For example, a value of 40% means that at least 40% of the values of the classifier characteristics are identical to those of a component. This threshold value essentially determines how successful a classifier is at gathering a large number of components in its class.
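The steps above can be sketched in Python under several assumptions that the paper does not state: classifiers are 36-bit strings made of nine 4-bit characteristics, fitness is the number of components a classifier covers at the given threshold, crossover is single-point, mutation flips one bit, selection is fitness-proportional, and a generation that does not improve the average fitness is discarded. The probabilities `p_cross` and `p_mut` and the `max_generations` safety cap are illustrative values, not the authors' settings.

```python
import random
from typing import List, Tuple

FIELD_WIDTH = 4        # assumption: 9 characteristics x 4 bits = 36 bits
CHROMOSOME_BITS = 36   # component and classifier length used in the experiments


def fitness(classifier: str, components: List[str], threshold: float) -> int:
    """Count the components whose share of exactly matching fields meets the threshold."""
    n_fields = CHROMOSOME_BITS // FIELD_WIDTH
    count = 0
    for comp in components:
        matches = sum(1 for i in range(0, CHROMOSOME_BITS, FIELD_WIDTH)
                      if comp[i:i + FIELD_WIDTH] == classifier[i:i + FIELD_WIDTH])
        if matches / n_fields >= threshold:
            count += 1
    return count


def evolve(components: List[str], threshold: float, pop_size: int = 100,
           p_cross: float = 0.7, p_mut: float = 0.05, patience: int = 100,
           max_generations: int = 10_000, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Return the final population of classifiers and their fitness scores."""
    rng = random.Random(seed)
    # Step 1: random initial population of candidate classifiers.
    population = ["".join(rng.choice("01") for _ in range(CHROMOSOME_BITS))
                  for _ in range(pop_size)]
    scores = [fitness(c, components, threshold) for c in population]
    stale = 0  # consecutive generations without improvement
    for _ in range(max_generations):  # safety cap (assumption, not in the paper)
        if stale >= patience:
            break
        candidates = population[:]
        # Step 2.1: single-point crossover on randomly formed pairs.
        order = list(range(pop_size))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            if rng.random() < p_cross:
                cut = rng.randrange(1, CHROMOSOME_BITS)
                x, y = candidates[a], candidates[b]
                candidates[a] = x[:cut] + y[cut:]
                candidates[b] = y[:cut] + x[cut:]
        # Step 2.2: flip one bit of one randomly selected classifier.
        if rng.random() < p_mut:
            i = rng.randrange(pop_size)
            pos = rng.randrange(CHROMOSOME_BITS)
            bit = "1" if candidates[i][pos] == "0" else "0"
            candidates[i] = candidates[i][:pos] + bit + candidates[i][pos + 1:]
        cand_scores = [fitness(c, components, threshold) for c in candidates]
        # Step 2.3: equal population sizes, so comparing sums compares averages.
        if sum(cand_scores) > sum(scores):
            weights = [s + 1 for s in cand_scores]  # +1 keeps every weight positive
            population = rng.choices(candidates, weights=weights, k=pop_size)
            scores = [fitness(c, components, threshold) for c in population]
            stale = 0
        else:
            stale += 1  # candidates are discarded; the old population is kept
    return population, scores
```

A call such as `evolve(components, threshold=0.5)` returns the evolved classifiers together with the number of components each one covers; the best classifiers can then be picked from the scores.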

Experimentation and Results
The first phase of the experiments was concerned with the classification of a pool of components. In the second phase we investigated the retrieval of specific components.

Classification Phase
For the classification phase we randomly created 1000 components, each comprising 36 bits. The results reported are averages over 100 runs. The classification of the components is based on the 9 characteristics described in Section 4. The threshold parameter is of paramount importance to our method, since it is a measure of similarity between the component characteristics and the classifier characteristics. We set the threshold to 30%, 40%, 50%, 60%, 70% and 80% for comparison purposes. The value of 30% produced classifiers that each classified almost all of the available software components, which means that the derived classifiers cannot differentiate between the components. The threshold value of 80% did not produce good results either, because each classifier classified only between one and three components, which is also undesirable as it leaves many components unclassified.
The results for the threshold values of 40%, 50% and 60% are listed in Table 1. The "Average" column denotes the average number of components classified by each classifier, while "Not Classified" denotes the number of unclassified components. The results for 50% are quite successful, since there are no unclassified components and each classifier includes almost half of the components (47%). Thus, in the retrieval phase only half of the components need to be searched. Along the same lines, the value of 60% is also satisfactory since each class contains a small number of components (58 on average), but there is a significant number of unclassified components. The threshold value of 40% did not perform well, as its classifiers each classified almost all of the components.

Retrieval Phase
To test the retrieval phase, we created 10 random user requests searching for software components. The threshold value was then set from 40% to 70% in increments of 10%, as shown in Table 2. We can observe that the 40% threshold returned a larger number of components but, as expected, not all of them were relevant to the user's requirements. The 50% and 60% values retrieved fewer but more relevant components. The 70% threshold returned results for some queries only, but the components it retrieved exactly matched the user's request.

Graph for Comparing the Search Effectiveness of Various Classification Schemes
Search effectiveness refers to how well a given method supports finding relevant components in the repository. It is the number of relevant items retrieved divided by the total number of retrieved items.
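As a small illustration of this ratio, the following Python sketch computes search effectiveness for one query. The function name and the convention of returning 0.0 when nothing is retrieved are assumptions made for the example.

```python
from typing import Iterable, Set


def search_effectiveness(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Relevant items retrieved divided by the total number of items retrieved."""
    retrieved_list = list(retrieved)
    if not retrieved_list:
        return 0.0  # assumption: an empty result set counts as zero effectiveness
    hits = sum(1 for item in retrieved_list if item in relevant)
    return hits / len(retrieved_list)
```

For instance, if a query returns four components and two of them are relevant, the search effectiveness is 2/4 = 0.5.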
The graph comparing the search effectiveness of various classification schemes is depicted in figure 4. The horizontal axis represents the existing classification schemes along with the intelligent classification scheme. The number of data items is represented along the vertical axis. The total data items retrieved are shown in white, and the coloured area indicates the percentage of relevant items among all the retrieved items. Compared with the other existing classification techniques, the integrated classification scheme performs well in retrieving the most relevant components according to the user requirements, but it classifies the components in the repository using only a few attributes. Our proposed intelligent classification and retrieval system, in contrast, classifies the components broadly on the basis of both functional and non-functional characteristics, which makes it more efficient in retrieving the most relevant components. Because the system retrieves the items that best match the user requirements, the retrieved components perform well when integrated into newly developed software systems.
The threshold parameter, which is important in the intelligent classification and retrieval technique, determines whether a component belongs to a certain classifier. For example, a threshold of 40% means that at least 40% of the values of the classifier characteristics are identical to those of a component. Thus, the selection of the threshold value also plays a prominent role in retrieving the most relevant components among the existing components. The graph is shown for the threshold values of 40%, 50% and 60%, which gave the best results in retrieving the most relevant components: they retrieved very few components, but those retrieved matched most of the user requirements.

Graph for Comparing the Search Time of Various Classification Schemes
Search time is the amount of time spent by the reuse repository system to locate the specific component in response to the user request. The graph for comparing search time of various classification schemes is shown in figure 5. The horizontal axis on the graph represents the various existing classification schemes along with the proposed scheme and the vertical axis represents the total search time to retrieve the components. The total data items retrieved are shown in white colour and the coloured area indicates the search time to retrieve those data items.
The intelligent classification and retrieval scheme uses a genetic algorithm that attempts to discover several different classifiers, each of which classifies a set of homogeneous components in terms of characteristics.
Searching for a component is performed by examining the user preferences against the classifiers rather than against the actual components, which results in a faster search. The graph in figure 5 shows the search time of components in the proposed system. It shows that the search time of the intelligent classification and retrieval scheme is good for the threshold values of 50% and 60%.

Conclusion and Future Work
An effective software reuse repository tool has been designed and successfully implemented with the proposed intelligent classification and retrieval scheme. Our classification is based on a small set of classifiers that are evolved using the genetic algorithm. Each classifier evolved by the genetic algorithm attempts to classify a large number of software components according to their common characteristics. Retrieval of the relevant components is performed by comparing the user requirements with those of the classifiers. Comparing a requested specification only with the classifiers, instead of with the entire set of components available in the repository, significantly reduces the search time. A threshold is also used when evolving the classifiers; it determines the degree (percentage) of similarity with a classifier that is required to classify a component in a certain class. The threshold value has been found to have a profound influence on both the classifier design phase (with the GA) and the retrieval phase.
Future work on the proposed intelligent scheme includes the multimedia presentation of the components. Ranking of the components returned by the system can also be added as a future enhancement.