Copyright © 2011 Scientific & Academic Publishing. All Rights Reserved.
Abstract
Organizations are flooded with massive volumes of transactional data. Unless it is analysed properly, this data is of no use in reaching strategic decisions and, ultimately, in achieving competitive advantage. Efficient data analysis is therefore one of the strategies for success, and it depends heavily on the quality of the data: clean data leads to efficient analysis. In this paper, the authors suggest applying similarity metrics to context free data cleaning, together with a mechanism that suggests correct data by learning from patterns derived in the prior phase. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Compound Similarity, and Smith-Waterman are used to measure the distance between two values. Experimental results show that the approach not only cleans the data effectively but also suggests suitable values, reducing data entry errors.
Keywords:
Context Free Data Cleaning, Similarity Metrics
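As a rough illustration of the idea in the abstract, the sketch below scores a possibly misspelled value against a list of known-good values using Jaro-Winkler, one of the metrics named above, and suggests the closest match. It is not the authors' implementation: the function names, the reference list, the lowercase normalisation, and the 0.85 threshold are all assumptions made for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def suggest(value, reference_values, threshold=0.85):
    """Return the closest reference value, or None if nothing is close enough.

    Hypothetical helper: the threshold and case folding are illustrative
    choices, not taken from the paper.
    """
    best, best_score = None, 0.0
    for ref in reference_values:
        score = jaro_winkler(value.lower(), ref.lower())
        if score > best_score:
            best, best_score = ref, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    cities = ["Ahmedabad", "Vadodara", "Surat"]  # assumed clean reference list
    print(suggest("Ahmedabd", cities))
    print(suggest("Xyz", cities))
```

In this sketch the threshold controls how aggressive the suggestions are: a higher value leaves more entries uncorrected, while a lower value risks replacing a valid but unusual entry with a more common one.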