Number 2

An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality

  • Amanpreet Kaur Toor and Amarpreet Singh 2014. An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality. International Journal of Applied Information Systems. 7, 2 (April 2014), 5-9. DOI=http://dx.doi.org/10.5120/ijais451136
  • @article{10.5120/ijais2017451568,
    author = {Amanpreet Kaur Toor and Amarpreet Singh},
    title = {An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality},
    journal = {International Journal of Applied Information Systems},
    issue_date = {April 2014},
    volume = {7},
    number = {},
    month = {April},
    year = {2014},
    issn = {},
    pages = {5-9},
    numpages = {},
    url = {/archives/volume7/number2/618-1136},
    doi = { 10.5120/ijais14-451136},
    publisher = {© 2013 by IJAIS Journal},
    address = {}
    }
    
  • %1 451136
    %A Amanpreet Kaur Toor
    %A Amarpreet Singh
    %T An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality
    %J International Journal of Applied Information Systems
    %@ 
    %V 7
    %N 
    %P 5-9
    %D 2014
    %I © 2013 by IJAIS Journal
    

Abstract

Cluster analysis is one of the central methods in data mining, and the choice of clustering algorithm directly affects the clustering results. This paper proposes an Advanced Clustering Algorithm (ACA) to address the concerns of high dimensionality and large data sets [1]. ACA avoids repeatedly computing the distance from each data object to the clusters, which saves execution time. It requires only a simple data structure to store information in each iteration for use in the next iteration. Experimental results show that ACA can effectively improve clustering speed and accuracy while reducing computational complexity relative to the traditional Kohonen SOM algorithm. The paper presents ACA and its simulated experimental results on different data sets.
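
The abstract does not spell out ACA's data structure, so the following is only a minimal sketch of the general idea it describes: cache, for every point, its current cluster and its distance to that cluster's center, and reuse the cache in the next iteration to skip most distance computations. The scheme shown is the cached-distance heuristic known from enhanced k-means variants (compare reference 7), not the authors' published procedure. The function name `cached_kmeans`, its parameters, and the `full_scans` counter are hypothetical names chosen for illustration.

```python
import math


def cached_kmeans(points, init_centers, max_iter=100):
    """K-means-style clustering that caches each point's label and distance.

    Illustrative assumption, not the paper's ACA: a point keeps its cluster
    without a full search whenever its own center has not moved farther away.
    """
    centers = [list(c) for c in init_centers]
    k = len(centers)
    labels = [-1] * len(points)        # cached cluster index per point
    dists = [math.inf] * len(points)   # cached distance to the point's own center
    full_scans = 0                     # points that needed all k distances

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if labels[i] >= 0:
                d_own = math.dist(p, centers[labels[i]])
                if d_own <= dists[i]:
                    # Own center is no farther than before: keep the label.
                    dists[i] = d_own
                    continue
            # Otherwise fall back to the full nearest-center search.
            full_scans += 1
            all_d = [math.dist(p, c) for c in centers]
            best = min(range(k), key=all_d.__getitem__)
            if best != labels[i]:
                changed = True
            labels[i], dists[i] = best, all_d[best]

        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
        if not changed:
            break
    return labels, centers, full_scans
```

Calling `cached_kmeans(points, init_centers)` returns the labels, the final centers, and the number of full nearest-center searches performed; plain k-means would perform one such search per point per iteration. The shortcut is a heuristic: a point can stay in its cluster even when another center has become closer, so the result may differ from exact k-means.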

References

  1. Yuan F, Meng Z. H, Zhang H. X and Dong C. R, "A New Algorithm to Get the Initial Centroids," Proc. of the 3rd International Conference on Machine Learning and Cybernetics, pp. 26–29, August 2004.
  2. Sun Jigui, Liu Jie, Zhao Lianyu, "Clustering algorithms Research", Journal of Software, Vol 19, No 1, pp. 48-61, January 2008.
  3. Amanpreet Kaur Toor, Amarpreet Singh, "Analysis of Clustering Algorithm based on Number of Clusters, error rate, Computation Time and Map Topology on large Data Set", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Volume 2, Issue 6, November-December 2013.
  4. Amanpreet Kaur Toor, Amarpreet Singh, "A Survey paper on recent clustering approaches in data mining", International Journal of Advanced Research in Computer Science and Software Engineering Vol 3, Issue 11, November 2013.
  5. Sun Shibao, Qin Keyun, "Research on Modified K-means Data Cluster Algorithm," Computer Engineering, vol. 33, No. 13, pp. 200–201, July 2007.
  6. Merz C and Murphy P, UCI Repository of Machine Learning Databases, Available: ftp://ftp.ics.uci.edu/pub/machine-learning-databases
  7. Fahim A M, Salem A M, Torkey F A, "An efficient enhanced k-means clustering algorithm," Journal of Zhejiang University Science A, Vol. 10, pp. 1626–1633, July 2006.
  8. Zhao YC, Song J. GDILC: A grid-based density isoline clustering algorithm. In: Zhong YX, Cui S, Yang Y, eds. Proc. of the Internet Conf. on Info-Net. Beijing: IEEE Press, 2001. 140–145. http://ieeexplore.ieee.org/iel5/7719/21161/00982709.pdf
  9. Huang Z, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, Vol. 2, pp. 283–304, 1998.
  10. K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm", Proceedings of the World Congress on Engineering, vol 1, London, July 2009.
  11. Fred ALN, Leitão JMN. Partitional vs hierarchical clustering using a minimum grammar complexity approach. In: Proc. of the SSPR & SPR 2000. LNCS 1876, 2000. 193–202. http://www.sigmod.org/dblp/db/conf/sspr/sspr2000.htm
  12. Gelbard R, Spiegler I. Hempel's raven paradox: A positive approach to cluster analysis. Computers and Operations Research, 2000, 27(4): 305–320.
  13. Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. Tucson, 1997. 146–151.
  14. Ding C, He X. K-Nearest-Neighbor in data clustering: Incorporating local information into global optimization. In: Proc. of the ACM Symp. on Applied Computing. Nicosia: ACM Press, 2004. 584–589. http://www.acm.org/conferences/sac/sac2004/
  15. Hinneburg A, Keim D. An efficient approach to clustering in large multimedia databases with noise. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G, eds. Proc. of the 4th Int'l Conf. on Knowledge Discovery and Data Mining (KDD'98). New York: AAAI Press, 1998. 58–65.
  16. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. In: Jagadish HV, Mumick IS, eds. Proc. of the 1996 ACM SIGMOD Int'l Conf. on Management of Data. Montreal: ACM Press, 1996. 103–114.
  17. Birant D, Kut A. ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 2007, 60(1): 208–221.

Keywords

ACA, SOM, Clustering, Large Data Set, High Dimensionality, Cluster Analysis

Index Terms

Computer Science
Information Sciences