Stemming Algorithms: A Comparative Study and their Analysis

  • International Journal of Applied Information Systems
  • Foundation of Computer Science (FCS), NY, USA
  • Volume 4 - Number 3
  • Year of Publication: 2012
  • Authors: Deepika Sharma
  • 10.5120/ijais12-450655
  • Deepika Sharma 2012. Stemming Algorithms: A Comparative Study and their Analysis. International Journal of Applied Information Systems. 4, 3 (September 2012), 7-12. DOI=http://dx.doi.org/10.5120/ijais12-450655
  • @article{10.5120/ijais12-450655,
    author = {Deepika Sharma},
    title = {Stemming Algorithms: A Comparative Study and their Analysis},
    journal = {International Journal of Applied Information Systems},
    issue_date = {September 2012},
    volume = {4},
    number = {3},
    month = {September},
    year = {2012},
    issn = {},
    pages = {7-12},
    numpages = {},
    url = {/archives/volume4/number3/279-0655},
    doi = {10.5120/ijais12-450655},
    publisher = {© 2010 by IJAIS Journal},
    address = {}
    }
    
  • %1 450655
    %A Deepika Sharma
    %T Stemming Algorithms: A Comparative Study and their Analysis
    %J International Journal of Applied Information Systems
    %@ 
    %V 4
    %N 3
    %P 7-12
    %D 2012
    %I © 2010 by IJAIS Journal
    

Abstract

Stemming reduces a word to its stem or root form and is widely used in information retrieval to increase recall and return the most relevant results. There are a number of ways to perform stemming, ranging from manual to automatic methods and from language-specific to language-independent ones, each with its own advantages. This paper presents a comparative study of the stemming alternatives widely used to improve the effectiveness and efficiency of information retrieval.
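
As a rough illustration of how stemming can raise recall, the sketch below strips a few common English suffixes and then matches a query against documents by shared stems. It is not one of the algorithms compared in the paper: the suffix list, the minimum stem length, and the names `toy_stem` and `matches` are assumptions made only for this example.

```python
# Toy suffix-stripping stemmer. Illustrative only; the suffix list and
# the minimum stem length are assumptions, not taken from the paper.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order, longest first


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def matches(query, documents):
    """Return the documents that share at least one stem with the query."""
    query_stems = {toy_stem(term) for term in query.split()}
    return [
        doc
        for doc in documents
        if query_stems & {toy_stem(term) for term in doc.split()}
    ]


docs = ["connected graphs", "connecting flights", "connection pools"]
print(matches("connect", docs))
```

With exact term matching, the query "connect" would retrieve none of these documents. After stemming it retrieves the first two, while "connection" is missed because "ion" is not in the assumed suffix list. Trade-offs of this kind are what a comparison of stemmers has to weigh.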

Keywords

Information Retrieval, Stemming Algorithm, Conflation Methods

Index Terms

Computer Science
Information Sciences