PMI Based Clustering Algorithm for Feature Reduction in Text Classification

P.Jeyadurga; Prof. P. R. Vijaya Lakshmi; J.S.Kanchana

Abstract

Feature clustering is a feature reduction method that lowers the dimensionality of feature vectors for text classification. This paper proposes an incremental feature clustering approach that uses semantic similarity to cluster the features. Pointwise Mutual Information (PMI) is a widely used word similarity measure that finds the semantic similarity between two words and is an alternative to distributional similarity. Computing PMI requires only simple statistics about the two words, namely the number of co-occurrences, or correlations, between the two concepts within a context of fixed size. Once the words from the preprocessed documents are fed in, clusters are formed and one feature (the head word) is identified for each cluster; these head words are used for indexing the document. PMI assumes that a word has a single sense, but clustering can be optimized further if the polysemy of words is considered. Hence PMI may be combined with PMImax, which also estimates the correlation between the closest senses of two words, thereby achieving better feature reduction and execution time than other approaches.
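The PMI statistic and the incremental grouping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window-based co-occurrence counting, the probability normalization, the similarity threshold, and the choice of the first word in each cluster as its head word are all assumptions made for illustration. PMImax and sense-level statistics are not covered. The sketch uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))).

```python
import math
from collections import Counter


def cooccurrence_counts(docs, window=2):
    """Count words and unordered word pairs that co-occur within `window` tokens.

    `docs` is a list of token lists (already preprocessed). The window size is
    an assumed stand-in for the fixed-size context mentioned in the abstract.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        for i, w in enumerate(tokens):
            word_counts[w] += 1
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    pair_counts[frozenset((w, v))] += 1
    return word_counts, pair_counts


def pmi(x, y, word_counts, pair_counts):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))); -inf if the pair never co-occurs."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0:
        return float("-inf")
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    p_xy = joint / n_pairs
    p_x = word_counts[x] / n_words
    p_y = word_counts[y] / n_words
    return math.log2(p_xy / (p_x * p_y))


def incremental_clusters(vocab, word_counts, pair_counts, threshold=0.0):
    """Feed words one at a time; each word joins the cluster whose head word
    has the highest PMI above `threshold`, or else starts a new cluster.

    Assumption: the head word is simply the first word placed in a cluster.
    """
    clusters = {}  # head word -> list of member words
    for w in vocab:
        best_head, best_score = None, threshold
        for head in clusters:
            score = pmi(w, head, word_counts, pair_counts)
            if score > best_score:
                best_head, best_score = head, score
        if best_head is None:
            clusters[w] = [w]
        else:
            clusters[best_head].append(w)
    return clusters
```

As a usage example, calling `cooccurrence_counts` on a list of tokenized documents and passing the sorted vocabulary to `incremental_clusters` returns a dictionary keyed by head word; a document could then be indexed by the head words of the clusters its terms fall into, which is the feature reduction step the abstract refers to.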

International Journal of Innovative Research in Science, Engineering and Technology
