Abstract

Text Classification Using Symbolic Data Analysis

Sangeetha N

In the real world, an operational text classification system is usually deployed in an environment where only a small number of human-annotated training documents is available, even though there may be thousands of classes. In such an environment, text classifiers are probably more appropriate for practical systems than other, more complex learning models. Text classifiers are mainly used for free-flowing text, that is, unstructured text documents. Classification is done with a statistical feature-weighting method that involves pre-processing: texts are reduced by eliminating digits, punctuation, hyphens, stop words, and high- and low-frequency words, and by applying stemming. This strategy cannot be applied to unstructured texts describing advertisements, because such texts give their descriptions in terms of attribute values. Since conventional text classifiers are not useful for classifying such texts, the concept of symbolic data analysis is introduced. Symbolic Data Analysis (SDA) is a new domain in the area of knowledge discovery and data management, related to multivariate analysis, pattern recognition, databases, and artificial intelligence. For the classification of unstructured text documents, a method based on Symbolic Data Analysis that uses a symbolic database and querying processes is proposed. The proposed technique appears to be an efficient way to classify unstructured text documents and is therefore put forward as a means of obtaining better results when dealing with them.
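The conventional pipeline that the abstract contrasts with SDA, pre-processing followed by statistical feature weighting, can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical one, `crude_stem` is a simplistic suffix stripper standing in for a real stemmer such as Porter's, the document-frequency thresholds are arbitrary, and TF-IDF is assumed as the weighting scheme because the abstract does not name one.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "for", "with", "is", "to"}


def crude_stem(word: str) -> str:
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, drop digits/punctuation/hyphens, remove stop words, then stem."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # anything not a-z or whitespace
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]


def prune_by_document_frequency(docs, min_df=2, max_df_ratio=0.9):
    """Drop low-frequency (too rare) and high-frequency (too common) terms."""
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    keep = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in docs]


def tf_idf(docs):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


raw = [
    "The 3 cats were chasing mice in the garden!",
    "Dogs chased the cats; cat-lovers complained.",
    "Mice and dogs: a well-known rivalry.",
]
docs = prune_by_document_frequency([preprocess(t) for t in raw])
print(docs)  # [['cat', 'chas', 'mice'], ['dog', 'chas', 'cat', 'cat'], ['mice', 'dog']]
weights = tf_idf(docs)
```

Both "chasing" and "chased" collapse to the stem `chas`, so they share one feature, while terms that occur in only one document are dropped by the `min_df=2` threshold.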
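The abstract does not specify how the symbolic database is structured or queried. One hedged reading of the general SDA idea is sketched below: each class is summarised as a symbolic object whose numeric attributes are intervals and whose categorical attributes are sets of observed values, and a new advertisement is assigned to the class description it matches most closely. The attribute names, the example advertisements, and the match-fraction scoring rule are hypothetical assumptions, not the author's method, and the advertisements are assumed to have already been parsed into attribute-value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolicObject:
    """Hypothetical class description: numeric attributes as (low, high)
    intervals, categorical attributes as sets of observed values."""
    label: str
    intervals: dict = field(default_factory=dict)
    value_sets: dict = field(default_factory=dict)

    def add(self, ad: dict) -> None:
        """Generalise this description so that it also covers one training ad."""
        for attr, value in ad.items():
            if isinstance(value, (int, float)):
                low, high = self.intervals.get(attr, (value, value))
                self.intervals[attr] = (min(low, value), max(high, value))
            else:
                self.value_sets.setdefault(attr, set()).add(value)

    def match_score(self, ad: dict) -> float:
        """Hypothetical scoring rule: fraction of the ad's attributes covered."""
        if not ad:
            return 0.0
        hits = 0
        for attr, value in ad.items():
            if isinstance(value, (int, float)) and attr in self.intervals:
                low, high = self.intervals[attr]
                hits += low <= value <= high
            elif attr in self.value_sets:
                hits += value in self.value_sets[attr]
        return hits / len(ad)


def build_symbolic_database(training):
    """Build one symbolic object per class from (label, ad) pairs."""
    db = {}
    for label, ad in training:
        db.setdefault(label, SymbolicObject(label)).add(ad)
    return db


def classify(db, ad):
    """Query every class description and return the best-matching label."""
    return max(db.values(), key=lambda obj: obj.match_score(ad)).label


# Hypothetical advertisements, already parsed into attribute-value pairs.
training = [
    ("car", {"make": "honda", "year": 2015, "price": 6000}),
    ("car", {"make": "toyota", "year": 2018, "price": 9000}),
    ("flat", {"locality": "suburb", "bedrooms": 2, "price": 12000}),
    ("flat", {"locality": "centre", "bedrooms": 3, "price": 20000}),
]
db = build_symbolic_database(training)
print(classify(db, {"make": "toyota", "year": 2016, "price": 7500}))  # car
```

Intervals and sets of values are common symbolic variable types in SDA; a fuller system would also need the step that extracts attribute values from the advertisement text and a more careful dissimilarity measure than a plain match fraction.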

Disclaimer: This abstract was translated using artificial intelligence tools and has not yet been reviewed or verified.