Multilabel over-sampling and under-sampling with class alignment for imbalanced multilabel text classification

Multilabel over-sampling and under-sampling with class alignment for imbalanced multilabel text classification. Journal of Information and Communication Technology, 20 (3). pp. 423-456. ISSN 2180-3862 (2021)



Abstract

Multilabel text classification, the simultaneous assignment of multiple labels to a document, does not perform optimally when the classes are highly imbalanced. Class imbalance entails skewness in the underlying data distribution, which makes classification more difficult. Random over-sampling and under-sampling are common approaches to the class imbalance problem. However, both have drawbacks: under-sampling is likely to discard useful data, whereas over-sampling can increase the probability of overfitting. Therefore, a new method that avoids both discarding useful data and overfitting is needed. This study proposes a method to tackle the class imbalance problem by combining multilabel over-sampling and under-sampling with class alignment (ML-OUSCA). Instead of using all the training instances, the proposed ML-OUSCA draws a new training set by over-sampling small classes and under-sampling large classes.
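
The general idea of drawing a new training set by over-sampling small classes and under-sampling large classes can be illustrated with a minimal Python sketch. This is not the authors' ML-OUSCA: the abstract does not specify the class alignment step, so it is omitted here. The function name `resample_multilabel`, the `target_size` parameter, and the plain random per-label draw are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def resample_multilabel(dataset, target_size, seed=0):
    """Draw a new training set with roughly `target_size` draws per label.

    Hypothetical helper for illustration only; it is NOT the published
    ML-OUSCA algorithm and contains no class alignment step.

    dataset: list of (text, set_of_labels) pairs.
    """
    rng = random.Random(seed)

    # Index the instances that carry each label.
    by_label = defaultdict(list)
    for idx, (_, labels) in enumerate(dataset):
        for label in labels:
            by_label[label].append(idx)

    drawn = []
    for label in sorted(by_label):
        members = by_label[label]
        if len(members) >= target_size:
            # Large class: under-sample without replacement.
            picked = rng.sample(members, target_size)
        else:
            # Small class: keep every instance, then over-sample
            # with replacement until the target is reached.
            extra = rng.choices(members, k=target_size - len(members))
            picked = members + extra
        drawn.extend(picked)

    return [dataset[i] for i in drawn]
```

Because one document can carry several labels, an instance drawn for one label also counts towards its other labels, so the final per-label counts can exceed `target_size`. A naive per-label draw like this one therefore only approximates balance in the multilabel setting.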

Item Type: Article
Keywords: Data mining, Multilabel text classification, Class imbalance problem, Resampling method, Class alignment
Taxonomy: By Subject > Computer & Mathematical Sciences > Computer Science
By Subject > Computer & Mathematical Sciences > Information Technology
Local Content Hub: Subjects > Computer and Mathematical Sciences
Depositing User: Muslim Ismail @ Ahmad
Date Deposited: 21 Feb 2022 23:29
Last Modified: 22 Feb 2022 08:51
