Comparing different approaches for text classification in a dataset of customers reviews

Abstract:

In this work, we use a real dataset provided by a retail clothing store. To explore approaches for datasets with different amounts of labeling, we experimented with three scenarios: in the first, the dataset is fully labeled; in the second, we removed most of the labels, so that only some instances are labeled; in the third, we removed all labels. For the first scenario, we employed XGBoost and BERTimbau, a BERT-based model. For the second, we used XGBoost, BERTimbau, and GAN-BERT. For the third, we performed zero-shot classification using RoBERTa as the pre-trained model. XGBoost achieved results similar to BERTimbau in the fully labeled scenario and close to GAN-BERT in the partially labeled scenario. Because XGBoost is less computationally costly than the other models, it proves to be a satisfactory solution for the presented dataset.
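The three labeling scenarios described in the abstract can be illustrated with a minimal sketch that hides labels from a fully labeled list. This is not the authors' code: the function name `mask_labels`, the toy labels, and the 20% retention ratio are all assumptions made for illustration, since the abstract does not state the class names or how many labels were kept.

```python
import random


def mask_labels(labels, keep_fraction, seed=0):
    """Return a copy of `labels` in which only `keep_fraction` stay visible.

    Hidden labels are replaced by None. A keep_fraction of 1.0 gives the
    fully labeled scenario and 0.0 gives the unlabeled one.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_keep = round(len(labels) * keep_fraction)
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep else None for i, y in enumerate(labels)]


# Hypothetical toy labels; the real dataset is not reproduced here.
labels = ["positive", "negative", "positive", "neutral", "negative",
          "positive", "negative", "neutral", "positive", "negative"]

fully_labeled = mask_labels(labels, keep_fraction=1.0)      # scenario 1
partially_labeled = mask_labels(labels, keep_fraction=0.2)  # scenario 2 (assumed ratio)
unlabeled = mask_labels(labels, keep_fraction=0.0)          # scenario 3
```

Under these assumptions, a supervised model would train on `fully_labeled`, a semi-supervised model would use both the visible and the hidden entries of `partially_labeled`, and a zero-shot classifier would see no labels at all.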

Reference:

CUSTÓDIO, Gustavo Torres; OLIVEIRA, Marcio De Lima; COSTA, Richard Silva; PIMENTEL, Douglas Roberto De Matos; SALÉS, Elisa Morandé; SILLES, Felipe Silva. Comparing different approaches for text classification in a dataset of customers reviews. In: INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGY MANAGEMENT VIRTUAL, 20., 2024, São Paulo. Proceedings… 9p.

Access the paper on the event website:

https://www.tecsi.org/contecsi/index.php/contecsi/20thCONTECSI/paper/view/7258
