Abstract:
The increasing demand for proactive patient outreach and follow-up within health insurance providers and large hospital systems has led to the adoption of automated message classification systems. Although state-of-the-art tools such as Azure and the OpenAI o1 series are available, their costs are prohibitive for small companies. Cost-effective traditional tools for text classification therefore remain a valuable alternative. This study investigates how OpenAI’s ChatGPT can generate synthetic data to balance class distributions in medical message datasets and improve the performance of binary classification models. The primary aim is to classify messages that indicate medical needs, aiding in patient outreach and efficient resource allocation. We applied Natural Language Processing (NLP) techniques, including Logistic Regression, Naive Bayes, Random Forest, XGBoost, and BERTimbau, a Portuguese transformer-based model. Models were trained on both original and augmented datasets, and results showed a significant increase in classification accuracy, particularly with BERTimbau, whose performance improved from 84% to 99%. Synthetic data also reduced overfitting by narrowing the gap between training and validation accuracies while improving model robustness on unseen data. Our findings demonstrate that synthetic data from ChatGPT effectively addresses class imbalance in medical message classification. However, challenges remain in correctly classifying non-medical messages, indicating that further refinement in synthetic data generation is necessary for optimal performance in real-world scenarios. These results suggest that such approaches can significantly enhance automated patient engagement efforts.
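The class-balancing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, which is not reproduced in this record: the helper `balance_with_synthetic`, the `synthetic_pool` dictionary (standing in for text that would be generated by ChatGPT), the small `NaiveBayes` class, and the toy English messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def balance_with_synthetic(messages, labels, synthetic_pool):
    """Append synthetic messages so smaller classes approach the largest class size.

    synthetic_pool is a hypothetical stand-in for ChatGPT-generated text:
    a dict mapping each label to a list of pre-generated messages.
    """
    counts = Counter(labels)
    target = max(counts.values())
    out_msgs, out_labels = list(messages), list(labels)
    for label, n in counts.items():
        needed = target - n
        # Take at most `needed` synthetic messages for this class.
        for text in synthetic_pool.get(label, [])[:needed]:
            out_msgs.append(text)
            out_labels.append(label)
    return out_msgs, out_labels


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in text.lower().split():
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, invented data: one medical message against three non-medical ones.
messages = [
    "i need a refill of my prescription",
    "what time does the office open",
    "can i change my billing address",
    "please update my phone number",
]
labels = ["medical", "non_medical", "non_medical", "non_medical"]

# Hypothetical synthetic messages for the minority class.
synthetic_pool = {
    "medical": [
        "i have chest pain and need a doctor",
        "my fever is not going down",
        "i need to schedule a blood test",
    ]
}

bal_msgs, bal_labels = balance_with_synthetic(messages, labels, synthetic_pool)
model = NaiveBayes().fit(bal_msgs, bal_labels)
prediction = model.predict("i have a fever and need a doctor")
```

Only as many synthetic messages as are needed to match the majority class are added, so the third pooled message is left unused here. A real setup would also hold out original (non-synthetic) messages for validation, so that reported accuracy reflects performance on unseen real data.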
Reference:
BRITO, Adriana Camargo; LEAL, Adriano Galindo; CUSTÓDIO, Gustavo Torres; RODRIGUES, Edilson José; MARTINS, Michele Marcia Viana; GUIRADO, Vinícius Monteiro de Paula. Enhancing proactive patient outreach with medical message classification using synthetic data from ChatGPT. In: IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE, 2025, Trondheim, Norway. Proceedings… 5 p.
Restricted-access document. Log in to BiblioInfo, GITEB/IPT Library, to access the paper in PDF.