The New England Journal of Statistics in Data Science

Comparative Analysis of NLP Methods for Emotion Detection in Student Responses During COVID-19
Alexander Maret, Cade Dees, Yule Fu, et al. (7 authors)

https://doi.org/10.51387/26-NEJSDS105
Pub. online: 1 June 2026      Type: Case Study, Application, and/or Practice Article      Open Access
Area: NextGen

Accepted: 19 May 2026
Published: 1 June 2026

Abstract

Natural language processing (NLP) algorithms have shown considerable ability to interpret responses to open-ended questions in survey data. However, the reliability and uncertainty of these methods on this task have not yet been thoroughly investigated. To address this gap, this paper presents a comparative analysis of several NLP methods for detecting fine-grained emotions in student responses about their mental health during the COVID-19 pandemic. The evaluated methods are a lexicon-based approach, the bag-of-words (BoW) model, Term Frequency-Inverse Document Frequency (TF-IDF), a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, MentalBERT, and OpenAI’s GPT-3.5. We assess how accurately these models classify emotions into predetermined categories using performance metrics such as accuracy and F1 score. Model stability and discriminative ability are quantified through repeated cross-validation and the Area Under the Receiver Operating Characteristic Curve (AUC). The consistency of emotion detection across models is also evaluated. The study shows that the effectiveness of NLP methods for mental health analysis may vary with the emotion being analyzed, and that the stability and uncertainty of these methods require thorough examination. Our work offers guidance for data scientists applying NLP methods to survey data, particularly for understanding survey respondents’ emotions.
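
As a rough illustration of the metrics named above, the following Python sketch computes accuracy, F1 score, and AUC for a single binary emotion label, and summarizes scores collected across repeated cross-validation folds by their mean and standard deviation. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of responses whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form equals the area under the
    ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_scores):
    """Mean and sample standard deviation of a metric across CV folds."""
    return mean(fold_scores), stdev(fold_scores)


# Toy data (hypothetical): 1 = emotion present, 0 = absent.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.55]      # model-assigned probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

In a repeated cross-validation setting, each metric would be computed once per held-out fold and the resulting list passed to `summarize`; a small standard deviation across folds suggests a more stable model for that emotion.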



Copyright
© 2026 New England Statistical Society
Open access article under the CC BY license.

Keywords
Survey responses, COVID-19, Natural language processing, Emotion detection

Funding
This research was supported by NSF Research Experiences for Undergraduates (REU), grant number DMS1950015, and by the VCU College of Humanities and Sciences Catalyst.

  • ISSN: 2693-7166