Evaluation of Approaches Based on the BERT Model for Opinion Mining about the Cachaça Beverage

DOI:

https://doi.org/10.24215/16666038.25.e09

Keywords:

BERT, Cachaça, Opinion Mining, Sentiment Analysis, Social Network

Abstract

Opinion mining is a natural language processing task that classifies user opinions expressed on e-commerce platforms, social networks, and other media. It supports decision making, product and service monitoring, trend detection, and the development of marketing strategies, among other uses. Much of the existing research addresses opinions written in English, while Portuguese still lacks linguistic resources for training machine learning models. This work evaluates approaches based on the BERT language model for opinion mining in Portuguese, in particular by creating and evaluating a labeled dataset in the domain of cachaça, a popular beverage in Brazil of great economic importance. In the experimental evaluation, the BERT-based approaches outperformed two baselines. In a cross-domain evaluation on the dataset labeled for cachaça, they achieved F1 values greater than 0.97 for classification into 2 classes and 0.64 for 3 classes.
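
For readers unfamiliar with the F1 metric reported above, the following is a minimal sketch of how an F1 score can be computed for a 2-class or 3-class sentiment task. The abstract does not state which averaging scheme the authors used, so macro-averaging is an assumption here. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores.

    Assumption: macro-averaging is used only for illustration; the paper's
    abstract does not specify its averaging scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("y_true and y_pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0 when empty.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for a 3-class setting (positive/negative/neutral).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
score = macro_f1(y_true, y_pred)
```

Because macro-averaging weights every class equally, a weakly predicted class lowers the score as much as a frequent one, which is one reason 3-class scores are often lower than 2-class scores.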

Published

2025-10-22

Section

Original Articles

How to Cite

[1]
“Evaluation of Approaches Based on the BERT Model for Opinion Mining about the Cachaça Beverage”, JCS&T, vol. 25, no. 2, p. e09, Oct. 2025, doi: 10.24215/16666038.25.e09.
