Volume 1, Article ID: 2024.0006
Kashif Ahmad
kashif.ahmad@mtu.ie
Muhammad Asif Ayub
asifayub836@gmail.com
Muhammad Tayyab Zamir
mzamir2023@cic.ipn.mx
Imran Khan
imran.cse@uetpeshawar.edu.pk
Hannia Naseem
hanninaseem836@gmail.com
Nasir Ahmad
nasir.ahmad@mtu.ie
1 Munster Technological University, Cork, Ireland
2 Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan
3 Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico
* Author to whom correspondence should be addressed
Received: 08 Sep 2024 Accepted: 06 Nov 2024 Published: 15 Nov 2024
Crowd-sourcing has been widely explored for monitoring and gathering feedback on infrastructure and services, for example in air and water quality analysis. However, traditional crowd-sourcing methods for water quality feedback, such as offline and online surveys, have several limitations, including a limited number of participants and low frequency because of the labor involved in conducting them. Social media analytics could overcome these challenges by providing a more sustainable and cost-effective tool for water quality monitoring and analysis. This paper explores that potential by proposing a Natural Language Processing (NLP) framework that automatically collects and analyzes water-related posts from social media to support data-driven decisions. The framework has two components: (i) text classification and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework that combines several Large Language Models (LLMs) through late fusion with optimal weights. For topic modeling, we employ the BERTopic library to discover hidden topic patterns in water-related tweets. We also analyze relevant tweets originating from different regions and countries to explore global, regional, and country-specific water-related issues and concerns. Finally, we collected and manually annotated a large-scale dataset, which is expected to facilitate future research on the topic.
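As an illustration of the late fusion idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each model outputs per-class probabilities for every post, and it picks the fusion weights by a simple grid search on labeled validation data. The grid search, the function names (`late_fusion`, `search_weights`), and the toy probabilities are all hypothetical; the abstract does not state how the optimal weights are obtained.

```python
from itertools import product


def late_fusion(prob_sets, weights):
    """Weighted average of per-model class probabilities.

    prob_sets[m][i][c] is the probability that model m assigns
    class c to sample i. Weights are normalised by their sum.
    """
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    fused = []
    for i in range(n_samples):
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        fused.append(row)
    return fused


def predict(fused):
    """Pick the class with the highest fused score for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


def accuracy(pred, labels):
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


def search_weights(prob_sets, labels, steps=5):
    """Grid search over weight combinations (an assumed stand-in for
    whatever weight optimisation the paper actually uses)."""
    grid = [k / steps for k in range(steps + 1)]
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(prob_sets)):
        if sum(w) == 0:
            continue  # all-zero weights carry no information
        acc = accuracy(predict(late_fusion(prob_sets, w)), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation data: two hypothetical models, four posts,
# classes 0 = not water-related, 1 = water-related.
model_a = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.7, 0.3], [0.1, 0.9], [0.6, 0.4]]
labels = [0, 0, 1, 1]

weights, val_acc = search_weights([model_a, model_b], labels)
print("weights:", weights, "validation accuracy:", val_acc)
```

In practice the weights would be selected on a held-out validation split and then applied unchanged to the test set, so that the reported accuracy is not inflated by the search itself.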
Disclaimer: This is not the final version of the article. Changes may occur when the manuscript is published in its final format.