Attention based Bidirectional GRU hybrid model for inappropriate content detection in Urdu language

Shoukat, Ezzah; Irfan, Rabia; Basharat, Iqra; Tahir, Muhammad Ali; Shaukat, Sameen

Abstract:With the increased use of the internet and social networks for online discussions, the spread of toxic and inappropriate content on social networking sites has also increased. Several studies have been conducted in different languages. However, there is less work done for South Asian languages for inappropriate content identification using deep learning techniques. In Urdu language, the spellings are not unique, and people write different common spellings for the same word, while mixing it other languages, like English in the text makes it more challenging, and limited research work is available to process such language with the finest algorithms. The use of attention layer with a deep learning model can help handling the long-term dependencies and increase its efficiency . To explore the effects of the attention layer, this study proposes attention-based Bidirectional GRU hybrid model for identifying inappropriate content in Urdu Unicode text language. Four different baseline deep learning models; LSTM, Bi-LSTM, GRU, and TCN, are used to compare the performance of the proposed model. The results of these models were compared based on evaluation metrics, dataset size, and impact of the word embedding layer. The pre-trained Urdu word2Vec embeddings were utilized for our case. Our proposed model BiGRU-A outperformed all other baseline models by yielding 84\% accuracy without using pre-trained word2Vec layer. From our experiments, we have established that the attention layer improves the model's efficiency, and pre-trained word2Vec embedding does not work well with an inappropriate content dataset.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2501.09722 [cs.CL]
	(or arXiv:2501.09722v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.09722

Computer Science > Computation and Language

Title:Attention based Bidirectional GRU hybrid model for inappropriate content detection in Urdu language

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators