Document Type

Graduate Project

Date of Degree Completion

Fall 2024

Degree Name

Master of Science (MS)

Department

Computational Science

Committee Chair

Dr. Razvan Andonie

Second Committee Member

Dr. Szilard Vajda

Third Committee Member

Dr. Boris Kovalerchuk

Abstract

Since the advent of social networks in 1997, they have changed businesses and people's lives for the better. From promoting companies to reaching out to friends and family, social networking has become a major element of our lives. X is a very popular platform used by many people, including celebrities and politicians, to communicate with their audiences. Like other platforms, X is not spared the racism contained in tweets. We should be able to catch racist comments on any social media platform and block the accounts responsible for them. To do so, we have two options. The first is to employ more human moderators: they can use their common sense to detect racism better than artificial intelligence (AI), but there is no guarantee that they will be honest enough to do the job. The second option, and the purpose of this study, is to improve the performance of deep learning models and use them to detect racism online. For this kind of task, we need models that allow computers to work with little supervision and learn independently. Deep learning processes information much the way the human brain does, using artificial neural networks to learn patterns in data. These networks can assimilate complicated concepts and relationships from data, and they can be refined to make increasingly accurate predictions.

This research will employ thirteen machine learning and deep learning models to analyze and classify instances of racist language in a dataset: Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Bidirectional Encoder Representations from Transformers (BERT), Naive Bayes, Extreme Gradient Boosting (XGBoost), Gated Recurrent Unit (GRU), fastText, Light Gradient Boosting Machine (LightGBM), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks.
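As a hedged illustration of how the classical baselines in this list might be wired up, the sketch below trains six of them on TF-IDF features with scikit-learn. The texts and labels are placeholders standing in for the real tweet dataset; BERT, GRU, fastText, CNN, and LSTM, as well as the boosted-tree models, would be configured separately with their own libraries.

```python
# Hedged sketch: six classical baselines on TF-IDF features with scikit-learn.
# The texts/labels below are placeholders for the actual tweet dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

texts = ["example racist tweet", "example harmless tweet"] * 50  # placeholder corpus
labels = [1, 0] * 50                                             # 1 = racist, 0 = not

X = TfidfVectorizer(max_features=5000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

baselines = {
    "SVM": LinearSVC(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": MultinomialNB(),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test), zero_division=0))
```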

We will begin with a literature review to understand existing racism detection methodologies and the challenges associated with them. Following this, the dataset will be preprocessed to optimize it for deep learning, using techniques such as tokenization, punctuation and stopword removal, lowercasing, lemmatization, and word embedding.
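A minimal sketch of these preprocessing steps, assuming NLTK; the downloaded resources are standard NLTK corpora, and the sample tweet is invented purely for illustration.

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Standard NLTK resources; 'punkt_tab' is only needed on newer NLTK versions.
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(tweet: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize, drop stopwords, lemmatize."""
    tweet = tweet.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = word_tokenize(tweet)
    return [LEMMATIZER.lemmatize(tok) for tok in tokens if tok not in STOPWORDS]

# Invented example tweet, purely for illustration.
print(preprocess("They were posting hateful comments again!"))
# -> ['posting', 'hateful', 'comment']
```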

The methodology will involve training multiple deep learning models and comparing their performance in terms of accuracy, precision, and recall. It will also use oversampling techniques such as SMOTE; in addition, we implemented a function capable of creating synthetic sentences that mention different races, allowing the models to train on a dataset covering multiple races rather than a single one. We will also use techniques such as ensemble learning, hyperparameter optimization, attention mechanisms, sentiment analysis, and dynamic weighting to boost accuracy. The aim of this study is to show that there exist techniques that can be added to existing models to help them improve their performance.
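The augmentation function itself is not reproduced here, so the following is only a sketch of the idea it describes: swapping the racial group mentioned in a sentence to generate synthetic variants. The GROUP_TERMS list and the substitution rule are illustrative assumptions, not the study's actual implementation. SMOTE itself would typically be applied to the vectorized features, for example via imbalanced-learn's SMOTE class.

```python
import re

# Illustrative list only; the study's actual term set is not shown here.
GROUP_TERMS = ["asian", "black", "hispanic", "white"]

def synthesize_variants(sentence: str) -> list[str]:
    """For each group term the sentence mentions, emit one copy per other term."""
    variants = []
    lowered = sentence.lower()
    for term in GROUP_TERMS:
        if term in lowered:
            for replacement in GROUP_TERMS:
                if replacement != term:
                    variants.append(
                        re.sub(term, replacement, sentence, flags=re.IGNORECASE)
                    )
    return variants

print(synthesize_variants("Black people do not belong here."))
# Three variants, one per substituted group, so the training data is not
# anchored to a single racial group.
```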

When training models for racism detection, the dataset is a critical concern: if the data represents only one race, the model will be biased toward recognizing patterns specific to that race and may fail to generalize across other racial groups. Its ability to effectively detect or understand racist or harmful content involving people from different racial backgrounds will be limited. To address this, it is essential to diversify the dataset by including a balanced and representative sample of data from multiple racial groups. This way, the model will learn how racism can manifest across different contexts, languages, and cultural expressions, rather than learning patterns specific to a single-race dataset.
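To make the diversification step concrete, here is a small sketch of a per-group balance check and a simple rebalancing with pandas. The toy rows, the 'group' column (the racial group a tweet mentions), and the 'label' column are assumptions about the schema, not the study's actual data.

```python
import pandas as pd

# Toy frame; 'group' and 'label' (1 = racist) are assumed columns.
df = pd.DataFrame({
    "text":  ["t1", "t2", "t3", "t4", "t5", "t6"],
    "group": ["black", "black", "black", "asian", "hispanic", "white"],
    "label": [1, 0, 1, 1, 0, 1],
})

# Per-group counts and positive rates reveal whether one group dominates.
print(df.groupby("group")["label"].agg(["count", "mean"]))

# One simple rebalancing: downsample every group to the smallest group's size.
min_n = df["group"].value_counts().min()
balanced = df.groupby("group").sample(n=min_n, random_state=42)
print(balanced["group"].value_counts())
```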

Our work on improving machine learning and deep learning for racism detection will help reduce racial bias in AI, ensuring that deep learning systems detect harmful stereotypes and discriminatory language without reinforcing societal inequalities. Our research promotes the development of ethical AI, with practical applications in fields such as content moderation, hiring practices, and criminal justice, contributing to the creation of more inclusive, equitable systems.
