Spam Filtration using Massively Parallel kNN in CUDA
Document Type
Oral Presentation
Campus where you would like to present
SURC Room 140
Start Date
15-5-2014
End Date
15-5-2014
Keywords
Spam, CUDA, kNN
Abstract
Spam classification is a fundamental yet largely unseen part of everyday life. Every email user relies, often without realizing it, on spam classification and filtration systems to remove unwanted messages and leave the desired ones, without requiring user input. As email systems become more robust and email communication more prolific, spam-classification systems must continue to run accurately and efficiently while remaining all but invisible to the user. This presentation details our massively parallel implementation of spam classification using the k-Nearest Neighbors (kNN) algorithm on NVIDIA GPUs using CUDA. The kNN algorithm operates as follows: given a dataset of points (the training set) with known attributes and known classifications, a point with known attributes but unknown classification is classified based on a weighted average of its k most similar points in the training set. The algorithm can be broken down into three phases: the distance calculation phase, the sorting phase, and the classification phase. Of the three, the sorting phase is the most complex and computationally demanding, so our primary goal has been to optimize it in particular. Utilizing the computational abilities of GPUs, we have developed an implementation that greatly improves the performance of the algorithm by using a massively parallel reduction for the sorting phase. The experimental results of our spam filtration system demonstrate that our implementation is efficient and highly scalable. We therefore believe it is a feasible solution to the growing demands of spam-classification systems.
For their work on this project, Patrick McElroy and Joshua Smithrud were both nominated for the SOURCE 2014 Scholar of the Year Award.
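The three phases named in the abstract can be illustrated with a small sequential sketch. This is not the authors' CUDA implementation: the function names, the Euclidean distance metric, the inverse-distance weighting, and the choice to select the k nearest points by repeated pairwise min-reduction are all assumptions made here for illustration. The pairwise reduction only mimics, on a CPU, the shape of a tree reduction that a GPU would run in parallel.

```python
import math

def distances(train, query):
    """Phase 1: distance from the query to every training point.
    Euclidean distance is an assumption; the abstract names no metric."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
            for point, _label in train]

def argmin_reduce(values, active):
    """Pairwise (tree-style) min reduction over the indices in `active`.
    Each pass halves the candidates; on a GPU each pair in a pass
    could be compared by a separate thread."""
    idx = list(active)
    while len(idx) > 1:
        nxt = []
        for i in range(0, len(idx) - 1, 2):
            a, b = idx[i], idx[i + 1]
            nxt.append(a if values[a] <= values[b] else b)
        if len(idx) % 2:          # odd element passes through unchanged
            nxt.append(idx[-1])
        idx = nxt
    return idx[0]

def k_nearest(values, k):
    """Phase 2: pick the k smallest distances with k reduction rounds
    (a hypothetical stand-in for the sorting phase)."""
    remaining = list(range(len(values)))
    chosen = []
    for _ in range(min(k, len(values))):
        best = argmin_reduce(values, remaining)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def classify(train, query, k=3, eps=1e-9):
    """Phase 3: inverse-distance weighted vote among the k nearest
    (the weighting scheme is an assumption)."""
    d = distances(train, query)
    votes = {}
    for i in k_nearest(d, k):
        label = train[i][1]
        votes[label] = votes.get(label, 0.0) + 1.0 / (d[i] + eps)
    return max(votes, key=votes.get)

# Toy training set: (feature vector, label). The values are made up.
train = [((0.0, 0.0), "ham"), ((0.1, 0.2), "ham"),
         ((1.0, 1.0), "spam"), ((0.9, 1.1), "spam"), ((1.2, 0.8), "spam")]
print(classify(train, (0.95, 1.0), k=3))
```

Selecting only the k smallest distances avoids fully sorting the distance array, which is one plausible reason a reduction suits this phase when k is small relative to the training set.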
Recommended Citation
McElroy, Patrick and Smithrud, Joshua, "Spam Filtration using Massively Parallel kNN in CUDA" (2014). Symposium Of University Research and Creative Expression (SOURCE). 29.
https://digitalcommons.cwu.edu/source/2014/oralpresentations/29
Additional Mentoring Department
Computer Science
Faculty Mentor(s)
Andonie, Razvan