Spam Filtration using Massively Parallel kNN in CUDA

Presenter Information

Patrick McElroy
Joshua Smithrud

Document Type

Oral Presentation

Campus where you would like to present

SURC Room 140

Start Date

15-5-2014

End Date

15-5-2014

Keywords

Spam, CUDA, kNN

Abstract

Spam classification is a fundamental but largely unseen element of everyday life. Every email user relies, usually without noticing, on spam classification and filtration systems to remove unwanted messages while leaving desired ones untouched, without requiring user input. As email systems become more robust and email communication more widespread, spam-classification systems must continue to run accurately and efficiently while remaining all but invisible to the user. This presentation details our massively parallel implementation of spam classification using the k-Nearest Neighbors (kNN) algorithm on NVIDIA GPUs with CUDA. The kNN algorithm classifies as follows: given a dataset of points (the training set) with known attributes and known classifications, a point with known attributes but unknown classification is classified based on a weighted average of the classifications of its k most similar points in the training set. The algorithm can be broken down into three phases: distance calculation, sorting, and classification. Of the three, the sorting phase is the most complex and computationally demanding, so our primary goal has been to optimize it. Using the computational capabilities of GPUs, we have developed an implementation that greatly improves the performance of the algorithm by using a massively parallel reduction for the sorting phase. Experimental results from our spam filtration system demonstrate that our implementation is efficient and highly scalable. We therefore believe it is a feasible solution to the growing demands on spam-classification systems.
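The three phases named in the abstract can be illustrated with a minimal sketch. This is not the presenters' CUDA implementation, which is not reproduced here; it is a sequential Python illustration under several assumptions: Euclidean distance as the similarity measure, inverse-distance weights for the weighted average, labels encoded as 1 for spam and 0 for non-spam, and a pairwise "tree" minimum reduction repeated k times in place of a full sort. All function names (`distances`, `argmin_reduce`, `k_nearest`, `classify`) are hypothetical.

```python
import math


def distances(train, query):
    """Phase 1: Euclidean distance from the query to every training point.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    return [math.dist(features, query) for features, _ in train]


def argmin_reduce(dist, active):
    """Pairwise (tree-style) minimum reduction over the active indices.

    Each pass compares neighbouring pairs and keeps the closer one, halving
    the candidate list. On a GPU the comparisons within a pass could run in
    parallel; here they simply run one after another.
    """
    cand = list(active)
    while len(cand) > 1:
        nxt = []
        for i in range(0, len(cand) - 1, 2):
            a, b = cand[i], cand[i + 1]
            nxt.append(a if dist[a] <= dist[b] else b)
        if len(cand) % 2:  # odd element out is carried to the next pass
            nxt.append(cand[-1])
        cand = nxt
    return cand[0]


def k_nearest(dist, k):
    """Phase 2: pick the k smallest distances with k reductions, no full sort."""
    active = list(range(len(dist)))
    chosen = []
    for _ in range(min(k, len(active))):
        best = argmin_reduce(dist, active)
        chosen.append(best)
        active.remove(best)
    return chosen


def classify(train, query, k=3, eps=1e-9):
    """Phase 3: inverse-distance weighted average of neighbour labels.

    Assumption: labels are 1 (spam) or 0 (not spam), and a weighted
    average of at least 0.5 is reported as spam.
    """
    dist = distances(train, query)
    neighbors = k_nearest(dist, k)
    weights = [1.0 / (dist[i] + eps) for i in neighbors]
    score = sum(w * train[i][1] for w, i in zip(weights, neighbors)) / sum(weights)
    return score >= 0.5


# Toy training set: (feature vector, label). The features are made up.
train = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.1, 0.2), 0),
    ((0.2, 0.1), 0),
    ((0.15, 0.15), 0),
]
print(classify(train, (0.85, 0.85)))  # query close to the spam examples
print(classify(train, (0.1, 0.1)))    # query close to the non-spam examples
```

Selecting only the k nearest points avoids ordering the entire training set, which is one plausible reason a reduction suits this phase; whether the presenters' reduction works this way is not stated in the abstract.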

For their work on this project, Patrick McElroy and Joshua Smithrud were both nominated for the SOURCE 2014 Scholar of the Year Award.

Faculty Mentor(s)

Andonie, Razvan

Additional Mentoring Department

Computer Science

May 15th, 10:00 AM to 10:20 AM
