Document Type


Date of Degree Completion

Spring 2020

Degree Name

Master of Science (MS)


Computational Science

Committee Chair

Szilárd Vajda

Second Committee Member

Razvan Andonie

Third Committee Member

Donald Davendra


Tuberculosis (TB) is a disease responsible for the deaths of more than one million people worldwide every year. Even though it is preventable and curable, it remains a major global health threat. In developed countries, it is often diagnosed using approaches such as sputum smear microscopy and culture methods. However, since these approaches are relatively expensive, they are not commonly used in low-income regions such as parts of India, Africa, and Bangladesh. Instead, those places rely on the well-known and affordable interpretation of chest X-rays (CXRs) by radiologists. Nevertheless, if this method is considered obsolete elsewhere today, it is because of its many flaws: i) it is a tedious task that requires experienced medical personnel, who are scarce given the high demand; ii) it is manual and difficult to carry out for a large population; and iii) it is prone to human error, depending on the proficiency and aptitude of the interpreter. Researchers have therefore tried over the years to overcome these challenges by proposing software solutions that mainly involve computer vision, artificial intelligence, and machine learning. The problem with these existing solutions is that they are either complex or not reliable enough. The need for better solutions in this specific domain, as well as my desire to contribute to something meaningful, led me to investigate in this direction.

In this manuscript, I propose a simple, fully automatic software solution that uses only machine learning and image processing to analyze CXR scans and detect TB-related anomalies. My system first extracts the region of interest from each incoming image, then performs a computationally inexpensive yet effective feature extraction step that combines edge detection using the Laplacian of Gaussian with retention of positional information. The extracted features are then fed to a standard random forest classifier for discrimination. I tested the system on two benchmark datasets (Montgomery and Shenzhen) and obtained state-of-the-art results, reaching up to 97% classification accuracy.
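The feature extraction step described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation, and several details are assumptions because the abstract does not specify them. The input is assumed to be an already-extracted lung region of interest. The Laplacian of Gaussian (LoG) scale `sigma` and the grid size are illustrative values. "Positional information retention" is modeled here by averaging the absolute LoG response over a fixed grid of cells and concatenating the cell values in row-major order, so each feature always corresponds to the same image location. The function names `log_kernel`, `log_filter`, and `extract_features` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_kernel(sigma: float) -> np.ndarray:
    """Build a zero-sum Laplacian of Gaussian kernel (up to a constant scale)."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    # Continuous LoG shape: (r^2 - 2*sigma^2) / sigma^4 * exp(-r^2 / (2*sigma^2))
    kernel = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    # Subtract the mean so uniform regions produce zero response
    return kernel - kernel.mean()


def log_filter(image: np.ndarray, sigma: float) -> np.ndarray:
    """Filter a 2-D image with the LoG kernel, keeping the input shape."""
    k = log_kernel(sigma)
    pad = k.shape[0] // 2
    padded = np.pad(image, pad, mode="reflect")
    # windows has shape (H, W, kh, kw): one kernel-sized patch per output pixel
    windows = sliding_window_view(padded, k.shape)
    # The kernel is symmetric, so correlation equals convolution here
    return np.einsum("ijkl,kl->ij", windows, k)


def extract_features(roi: np.ndarray, sigma: float = 2.0, grid=(8, 8)) -> np.ndarray:
    """Return the mean |LoG| response per grid cell as a fixed-length vector.

    Assumption: positional information is kept by pooling per grid cell
    instead of over the whole image.
    """
    response = np.abs(log_filter(roi.astype(float), sigma))
    gh, gw = grid
    h, w = response.shape
    # Crop so the image splits evenly into gh x gw cells
    response = response[: h - h % gh, : w - w % gw]
    cells = response.reshape(gh, response.shape[0] // gh, gw, response.shape[1] // gw)
    # Average inside each cell, then flatten row by row
    return cells.mean(axis=(1, 3)).ravel()
```

For example, `extract_features(roi)` on a 64×64 region returns a 64-element vector in which entry `r * 8 + c` summarizes edge strength in grid cell `(r, c)`. Vectors of this kind could then be stacked into a matrix and passed to an off-the-shelf random forest, such as scikit-learn's `RandomForestClassifier`. Per-cell pooling keeps the feature count small and fixed regardless of image size, which fits the abstract's emphasis on computationally inexpensive features.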