Document Type

Thesis

Date of Degree Completion

Spring 2021

Degree Name

Master of Science (MS)

Department

Computational Science

Committee Chair

Boris Kovalerchuk

Second Committee Member

Razvan Andonie

Third Committee Member

Szilard Vajda

Abstract

High-dimensional data play an important role in knowledge discovery and data science. The integration of visualization, visual analytics, machine learning (ML), and data mining (DM) is a key aspect of data science research for high-dimensional data. This thesis explores the efficiency of a new algorithm that converts non-image data into raster images by visualizing the data as a heatmap in collocated paired coordinates (CPC). These images are called CPC-R images, and the algorithm that produces them is called the CPC-R algorithm. Powerful deep learning methods open an opportunity to solve non-image ML/DM problems by transforming them into image recognition problems. The main idea behind CPC-R is to split the attributes of an n-D point into consecutive pairs, locate the pairs in the same 2-D Cartesian space, and assign greyscale intensities or colors to the pairs. Several parameters can be varied to produce different versions of CPC-R images, which allows the images to be optimized for classification. This thesis reports the results of computational experiments with the CPC-R algorithm for different Convolutional Neural Network (CNN) classifiers, and methods to optimize the different versions of CPC-R images for the same n-D point. These results show that the combined CPC-R and deep learning CNN algorithms are able to solve non-image ML problems, reaching high accuracy on the benchmark datasets. The second part of this thesis reports the results of saliency mapping with the CPC-R algorithm. Saliency models take an image and generate a saliency map that predicts which regions of the image will most likely draw a human viewer’s attention. Saliency mappings with CPC-R are explored, and further optimization studies are outlined. This thesis also estimates the importance of features by measuring the change in prediction accuracy due to the exclusion of individual features. Large sets of pixels are used as features that can capture a large context. This approach views a cell as the most informative if covering it leads to the largest decrease in classification accuracy. This method is called the Informative Cell Covering (ICC) algorithm.
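
As an illustration only, the two ideas described in the abstract can be sketched in Python with NumPy. This is not the thesis implementation. The function names `cpc_r_image` and `cell_covering_importance`, the `grid_size` and `cell_size` parameters, the scaling of attributes to [0, 1], the padding of odd-length points, the rule that intensity grows with pair order (with a later pair overwriting an earlier one in the same cell), and the `predict` callable standing in for a trained classifier are all assumptions made for this sketch.

```python
import numpy as np


def cpc_r_image(point, grid_size=10):
    """Map one n-D point (attributes assumed scaled to [0, 1]) to a greyscale image."""
    values = np.asarray(point, dtype=float)
    if values.size % 2 == 1:
        # Assumption: pad an odd-length point by repeating its last attribute.
        values = np.append(values, values[-1])
    pairs = values.reshape(-1, 2)  # consecutive pairs (x1, x2), (x3, x4), ...
    image = np.zeros((grid_size, grid_size), dtype=float)
    n_pairs = len(pairs)
    for index, (x, y) in enumerate(pairs):
        # Locate the pair as a cell in the shared 2-D grid.
        col = min(int(x * grid_size), grid_size - 1)
        row = min(int(y * grid_size), grid_size - 1)
        # Assumption: intensity encodes pair order, rising from 1/n_pairs to 1.0;
        # a later pair overwrites an earlier one that falls in the same cell.
        image[row, col] = (index + 1) / n_pairs
    return image


def cell_covering_importance(images, labels, predict, cell_size=2):
    """Return the accuracy drop caused by covering each cell of the images.

    `predict` is a hypothetical callable that maps a batch of images with
    shape (n, height, width) to an array of n predicted labels.
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    baseline = np.mean(predict(images) == labels)
    height, width = images.shape[1:3]
    row_starts = range(0, height, cell_size)
    col_starts = range(0, width, cell_size)
    drops = np.zeros((len(row_starts), len(col_starts)))
    for i, r in enumerate(row_starts):
        for j, c in enumerate(col_starts):
            covered = images.copy()
            # Cover one cell by blanking its pixels in every image.
            covered[:, r:r + cell_size, c:c + cell_size] = 0.0
            drops[i, j] = baseline - np.mean(predict(covered) == labels)
    return drops
```

Under these assumptions, `cpc_r_image([0.15, 0.25, 0.55, 0.95, 0.35, 0.05])` produces a 10 x 10 image with three non-zero cells, one per attribute pair. The cell with the largest entry in the array returned by `cell_covering_importance` is the one whose covering reduces accuracy the most, which corresponds to the most informative cell in the sense described above.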

Keywords: Knowledge Discovery, Deep Learning, Collocated Paired Coordinates, Convolutional Neural Networks, Raster Images, Machine Learning, Visualization, Non-image Data, Data Conversion.
