Document Type
Thesis
Date of Degree Completion
Spring 2022
Degree Name
Master of Science (MS)
Department
Computational Science
Committee Chair
Boris Kovalerchuk
Second Committee Member
Razvan Andonie
Third Committee Member
Szilard Vajda
Abstract
This research contributes to interpretable machine learning through visual knowledge discovery in General Line Coordinates (GLC). Hyperblocks, used as interpretable dataset units, are combined with GLC to create a visual self-service machine learning model. Two GLC variants, called Dynamic Scaffold Coordinates (DSC1 and DSC2), are proposed. Both losslessly map multiple dataset attributes onto a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm.
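The following is a rough illustration of how several attributes can share one 2-D plane without losing information. The sketch chains consecutive attribute pairs as vectors drawn head to tail, in the general spirit of GLC. It is not the DSC1 or DSC2 scaffolding algorithm from this thesis. The functions `chain_pairs` and `recover` and the pairing scheme are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chain_pairs(sample: List[float]) -> List[Point]:
    """Map an n-D sample to a 2-D polyline by chaining attribute pairs.

    Hypothetical GLC-style sketch, not the thesis's DSC algorithm:
    pairs (x1, x2), (x3, x4), ... become displacement vectors drawn
    head to tail from the origin. An odd trailing attribute is paired with 0.
    """
    values = list(sample)
    if len(values) % 2:
        values.append(0.0)  # pad so every attribute belongs to a pair
    x, y = 0.0, 0.0
    points: List[Point] = [(x, y)]
    for i in range(0, len(values), 2):
        x += values[i]
        y += values[i + 1]
        points.append((x, y))
    return points


def recover(points: List[Point]) -> List[float]:
    """Invert chain_pairs by differencing consecutive vertices.

    Recovery is exact up to floating-point rounding; the original
    dimension n must be known to drop the padding value.
    """
    values: List[float] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        values.extend([x1 - x0, y1 - y0])
    return values


sample = [0.5, 0.25, 0.75, 0.125, 1.0]
line = chain_pairs(sample)
print(line)
print(recover(line)[: len(sample)])
```

Because every vertex is kept, the original values can be read back from the polyline. This is the property "lossless" refers to, although DSC achieves it with its own scaffold construction.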
Hyperblock analysis is used to determine visually appealing attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree rules and that a series of DSC1 or DSC2 plots can losslessly visualize n-D data in accordance with a decision tree model. For large decision trees with many branches, such as the one built for the MNIST handwritten digits, hyperblock discovery was hampered. In these cases, dimensionality reduction techniques such as principal component analysis (PCA), singular value decomposition (SVD), and t-distributed stochastic neighbor embedding (t-SNE) were used to create new attributes of interest for visual class separation.
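The link between hyperblocks and decision tree rules can be shown in a few lines. Each root-to-leaf path is a conjunction of threshold tests, so it defines an axis-aligned box with one interval per attribute. The sketch below assumes the common split convention: `x <= t` on the left branch and `x > t` on the right. The class `Hyperblock`, the helper `hyperblock_from_path`, and the example rule are hypothetical and are not taken from the thesis.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# (attribute index, operator "<=" or ">", threshold) along one tree path
Condition = Tuple[int, str, float]


@dataclass
class Hyperblock:
    """Axis-aligned n-D box: interval (lower, upper] for each attribute."""
    n: int
    label: str
    lower: List[float] = field(default_factory=list)
    upper: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Unconstrained attributes span the whole real line.
        if not self.lower:
            self.lower = [-math.inf] * self.n
        if not self.upper:
            self.upper = [math.inf] * self.n

    def contains(self, x: Sequence[float]) -> bool:
        return all(lo < v <= hi for v, lo, hi in zip(x, self.lower, self.upper))


def hyperblock_from_path(n: int, path: List[Condition], label: str) -> Hyperblock:
    """Tighten the box bounds with each split test on a root-to-leaf path."""
    hb = Hyperblock(n, label)
    for attr, op, t in path:
        if op == "<=":
            hb.upper[attr] = min(hb.upper[attr], t)
        elif op == ">":
            hb.lower[attr] = max(hb.lower[attr], t)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return hb


# Hypothetical rule: IF x3 > 2.5 AND x4 <= 1.7 THEN class "B"
block = hyperblock_from_path(4, [(2, ">", 2.5), (3, "<=", 1.7)], "B")
print(block.lower, block.upper)
print(block.contains([5.0, 3.0, 4.0, 1.2]))
```

The reverse direction does not hold in general. A hyperblock can use any bounds on any attributes, not only those produced by a tree's greedy splits, and this extra freedom is the sense in which hyperblocks are more general.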
A major benefit of DSC1 and DSC2 is their high interpretability. They allow domain experts to control existing machine learning models or establish new ones through visual pattern discovery. A software package called the Dynamic Scaffold Coordinates Visualization System (DSCViz) was created to showcase DSC1 and DSC2. DSCViz extends the end user's capabilities with functions such as real-time drag and zoom, scaling techniques, sample clipping, attribute reordering, and the ability to hide classes or change their colors. DSC2 was used to estimate and visualize worst-case validation splits for the Wisconsin Breast Cancer, Iris, and Seeds datasets. DSC2 was also applied to the MNIST handwritten digits dataset to assess its feasibility for large datasets. More generally, estimating worst-case validation splits is important for any high-risk application.
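A worst-case validation split is the train/test partition on which a model scores lowest. The sketch below is a brute-force proxy, not the visual DSC2 estimation used in this thesis. It enumerates every test set of a fixed size on a tiny dataset, trains a nearest-centroid classifier on the remainder, and keeps the split with the lowest test accuracy. The toy data, the classifier choice, and the function names are assumptions made for illustration. Exhaustive enumeration is only feasible for very small datasets.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def fit_centroids(train: Sequence[Sample]) -> Dict[str, Tuple[float, ...]]:
    """Nearest-centroid training: per-class mean of the feature vectors."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def predict(centroids: Dict[str, Tuple[float, ...]], x: Tuple[float, ...]) -> str:
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def worst_case_split(data: List[Sample], test_size: int) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Exhaustively search test sets of a given size for the lowest accuracy."""
    all_classes = {y for _, y in data}
    worst_acc, worst_test = float("inf"), None
    for test_idx in combinations(range(len(data)), test_size):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        if {y for _, y in train} != all_classes:
            continue  # skip splits that leave a class out of training
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, data[i][0]) == data[i][1] for i in test_idx)
        acc = correct / test_size
        if acc < worst_acc:
            worst_acc, worst_test = acc, test_idx
    return worst_acc, worst_test


# Toy two-class data with one atypical case per class (illustrative only).
data: List[Sample] = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.9, 0.8), "a"),
    ((1.0, 1.0), "b"), ((1.1, 0.9), "b"), ((0.2, 0.1), "b"),
]
acc, split = worst_case_split(data, 2)
print(acc, split)
```

In this toy data the worst splits put the atypical cases in the test set. This is the kind of overlap region that a visual inspection of class separation would be expected to reveal.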
Recommended Citation
Recaido, Charles, "Interpretable Machine Learning for Self-service High-risk Decision Making" (2022). All Master's Theses. 1751.
https://digitalcommons.cwu.edu/etd/1751
Included in
Data Science Commons, Graphics and Human Computer Interfaces Commons, Other Computer Sciences Commons