Lossless Visual Knowledge Discovery in High-Dimensional Data in Elliptic Paired Coordinates
Document Type
Poster
Campus where you would like to present
Ellensburg
Event Website
https://digitalcommons.cwu.edu/source
Start Date
18-5-2020
Abstract
Large datasets can provide useful information in quantitative fields of study. However, data with more than two or three dimensions are difficult for humans to conceptualize, which hinders knowledge discovery. Novel Elliptic Paired Coordinates (EPCs) [Kovalerchuk, 2018] allow multidimensional data to be represented in two dimensions without information loss while halving the number of visual elements required in the graph. This research explores the effectiveness of constructing predictive machine learning models interactively using EPC visualizations. For this research, Ellipse, an interactive software system, was developed to process high-dimensional datasets, create the corresponding EPC visualizations, and build predictive classification models based on dominance rules. The Ellipse system automatically discovers areas in which a high percentage of the data points belong to a single class. An experimental study using Ellipse on data from the University of California Irvine (UCI) Machine Learning Repository suggests that the EPC approach is a promising method for discovering predictive models, with the following results: (1) 100% coverage and 98.67% weighted average precision (WAP) on Iris plants (4 dimensions), (2) 98.24% coverage and 94.63% WAP on Wisconsin Breast Cancer (10 dimensions), and (3) 100% coverage and 96.26% WAP on Glass Identification with classes 1-6 grouped apart from class 7 (10 dimensions). Further evaluation and development of the EPC approach to machine learning is ongoing. Dominance rules with high precision and accuracy show promise for correctly classifying data points. This could be useful in many fields, such as medical diagnostics and forensics, where visually clear dominance rules can be easily interpreted in the application domains.
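To make the ideas of single-class dominance areas, coverage, and weighted average precision more concrete, the following is a minimal sketch under stated assumptions. It is not the Ellipse implementation. It assumes the data points have already been mapped to 2-D positions in the EPC plane (that mapping is not shown), and it uses a simple square grid as a hypothetical stand-in for Ellipse's area search. The parameters `cell_size`, `threshold`, and `min_count` and all function names are illustrative. Coverage is taken as the share of cases that fall in some rule area, and WAP as per-class precision weighted by each class's share of the covered cases.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # a data point already placed in the 2-D EPC plane (assumed)
Cell = Tuple[int, int]       # integer index of a square grid cell (hypothetical area shape)


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a 2-D point to the index of the grid cell that contains it."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def find_dominance_cells(points: Sequence[Point], labels: Sequence[Hashable],
                         cell_size: float = 1.0, threshold: float = 0.9,
                         min_count: int = 2) -> Dict[Cell, Hashable]:
    """Return {cell: class} for every cell holding at least `min_count` points
    in which a single class makes up at least `threshold` of the points."""
    counts: Dict[Cell, Counter] = defaultdict(Counter)
    for p, y in zip(points, labels):
        counts[cell_of(p, cell_size)][y] += 1
    rules: Dict[Cell, Hashable] = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        label, top = counter.most_common(1)[0]
        if total >= min_count and top / total >= threshold:
            rules[cell] = label  # dominance rule: "points in this cell are `label`"
    return rules


def predict(points: Sequence[Point], rules: Dict[Cell, Hashable],
            cell_size: float = 1.0) -> List[Optional[Hashable]]:
    """Assign each point the dominant class of its cell, or None if no rule covers it."""
    return [rules.get(cell_of(p, cell_size)) for p in points]


def coverage_and_wap(y_true: Sequence[Hashable],
                     y_pred: Sequence[Optional[Hashable]]) -> Tuple[float, float]:
    """Coverage: fraction of cases covered by some rule.
    WAP: precision of each class among covered cases, weighted by that
    class's number of true covered cases."""
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true)
    if not covered:
        return coverage, 0.0
    support = Counter(t for t, _ in covered)             # true class counts
    predicted = Counter(p for _, p in covered)           # predicted class counts
    correct = Counter(t for t, p in covered if t == p)   # correct predictions per class
    wap = 0.0
    for cls, n in support.items():
        precision = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        wap += precision * n / len(covered)
    return coverage, wap
```

For example, with seven points where one cell holds three class "a" points, a second cell holds two "b" points and one "a" point, and a third cell holds a single "b" point, `threshold=0.6` and `min_count=2` yield two rules. The first six points are covered (coverage 6/7), and the WAP is 8/9. Raising `threshold` trades coverage for precision, which mirrors the coverage/WAP pairs reported above.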
Recommended Citation
McDonald, Rose, "Lossless Visual Knowledge Discovery in High-Dimensional Data in Elliptic Paired Coordinates" (2020). Symposium Of University Research and Creative Expression (SOURCE). 54.
https://digitalcommons.cwu.edu/source/2020/COTS/54
Department/Program
Computer Sciences
Additional Mentoring Department
https://cwu.studentopportunitycenter.com/2020/04/lossless-visual-knowledge-discovery-in-high-dimensional-data-in-elliptic-paired-coordinates/
Faculty Mentor(s)
Boris Kovalerchuk