Lossless Visual Knowledge Discovery in High-Dimensional Data in Elliptic Paired Coordinates

Document Type

Poster

Campus where you would like to present

Ellensburg

Event Website

https://digitalcommons.cwu.edu/source

Start Date

18-5-2020

Abstract

Large datasets can provide useful information in quantitative fields of study. However, data with more than two or three dimensions are difficult for humans to conceptualize, which hinders knowledge discovery. Novel Elliptic Paired Coordinates (EPCs) [Kovalerchuk, 2018] allow multidimensional data to be represented in two dimensions, halving the number of visual elements required in the graph without information loss. This research explores the effectiveness of constructing predictive machine learning models interactively using EPC visualizations. For this research, Ellipse, an interactive software system, was developed to process high-dimensional datasets, create the corresponding EPC visualizations, and build predictive classification models based on dominance rules. The Ellipse system automatically discovers areas with a high percentage of single-class dominance. The experimental study using Ellipse on data from the University of California, Irvine Machine Learning Repository suggests that the EPC approach is a promising method for discovering predictive models, with the following results: (1) 100% coverage and 98.67% weighted average precision (WAP) on Iris plants (4 dimensions), (2) 98.24% coverage and 94.63% WAP on Wisconsin Breast Cancer (10 dimensions), and (3) 100% coverage and 96.26% WAP on Glass Identification with classes 1-6 grouped apart from class 7 (10 dimensions). Further evaluation and development of the EPC approach through machine learning is ongoing. Dominance rules with high precision and accuracy show promise for correctly classifying data points. This could be useful in many fields, such as medical diagnostics and forensics, because visually appealing dominance rules can be easily interpreted within the application domain.
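
The abstract reports coverage and weighted average precision (WAP) but does not define them. The sketch below shows one plausible reading, offered as an assumption rather than as the Ellipse implementation: coverage is the fraction of cases that fall inside at least one rule area, and WAP is the mean of per-rule precision weighted by the number of cases each rule covers. The `Rule` class, the function names, and the toy data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical dominance rule: an area of the plot and its dominant class."""
    covered: set[int]   # indices of the cases that fall inside the rule's area
    predicted: str      # class the rule assigns to every case it covers


def rule_precision(rule: Rule, labels: list[str]) -> float:
    """Share of covered cases whose true label matches the rule's class."""
    if not rule.covered:
        return 0.0
    hits = sum(1 for i in rule.covered if labels[i] == rule.predicted)
    return hits / len(rule.covered)


def coverage(rules: list[Rule], n_cases: int) -> float:
    """Assumed definition: fraction of all cases covered by at least one rule."""
    covered = set().union(*(r.covered for r in rules))
    return len(covered) / n_cases


def weighted_average_precision(rules: list[Rule], labels: list[str]) -> float:
    """Assumed definition: per-rule precision weighted by cases covered."""
    total = sum(len(r.covered) for r in rules)
    if total == 0:
        return 0.0
    weighted = sum(rule_precision(r, labels) * len(r.covered) for r in rules)
    return weighted / total


# Toy data (not from the study): six cases, two classes, two rules.
demo_labels = ["a", "a", "a", "b", "b", "b"]
demo_rules = [Rule({0, 1, 2, 3}, "a"), Rule({4, 5}, "b")]
demo_coverage = coverage(demo_rules, len(demo_labels))
demo_wap = weighted_average_precision(demo_rules, demo_labels)
```

In this toy case the first rule covers four cases with three correct and the second covers two with both correct, so every case is covered and the weighted precision is (0.75 × 4 + 1.0 × 2) / 6. If the study weights rules differently, only `weighted_average_precision` would need to change.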

Faculty Mentor(s)

Boris Kovalerchuk

Department/Program

Computer Sciences

Additional Mentoring Department

https://cwu.studentopportunitycenter.com/2020/04/lossless-visual-knowledge-discovery-in-high-dimensional-data-in-elliptic-paired-coordinates/

May 18th, 12:00 PM

https://digitalcommons.cwu.edu/source/2020/COTS/54