"Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyp" by Joseph Lemley, Filip Jagodzinski et al.

All Faculty Scholarship for the College of the Sciences

Title

Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data

Authors

Joseph Lemley, Central Washington University
Filip Jagodzinski, Western Washington University
Răzvan Andonie, Central Washington University

Document Type

Article

Department or Administrative Unit

Computer Science

Publication Date

6-10-2016

Abstract

We present the first algorithm for finding holes in high-dimensional data that runs in polynomial time with respect to the number of dimensions. Previous algorithms are exponential. Finding large empty rectangles or boxes in a set of points in 2D and 3D space has been well studied. Efficient algorithms exist to identify the empty regions in these low-dimensional spaces. Unfortunately, such efficiency is lacking in higher dimensions, where the problem has been shown to be NP-complete when the dimensions are included in the input. Applications for algorithms that find large empty spaces include big data analysis, recommender systems, automated knowledge discovery, and query optimization. Our Monte Carlo-based algorithm discovers interesting maximal empty hyper-rectangles in cases where dimensionality and input size would otherwise make analysis impractical. The run-time is polynomial in the size of the input and the number of dimensions. We apply the algorithm to a 39-dimensional data set of protein structures and discover interesting properties that we think could not be inferred otherwise.

Comments

This article was originally published in 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC). The publisher's version is available at https://doi.org/10.1109/compsac.2016.73.

Due to copyright restrictions, this article is not available for free download from ScholarWorks @ CWU.

Recommended Citation

Lemley, J., Jagodzinski, F., & Andonie, R. (2016). Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data. 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 16253047. https://doi.org/10.1109/compsac.2016.73

Journal

2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC)
