HIGH DIMENSIONAL DATA ANALYSIS FOR ANOMALY DETECTION AND QUALITY IMPROVEMENT

A Dissertation Presented to The Academic Faculty

Abstract

Analyzing large-scale, high-dimensional data with a complex heterogeneous structure to extract useful information and features is vital for data fusion in assessing system performance, detecting system anomalies early, guiding intelligent sampling and sensing for data collection, and making decisions that achieve optimal system performance. Chapter 3 focuses on detecting anomalies from high-dimensional data. Traditionally, most image-based anomaly detection methods perform denoising and detection sequentially, which compromises detection accuracy and efficiency. In this chapter, a novel methodology, named smooth-sparse decomposition (SSD), is proposed. It uses regularized high-dimensional regression to decompose an image and separate anomalous regions simultaneously by solving a large-scale optimization problem. Chapter 4 extends SSD to spatio-temporal functional data through spatio-temporal smooth-sparse decomposition (ST-SSD), together with a likelihood ratio test that accurately detects the time of change based on the detected anomaly. To enable real-time implementation of the proposed methodology, recursive estimation procedures for ST-SSD are also developed. The proposed methodology is applied to tonnage signals, rolling inspection data, and solar flare monitoring. Chapter 5 considers the adaptive sampling problem for high-dimensional data. A novel adaptive sampling framework, named Adaptive Kernelized Maximum-Minimum Distance, is proposed to adaptively estimate the sparse anomalous region. The proposed method balances the sampling effort between space-filling sampling (exploration) and focused sampling near the anomalous region (exploitation). The methodology is applied to a case study of anomaly detection in composite sheets using a guided wave test. Chapter 6 explores penalized tensor regression to model tensor response data as a function of the process variables. Regularized Tucker decomposition and regularized tensor regression methods are developed, which represent structured point cloud data as tensors and link the point cloud data with the process variables. The performance of the proposed methods is evaluated through simulation and a real case study of turning process optimization.
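
To make the idea behind smooth-sparse decomposition concrete, the following is a minimal sketch on a one-dimensional signal rather than an image. It is an illustration under stated assumptions, not the algorithm developed in the dissertation: the second-difference roughness penalty, the alternating update scheme, the function names `soft_threshold` and `smooth_sparse_decompose`, and the tuning parameters `lam` and `gamma` are all hypothetical choices made for this example. The signal is modeled as a smooth background plus a sparse anomaly plus noise, and both components are estimated jointly instead of denoising first and detecting afterwards.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def smooth_sparse_decompose(y, lam=1000.0, gamma=1.0, n_iter=200):
    """Split y into a smooth part mu and a sparse part a by minimizing
        ||y - mu - a||^2 + lam * ||D mu||^2 + gamma * ||a||_1,
    where D is the second-difference operator (an assumed roughness
    penalty, chosen only for this illustration).
    """
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)            # (n-2) x n second differences
    H = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # linear smoother matrix
    a = np.zeros(n)
    for _ in range(n_iter):
        mu = H @ (y - a)                           # ridge-type update of the smooth part
        a = soft_threshold(y - mu, gamma / 2.0)    # lasso-type update of the sparse part
    return mu, a

# Synthetic example: smooth sine background, small noise, one short anomaly.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * t) + 0.05 * rng.standard_normal(t.size)
y[90:95] += 2.0                                    # injected anomaly

mu, a = smooth_sparse_decompose(y)
detected = np.flatnonzero(a)                       # indices flagged as anomalous
```

Each pass solves one block of the convex objective exactly while holding the other fixed, so the smooth estimate is never contaminated by anomaly mass that has already been assigned to the sparse component. In this sketch, a larger `lam` yields a smoother background and a larger `gamma` yields fewer flagged points.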