A Minimax Surrogate Loss Approach to Causal Inference
A fundamental obstacle in causal inference is that we never observe both the treatment and the control outcome for the same unit. To address this, we propose surrogate loss functions that incorporate both treatment and control data. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting method requires the solution of only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used for nonlinear (and non-parametric) estimation.
Siong Thye Goh and Cynthia Rudin (2017) A Minimax Surrogate Loss Approach to Causal Inference. Submitted
Cascaded High Dimensional Histograms
Histograms are frequently used to understand distributions of data when the dimension of the data is small. We discuss how to represent a high-dimensional histogram as either a tree or a list (a one-sided tree). Our models seek a balance between how accurately they represent the data set and how interpretable they are.
Siong Thye Goh and Cynthia Rudin (2016) Cascaded High Dimensional Histograms: A Generative Approach to Density Estimation. Submitted
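The model form above can be illustrated with a toy sketch: a density-estimation tree whose leaves are axis-aligned boxes, each carrying the constant density count / (N × volume). The paper learns where to split so as to balance accuracy against interpretability; this illustrative version simply bisects dimensions in round-robin order, and all names here are hypothetical.

```python
import numpy as np

def build_tree(X, lo, hi, n_total, depth=0, max_depth=4):
    """Recursively bisect the box [lo, hi]; leaves store a constant density."""
    if depth == max_depth:
        volume = float(np.prod(hi - lo))
        return {"leaf": True, "density": len(X) / (n_total * volume)}
    d = depth % len(lo)                     # split dimension, round-robin
    mid = (lo[d] + hi[d]) / 2.0             # naive midpoint cut (not learned)
    mask = X[:, d] <= mid
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[d] = mid
    lo_right[d] = mid
    return {"leaf": False, "dim": d, "cut": mid,
            "left": build_tree(X[mask], lo, hi_left, n_total, depth + 1, max_depth),
            "right": build_tree(X[~mask], lo_right, hi, n_total, depth + 1, max_depth)}

def density(tree, x):
    """Walk the tree to the leaf containing x and return its density."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["dim"]] <= tree["cut"] else tree["right"]
    return tree["density"]

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(1000, 2))   # uniform data: true density is 1
tree = build_tree(X, np.zeros(2), np.ones(2), n_total=len(X))
print(density(tree, np.array([0.5, 0.5])))  # should be close to 1.0
```

A list (one-sided tree) would be the special case where every split sends one side directly to a leaf.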
Imbalanced Data Classification
Box Drawings for Learning with Imbalanced Data
We propose two machine learning algorithms for highly imbalanced classification problems. The constructed classifiers are unions of axis-parallel rectangles around the positive examples, and thus have the benefit of being interpretable.
Siong Thye Goh and Cynthia Rudin (2014) Box Drawings for Learning with Imbalanced Data. KDD 2014: 333-342
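The classifier form described above can be sketched as follows. This is not the Exact Boxes or Fast Boxes fitting procedure itself, which optimizes box placement; it only illustrates the final, interpretable decision rule — membership in a union of axis-parallel boxes — with hypothetical helper names.

```python
import numpy as np

def fit_box(X_pos, margin=0.0):
    """Tightest axis-aligned box around the positive points, optionally padded."""
    return X_pos.min(axis=0) - margin, X_pos.max(axis=0) + margin

def in_union_of_boxes(x, boxes):
    """Predict positive iff x lies inside any box (lo, hi)."""
    return any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in boxes)

# Toy imbalanced data: a few clustered positives, many scattered negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(5.0, 0.3, size=(5, 2))
X_neg = rng.normal(0.0, 1.0, size=(200, 2))

boxes = [fit_box(X_pos)]
print(all(in_union_of_boxes(x, boxes) for x in X_pos))  # every positive covered
```

The resulting rule can be read off directly as interval conditions on each feature, which is the source of the interpretability claim.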
Using Fast Boxes to Predict Gas Turbine Failure
In practice, we rarely know exactly when a machine begins to malfunction. We propose a method to label such unsupervised data; the Fast Boxes algorithm is then applied to predict gas turbine failures.
Siong Thye Goh, Xinmin Cai, Chao Yuan, Amit Chakraborty, and Matthew Evans (2016) Gas Turbine Failure Prediction Utilizing Supervised Learning Methodologies. Patent WO2016040085A1
Sparse Coding for Faulty Sensor Detection using L-1 norm on the Residual
We apply sparse coding to the task of anomaly detection. In particular, we propose post-processing steps that take the time component into account to reduce the number of false positives. Our method outperforms a one-class SVM on this anomaly detection task.
Siong Thye Goh, Chao Yuan, Amit Chakraborty, and Matthew Evans (2016) Gas Turbine Sensor Failure Detection Utilizing a Sparse Coding Methodology. Patent WO2016040082A1
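The scoring idea — flag samples whose reconstruction residual has large L1 norm — can be sketched as below. For simplicity this toy version reconstructs with plain least squares over a fixed dictionary rather than a learned sparse code, and omits the temporal post-processing; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(20, 8))            # dictionary: columns span "normal" behavior
normal = D @ rng.normal(size=(8, 50))   # signals lying in the dictionary's span
faulty = 3.0 * rng.normal(size=(20, 5)) # signals unlikely to be in that span

def l1_residual(D, X):
    """L1 norm of the residual after reconstructing each column of X from D."""
    codes, *_ = np.linalg.lstsq(D, X, rcond=None)
    return np.abs(X - D @ codes).sum(axis=0)

scores_normal = l1_residual(D, normal)
scores_faulty = l1_residual(D, faulty)
print(scores_normal.max() < scores_faulty.min())  # normals reconstruct almost exactly
```

Thresholding the L1 residual score then yields the anomaly decision; the L1 norm is less dominated by a single large coordinate than the squared L2 norm.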
Linear Discriminant Analysis
Null Space Based Linear Discriminant Analysis
Null space based linear discriminant analysis is a common dimension-reduction tool in pattern recognition. Previously, the standard way to perform it was through a singular value decomposition (SVD), which is slow. We present a new implementation of null space based linear discriminant analysis that performs no SVD at all; the main cost is an economic QR decomposition with column pivoting, which makes it much faster than the previous implementation.
Delin Chu and Siong Thye Goh (2010) A New and Fast Implementation for Null Space Based Linear Discriminant Analysis. Pattern Recognition, Vol 43, Issue 4, April 2010, Pages 1373-1379
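The core primitive being swapped in — extracting an orthonormal null-space basis from a pivoted QR decomposition instead of an SVD — can be sketched as follows. This is only the building block, not the paper's full algorithm, and the shapes are illustrative of the undersampled setting (dimension much larger than sample count).

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 12))              # undersampled: 12 features, 5 samples

# Pivoted QR of A^T: the trailing columns of Q are an orthonormal
# basis for the null space of A, with no SVD required.
Q, R, piv = qr(A.T, pivoting=True)
r = int(np.sum(np.abs(np.diag(R)) > 1e-10))  # numerical rank from |R_ii|
N = Q[:, r:]                                  # orthonormal basis of null(A)

print(np.allclose(A @ N, 0))  # True: A N = 0
```

Rank detection via the pivoted diagonal of R is what replaces the singular values here; column pivoting keeps that diagonal decreasing in magnitude so the threshold test is reliable.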
Orthogonal Linear Discriminant Analysis
Traditional LDA (linear discriminant analysis) computations require taking matrix inverses. However, when the problem is undersampled, the total scatter matrix is singular and its inverse is not defined. There have been various generalizations of the LDA algorithm, but they require the inversion of matrices and the computation of singular value decompositions. We propose a method that performs such computations using only orthogonal transformations; it is therefore inverse-free and numerically stable.
Delin Chu and Siong Thye Goh (2010) A New and Fast Orthogonal Linear Discriminant Analysis on Undersampled Problems. SIAM J. Sci. Comput., 32(4), 2274-2297
Uncorrelated Linear Discriminant Analysis
We find all solutions to the uncorrelated linear discriminant analysis (ULDA) problem and parametrize them explicitly. Furthermore, we propose new and fast algorithms that compute ULDA without performing a singular value decomposition.
Delin Chu, Siong Thye Goh, and Y.S. Hung (2011) Characterization of All Solutions for Undersampled Uncorrelated Linear Discriminant Analysis Problems. SIAM J. Matrix Anal. Appl., Vol 32, No. 3, pages 820-844
Even-Variable Balanced Boolean Functions with Optimal Algebraic Immunity
We construct six infinite classes of balanced Boolean functions that achieve optimal algebraic degree, optimal algebraic immunity, and high nonlinearity. We also prove a lower bound on the nonlinearities of these balanced Boolean functions, and prove an improved lower bound on the nonlinearity of the Carlet-Feng Boolean function.
Chik-How Tan and Siong-Thye Goh (2011) Several Classes of Even-Variable Balanced Boolean Functions with Optimal Algebraic Immunity. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E94-A, No. 1, pp. 165-171
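The quantity bounded above, nonlinearity, is computed from the Walsh-Hadamard spectrum: nl(f) = 2^(n-1) - max_w |W_f(w)| / 2, where W_f(w) = Σ_x (-1)^(f(x) ⊕ w·x). A minimal sketch (not the paper's constructions, just the standard computation, checked on a 4-variable bent function):

```python
def walsh_spectrum(truth_table):
    """Walsh-Hadamard spectrum via the in-place fast transform."""
    W = [1 - 2 * b for b in truth_table]     # (-1)^f(x)
    h = 1
    while h < len(W):
        for i in range(0, len(W), 2 * h):
            for j in range(i, i + h):
                W[j], W[j + h] = W[j] + W[j + h], W[j] - W[j + h]
        h *= 2
    return W

def nonlinearity(truth_table):
    """nl(f) = 2^(n-1) - max_w |W_f(w)| / 2."""
    n = len(truth_table).bit_length() - 1
    return 2 ** (n - 1) - max(abs(w) for w in walsh_spectrum(truth_table)) // 2

# Bent function f(x1,...,x4) = x1 x2 XOR x3 x4 attains the maximum
# nonlinearity 2^(n-1) - 2^(n/2 - 1) = 6 for n = 4 (bent functions are
# not balanced; the paper's functions trade a little nonlinearity for balance).
tt = [((x >> 3 & 1) & (x >> 2 & 1)) ^ ((x >> 1 & 1) & (x & 1)) for x in range(16)]
print(nonlinearity(tt))  # 6
```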
Determining All Permutations with Linear Translators
Let G be a function on a finite field and consider functions F of the form F = G + aH, where a is a non-zero constant and H is a trace function.
We characterize the conditions under which F is a permutation polynomial when G is (1) a permutation polynomial, (2) a k-to-1 function, or (3) a k-even function. Cases (2) and (3) answer in the affirmative an open problem posed by Charpin and Kyureghyan. The technique can be generalized to obtain permutations of any finite commutative ring with identity, with H a function from the ring to a subring.
Guang Gong, Siong Thye Goh, and Yin Tan (2016) Determining Permutation Polynomials with Linear Translators. Submitted
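A toy instance of the setup can be checked exhaustively in GF(2^4) (irreducible polynomial x^4 + x + 1). Take G(x) = x, a permutation, and H(x) = Tr(x); it is a known special case that F(x) = x + a·Tr(x) permutes the field iff Tr(a) = 0 (if F(x) = F(y) with x ≠ y, then x + y = a and 1 = Tr(x) + Tr(y) = Tr(a)). The paper's characterization covers far more general G and H; this sketch only verifies the special case by brute force, with all helper names hypothetical.

```python
def gf_mul(a, b, poly=0b10011, deg=4):
    """Carry-less multiplication in GF(2^4) modulo x^4 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << deg):
            a ^= poly
        b >>= 1
    return p

def tr(x):
    """Absolute trace Tr(x) = x + x^2 + x^4 + x^8; always lands in {0, 1}."""
    t, y = 0, x
    for _ in range(4):
        t ^= y
        y = gf_mul(y, y)
    return t

def is_permutation(a):
    """Is F(x) = x + a*Tr(x) a bijection of GF(2^4)? (Addition is XOR.)"""
    images = {x ^ gf_mul(a, tr(x)) for x in range(16)}
    return len(images) == 16

print(all(is_permutation(a) == (tr(a) == 0) for a in range(16)))  # True
```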