Finding Structures in High-Dimensional Biomedical Data

7th November 2024

Time: 1 pm ET

Please use this Zoom link to join the webinar.

For a list of all talks in the NanoBio Seminar Series Fall '24, see here.


High-dimensional datasets often contain unlabeled structure that carries useful information. Neighbor embedding algorithms are unsupervised methods that place related data points close together so that this structure can be visualized. They are regarded as nonlinear dimensionality reduction methods because they represent high-dimensional data in a low-dimensional space (typically 2-D). A popular neighbor embedding algorithm is Uniform Manifold Approximation and Projection (UMAP). UMAP uses a k-nearest neighbor (k-NN) graph to establish a pairwise metric in the high-dimensional space, which it then uses to align a lower-dimensional representation. This talk explores (a) techniques for improving UMAP and (b) using it to design new algorithms. We analyze the UMAP algorithm to better explain its optimization scheme and cluster formation, make embeddings more consistent with respect to initialization, and improve its out-of-sample embedding. We then apply UMAP to align manifolds and to analyze large biomedical image datasets. In particular, we analyze chest X-rays and show that dimensionality reduction can discover (1) different phenotypes of COVID-19 response and (2) outliers in image datasets.
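
For readers unfamiliar with the k-NN graph mentioned in the abstract, the following is a minimal sketch of that single step in plain Python. It assumes Euclidean distance and brute-force search; the function name `knn_graph` and the toy points are hypothetical and chosen only for illustration. It is not the speaker's method, and it omits everything UMAP does after this step (edge weighting and the low-dimensional optimization).

```python
import math


def knn_graph(points, k):
    """For each point, return its k nearest neighbors as (index, distance) pairs.

    Brute-force Euclidean search; an illustrative assumption, not how
    production implementations find neighbors on large datasets.
    """
    graph = []
    for i, p in enumerate(points):
        # Distance from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        # Keep the k closest, stored as (neighbor index, distance).
        graph.append([(j, d) for d, j in dists[:k]])
    return graph


# Toy data: two well-separated groups in 3-D (made-up values).
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.3, 5.0),
]
graph = knn_graph(points, k=2)

for i, neighbors in enumerate(graph):
    print(i, [j for j, _ in neighbors])
```

In this toy case each point's two nearest neighbors fall within its own group, which is the kind of local structure a neighbor embedding tries to preserve in 2-D.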