Nano-Cybernetic Biotrek Lab: Professor Deblina Sarkar

Finding Structures in High-Dimensional Biomedical Data

7th November 2024

Time: 1 pm ET

Please use this Zoom link to join the webinar.

For a list of all talks in the NanoBio Seminar Series Fall '24, see here.

High-dimensional datasets often contain unlabeled structure that carries useful information. Neighbor embedding algorithms are unsupervised methods that identify groups of related data points in order to visualize this structure. They are regarded as nonlinear dimensionality reduction methods, as they allow high-dimensional data to be visualized in a low-dimensional space (typically 2-D). A popular neighbor embedding algorithm is Uniform Manifold Approximation and Projection (UMAP). UMAP uses a k-nearest neighbor (k-NN) graph to establish a pairwise metric in the high-dimensional space, which it then uses to align a lower-dimensional representation. This talk explores (a) techniques for improving UMAP and (b) its use in designing new algorithms. We analyze the UMAP algorithm to better explain its optimization scheme and cluster formation, enhance the consistency of embeddings with respect to initialization, and improve its out-of-sample embedding. We then apply UMAP to aligning manifolds and analyzing large biomedical image datasets. In particular, we analyze chest X-rays and show that dimensionality reduction can discover (1) different phenotypes of COVID-19 response and (2) outliers in image datasets.
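For readers unfamiliar with the k-NN graph mentioned in the abstract, the following is a minimal sketch of that first step only, using the Python standard library. It is not UMAP itself and not the speaker's implementation; the function name `knn_graph`, the choice of Euclidean distance, and the toy data points are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return {i: [(j, distance), ...]} with the k nearest neighbors of each point.

    Hypothetical helper for illustration; uses brute-force Euclidean distances.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point
        dists = [(j, math.dist(p, q)) for j, q in enumerate(points) if j != i]
        # Keep the k closest points as the neighbors of i
        dists.sort(key=lambda pair: pair[1])
        graph[i] = dists[:k]
    return graph


# Toy data (made-up values): two well-separated groups in 3-D
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.3, 5.0),
]

graph = knn_graph(points, k=2)
for i, neighbors in graph.items():
    print(i, [j for j, _ in neighbors])
```

In this toy example each point's neighbors fall within its own group, which is the kind of local structure a neighbor embedding method then tries to preserve in the low-dimensional layout. Practical implementations typically use approximate nearest-neighbor search rather than this brute-force loop.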

Dr. Mohammad Tariqul Islam
Postdoctoral Associate, MIT Media Lab

Dr. Mohammad Tariqul Islam is a postdoctoral scholar advised by Prof. Deblina Sarkar at the Nano-Cybernetic Biotrek Lab at the Massachusetts Institute of Technology. He is supported by the MIT-Novo Nordisk Artificial Intelligence Fellowship. He completed his PhD at Princeton University under the supervision of Prof. Jason Fleischer in the Imaging Physics Group. Tariq's research interests include signal processing and machine learning, with a focus on biomedical applications.