Manolis Kellis - Biosketch

Manolis Kellis is a Full Professor of Computer Science and Artificial Intelligence at MIT, a member of the Computer Science and Artificial Intelligence Laboratory and of the Broad Institute of MIT and Harvard, where he directs the MIT Computational Biology Group (compbio.mit.edu). His group has recently been funded to lead the integrative analysis efforts of the modENCODE project for Drosophila melanogaster, and also the integrative analysis of the NIH Epigenome Roadmap Project. He has received the US Presidential Early Career Award in Science and Engineering (PECASE) for his NIH R01 work in Computational Genomics, the NSF CAREER award, the Alfred P. Sloan Fellowship, the Karl Van Tassel chair in EECS, the Distinguished Alumnus 1964 chair, and the Ruth and Joel Spira Teaching Award in EECS. He was recognized for his research in genomics as one of the top young innovators under the age of 35 by Technology Review Magazine, one of the principal investigators of the future by Genome Technology magazine, and one of three young scientists representing the next generation in biotechnology by the Boston Museum of Science. He obtained his Ph.D. from MIT, where he received the Sprowls award for the best doctoral thesis in computer science, and the first Paris Kanellakis graduate fellowship. Prior to computational biology, he worked on artificial intelligence, sketch and image recognition, robotics, and computational geometry, at MIT and at the Xerox Palo Alto Research Center. He lived in Greece and France before moving to the US.

Research Aims

Our group at MIT aims to further our understanding of the human genome by computational integration of large-scale functional and comparative genomics datasets. (1) Using alignments of multiple closely related species, we have defined evolutionary signatures for the systematic discovery and characterization of diverse classes of functional elements, including protein-coding genes, RNA structures, microRNAs, developmental enhancers, regulatory motifs, and biological networks. (2) Using epigenomics datasets of multiple chromatin marks across the complete genome, we have defined chromatin signatures that reveal numerous classes of promoter, enhancer, transcribed, and repressed regions, each with distinct functional properties. (3) Using diverse functional datasets across many cell types, we have defined multi-cell activity signatures for chromatin states, regulator expression, motif enrichment, and target gene expression, and have used their correlations to link candidate enhancers to their putative target genes, to infer cell type-specific activators and repressors, and to predict and validate functional regulator binding in specific chromatin states.

We have used these evolutionary, chromatin, and activity signatures to elucidate the function and regulatory circuitry of the human and fly genomes, and to reveal many new insights on animal gene regulation and development, including abundant translational read-through in neuronal proteins, functionality of anti-sense microRNA transcripts, and thousands of novel large intergenic non-coding RNAs. We have also used these signatures to revisit previously uncharacterized single-nucleotide polymorphism (SNP) variants linked to several diseases and phenotypes in genome-wide association studies, which has enabled us to provide mechanistic insights into their likely molecular roles. Overall, our genomic signatures dramatically expand the annotation of the non-coding genome, providing a systematic annotation of chromatin functions, offering new insights on diverse regulatory mechanisms, and shedding new light on previously uncharacterized disease-associated variants.

We have also developed methods to study systematic differences between the species compared, and uncovered important evolutionary mechanisms for the emergence of new functions. Our work provided definitive proof of an ancestral whole-genome duplication in yeast, which led to a complete doubling of the gene count, and was rapidly followed by massive gene loss, asymmetric divergence, and new gene functions. To further understand the evolutionary processes leading to new functions, we developed a phylogenomic framework for studying gene family evolution in the context of complete genomes, revealing two largely independent evolutionary forces, dictating gene- and species-specific mutation rates. De-coupling these two rates also allowed us to develop the first machine-learning approach to phylogeny, resulting in drastically higher accuracies than any existing phylogenetic method.

Research Group

My research group consists primarily of computer science graduate students and postdocs who have expertise in algorithms, statistical inference, and machine learning, and who share a passion for understanding fundamental biological problems.

We work in a highly interdisciplinary environment at the interface of Computer Science and Biology. Since its inception, our lab has eagerly engaged in collaborative research partnerships with biological and experimental collaborators, facilitated by our affiliation with the Broad Institute and the Computational and Systems Biology initiative (CSBi) at MIT, our participation in the Epigenome Roadmap, ENCODE, and modENCODE consortia, and by several other ongoing collaborations at MIT, Harvard, and the Harvard Medical School affiliated hospitals.

We are located on the 5th floor (D5) of the MIT Stata Center, a truly unique building that stretches the imagination, and home of the Computer Science and Artificial Intelligence Lab (CSAIL).

Teaching

Each Fall, I teach a computational biology course at MIT, titled "Computational Biology: Genomes, Networks, Evolution". The course is geared towards advanced undergraduate and early graduate students seeking to learn the algorithmic and machine learning foundations of computational biology, and to be exposed to current frontiers of research in order to become active practitioners of the field.

Foundations: We cover principles of algorithm design, influential problems and techniques, and analysis of large-scale biological datasets. Genomes: sequence analysis, gene finding, RNA folding, genome alignment and assembly, database search. Networks: gene expression analysis, regulatory motifs, biological network analysis. Evolution: comparative genomics, phylogenetics, genome duplication, genome rearrangements, evolutionary theory. These are coupled with fundamental algorithmic techniques including dynamic programming, hashing, Gibbs sampling, expectation maximization, hidden Markov models, stochastic context-free grammars, graph clustering, dimensionality reduction, and Bayesian networks.

Term Project: In addition to the course materials, students are introduced to a term-long final project, working in teams with a postdoc or senior grad student mentor. Project milestones include an NIH-style project proposal, a peer-review process and panel discussion, a mid-course project report, and a final project report with an oral presentation at the end of the term. After the term ends, many students decide to develop their projects into MS or PhD theses or publications, and frequently join computational biology research groups with the experience gained.

Selected Publications

Identification of functional elements and regulatory circuits in Drosophila by large-scale data integration.
Discovery and characterization of chromatin states for systematic annotation of the human genome.
A Bayesian approach for fast and accurate gene tree reconstruction.
The modENCODE Project: Unlocking the secrets of the genome.
Evolution of pathogenicity and sexual reproduction in eight Candida genomes.
Histone modifications at human enhancers reflect global cell-type-specific gene expression.
Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals.
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures.
Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals.
Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae.
Transcriptional regulatory code of a eukaryotic genome.
Sequencing and comparison of yeast species to identify genes and regulatory motifs.

Contact:

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

32G-675A

Go to: 32 Vassar St, Cambridge MA 02139
Enter: The MIT Stata Center (it's hard to miss)
Also known as: MIT Building 32
Take the: Dreyfoos Tower Elevator
Go to the 5th floor (make sure you're in the Dreyfoos Tower)
Enter the: D5 CompBio Area
Look for room: 32D-524.

Massachusetts Institute of Technology

Broad Institute of MIT and Harvard

Computer Science and Artificial Intelligence Lab