Manolis Kellis
Research Interests

Distinguished Alumnus (1964) Career Development Professorship.
Associate Professor, MIT Department of Electrial Engineering and Computer Science (EECS).
Principal Investigator, Computer Science and Artificial Intelligence Laboratory (CSAIL).
Associate Member, Broad Institute of MIT and Harvard (Broad).

My primary research interests are in Computational Biology. My group is developing new algorithms and machine learning techniques for the interpretation of complete genomes, the understanding of gene regulation, and the elucidation of evolutionary mechanisms and embryo development. I am additionally involved in various research projects in Machine Learning, Computational Geometry, Robotics, and Information Retrieval.

I did my Ph.D. thesis on computational genomics, with Eric Lander and Bonnie Berger.
I did my masters thesis on image interpretation and retrieval, with Patrick Winston.

Computational Biology:
Genome Alignment
Gene Identification
Regulatory Motif Discovery
Genome Evolution
Genome Duplication
Embryo Development
Machine Learning:
Image interpretation and retrieval
Human Motion Primitives
Handwriting Recognition
Learning Musical Patterns
Computational Geometry:
3D Surface Reconstruction
3D polygon morphing
2D Gaussian curve morphing
Robotics:
iLogo robotic vehicle
Modular Reconfigurable Robots
Autonomous Lego Robot
Other Projects:
Stock Market
Web Crawler
Turing Machine
Ray Tracing
Information Sharing
Meta-Evolution

Computational Biology


Graph-Theoretic Framework for Genome Alignment

Manolis Kellis, Bonnie Berger, Eric Lander
MIT/Whitehead Center for Genome Research

Genome Alignment The first step to comparative genomics is to determine what regions to compare across the multiple species analyzed. This is by no means a simple task, because even close relatives are separated by gene duplication and loss events, and large-scale genomic rearrangements. Previous approaches have used heuristics to find the most similar sequence for each region and provide no guarantees of correctness or uniqueness. I formalized the problem of genome correspondence in a graph-theoretic framework and developed algorithms that work with the full bipartite graph of gene similarities across the species compared. The framework seamlessly incorporates protein sequence similarity and gene order information to resolve ambiguities in gene correspondence and to determine those genes that arose from the same gene in the common ancestor. Applying the graph algorithms to four closely related species of yeast determined that over 90% of genes have one-to-one correspondence. Additionally, the resulting graph pinpointed those genes that underwent recent duplication, loss, and protein-family expansion events, revealing regions of rapid evolution at the ends of chromosomes, and distinct mechanisms of chromosomal change. Finally, by ensuring that aligned genes share a common ancestry, the algorithm allowed me to apply robust evolutionary models of sequence divergence to discover biological signals.

Systematic Gene Identification from Patterns of Nucleotide Change

Manolis Kellis, Bruce Birren, Eric Lander
MIT/Whitehead Center for Genome Research

Gene Identification I developed computational methods that use multiple related genomes to identify protein-coding genes in yeast. Previous studies have identified genes based on the functional effects of their disruption, experimental evidence of their use, or protein sequence similarity to distantly related species. I formalized the question of gene identification as a classification problem between genes and intergenic regions. I developed a computational test to discriminate between the two types of region based on their patterns of nucleotide change in the multiple alignment of the four yeast species. I found distinct conservation signatures between genes and intergenic regions in yeast and implemented a discriminating test with sensitivity and specificity values that exceed 99%, based on 4000 experimentally-validated genes and 300 random intergenic regions. I used this test to evaluate every candidate gene in the yeast genome, and to refine the start, stop and intron boundaries of previously annotated genes. The work led to the largest re-annotation of the genome since its original sequencing, reducing the overall gene count by 10%, and proposing changes that affect nearly 15% of all genes. Numerous independent reports have provided experimental support confirming the changes.

Regulatory Motif Discovery using Genome-Wide Conservation Criteria

Manolis Kellis, Bruce Birren, Eric Lander
MIT/Whitehead Center for Genome Research

Motif Discovery

I also developed new computational methods to address the greater challenge of de novo regulatory motif discovery. Regulatory motifs are short 6-8 nucleotide sequence tags recognized by specific proteins that control gene expression, dictating when the associated genes will be turned on or off. Regulatory motif discovery has typically relied on extensive experimentation, evaluating gene expression of thousands of genes across dozens of experimental conditions, then constructing clusters of genes that show similar patterns of expression and searching their upstream sequences for over-represented sequence tags. In absence of biological knowledge of functionally related sets of genes, the identification or regulatory motifs has remained an open problem. Previous comparative studies have focused on individual instances of regulatory motifs, searching for conservation islands in intergenic regions. Such methods lack the power to discover the true motifs within these islands, since additional bases surrounding the motifs are spuriously conserved, not by natural selection but due to lack of divergence time between the species compared.

I approached regulatory motif discovery using genome-wide methods, evaluating motif conservation simultaneously across the entire genome. I found that the spuriously conserved bases surrounding a motif varied across its different instances, allowing the signal of the true motifs to stand out. Computationally, the algorithm started with an exhaustive enumeration of short sequence seeds with non-random genome-wide conservation, followed by a motif refinement and collapsing step, leading to a small dictionary of discovered motifs. These included nearly all previously known regulatory motifs, and a similar number of novel motifs, most of which showed functional specificities. Overall, the purely computational method showed unprecedented power in extracting signal from noise. The discovery of such genome-wide signals would have been impossible using traditional gene-by-gene experimental approaches, regardless of the time and effort invested.


Genome Evolution

Manolis Kellis, Bruce Birren, Eric Lander
MIT/Whitehead Center for Genome Research

Genome Duplication and Emergence of New Functions

Manolis Kellis, Bruce Birren, Eric Lander
MIT/Whitehead Institute Center for Genome Research

I have additionally applied computational methods to understand mechanisms of evolutionary innovation. It is generally thought that new functions arise by duplication of existing genes, but the mechanisms of duplication have been poorly understood. For example, the ancestry of hundreds of duplicated human and yeast genes has been under debate, some scientist arguing for complete genome duplication and others for independent duplication events. Previous comparative work has focused on divergence of individual gene families or regions, and the analyses have remained inconclusive.

I have taken a genome-wide approach to address this question in yeast. Instead of using divergence rates of individual genes, I have focused on the conservation of gene order across the entire genome. My approach relied on a key dataset: the sequence of yeast species that diverged prior to the estimated time of duplication. My genome alignment methods provided definitive proof of complete genome duplication: every region of the newly sequenced relative corresponded to exactly two regions of the yeast genome. More than 90% of duplicated genes were lost in one copy or the other, but the remaining genes provided a sufficient signal to show complete duplication. This work has put an end to a long debate on the ancestry of the yeast genome that started nearly seven years ago with the completion of its sequencing. Additionally, the analysis led to the first genome-wide study of the emergence of new gene functions after duplication, with important revelations on the emergence of anti-viral defense mechanisms, environmental stress response, gene silencing and transcriptional regulation.


Genomic Analysis of the Hox developmental genes

Ken Dewar, Manolis Kellis, Eric Lander
MIT/Whitehead Center for Genome Research

Hox genes control the anterior-posterior axis of embryo formation during development. By comparing the intergenic regions of human, baboon, mouse, and rat hox clusters, we are trying to understand the reasons behind the preservation of sequence and spacing.


Machine Learning


Sketch based image retrieval

Manolis Kellis, Ovidiu Marina, Patrick Winston
MIT AI Lab
Imagina

My masters thesis was on Computer Vision, more precisely, sketch-based image retrieval. The project involved image understanding, segmentation, representation and learning (more about Imagina)


Understanding Primitives in Human Motion

Manolis Kellis, Michael Black
Digital Video Analysis group (DiVA), Xerox PARC
Human Motion

Developped a new represenation for full-body 3D motion data. The goal is to find lower-dimensional manifolds that contain motion patterns for different activities, embedded in a 27-dimensional space defined by the angles of joints. Finding such structure in human motion would allow to describe activities much more compactly for recognition or generation of 3D motion in terms of primitives intrinsic to the dynamics of human anatomy.


Handwritten character recognition

Manolis Kellis, Paul Viola
MIT AI Lab
Character Recognition

Presented a compact angle representation for handwritten characters based on online data that filters noise and facilitates recognition. Wavelet decopmosition of the character signal in derivative angle space is invariant to scaling or rotation and remains preserved within samples of the same character. Code in Matlab. Supervised by Paul Viola from the Learning and Vision group of the AI Lab at MIT.


Learning Musical Patterns

Manolis Kellis, Ovidiu Marina, Hooman Vassef, Dedric Carter, Patrick Winston
MIT AI Lab
Mood Project

Under the supervision of Patrick Winston, more interested in understanding the human I/O channels, I wrote a program to recognize music by learning patterns of attentional state. Instead of observing the notes directly, we were more interested in watching the states that a note observer would go through. In other words, the succession of primitive routines that a note pattern-matcher uses in observing a note sequence should tell us more about a musical piece than the patterns extracted themselves. The strength of this project was a clean architecture for using meta-information about an observed phenomenon. It separated the low-level primitive routines (that match patterns, change the focus of attention to salient points, and establish properties about the observed events) from an attentional state observer of those low-level routines (that records and analyzes series of states the pattern matcher enters). This separation that the program exhibited is suggested by Minsky's notion of a B-brain that monitors the activity of our A-brain. (visit the MOOD team)


Computational Geometry


Surface Reconstruction

Nina Amenta, Marshall Bern, Manolis Kellis
Xerox PARC, CSL
Crust Project

My involvement in graphics also grew out of mathematical interest, in computational geometry. At Xerox PARC, I worked with Marshall Bern and Nina Amenta on a surface reconstruction algorithm. The algorithm relies on Voronoi diagrams and Delaunay triangulations to compute a provably correct reconstruction of the connectivity of a set of scattered sample points in space. Furthermore, the sampling density requirement is proportional to the curvature and proximity of other surface patches, which allows less dense sampling in those parts of the figure with fewer features. Our implementation reached speed and correctness benchmarks that were state of the art, and our work was published the subsequent summer at ACM Siggraph 1998, the largest graphics conference. (learn more about the Crust algorithm). Computer graphics was my new passion.



Polygon based 3D morphing

Manolis Kellis, Krzysztof Gajos, Matt Blum, Hooman Vassef, Seth Teller
MIT Graphics Group
3D Morphing 3D Morphing 3D Morphing
3D Morphing 3D Morphing 3D Morphing

Back at MIT I took a computer graphics class and as a final project I proposed and implemented a 3D morphing algorithm. The project explored the question of transforming a three-dimensional polygonal model into another while minimizing the energy lost in creating and destroying polygons and maximizing the uniformity of the polygon flow. Following these constraints, a polygon can spawn another polygon from one of its edges (birth), and a polygon can collapse onto another polygon's edge (death). From there followed two algorithms I called Edge Ordering and Incremental Edge Ordering that solve the general problem of scheduling a multitude of agents to satisfy a set of tasks, for example a set of robots extinguishing small fires. Further constraints were applied on the birth and death edges to preserve the uniformity of the polygon flow, and the aesthetic results were very appealing, yet biased by the cost metric I used. (you can find 3D morphing examples online)



Gaussian curve morphing

Manolis Kellis, Berthold Horn
MIT AI Lab
Gaussian Morphing Project

In search of a purely mathematical solution, I further explored the problem of morphing in the continuous case. Instead of having a discrete set of polygons as the input, this problem addressed an arbitrary convex shape. Inspired by Professor Berthold Horn's Robot Vision class, I mapped the two shapes to be morphed to their corresponding Extended Gaussian Images, and simply matched the normals on the surface of one unit sphere with normals on the other. The matching algorithm is a function that distributes each source to all destinations, the amount displaced between any two normals inversely exponential to a combination of their difference in angle and weight. This combination depended on an input parameter that favored matching normals of similar weights or normals of neighboring angles. This parameter ended up incorporating the different intuitive approaches we have about transforming a square into a triangle: in one case we round off the corners (minimizing length variation of the sides), and in the other case we progressively shrink one of the square's edges until it becomes a triangle (minimizing angle variation). Both approaches were supported by different users of the morphing program. This purely mathematical solution thus yielded intuition on a problem for which there was no base of argumentation. (you can test the Gaussian Morphing applet yourself!).


Robotics


An interacting reprogrammable robot

Manolis Kellis, Jeremy Lueck, Chris Rohrs
MIT
RoboLogo Project

I made this system a reality in my 6.115 final project, RoboLogo, in a team of three people. Inspired from Seymour Papert's work with children and Logo, we developed iLogo (interactive Logo), to extend the Logo language with primitives for interacting with the environment, reading sensors, and reacting accordingly to external events. We designed the new language and its grammar; we built a compiler that translated programs from iLogo to assembly code; we wrote a library of low-level routines and initialization code to be linked with the generated files; we designed and laid out a printed circuit board to incorporate a processor, RAM, ROM, motor controllers, sensor drivers, AD converter; finally we put everything together on a mechanical truck on which we mounted sensors and actuators. Children can control the robot by writing simple high level commands in iLogo, and see a working robot bounce between two walls with only six lines of code. This new abstraction layer allows for more complicated behaviors to be expressed in smaller, simpler programs, with an overhead that is negligible when the speed of the processor is compared to the speed of external events. Building the entire RoboLogo environment from the ground was a very rewarding experience and went a step further than 6.270 where the circuit board was provided along with the iC language and compiler. Making RoboLogo happen was like constructing our own 6.270 class. (on to the RoboLogo page).


Modular Reconfigurable Robots

Mark Yim, Manolis Kellis
Xerox PARC, SPL
PolyPod Project

At Xerox PARC, the summer of 98, I worked on a larger-scale robotics project and I concentrated on the control of the robots rather than the low-level construction of the hardware. I worked with Mark Yim on Polypod, a unit-modular reconfigurable robot, and I was responsible of the control architecture for coordinating the actions of the different modules. The model I proposed is a combination between a global bus carrying one or multiple values shared among all modules and a message passing mechanism that provides an interpretation for the shared values. The architecture is designed to be adaptive to the environment and robust against isolated module failures. To decentralize the control it distributes the computational power throughout the robot and specifies local behavioral rules at the module level, so that global communication be not necessary for simple decisions that need only local information. Global behavior then emerges at the robot level when Polypod interacts with the environment. While working on Polypod, my graphics background helped in the making of a Java3D simulation to visualize the modules; my geometry background helped me work out the forward and inverse kinematics; my background in AI helped me develop an architecture for decision making and control arbitration; my mathematical background helped me address scalability issues and design algorithms for information propagation and storing; moreover, having dealt with both mechanical and electronics issues during my robotics experience at MIT, I had an intuition about what was feasible to undertake in a mechatronics system. (see more at the PolyPod page)



Old Projects


Predicting the stock market

Manolis Kellis, Shishir Mehrotra, Nimrod Warshawsky, Tomás Lozano Perez
MIT AI Lab
Invest Project

My interest in artificial intelligence arose as a mathematical curiosity first. In an introductory AI class, I was presented with a series of small independent problems and smart ideas with which to approach them. I applied those ideas in different projects in an intensive class by Tomás Lozano Perez, for which I trained a Neural Network to recognize human faces, I used Temporal Difference Learning for a board game in which the machine grew sometimes stronger than the user, and for my final project, I experimented with different machine learning methods to extract patterns from the stock market. (more about Invest)



Autonomous Lego Robot

Panayiotis Kellis, Maria Kellis, Manolis Kellis
MIT
RoboAnt

My interest in building and controlling robots was first expressed when I participated in 6.270, the autonomous robot design competition. My brother was a junior in course VI, my sister a sophomore in course II and I was a freshman at the time. Together, we constructed a robot out of Lego pieces, and we programmed it in iC (interactive C) using a provided circuit board. The goal was to create insect-like robots, or more specially, ants. The "ants" were supposed to collect as much "food" as possible during each 60-second face-off. The food was represented by styrofoam blocks. Robots received more points for picking up the blocks and stacking them on top of each other in its "nest," a raised rubber platform. This experience taught me a lot about the difference between programming and reality, and how many things one needed to take care of when programming a system that interacts with the real world, even in an overconstrained environment like the contest table. (visit the RoboAnt contest page). I was thinking that children in high school, and even primary school should have the same opportunity to program autonomous robots. However the overhead from mechanical, electronics and low-level aspects should be reduced for younger children. They should start in a system with instant gratification, where they can write their "Hello, World!" program in a high level language on the first day. Writing complex programs, more complete reactions to the environment, and understanding the tools that make their program really work should be part of further lessons. They should however, immediately see the robot act and react around the lab under their commands and bridge the gap between concrete world actions and abstract programming commands.


Constraint model for a web crawler

Manolis Kellis, Henrik Frystyk Nielsen
World Wide Web Consortium
Webbot Project The summer and fall of 96 i worked at the world wide web consortium, where i developped a simple constraint model for a web robot. The code was written in Tcl and allowed for arbitrarily rules to be expressed in a simple form of recursive deifinitions. (more about the w3c mini Webbot)

Scheme implementation of a Universal Turing Machine

Manolis Kellis
MIT
Turing Project The fourth core computer science class at MIT, 6.004, also known as Computational Structures, introduces students to Turing Machines. I implemented a Universal Turing Machine that can read in the descriptions of machines and simulate them. The code is written in Scheme. A year later, Mark Hershberg ported this tool online for 6.004 students to use when they want to test their FSM descriptions on sample inputs (the entire Turing code is available for download).

Ray Tracing, Lighting, Modelling

Manolis Kellis, Seth Teller
MIT LCS
Graphics

The picture on the right was generated by a ray tracer that I wrote using C and OpenGL (see example 1 or 2). Other parts of the graphics pipeline I implemented were color interpolation, normal interpolation for Gourauld shading, casting shadows, and handling transparency. Other graphics work I did included a 3D model of the MIT campus (with some artistic liberties taken), including animations of a rowing race and traffic cars. Here's a contact-inspired graphic. The rings were animated.


Online Information Sharing

Manolis Kellis
MIT
HKN Knowledge Market The spring of 1997 i created the HKN Knowledge Market, a online meeting place where students can exchange ideas, information, hints, and advice about classes, problem sets, internships, research. The project is sponsored by HKN (more about Knowledge Market)


How meta-information directs evolution

Manolis Kellis
MIT
Evolution Paper At the same time, in Marvin Minsky's Society of Mind I explored a different approach to AI, interested more in understanding how the human brain works from a computational standpoint rather than in creating powerful machines unrelated to human capabilities. For Minsky's class, I did some research on evolution and the human DNA and wrote a paper on how along with the genetic data, meta-information can be propagated to direct the interpretation of the raw data. This theory allowed for the evolving species' DNA to learn not only about surviving, but also about evolving. This twist provided a simple answer to some questions Darwin left unexplained, such as the exponential increase in evolutionary speed as species evolve, and the evolutionary bursts that disturb otherwise steady evolutionary periods. Minsky's class helped me understand the human brain as an engineering project, which inspired intuitions both from the biological world to address engineering problems, and from the engineering world to understand how we think, perceive, act. (read the Evolution paper).
Manolis Kellis