The Data Center

Preliminary Thesis Topics for Academic Year 2005 – 06

Update 8/28/05

Listed below are potential thesis topics for The MIT Data Center. These include infrastructure development, logistics, supply chain management, operations, security modeling, and applications in the pharmaceutical, retail, food, consumer goods, agriculture, heavy manufacturing, and healthcare industries. This represents a list that students can choose from based on their interests. We do not plan to complete all of the topics in 2006. Some of the ideas listed below were suggested by collaborators to The MIT Data Center.

To learn more about any of these topics, contact Ed Schuster at Edmund_w@mit.edu

Meetings for students to discuss aspects of thesis work will be held each Friday at 2:00 PM, beginning in the Fall semester. The location will generally be the Given Lounge.

A general guideline for thesis research conducted by The MIT Data Center is available upon request.

General Infrastructure and Modeling

1. Development of the M Dictionary and Language plus Security Modeling

This topic area involves a wide group of projects that deal with infrastructure development for interoperable modeling. This work will involve both conceptual and technical system design, including work with partners of The MIT Data Center such as WordNet (Princeton University – dictionary microstructure, semantic links, and grammar including noun phrases) and Oxford University, UK (dictionary Meta-Structure). To date, several prototypes have been developed, so the objective of these projects is to refine previous work and test its feasibility in actual operation.

(Dr. David Brock, The MIT Data Center; Edmund W. Schuster, The MIT Data Center, along with others participating with the MIT Data Center)

2. Security and Risk Management

The current regulatory environment and the growing use of open, distributed, information technology-powered business processes are driving the demand for effective models that aid in decisions regarding security and the associated operational and business risk. Such models must integrate data from heterogeneous, independently developed systems for managing threats, vulnerabilities, and countermeasures, as well as physical assets such as servers, desktops, PDAs, networks, applications, and data storage. This research project will extend existing standards work from organizations such as the Distributed Management Task Force (DMTF) and the IT Infrastructure Library (ITIL) by using the M Language to create a common semantic model. This IT Security Risk Model will enable subsequent definition and implementation of security metrics and models that will, for the first time, provide the insight required to improve business decisions in the face of changing security, technology, and economic conditions.

(Dr. Elizabeth A. Nichols, CTO, ClearPoint Metrics, Inc.; Dr. David Brock, The MIT Data Center; Edmund W. Schuster, The MIT Data Center)

3. Business Service Reliability and Availability Management

Modern data centers are currently managed according to five standard disciplines: fault, performance, configuration, accounting/usage, and security. Often the administrative tools and processes for these disciplines are defined separately and implemented as operational islands. Unifying these islands requires a common semantic model for the resources resident in the data center, as well as a mapping to the business services that they enable. Using the M Language and Semantic Modeling, an interoperable set of data center and business service models will be identified, formally defined, implemented, and placed into a searchable catalogue. An integrated design environment will be implemented to search, qualify, and compose multiple, cross-disciplinary models. Execution of these composite models will be orchestrated in a distributed computational grid to provide real-time command and control that institutionalizes the best management practices whose principles are embodied in the models.

(Dr. Elizabeth A. Nichols, CTO, ClearPoint Metrics, Inc.; Dr. David Brock, The MIT Data Center; Edmund W. Schuster, The MIT Data Center)

Food, Agriculture, Retail, Pharmaceutical, and Consumer Goods Industries

1. Improved New Product Forecasting through Visualization of Spatial Diffusion

Forecasting demand for a new product is a particularly difficult task. Part of the reason is the way consumers adopt a product within a defined space. Early studies have noted that customer adoption is not spatially uniform. Clusters of adopters tend to form and then grow or contract with time. This research deals with the spatial diffusion process in the context of introducing new products into markets. Advances in technology, including visualization, innovative digital mapping, and new ways of interoperating mathematical models, offer improved ways to track spatial diffusion, resulting in better forecasting and supply chain coordination.

The research utilizes the M language to integrate models and data along with mapping technology initially developed in civil engineering and geodetic science.

The goal is to build a real time system to monitor spatial diffusion for a consumer goods product and to integrate models needed for decision-making in such areas as amount of advertising and logistical control.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State; and others)

2. Analysis of Real Estate Value using Travel Budgets and Spatial Diffusion

Understanding the pattern of real estate prices is a complex problem that depends on economic growth and the transportation behavior of those living within a geographic area. Using travel budget techniques developed by Dr. Stanley B. Gershwin of MIT, this research will develop a family of models to predict the price of real estate in urban areas through time.

The M Language will be used to link models and data together by overcoming data semantic issues commonly encountered in governmental (local, state and federal) data gathered about urban areas.

(Dr. Stanley B. Gershwin, Department of Mechanical Engineering, MIT; Edmund W. Schuster, The MIT Data Center)

3. Building a Distributed Enterprise Resource Planning (ERP) System using Semantic Modeling and the M Language

A large portion of decision-making within business utilizes mathematical models. The backbone for any organization to employ mathematical models is the ERP system. Currently, the ERP systems are sold as packaged software with a pre-selected set of models. This research project will create a new type of ERP system where models become interchangeable parts resident within a repository located on the Internet. The idea is to make modeling and business decision-making more flexible in response to changes in business conditions.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State)

4. Patterns in Data for more Secure Retail Commerce

I am interested in the idea that patterns in data, even small amounts of data, can enhance security of retail commerce. As an example, today I routinely sign my name at retail outlets when I use a credit card. The signature goes into a computer, but it does not seem to reject signatures that are not mine at time of purchase. It should be possible to do this with appropriate pattern recognition, statistics, and infrastructure.

(Prof. Dan Frey, Dept. of Mechanical Engineering, MIT)

5. Monitoring for Money Laundering

The entire body of transactions, both in a bank where money might be laundered, and at other institutions, should be monitored to detect criminal activity. This is difficult because money launderers may take precautions to appear unsuspicious.

(Prof. Dan Frey, Dept. of Mechanical Engineering, MIT)

6. Making the Internet Less Anonymous

At present, transactions over the internet are relatively anonymous. People can easily either not disclose their identity or pretend to be someone else. This can lead to substantially increased incidence of misbehavior of many kinds. Imagine if it were possible to voluntarily announce one's identity and for the person at the other end to confirm that identity with a high degree of certainty. If that were possible, many people would insist on taking that step before interacting on-line. There might for example be chat rooms that require you announce who you are truthfully before entering. This project would seek to develop technology to enable this function.

(Prof. Dan Frey, Dept. of Mechanical Engineering, MIT)

7. Improving the Productivity of Marketing Science Models

One of the barriers to greater use of marketing science models in practice is the speed that a particular model can be applied to a practical problem. This research will utilize Semantic Modeling and the M Language to build an interoperable modeling environment where marketing science models can be quickly applied to problems.

(Edmund W. Schuster, The MIT Data Center)

8. An Approach to Building a Repository of Production Planning and Scheduling Models for the Process Industries

The literature lists hundreds of models for planning and scheduling of manufactured end items. While it is generally true that all models published in operations management and other journals are fully refereed and provide details of implementation, most of this work involves only a single application. Managers face a significant problem in choosing the planning model for a specific manufacturing process that will yield the best results in practice.

By using the M Language and Semantic Modeling, an interoperable set of models could be built in which a specific model is matched to the problem at hand and the data available.

(Edmund W. Schuster, The MIT Data Center)

9. Developing a Modeling Approach to Achieve Capacitated Materials Requirements Planning (MRP) in Practice

One of the characteristics of MRP systems is that planning occurs on an infinite capacity basis. In real-world situations this is not a realistic assumption, especially within the process industries. The current thinking is that a single model would be too complex to achieve capacitated MRP. Rather, a library of models is needed that can be mixed and matched.

Using the M Language, this research project will integrate existing models needed to achieve capacitated MRP in practice.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State)
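
As an illustration of the gap between infinite-capacity planning and a capacitated plan described in topic 9, the following minimal Python sketch takes the per-period requirements from an MRP run and pulls any overload into earlier periods. The function name `level_to_capacity`, the constant weekly capacity, and the sample numbers are all hypothetical. This is a simple backward-leveling heuristic chosen for illustration only; it is not the library of models the project proposes, and it is not an M Language example.

```python
def level_to_capacity(requirements, capacity):
    """Shift production earlier so that no period exceeds capacity.

    requirements: units an infinite-capacity MRP run schedules in each period
                  (hypothetical sample data, not from the source).
    capacity:     maximum units producible in any one period (assumed constant).
    Returns a capacity-feasible plan, or raises ValueError if the load cannot fit.
    """
    plan = list(requirements)
    # Work backwards: any overload in period t is pulled into period t - 1.
    for t in range(len(plan) - 1, 0, -1):
        overload = plan[t] - capacity
        if overload > 0:
            plan[t] = capacity
            plan[t - 1] += overload
    # Whatever could not be absorbed ends up in the first period.
    if plan and plan[0] > capacity:
        raise ValueError("load exceeds cumulative capacity; plan is infeasible")
    return plan


mrp_plan = [40, 120, 60, 150]  # infinite-capacity requirements per week
feasible = level_to_capacity(mrp_plan, capacity=100)
print(feasible)
```

Building early in this way keeps due dates intact at the price of extra inventory, which is one reason a single rule is unlikely to suit every process and a mix of models is attractive.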

10. Modeling Risk for Logistics Problems in Agriculture

Agriculture poses perhaps one of the oldest problems in statistics: it depends a great deal on the weather, which varies in ways that can be statistically measured and analyzed.

This research develops stochastic models to optimize the risk associated with agricultural operations such as harvesting crops. The goal is to maximize the size of the harvest given weather conditions. Mathematical development on several topics can be downloaded from:

The Three Risk Problem as Related to Harvesting

An Alternative Way to Solve the Jones Formulation for Risk of Production

(Stuart J. Allen, Prof. Emeritus Penn State; Edmund W. Schuster, The MIT Data Center)

11. Lot-Sizing for Short Life-Cycle Product

A number of products, ranging from fashion apparel to high tech equipment, experience a short life cycle. Without using the proper models to determine production, manufacturers run the risk of holding huge inventories of obsolete product.

By building a family of models using the M Language, manufacturers can rapidly apply the correct model for determining the proper lot-size for short life-cycle products.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State)
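
To make the trade-off in topic 11 concrete, the sketch below uses the classic single-period newsvendor model, one well-known textbook way to size a lot when leftover stock becomes obsolete. It is offered only as an illustration, not as the family of models the project intends to build. The function name, the assumption of normally distributed demand, and all prices and demand figures are hypothetical.

```python
from statistics import NormalDist


def newsvendor_lot_size(mean, std_dev, unit_cost, price, salvage):
    """Lot size that balances lost margin against obsolete stock.

    Assumes demand over the product's life is normal(mean, std_dev),
    which is an illustrative assumption rather than a claim from the source.
    """
    underage = price - unit_cost    # margin lost per unit of unmet demand
    overage = unit_cost - salvage   # loss per unit left over as obsolete stock
    critical_ratio = underage / (underage + overage)
    # Produce up to the demand quantile given by the critical ratio.
    return NormalDist(mean, std_dev).inv_cdf(critical_ratio)


lot = newsvendor_lot_size(mean=1000, std_dev=300,
                          unit_cost=40, price=100, salvage=20)
print(round(lot))
```

With these numbers the critical ratio is 0.75, so the suggested lot sits above mean demand; a lower margin or a lower salvage value would push it down.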

12. Service Parts Inventory Modeling by Integrating Reliability Analysis

Using the M Language, this research will integrate the models needed to determine the optimal inventory levels for spare parts given reliability data as an input.

(Edmund W. Schuster, The MIT Data Center; Pinaki Kar, New York City)
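
One simple way to picture how reliability data can feed an inventory decision in topic 12 is sketched below. It assumes a constant failure rate per installed unit, so that demand for a spare part during the replenishment lead time is Poisson distributed, and it stocks the smallest quantity that meets a target service level. The function names, the Poisson assumption, and the sample figures are hypothetical and are not taken from the project.

```python
import math


def poisson_cdf(k, mean):
    """P(D <= k) for Poisson-distributed demand D with the given mean."""
    term = math.exp(-mean)  # P(D = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(D = i) from P(D = i - 1)
        total += term
    return total


def spare_parts_stock(failure_rate, installed_units, lead_time, service_level):
    """Smallest stock level whose no-stockout probability meets the target.

    failure_rate: failures per unit per day (assumed constant).
    lead_time:    replenishment lead time in days.
    """
    if not 0 < service_level < 1:
        raise ValueError("service_level must be strictly between 0 and 1")
    mean_demand = failure_rate * installed_units * lead_time
    stock = 0
    while poisson_cdf(stock, mean_demand) < service_level:
        stock += 1
    return stock


stock = spare_parts_stock(failure_rate=0.002, installed_units=50,
                          lead_time=30, service_level=0.95)
print(stock)
```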

13. Manufacturing Schedule Stability Under Conditions of Finite Capacity

Changing production schedules account for a great deal of productivity loss for manufacturers. This project integrates various models to create the best chance of a stable production schedule.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State)

14. Safety Stock Calculation Using Non-Normal Probability Distributions

Demand is seldom normally distributed for most consumer goods companies, yet most inventory models assume normality. This research integrates the models needed to work with non-normal probability distributions for inventory planning within the consumer goods industry.

(Edmund W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn State)
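
The point made in topic 14 can be seen with a small Python sketch that compares a stocking level computed under the normality assumption with one read directly from the empirical demand distribution. The demand history is invented for illustration (mostly steady weeks with a few promotion-like spikes), the 95% service target is an assumption, and the empirical quantile is only one of many possible non-normal approaches rather than the method the project proposes.

```python
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical weekly demand: skewed, with occasional large spikes.
demand = [12, 15, 9, 14, 11, 13, 10, 16, 12, 55,
          11, 14, 13, 10, 48, 12, 15, 11, 13, 62]

service_level = 0.95
avg = mean(demand)

# Textbook approach: mean plus z standard deviations, assuming normality.
z = NormalDist().inv_cdf(service_level)
normal_level = avg + z * stdev(demand)

# Distribution-free alternative: the empirical 95th percentile of demand.
# quantiles(..., n=20) returns 19 cut points; the last one is the 95% point.
empirical_level = quantiles(demand, n=20)[-1]

print(f"safety stock, normal assumption: {normal_level - avg:.1f}")
print(f"safety stock, empirical quantile: {empirical_level - avg:.1f}")
```

For this skewed sample the normal formula sets the stocking level below the empirical 95th percentile, so it would deliver less protection than the target suggests.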

Heavy Industries

1. Data Semantics Within the Oil Services Industry

Companies have been conducting oil field studies for more than four decades. Currently there is no single standard for any of this data. Using the M Dictionary, a semantic approach will be used to integrate this data.

(Dr. David Brock, The MIT Data Center; John Eul, The MIT Data Center; Edmund W. Schuster, The MIT Data Center)

2. Air Transportation Modeling in Asia using Semantic Modeling and the M Language

Paralleling the economic expansion of the Asia region, air transportation (both passenger and freight) is forecast to grow significantly over the next ten years. Working with a Japanese trading company, this project will establish a method to integrate Asian data from different regions with established transportation modeling approaches to predict where growth will occur.

(Prof. R. John Hansman, MIT; A Japanese Industrial Partner; Edmund W. Schuster, The MIT Data Center)

3. Ocean Transport Planning Model based on Semantic Modeling and M Language applied to Industrial Shipping

The planning model for ocean transport, like that of any other transport mode, starts at the strategic level and ends up at the operational level. This means one should first decide (a) where to deploy the fleet and (b) which ships to use (Fleet Design Problem). Once you know which ships to use and where they should operate, the next problem is to decide (c) where the inventory held at the harbor terminals along the coast should go (this is the case of industrial shipping, in which the owner of the cargo also manages its own fleet with the aim of minimizing transportation costs). This is known as the Inventory Routing Problem. Finally, at the operational level, you need to (d) route the ships and (e) schedule them (Ship Routing and Scheduling Problem).

By reading the literature related to this topic (and here I recall Christiansen, Fagerholt and Ronen, 2003), you will notice that each part of the whole problem has been solved in many different ways using many different OR techniques, and that very few studies connect the models created for each part of the planning hierarchy through programming, because of the size of the problems. I think it would be valuable to apply Semantic Modeling and the M Language to modeling and connecting the problems of Fleet Design, Fleet Deployment, Inventory Routing, Fleet Routing, Cargo Loading (stowage of cargo holds of ships) and Fleet Scheduling.

(Luiz Otto Abdenur, University of Sao Paulo; Edmund W. Schuster, The MIT Data Center)

Healthcare

1. Establishing a Semantic Data Standard for Genetic and Biotechnology Industries

No single data standard exists for medical data associated with genetic testing. Using the M Dictionary, a standard will be established that will integrate existing and future data.

(Dr. David Brock, The MIT Data Center; Edmund W. Schuster, The MIT Data Center)
