Preliminary Thesis Topics for Academic Year 2005 – 06
Update 8/28/05
Listed below are
potential thesis topics for The MIT Data Center. These include
infrastructure development, logistics, supply chain management, operations,
security modeling, and applications in the pharmaceutical, retail, food,
consumer goods, agriculture, heavy manufacturing, and healthcare industries.
This represents a list that students can choose from based on their
interests. We do not plan to complete all of the topics in 2006.
Some of the ideas listed below were suggested by collaborators of The MIT
Data Center.
To learn more about
any of these topics, contact Ed Schuster at
Edmund_w@mit.edu
Meetings for students
to discuss aspects of thesis work will be held each Friday at 2:00 PM,
beginning in the Fall semester. The location will generally be the Given
Lounge.
A general guideline
for thesis research conducted by The MIT Data Center is available upon
request.
General Infrastructure and Modeling
1.
Development of the M Dictionary and Language plus Security Modeling
This topic area
involves a wide group of projects that deal with infrastructure development
for interoperable modeling. This work will involve both conceptual and
technical system design, including work with the partners of The MIT Data
Center, such as WordNet (Princeton University – dictionary
microstructure, semantic links, and grammar including noun phrases) and
Oxford University, UK (dictionary Meta-Structure). To date, several
prototypes have been developed, so the objective of these projects is to
refine previous work and test for feasibility in actual operation.
(Dr. David Brock,
The MIT Data Center; Edmund W. Schuster, The MIT Data Center, along with
others participating with the MIT Data Center)
2.
Security and Risk Management
The current regulatory
environment and the growing use of open, distributed, information
technology-powered business processes are driving the demand for effective
models that aid in decisions regarding security and associated operational
and business risk. Such models must integrate data from heterogeneous,
independently developed systems for managing threats, vulnerabilities, and
countermeasures, as well as physical assets such as servers, desktops, PDAs,
networks, applications, and data storage. This research project will extend
existing standards work from organizations such as the Distributed
Management Task Force (DMTF) and the IT Infrastructure Library (ITIL) by
using the M language to create a common semantic model. This IT Security
Risk Model will enable subsequent definition and implementation of security
metrics and models that will, for the first time, provide the insight
required to improve business decisions in the face of changing security,
technology, and economic conditions.
(Dr. Elizabeth A.
Nichols, CTO, ClearPoint Metrics, Inc.; Dr. David Brock, The MIT Data
Center; Edmund W. Schuster, The MIT Data Center)
3. Business Service
Reliability and Availability Management
Modern data centers are
currently managed according to five standard disciplines: fault,
performance, configuration, accounting/usage, and security. Often the
administrative tools and processes for these disciplines are defined
separately and implemented as operational islands. Unifying these islands
requires a common semantic model for the data center resident resources as
well as a mapping to the business services that they enable. Using the M
language and Semantic Modeling, an interoperable set of data center and
business service models will be identified, formally defined, implemented,
and placed into a searchable catalogue. An integrated design environment
will be implemented to search, qualify, and compose multiple,
cross-disciplinary models. Execution of these composite models will be
orchestrated in a distributed computational grid to provide real-time
command and control that institutionalizes the best management practices
whose principles are embodied in the models.
(Dr. Elizabeth A.
Nichols, CTO, ClearPoint Metrics, Inc.; Dr. David Brock, The MIT Data
Center; Edmund W. Schuster, The MIT Data Center)
Food,
Agriculture, Retail, Pharmaceutical, and Consumer Goods Industries
1.
Improved New Product Forecasting through Visualization of Spatial Diffusion
Forecasting
demand for a new product is a particularly difficult task. Part of the
reason that new product forecasting is such a challenging problem involves
the way consumers adopt a product within a defined space. Early studies have
noted customer adoption is not spatially uniform. Clusters of adopters tend
to form and grow or contract with time. This research deals with the spatial
diffusion process in the context of introducing new products into markets.
Advances in technology, including visualization, innovative digital
mapping, and new ways of interoperating mathematical models, offer
improved ways to track spatial diffusion, resulting in better
forecasting and supply chain coordination.
The
research utilizes the M language to integrate models and data along with
mapping technology initially developed in civil engineering and geodetic
science.
The goal is
to build a real-time system to monitor spatial diffusion for a consumer
goods product and to integrate models needed for decision-making in such
areas as amount of advertising and logistical control.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State; and others)
2.
Analysis of Real Estate Value using Travel Budgets and Spatial Diffusion
Understanding the pattern of real estate prices is a complex problem that
depends on economic growth and the transportation behavior of those living
within a geographic area. Using travel budget techniques developed by Dr.
Stanley B. Gershwin of MIT, this research will develop a family of models to
predict the price of real estate in urban areas through time.
The M
Language will be used to link models and data together by overcoming data
semantic issues commonly encountered in governmental (local, state and
federal) data gathered about urban areas.
(Dr.
Stanley B. Gershwin, Department of Mechanical Engineering, MIT; Edmund W.
Schuster, The MIT Data Center)
3.
Building a Distributed
Enterprise
Resource Planning (ERP) System using Semantic Modeling and the M Language
A large
portion of decision-making within business utilizes mathematical models.
The backbone for any organization to employ mathematical models is the ERP
system. Currently, ERP systems are sold as packaged software with a
pre-selected set of models. This research project will create a new type of
ERP system where models become interchangeable parts resident within a
repository located on the Internet. The idea is to make modeling and
business decision-making more flexible in response to changes in business
conditions.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State)
4. Patterns in Data for
more Secure Retail Commerce
I am interested in the idea
that patterns in data, even small amounts of data, can enhance security of
retail commerce. As an example, today I routinely sign my name at retail
outlets when I use a credit card. The signature goes into a computer, but
the system does not seem to reject signatures that are not mine at the time of
purchase. It should be possible to do this with appropriate pattern
recognition, statistics, and infrastructure.
(Prof. Dan Frey, Dept. of
Mechanical Engineering, MIT)
5. Monitoring for Money
Laundering
The entire body of
transactions, both in a bank where money might be laundered, and at other
institutions, should be monitored to detect criminal activity. This is
difficult because money launderers may take precautions to appear
unsuspicious.
(Prof. Dan Frey, Dept. of
Mechanical Engineering, MIT)
6. Making the Internet
Less Anonymous
At present, transactions
over the internet are relatively anonymous. People can easily either not
disclose their identity or pretend to be someone else. This can lead to
substantially increased incidence of misbehavior of many kinds. Imagine if
it were possible to voluntarily announce one's identity and for the person
at the other end to confirm that identity with a high degree of certainty.
If that were possible, many people would insist on taking that step before
interacting on-line. There might, for example, be chat rooms that require you
to announce truthfully who you are before entering. This project would seek to
develop technology to enable this function.
(Prof. Dan Frey, Dept. of
Mechanical Engineering, MIT)
7.
Improving the Productivity of Marketing Science Models
One of the
barriers to greater use of marketing science models in practice is the speed
with which a particular model can be applied to a practical problem. This
research will utilize Semantic Modeling and the M Language to build an
interoperable modeling environment where marketing science models can be
quickly applied to problems.
(Edmund
W. Schuster, The MIT Data Center)
8. An
Approach to Building a Repository of Production Planning and Scheduling
Models for the Process Industries
The
literature lists hundreds of models for planning and scheduling of
manufactured end items. While it is generally true that all models
published in operations management and other journals are fully refereed
and provide details of implementation, most of this work involves only a
single application. Managers face a significant problem in choosing the
right planning model for a specific manufacturing process that will yield
the best results in practice.
By using
the M Language and Semantic Modeling, an interoperable set of models could
be assembled so that a specific model could be matched to the problem at hand
and the data available.
(Edmund
W. Schuster, The MIT Data Center)
9.
Developing a Modeling Approach to Achieve Capacitated Materials
Requirements Planning (MRP) in Practice
One of the
characteristics of MRP systems is that planning occurs on an infinite
capacity basis. Given real world situations, this is not a realistic
assumption especially within the process industries. The current thinking
is that a single model will be too complex to achieve capacitated MRP.
Rather, a library of models is needed that can be mixed and matched.
Using the M
Language, this research project will integrate existing models needed to
achieve capacitated MRP in practice.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State)
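As a rough illustration of the gap between infinite-capacity planning and a capacitated plan, the sketch below takes period-by-period requirements of the kind an MRP run might produce and shifts any load above capacity into earlier periods. The function name, the build-ahead rule, and the numbers are assumptions made for this example only; this is one simple heuristic, not the library of models the project proposes to integrate.

```python
def level_to_capacity(requirements, capacity):
    """Shift load that exceeds capacity into earlier periods (build ahead).

    requirements: planned production per period from an infinite-capacity run
    capacity:     available capacity per period, in the same units
    """
    plan = list(requirements)
    # Walk backward so overflow from late periods can cascade to earlier ones.
    for t in range(len(plan) - 1, 0, -1):
        overflow = plan[t] - capacity[t]
        if overflow > 0:
            plan[t] = capacity[t]
            plan[t - 1] += overflow
    if plan[0] > capacity[0]:
        raise ValueError("requirements cannot be met within available capacity")
    return plan


# Hypothetical four-period example: period 4 is overloaded by 30 units.
infinite_plan = [20, 50, 30, 90]
capacity = [60, 60, 60, 60]
feasible_plan = level_to_capacity(infinite_plan, capacity)
```

Building ahead keeps total output unchanged but ignores the extra inventory holding cost, which is one reason a single simple rule is unlikely to be sufficient in practice.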
10.
Modeling Risk for Logistics Problems in Agriculture
Agriculture
depends a great deal on the weather, which varies in ways that can be
statistically measured and analyzed. Accounting for this variation is
perhaps one of the oldest problems in statistics.
This
research develops stochastic models to optimize the risk associated with
agricultural operations such as harvesting crops. The goal is to maximize the
size of the harvest given weather conditions. Mathematical development
on several topics can be downloaded from:
The Three Risk
Problem as Related to Harvesting
An Alternative Way to Solve the
Jones Formulation for Risk of Production
(Stuart
J. Allen, Prof. Emeritus Penn State; Edmund W. Schuster, The MIT Data
Center)
11.
Lot-Sizing for Short Life-Cycle Products
A number of
products ranging from fashion apparel to high tech equipment experience a
short life cycle. Without using the proper models to determine production,
manufacturers run the risk of holding huge inventories of obsolete product.
By building
a family of models using the M Language, manufacturers can rapidly apply the
correct model for determining the proper lot-size for short life-cycle
products.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State)
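One classical model that such a family might include is the single-period newsvendor model, which sets the lot size at the demand quantile given by the critical ratio of underage cost to total underage plus overage cost. The sketch below is illustrative only: the function name, the normal demand assumption, and all numbers are hypothetical and are not taken from the project.

```python
from statistics import NormalDist


def newsvendor_quantity(mean_demand, sd_demand, price, unit_cost, salvage_value):
    """Single-period lot size from the newsvendor critical ratio."""
    underage = price - unit_cost         # margin lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unit left over at end of life
    critical_ratio = underage / (underage + overage)
    # Assumption: life-cycle demand is normal; another distribution can be swapped in.
    return NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)


# Hypothetical numbers: equal underage and overage costs give a ratio of 0.5,
# so the suggested lot equals mean demand.
lot = newsvendor_quantity(1000, 200, price=100, unit_cost=60, salvage_value=20)
```

When the salvage value is low relative to the margin, the critical ratio falls and the suggested lot drops below mean demand, which is the mechanism that limits obsolete inventory.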
12. Service Parts Inventory Modeling by Integrating
Reliability Analysis
Using the M Language, this research will integrate the models
needed to determine the optimal inventory levels for spare parts given
reliability data as an input.
(Edmund
W. Schuster, The MIT Data Center; Pinaki Kar, New York City)
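To make the link between reliability data and stock levels concrete, the sketch below assumes that part failures follow a Poisson process, so that demand over the replenishment lead time is Poisson with mean equal to failure rate times installed units times lead time. It then finds the smallest stock level that covers lead-time demand with a target probability. The Poisson assumption, the function names, and the numbers are illustrative and are not drawn from the project itself.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def base_stock_level(failure_rate, installed_units, lead_time, service_level):
    """Smallest stock covering lead-time demand with the target probability."""
    if not 0 < service_level < 1:
        raise ValueError("service_level must be strictly between 0 and 1")
    # Expected number of failures during one replenishment lead time.
    mu = failure_rate * installed_units * lead_time
    stock = 0
    while poisson_cdf(stock, mu) < service_level:
        stock += 1
    return stock


# Hypothetical inputs: 0.02 failures per unit per week, 50 units in service,
# and a two-week lead time, giving an expected lead-time demand of 2 parts.
spares = base_stock_level(0.02, 50, 2, service_level=0.95)
```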
13.
Manufacturing Schedule Stability Under Conditions of Finite Capacity
Changing
production schedules account for a great deal of productivity loss for
manufacturers. This project integrates various models to create the best
chance of a stable production schedule.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State)
14.
Safety Stock Calculation Using Non-Normal Probability Distributions
Demand
is seldom normally distributed for consumer goods
companies, yet most inventory models assume normality. This research
integrates models needed to calculate non-normal probability distributions
for inventory planning within the consumer goods industry.
(Edmund
W. Schuster, The MIT Data Center; Stuart J. Allen, Prof. Emeritus, Penn
State)
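The effect of the normality assumption can be seen in a small sketch that compares the textbook safety stock (a normal z-value times the demand standard deviation) with one read directly from the empirical demand quantile. The function names and the demand history are hypothetical, and the empirical quantile is only one of several possible non-normal approaches.

```python
import math
from statistics import NormalDist, mean, stdev


def safety_stock_normal(demand, service_level):
    """Safety stock under the usual normality assumption: z * sigma."""
    z = NormalDist().inv_cdf(service_level)
    return z * stdev(demand)


def safety_stock_empirical(demand, service_level):
    """Safety stock from the empirical demand quantile, with no normality assumption."""
    ordered = sorted(demand)
    # Smallest observed demand whose empirical CDF reaches the service level.
    index = min(len(ordered) - 1, math.ceil(service_level * len(ordered)) - 1)
    return ordered[index] - mean(demand)


# Hypothetical weekly demand with two promotion-driven spikes (right-skewed).
demand = [10, 12, 9, 11, 10, 13, 8, 40, 11, 10,
          12, 9, 55, 10, 11, 12, 9, 10, 11, 10]
normal_ss = safety_stock_normal(demand, 0.95)
empirical_ss = safety_stock_empirical(demand, 0.95)
```

For this skewed history the empirical figure is larger than the normal-based one, which suggests how a normality assumption can understate the stock needed to cover demand spikes.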
Heavy
Industries
1. Data Semantics Within the Oil Services Industry
Companies have been conducting oil field studies for more than
four decades. Currently there is no single standard for any of this data.
Using the M Dictionary, a semantic approach will be used to integrate this
data.
(Dr.
David Brock, The MIT Data Center; John Eul, The MIT Data Center; Edmund W.
Schuster, The MIT Data Center)
2.
Air Transportation Modeling in
Asia using Semantic Modeling and the M Language
Paralleling
the economic expansion of the
Asia region, air transportation (both passenger and freight) is
forecast to grow significantly over the next ten years. Working
with a Japanese trading company, this project will establish a method to
integrate Asian data from different regions with established transportation
modeling approaches to predict where growth will occur.
(Prof.
R. John Hansman, MIT; A Japanese Industrial Partner; Edmund W. Schuster, The
MIT Data Center)
3. Ocean Transport Planning Model
based on Semantic Modeling and M Language applied to Industrial Shipping
The planning of ocean transport,
like that of any other transport mode, starts at the strategic level and ends up
at the operational level. This means one should first decide (a) where to
deploy the fleet and (b) which ships to use (Fleet Design Problem). Once
you know which ships to use and where they should operate,
the next problem is to decide (c) where the inventory
held at the harbor terminals along the coast should go (this is
the case of industrial shipping, where the owner of the cargo also manages
its own fleet, aiming to minimize transportation costs). That is called the
Inventory Routing Problem, as you might know. Finally, you need to (d) route
the ships and (e) schedule them (Ship Routing and Scheduling Problem) at
the operational level.
By reading the literature related to
this topic (and here I recall Christiansen, Fagerholt and Ronen, 2003), you
will notice that each part of the whole problem has been solved in many
different ways using many different OR techniques, and that very few studies
connect the models created for each part of the planning hierarchy through
programming, because of the size of the problems. I think it would be quite nice to have
Semantic Modeling and M Language applied to modeling and connecting the
problems of Fleet Design, Fleet Deployment, Inventory Routing, Fleet
Routing, Cargo Loading (stowage of cargo holds of ships) and Fleet
Scheduling.
(Luiz Otto Abdenur, University of Sao
Paulo; Edmund W. Schuster, The MIT Data Center)
Healthcare
1.
Establishing a Semantic Data Standard for Genetic and Biotechnology
Industries
No single
data standard exists for medical data associated with genetic testing.
Using the M-Dictionary, a standard will be established that will integrate
existing and future data.
(Dr. David
Brock, The MIT Data Center; Edmund W. Schuster, The MIT Data Center)