16.412J/6.834J Cognitive Robotics, Spring 2004

Problem Set #1 Responses



Responses by student:

Brian Bairstow

Lawrence Bush

Thomas Coffee

Shuonan Dong

Justin Fox

Ethan Howe

Kaijen Hsiao

Tony Jiminez

Axel Kilian

Henry Lefebvre de Plinval-Salgues

James Lenfestey

Brian Mihok

Jason Miller

Emmanuel Munguia-Tapia

Jennifer Novosad

Jeremie Pouly

Shen Qu

Tom Temple

Michael Terry


Brian Bairstow

Part A:

Cooperative Robotics

In cooperative robotics, a group of robots shares the same goals, and thus it is most efficient if they work together to achieve those goals. They can simultaneously work on different aspects of the goals, or can combine their efforts to do tasks that aren’t possible individually. If the problem is simple, or if enough is known about the situation, then each individual can deduce what the other robots will do and thus what it needs to do. However, this is often not the case, and then coordination is required.

Machine Learning

Machine learning is essentially the recognition and extrapolation of patterns. This can be important to robotics, since the “artificial intelligence” of current robots is typically a set of commands for various circumstances. However, with elegant enough algorithms for machine learning, one can imagine a machine that can eventually teach itself unexpected things. This capacity to learn could develop an “artificial intelligence” which is more meaningful than an array of specifically instructed behaviors.

Neural Networks

Neural networks consist of nodes which pass data via links to other nodes. They are good at learning because the weightings of the links can be altered to adjust the output of the neural network. Thus the network can be trained on data until the outputs match the expected outputs. One use for this is to make parametric approximations of complicated models, so that a neural network can quickly come up with low fidelity results.
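As a minimal illustration of this training loop (my own sketch, not from any reading), the following adjusts the link weights of a tiny two-layer network by gradient descent until its outputs approach the expected XOR outputs:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # expected XOR outputs

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # link weights, input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # link weights, hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    h = sigmoid(X @ W1 + b1)                      # hidden nodes
    out = sigmoid(h @ W2 + b2)                    # network output
    # backpropagation: alter the link weightings to shrink the output error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))                               # approaches [0, 1, 1, 0]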

Part D:

Task Directed Imaging in Unstructured Environments by Cooperating Robots
Vivek A. Sujan

This paper covers visual sensing strategies for robots working on a task. The issue is to place sensors in order to see certain targets. Depending on the degree of sensor articulation, there is more or less leeway. With multiple robots sharing information, the problem of optimizing camera placements is more interesting. Sensor fusion can be used to combine the data, and areas which aren’t visible from one robot can be viewed by another robot. A goal is to create a 3D geometric model by using multiple sensors.
The paper goes on to describe an algorithm to optimize camera pose while considering factors such as depth of field, target resolution, and visibility. A simulation was run on the algorithm.
This paper brings up a useful topic for rovers working together in the same area. One consideration in planning for cooperative robots is combining their sensor data in useful ways. For instance, when traversing difficult terrain it can be useful for a stationary rover to observe a moving rover to make sure everything goes well. This paper is not applicable to programmatic planning such as how to assign rovers to different geographic areas, except that it suggests that rovers can work well together.


Using Cooperative Robots for Explosive Ordnance Disposal
James McLurkin

This paper discussed techniques for using a number of cooperative robots for unexploded ordnance removal. Typically this will take place in an unconstrained and unknown environment, which is also the case with Mars exploration. Ordnance removal is one of the most applicable areas for autonomous cooperative robots. Due to the large number of mines in a minefield and the desire to find them quickly, multiple robots would be used, and thus cooperation is useful to make them efficient. This paper explored using a “behavior-based approach to form a structured community from the local interactions of simple individuals.” These behaviors included clustering and dispersal to produce a swarm with the desired density of robots. Experiments were done with micro-robots to analyze the applicability of the behaviors.
This is applicable to Mars exploration in that it discusses how to cover ground with multiple robots. It is a good examination of cooperative techniques. However, the strategies used are probably too simple for Mars exploration, and they promote redundancy to ensure that mines are found. In Mars exploration redundancy is inefficient: although covering the same ground twice might turn up an interesting objective that was missed the first time, you would be better off covering brand new territory.
This paper addresses a different scope than the imaging paper: it is concerned with how to move the rovers relative to each other in order to find an objective, while the imaging paper was more interested in placing the rovers relative to each other after the objective is found.

Scene Understanding using Cooperative Robotics
Declan O’Beirne, Michael Schukat

This paper is about object recognition combined with cooperative mapping. The goal is for a team of robots to map a complex environment using the objects within as landmarks. The benefits of cooperation in mapping include concurrency, reliability, and accuracy. Greater accuracy comes from multiple robots viewing the same object, which should provide a more accurate estimate of position and ameliorate sensor errors. Also multiple robots can view an object from different angles to get better 3D data. This 3D data can be used to create 2D images for object recognition.
This topic is similar to the first imaging paper in that it concerns the relative positions of robots observing an object. This paper is more focused on mapping using visual object recognition. Like the imaging paper, it suggests advantages for performing exploration with multiple cooperative rovers. However, positioning rovers for Martian surface exploration could be done differently with techniques such as GPS.

Part E:

I think an interesting topic would involve autonomous cooperative robots for unexploded ordnance removal. Utility would be provided by quickly and redundantly searching an area. The number of robots and the strategy used would be variables. Modeling the environment and the robots themselves might be a lot of work even before strategies could be implemented.
The world could be modeled as a grid, and the robots could move around up, down, left, and right. Exploring a space once could be worth 2 points, while exploring it a second time could be worth 1 additional point. The project could explore different strategies to try to maximize the score achieved in a minimum number of moves. The world should be extremely simple at first, because even that could take a lot of coding, and it is not the interesting part of the project.
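A rough sketch of this scoring scheme (the grid size, robot count, and random-walk baseline below are placeholders of my own):

import random

def simulate(strategy, width=10, height=10, n_robots=3, n_moves=50, seed=0):
    """Score a run: 2 points for a cell's first visit, 1 more for its
    second, nothing after that."""
    rng = random.Random(seed)
    robots = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_robots)]
    visits = {}                                   # cell -> times visited
    score = 0
    for _ in range(n_moves):
        for i, (x, y) in enumerate(robots):
            dx, dy = strategy(rng, x, y, visits)
            x = min(max(x + dx, 0), width - 1)    # stay on the grid
            y = min(max(y + dy, 0), height - 1)
            robots[i] = (x, y)
            visits[(x, y)] = visits.get((x, y), 0) + 1
            if visits[(x, y)] == 1:
                score += 2
            elif visits[(x, y)] == 2:
                score += 1
    return score

def random_walk(rng, x, y, visits):               # baseline strategy to beat
    return rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])

print(simulate(random_walk))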
This is a project I would be interested in pursuing.

Lawrence Bush

Part A: Topics of Fascination


In this section I will discuss the following three topics of interest in the area of reasoning applied to robotics:


Collaborative Task Execution
Resource Allocation
Learning


Collaborative Task Execution and Resource Allocation


Collaborative task execution and resource allocation are closely related, so I will address them
together. The objective of collaboration is to produce a higher-quality result than could be achieved alone.
The objective of resource allocation is to increase efficiency or lower costs. Together, these two concepts
embody the quality-cost tradeoff of many operations research optimization problems.
The central idea is that teams or colonies of robots need to be able to self-organize in a way that
allocates them efficiently, and the tasks that need to be done will change as time goes on.
For a given task, the robots need to decide who does what. For example, in a search and rescue operation,
the robots need to spread out over an area to search for the missing person.
The way in which the robots are deployed will significantly affect the success of the mission. Suppose that
prior knowledge indicates that the missing person is more likely to be in one location than another, and that
certain areas are easier (lower cost) to search than others. If each robot operates independently, they may
all choose the area that is easiest to search and has the highest probability of success while neglecting other
areas, even though a better overall allocation exists. Some explicit form of collaboration is therefore
desirable for this application.
The robot allocation could be centrally planned and modeled as a Markov Decision Process (MDP) or
constraint satisfaction problem or it could be constructed using a market-based approach. These are very
interesting approaches and will be discussed more in Part D.


Learning


Learning is a very broad topic in artificial intelligence. The index of Artificial Intelligence: A Modern
Approach, 2nd Edition, by Stuart Russell and Peter Norvig (AIMA) lists over 100 pages in the book that
address the topic of learning. The broad-reaching nature of the topic is captured in titles like “universal
reinforcement learner.” While that goal seems intractable, robots still need to be able to perceive, reason,
and act in new and changing environments.
A variety of learning methods have been studied for many applications. Traditional learning algorithms
have a narrow application scope. These approaches are very interesting, important and challenging.
However, robots need to learn from experience in a more general sense. They need to learn to identify
what is important, internalize this into their knowledge base and use it to make decisions in the future.
Two approaches to this problem are recurrent neural networks (http://www.idsia.ch/~juergen/rnn.html) and
the Optimal Ordered Problem Solver (http://www.idsia.ch/~juergen/oops.html). The referenced web site is
flat-out fascinating. With that said, I find this reasoning area to be overly ambitious. My approach to a
problem this big is to nibble away at the edges; in short, I intend to choose a problem that is more specific
and narrower in scope.


A Summary of Robotics


This section provides an overall view of robotics. I wrote this section in order to become familiar with the
different types of robots, sensors, reasoning techniques and applications in the field. This section
essentially summarizes Chapter 25 of AIMA.
This section is organized in the following order:


• Sensors
• Reasoning
• Types of Robots
• Applications


A robot is a feedback and control system that senses and reasons about its environment, leading to a
decision on how to act. The first three parts of this section are organized in the order in which
they are used in such a feedback and control system. The final part covers how robots can be applied.


Sensors
There are many kinds of sensors that a robot can use [1] to sense its environment:
• Range finders
o Laser range finder
o Sonar
o Global Positioning System
o Tactile sensors (whiskers)
• Imaging sensors
o Camera
o Infrared
• Other
o Temperature
o Odor
o Acoustic
o Velocity
o Light intensity


The sensors that are chosen for the particular application become part of the feedback loop described
above. This feedback loop is sometimes called perception. Perception is essentially a filtering task, which
involves sensing, updating the internal belief state, and acting. For example, the important and well-studied
localization problem involves sensing the robot’s location (e.g., using a range finder), updating the robot’s
estimated location (e.g., with a Kalman filter), and moving in the direction that leads to the goal.
Proprioceptive Sensors are another class of sensors that measure the internal state of the robot. Some
examples of these are:
• Inertial (gyroscope)
• Force / torque sensors
• Odometer
While these measure the internal state of the robot, they can also be used to improve the robot’s
interpretation of its environment. For example, an inertial sensor will reduce the position uncertainty in a
localization task.
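To make the localization loop concrete, here is a minimal one-dimensional sketch of my own (not from AIMA): each cycle, the position estimate and its variance are updated from an odometry command and a range-finder measurement.

def kalman_step(x, P, u, z, Q=0.1, R=0.5):
    """One predict/update cycle of a 1-D Kalman filter for localization.
    x, P: current position estimate and its variance
    u: commanded motion this cycle (odometry), with motion noise Q
    z: position measurement (e.g., from a range finder), with noise R"""
    x_pred, P_pred = x + u, P + Q          # predict: move, uncertainty grows
    K = P_pred / (P_pred + R)              # Kalman gain: trust in the sensor
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

x, P = 0.0, 1.0                            # start with an uncertain position
x, P = kalman_step(x, P, u=1.0, z=1.2)     # move 1 unit, sense position 1.2
print(x, P)                                # estimate near 1.14, variance shrinks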


Reasoning


After sensing the environment, a robot must update its internal state and make a decision about how to act.
This reasoning step can have a variety of objectives, for example:
• Localization
• Mapping
• Task Planning
• Resource Allocation
• Generic Search
• Learning (in general)


Acting (Types of Robots)


The manifestation of the acting step depends on the type of robot and how it can affect its environment.
Robots can be grouped into three types: manipulators, mobile robots, and hybrid robots. Manipulators are able to
change their environment through grasping, pushing, lifting, and so on. An assembly-line robot is a popular
example of this type of robot.
A mobile robot is able to move about via wheels or legs. Some examples of mobile robots are unmanned
land vehicles, unmanned air vehicles, autonomous underwater vehicles, and planetary rovers.
A hybrid robot is a mobile robot equipped with manipulators, for example a humanoid robot.


Applications


A robot’s ability to sense, reason and interact with its environment is application dependent. Robots are
used in industry, exploration, health care and personal services. The following are some possible
applications:
• Industry
o Mining
o Agriculture
o Material Handling
o Welding
o Painting
o Harvest
o Excavate Earth
• Transportation
o Of goods between warehouses
o Retrieving items from a warehouse
• Exploration
o 3-D map of a mine
o Mars
o Robotic arms assist astronaut in deploying and retrieving satellites
o Military “drones”
• Health care
o Assist surgeons with instrument placement (heart, eyes, brain)
o Aides for the elderly (robotic walkers, pill reminders)
• Personal Services (perform daily tasks)
o Vacuum cleaners
o Lawn mowers
o Golf caddies
o Snow removal
• Public Safety
o Search and Rescue
o Fire fighting
• Entertainment
o Robot dog
o Robotic soccer
• Human Augmentation
o Walking machines
o Robotic teleoperation (haptic interface)


Current Research Areas


The above section provides a survey of the relevant areas of robotics. This section serves to identify
the current research areas. To do this, I surveyed the papers accepted to the
Fourth International Cognitive Robotics Workshop, Valencia, Spain, August 23-24,
2004, and took the approach of summarizing every paper at the workshop in order to find out what is
important.
I review each paper in a necessarily short format, which includes a summary of the main components of the
paper and a list of the application, type of reasoning and techniques used to solve the problem.


Paper: Specifying Failure and Progress Conditions in a Behavior-Based Robot Programming System,
Authors: F. Kabanza, K. B. Lamine
Uses Linear Temporal Logic to monitor the progress of a behavior-based robotic system toward a global
goal.
Application: Navigation
Form of Reasoning: Progress Monitoring
Techniques: Linear Temporal Logic


Paper: Robel: Synthesizing and Controlling Complex Robust Robot Behaviors,
Authors: B. Morisset, G. Infante, M. Ghallab, F. Ingrand

Uses a Markov Decision Process to model / learn the relationship between the supervision state and the
proper action. The objective is a high level task, thus the actions are modeled using a Hierarchical Task
Network.
Application: Navigation
Form of Reasoning: Progress Monitoring and Policy Learning
Techniques: MDP


Paper: Patterns in Reactive Programs,
Author: P. Haslum


Draws an analogy between software design patterns and reactive task procedures.
Application: Unmanned Air Vehicle Navigation and target tracking
Form of Reasoning: Navigation and Tracking
Techniques: Object Oriented Programming and Design


Paper: Decision Theoretic Planning for Playing Table Soccer,
Authors: M. Tacke, T. Weigel, B. Nebel


Maximum expected utility action selection (decision theoretic planning) using forward simulation.
Application: Robotic Table Soccer (Foosball)
Form of Reasoning: Planning
Techniques: Forward Simulation


Paper: On-line Decision Theoretic Golog for Unpredictable Domains,
Authors: A. Ferrein, C. Fritz, G. Lakemeyer


This paper integrates a decision theoretic planner with real-time updatable location uncertainty.
Application: Robotic Soccer (RoboCup)
Form of Reasoning: Planning
Techniques: Decision Theoretic Planning


Paper: Learning Partially Observable Action Models,
Author: E. Amir


This paper presents principles and algorithms for learning and filtering a logical action model.
Application: Turning on a light.
Form of Reasoning: Learning
Techniques: MDP, POMDP, HMM, EM, STRIPS


Paper: Building Polygonal Maps from Laser Range Data,
Authors: J. Latecki, R. Lakaemper, X. Sun, D. Wolter


This paper presents a heavily feature-based localization and mapping approach (to the exclusion of
odometry data).
Application: Generic Navigation
Form of Reasoning: Localization and Mapping
Techniques: Shape Based Object Recognition


Paper: Hierarchical Voronoi-based Route Graph Representations for Planning, Spatial
Reasoning and Communication,
Authors: J. O. Wallgrün


This paper presents a new graph based path planning approach.
Application: Office Robot
Form of Reasoning: Path Planning
Techniques: Spatial Representation, Graph Model


Paper: Schematized Maps for Robot Guidance,
Authors: D. Wolter, K.-F. Richter
This paper presents a multi-robot central localization, mapping and path planning system.
Application: Service Robots
Form of Reasoning: Navigation
Techniques: Shape Matching, Multi-robot learning


Paper: How can I, robot, pick up that object with my hand,
Authors: A. Morales, P.J. Sanz, A.P. del Pobil
This paper presents vision based feature selection learning for reliable grasp comparison and prediction.
Application: Service robots
Form of Reasoning: Grasping
Techniques: Nearest Neighbor Classification


Paper: On Ability to Automatically Execute Agent Programs with Sensing,
Authors: S. Sardina, G. DeGiacomo, Y. Lesperance, H. Levesque


This paper addresses incorporating real-time sensing uncertainty into an entailment- and consistency-based
paradigm.
Application: Lumberjacking
Form of Reasoning: Planning
Techniques: Situation Calculus (Golog)


Paper: Have Another Look: On Failures and Recovery in Perceptual Anchoring,
Authors: M. Broxvall, S. Coradeschi, L. Karlsson, A. Saffiotti


This paper addresses perceptual error recognition and recovery.
Application: Generic Navigation
Form of Reasoning: Knowledge Based Planning, Recovery from Perceptual Errors
Techniques: Anchoring


Paper: Flexible Interval Planning in Concurrent Temporal Golog,
Authors: A. Finzi, F. Pirri


This paper discusses resource allocation and the exploration / exploitation tradeoff.
Application: Search and Rescue team
Form of Reasoning: Multi-Agent Planning and Scheduling of Localization and Mapping
Techniques: Situation Calculus (Golog)


Paper: Imitation and Social Learning for Synthetic Characters,
Authors: D. Buchsbaum, B. Blumberg, C. Breazeal


This paper discusses a method to identify actions and goals through imitation.
Application: Watching Cartoons
Form of Reasoning: Action and Goal Identification, Imitation Learning
Techniques: Simulation theory


Paper: On Reasoning and Planning in Real-Time: An LDS-Based Approach,
Authors: M. Asker, J. Malec


This paper addresses the complexity of Labeled Deduction Systems for planning.
Application: Theoretical
Form of Reasoning: Planning
Techniques: Labeled Deductive Systems


Paper: Exploiting Qualitative Spatial Neighborhoods in the Situation Calculus,
Authors: F. Dylla, R. Moratz


This paper integrates a spatial theory into situation calculus (Golog) for robot navigation.
Application: General robot navigation.
Form of Reasoning: Localization, Mapping and Navigation
Techniques: Dipole Calculus, Line Segment Calculus, and Situation Calculus

Part D: Researching a Critical Reasoning Method


In this section I discuss three papers in the area of multi-agent planning. The three papers are applicable to
the resource allocation reasoning capability described in Part B. Specifically, the application is a robotic
search and rescue team. This team would be capable of collaboratively searching a pre-defined area. The
search plan would need to balance multiple priorities, robot abilities and initial conditions. The team would
be connected via wireless communication. One of the reasoning capabilities that this would require is
resource allocation.
The three papers each provide a different perspective on how to solve the multi-agent planning / resource
allocation problem in a distributed system. I selected these three papers primarily because they represent
the 3 main schools of thought in this area, which are:


• Distributed Markov Decision Process
• Distributed Constraint Satisfaction
• Market Based Framework


Title: Using Cooperative Mediation to Solve Distributed Constraint Satisfaction Problems
Authors: Roger Mailler and Victor Lesser
Publication: Proceedings of Third International Joint Conference on Autonomous Agents and
MultiAgent Systems (AAMAS 2004)
Why I selected this paper:
I selected this paper because it is relevant to the problem (distributed resource allocation), it uses one of the
three primary ways of addressing the problem, it appeared in a respected publication (AAMAS 2004), and I was
impressed by one of its authors, Victor Lesser, who had a mind-boggling 7 papers accepted to the same
conference.
Major contribution:
This paper presents a new algorithm (Asynchronous Partial Overlay) to solve the distributed constraint
satisfaction problem (DCSP). The major contribution involves a mediator agent, who takes over the process
(effectively centralizing it) when the distributed protocol fails to solve the problem.
Depending on the situation, the mediator agent may centralize only a portion of the graph.
Computer simulations show that the new algorithm is faster than the current best algorithm (Asynchronous
Weak-Commitment). Proofs show that the distributed portion knows when it is locally unsolvable, that the
local-centralized protocol is deadlock-free, and that the worst-case fully centralized approach will always find a
solution if one exists (or terminate if a solution does not exist).
Strengths:
The strengths of this presentation are the inclusion of algorithm pseudo code, the clearness of the example
problem, and the combined use of simulation and mathematical proofs.
Weaknesses:
I found no notable weaknesses.


Title: Hoplites: A Market-Based Framework for Planned Tight Coordination in
Multi-robot Teams

Authors: N. Kalra, D. Ferguson, and A. Stentz
Publication: Proceedings of the International Conference on Robotics and Automation, April, 2005.
Why I selected this paper:
I selected this paper because it is relevant to the problem (coordination in multi-robot teams), it used one of
the three primary ways of addressing the problem (a market-based framework), it is a fascinating subject, and
it has a cool name.
Major contribution:
The major contribution of this paper is a new market-based coordination method that significantly improved
the quality of the results over MVERT, P-MVERT, and PC-MVERT.
With that said, the way in which the auction-based coordination was orchestrated bears some explanation.
In most market-based resource allocation methods, each robot earns points, which measure how much the
robot’s action moved the team toward its goal. Each task also has a team resource cost. The robots can
trade tasks through auctions and negotiations in order to get the most profitable tasks.
In the Hoplites method there are two modes, passive and active. In the passive mode, the agents formulate
and trade plans, which inform each other of their intentions. This allows team members to make informed
decisions. In the active mode, teammates publish plans that require participation to work; they then pay
for that participation. These active plans do not always succeed; however, they are initiated when tight
coordination is required to reach a better solution.
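As a rough illustration of the market idea (a greedy one-round auction of my own, far simpler than Hoplites itself): each task goes to the robot whose bid, the team reward minus that robot's cost, is highest.

def auction(tasks, robots, reward, cost):
    assignment, busy = {}, set()
    for t in sorted(tasks, key=lambda t: -reward[t]):    # valuable tasks first
        bids = [(reward[t] - cost[r][t], r) for r in robots if r not in busy]
        if not bids:
            break
        profit, winner = max(bids)
        if profit > 0:                     # only take tasks that help the team
            assignment[t] = winner
            busy.add(winner)
    return assignment

# toy run: two searchers, three areas
tasks = ["area1", "area2", "area3"]
reward = {"area1": 10, "area2": 8, "area3": 3}
cost = {"r1": {"area1": 2, "area2": 7, "area3": 1},
        "r2": {"area1": 6, "area2": 1, "area3": 1}}
print(auction(tasks, ["r1", "r2"], reward, cost))
# -> {'area1': 'r1', 'area2': 'r2'}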
Strengths:
The new algorithm significantly improved the quality of the coordination results. I was particularly
impressed that they made an effort to optimize the other algorithms for the application before comparing
them. The algorithm was also demonstrated on real robots, which is even more interesting.
Weaknesses:
I found no notable weaknesses.

Title: Multi-agent Planning with Factored MDPs
Authors: Carlos Guestrin, Daphne Koller, and Ronald Parr
Publication: Advances in Neural Information Processing Systems NIPS-14, 2001
Invited Talk: Proceedings of Third International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2004)
Why I selected this paper:
I selected this paper because it is relevant (multi-agent planning) and it was by an author (Daphne Koller)
whose book on Bayesian Networks I have read and liked. The paper was also an invited talk in a respected
conference (Proceedings of Third International Joint Conference on Autonomous Agents and Multi-Agent
Systems), which gave it added credibility.
Major contribution:
This paper explains a new way to decompose the value function of an MDP in order to make the algorithm
tractable in a distributed framework. This is an approximate method, which achieves close to the
theoretically optimal results.
Essentially, you can use an MDP to find the optimal policy for a resource allocation problem. However, to
do this in a distributed sense, each agent needs to know its portion of the overall system reward in order
to make a decision. This decomposition method provides a new way to figure out each agent’s portion.
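A toy illustration of the decomposition (my own made-up numbers, not Guestrin et al.'s algorithm): the global action value is a sum of small local terms, so each agent reasons only over its own portion of the system reward.

from itertools import product

# Factored value: Q(a1, a2) = Q1(a1, a2) + Q2(a2), where each local term
# involves only an agent and (at most) one neighbor, never the whole team.
def q1(a1, a2):
    return {('search', 'search'): 3, ('search', 'rest'): 5,
            ('rest', 'search'): 4, ('rest', 'rest'): 0}[(a1, a2)]

def q2(a2):
    return {'search': 2, 'rest': 0}[a2]

actions = ['search', 'rest']
best = max(product(actions, actions), key=lambda a: q1(*a) + q2(a[1]))
print(best)   # joint action maximizing the decomposed value: ('rest', 'search')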
Strengths:
The author explains some concepts well. The idea is very interesting.
Weaknesses:
The author explains things very well in her book, and this algorithm is really interesting; however, it requires
more than the 7 pages available here to explain precisely.


Part E: A Simple Project For Your Cognitive Robot


Application: Search and rescue mission.
Reasoning: Collaborative Resource Allocation
Motivation
Resource allocation is an important form of reasoning because it is a step toward high-level reasoning.
Essentially, resource allocation is deciding what to do next in the context of a society. The robots are
deciding what they can do that will be of greatest benefit to the society as a whole. Regardless of whether
this uses a cooperative or competitive motivation mechanism, the robots want to do the task that is most
beneficial. For this project, I propose a decision domain that does not require a complicated knowledge
base, which will allow us to do research in high-level decision-making.
Setup
My project involves the simulation of a robotic search and rescue team. The team will consist of
ground moving wheeled robots. The search area will be represented as a grid. The team will
collaboratively search the pre-defined area. The search plan would need to balance differing robot abilities
and initial conditions. The team would be connected via wireless communication. The reasoning
capabilities that this would require are collaborative learning and resource allocation.
Algorithms
I have reviewed 3 different types of algorithms for solving this type of problem. Each of the algorithms is
better than the ones that it was compared to. My proposal is to implement all 3 and compare them to each
other in our problem domain. We will then use the best of the 3 algorithms as our new starting point and
improve upon it.
Note: The MDP algorithm cannot be implemented directly from the description in the paper. I would need
to get clarification from the author on the details of the algorithm.
The teams will use the following collaboration schemes:
1. An auction-based collaboration scheme similar to the one presented in “Hoplites: A Market-Based Framework for Planned Tight Coordination in Multi-robot Teams,” by N. Kalra, D. Ferguson, and A. Stentz.
2. The MDP based algorithm presented in “Multi-agent Planning with Factored MDPs,” by
Carlos Guestrin, Daphne Koller, and Ronald Parr.
3. The constraint satisfaction based algorithm presented in “Using Cooperative Mediation to
Solve Distributed Constraint Satisfaction Problems,” by Mailler, Roger; and Lesser, Victor.
Note: The probability that I will pursue this project is .75.

A summary of last year’s projects:
This section provides a summary of the projects (on the web site) from last year’s Cognitive Robotics class.
I wrote this section to help formulate the scope and range of topics for an acceptable course project.
I review each paper in a necessarily short format, which includes a summary of the main components of the
paper and a list of the application, type of reasoning and techniques used to solve the problem.
Paper: Model-based Programming for Cooperating Vehicles
Seung Chung
Robert Effinger
Thomas Léauté
Steven D. Lovell
This project studied cooperative path planning for autonomous UAV forest fire-fighting.
Application: Forest Fire-Fighting with Autonomous Cooperative UAVs
Form of Reasoning: Path Planning, Goal (Mission) Planning
Techniques: D*-Lite, Dynamic Backtracking, Strategy-driven Kino-dynamic Cooperative
Path Planning, Receding Horizon Continuous Planner
Paper: Cooperative Q-Learning
Lars Blackmore
Steve Block
This project compared cooperative and independent Q-Learning for learning a navigation policy in a maze.
Application: Robot navigation
Form of Reasoning: Cooperative Navigation Policy Learning
Techniques: Cooperative Q-Learning, MDP


Paper: SLAM for Dummies
Morten Rufus Blas
Soren Riisgaard
This group implemented and wrote a tutorial for SLAM.
Application: Generic Localization and Mapping
Form of Reasoning: Localization and Mapping
Techniques: SLAM
Paper: Towards Visual SLAM in Dynamic Environments
Vikash Mansinghka
This project critiqued and developed some new concepts for Visual SLAM.
Application: Navigation (Localization and Mapping)
Form of Reasoning: Localization and Mapping
Techniques: Visual SLAM

Paper: Autonomous Visual Tracking Algorithms
Alexander Omelchenko
This project investigated the visual tracking of ground vehicles via the collaboration between an
autonomous UAV and a rover (ground vehicle).
Application: Visual tracking of ground vehicles
Form of Reasoning: Tracking
Techniques: Image Processing: Contour Tracking and Center of Mass Tracking


Thomas Coffee

Part A: Topics of Fascination


An agent like the one described above will draw upon three core areas of research:
1. Knowledge Representation. Clearly the agent will require mechanisms for representing the behavior rules it has learned,
but the main applications of well-developed representational schemes will lie in effectively managing the input data from its
environment. A hierarchical ontology, likely containing several specialized elements, will be required to abstract visual and
other sensory stimuli to a level providing appropriate inputs to decision algorithms. Given the complexity of these elements
of the system, I plan to avoid the difficulties of real-world knowledge representation and develop decision algorithms
based on data from "toy" examples.
2. Statistical Learning. The agent must be able to observe the behaviors of a human operator in an environment and
develop decision strategies based on patterns extracted from this sample data. The agent must also be able to dynamically
incorporate new sample data during its operations and incorporate new inferences into its strategic model. The complexity
of the model should be optimally adjusted to the volume of training data available, and should be independently controllable
by the method based on resource constraints.
3. Probabilistic Reasoning. While arguably a component of statistical learning, I regard the agent's actual decision
procedures as a somewhat distinct element of the system, which may be handled independently from the formulation of
those procedures based on learning. The execution of these decision procedures will almost certainly require probabilistic
inference, although the specific methods involved will depend upon the strategies produced by the learning framework.

Part D: Researching a Critical Reasoning Method


The primary motivations for algorithm selection will come from flexibility (the ability to recognize general patterns and
estimate general functions) and feasibility (the ability to perform classification and decision-making using a compact
strategic representation). Hence the general thrust of this effort will lie in searching for powerful generalizations of more
elementary schemes (e.g., the decision trees constructed by ID3). This intuition motivates discussion of the following
three papers from the recent literature:


1. Weiss SM, Indurkhya N. 1995. Rule-based machine learning methods for functional prediction. Journal of Artificial
Intelligence Research 3:383-403.

The authors initially show how to generalize traditional discrete decision trees used for classification to regression trees used
for functional estimation. Like decision trees, regression trees perform partitioning based on a disjunctive normal form
strategy, which has advantages for clarity of knowledge organization and traceability to features. However, for pattern
recognition applications, they propose a more general rule-based approach to regression, which eliminates the DNF
constraint and can potentially find much more compact representations. This can be important for large spaces, and can
potentially find rules with substantially clearer interpretations. Through a number of real-world examples, they show that
rule-based regression algorithms using numerical optimization techniques can significantly outperform tree-based methods
in both predictive performance and speed. They also show how this approach can be effectively combined with partitioning and
nearest-neighbor methods (e.g., bounding pseudo-classes on the basis of a fixed neighborhood population) in order to
improve performance still further. They also pursue sample storage compression enhancements with some success.
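As a concrete sketch of the contrast (my own toy example, assuming scikit-learn is available): a regression tree, whose partitions correspond to a DNF of threshold tests, against a nearest-neighbor regressor that predicts from stored samples.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)         # noisy target function

tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(X, y)   # piecewise-constant regions
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)         # predicts from stored samples

X_test = np.linspace(0, 6, 50).reshape(-1, 1)
for name, model in [("tree", tree), ("knn", knn)]:
    mse = np.mean((model.predict(X_test) - np.sin(X_test).ravel()) ** 2)
    print(name, mse)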
The methods presented here move in the desired direction in terms of generalizing the approach to classification and
estimation, and provide concrete algorithms and examples demonstrating their effectiveness in certain situations not unlike
exploration behavior. However, the fundamental methodology still relies upon complete storage of training samples and a
somewhat discretized pseudo-classification approach to pattern decomposition. That is, rule-based regression may not be
general enough to provide the kind of dynamic adaptability and scaling to training data desirable in our agent. The other
papers in this collection introduce and develop a more general approach based on "support vectors," which may provide a
better high-level framework for our applications.


2. Boser BE, Guyon IM, Vapnik VN. 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the
5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA. 144-152.

These authors take a step beyond both tree-based and rule-based parametric decision methods by devising a method to
effectively reparametrize the space of observations according to the most useful global indicators. That is, they construct a
dynamically generated basis for the input space using "support vectors" optimally chosen to maximize the resolution of the
boundary between decision classes. This provides significant improvement on traditional regression-based approaches,
which tend to smooth over any atypical patterns as represented in the original input basis. Their training algorithm also
grows dynamically with new input data, while incorporating many other linear and nonlinear methods as special cases.
Moreover, the authors show how to construct a dual space representation (of reduced dimension) for the actual decision
kernel, which allows the underlying quadratic optimization problem to be solved efficiently using standard numerical
techniques. The authors demonstrate empirical performance on many classical pattern recognition problems (such as
handwritten digit recognition) significantly exceeding other leading algorithms, in some cases even those with pre-defined
task-specific models for those problems.
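A small sketch of the idea using scikit-learn's SVC (my own example, not the authors' implementation): after training a maximum-margin classifier, only the few points nearest the boundary survive as support vectors.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),   # two separable clusters
               rng.normal([3, 3], 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='linear', C=10.0).fit(X, y)
# the decision boundary is defined entirely by the support vectors
print(len(clf.support_vectors_), "of", len(X), "points define the boundary")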
This approach demonstrates both generality for uninformed pattern recognition and dynamic adaptation and scaling to
enlarging training data sets that we desire for our agent. By comparison to the Weiss et al. paper, however, it still does not
demonstrate applications to functional estimation as well as pattern recognition, which will also be a key element of the
agent's exploratory behavior. For this further development, we turn to the last paper in this collection.


3. Vapnik V, Golowich S, Smola A. 1997. Support Vector Method for Function Approximation, Regression Estimation,
and Signal Processing. In: Neural Information Processing Systems, Vol. 9. MIT Press, Cambridge, MA.

In this paper, the authors extend the support vector method to three critical classes of applications: function approximation,
regression estimation (the principal subject of Weiss et al.), and signal processing. The primary content of the paper consists
of explicit mathematical algorithms for carrying out each of these tasks in a generalized fashion, but relatively simple
example applications are demonstrated for each one, with sufficient realism to provide indicators about performance. The
authors show in general how the reduced effective dimensionality of the support vector basis translates into lower
complexity in all these areas; that is, complexity is driven by the desired complexity of the result rather than the complexity
of the initial parametrization of the problem. Performance bounds are generally impressive, demonstrating feasibility for a
wide range of problems.
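The same machinery applied to function approximation, again sketched with scikit-learn's SVR rather than the paper's own algorithms; the model's complexity is set by how many support vectors the accuracy target demands, not by the input parametrization.

import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel()

svr = SVR(kernel='rbf', epsilon=0.05).fit(X, y)    # epsilon sets the accuracy tube
print(len(svr.support_vectors_), "support vectors retained from 100 samples")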

Ultimately, it seems likely that our agent may be built upon a support vector machine architecture, which may or may not
rely on some initial modeling knowledge, depending upon the simplicity of the "toy" environment chosen for development
and testing. For explicitly discrete classes of problems, rule-based methods may yet be of some utility, since they generally
afford simpler symbolic representations.


Part E: A Simple Project for Your Cognitive Robot


One potentially feasible project for developing this framework might pursue a specific two-dimensional visual task, using
human training data captured by a PC-based input device (such as a mouse). To truly demonstrate the success of the
architecture, the task should be sufficiently complex as to defy simple rule-based description, as might be encoded directly
in software by a programmer well-versed in the activity. A plausible analogy with exploration tasks (navigation and/or
feature recognition), such as locating a target, would be preferable though not essential. It might be beneficial for
pre-defined agent models and sample training data to be produced by different people, so as to enforce the uninformed
pattern recognition aspect of the problem.

Shuonan Dong


Part A: Topics of Fascination
1. Probabilistic reasoning

When an autonomous agent is interacting with the real world, it is often the case that the environment is non-deterministic. This could occur if the agent is dealing with a dynamic environment, or if the agent needs to interact with human beings. In such situations, the agent may need probabilistic reasoning techniques to model the environment and make decisions. If the agent can make observations over time, then techniques such as hidden Markov models, Kalman filters, and dynamic Bayesian networks could be useful to make future predictions. Probabilistic reasoning techniques could be useful in applications such as speech recognition.
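As a small sketch of one such technique (a toy forward-algorithm filter of my own, not from the papers below), a belief over hidden states is updated as each observation arrives:

def forward(obs, T, E, prior):
    """Hidden Markov model filtering via the forward algorithm.
    T[i][j]: transition probability from state i to state j
    E[i][o]: probability of observing o in state i"""
    n = len(prior)
    belief = [prior[i] * E[i][obs[0]] for i in range(n)]
    z = sum(belief); belief = [b / z for b in belief]
    for o in obs[1:]:
        belief = [E[j][o] * sum(belief[i] * T[i][j] for i in range(n))
                  for j in range(n)]
        z = sum(belief); belief = [b / z for b in belief]
    return belief

# toy model: states 0 = rain, 1 = sun; observations 0 = umbrella, 1 = none
T = [[0.7, 0.3], [0.3, 0.7]]
E = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 0, 1], T, E, prior=[0.5, 0.5]))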
2. Machine learning with incomplete data

One possible attribute of an intelligent system is the capability to learn to make correct decisions based on available and possibly incomplete training data. It may require learning probability models such as Bayesian networks and making maximum probable hypotheses. Algorithms such as Markov chain Monte Carlo and expectation-maximization, and models such as Kernel models and neural networks may be useful.
3. Communicating, perceiving, and acting

A robot that needs to interact with the environment must have a variety of methods of sensing and communication. Language recognition, tactile sensing, and visual image processing become critical for an interactive robot. In visual perception, object recognition, edge detection, contour mapping, and motion detection would be useful to study.


Part D: Researching a Critical Reasoning Method
On the topic of probabilistic reasoning, I have found a myriad of papers about Bayesian networks and various aspects of Markov decision processes. Having little initial knowledge of the relevance of these papers to my project goal, I chose the following three papers somewhat at random from the Uncertainty in Artificial Intelligence conference proceedings.

[Figure: hand motion → signal detection → filtered signal → pattern matching → command generation]

1. Myers, J. W., Laskey, K. B., and Levitt, T. S., Learning Bayesian Networks from Incomplete Data with Stochastic Search Algorithms

Current research in Bayesian networks generally assumes complete data, so that the Bayesian network problem can be reduced to learning the graph structure and parameters. However, problems arise when there is incomplete data, because there are no closed-form expressions for the scoring metric used to evaluate network structures. Some researchers have taken parametric approaches such as the expectation-maximization (EM) algorithm, but this method is prone to “getting stuck” on local optima.
Myers et al. propose a stochastic search method to avoid the local optima problem. They use a combination of evolutionary algorithms (EA) and Markov chain Monte Carlo (MCMC) algorithms to create a new algorithm called the Evolutionary Markov Chain Monte Carlo (EMCMC) in order to learn Bayesian networks from incomplete data.
The evolutionary algorithm is similar to the evolution processes that occur in nature. Evolutionary algorithms consist of “a population of individual solutions that are selected and modified in order to discover overall better solutions in the search space.”
Initially, a population of solutions is selected, from which individuals are chosen based on how good a solution each provides. These individuals are genetically modified using the common genetic operators crossover and mutation. Thus evolutionary algorithms deal with incomplete data by “turning the incomplete data problem into a complete data problem by evolving the missing data and imputing these values into the data.”
The Markov chain Monte Carlo algorithm is often used to model systems that seek a minimal free-energy state. The microstate of a system, or the state of a system’s atomic structure, has an associated energy, E(s). For a given microstate of a system, the equilibrium probability that minimizes free energy is given by the Boltzmann distribution. Markov chains are useful because they can be designed to converge to a stationary distribution (such as the Boltzmann distribution), and from that point the convergent distribution is the only one necessary to predict occurrences in the future.
Evolutionary algorithms “sample from modes of the distribution of interest.” “Markov chain Monte Carlo algorithms evolve to sample from the target stationary distribution.” Since EAs improve fitness through exchanging information, “the same approach with MCMC will speed convergence by finding better fit solutions faster.” Empirical results show that MCMC with adaptive mutation has a much faster rate of convergence than the regular MCMC algorithm.
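A minimal sketch of the MCMC half of this (a plain Metropolis sampler of my own; EMCMC layers the evolutionary operators on top): the accept/reject rule drives the chain toward the Boltzmann distribution, p(s) proportional to exp(-E(s)/T).

import math, random

def metropolis(energy, propose, x0, steps=10000, temp=1.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        x_new = propose(rng, x)
        dE = energy(x_new) - energy(x)
        if dE <= 0 or rng.random() < math.exp(-dE / temp):
            x = x_new                       # accept downhill, sometimes uphill
        samples.append(x)
    return samples

# toy: integer states with a single low-energy well at 0
samples = metropolis(energy=lambda s: abs(s),
                     propose=lambda rng, s: s + rng.choice([-1, 1]),
                     x0=5)
print(sum(samples[-1000:]) / 1000)          # late samples concentrate near 0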

2. Flores, M. J., Gámez, J. A., and Olesen, K. G., Incremental Compilation of Bayesian Networks

Bayesian networks are usually compiled into a junction tree or a join tree (JT) before probability propagation is performed. The compilation process is usually performed as one unit, where the entire join tree is created from the Bayesian network. If the network gets modified, the whole tree needs to be recompiled. The authors of this paper have devised a method for incrementally compiling a Bayesian network so that it increases the efficiency and stability of the join tree.
This paper was chosen because it may be useful to have the capability to deal with a more dynamic Bayesian network in an efficient way. Since the states of the environment that I will be dealing with in my project goal may not be completely determinable at the beginning, the generated Bayesian network may need to be modified through time, and thus it may be necessary to incrementally compile the network to a join tree.
The paper provides algorithms for dealing with all the structural changes in a Bayesian network, including modification of the probability distribution for a variable, modification of the states of a variable, removing an arc, adding an arc, removing a node, and adding a node. It starts off by decomposing the Bayesian network into its maximum prime subgraphs (MPS), which contain minimal triangulation. Then these MPSs can be triangulated independently.
The paper provides a clear set of algorithms for dealing with the steps of incremental compilation. Modifying potentials and the states of a variable generally leaves the structure of the network the same, and thus can be simply updated in the join tree. When removing or adding an arc to the network, large changes could be made due to cycles in the network. Thus “it is beneficial to be able to identify the minimal part of the join tree that could be affected by the change and concentrate on a retriangulation of only that part.” The removal or addition of a node can be performed by first removing or adding all the necessary arcs around the node, and then simply removing or adding the node itself.
3. Dearden, R., Friedman, N., and Andre, D., Model Based Bayesian Exploration

Reinforcement learning deals with the learning process of an agent in a dynamic environment, where the agent has the choice of exploring an untested set of actions or choosing a previously used set of actions that are known to be good (i.e. exploitation). Markov Decision Processes (MDPs) are useful frameworks to support reinforcement learning by enabling us to determine the actions that would maximize the expected future reward.
It may be useful to create a model of the dynamics of the environment so that the expected values of actions in the environment can be easily determined. Modeling the environment can aid the efficiency of computation because it reduces repetition in approximating the value functions that measure the desirability of each environment state.
The paper shows “how to maintain Bayesian belief states about MDPs” and “how to use the Bayesian belief state to choose actions in a way that balances exploration and exploitation.” The paper was chosen because it provides a possibly useful approach to reinforcement learning systems, which may be handy for my project.


Part E: A Simple Project


The complex system described in Part B can be toned down to a two-dimensional pattern matching problem. A user can make a motion using a mouse on a computer, and the goal of the agent is to learn this motion and correctly identify it as a certain command. The agent would need to filter out random motions of the mouse input to capture the important information. It then needs to build a library of the meanings of these motions, and after a period of training, be able to correctly identify the command.
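A rough sketch of that pipeline (filtering plus nearest-template matching; all names here are placeholders of my own):

import math

def smooth(path, k=3):
    """Moving-average filter: suppress random jitter in the raw mouse path."""
    out = []
    for i in range(len(path)):
        window = path[max(0, i - k):i + 1]
        out.append((sum(x for x, _ in window) / len(window),
                    sum(y for _, y in window) / len(window)))
    return out

def resample(path, n=16):
    """Pick n evenly spaced points so slow and fast strokes compare fairly."""
    idx = [round(i * (len(path) - 1) / (n - 1)) for i in range(n)]
    return [path[i] for i in idx]

def classify(path, templates):
    """Identify the command whose stored template is nearest the stroke."""
    p = resample(smooth(path))
    def dist(t):
        return sum(math.hypot(px - tx, py - ty)
                   for (px, py), (tx, ty) in zip(p, resample(t)))
    return min(templates, key=lambda name: dist(templates[name]))

templates = {"right": [(x, 0) for x in range(20)],
             "down": [(0, y) for y in range(20)]}
print(classify([(x, x * 0.1) for x in range(30)], templates))   # "right"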
This project is feasible and challenging enough that there is a relatively high probability that I will pursue this project during this course.


Justin Fox

Part A: Topics of Fascination

The topic I am primarily interested in learning more about is computational cognitive reasoning as it may be applied to game theory, specifically chess. I will never forget the day back in 1997 when Kasparov was actually defeated by Deep Blue. (Although I still contend that Kasparov was paid to throw the match. The evidence for this is that in game 2, he played the computer to a draw, but “didn’t realize it” and decided to concede rather than finish the game. IBM NEEDED Deep Blue to win!) From the brief literature survey I conducted on the subject, it appears that most of the mainstream literature has actually tended away from chess-playing in particular over the last few years, though there are a large number of papers from the 80’s and early 90’s describing the evolution of the minimax algorithm and the many adaptations people have incorporated into it. Aside from the basic search algorithms, more recently neural networks, genetic algorithms, and numerous other machine learning techniques have been applied to this problem as well. It seems to me that a truly generic, efficient, and novel algorithm for “combinatorial games,” as the literature tends to call chess, checkers, and other games of this type, has not really been attempted in many years, with many authors simply biding their time until computers become powerful enough to minimax search the entire game tree. Hopefully something I learn in this class will prove that statement wrong!
A second topic I am interested in is computer vision. Most specifically, I would like to learn the state of the art in visual object detection and recognition techniques. About four years ago as an undergraduate, I worked for two summers at JPL on a project designed to make autonomous Martian rovers that would be as skilled at classifying rocks and geological features as our staff Martian geologist. Our techniques were pretty much state of the art at the time (Gabor filtering for texture analysis, Bayesian classifiers, feature analysis, 3-D stereo data for range and object segmentation), but I was really disheartened by how poorly our methods actually performed in all but certain doctored situations. In a machine learning class at Caltech, we learned about Lowe’s object detection algorithm, and I was extremely excited. Then I learned that it too had many of the same major limitations (for instance, changes in illumination direction), and in general did not always work as well as its creators seemed to suggest from their particular demonstrations. I would like to learn more about object classification algorithms and hear if there have been any improvements in the field.
Finally, I would be interested in learning about the use of reasoning techniques as they are applied to the diagnosis of engineering systems. This is probably one of the most useful and practical applications of autonomy at the present time and its improvement and extension will be necessary and profitable in the future. Deep Space 1 demonstrated that a spacecraft could autonomously repair itself in flight, and such algorithms will be necessary on future deeper space missions when communication lags would otherwise severely reduce the efficiency and sustainability of the mission. We learned an algorithm for single-fault detection in 16.413, but I would be interested now in learning the more complex algorithms for multiple faults.


Part D: Researching a Critical Reasoning Method

The first paper I chose to read was Playing Games with Algorithms: Algorithmic Combinatorial Game Theory by Erik D. Demaine. In reading this paper, I hoped to attain a general overview of the theory behind combinatorial games like chess, and that is exactly what I was given.
Demaine began by first giving the ubiquitous, but, at least in my practically ignorant case, extremely helpful introduction to and definition of combinatorial games. These games involve two players, named Left and Right, who alternate turns using deterministic moves. One of the most important aspects of this class of games is that they are by definition perfect information games; that is, both players know all the states of the game at every instant, both players know what moves the other player has made, and all actions and consequences are completely deterministic and observable. While these restrictions are sufficiently loose to incorporate innumerable common games such as chess, checkers, and Go, they are restrictive enough to give game theoreticians a handle by which they can manipulate this subclass of problems.
With the problem well-stated, the author then continued by introducing the most important game-theoretic technique used in analyzing the computational complexity of problems associated with combinatorial games, Conway’s surreal numbers. A surreal number is actually defined by a pair of sets {L|R} in which all numbers in L are less than all numbers in R. The value of this surreal number is then “the ‘simplest’ number larger than all Left options and smaller than all Right options.” While these numbers may seem to be purely a demented mathematician’s idea of a strange joke, it turns out that they can be very useful in making analogies to combinatorial games. Demaine explained that games with a surreal value of 0, a positive number, or a negative number have the predictable outcomes that the second player to move wins, the player named Left wins, or the player named Right wins, respectively. Those games, such as {1|0}, that fall outside the set of surreal numbers have the predictable outcome that the first player to move wins. The paper goes on to explain a number of other ways in which surreal numbers have been used to prove numerous results regarding the predictability and complexity of combinatorial games.
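For concreteness, the standard base cases from combinatorial game theory (not spelled out in the paper itself): { | }, where neither player can move, has value 0, so the player to move loses; {0| } has value 1, a one-move advantage for Left; { |0} has value -1; and {0|0}, which is not a surreal number, is the game * (“star”), in which whoever moves first wins.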


Finally, the paper analyzed a few well-known combinatorial games, including chess, with respect to their expected computational complexity. It was shown that, given an arbitrary chess board configuration, it is EXPTIME-complete to determine the winner. This implies that there can be no polynomial-time algorithm for computing the winner of a general chess game no matter how hard we try to think of one, a fact which is important to know so that we do not waste too much time on our project!


Overall, this first paper was meant to be a general introduction to the basics of game theory and some of its applications to chess and other games like it. In this respect, it served its purpose well. However, with respect to the other papers, I felt that for my purposes this one might have been slightly too theoretical, with not enough practical applications to truly be of use to my present project. The tidbit regarding the exponential complexity of determining a winner in chess was helpful however, and I will remember that trying to look for a checkmate even a few moves ahead is inherently an inefficient task.

The second paper I read was The Games Computers (and People) Play by Jonathan Schaeffer. This was actually a 70 page “chapter” written as an update to Arthur Samuel’s well-known 1960 chapter in the textbook Advances in Computers. I chose to read it because in browsing it, I noticed that not only did it explore in-depth various search methods and their extensions, but it also exhibited a fairly detailed analysis of the Deep Blue chess program and its match against Kasparov, making it a great basis for the present project.
The main focus of the first section of the paper was in describing the mini-max and alpha-beta game tree search algorithms. We briefly covered these algorithms in 16.413 and I have already used alpha-beta searching to create my own checkers program, so much of this section was review. What I found extremely interesting and useful, however, was that the author not only described the basic algorithm as Russell and Norvig did, but also detailed many of the improvements that have been made in recent years to the simple application. In particular, Schaeffer discussed four techniques that have been used to speed up or otherwise improve alpha-beta search, describing these techniques as “the recipes for building high-performance games.” The first of these techniques was caching. In games where multiple sequences of moves may evolve the board into the exact same state, storing the results of previous searches in a hash table can have dramatic effects. Schaeffer quotes as much as a 75% reduction for chess searches and an 89% reduction for checkers. A second method for reducing the cost of an alpha-beta search is move ordering. The benefits of pruning are realized most dramatically when the search is able to cut off a branch after examining only a very small number of its leaves. Schaeffer describes a number of ways in which this ordering can be performed, including the use of the cached transposition table and application-dependent heuristics. The third method for improving alpha-beta search was guessing a smaller search window. That is, at each iteration, alpha and beta are normally initialized to [-inf, inf]. However, these extreme values are rarely realized, and reducing the range of the window can greatly enhance the search speed. Schaeffer described this as a “gamble” because reducing the window too much may require another search to be performed, but he also described a number of heuristics for intelligently choosing the search window which would minimize the risk of this gamble. The fourth and final method Schaeffer described in financial terms. If you are investing in stocks, you put more money in the stocks you think will perform well and less in the stocks you believe will perform badly. Similarly, when you search, you search deeper in branches that seem to be getting somewhere and cut off early branches that seem to be going nowhere. He mentioned a number of heuristics for doing this in chess in particular, many of which may be directly applicable to our final project.
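For reference, here is a minimal alpha-beta search with move ordering (my own sketch; is_terminal, evaluate, and successors are assumed game-specific hooks, and a real transposition table, which must record whether stored values are exact or mere bounds, is omitted for brevity):

def alphabeta(state, depth, alpha, beta, maximizing):
    # is_terminal, evaluate, successors: assumed game-specific hooks
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    # move ordering: trying strong moves first makes cutoffs come sooner
    children = sorted(successors(state), key=evaluate, reverse=maximizing)
    if maximizing:
        value = float('-inf')
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break        # prune: the opponent will never allow this line
        return value
    else:
        value = float('inf')
        for child in children:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break        # prune symmetrically for the minimizing player
        return value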


The other section of the chapter that I found very interesting was Schaeffer's description of Deep Blue's algorithm and its performance against Kasparov. Deep Blue was the result of a decade of IBM-funded research and was built from parallel processors and dedicated chess chips, with consultation from a chess grandmaster, Joel Benjamin. Schaeffer, I feel, accurately appraised the significance of the match, saying that it is only one intriguing data point with no statistical significance, and that Deep Blue's construction, while definitely an advance in computer chess-playing, was largely unproductive since the machine was immediately dismantled and its funding completely cut. The analyses contained in this section highlight some of the important pitfalls that typical computer programs fall into and humans are able to avoid, and they will be priceless when analyzing our own algorithm.


I am really glad I found this paper. It will be invaluable for our project. We could even simply compare the efficiency and performance of the four methods described in the paper to give a quantitative understanding of how each one affects the search. Compared to the other papers, it was broad and basic enough to be understandable, and practical enough to be useful in an applied sense.

The third and final paper I chose to read was entitled An Evolutionary Approach for the Tuning of a Chess Evaluation Function Using Population Dynamics and was written by Kendall and Whitwell. After reading Schaeffer's review of the Deep Blue algorithm, it became apparent to me that the most important part of a computer chess player's performance is often its evaluation function, and these authors presented an interesting approach to developing such a function.


Computer chess programs have historically used a linear combination of weighted parameters to heuristically evaluate the "goodness" of a particular board variant. The general purpose of this paper was to create a method whereby a computer chess evaluation function could be created without a human having to hand-tune each of the individual weighting parameters. This was done through the use of an evolutionary algorithm. A particular form of evaluation function was considered, namely a weighted sum of the player's numbers of kings, queens, rooks, bishops, knights, and pawns, plus the number of legal moves available to the player. Fifty such functions were initially generated with randomized weight values. These functions were then inserted into a chess program that played pairs of candidates against each other. The losing function was removed from the population. The function that won the match was slightly mutated and returned to the population. It was also allowed to "breed": another slightly mutated copy of it replaced the losing function. In this way, new functions were evolved which tended to play better and better chess. Of course, the paper described a number of details involving the need to escape local minima and the algorithms used to mutate the functions and pit them against one another. Once the population had reached quiescence, the final weight values were averaged and taken to be the evolved evaluation function.
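A minimal sketch of this population dynamic in Python, assuming a hypothetical play_match(a, b) routine that plays a full game between two weight vectors and returns the index (0 or 1) of the winner; the actual paper's mutation rates and tournament details differ.

```python
import random

FEATURES = ['kings', 'queens', 'rooks', 'bishops', 'knights', 'pawns', 'mobility']

def evaluate(board, weights):
    # Linear evaluation: weighted sum of material counts plus legal-move count.
    # board.count(feature) is a hypothetical interface to a chess engine.
    return sum(w * board.count(f) for w, f in zip(weights, FEATURES))

def mutate(weights, sigma=0.5):
    return [w + random.gauss(0, sigma) for w in weights]

def evolve(play_match, population_size=50, matches=1000):
    population = [[random.uniform(0, 10) for _ in FEATURES]
                  for _ in range(population_size)]
    for _ in range(matches):
        i, j = random.sample(range(population_size), 2)
        winner = i if play_match(population[i], population[j]) == 0 else j
        loser = j if winner == i else i
        # The winner is slightly mutated and kept; it also "breeds" a second
        # mutated copy that replaces the losing function.
        population[winner] = mutate(population[winner])
        population[loser] = mutate(population[winner])
    # Once the population settles, average the weights into a single function.
    return [sum(ws) / population_size for ws in zip(*population)]
```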


The authors tested their initial evaluation function against the final evolved one using a commercial computer game that gave ranking information on the player's skill. The undeveloped function lost within 60 moves. The developed function was actually able to win one game and lost the other in almost 100 moves, showing quite vividly that the evolution process had produced an improvement. In addition, the developed function was rated 1750 on a scale on which Garry Kasparov was rated 2805; the undeveloped function, meanwhile, rated a meager 650. Clearly the evolutionary approach provides a fast, simple method for creating the weights of a computer chess evaluation function that requires no human intervention to optimize.


I had always heard about evolutionary methods and understood more or less how they worked. This is, however, the first paper I have actually read that detailed the equations, the mutation probabilities, and such. I found it to be extremely interesting, clearly written, and hopefully useful. I am even considering "evolving" a checkers evaluation function in my spare time to beef up my own autonomous player. I think this would definitely be an option in selecting an evaluation function for our project, though I'm afraid five weeks may fly by faster than I would like!


Part E: A Simple Project For Your Cognitive Robot



As I mentioned earlier, an autonomous general for the military is a bit out of reach at the moment. Therefore, I would like to implement an autonomous chess player that would demonstrate that computers could at least reason capably in a “simulated war” and that tactics and strategy could be well within the reach of future generations of machines.
We would probably create a simulation in Visual Basic, perhaps researching, understanding, and implementing one or two of the various algorithms currently available and maybe even finding some way to add our own personal tweak to them. We could hopefully then play the algorithms against one another to benchmark their relative performances. If this project is acceptable, I think it would be pretty likely that I would choose to work on it.

 

Ethan Howe


Part A:

  1. Neural Network Design: I know the basic ideas and methods of neural networks, but I
    am interested in how the layout of such networks affects their performance. There has
    been research into actively restructuring neural networks during learning to achieve a
    better characterization of the goal. I would like to learn more about research that applies
    our understanding of neural structures in animals to mimicking particular functions in
    artificial networks. This would also involve learning about recurrent nets. These
    methods would be useful to robotics for performing motor skills, reflexes, and
    recognition. For these tasks, hardware implementations of neural networks would also be
    useful as part of an intercommunicating robotic system.
  2. Dynamic Bayesian Networks: I am interested in DBNs for representing hidden Markov
    model problems and in their usefulness in exploiting sparseness. I would like to learn
    methods for utilizing DBNs as general constructions, as well as applications of DBNs to
    predicting the likelihood of different outcomes of a robot's actions. It would also be
    interesting to understand how adequate networks are constructed, and what automated
    techniques exist for doing so. These topics would be useful in complicated production
    robots, where the same models can be reused many times.
  3. Sub-optimal planning methods: I would like to learn about advanced planning methods
    such as FF and LPG. These tools are used in many systems and would be useful to build
    upon. In the process of learning these methods, we could also summarize the basic traits
    of state-action planning. Planning applications are becoming widespread in commercial
    systems, so this knowledge would be broadly useful.


Part D:


Hoffmann, J., Porteous, J. and Sebastia, L. (2004) "Ordered Landmarks in Planning",
JAIR, Volume 22, pages 215-278.

I selected this paper because, as I suggested above, breaking the building of a
structure down into logical steps seems very natural and would undoubtedly simplify the
planning of such a large-scale objective. The paper builds on Hoffmann's earlier work on
reasonable orders for top-level goals and extends and adapts it to the task of reasonable
orders for landmarks. The paper goes through all the steps of identifying candidate
landmarks and verifying them, assessing (obedient) reasonable orders, and using the orders
to break the search space down into sub-tasks for planners to handle. Results are then
compared between the ordered and unordered versions of each search.


The paper was easily readable, with a limited amount of jargon, good step-by-step
definitions, and examples of the more challenging concepts. Not only does it give good
background and theory for the authors' addition to path planning, it also goes through
implementation, an assessment of effectiveness, and future work to be done. The paper's
weaknesses include the substantial length needed to cover everything, and the fact
that the method does not stand on its own: the reader needs other advanced material to
utilize it. Another topic only lightly treated in the paper is how best to resolve
cycles in the ordering graph; tests were alluded to, but none were detailed for this step of
the algorithm. An almost excessive number of tests are detailed elsewhere, such that you
really feel the authors surveyed a broad cross-section of planning problems. While this is
good for understanding the improvements made, it can also be deceptive as to what types
of planning are left out, and invites overconfidence in the method's positive effects.


Hoffmann, J. and Nebel, B. (2001) "The FF Planning System: Fast Plan Generation
Through Heuristic Search", JAIR, Volume 14, pages 253-302.

I selected this paper to complement the previous paper and get a good grounding in the
actual planning techniques needed to solve the proposed problem. The paper introduces
a planning method that puts together known pieces in a novel way and adds certain
performance tweaks. FF is a variant of heuristic search planning (HSP) that uses
GraphPlan to generate a heuristic based on relaxed plans to the goal, combined with
enforced hill-climbing search and a couple of specific pruning methods. The GraphPlan
algorithm was shown to run in polynomial time on the relaxed problem, using the heuristics
of NOOPs-first, minimal difficulty, and action-set linearization for the backtracking phase.
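A minimal sketch of FF-style enforced hill-climbing in Python, under the assumption of hypothetical h (standing in for FF's relaxed-plan heuristic), successors, and is_goal functions, and hashable states:

```python
from collections import deque

def enforced_hill_climbing(start, h, successors, is_goal):
    """Sketch of FF-style enforced hill-climbing. From the current state,
    breadth-first search until some state has a strictly better heuristic
    value, then jump there and repeat. successors(s) yields
    (action, next_state) pairs."""
    state, best_h, plan = start, h(start), []
    while not is_goal(state):
        frontier = deque([(state, [])])
        visited = {state}
        improved = None
        while frontier and improved is None:
            current, path = frontier.popleft()
            for action, nxt in successors(current):
                if nxt in visited:
                    continue
                visited.add(nxt)
                if h(nxt) < best_h:
                    improved = (nxt, path + [action])
                    break
                frontier.append((nxt, path + [action]))
        if improved is None:
            return None  # EHC failed; FF then restarts with a complete search
        state, extension = improved
        best_h, plan = h(state), plan + extension
    return plan
```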


The paper has many of the strengths of the one above, since the main author is the same,
but it is a bit less theoretical and has more of an applications-motivated feel. For each
portion of the algorithm, the authors looked at efficient implementation, and while this
added many small details, the total was a very effective search. One of the weaknesses of
the FF method as described is its over-reliance on benchmarks to motivate design choices.
While this makes it effective at solving those sets of problems efficiently, in the future
there may arise an important class of problems on which FF performs horribly. To address
this concern, the authors say they are working on classifying the underlying trait that FF
exploits. I also question the authors' decision that, upon failure of enforced hill-climbing,
all progress should be discarded rather than backtracking a bit and using another full
heuristic search. They also described some methods for finding and avoiding dead-end
states that they said they did not wish to use, but which could have been considered upon
failure.


Ahmadabadi, M., and M. Asadpour. "Expertness Based Cooperative Q-Learning."
IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 32, no. 1
(February 2002).

This paper would be useful for taking account of unexpected circumstances arising in
a real-world context, where the system would have to learn and adjust for individual and
environmental differences. The text was meant to continue previous work on weighted
strategy sharing (WSS) by the same authors: it reviewed the previous method and added
more realistic conditions (communication uncertainty, false data) and more testing. This
paper complements the other two by looking at the layer below action planning; it deals
with effective action execution and learning. The robots' experiences need to feed back
into the way actions are carried out. For instance, if one robot is working more slowly
than the others, that should figure into our heuristic for assigning actions, and this could
be accomplished through an expertness criterion.
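A minimal sketch of the weighted strategy sharing idea in Python. The weighting scheme below is one simple variant, an assumption rather than the paper's exact formulas: each robot keeps most of its own Q-table and blends in the tables of more expert peers in proportion to their excess expertness.

```python
import numpy as np

def expertness_weights(expertness, self_idx, impressibility=0.3):
    """Keep (1 - impressibility) of the robot's own table and distribute the
    rest among peers in proportion to how much more expert they are."""
    e = np.asarray(expertness, dtype=float)
    others = np.maximum(e - e[self_idx], 0.0)
    w = np.zeros_like(e)
    if others.sum() > 0:
        w = impressibility * others / others.sum()
    w[self_idx] = 1.0 - w.sum()
    return w

def weighted_strategy_sharing(q_tables, expertness):
    """q_tables: list of (n_states, n_actions) arrays, one per robot.
    Returns new tables, each a weighted combination of everyone's table."""
    new_tables = []
    for i in range(len(q_tables)):
        w = expertness_weights(expertness, i)
        new_tables.append(sum(wj * qj for wj, qj in zip(w, q_tables)))
    return new_tables
```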


The subjects were described in an easy-to-understand fashion with a minimum of
implementation details. The paper assumed a background in such systems, including
knowledge of Q-tables and of how a learning system would be implemented in the first
place; the methods described are additions to such a multiple-learning-robot platform,
meant to take better advantage of cooperative learning. I found that the paper did not have
much to offer in the way of impressive results, and the method was tested on only one
benchmark. The authors also could have presented their results more understandably, and
put them in the abstract. The organization and explanation of relative effectiveness could
have been clearer, emphasizing the most striking cases out of the large set of tests.
Overall, the useful result is that where positive reinforcement dominated, weighting the
positive signals worked best, and the same held for negative reinforcement and signals.


Part E:


For my project, I could design a system to extract actions and states from a full
blueprint, including all materials and positions, for a portion of a building that a group of
robots is to construct. The planner would break sequences of events down into necessary
landmarks in the process and assign efficient tasks to sets of robots. The system could use
some kind of knowledge set to construct the state network and then use the FF planning
and reasonable orders detailed above. Then, if time allows, another layer could be added to
the planning to take account of feedback and learning from the robots as the structure
progresses. The robots' actions would be simulated, with random discrepancies added.

Kaijen Hsiao

Part A: Topics of Fascination


1) I am primarily interested in SLAM. I plan to do my project on an
application of SLAM, and thus I am interested not only in the method we
learned about in class, but also the method outlined in the DP-SLAM
reading on the course webpage. I only want to implement one method, but I
need to know how both methods work so that I can figure out which is
easiest/best to implement. I will probably want to do my advanced lecture
on a SLAM algorithm.


2) I am also interested in probabilistic roadmaps. We covered basic
probabilistic roadmaps in class, which is something I use in my own
research to control two simulated robotic arms. I am also interested in
learning more about RRTs and MOPs, and would not mind doing an advanced
lecture on a specific algorithm related to them.


3) Finally, as a topic that hasn't been covered in class, I am
particularly interested in MDPs and POMDPs. I have learned about HMMs, but
not about POMDPs, and the information in Russell and Norvig is woefully
inadequate.

Part D: Researching a Critical Reasoning Method
Paper 1: Thrun, Sebastian. "A probabilistic online mapping algorithm for
teams of mobile robots." International Journal of Robotics Research,
20(5):335-363, 2001.

The SLAM algorithm we learned about in class (Leonard's method of
using multiple overlapping submaps with Kalman filters) is useful when
errors in your robot's motion can be accurately represented using a
Gaussian. However, for robots with nonlinear, inaccurate motion, using a
Gaussian to represent the robot pose can be a poor approximation.
Instead, one can use particle filters, which represent the robot pose
essentially with a cloud of possible poses. Instead of a mean and
covariance matrix, the uncertainty in the robot pose can be captured by
recording a large set of 'particles' that each contains one possible robot
pose. Each particle is weighted by the probability that the robot is
actually in that pose based on the latest observation. At each time step,
the particles are sampled according to their probabilities, and the motion
model is used to extend them according to the new motion input (with
randomly sampled error included). Thus, the particle cloud expands as the
robot moves, but a good observation will assign high weight only to those
particles that are likely correct, causing the cloud to shrink again.
This is also a good method to represent branching hypotheses; for
instance, if the robot could likely be in one of two separate places, the
standard Kalman filter method cannot represent this fact with a Gaussian.
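A minimal sketch of one such particle filter update in Python, with hypothetical motion_model and observation_likelihood functions standing in for the robot's actual models:

```python
import random

def particle_filter_step(particles, control, observation,
                         motion_model, observation_likelihood):
    """One update of a particle filter over robot pose.
    motion_model(pose, control) samples a new pose with random error;
    observation_likelihood(obs, pose) scores how well obs fits that pose.
    Both are hypothetical stand-ins for the robot's actual models."""
    # 1. Propagate: move each particle through the (noisy) motion model,
    #    which makes the cloud expand as the robot moves.
    moved = [motion_model(p, control) for p in particles]
    # 2. Weight: score each pose hypothesis against the latest observation.
    weights = [observation_likelihood(observation, p) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Resample: draw particles in proportion to weight, so the cloud
    #    concentrates around likely poses and unlikely hypotheses die out.
    return random.choices(moved, weights=weights, k=len(moved))
```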
The idea of particle filters is the basis for a whole set of major
SLAM algorithms, mostly developed by Sebastian Thrun. The reason I picked
this paper is that, unlike later papers, which all assume you know all
about particle filters and basic SLAM, it gives a coherent explanation
that starts from the beginning. This includes what particle filters are,
as well as other low-level details of the SLAM algorithm, such
as how to compute the most likely robot pose given a map estimate (using
gradient ascent), or how to represent the map as an occupancy grid whose
squares reflect the probability that there is an obstacle there.
The actual system discussed in this paper is somewhat primitive
compared to later particle-filter-based SLAM algorithms. In summary: the
robot uses a laser rangefinder, and the map is represented using an
occupancy grid rather than landmarks. However, only the maximum
likelihood map is kept track of, rather than multiple map hypotheses, as
we will see in later papers. The system is able to close loops, but it
does so in one lump adjustment using a lot of gradient ascent once a
conflict is found. The paper also discusses mapping with multiple robots
that can communicate with each other constantly (one robot is the 'team
leader', and new robots must be able to localize themselves in the team
leader's map, at which point they start adding to the map just as the
first robot does) and 3-D structural and texture mapping (their robot carries
a panoramic camera; the 3-D information is not used in SLAM in any way, and
the 3-D images are simply built using the SLAM results).
The algorithm used in this paper will probably not be used for my
cognitive robot, but many of the low-level details from this paper will
probably be used, and knowing this algorithm helps one to understand the
algorithms in later papers. While it might be useful for multiple spy
robots to be able to make the map together, it is not necessarily safe to
assume that they can be in constant communication with each other.


Paper 2: M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. "FastSLAM: A
factored solution to the simultaneous localization and mapping problem."
In Proceedings of the AAAI National Conference on Artificial Intelligence,
Edmonton, Canada, 2002. AAAI.

FastSLAM appears to be a well-known, much-used algorithm, which is
why I chose this paper. As before, a laser rangefinder is used to collect
data, and the algorithm is based on particle filters. Unlike the last
paper, landmarks are used rather than occupancy maps. Also unlike the
last paper, each particle contains not only the robot pose, but also a
Gaussian for each landmark in the map. Thus, each particle records one
hypothesis for the map, and these can diverge widely if need be, rather
than relying only on the one maximally likely map as in the last paper.
Despite estimating the full posterior distribution over both the robot
pose and all the landmark locations, the computation time scales
logarithmically with the number of landmarks in the map: O(M log K), where
M is the number of particles and K is the number of landmarks. This is
not quite as good as the O(1) time complexity of Leonard's submap algorithm,
but it appears to be sufficient for very large maps (with 1 million
landmarks) even with only 100 particles, and it was just as accurate as
Leonard's method in the locations tried.
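A minimal sketch of FastSLAM's factored representation in Python: each particle carries a pose hypothesis plus one small Gaussian per landmark, updated by a per-landmark EKF. The measurement model (h, H, R) is a hypothetical stand-in for the paper's range-bearing model, and the log K scaling would come from storing landmarks in a balanced tree, which the plain dict here merely stands in for.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Landmark:
    mean: np.ndarray   # estimated 2-D landmark position
    cov: np.ndarray    # 2x2 covariance of that estimate

@dataclass
class Particle:
    pose: np.ndarray                                # (x, y, heading) hypothesis
    landmarks: dict = field(default_factory=dict)   # landmark id -> Landmark
    weight: float = 1.0

def update_landmark(particle, lm_id, z, R, h, H):
    """EKF update of one landmark inside one particle. z is the measurement,
    R its noise covariance, h(mean, pose) the predicted measurement, and H
    the measurement Jacobian (all assumed interfaces)."""
    lm = particle.landmarks[lm_id]
    S = H @ lm.cov @ H.T + R               # innovation covariance
    K = lm.cov @ H.T @ np.linalg.inv(S)    # Kalman gain
    lm.mean = lm.mean + K @ (z - h(lm.mean, particle.pose))
    lm.cov = (np.eye(len(lm.mean)) - K @ H) @ lm.cov
```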
This paper also discusses the data association problem: the
problem of deciding whether a landmark is the same as one you've seen
before (and, if so, which one) or whether it is a new one. This is another
low-level detail not often covered in these papers.
For my cognitive robot, I would use a particle-filter algorithm,
but may or may not want to use landmarks (vs. an occupancy map). If I
were to use landmarks, this is probably the algorithm to use.


Paper 3: "DP-SLAM: Fast, Robust Simultaneous Localization and Mapping
Without Predetermined Landmarks." Austin Eliazar and Ronald Parr, 18th
International Joint Conference on Artificial Intelligence (Acapulco,
August 2003).

This paper is on the reading list for the class, and thus it is
the one I started with before realizing that I had no idea what particle
filters were. However, after reading the first two papers (the algorithm
in this paper is an extension of the work in the first two papers), it
becomes much easier to understand.
The algorithm in this paper combines features of the first two
papers. Like FastSLAM, it represents the entire map with each particle,
and can thus keep track of hundreds of candidate maps. However, unlike
FastSLAM, and like the first paper, it uses an occupancy grid as its map
representation. As before, a laser rangefinder is used to collect data.
The main idea is that it finds a way to make copying of particles feasible
when each particle has to keep track of an entire occupancy grid. If each
new particle just copied over the old map and changed it, it would take
O(MP) time, where M is the size of the grid and P is the number of
particles. Instead, the algorithm presented keeps a particle ancestry
tree, so that each particle does not have to contain the entire map. Each
square of the occupancy grid keeps a separate tree that keeps track of the
IDs of every particle that has made changes to the occupancy of that
square, along with their observations. Each particle keeps track of its
ancestors, and thus when the particle wants to find the value of a grid
square, it checks each of its ancestors to find the most recent change to
that square. If nobody has changed that square, the occupancy of that
square is unknown.
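A minimal sketch of that core trick in Python: each grid square stores changes keyed by particle ID, each particle keeps a link to its ancestor, and reading a square means walking up the ancestry chain to the most recent ancestor that wrote it. (The actual paper stores a balanced tree per square; the plain dict is a simplifying assumption.)

```python
class DPSlamParticle:
    def __init__(self, pid, parent=None):
        self.pid = pid
        self.parent = parent   # link into the particle ancestry tree

def write_square(grid, x, y, particle, value):
    # Record this particle's change to the square, without copying the map.
    grid.setdefault((x, y), {})[particle.pid] = value

def read_square(grid, x, y, particle):
    """grid[(x, y)] maps particle id -> recorded occupancy value. Walk up
    the ancestry chain; the first ancestor that wrote this square holds the
    most recent value visible to this particle."""
    changes = grid.get((x, y), {})
    node = particle
    while node is not None:
        if node.pid in changes:
            return changes[node.pid]
        node = node.parent
    return None   # no ancestor touched this square: occupancy unknown
```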
The complexity of this algorithm can become somewhat bad: O(ADP lg P), where
A is the area that the laser sweeps out, D is the depth of the ancestry
tree, and P is the number of particles. In their experiments, however, the
observed complexity was closer to A lg^2 P, which is less than M, the size
of the occupancy grid.
If I wanted to use an occupancy grid rather than landmarks for my
cognitive robot, this is probably the algorithm I would use.


Part E: A Simple Project For Your Cognitive Robot


My cognitive robot was a tiny spy robot (think cockroach-bot), capable of
slipping in through a crack in a building and autonomously mapping the
place while detecting motion and trying to avoid being seen. Even though
I would do my project in simulation, the full capabilities of such a robot
would be too much to implement.


Thus, here is a manageable project that embodies the salient
features: a tiny simulated robot, driven by a human (to avoid having to
develop a solid exploration function) to map the inside of a simulated
building using either FastSLAM or DP-SLAM. Even ignoring the
motion-detection-and-people-avoidance problem, this is already a
significant project. This is because the tiny size of the robot requires
it to do things a bit differently from most SLAM robots. The robot
would have to be able to get over low thresholds and the like, and so a
real-life tiny spy robot would probably have to be legged, likely giving
it terrible odometry. (The simulated robot need not be legged; it just
needs to approximate the movements of a legged robot.)


While no tiny laser rangefinders are currently available, it is quite
conceivable that a tiny laser rangefinder could be developed. This is
because the laser itself is just a tiny diode; most of the mechanism of a
laser rangefinder is for precisely aiming the laser. If a MEMS device
with tiny mirrors could be built to do that job, a laser rangefinder could
be made that was tiny enough to fit on a tiny cockroach-bot. However, the
robot would have to creep along walls to avoid being seen, rather than
charging down the middle of large corridors, possibly complicating the
typical problem of exploring corridors and rooms.

Tony Jiminez

Part A:


Support Vector Machines – Kernel Methods
Support vector machines (SVMs) are a relatively new family of methods for pattern analysis. They can be viewed as a particularized form of neural networks. SVMs are generally composed of two parts: a portion that handles mapping of the data into a feature space, and a learning algorithm designed to discover linear patterns in that feature space. These can be used for a wide variety of data, from simple types such as text and images through more complex objects such as biosequences, graphs, and grammars. I am particularly interested in how this topic will contribute to machine vision.
Nello Cristianini and John Shawe-Taylor wrote An Introduction to Support Vector Machines, which has since been used as a main reference book for this topic. This past year, they followed that work with Kernel Methods for Pattern Analysis. This work provides a large toolset of algorithms, kernels, and solutions to be implemented that will be very useful for the aforementioned fields.
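The two-part structure described above can be sketched concretely in Python: an RBF kernel supplies the implicit feature-space mapping, and a kernel perceptron stands in for the linear learner (an actual SVM would instead solve a margin-maximization problem; the perceptron is a deliberate simplification).

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel: implicitly maps data into a feature space
    where a linear separator corresponds to a nonlinear one in the input."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    """A simple linear learner run entirely through the kernel matrix,
    illustrating the two-part structure (feature mapping + linear learner).
    X: (n, d) training data; y: labels in {-1, +1}."""
    K = kernel(X, X)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            if np.sign((alpha * y) @ K[:, i]) != y[i]:
                alpha[i] += 1.0            # mistake-driven update
    return lambda Xnew: np.sign((alpha * y) @ kernel(X, Xnew))
```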

Path-Planning – MILP using Receding Horizon Control
I am particularly interested in this topic to see how it can be applied to the Mars Airplane project. Path planning itself has (relatively) long been a research focus in artificial intelligence. Russell and Norvig mention that STRIPS was the first major planning system, but I am particularly interested in Mixed Integer Linear Programming (MILP) applied to path-planning for autonomous aerial vehicles. Jonathan How and Eric Feron at MIT have contributed recent work in this area that I want to investigate further.

Decision Making – POMDP’s
Sequential decision making can be specified with a Markovian transition model and additive rewards in what is called a Markov Decision Process (MDP). When the environment is only partially observable, the problem becomes harder. Partially Observable Markov Decision Processes (POMDPs) reflect a more accurate description of the real world. The problem can be recast over belief states, which can then be mapped to an optimal action. This provides the basic framework for approaching the problem.
The Witness Algorithm, as presented by Kaelbling, Littman, and Cassandra in "Planning and acting in partially observable stochastic domains," provides a solution for problems of smaller sizes, and I am investigating current work that delves further into this topic.
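A minimal sketch of the belief-state machinery underlying this framework, assuming hypothetical discrete model arrays T and O rather than any particular problem:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a discrete POMDP belief state.
    b: belief vector over states; T[a][s, s']: transition probabilities
    under action a; O[a][s', o]: observation probabilities."""
    predicted = T[a].T @ b            # push the belief through the dynamics
    b_new = O[a][:, o] * predicted    # condition on the observation received
    return b_new / b_new.sum()

def greedy_action(b, alpha_vectors):
    """With the value function represented as alpha-vectors, as produced by
    exact methods such as the Witness algorithm, the belief is mapped to an
    action by a max of dot products. alpha_vectors: [(vector, action), ...]."""
    return max(alpha_vectors, key=lambda av: b @ av[0])[1]
```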


Part D:


Georgios Theocharous and Leslie Pack Kaelbling, "Approximate Planning in POMDPs with Macro-Actions," Advances in Neural Information Processing Systems 16, Vancouver, 2004

This paper was selected since large-scale planning using macro-actions in POMDPs will help us effectively manage and solve POMDPs with large belief spaces. This directly applies to the Mars Airplane, since we will need to manage the belief space over the potential areas the system can travel.
This paper's major contribution is to present an algorithm that will allow us to learn POMDP policies faster and gather information more efficiently. The algorithm does this by using a dynamic grid approximation to stratify and approximate the belief space, and by using macro-actions to reduce the number of belief states visited by the agent. In addition, it uses an effective reward-shaping approach based on Monte Carlo updates to the Q-values.
While the paper shows the benefits of using macro-actions, which in effect stratify the belief space, it does not show how to dynamically generate good macro-actions that will be effective in creating a good policy. The paper notes that this is ongoing research, and it will be a further subject in my future literature search.
This paper relates to the next paper that I will be discussing since the trajectories created by that algorithm can be viewed as macro-actions for the Mars Airplane. The third paper relates to this paper by investigating planning rules that may help us in choosing good macro-actions.

John Bellingham, Arthur Richards, and Jonathan How, “Receding Horizon Control of Autonomous Aerial Vehicles, ” submitted to American Controls Conference 2002

This paper was selected since it presents a strategy for using receding horizon control to efficiently generate trajectories that we will be able to use for the Mars Airplane. Part of the concern for the airplane is its ability to navigate the topography of Mars, since it may be directed to fly close to the terrain if its science missions require it. We will need to be able to quickly calculate new trajectories if the higher-level mission planner decides to modify its path goals and investigate a new area based on its sensors.
The major contribution of this paper is to demonstrate a new strategy for using receding horizon control while solving a MILP problem in a way that avoids entrapment. The authors do this by using a cost map to estimate the cost-to-go beyond the receding horizon. Receding horizon control allows us to solve for trajectories while reducing the complexity of the task, as measured by computation time, and the trajectories generated are still close to the time-optimal trajectory. This is a major strength of the paper and will allow us to change course and redirect the system's exploration.
The key aspect of this algorithm is the generation of the cost map. This is one area where I wish the paper had given more detail, but I intend to pursue other literature that will shed light on the process of creating the cost map.
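A minimal sketch of the receding-horizon loop in Python, with the MILP solve and the cost map abstracted behind hypothetical functions (the paper's actual formulation encodes dynamics and obstacle avoidance inside the MILP):

```python
def receding_horizon_control(state, goal, horizon, solve_milp, cost_map, at_goal):
    """solve_milp(state, horizon, terminal_cost) would encode vehicle
    dynamics and obstacle avoidance as a MILP and return a short trajectory;
    cost_map(p) estimates the cost-to-go beyond the horizon. Both are
    hypothetical stand-ins for the paper's formulation."""
    trajectory = [state]
    while not at_goal(state, goal):
        # Plan only `horizon` steps ahead; the cost map supplies the estimate
        # of what lies beyond, which is what prevents entrapment.
        plan = solve_milp(state, horizon, terminal_cost=cost_map)
        state = plan[0]           # execute only the first step, then re-plan
        trajectory.append(state)
    return trajectory
```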
As I mentioned earlier, I see the relationship of this paper with the prior paper as a component of the overall system. I hope to be able to integrate the strategies presented to help create a successful design.

Hanna M. Pasula, Luke S. Zettlemoyer, and Leslie Pack Kaelbling, "Learning Probabilistic Relational Planning Rules," Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling , 2004

This paper was selected to investigate the methods we can use to choose good macro-actions for the Mars Airplane. We will need to be able to come up with a system that can learn which areas will be useful for further investigation. This paper demonstrates a new algorithm for the learning of rule-based operators in a simple set of domains.
The paper does not investigate the partially observable cases, which will be critical to our system. But that is not its intended goal. It instead presents an algorithm that is demonstrated to be able to use relational rules to learn more effectively than Dynamic Bayesian Networks (DBNs).
The major strength of this work is that it represents action effects using a set of alternative outcomes. Previous work on learning deterministic action models is unable to do this. This is important since actions can have an unlikely outcome that should be modeled explicitly.
While this paper differs from the other two papers in that its algorithm is for a static and discrete domain, it provides an effective method for learning relational rules that is more robust than previous work in this area. I hope to use its concepts to help build a system for discovering rules for effective exploration or other mission goals for the Mars Airplane.


Part E:


I am particularly interested in attempting to integrate a finite horizon POMDP into a high level mission planner for system integration with a MILP trajectory designer to be used on the Mars Airplane. I am fairly certain now that I will choose to work on this high level mission planner. I plan to use a simulator to feed in the topography of Mars and generate new mission goals with their uncertainties in real-time. These new mission goals may take the form of water erosion sites or ground rovers distributed across a plain. The simulator will have a flight model of the Mars Airplane and be designed to receive inputs from the MILP trajectory planner.
A potential expansion to this project would be the image analysis using pattern recognition that may use support vector machines or other tools, to recognize and score new mission goals. In particular, this can be used for scoring new sites for potential water erosion or it may be used to identify ground vehicles.
Another expressed interest for the Mars Airplane is model-based diagnosis for fault tolerance of the system. This again would be an expansion to this project and if others are interested, please contact me.
Another interest from Draper Laboratory is the use of path-planning to manage turbulence for air drops. To integrate this with the Mars Airplane, we can use the tools generated to manage the wind turbulence for the high level mission planner and coordinate a new path-plan in the MILP trajectory planner that will allow successful delivery of the equipment. We will have to then add to the simulation, air drops of new scientific tools and equipment for rovers on the ground.

 

Axel Kilian


Part A: Topics of Fascination


Robotic articulated systems for mobility
I am interested in robotic articulated structures other than humanoid robots, for instance in
vehicle design. There has been a lot of research into humanoid robotic systems, for the
obvious challenge of replicating some or all of our human motor and perceptual skills,
largely driven by the interest in replacing or assisting humans in dangerous or demanding
tasks, from fighting wars to assisted living.


Sports often act as a catalyst in pushing technological development related to human
activity forward. Sports equipment also often acts as an extension of the body, augmenting
human movement and reach or interacting with the body.


Negotiating human control input and robotic autonomous control
How to translate human control input into a robotic articulated structure resulting in
"legal" moves that do not put the structure into danger but still try to follow the human
intention as closely as possible within the constraints of the system.


Teaching robotic articulation beyond copying movements as recorded motion.
If a robotic structure can articulate itself over several degrees of freedom, expression
becomes a possibility. Just as an instrument offers a range of ways of being played, some
better than others, an articulated structure offers a range of often redundant motion paths
leading to the same result. Is there such a thing as graceful robotic movement? Graceful,
of course, can be defined on many levels: for instance from the perspective of energy
efficiency, shortest path, continuity of movement, acceleration, appearance of the
movement path, registration with some external cue, and so on. How to design and teach
such graceful motion is a challenge I would be interested in exploring.


Part D:


Real-Time Randomized Path Planning for Robot Navigation
James Bruce and Manuela M. Veloso, Computer Science Department, Carnegie Mellon
University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, {jbruce,mmv}@cs.cmu.edu

Reason to select the paper: The exploration of random trees for path planning in highly
dynamic domains could serve as a relevant precedent for driving the articulated vehicle.
Major contributions: Improving the efficiency of rapidly exploring large state spaces
without tiling, and improving path planning speed in combination with kinodynamic path
planners.


Strength: Clear illustration of the RRT path planning technique in the robo soccer
domain.
Weaknesses: The paper lacks a detailed explanation of how the randomized paths are
generated in a generalized fashion from the robot's navigation constraints.


Relationship to the other two papers: The path planning addresses higher-level
navigational tasks that look into the future and are only partially influenced by the
constraints of the embodiment of the robot itself. The randomly generated paths are all
valid paths, though, and therefore do not have to be translated into robot-specific actions.


How is it applicable to the cognitive system I develop?
It is crucial for the vehicle to be able, at any time, to scan the possible moves it can make
next in reaction to possible user input and to choose the one that gets it closest to the goal.
Since the vehicle is riding on spatially highly constrained streets at relatively high speeds,
it is important to configure the RRT so that the randomly generated paths are as close to
continuous as possible within the road boundaries.


The paper demonstrates the advantage of rapidly exploring random trees (RRTs) in highly
dynamic domains, in this case robo soccer. The advantage of employing RRTs is to turn a
reactive system into a high-performance path planning system.
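A minimal sketch of the basic RRT loop in Python; the four helper functions are hypothetical stand-ins for the vehicle's geometry and kinodynamic constraints, and real-time variants such as the paper's ERRT add replanning with waypoint caching on top of this:

```python
import random

def rrt(start, goal, sample_free, nearest, steer, collision_free,
        max_iters=5000, goal_bias=0.05):
    """Basic RRT sketch. sample_free() draws a random configuration,
    nearest(tree, q) returns the closest tree node, steer(q_near, q_rand)
    takes a bounded step toward q_rand, and collision_free(a, b) checks the
    segment between them."""
    tree = {start: None}                       # node -> parent
    for _ in range(max_iters):
        q_rand = goal if random.random() < goal_bias else sample_free()
        q_near = nearest(tree, q_rand)
        q_new = steer(q_near, q_rand)
        if collision_free(q_near, q_new) and q_new not in tree:
            tree[q_new] = q_near
            if q_new == goal:                  # goal reached: recover the path
                path = [q_new]
                while tree[path[-1]] is not None:
                    path.append(tree[path[-1]])
                return list(reversed(path))
    return None                                # no path within the budget
```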


A control structure for autonomous model helicopter navigation
Gilbert Lai, Kingsley Fregence, David Wang

Reason to select the paper: The paper addresses autonomous control systems in a
model-sized helicopter, which is related to the articulated chassis problem I am trying to
address. The paper lays out a hierarchical software architecture that structures the
problem into different layers.
Major contributions: Three layer hierarchical functional control structure
Strength: Clear layout of the problem domain and focus on the lower level helicopter
dynamics specific to this particular type of robot.
Weaknesses: The paper deviates very little from the conventional control layer hierarchy
in comparison to other papers I came across.
Relationship to the other two papers: This paper addresses, much more than the
robo-soccer paper, the lower-level robot-specific issues in autonomous control systems.
How is it applicable to the cognitive system I develop: The model helicopter is a good
example of a semi-autonomous scale model exhibiting a number of degrees of freedom,
and the paper illustrates the controller architecture used in a straightforward fashion.
Human-Machine Cooperative Telemanipulation with Motion and Force
Scaling Using Task-Oriented Virtual Tool Dynamics
Tomotaka Itoh, Member, IEEE, Kazuhiro Kosuge, Member, IEEE, and Toshio Fukuda,
Fellow, IEEE

Major contributions: The realization of a new human-robot telemanipulation system
with motion and force scaling based on semi-autonomous virtual tool dynamics.
Strength: The paper clearly addresses how movements at human scale and strength
can be mapped, independent of scale, onto a range of telemanipulated devices
using virtual tool dynamics.
Relationship to the other two papers: Where the first paper addresses higher-level path
planning issues in relation to a dynamic environment, and the helicopter paper
demonstrates robot-specific dynamics details for the control system, the telemanipulation
paper addresses the issue of human-machine interaction and the very important aspect of
scaling of forces. Human interaction can have an impact on all levels of the control
architecture and is thereby a crucial component of any autonomous system that will be
interacting with humans.
How is it applicable to the cognitive system I develop: The articulated car should be
capable of processing human input to its chassis articulation as it navigates the roads,
in addition to its autonomous stability control. The control movements of the driver
would also have to be scaled up, both in range of motion and in force, to allow the
driver to experience driving as a body-like extension. The introduction of a virtual
chassis model to negotiate between the driver's interaction with the control mechanism
and the articulation of the car chassis seems like a promising technique for avoiding
oscillation and feedback problems.


Part E:


Project for the cognitive robot
I am developing an articulated chassis as part of an ongoing group project that is
exploring the possibility of deploying such a chassis in a sporting vehicle.


One of the challenges is mapping the body movement onto the articulated chassis
in a way that allows for the exploration of new steering movements but at the same time
does not endanger the stability of the vehicle. The articulated vehicle should be capable
of operating autonomously in response to high-level user input, for instance acceleration,
deceleration, and steering, while keeping its balance and safe driving dynamics. In
addition, a user should be able to override or augment the robotic control with human
input, with the goal of allowing individual variations of the autonomously controlled
moves.


The project has a crude but functioning servomotor-based mockup to test the interplay of
the range of degrees of freedom. The goal of the final project would be to develop a
simple software architecture that translates user steering input into the appropriate chassis
movement, using, for instance, an acceleration sensor for feedback and a simple path
planning strategy to explore the range of possible motions and match the user input to
the robotic chassis.

Henry Lefebvre de Plinval-Salgues


Part A


I am fascinated with the following topics:
*Path Planning:
I am interested in learning answers to the following questions:
How do we plan the shortest / fastest / cheapest path in a known map to reach a goal? How do we include constraints, like mountains / threats? How do we deal with unpredictable threats? How do we make a smooth path from a coarse path (for instance, one drawn by a human being as a draft)?
*SLAM
How do we build a map of the robot's environment while locating the robot within it? What strategy should a team of robots performing SLAM use (for example, half of them moving while the rest remain as landmarks)?
*Voice & Vision
I am fascinated with the idea of a robot being able to interpret its visual and auditory environment. I would like to learn more about how a robot can deduce, from a snapshot of its environment, the things that are actually surrounding it. I would like to learn more about strategies for identifying a known object when it is hidden or seen from another point of view. I would also like to learn how voice recognition software works.

Part D


Paper 1
Title
Experimental Demonstration of Multiple Robot Cooperative Target Intercept, T W McLain, R W Beard, J M Kelsey
Why I chose this paper
I chose this paper because it deals with path planning for a team of robots cooperating to attack several targets at the very same time. This topic is obviously close to my project in that it implements path planning for a team of robots. Moreover, the constraints in this paper are close to those in my project: threats here are similar to the presence of mountains in my project, and pop-up threats are similar to mountains not present on the robots' map.


Major contributions
Its major contribution is to show an experimental demonstration of the concept. The robots manage to plan their paths in cooperation so as to reach the targets at the same time. The paper presents a method for writing the goal in a way that makes decentralization easy. It also presents a way to generate kinematically acceptable paths.
Strengths
The paper presents a decentralized control strategy for the team to reach the targets together. It presents a method for generating trajectories according to the kinematic constraints. It shows the success of experiments based on those principles.
Weaknesses
The control strategy is not explained very well.
Relationship with the system I want to implement
In this paper, a team of robots plan their paths to targets they want to reach at the very same time. In my project, I will also work on path planning, and, if possible, also for a team of robots. The threats that the robots in this paper want to avoid are similar to the mountains my robots will avoid.


Paper 2
Title
Probabilistic Path Planning for UAVs, A Dogan
Why I chose this paper
This paper presents a probabilistic approach to path planning, which I find interesting. It also presents a smart way of making the trade-off between the length of the path and the probability of being shot down by the threats.
Major contributions
This paper presents a new probabilistic approach to path planning for a team of UAVs.
It parametrizes the problem in such a way that the user can easily make a trade-off between the desire for a short path and the fear of the threats.
Strengths
The approach being probabilistic, it can be used in the same way for pop-up threats, without changing anything.
Weaknesses
The method presented does not avoid limit cycles in the paths. Even though it detects them, it does so too late.

Also, the local minimization method does not provide an optimal path.
Relationship with paper1
This paper covers another aspect of my project: it deals more with the way of representing the constraints and threats. Also, it is for a single robot, not for a team as in paper 1.
Relationship with the system I want to implement
I am interested in the idea of using a probabilistic description of the mountains. In my project, the robots may or may not have a map. In any case, this map is not perfect, so a probabilistic description may fit well with my goal.
Paper 3
Title
Applying Kinodynamic Randomized Motion Planning with a Dynamic Priority System to Multi-Robot Space Systems, C M Clark, T Bretl, S Rock
Why I chose this paper
In this paper, a new dynamic priority system is presented. Thanks to it, the robot whose environment is the most crowded has the highest priority in planning its path.
Major contributions
This paper shows a new method to avoid collisions: instead of the static priority systems existing before, it presents a dynamic priority system, in which the robots whose environments are the most crowded can decide their paths before the others.
Strengths
Thanks to the algorithm presented in this paper, a large team of robots can perform path planning while avoiding collisions, by cleverly setting the priorities between the various robots.
Weaknesses
No special weakness noted.
Relationship with papers 1 and 2
This paper is more interested in collision avoiding, while paper 1 deals with cooperation to reach the goals at the same time. Paper 2 dealt with a probabilistic representation of the constraints.
Relationship with the system I want to implement
In the system I want to implement, I need a strategy to set the priorities in the team of robots. For instance, if the position of the people to rescue is unknown, they have to cooperate, in a certain priority order, to scan the whole area.

Part E


For my project, I would like to design an intelligent code for UAVs trying to rescue people lost in mountains, while avoiding obstacles.
There are various levels in this project, which I could do depending on how hard they are:
-at a first level, one helicopter (here a point with basic kinematic restrictions) planning the fastest path to reach a known goal in a known mountainous environment.
-adding unknown threats.
-having the goal uncertain / completely unknown.
-adding other helicopters to cooperate (share map updates / people position updates / threats knowledge updates).
-adding helicopter-like behaviour for the motion / adding fuel constraints.
-adding altitude changes.
-multiple people to save.
-being as far as possible reachable by radio from the base.
I guess I could do the first 3-4 points. However, I do not know how ambitious that is.
I will actually pursue this project, or maybe do something slightly different and still related to aeronautics/space (for instance, a team of UAVs attacking one or several targets in enemy territory). However, again, I do not know what is interesting or not, ambitious or not, in this project.


James Lenfestey

Part A


i. Reward-based planning under uncertainty. What are effective ways to plan
given that state favorability is expressed with a known reward function but the world is
not completely observable? I.e., how do we marginalize the PO in POMDP?
ii. Bayesian inference distributed over multiple agents. Given a collection of
agents with partial knowledge of the world (derived from limited observations), how do
we aggregate the information into a form suitable for statistical inference?
iii. Rule-based actions and logical description of environment. What are suitable
ways of logically describing an environment and the robot's actions? What are robust


Part D


Friedman et al., “Learning Probabilistic Relational Models”,

This paper provides an introduction to the concept of a Probabilistic Relational Model
(PRM). As I mentioned in the previous part, PRMs are like Bayesian networks except
that they admit a relational structure over the variables that are being described. Thus,
we have objects with certain types and attributes, where the values a certain object’s
attribute assumes are correlated with other objects to which it is related. Again, the
conditional probability relationships can be expressed in a graphical structure like a
Bayesian network. This paper describes how to find the parameters of the model given a
schema and a corpus of relational data. To do so, it builds on existing methods for training
Bayesian networks.
This paper is important because it is widely cited and seems to have started the
field. It is important for our purposes because it is necessary to understand PRMs before
one can move on to DPRMs, the structure that we will be using to model state as it
evolves through time.


Sanghai et al., “Dynamic Probabilistic Relational Models”,

In this paper DPRMs are defined. Put simply, a DPRM is to a PRM as a DBN is to a
conventional Bayesian net. PRMs have schemas, descriptions of their relational structure
(as in a database). An instantiation of a PRM fills in the schema with particular
objects and values for their attributes. The idea of a DPRM is to describe, probabilistically,
how an instantiation at a given instant in time evolves to the next time-step. Since an
instantiation is basically a set of objects (and associated values) and instantiations change
over time, this setup permits objects to enter or leave the world (or our model of it).
The authors go on to describe a Monte Carlo procedure for performing inference
in their model based on the application of simplifying assumptions. The technique is
based on the observation that a DPRM can be unrolled into a DBN, albeit a very large one.
Rao-Blackwellising the relational attributes reduces the size of the state space, without
which the procedure wouldn't work. The authors show, by experiment, that this method
is able to perform inference accurately in domains consisting of over 1000 objects.
The significance of this paper is that it gives us a model that supports statistical
inference over structured data in unbounded domains. For our purposes, this means that
the number of friendly robots about which we have to reason is immaterial, so long as
they all fit the same relational description.


Doucet et al., "Rao-Blackwellised Particle Filtering for Dynamic Bayesian
Networks"
This article provides an account of the development of Particle Filtering (PF) techniques
up to the use of Rao-Blackwellisation, an analytical technique used by Sanghai et al. to
make a tractable sampling technique for DPRMs. In its own right, PF is a Monte-Carlo
technique for doing inference in DBNs. They work by propagating a collection of
sample points from one time-step to the next via sampling in accordance with the
transition probabilities. The new points are then reweighted in accordance with the
observations at the new times step. Finally, the points are resampled according to their
weight, so that a greater number of samples lie in the region that is most probably the
correct value of the hidden variable. This process is repeated until convergence.
The Rao-Blackwellised PF (RBPF) is an advance in that it uses analytic methods
like the junction tree algorithm to marginalize away some of the hidden variables, thus
reducing the size of the sample space and hence the number of particles necessary to
perform accurate inference. The RBPF algorithm is described in this paper, which I will
use in the implementation of DPRM inference.


Part E


I propose to simulate a collection of mapping robots that explore some virtual
terrain. They will implement the reasoning capabilities described in Part C in the sense
that they describe themselves, other robots, and the world around them using a DPRM. I will
simulate noisy and unreliable sensors, thus forcing the robot to consult its failure models.
To preserve salient features of the multi-agent model, the ability for two robots to
communicate, thus exchanging map data, will be left intact. Further, each robot will
maintain estimates of its peers’ regions of expertise.
Although a fair amount of intelligence can be imbued into the executive, at
present that is not the aim of this project. Instead, it will likely be hard-coded with what
seems to be desirable behavior, nonetheless relying heavily on the robot’s estimates of
world state.

Brian Mihok

Part A: Topics of Fascination


Vision-based object recognition
I am interested in exploring vision-based object recognition in greater detail. The SIFT algorithm created by David Lowe was one example presented in class. I am interested in learning about other scale- and orientation-invariant algorithms and their inherent advantages and disadvantages. However, I don't want to limit my investigation to just SIFT-like algorithms. In doing the literature investigation, I came across the use of tree-structured belief networks (TSBNs) and dynamic tree-structured belief networks (DTSBNs) as other examples, and I am curious about the breadth of algorithms used in visual recognition and why one algorithm would be used over another. Also in my investigation, I came across a great deal of research on facial detection and recognition; I would like to see how much of this work is applicable to detecting and recognizing man-made objects in natural environments. Finally, I would like to better understand the current state of embedded visual object recognition and what work is being done to improve it.

Multi-Agent Planning
Another area I would be interested in investigating further is multi-agent planning. More specifically, I am interested in learning about the dynamic reallocation of resources based upon new information as it becomes available. I would like to learn about the best way to communicate this new information amongst the agents. I also want to learn about where the best place to compute a plan is for the system. Said differently, I am interested in learning the tradeoffs associated with each individual agent computing a plan and then comparing solutions, versus having one centralized computation center decide on actions and then distribute the plan to the agents. Similarly, I would like to learn about the different methods under development regarding how decisions are best made in a multi-agent environment. I have worked briefly with genetic algorithms but would like to know more.

Neural Nets
The final topic of fascination is neural nets. I honestly do not know much at all about neural nets right now. As a result, I am primarily interested in learning topics that other, more advanced students might consider elementary. I would like to know the context of the origins of neural nets and what work has been done in the past. I am interested to know the inherent advantages and disadvantages of neural nets, why they would be used, and in what scenarios. I would like to better understand what current research is being done and where participants in the research feel the work will lead over the course of the next 5-10 years.

Part D: Researching a Critical Reasoning Method


The first article that I read was entitled “Detection of Artificial Structures in Natural-Scene Images Using Dynamic Trees” and was written by Sinisa Todorovic and Michael Nechyba at the University of Florida. This article was accepted to the International Conference on Pattern Recognition in August 2004 and can be found online at http://www.mil.ufl.edu/mav/publications/papers/16.pdf.

I selected this article for two reasons. The first reason was that the detection of artificial structures in natural environments is a key component to the success of the UAV I described in Part B, which had a goal of maximizing the tracking of high priority targets while also maximizing the area surveyed. The first step towards achieving this goal is detecting man-made objects in a natural scene. Also, the authors are associated with the Center for MAV (Micro Air Vehicle) Research at the University of Florida, specifically the endeavor entitled Active Vision for Control of Agile Autonomous Flight. The University of Florida is a leader in MAV technology and research on the university level and has done work on vision-aided control of the MAVs they have developed, including horizon-based stabilization. It seems logical that the work discussed in this paper will eventually find its way into one of the MAVs.

The authors assume that the man-made objects are characterized “primarily by geometric regularities, and that artificial structures are rigid and composed of smaller, uniformly colored sub-parts.” They motivated the methods performed in the paper by giving a very brief reference to previous methods. The first method used an algorithm that extracted edges from images into larger geometric structures and the extracted edges served as nodes of a graph, where the geometric relations between the lines were links in the graph. The authors say that the graphs only accounted for nearest neighbor relations, limiting the complexity of the artificial structures. Next, the authors state that tree-structured belief networks (TSBN) have recently been used since they represent “pixel neighborhoods of varying size.” Though successful, the authors state that TSBNs give rise to “blocky segmentations.” The authors then propose to model man-made objects by dynamic tree-structured belief networks (DTSBNs) to eliminate this problem.

The DTSBN structure provides a way to detect both whole objects and parts of an object. Whole objects are modeled through the “root nodes,” while subcomponents are modeled through parent-child relationships in the tree. The authors thus believe that DTSBNs can support recognition of an object even in the case of occlusion.

In addition to the choice of how to model objects, the authors also detail how they selected the image features. In looking for an efficient edge extraction method, they became convinced that traditional methods such as wavelets and Gabor filters were flawed and that a new method was required that took into account more than the “multiscale and localization properties” of wavelets. The authors suggest that geometric and color cues should be used to discriminate man-made objects from the surroundings. To do this, the authors proposed the use of multiscale linear-discriminant analysis (MLDA). The authors claim that MLDA “encodes both color and texture through a dynamic representation of image details” and that it “extracts edges over a finite range of locations, orientations and scales, decomposing an image into dyadic squares of uniform color.”

The authors then describe the method for creating the dynamic trees and applying the MLDA to the image; due to space constraints, I will leave that discussion to the authors in the actual paper. The analysis was applied to 100 natural-scene images, each 256 x 256 pixels, containing both artificial and natural objects captured by a ground camera at varied distances, as a comparison between TSBNs and DTSBNs. The experimentation showed that DTSBNs outperformed TSBNs and that trained DTSBNs could be used for Bayesian image classification, leading toward man-made object recognition in the future. The authors also concluded that DTSBNs could provide a “unified framework for unsupervised unknown object registration.”

The strength of this paper is that it detected artificial objects in an unsupervised, natural environment with a method that demonstrated improvement over some existing research. Its weaknesses with regard to the UAV project discussed were that the method relies on color images, a luxury not afforded to the UAV during night operations, and that the camera was ground-based.

The papers I reviewed were not selected to relate to each other, but were instead based upon the needs created by the cognitive robot definition. As stated above, the need that this paper addressed was the ability to detect and recognize targets in a natural environment. While this method could be investigated for daytime application, it would not work at night or if a color camera could not be used; in those cases, a different approach would need to be found.


The second article I reviewed was entitled “A Low Cost Embedded Color Vision System,” written by Anthony Rowe, Charles Rosenberg, and Illah Nourbakhsh at Carnegie Mellon University. This article was accepted to the IROS 2002 conference and can be found online at http://www-2.cs.cmu.edu/~cmucam/Publications/iros-2002.pdf. The reason I selected this article is that a soldier-portable UAV requires an embedded vision system restricted to payload-scale size and weight requirements. When the entire vehicle weighs only 10 lbs, the vision system is necessarily restricted to at most about 1 lb. Thus, a typical desktop frame-grabber scenario is invalid. I wanted to see what the capabilities of an embedded vision system were, and the article served as one such benchmark. Also, the CMU system allows for the tracking of a colored object; tracking a detected object is another vital piece of the required functionality of the combat-zone UAV described in Part B.

The article describes a product called the CMUcam, an embedded vision system that can perform basic color blob tracking at 16.7 frames per second. The motivation for the product was to extend vision capabilities to applications that are restricted by size, complexity, and budget. In these types of applications, the traditional desktop computer with a separate frame grabber is not feasible. Instead, the development and increased availability of CMOS cameras and microcontrollers combined to make an embedded vision system possible. The authors state that a major advantage of CMOS cameras over CCDs is the “ability to integrate additional circuitry on the same die as the sensor itself.” The result is a vision system that is 1.75” x 2.25” and less than 2” deep with the camera module and lens attached, with a power requirement of 5 V at 200 mA.

The camera used in the product is an Omnivision OV6620 CMOS camera with an image array of 101,376 pixels, a supported resolution of 352 x 288, and a maximum refresh rate of 60 frames per second. The camera interfaces with the microcontroller using a standard serial port.

The microcontroller that is used to process the video is a Ubicom SX28 operating at 75 MHz. It is a RISC processor and operates at 75 MIPS, with a 2048-word flash programmable EPROM and 136 bytes of SRAM. The microcontroller also has fast and deterministic interrupts and three multi-bit I/O ports, which were used to implement a serial UART port and a standard hobby servo PWM output port, and to control a status LED on the system. All firmware for the vision board was written in C and compiled using the ByteCraft SXC v2.0 compiler.

The color blob tracking was done using a simple algorithm that allows the user to specify the color limits of the object to be tracked and the bounding box in which the tracking was to be done. Analysis was then performed to compare the number of matching color pixels within the bounding box to the matching pixels outside the box. More detailed analysis could be done, such as determining whether the box contained one compact object or multiple objects, along with a slew of color statistics and windowing options, but the details are best left to the paper due to length considerations here.
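To make the flavor of this analysis concrete, here is a minimal sketch, in Python with NumPy, of threshold-based color blob tracking; the function and the confidence measure are my own illustration, not the CMUcam firmware:

```python
import numpy as np

def track_blob(frame, lower, upper):
    """Find the bounding box and centroid of pixels inside given RGB limits.

    frame: H x W x 3 uint8 image; lower/upper: length-3 arrays of color bounds.
    Returns (bbox, centroid, confidence), or None if no pixels match.
    """
    # Mark every pixel whose three channels all fall within the color limits.
    mask = np.all((frame >= lower) & (frame <= upper), axis=2)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    # Bounding box of all matching pixels.
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    # Confidence: fraction of pixels inside the box that match the color,
    # in the spirit of comparing matching pixels inside vs. outside the box.
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    confidence = len(xs) / box_area
    return (x0, y0, x1, y1), (xs.mean(), ys.mean()), confidence
```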

The authors reported that once the camera's color bounds were set, the CMUcam could track a blue 14” x 15” x 10” recycling bin confidently up to 35 feet away. With the appropriate IR-coated lens, the authors reported that the system “performs well in a wide range of lighting conditions, including direct sunlight outdoors,” with nearly identical results outdoors as indoors.

Finally, the authors created a 4” x 3” x 3” robot containing a small differential drive mobile base and a PIC microprocessor to further test the camera’s tracking abilities. The demonstration robot could successfully track a “small brightly colored red doll” at distances up to 15 feet.

The strength of this article is that it details a system on the correct scale, in size and weight, for a vision system that would be employed on a soldier-portable UAV. The weakness is that the system was incapable of providing the necessary level of functionality for the task. The tracking algorithm relied on color, which would not be available during night operation. The objects being tracked were either monochromatic or very simply colored to accommodate the simple algorithm, rather than man-made objects in complex terrain. Even with these simplifications, the range, processing power, and memory fell drastically short of what the desired application requires.

As previously stated, the three articles were not selected to have any relation to each other, but were instead selected to address functional requirements of the proposed cognitive robot. This article addressed the need for an embedded vision system that weighs less than 1 lb but can still perform object tracking, and it gives an idea of the state of technology for embedded systems on the scale of interest for the desired project.


The last paper I reviewed was a bit different from the other two. Instead of being directly related to the algorithms required to add the necessary intelligence to the proposed UAV project, the paper discusses the creation of an image database for facial recognition. While the topic does not apply directly, the issues involved in capturing an image database across different poses, illumination patterns, and expressions can be considered analogous to the problems that would be encountered if an image database were to be created of military targets. The article was entitled “The CMU Pose, Illumination, and Expression Database” and was written by Terence Sim, Simon Baker, and Maan Bsat. It can be found online at http://www.ri.cmu.edu/pub_files/pub4/sim_terence_2003_1/sim_terence_2003_1.pdf.

The authors addressed the need for a database consisting of “a fairly large number of subjects, each imaged a large number of times, from several different poses, under significant illumination variation, and with a variety of expressions.” They note that imaging a subject from multiple poses requires either multiple cameras or multiple shots taken consecutively, and argue that the multiple-camera option is more advantageous when possible: it requires less data collection time; if the cameras are fixed in space, the relative pose is the same for every subject, so there is less difficulty in positioning the subject to obtain a particular pose; and the imaging conditions are the same if the pictures are taken simultaneously. The disadvantages of this method are that multiple cameras, digitizers, and computers are needed, that the cameras must be synchronized so the shutters all open at the same time, and that the cameras will have different properties.

The authors decided to use the multiple camera, simultaneous data collection method due to the “3D Room” at CMU. The 3D Room has 49 cameras, 14 of which were high quality Sony DXC 9000s. To produce similar image quality across the database, 13 of the 14 Sony cameras were used to take the pictures.

To obtain illumination variation in the pictures, the authors installed a “flash system” in the 3D Room similar to one used at Yale by Georghiades et al. in 2000. They used an Advantech PCL-734 32-channel digital output board to control 21 Minolta 220X flashes; generating a pulse on one of the output channels caused the corresponding flash to fire. The flash control code was integrated into the image capture routine so that each flash occurred while the shutter was open. The authors modified the capture routine to capture 21 images, each with different illumination, in approximately 0.7 s. The capture routine took about 10 minutes per subject, allowing the authors to capture and store over 600 images per subject from 13 poses, with 43 different illuminations, and with 4 expressions. The images were color, with a size of 640 x 486, requiring approximately 600MB per person and a total of around 40GB.

While the topic of an image database of facial pictures does not have much direct application to a soldier-portable vehicle, the strength of the paper was that it provided background into some of the issues associated with creating an image database for objects. If such a database were created for military targets, some of the issues overcome by the authors of the article would need to be addressed. For instance, while the expression of a tank might not have any meaning, the illumination differences and the relative position of the viewer to the tank might make a difference, depending on the visual recognition algorithm. In order to obtain a full definition of all sides of an object, multiple pictures would have to be taken and analyzed, as was the case in this paper.

In addition to addressing some of the obstacles encountered in creating such a database, the paper also gives a first approximation of the time required to create one and the amount of storage required. While these figures certainly vary depending on the number of pictures taken per object and the number of objects imaged, the paper at least provides a baseline estimate.

Part E: A Simple Project for your Cognitive Robot


I would like to pursue a project in either visual object recognition or in developing the preliminary work towards an automated landing capability for a UAV using information provided by a lidar system.

The first type of project mentioned above is as yet poorly defined in my mind, primarily due to my lack of substantive knowledge on the subject. I have never worked with visual recognition of objects before, so I do not have a good feel for the scope of a reasonable project given a 1-2 month timeframe. Based upon the requirements for the cognitive robot described in Part B, I would like to implement a visual object detection algorithm that can at the very least detect what objects are present in a given picture or video stream. Once these objects are detected, it would be ideal to perform at least some nominal comparison of features against those of known objects to see if a match could be found. No tracking would be required for the project, as object detection and recognition serve as the foundation for the UAV functionality.

I do not know at this time whether implementing such an algorithm is realistic in the given time frame and, if so, which algorithm best lends itself to the application. To mitigate my lack of knowledge and specificity on a vision-based object detection system, another possible project of interest is working towards an automated landing procedure for a UAV based upon information provided by a lidar system. As mentioned in Part C, such a system would be applicable to a broad range of UAV applications, would save in both operational cost and the efficiency of vehicle design, and is currently in demand by the military. A lidar system is a laser radar: it contains a laser that sweeps across a designated swath and returns the distance to the closest object at each angle. If this tool were aimed downward at an angle such that it projected ahead of the vehicle, the resulting distances could be used to define a topographic map of the area flown over by the aircraft. Combined with the UAV's position, the lidar could be used to map the landing area.
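As a rough illustration of the mapping geometry, the sketch below folds a single lidar return into an along-track elevation map; it assumes a flat vertical-plane geometry, and the grid parameters are made up:

```python
import math
import numpy as np

CELL = 1.0          # grid resolution in meters (assumed)
GRID_LEN = 500      # number of cells along the flight track (assumed)

def update_elevation_map(elev, vehicle_x, vehicle_alt, angle, rng):
    """Fold one lidar return into a 1-D along-track elevation map.

    angle: depression angle below horizontal (radians); rng: measured range.
    Geometry is simplified to the vertical plane containing the flight path.
    """
    ground_x = vehicle_x + rng * math.cos(angle)    # horizontal offset of hit
    ground_z = vehicle_alt - rng * math.sin(angle)  # terrain height at hit
    i = int(ground_x / CELL)
    if 0 <= i < GRID_LEN:
        # Keep the highest return per cell so obstacles are not smoothed away.
        elev[i] = ground_z if math.isnan(elev[i]) else max(elev[i], ground_z)
    return elev

elev_map = np.full(GRID_LEN, np.nan)  # unknown terrain marked as NaN
```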

As discussed in Parts B & C, two different methods could be used to land the UAV with the lidar. In the first method, the UAV would attempt to land with no prior knowledge of the surroundings whatsoever. The autoland system would attempt to avoid obstacles as they are detected, but there would be no effort to find an ideal landing location.

In the second method, the UAV would make two passes over the designated landing region. During the first pass, the lidar would collect the information and the topographic map would be created. Then, as the UAV is in transit for the second pass, the autoland system would use the topographic map to determine the ideal landing location. It would then determine the optimum trajectory to guide the UAV to the selected landing site, thereby maximizing the chance of survival. At a certain elevation above the ground, as determined by the lidar system, the autoland system would execute a pitch-and-stall procedure to bring the vehicle down as gently as possible.

My proposal would be to use the second method to land the UAV. I would of course need several simplifying assumptions. The first obvious one is that this would all be done in simulation, with no hardware development. Next, I would assume that the simulated UAV behaves exactly as commanded in real time, without error. I would also assume that the first scan of the landing region has already been performed and that the lidar properly detected and mapped the region. From these assumptions, I would then create a 3D topographic map and select the best landing location. From this location, the surrounding terrain, and the known position, attitude, and velocity of the aircraft, I would then attempt to calculate a safe trajectory to the landing site. To be a bit more realistic given the time constraints, I would accept a safe trajectory rather than an optimal one, and I would require the surroundings to be “reasonable,” so that the UAV would not be attempting to land in ridiculous environments in which no pilot would ever attempt to land.
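The site-selection step could be sketched as scoring each candidate landing footprint by roughness and slope; the window size and scoring weights below are arbitrary placeholders:

```python
import numpy as np

def best_landing_site(elev, footprint=20):
    """Pick the flattest window of `footprint` cells in the elevation map.

    elev: 1-D along-track elevation map (NaN = never scanned).
    Returns the start index of the best window, or None if nothing is safe.
    """
    best_i, best_score = None, float("inf")
    for i in range(len(elev) - footprint):
        window = elev[i:i + footprint]
        if np.isnan(window).any():      # skip unscanned terrain
            continue
        roughness = window.std()                          # local unevenness
        slope = abs(window[-1] - window[0]) / footprint   # mean grade
        score = roughness + 5.0 * slope   # the weighting is an arbitrary choice
        if score < best_score:
            best_i, best_score = i, score
    return best_i
```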

Jason Miller


Part A


Fault detection and diagnosis: When operating in real-world environments, autonomous systems
must be able to monitor themselves and react appropriately when a failure occurs. In some
cases, the system may be able to work around the fault and continue. In others, it is sufficient to
avoid making the problem worse and await further instructions. In either case, the system must
be able to use limited numbers of sensors (which may themselves be noisy or faulty) to
determine, first, that something is wrong and, second, identify the specific fault that has
occurred. Both of these things can be difficult, especially given limited computational resources
and real-time constraints. I would like to learn more about current techniques used to solve these
problems in systems such as rovers and space probes.


Detection and tracking of moving objects: In the version of SLAM presented in class, an
assumption was made that the world was static, i.e. that any changes perceived by the sensors
were a result of the robot’s actions. I would like to learn more about techniques that could
remove this restriction and allow a robot to deal with other moving objects in the environment.
These objects might be obstacles (like other cars on the road for a driving robot) or things to
interact with (like other robots or the ball for soccer-playing robots). The first problem is to
separate moving objects from the static background, especially when the robot (and its sensors)
may itself be moving. The second problem is to track these objects and estimate their
trajectories for purposes of avoiding, following or intercepting them.


Face detection and recognition in computer vision: To interact naturally with humans, it is
helpful for robots to be able to detect and recognize faces. The first step is simply for the robot
to recognize that it is looking at some face. This information can be used to count or track people in
its view since each person has exactly one face. In addition, a system such as an intelligent kiosk
might want to know when a person is in front of it and looking in its direction. The movement of
facial features can even be used to control aspects of a user interface. Even better than detecting
that there is some person in a scene is the ability to recognize who that person is. This is useful
for surveillance or access control but can also allow a robot to customize its behavior to an
individual. Both of these problems are complicated by the fact that, in the real world, people
tend to move around and may not be looking directly into the camera. I would like to learn more
about current techniques to identify and recognize faces, even when they are misaligned or
partially obscured. Some of these techniques operate on still images while others use video to
aggregate data from a variety of angles.


Part D


R. Washington, “On-Board Real-Time State and Fault Identification for Rovers,” In Proceedings
of the 2000 IEEE International Conference on Robotics and Automation.

This paper was selected because it describes a fault detection and identification system for a
rover-type robot. My robot would have appendages similar to what might be found on a rover
and could therefore suffer similar types of failures. This system is based on the more traditional
(than the next paper) techniques of Markov models and Kalman filters but attempts to fuse the
two to reduce the computational burden of state estimation. In my robot, this work could be used
to detect faults in the appendages or drive system.


The system presented is a hybrid of a discrete-state Markov model and a continuous state
Kalman filter. In this context, the state space includes both normal states of the robot and states
where one or more faults have occurred. Thus, by estimating which state the robot is in, we
determine both that a fault has occurred and identify which fault it is likely to be. A Markov
model is used to represent different global modes of rover operation (e.g. idle, driving, drilling).
Different modes imply different “normal” versus “fault” behavior. For example, sensor values
indicating a high drive current may be normal in some modes but indicate a fault in others. The
Markov model state is then used to set the parameters and state space for a Kalman filter. The
Kalman filter uses sensor data to estimate the continuous state within the Markov mode. In this
way the system prunes the state space that must be estimated by the Kalman filter. The
estimated state from the Kalman filter is then propagated back to the Markov model in the form
of a simulated “observation” matrix.
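As I read it, the control flow resembles the following sketch (Python/NumPy); the two-dimensional state, the placeholder matrices, and the mode set are my own illustration, not the paper's models:

```python
import numpy as np

class ModeKalman:
    """Hybrid estimator: a Markov model over discrete modes, with a Kalman
    filter estimating the continuous state within each mode. A sketch only;
    all model matrices are supplied by the caller and purely illustrative."""

    def __init__(self, modes, trans):
        self.modes = modes    # dict: mode name -> (A, C, Q, R) matrices
        self.trans = trans    # dict: (from_mode, to_mode) -> transition prob
        self.belief = {m: 1.0 / len(modes) for m in modes}
        self.x = {m: np.zeros(2) for m in modes}   # continuous state per mode
        self.P = {m: np.eye(2) for m in modes}     # covariance per mode

    def step(self, z):
        """Process one sensor vector z; return the most likely mode."""
        new_belief = {}
        for m, (A, C, Q, R) in self.modes.items():
            # Kalman predict and update within mode m.
            x, P = A @ self.x[m], A @ self.P[m] @ A.T + Q
            S = C @ P @ C.T + R
            K = P @ C.T @ np.linalg.inv(S)
            innov = z - C @ x
            self.x[m] = x + K @ innov
            self.P[m] = (np.eye(len(x)) - K @ C) @ P
            # The measurement likelihood plays the role of the simulated
            # "observation" propagated back to the Markov model.
            like = float(np.exp(-0.5 * innov @ np.linalg.inv(S) @ innov))
            prior = sum(self.belief[p] * self.trans.get((p, m), 0.0)
                        for p in self.modes)
            new_belief[m] = like * prior
        total = sum(new_belief.values()) or 1.0
        self.belief = {m: v / total for m, v in new_belief.items()}
        return max(self.belief, key=self.belief.get)
```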


However, when the system changes from one state in the Markov model to the next, it must carry
over the continuous state to form the new initial conditions for the Kalman filter. Therefore,
different instances of the discrete states are actually made unique by the continuous state fed into
them. This effectively increases the number of possible Markov model states and can lead to an
explosion of model size. Therefore, the system limits the number of Markov states to ensure that
the computational complexity stays bounded. This means that some improbable states will be
lost and it is possible for the state update formula to return a null state. They claim that this
problem has been addressed in previous work but do not explain it here.


Based on their results, this system does a decent job of detecting faults on a rover-type robot and
runs in a reasonable amount of time with limited computational resources. However, the system
requires a large number of Kalman filter parameters and seems to be very sensitive to their
values. It would most likely require a large amount of data to be collected from various trials of
the robot to properly establish these values. In addition, the limitation on the size of the Markov
model limits the accuracy of the state estimations. While this is clearly a limitation, it also
allows for a trade off between computation and accuracy. If more computational resources are
available, the Markov model can be allowed to grow larger, increasing its accuracy.


V. Verma, G. Gordon, R. Simmons and S. Thrun, “Particle Filters for Rover Fault Diagnosis,”
IEEE Robotics and Automation Magazine special issue on Human Centered Robotics and
Dependability, June 2004.

This paper was selected because it presents an alternative solution to the fault-monitoring and
identification problem on rover-type robots. It is based on a substantially different idea than the
previous paper thereby broadening my view of the topic. Again, the algorithms presented would
be applicable to monitoring faults in the appendages or drive system.


Most traditional techniques for fault detection (e.g. the first paper) attempt to approximate the
problem that needs to be solved in order to reduce computational complexity. In the previous
paper, this took the form of limiting the size of the state space. This paper uses a technique
called particle filtering to find an approximate solution to the complete, complex problem. The
basic idea behind particle filtering is to use a Monte Carlo technique to pull samples (“particles”)
from the true solution. The distribution of the samples is then taken as an approximation of the
true distribution. The main problem with particle filters is that a large number of particles must
be drawn in order to accurately estimate the state when there are low probability states (e.g. fault
states). This tends to negate the advantage of using particle filters in the first place.


This paper presents three techniques which allow particle filters to accurately model low
probability states using fewer particles. The first technique assigns a risk factor to different
states and uses it to influence the choice of particles. Fault states, which are usually low-probability but high-risk to the mission, get a higher risk factor and are therefore more likely to
be sampled. This helps to ensure that potentially serious faults will not be over-looked, even
when using relatively small numbers of particles. The second technique groups together multiple
fault states that have similar symptoms (like wheel failures) and treats them as a single state
which now has a higher probability. If this merged state begins to become more probable (i.e. it
appears that one of the faults in the grouping has occurred) it can be decomposed into individual
states to find the exact fault. The third technique incorporates sensor data into the choice of
particles, thereby making it more likely that states which explain those data will be sampled.
Using these techniques the authors demonstrate impressive accuracy using relatively small
numbers of particles. However, they do not provide data on the actual amount of computation
required to implement their system so it is difficult to compare it directly to the results from the
previous paper.
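The first technique can be pictured with a small sketch: bias resampling toward high-risk states, then divide the bias back out of the importance weights so the weighted set still targets the true posterior. The state names and risk values here are illustrative:

```python
import random

def risk_sensitive_resample(particles, weights, risk, n):
    """Resample with probabilities biased toward high-risk (fault) states.

    particles: list of discrete states; weights: their posterior weights;
    risk: dict state -> risk factor (>1 for faults); n: number of samples.
    The importance correction keeps the weighted estimate unbiased.
    """
    biased = [w * risk.get(s, 1.0) for s, w in zip(particles, weights)]
    total = sum(biased)
    probs = [b / total for b in biased]
    new_particles, new_weights = [], []
    for _ in range(n):
        s = random.choices(particles, probs)[0]
        new_particles.append(s)
        # Divide out the bias so the weighted set still targets the posterior.
        new_weights.append(1.0 / risk.get(s, 1.0))
    wsum = sum(new_weights)
    return new_particles, [w / wsum for w in new_weights]
```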


J.L. Bresina and R. Washington, “Robustness via Run-time Adaptation of Contingent Plans,” Proceedings of the AAAI-2001 Spring Symposium: Robust Autonomy. Stanford, CA.
This paper was selected to help answer the question of what happens after a fault is detected.
The authors describe a command language and execution environment which allows for plans to
be dynamically altered in response to failures or new opportunities without requiring a full-blown re-planning. It is especially useful in environments like the current Mars rovers where the
robot is not capable of doing its own planning to begin with.


In this system, it is assumed that the robot receives a plan from an external source and then
operates autonomously until its next scheduled plan upload. With existing technology, if the
robot experiences a failure and cannot complete its plan, it will sit idle until it can receive
instructions. Using the system proposed here, the uploaded plan would include alternate plans
that could be executed if the robot detects that conditions were not as expected. This could allow
a rover to try alternate techniques to achieve its goals or even switch to different goals if it
encounters a difficulty with the original plan.


Two core techniques are presented: plan step skipping and alternate plan merging. Using plan step skipping, the robot can skip over portions of the plan which are no longer reasonable and continue on with the rest of the plan, thereby achieving at least some of its original goals. For example, if a plan included taking several types of measurements but one of the instruments had a failure, the other measurements could still be performed. Using plan merging, the robot can splice in a plan fragment either by inserting it between plan steps or by replacing existing ones. This could allow the robot to retry an action which has failed, perform alternate actions if it continues to fail, or perform extra actions if resources permit. The key to making this work is to assign utilities to different portions of the plan and to the alternate plan fragments. Each plan step also includes conditions which must be met for the step to be executed. If the robot finds that a step cannot be executed, it can consider skipping steps or merging alternate plans to find the new plan with the highest expected utility.
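The utility bookkeeping can be pictured with a toy sketch; the step names, utilities, and precondition functions are all hypothetical, and real resource constraints and step ordering are ignored:

```python
def best_plan(steps, fragments, state):
    """Choose between skipping failed steps and splicing in a fragment.

    steps and each fragment are lists of (name, utility, precondition)
    triples, where precondition is a function of the current state.
    """
    def executable(plan):
        return [s for s in plan if s[2](state)]   # plan step skipping

    candidates = [executable(steps)]
    for frag in fragments:
        # Plan merging: try splicing the fragment at every position.
        for i in range(len(steps) + 1):
            candidates.append(executable(steps[:i] + frag + steps[i:]))
    # Pick the candidate plan with the highest summed utility.
    return max(candidates, key=lambda plan: sum(u for _, u, _ in plan))
```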


This paper is an interesting extension to the status quo for rover control. It allows rovers to
execute complex alternate plans by simply recalculating utility functions. While the resulting
plans are more limited than you would get from a complete replanning, the reduced
computational complexity makes it a good solution where computational resources on the robot
are extremely limited.


Part E


The robot proposed in part B would be a very ambitious project. I believe that a manageable
alternative would be to design one or two appendages, model them in a simulation and
implement a fault detection system for those appendages. Since I would not have an actual
robot, I would have to simulate failures and sensor noise. I could create a library of motion plans
for the appendages by hand rather than implementing a planner.


I’d say the chances of my pursuing this project are about 40% but I would still like to pursue a
project having something to do with fault detection and replanning.

 

Emmanuel Munguia-Tapia


Part A: Topics of fascination


Dynamic Bayesian Networks and Continuous Time Bayesian Networks
Dynamic Bayesian networks are nowadays the method of choice for supervised sequential learning. One particular case of DBN is the hidden Markov model, widely used in domains such as speech recognition. One extension of DBNs is the continuous time Bayesian network. Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite-state continuous-time Markov process whose transition model is a function of its parents. CTBNs are relatively new in the field of machine learning; they provide a way to do inference over continuous time without having to discretize time, as is usually done in dynamic Bayesian networks. I would like to learn more about them and probably implement a reasoning system using DBNs or CTBNs.
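For the simplest case mentioned above, an HMM, forward filtering fits in a short sketch (Python/NumPy; the model matrices are assumed given):

```python
import numpy as np

def hmm_forward(prior, trans, emit, observations):
    """Forward filtering in an HMM, the simplest DBN mentioned above.

    prior: initial state distribution (n,); trans: (n, n) matrix with
    trans[i, j] = P(next=j | current=i); emit: (n, m) matrix with
    emit[i, k] = P(obs=k | state=i). Returns the filtered belief per step.
    """
    belief = prior.copy()
    history = []
    for obs in observations:
        belief = trans.T @ belief    # predict through the dynamics
        belief = belief * emit[:, obs]   # weight by observation likelihood
        belief = belief / belief.sum()   # renormalize
        history.append(belief.copy())
    return history
```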
Filtering and state space estimation with particle filters
Particle filters are nowadays very popular and probably the method of choice for doing inference on Bayesian networks and dynamic Bayesian networks. The reason is that they are an approximate way of doing inference based on Monte Carlo sampling techniques, and their accuracy depends on the number of particles (samples of the system) used. Particle filters are also widely used in position estimation, since they can represent and solve non-linear problems that the Kalman filter and EKF cannot solve, or take excessive time to solve. I would like to gain practical experience by applying this technique to inference on large Bayesian networks and state spaces.
Reinforcement learning
There is a lot of research on applying machine learning algorithms and pattern recognition techniques to infer human activity in smart homes and robots. However, there is very little research on how to fix the models of human activity in real time if they are not performing well. More importantly, most of these ML techniques require extensive training data to perform with reasonable accuracy, and most of the time, in the home setting, it is not feasible to assume that the user will provide 50-100 examples of each activity to recognize. I believe that novel reinforcement learning techniques could provide a robust way to fix the models in real time by just having the user provide answers such as 'Yes' or 'No' whenever the system correctly or incorrectly predicts a human activity.

Part D: Researching a critical Reasoning method


The three papers selected are the following:
Philipose, M., K. Fishkin, et al. (2003). "Guide: Towards Understanding Daily Life via Auto-Identification and Statistical Analysis." UbiHealth Workshop at Ubicomp 2003.
In this paper, a system for recognizing activities in the home setting is proposed, using a combination of dynamic Bayesian networks and RFID tags attached to everyday objects in the environment. I think that this is a very interesting approach to the problem of figuring out what a person is doing in real time. The authors assume that a person using such a system would wear an RFID-reading glove that reads the tags every time the person interacts with different everyday objects. The main idea is the following: you wake up, touch the toothpaste, then the toothbrush, and also open the cold water faucet. If you have sensors on these different objects and an inference algorithm, a computer or robot could infer that you are brushing your teeth.

Another novelty of the paper is that the authors automatically extract human activity models from the web by mining websites such as ehow.com that have “recipes” for carrying out different human activities. Thus, this approach does not require extensive examples of activities in order to train the machine learning algorithms. The authors compile the models of human activities from the web into dynamic Bayesian networks and use particle filters to infer the most likely model given the trace of RFID sensor activations. I have two criticisms. First, we cannot assume that a person at home will always wear the glove on which the system depends. Second, while I think the approach of compiling human activities mined from the web into DBNs is promising, I believe it would be important to provide a way to fix or retrain the system in case it is not recognizing the activities accurately enough; I think this is the main flaw of this particular approach. This paper is relevant to the cognitive robotic system I am planning to build because I will use a similar approach. Instead of making the person wear a glove, I will place the sensors in the environment: small sensors everywhere that could provide information about objects' use.


Wilson, D. "Simultaneous Tracking & Activity Recognition (STAR) Using Many Anonymous, Binary Sensors." Pervasive 2005.
In this paper, a system is proposed for recognizing activities in the home setting and for tracking people's location at the room level. The system uses sensors commonly found in current security systems, such as contact reed switches on doors and windows, as well as motion sensors. For the inference, the author proposes an algorithm based on layers of dynamic Bayesian networks to infer the inhabitant's activities. The beauty of using layers of DBNs is that the lower layers can be retrained to compensate for changes in sensor readings due to noise or different home settings while keeping the upper layers unchanged. Given the limited set of sensors that the author suggests, the granularity of the system, in terms of the different activities it is able to recognize, is limited: the author proposes to recognize a subset of 5 different everyday activities and to track one person's location in the environment. This paper is relevant because it uses sensors in the environment to track people's location as well as their activities. I like the idea of using different layers of dynamic Bayesian networks to perform inference at different levels of granularity about human activities. I will probably implement something similar for my final project, and probably add another layer for correcting the system whenever a misclassification occurs. Finally, the author proposes to train the system by providing explicit training examples, which I think might be impractical for some activities. I believe that the combination of prior information about human activities (probably mined from the web) with explicit training examples could prove particularly powerful.

Patterson, D., D. Fox, et al. (2003). "Expressive, Tractable and Scalable Techniques for Modeling Activities of Daily Living." UbiHealth 2003: The 2nd International Workshop on Ubiquitous Computing for Pervasive Healthcare Applications.
This paper is relevant since it touches on different aspects of tractability and scalability in methods for inferring and modeling human activity. The methods proposed are similar to dynamic Bayesian networks, and the main inference algorithm proposed is again a particle filter. Prior information about human activities is provided in the form of activity models. The authors propose some changes to the conventional way probabilities are used in DBN systems and introduce the concept of mutually exclusive probabilities. This paper is relevant because it also uses similar reasoning techniques to infer people's activities, but touches on issues of scalability and tractability.


Part E: A Simple Project for your Cognitive Robot


I think that for my final project I will implement a stationary robot embedded in the home environment that will be able to reason about its inhabitants' activities and take intelligent decisions accordingly. One possible approach could be to infer the right time to interrupt the user to provide some information or a reminder. Another interesting approach could be action execution monitoring for elderly people. Some people suffering from dementia have problems performing activities such as preparing coffee or tea because they forget whether they have already put sugar in the coffee. A system providing assistance in such a domain should be able to take readings from sensors placed on different objects, infer the current step of the activity that the user is in, and provide a plan of action towards what the next step in the activity sequence should be.
This problem combines two interesting aspects: a reasoning algorithm to infer the current system state, and an action/planning algorithm to find what the next step or state of the system should be.


Jennifer Novosad


1 Part A: Three Topics of Interest


1.1 Human-Robot Interaction
Part of the effort to integrate robots into society is the effort to improve human-robot interactions. Human-robot interaction can be likened to an intelligent user interface, which might need to modify its behavior based on human moods, attitudes, expertise, etc. Some of the current work in general human-robot interaction includes language processing, fuzzy logic, decision trees, emotion imitation, personality, gesture/facial recognition, and modeling human behavior. I am interested both in what qualities make a robot easy to interact with, and in how a robot can model human behavior.
1.2 Bayesian Belief Networks
These networks bring together probabilities and directed graphs, and are used frequently in machine learning. The graph carries information about the conditional dependencies between nodes and about the probability distribution functions of the nodes (often assumed Gaussian). Nodes can be random variables, parameters, observations, or hypotheses. The problem with using these networks is that acquiring knowledge and updating them are time-costly; some problems, such as inference, are NP-hard. I would like to learn more about how these networks can be applied, and what sorts of approximation algorithms exist to help learn a Bayesian network.
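As a toy illustration of the exact inference that becomes intractable in larger networks, the following sketch enumerates a posterior in a three-node network; all numbers and names are hypothetical:

```python
def posterior_a(p_a, p_b_given_a, p_c_given_a, b, c):
    """Exact inference by enumeration in a tiny network A -> B, A -> C.

    p_a[a], p_b_given_a[a][b], p_c_given_a[a][c] are conditional tables
    over binary variables. Enumeration is exponential in general, which is
    why approximation algorithms matter for large networks.
    """
    joint = [p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c] for a in (0, 1)]
    z = sum(joint)
    return [j / z for j in joint]   # P(A | B=b, C=c)

# Hypothetical numbers: A = "student knows a skill", B/C = two quiz answers.
p_a = [0.5, 0.5]
p_b_given_a = [[0.8, 0.2], [0.3, 0.7]]   # knowing the skill helps on B
p_c_given_a = [[0.9, 0.1], [0.4, 0.6]]
print(posterior_a(p_a, p_b_given_a, p_c_given_a, b=1, c=1))
```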
1.3 Multi-agent Task Solving in Robotics
Some of the problems of multi-agent robotics include cooperation, diversity, communication, distributed learning, and distributed planning. Test beds for these principles include robotic soccer, foraging, and defending a home base. Some current research looks at optimizing one's individual task for the benefit of the group, collective locomotion, insect behavior, and learning team solutions. I would like to learn how a group of entities learns about a problem, as opposed to a single entity.


2 Part D: Researching a Critical Reasoning Method


I chose three papers on work in virtual tutors, focusing on weakness identification and student models. Of the three papers, one provides a current implementation example, the second describes a higher-level goal for what should be included in a virtual tutor, and the last gives a mathematical model for a student. Aaron D'Souza's paper gives an example of the sort of work currently being done in commercial virtual tutoring, and provides concrete implementation details. The second paper, “Tactical Action Officer Intelligent Tutoring System (TAO ITS)” by Richard Stottler, describes a direction in which to move ITS, and describes the effectiveness of ITS in a high-dimensional learning environment. The last paper, section 4 of “An Intelligent Distributed Environment for Active Learning” by Yi Shang, Hongchi Shi, and Su-Shing Chen, gives a mathematical description of a student model, and describes the strengths and weaknesses of this model. Because the student model is only roughly covered by the previous two papers, it fleshes out some of the important details that need to be understood before implementing a virtual tutor. Each of these papers describes a different architecture and focuses on teaching a different kind of knowledge to the student; however, all of them give information on student models and interactive tutoring.


2.1 Paper 1: An Automated Lab Instructor for Simulated Science Experiments
Source: Aaron D'Souza, Jeff Rickel, Bruno Herreros, and W. Lewis Johnson. “An Automated Lab Instructor for Simulated Science Experiments.” In Proceedings of the International Conference on Artificial Intelligence in Education (AI-ED 2001), pp. 65-76, San Antonio, TX, May 2001.


I selected this paper while searching for ideas on how to identify student weaknesses. I found few papers that mentioned student weakness identification at all, and most of those that did referenced this paper as a concrete implementation. The paper turned out to be about the implementation of an automated lab instructor (ALI). The ALI program allows students to modify variables in a few simulated experiments, and asks them multiple-choice questions on what they have learned. ALI also suggests experimental variables to modify. ALI's big successes are asking appropriate questions and suggesting good variables, by noticing when an experiment demonstrated a concept well and remembering whether the student has already learned that concept.


This paper made several contributions towards implementing a virtual tutor. It claims to be
one of the first fusions of previous work done in language, knowledge representation, tutoring, and
physics simulation. Many commercial tutors used language or physical simulation, but not both.
This paper provides an architecture of the program, which could be used to implement similar
projects. It also provides evidence of the system’s effectiveness with a small study of the program
on 6 high school students.


This paper's biggest strength is its descriptions of implementation details, which are fairly clear and explain enough to help someone else build a similar system. The study's methodology also made a strong case; after using the program, the students gave more concrete answers and focused on the relationships between variables on the post-test.


This paper's greatest weakness was a lack of emphasis on big ideas, or higher-level concepts. Its main idea is the architecture for organizing the important components, such as a student model and a low-level reasoner. However, each of these components is very simplified. For instance, the student model only stored whether the student had seen concepts, had them explained, and correctly answered questions on them. It did not store the way in which the student learned concepts, or what problems the student could apply them to. The paper did not make clear whether the program could easily be extended to include this information, or what other components were similarly incomplete. Also, while the methodology of the study seemed very good, the size of the study (6 students) was too small to be significant.


After reading this paper, I went back and revised my cognitive robotic system’s architecture.
This paper’s emphasis on symbolic representation also gave me the idea to embed the higher level
concepts into the programming language, similar to the fault isolation lecture in class.

Many of the low-level implementation details in this paper would not apply to the cognitive robot, because of the differences in environment. While ALI could work in an environment in which the student behavior was restricted to modifying a small number of variables in a controlled situation, or choosing a multiple-choice answer in response to a question, a robotic karate teacher would need to work in an environment where the student response is more varied, and the student's knowledge of each concept cannot be expressed well as 'done' or 'not done'.


2.2 Paper 2: Tactical Action Officer Intelligent Tutoring System


Source: Stottler, R., M. Vinkavich (November 2000), “Tactical Action Officer Intelligent Tutoring System,” Proceedings of the Industry/Interservice, Training, Simulation & Education Conference (I/ITSEC 2000).

I found a reference to the TAO ITS while searching for a method of creating a student model when the simulation has more unknowns than how the student will modify two or three experimental variables. The TAO ITS is a program owned by the U.S. Navy, used to help train future officers in tactics. The tutor observes the student playing a ship on a simulated battlefield, and provides feedback after game play on how successful the student was. The program can also design simulations, warn a human teacher about student difficulties, give reading material on tactical concepts, and replay student errors. This program interested me because the student has a range of actions to choose from in solving a problem in which enemy goals are ambiguous. I selected this report on the system because it appeared to have the most detail about the program and the designers' goals.


This paper's biggest contribution was its description of the qualities of a good student model. A student model should include the tasks the student has performed, and his performance on those tasks. A student model also needs a method to estimate mastery of skills and knowledge, and the ability to apply them when appropriate. The tutor must also keep track of the circumstances in which the student can apply the knowledge. This student model accounts not only for factual knowledge, but for the ability to integrate that knowledge into experience. The high-level ideas in this paper were its strength; they suggested a direction in which ITS research should move. However, very few implementation suggestions were made. After describing what ITS is and its high-level goals, the paper mostly focused on the look of the user interface. Additionally, while the paper provided a promising-looking study of the opinions of 12 students on the program, the study did not evaluate whether the students actually learned by using the program.


This paper describes several aspects of an ITS that I feel are necessary for the karate-tutor cognitive robot. The student model and the weakness-finding of the tutor model share several key features. For example, in both tactics and sports the ability to apply knowledge at the appropriate time is as important as the knowledge itself. Also, the student learning environment in this system has both environmental uncertainty and many options for student behavior; the karate tutor will need to function in an equally difficult environment. However, there are some differences between teaching tactics and teaching karate, such as the value of waiting until the end of a simulation to provide feedback, which would need to be modified for the karate tutor.


2.3 Paper 3: An Intelligent Distributed Environment for Active Learning


Source: Yi Shang, Hongchi Shi, and Su-Shing Chen, “An Intelligent Distributed Environment for Active Learning,” Section 4. Department of Computer Engineering and Computer Science, University of Missouri-Columbia. May 1-5, 2001, Hong Kong.

While looking through summary papers on student models, I noticed that several of them mentioned using a Bayesian model for the student. I chose this paper because it provides a mathematical explanation of how to feasibly model a student with a Bayesian model, and explicitly lists the strengths and weaknesses of the model.


The major contributions of this section are concisely explaining what methods of student mod-
eling have already been attempted, what kinds of weaknesses are common in student models, and
proposing a student model which is both computationally reasonable and models uncertainty in
student behavior. The proposed model is explained in mathematical terms, and the model’s weak-
nesses and strengths are listed.


Section 4 provides the information needed to understand many other papers on ITS. It also
completely opens up the black box student model, and shows a great deal of the details that would
be needed. This section’s main weakness is that it doesn’t provide an example of this being used,
or show evidence of any testing.


This paper shows some of the internals of a student tutor that the karate tutor would need in order to model student behavior. The model is robust against students accidentally making errors or guessing, so it gives a better explanation of students than the boolean model described in the first paper. Also, it is a linear-time algorithm in the number of data items, so it should be able to run in real time. While this exact model does not meet the karate tutor's needs, because it can only model one skill, it could be extended to meet them.


3 Part E: A Simple Project


While I would really like to give an advanced lecture on this sort of material, I am worried that its scope is too large to reduce to a simple term project.


A simple project in this area would be to create and test a student model which could identify student weaknesses in addition to their strengths. To complete this goal, at least five modules would be needed: an environment, an abstract concept representation, a student model, and two reasoners. The simple project would not have a tutor model, to avoid the need for language processing.
The environment in which the student learns should be at least as complex as a Lode Runner-type game, where there is a variety of actions which constitute the same higher-level concept, and there are several scenarios in which each concept could be applied. Ideally, the game should involve some uncertainty, such as other agents or two human players, so that the space of learnable concepts is richer.


In order for the student model to be tested, the system would also need two reasoners. One
reasoner would identify situations in which a concept could be learned. The second reasoner would
identify how the student behaved in that situation. Both of these reasoners would be dependent
on a good concept representation model.

Rather than have a tutor, the student model could print out suggestions for a human tutor
listing the concepts seen and the expected level of understanding. Creating a tutor in addition to
a student model would probably be too large a project for the course of one term, because of the
amount of labor that would need to go into language processing, user interface, and creation of new
environments to practice the skills that are weak.


In order to test the success of the student model, several human students should interact with
the program, with the observation of a human teacher. Between sessions, the human teacher should
provide tutoring in specific concepts, while not providing tutoring in others. The response of the
model over time should indicate a faster strengthening in the tutored concepts than those that are
not tutored.


This project would embody a few of the salient features of the karate tutor. Like the karate
tutor, the learning environment of this project would be reasonably complicated, and the student
would have a variety of possible actions to take.


I don’t think it is likely that I will do this project. I am worried about the size of the project,
compared to the value. I believe that this project would be a rediscovery of work that has already
been done with TAO ITS. In addition to the student model and reasoners which I am interested
in working on, this project also requires creating the environment for the student to learn in, and
learning enough about this environment to develop higher level concepts for them to study. I worry
that the cost of this project is much greater than the value gained.


Another project that I have been thinking of is extending SLAM to include multiple agents
that could communicate maps to each other. For this project, I would need to create a software
environment, re-implement SLAM, and create some robotic agents for the environment. I would
then need to extend SLAM to incorporate simplified maps from other agents, and identify moving
objects. Agents would behave slightly differently, so that the map received from another agent
would have an unknown level of trust. I feel that this project is a better term project because there
are incremental levels of completion, and it is easy to add other interesting features if I finish early.
Also, most of the programming would be focused on the reasoning skill, rather than the test bed.
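One deliberately simple way to picture folding a teammate's map into one's own with a trust weight, sketched below; this is one possible fusion rule, not an established multi-agent SLAM algorithm:

```python
import numpy as np

def fuse_maps(own_logodds, received_logodds, trust):
    """Fold another agent's occupancy grid into our own, discounted by trust.

    Both maps are log-odds occupancy grids over the same frame; trust in
    [0, 1] scales how much of the other agent's evidence we accept.
    """
    return own_logodds + trust * received_logodds

own = np.zeros((50, 50))            # our log-odds map (0 = unknown)
other = np.random.randn(50, 50)     # stand-in for a received map
fused = fuse_maps(own, other, trust=0.5)
occupancy = 1.0 / (1.0 + np.exp(-fused))   # back to probabilities
```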

Jeremie Pouly

Part A:

The three topics in reasoning applied to embedded systems that I would like to learn more about are the following. The first two are related to the project I would like to pursue, and the third to something I have already partially studied and that is a real key issue for upcoming space exploration.

• Search algorithms: I don't know much about this topic, which is the most important one for computer chess players. Today's computers allow us to do amazing calculations and to explore tons of possibilities in very short times, and it is easy to forget that for even fairly simple real-life problems, the pool of answers is so huge that we can't just explore everything. The real issue when we want to solve a problem is not the capability of the computer to theoretically solve it, but rather the time it needs to solve it. Indeed, if it needs six months to find an answer, that answer can be useless, since the problem itself may change during those six months. This is even truer for potentially infinite problems such as what a chess player should play in a given board configuration. Most of the time there won't be any move that leads to victory whatever the human player chooses to play (100% probability of victory). In such cases we have to decide when to stop the algorithm searching for the best move, knowing that a potentially better move may not have been found yet. As the limit is usually given by the calculation time (actually it is the depth of the search, but this is chosen as a function of the calculation time), the real challenge is to find faster search algorithms; a sketch of the classic search technique appears after this list. Thus the strength of a computer chess player is determined first by the search algorithm it uses.

• Machine learning: by that I mean the ability of a computer program to evolve from its original state to improve its performance. This is also a real issue in robotics and any AI system. Theoretically, I believe that if we can develop a good enough learning process, we will be able to create a human-like robot able to grow from a “childhood-like” state to an adult state, one that will really be unique and autonomous as we are. As for current interest, I would like to learn more about this topic because it is the real essence of AI and robots. If we are to send robots to Mars, it would be a real improvement if they could act more by themselves and not wait all the time through the eight-minute communication delay with Earth. However, as we don't really know the Martian environment, this can only succeed if the robots are able to learn on their own from what they see and touch. Closer to my project, machine learning can be useful for developing a chess player that improves itself by playing against the same group of humans, or even against humans in general (for I think humans in general, with the exception of world champions, tend to play according to a certain scheme). It could allow the AI chess player to define its own heuristics that lead it to victory.

• Communication with humans: Future space exploration should involve human/robot cooperation, since one day we will tire of sending only robots and will want to bring space exploration to another level, while at the same time it is cheaper and safer to send robots than humans. If we want robots and humans to cooperate, we must find a way for them to communicate while carrying out a mission, on Mars for example. Robots are currently seen as “extra hands” for the astronauts, so they must be able to understand the orders and requests the humans give them. This is the first half of communication, and I have already worked on this problem of voice recognition: I coded a Matlab program able to recognize a small dictionary based on voice characteristics. But robots should also be able to address concerns, warnings, or simple answers to the astronauts: we would greatly improve cooperation if communication could go both ways. I would be glad to conceive a robot able to converse with humans, play trivia, or tell jokes and stories in fairly fluent language, even if limited to certain fields.
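The sketch referenced in the first bullet above: depth-limited minimax with alpha-beta pruning, the classic speedup for chess search. The `game` object bundles assumed game-specific helpers, not any particular engine's API:

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Depth-limited minimax with alpha-beta pruning.

    `game` supplies the game-specific pieces: game.moves(state),
    game.apply(state, move), and game.evaluate(state), the weighted
    evaluation function discussed in the papers below. All are assumptions.
    """
    legal = game.moves(state)
    if depth == 0 or not legal:
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for m in legal:
            value = max(value, alphabeta(game.apply(state, m),
                                         depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:   # the opponent will never allow this line,
                break           # so the remaining moves can be pruned
        return value
    value = float("inf")
    for m in legal:
        value = min(value, alphabeta(game.apply(state, m),
                                     depth - 1, alpha, beta, True, game))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value
```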


Part D:

I chose three papers on computer chess players dealing with both learning and search algorithms, as these are the foundation of the AI that I will use. My last paper is not directly related to what I will do during my project, but is rather an opening to show what we can do with an AI chess player besides just playing normal chess.


“An Evolutionary Approach for the Tuning of a Chess Evaluation Function using Population Dynamics”. Graham Kendall, Glenn Whitwell.

I chose this paper because it explains how to use computers’ real strength, the ability to do the same thing hundreds of times in a very short time with no errors, to find the best chess evaluation function. It has the advantage of producing a near-optimal evaluation function at rather low cost. I also chose this article because it shows that even light modifications to the base of the chess player, like the weighting of the pieces, can really change the way the AI plays.
The paper presents how to use machine learning to find the optimal evaluation function by comparison among all the possibilities. Instead of imposing a particular weighting on the evaluation function, the idea of the article is to use machine learning techniques to let the computer choose the best estimate of the optimal evaluation function within a random population. The algorithm’s first step is to generate a random population of candidate evaluation functions. Then the members of the population compete against each other to find out which is the most efficient. When a candidate loses a game, it is discarded from the pool and a clone of the winning candidate is added to the population instead, in order to accelerate convergence, until there is only one member left in the population. The authors point out that with such a naïve method we might end up with the best estimate of the optimal evaluation function within the population, yet still be far from the optimum itself. To get closer to it, they present another way to replace discarded candidates: a crossover child of the two parents, with weightings that depend on the result of the game and on the standard deviation of the parents’ parameters. This yields a final function, toward which the population converges, that is closer to the optimal evaluation function; if the initial spread of the population was fair enough, we will hopefully find the optimum itself. The evaluation function that comes out of this algorithm was then tested successfully against a commercial chess program, which demonstrates the efficiency of the method.
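To illustrate the population dynamics described above, here is a rough sketch of the loop. This is my simplification: the `play_game` stub and the exact mutation scheme are assumptions, not the authors’ precise operators.

```python
import random
import statistics

def evolve_weights(population, play_game, generations=1000):
    """Population-based tuning of evaluation-function weights, in the
    spirit of Kendall and Whitwell.  `play_game(a, b)` is an assumed
    stub returning the winning weight vector, or None on a draw."""
    for _ in range(generations):
        a, b = random.sample(population, 2)
        winner = play_game(a, b)
        if winner is None:
            continue
        loser = b if winner is a else a
        population.remove(loser)
        # Replace the loser with a child of the two parents, perturbed
        # in proportion to the spread of each weight in the population.
        child = []
        for i, (wa, wb) in enumerate(zip(a, b)):
            sigma = statistics.pstdev(w[i] for w in population) or 0.1
            child.append((wa + wb) / 2 + random.gauss(0, sigma))
        population.append(child)
    return population
```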
What I liked in this article is the idea that a simple algorithm can lead to a great improvement of the chess player. Both the original idea of optimizing the evaluation function and the way of doing it are at once simple and smart, and the results are impressive. On the other hand, some points need more explanation, such as the fact that the members of the population compete only against some of the other members and not all of them; one kind of weighting might be efficient against one opponent but worse on average. Also, the method is only validated for searches three moves ahead, which is quite shallow for computer chess; one could imagine, for example, that this favors the bishops over the knights, since the latter need more moves to be effective.
Compared to the two other papers, this article is like a first step, since it chooses the coefficients that will be used in the search algorithm. It is therefore needed if we want the search to be efficient, and it must be done before trying to improve the search itself.
I will apply this method to my chess player because it leads straight to the optimal evaluation function for the search algorithm.

“New Advances in Alpha-Beta Searching”. Jonathan Schaeffer, Aske Plaat.

I chose this article because it describes the core of almost any computer chess player: alpha-beta search. Most chess programs use this fairly simple, brute-force algorithm. I found the paper interesting because it states how the algorithm works, what its strengths and weaknesses are, and above all what can still be done to improve it despite the years of research already spent on it.
After recalling the importance of the alpha-beta algorithm in game-tree search, the paper summarizes its evolution from the earliest version to the current one. It points out four improvements that have already been made and then presents three new enhancements. The first established modification is the use of transposition tables, which reduce the complexity of the tree by recognizing previously visited nodes (enabling cutoffs), trading memory for computation time. Second, move ordering can maximize the effectiveness of cutoffs (using the transposition table) if we sort the moves at each node in best-to-worst order instead of depth order. Third, a heuristic choice of the search window at the beginning of the search helps eliminate many nodes that, with a certain level of confidence, should not lead to the best solution. Finally, a variable search depth saves resources by allocating more of them to promising nodes and leaving aside the weaker ones. The paper then proposes three new enhancements to the current version of alpha-beta as used in computer chess. The first is useful when the search ends in a fail low (the value of the first move falls below the search window) because of an unlucky guess of the window: the authors propose restarting the algorithm, using the transposition table to seed the new search, which seems to improve the quality of the search even if the size of the improvement is hard to evaluate. The second idea narrows the search window by guessing the values that should be obtained at the end of the search, so that computation time is much smaller; it uses evaluation scores from previous iterations, following the MTD algorithm, and improves performance by about 9%. The last enhancement tries to maximize the benefit of the transposition table by doing additional lookups after cutoffs; this is called Enhanced Transposition Cutoffs, and it reduces the search tree by 28%. The results are illustrated with experiments comparing current chess programs with versions incorporating the new enhancements, and it appears that with the three ideas presented in this article the search effort in chess can be reduced by about 35%.
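As a reference point for these enhancements, here is a bare-bones negamax-style alpha-beta with a transposition table. This is my own sketch: the `moves`, `apply_move`, and `evaluate` callbacks are assumed stubs, and a real program would also record whether a stored value is exact or only a bound.

```python
def alphabeta(state, depth, alpha, beta, table, moves, apply_move, evaluate):
    """Depth-limited negamax alpha-beta with a transposition table.
    `evaluate` scores the position from the side to move; `state`
    must be hashable for the table lookup."""
    key = (state, depth)
    if key in table:                     # previously visited node: reuse it
        return table[key]
    if depth == 0:
        return evaluate(state)
    value = float("-inf")
    for move in moves(state):            # best-to-worst ordering helps here
        value = max(value, -alphabeta(apply_move(state, move), depth - 1,
                                      -beta, -alpha, table,
                                      moves, apply_move, evaluate))
        alpha = max(alpha, value)
        if alpha >= beta:                # cutoff: remaining moves are pruned
            break
    table[key] = value
    return value
```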
The strength of this paper lies in all the tricks it gives for improving alpha-beta search, both the old ones and the new ones, but at the same time it reads a little like a laundry list. There is no link between the tricks, and the theoretical justifications are often obscure; the authors present empirical procedures that the reader has to take on faith rather than theoretical analysis.
This paper deals with the main component of the chess player, the search algorithm, and it is the continuation of the first paper in the sense that it offers tricks to improve the search that uses the evaluation function designed in the first paper.
In my project, the results presented in this paper will be very useful because they will allow me to improve the efficiency of the alpha-beta search and thus build deeper trees with fewer time resources.

“Information-Theoretic Advisors in Invisible Chess”. Bud, Albrecht, Nicholson, Zukerman.

This article is a little different from the first two because it will not be directly useful for my project. However, it gives a good idea of how to use chess as a testbed for decision-making strategies in complicated real-life problems. I am interested in chess programs for their own sake, because it is fun to apply my computer-science knowledge to a game, but I am also interested in creating a pseudo-AI able to play chess because it is a first step toward more powerful AI for real-life robots. This paper is therefore very appropriate, since it shows how we could use chess as a test for real-world strategic situations.
While games share some properties with real-world situations, the main difference between chess and most real-life problems is the level of uncertainty. Whereas real life is full of uncertainty, chess is known as one of the few games with no uncertainty at all. Therefore, in order to use chess as a testbed for real-world problems, the article introduces the concept of invisible chess: a normal chess game except that each player has a certain number of pieces that the opponent cannot see. This is interesting because by choosing the number of invisible pieces we can tune the degree of uncertainty of the game to match real-world situations, while keeping the high strategic level of chess. Since each player cannot see some of the other player’s pieces, using ordinary chess trees to find the best move would cause a combinatorial explosion. The authors propose to counterbalance the new strategic complexity of invisible chess by adding an information-theoretic advisor to the strategic advisor. Early results, from tests using only the strategic advisor, tend to show that the player with more information about the game wins more often, which seems obvious enough. We can therefore increase the chances of success by using a weighted combination of the two kinds of advisors. According to the results presented in the article, the algorithm performs better with a combination of the two advisors; however, if the weightings lean too far toward one advisor, the results quickly get worse.
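The weighted combination could look something like the following sketch. This is my illustration only; the advisor interfaces and the weight `w` are assumptions, not the authors’ implementation.

```python
def combined_score(move, strategic, informational, w=0.7):
    """Weighted combination of the two advisors: `strategic(move)` scores
    board strength, `informational(move)` scores the expected information
    gained about the opponent's invisible pieces (both assumed stubs)."""
    return w * strategic(move) + (1 - w) * informational(move)

def best_move(moves, strategic, informational, w=0.7):
    # As the paper reports, performance degrades if w leans too far
    # toward either advisor alone.
    return max(moves, key=lambda m: combined_score(m, strategic, informational, w))
```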
Thus this paper shows that we can use chess to demonstrate results about more complicated problems closer to real-world situations. I liked the idea of using chess as a demonstration field for something (here, uncertainty) that has nothing to do with the game itself, because it illustrates why chess has been such an important step in AI development. However, the paper misses some details about the strategic advisor used and about the possible real-world applications of these results. Moreover, more experimental data on the advisors, with more than only three ratios for their relative influence, could have brought out interesting subtleties in the results.
This paper is like an opening after the first two articles, showing that chess can test more than purely strategic issues. This is the main reason why I chose it.
This paper will not really help me in my project, but it could help me afterwards if I want to use what I did in another context.


Part E:

During the last third of the class I would like to build a chess player applying improved search algorithms, based on my computer-science knowledge, and heuristics based on my chess experience. I would also like to include machine learning so that the AI improves its level as it plays. This project could be done by a group of students designing the core of the AI together, with each student developing his own version of the chess player, so that at the end of the term we can have them play against each other to compare the efficiency of the heuristics.

Shen Qu

Part A

Planning with observations/sensor data
Traditional planning assumes that we have complete knowledge of the world in which the plan runs. However, this assumption is unrealistic for real-world scenarios where parts of the environment may be unknown or uncertain. There are several approaches to planning with sensor data that can only be obtained after the plan begins: reactive planning, contingency planning, etc. But the one I am most interested in is interleaving planning and execution. This method requires the planner to make as complete a plan as possible prior to execution, and to fill in incomplete parts of the plan as information becomes available. There are benefits and issues associated with this approach, which are covered more in section D.

Human Robot interactions
This is a complicated subject that involves various types of reasoning. The first thing that comes to mind when thinking of human-robot interaction is the method of communication between the two parties, which could involve techniques such as image mapping, sound recognition, and identification of body language and gestures. Another is the functionality of the robot and how it may be useful to humans. However, a question always in the back of my mind is the optimal yet safe level and method of control that a human should have over a robot with autonomous capabilities. The question originates from the aircraft example of who gets final authority, the human pilot or the onboard computer system. I am not sure how much research has been done on this topic within robotics, or whether it is even an issue. But as robots acquire faster and better reasoning algorithms, it is possible to face situations where a robot’s analysis of the state of a system differs from that of the human controlling or accompanying it. The question then becomes: if a human issues a command that conflicts with the robot’s internal reasoning, should the robot be able to override it? If not, and a conflicting command is indeed issued, how would that affect the robot’s logic and reasoning, and how should the robot deal with it? Hopefully I can explore this question a little more through this course and other opportunities.

Learning Algorithms
For me, rather than a topic of interest, machine learning can probably be more accurately described as a topic of fear. Through some basic knowledge, state-of-the-art algorithms, and recognition of patterns from real-world experience, a system can build on its knowledge and effectively become “smarter.” Now combine that with the possibility of a robot with the authority to override human commands (as described above). This is probably where all sci-fi nightmares began…


Part D

1. “An Analysis of Sensor-Based Task Planning.” Olawsky, Krebsbach, Gini. 1995.

This is an older paper (not published within the last 5 years). However, I am presenting it here because it gives a good, comprehensive overview of interleaving planning and execution along with major drawbacks and issues to consider. More recent papers, on the other hand, tend to dive deeply into specific methods and details and fail to address the big picture. (Or it is possible that more recent yet comprehensive papers exist, and I have simply failed to uncover them within the time available to research this topic.)

Traditionally, the planning process is completed before execution begins; this paper instead describes incorporating sensor data collected at execution time into the plan when it is beneficial to do so. When the planner lacks sufficient information, it may either assume a default value or defer that portion of the plan until the needed sensor data can be obtained. The former increases the uncertainty in the plan, but it does allow the planner to generate a more complete plan prior to execution. Deferring, on the other hand, can greatly reduce uncertainty, but it presents problems of its own: 1) deferring may be prohibitively expensive, 2) early execution of actions without certain required information may interfere with goals not yet considered, and 3) data collected later may still be incorrect or uncertain.

Given these considerations the authors identified three key factors for consideration when selecting a strategy for making such a plan:
1. Goal ordering – what goals the planner should try to satisfy first, not necessarily the order of execution, but the order in which a particular portion of plan is completed. The key strategy here is to sense as much of the information as possible in the early parts of the plan.
2. When to sense – when to switch between planning and execution. There are various methods, but the authors choose to defer goals where further planning requires sensing and only begin execution when plans for all goals are either completed or deferred.
3. What to sense – what to sense and what to default. These decisions are made based on default reliability, sensor reliability, planning costs, execution costs, and the cost of human intervention if applicable.

The authors also provided two metrics for assessing the quality of a generated plan: execution cost (the cost of actions during all phases of execution) and success rate (the percentage of problems in which no action needs to be undone due to premature execution of a plan).

Given these factors and plan-quality measurements, the authors proceed to dive into their test world, called the Tool Box, and provide detailed cost and success analyses of tests performed in this world. The major flaw of the paper is that the Tool Box world is much more simplified than the real world, and their planner exploits domain-dependent properties of the Tool Box world that do not carry over to a generic world; various simplifying assumptions are also made throughout the analysis. Therefore, although the Tool Box does demonstrate the functionality of interleaving planning and execution, it does not serve as convincing evidence that the method is feasible in more general situations.

Finally, the paper concludes on a particularly helpful note by briefly covering other methods of dealing with missing information, such as reactive and contingency planning, emphasizing the difference between uncertainty and missing information, and defining the open-world and closed-world assumptions used.

2. “Interleaving Execution and Planning via Symbolic Model Checking.” Bertoli, Cimatti, Traverso. 2003

This is practically the only recent paper I found that considers interleaving execution and planning. Most other research in the area seems to focus on dealing with uncertainty, on conformant planning, and on reducing the large, impractical conditional tree plans so that the entire plan can be generated off-line, prior to execution. Although this paper also focuses on interleaving planning and execution, the method presented is very different from that described in the previous one.

As the title suggests, this paper presents an interleaving algorithm that uses symbolic model checking to plan over nondeterministic, partially observable domains. To achieve this, it uses a state-of-the-art off-line conditional plan generator between executions, and uses observations made during execution to extend the plan. An oversimplified version of the algorithm follows:

1. Start with a given belief state
2. Generate a partial conditional plan up to the next observation point. (A conditional plan is a tree-structured plan that branches over nondeterministic outcomes and noisy sensor inputs. Each branch of the tree traverses a series of belief states.)
3. The system executes the plan and is guaranteed to follow one of the branches in the partial conditional plan.
4. The system takes in the new current belief state and generates another conditional plan.
5. The algorithm continues until termination.
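In code, the loop might look like the following sketch. This is my own paraphrase of the summary above; all four callbacks are placeholders for the paper’s symbolic model-checking machinery, and belief states are assumed hashable.

```python
def interleave(belief, goal_reached, plan_to_observation, execute):
    """Minimal interleaved planning/execution loop: plan up to the next
    observation point, execute, observe the resulting belief state, and
    repeat.  `plan_to_observation` returns a partial conditional plan (or
    None if none exists); `execute` returns the new belief state at the
    branch actually taken."""
    visited = set()
    while not goal_reached(belief):
        if belief in visited:        # the paper's termination condition:
            break                    # never revisit a belief state
        visited.add(belief)
        partial_plan = plan_to_observation(belief)
        if partial_plan is None:     # no plan exists from here; give up
            break
        belief = execute(partial_plan)
    return belief
```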

One key point about this algorithm is that it is guaranteed to terminate, because a condition in the algorithm requires every run to traverse a belief state that has not been previously visited. However, its major weakness is that it does not guarantee a plan that reaches the goal state. That is, one of the termination points could be a state from which the goal cannot be reached; the algorithm is guaranteed to terminate at the goal state only within safely explorable domains, where it cannot get trapped in a state from which no strong path to the goal exists.


3. “Complexity of Planning with Partial Observability.” Rintanen. 2003

Perhaps the greatest weakness of this paper (for the purpose of this assignment) is that it contains nothing directly applicable to the design of a planner or to my potential final project for this class. While the first two papers discussed above each present an algorithm or a planner capable of interleaving planning with execution, this third paper is much more theoretical in nature and focuses on the complexity of the planning problem. However, I feel that the complexity bound is what gives us the bottom-line metric for the difficulty of a problem. Thus, although not suited for practical use, the material presented in this paper is no less important than the other two and deserves to be reviewed.

The main finding presented in this paper is that “for non-probabilistic (success probability 1) partially observable planning the plan existence problem is 2-EXP-complete.” The rest of the paper is mostly spent proving this result, as well as providing more direct proofs of the EXP-hardness of conditional planning with full observability and the EXPSPACE-hardness of conditional planning without observability. The paper also points out that “plan existence for classical planning (deterministic, one initial state) is [only] PSPACE-complete.” For me at least, this really put into perspective just how difficult planning for real or realistic scenarios is, and the need for better, faster algorithms if we wish to accomplish reasonable planning outside the controlled lab environment.

The paper also leaves open the questions of the complexity of planning with success probability less than 1 and of the difficulty of optimal planning.

Part E

The somewhat obvious project that derives from parts B and C would be to implement a planner that interleaves planning and execution and uses observations to revise or complete plans. The exact type of planner and algorithm that I would use is still unknown at this point. Depending on its complexity, other features could be added or removed, such as an algorithm to identify targets (or goals), or a simulation (via on-screen display) in which the generated plan is actually carried out. Sensory data could be taken from a simple piece of hardware linked to the planner, or simulated using keyboard inputs or even data files.

The likelihood of my pursuing this project depends largely on what I learn about this type of planning through the course of the semester. I may also be swayed by the interesting projects other people come up with. (I would very much like to do something involving human interaction with robots!) The direction of my research will also play a role in project selection.

 

Tom Temple


Part A


1. Reinforcement Learning
I am interested in reinforcement learning primarily because I think it is one of the better models for how humans act. Take for instance the problem of the inverse kinematics of a robot. With very good measurements of the system, they can be painstakingly computed. But if you let a person play with the controls for a while, they can make the robot do what they want, without measurement or calculation. Furthermore, the human control is more robust.
2. Exploration vs Exploitation
I think this is a fascinating problem. I am regularly faced with the decision of how to drive from Cambridge to Lexington. Let me tell you that this is a difficult problem. I have some estimates of the means and variances of the times certain routes will take. But I don't really have much data on the true distributions, especially when conditioned on the other information I have at my disposal in the morning. Luckily, topology and speed limits bound the number of possible paths to a manageable number. Even so, when Route 2 is a parking lot, it brings into play new potential routes for which I naturally have less information.
3. Dynamic Bayesian Networks
I find DBNs fascinating because they are a powerful tool for representing temporal information. They are what I would consider the usable version of the HMM. As fascinating and useful as the Kalman filter is, its restriction to Gaussian distributions is potentially over-limiting. Let's say, for instance, that I have a sensor that is perfectly correct except that once in a while it just says 7 regardless. An astute person will quickly learn to trust the 7's far less than the 8's. I think that a computer should be able to do so too. The DBN is able to capture a much larger swath of the real world. I am excited to learn more about making optimal decisions under more general probability distributions.


Part D


Inverted Helicopter
"Autonomous Inverted Helicopter Flight via Reinforcement Learning." Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger and Eric Liang. In International Symposium on Experimental Robotics, 2004.

I selected this paper because there was a picture of a helicopter in AIMA whose controller was developed by Andrew Ng. On his webpage he had a number of interesting articles; this was among the most relevant.
The important feature of this paper is that reinforcement learning was able to hover an autonomous helicopter inverted. The methodology consisted of having a human pilot fly the craft inverted while collecting data. They then used this data to construct a model, and ran simulations with the model in order to find a policy that could keep the helicopter inverted. Finally, they successfully implemented the controller on the real helicopter.
A key feature of the simulations was that each potential policy was tested on the same set of pre-determined random trials. That prevented variation in the trials from unduly affecting which policies were considered the best. Ng described this approach previously (PEGASUS; Ng and Jordan, 2000). The important idea is that one can eliminate the non-determinism of an MDP by generating a set of random trial outcomes and evaluating each policy on the same trials, treating the transitions as deterministic. As long as the trials are representative, the results will be representative of the true, non-deterministic case.
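The idea is simple enough to sketch. This is my illustration of the fixed-scenario trick, not the paper's code; `simulate` is an assumed stub.

```python
import random

def evaluate_policy(policy, simulate, seeds):
    """PEGASUS-style evaluation: every candidate policy is scored on the
    same fixed set of random scenarios, so that differences in score come
    from the policies rather than from luck.  `simulate(policy, rng)` is
    an assumed stub returning the total reward of one rollout driven by
    the given random number generator."""
    return sum(simulate(policy, random.Random(s)) for s in seeds) / len(seeds)

# Usage: compare candidate policies on identical pre-drawn scenarios.
# seeds = range(100)
# best = max(policies, key=lambda p: evaluate_policy(p, simulate, seeds))
```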
While this was a very exciting result, the methodology was nothing new. Furthermore, the reward function had a number of magic parameters that are not described, and I can only suppose they had to be hand-tuned. I would also like to limit the amount of learning that is done in simulation. I think that if the controller starts off very slowly (and with a little guidance), it could do all of the learning online without any crashes. People manage to do precisely that.
Relational Reinforcement Learning
"Relational Reinforcement Learning." Saso Dzeroski. 2003.
I found this paper through a citation by Forbes, who in turn was cited by AIMA.
This paper presents the RRL algorithm, in which the Q-function is approximately represented by a regression tree. The tree is maintained so that similar examples are combined (or one is rejected) and sufficiently different examples yield an additional branching of the tree; thus a size constraint can be maintained. In cases where states are best described relationally, this data structure is intuitive and compact. For instance, it was effective in finding optimal solutions to the block-stacking problem. The tree is populated by generalizing a set of manually provided optimal solutions.
A problem with this algorithm is that it scales poorly: the number of potential relationships grows very quickly. The algorithm over-generalized when it received too many training cases and was pretty bad at Tetris. More importantly for the task in question, since the relationships are propositional, it doesn't generalize into the continuous case very well.
Effective Reinforcement Learning
William D. Smart and Leslie Pack Kaelbling, "Effective Reinforcement Learning for Mobile Robots," International Conference on Robotics and Automation, May 11-15, 2002.

I found this one on LPK's webpage, because I had her as a professor and knew she was into this stuff.
This paper deals with the problem of Q-learning with sparse rewards. Since the initial distribution of rewards is so sparse, the agent ends up having to pick actions arbitrarily, which means it might take a very long time for the agent to find a good policy. She fixes the problem manually, much as in Dzeroski, with manually supplied information. In one case, she provides a simple starting learning policy that ensures the goal is found; this allows the Q-function to start being filled in. Then she drives the agent manually toward the goal. In this case, unlike in Dzeroski, it does not matter whether this initial input is optimal; it is only there to aid in filling out the Q-values. In all three papers, a human operator is used to home in on the region of the state space that is of interest.
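The two-phase structure might be sketched as follows. This is my illustration only; the episode stubs and hyperparameters are assumptions, not the paper's exact setup.

```python
from collections import defaultdict

def seeded_q_learning(demo_episodes, explore_episode, actions,
                      alpha=0.1, gamma=0.95, n_episodes=500):
    """Sketch of the two-phase idea: first replay human-supplied
    trajectories to seed the Q-table (they need not be optimal), then
    learn autonomously.  `demo_episodes` is a list of episodes and
    `explore_episode(Q)` an assumed stub, both yielding (s, a, r, s')
    steps; states and actions must be hashable."""
    Q = defaultdict(float)

    def update(s, a, r, s2):
        best_next = max((Q[(s2, a2)] for a2 in actions), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for episode in demo_episodes:           # phase 1: driven by a human
        for s, a, r, s2 in episode:
            update(s, a, r, s2)
    for _ in range(n_episodes):             # phase 2: the agent on its own
        for s, a, r, s2 in explore_episode(Q):
            update(s, a, r, s2)
    return Q
```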
I thought this paper was fantastic; it was interesting and well written. I guess if I had to complain about something, it would be that there is no discussion of choosing the discount factor or the learning rate. Also, some details of how the Q-values are updated are not specified.


Part E


I propose implementing a controller for the Quanser helicopter that learns how to fly on its own and does not rely on prior knowledge of the system model. I plan on using Q-learning or policy search. I would start by providing a dense reward surface for hovering and let the controller determine the control. Then I will try to extend that to general trajectory following. If that goes well, I will try a much sparser set of rewards. If possible, I would like to have it learn from a human operator as in the Kaelbling paper. I estimate the probability of pursuing this project at 3/4.

Michael Terry

A. Topics of Fascination


The first topic I would like to explore is probabilistic reasoning with Bayesian nets. Reasoning under uncertainty is a very important area of AI, and these techniques appear to be the standard approach for modeling that uncertainty effectively. I am particularly interested in learning how to model complex relationships between variables in spite of redundant links in Bayesian nets. I would also like to study the automatic generation of Bayesian nets from large training sets.
Secondly, I would like to investigate reasoning about opponents using game theory and minimax search. Beyond the typical examples with a single opponent (e.g., chess), I would like to further explore the idea of reasoning in a world with potentially more than one cooperating and/or adversarial agent.
Finally, I would like to explore reasoning using first-order logic. As described in Russell and Norvig, these concepts and methods are well refined for basic objects, their properties, and the relationships between them, which is very useful when implementing those properties as object-oriented data structures in an artificial intelligence application.

D. Multi-Agent Game Theory and Minimax: Further Investigation


i.) Approximating Game-Theoretic Optimal Strategies in Full-scale Poker by Billings et al.
This paper’s main contribution is a model for reducing the search space of poker minimax through the use of “bucketing.” The concept is very basic and is used by most human players: rather than describing a given situation using every detail, situations can be grouped into categories. You will often hear poker players describe these situations, such as “I had top pair, top kicker,” or “She turned a set on me,” and the situations frequently have their own notations (top pair top kicker = TPTK) in the poker literature. I find that Billings et al. have performed an appropriate grouping of situations based on the hand rankings of the poker expert and writer David Sklansky. However, they have oversimplified the problem to poker with only one opponent. In real poker you are very rarely up against a single opponent, so in some sense they have cheated by reducing the minimax search space beyond what is useful. I intend to use their bucketing system for my search without the two-agent reduction.
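As an illustration of bucketing, here is a toy mapping from a hand description to a small number of classes. The thresholds and the two input measures are my assumptions for the sketch, not Sklansky's actual groupings or the paper's abstraction.

```python
def bucket(hand_strength, hand_potential, n_buckets=6):
    """Collapse the huge space of poker situations into a few
    strategically similar classes.  `hand_strength` is the chance the
    hand is currently best (0.0 to 1.0); `hand_potential` is the chance
    it improves to best by the river."""
    if hand_potential > 0.3:          # strong draws get their own bucket
        return n_buckets - 1
    # Otherwise bucket by current strength (0.0 = worst, 1.0 = the nuts).
    return min(int(hand_strength * (n_buckets - 1)), n_buckets - 2)
```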
ii.) On Pruning Techniques for Multi-Player Games by Sturtevant and Korf.
The major contribution of this paper is showing that minimax search against multiple opponents, known as MaxN, has very limited capacity for pruning via alpha-beta and branch-and-bound techniques. Using examples from the games of Sergeant Major and Hearts, the authors show a way to turn multi-player minimax search into a “paranoid” two-player situation in which every opponent has formed a coalition against you. This is an oversimplification, probably induced by their bias toward the game of Hearts. In most games, each opponent is as much out to get the others as they are out to get you. In Hearts, however, when you are attempting to shoot the moon, your opponents may at some point form an explicit coalition against you. This situation is peculiar to Hearts, and I cannot think of another game where it is relevant.
I think their point regarding the limited utility of pruning in multi-player minimax is valid. However, I do not necessarily agree with the usefulness of transforming a MaxN tree into a simple two-player model. Also, in their conclusion they point out that the incorporation of domain knowledge is key to reducing the search space. I will work to reduce this multi-player search space by incorporating deterministic and probabilistic knowledge about the game into my robot’s strategies.
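For reference, the unpruned MaxN recursion the authors start from can be sketched compactly. This is my illustration; the game callbacks are assumed stubs, and `evaluate` returns one score per player.

```python
def maxn(state, depth, player, n_players, moves, apply_move, evaluate):
    """Minimal MaxN search: each node returns a tuple of utilities, one
    per player, and the player to move picks the child that maximizes
    their own component."""
    if depth == 0 or not moves(state):
        return evaluate(state)           # n-tuple of scores, one per player
    best = None
    for move in moves(state):
        child = maxn(apply_move(state, move), depth - 1,
                     (player + 1) % n_players, n_players,
                     moves, apply_move, evaluate)
        if best is None or child[player] > best[player]:
            best = child
    return best
```

Shallow pruning is still possible on such a tree, but as the paper shows, deep alpha-beta-style pruning generally is not.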

iii.) Deep Blue by Campbell et al.
This paper describes and evaluates some of the decisions behind the design of Deep Blue. I believe the major contribution of this system was not necessarily the intelligence (or lack thereof) behind its reasoning, but the positive press and attention it brought to the AI community.
I have mixed feelings about Deep Blue as an accomplishment in AI. In some sense, the fact that this machine beat a world-class player is an accomplishment in computing, but not necessarily in intelligence. I liken it to the way our calculators reduce complex mathematical formulas in microseconds: although this is impressive, most people do not consider it a marvel of intelligence.
The common understanding is that Deep Blue was a brute-force attack on the search space. As we begin to address more complex information spaces with search, it becomes increasingly important to incorporate knowledge from the particular domain to reduce that space. This is essentially the theme I introduced in the evaluation of the previous paper, and I hope to incorporate this notion into my implementation of game strategies.

E. Simple Project: Win Small Stakes Holdem


Although the challenges of playing poker in the physical world are great, the most difficult task for my robot is actually determining a winning strategy. I would like to focus on this particular task and its multiple levels of reasoning. In particular, my goal is to be able to win at small-stakes/low-limit Texas Hold'em. Although this game is characterized by poor opposition making numerous mistakes, it is by no means an easy game. However, given the body of literature and the availability of large sets of training data, I am optimistic that this can be done in the last third of the course. The likelihood of my choosing this project is very high.

References

[1] Billings, D., Burch, N., Davidson, A., Holte, R., Schaeffer, J., Schauenberg, T., & Szafron, D. (2003). Approximating Game-Theoretic Optimal Strategies in Full-scale Poker. In Proceedings of the 2003 International Joint Conference on Artificial Intelligence.
[2] Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial Intelligence (134), pp. 57–83.
[3] Russell, S. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education International.
[4] Sturtevant, N. R., & Korf, R. E. (2004). On Pruning Techniques for Multi-Player Games. Computer Science Department, University of California, Los Angeles.