16.412J/6.834J Cognitive Robotics, Spring 2004
Problem Set #1 Responses
Students: Brian Bairstow, Lawrence Bush, Thomas Coffee, Shuonan Dong, Justin Fox, Ethan Howe, Kaijen Hsiao, Tony Jiminez, Axel Kilian, Henry Lefebvre de Plinval-Salgues, James Lenfestey, Brian Mihok, Jason Miller, Emmanuel Munguia-Tapia, Jennifer Novosad, Jeremie Pouly, Shen Qu, Tom Temple, Michael Terry
Brian Bairstow
Cooperative Robotics
In cooperative robotics, the robots in a group share the same goals, and thus
it is most efficient if they work together to achieve those goals. They can
simultaneously work on different aspects of the goals, or can combine their
efforts to accomplish tasks that aren’t possible individually. If the problem is simple,
or if enough is known about the situation, then the individuals can deduce what
the other robots will do and thus what they themselves need to do. However, this is often
not the case, and then coordination is required.
Machine Learning
Machine learning is essentially the recognition and extrapolation of patterns. This can be important to robotics, since the “artificial intelligence” of current robots is typically a set of commands for various circumstances. However, with elegant enough algorithms for machine learning, one can imagine a machine that can eventually teach itself unexpected things. This capacity to learn could develop an “artificial intelligence” which is more meaningful than an array of specifically instructed behaviors.
Neural Networks
Neural networks consist of nodes which pass data via links to other nodes.
They are good at learning because the weightings of the links can be altered
to adjust the output of the neural network. Thus the network can be trained
on data until the outputs match the expected outputs. One use for this is to
make parametric approximations of complicated models, so that a neural network
can quickly produce low-fidelity results.
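As a toy illustration of adjusting link weights until the outputs match the expected outputs, the sketch below trains a single linear node with the delta rule (the function name and data are invented for this example):

```python
# Hypothetical sketch: train one linear node by nudging its link weights
# toward the expected outputs (the delta rule).

def train_node(samples, lr=0.1, epochs=200):
    """Adjust the weight and bias until the node's output fits the data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = w * x + b      # output flowing through the weighted link
            err = target - out   # mismatch with the expected output
            w += lr * err * x    # alter the link weighting
            b += lr * err
    return w, b

# Noiseless samples from y = 2x + 1; the trained weights approach (2, 1).
w, b = train_node([(0, 1), (1, 3), (2, 5), (3, 7)])
```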
Task Directed Imaging in Unstructured Environments by Cooperating Robots
Vivek A. Sujan
This paper covers visual sensing strategies for robots working on a task.
The issue is to place sensors in order to see certain targets. Depending on
the degree of sensor articulation, there is more or less leeway. With multiple
robots sharing information, the problem of optimizing camera placements is more
interesting. Sensor fusion can be used to combine the data, and areas that
are not visible to one robot can be viewed by another. One goal is
to create a 3D geometric model by using multiple sensors.
The paper goes on to describe an algorithm that optimizes camera pose while considering
factors such as depth of field, target resolution, and visibility. The algorithm
was evaluated in simulation.
This paper brings up a useful topic for rovers working together in the same
area. One consideration in planning of cooperative robots is combining their
sensor data in useful ways. For instance, when traversing difficult terrain
it can be useful for a stationary rover to observe a moving rover to make sure
everything goes well. This paper is not directly applicable to higher-level planning,
such as assigning rovers to different geographic areas, except that it suggests
that rovers can work well together.
Using Cooperative Robots for Explosive Ordnance Disposal
James McLurkin
This paper discusses techniques for using a number of cooperative robots for
unexploded ordnance removal. Typically this takes place in an unconstrained
and unknown environment, as is the case with Mars exploration. Ordnance
removal is one of the most applicable areas for autonomous cooperative robots.
Due to the large number of mines in a minefield and the desire to find them
quickly, multiple robots would be used, and thus cooperation is useful to make
them efficient. This paper explored using a “behavior-based approach to form
a structured community from the local interactions of simple individuals.” These
behaviors included clustering and dispersal to produce a swarm with the desired
density of robots. Experiments were done with micro-robots to analyze the applicability
of the behaviors.
This is applicable to Mars exploration in that it discusses how to cover ground
with multiple robots. It is a good examination of cooperative techniques. The
strategies used are probably too simple for Mars exploration, and they promote
redundancy to ensure that mines are found. In Mars exploration redundancy is
inefficient: although a second pass might find an interesting objective you would
otherwise have missed, you would be better off covering brand-new territory.
This paper addresses a different scope than the imaging paper, in that it
concerns how to move the rovers relative to each other in order to find
an objective, while the imaging paper was more interested in placing the rovers
relative to each other after the objective is found.
Scene Understanding using Cooperative Robotics
Declan O’Beirne, Michael Schukat
This paper is about object recognition combined with cooperative mapping.
The goal is for a team of robots to map a complex environment using the objects
within as landmarks. The benefits of cooperation in mapping include concurrency,
reliability, and accuracy. Greater accuracy comes from multiple robots viewing
the same object, which should provide a more accurate estimate of position and
ameliorate sensor errors. Also, multiple robots can view an object from different
angles to get better 3D data. This 3D data can be used to create 2D images for
object recognition.
This topic is similar to the first imaging paper in that it concerns the relative
positions of robots observing an object. This paper is more focused on mapping
using visual object recognition. Like the Imaging paper, it suggests advantages
for performing exploration with multiple cooperative rovers. However, positioning
rovers for Martian surface exploration could be done differently with techniques
such as GPS.
I think an interesting topic would involve autonomous cooperative robots for
unexploded ordnance removal. Utility would be provided by quickly and redundantly
searching an area. The number of robots and the strategy used would be variables. Modeling
the environment and the robots themselves might be a lot of work even before
strategies could be implemented.
The world could be modeled as a grid, and the robots could move up, down,
left, and right. Exploring a space once could be worth 2 points, while exploring
it a second time could be worth 1 additional point. The project could explore
different strategies to try to maximize the score achieved in a minimum number
of moves. The world should be extremely simple at first, because even that could
take a lot of coding, and it is not the interesting part of the project.
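The scoring scheme proposed above can be sketched directly (the function and coordinates are invented for illustration):

```python
# Toy scorer for the proposed grid world: a first visit to a cell earns
# 2 points, a second visit 1 more point, and later visits nothing.

def score_path(path):
    visits = {}
    score = 0
    for cell in path:
        visits[cell] = visits.get(cell, 0) + 1
        if visits[cell] == 1:
            score += 2
        elif visits[cell] == 2:
            score += 1
    return score

# Sweeping a 2x2 grid once earns 2 points per cell.
assert score_path([(0, 0), (0, 1), (1, 1), (1, 0)]) == 8
# Re-entering an explored cell adds only 1 more point.
assert score_path([(0, 0), (0, 1), (0, 0)]) == 5
```

Different strategies could then be compared by the score they achieve per move.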
This is a project I would be interested in pursuing.
In this section I will discuss the following three topics of interest in the
area of reasoning applied to robotics:
Collaborative Task Execution
Resource Allocation
Learning
Collaborative Task Execution and Resource Allocation
Collaborative Task Execution and Resource Allocation are closely related. Therefore,
I will address them
together. The objective of collaboration is to produce a higher quality result
than could be achieved alone.
The objective of resource allocation is to increase efficiency or lower costs.
These two concepts embody
the quality-cost tradeoff of many operations research optimization problems.
The central idea is that teams or colonies of robots would need to be able to
self organize in a way that
allocates them in an efficient manner. The tasks that need to be done would
change as time goes on.
For a given task, the robots need to decide who does what. For example, in a
search and rescue operation,
the robots need to spread out over an area to search for the missing person.
The way in which the robots are deployed will significantly affect the success
of the mission. Suppose that
prior knowledge indicates that the missing person is more likely to be in one
location over another. Also,
certain areas are easier (lower cost) to search than others. Consequently, if
each robot operates
independently, they may all choose the area that is easiest to search and has
the highest probability of
success while neglecting other areas. A better solution can be found; therefore,
another way to
collaborate is desirable for this application.
The robot allocation could be centrally planned and modeled as a Markov Decision
Process (MDP) or
constraint satisfaction problem or it could be constructed using a market-based
approach. These are very
interesting approaches and will be discussed more in Part D.
Learning
Learning is a very broad topic in artificial intelligence. The index of Artificial
Intelligence, A Modern
Approach 2nd Edition, by Stuart Russell and Peter Norvig (AIMA) lists over 100
pages in the book that
address the topic of learning. The broad-reaching nature of the topic is captured
in titles like “universal
reinforcement learner.” While that goal seems intractable, robots still need
to be able to perceive, reason,
and act in new and changing environments.
A variety of learning methods have been studied for many applications. Traditional
learning algorithms
have a narrow application scope. These approaches are very interesting, important
and challenging.
However, robots need to learn from experience in a more general sense. They
need to learn to identify
what is important, internalize this into their knowledge base and use it to
make decisions in the future.
Two approaches to this problem are recurrent neural networks (http://www.idsia.ch/~juergen/rnn.html)
and
the Optimal Ordered Problem Solver (http://www.idsia.ch/~juergen/oops.html). The
referenced web sites are flat-out
fascinating. With that said, I find this reasoning area to be overly ambitious.
My approach to a
problem this big is to nibble away at the edges. In short, I intend to choose
a problem that is more specific
and narrower in scope.
A Summary of Robotics
This section provides an overall view of robotics. I wrote this section in order
to become familiar with the
different types of robots, sensors, reasoning techniques and applications in
the field. This section
essentially summarizes Chapter 25 of AIMA.
This section is organized in the following order:
• Sensors
• Reasoning
• Types of Robots
• Applications
A robot is a feedback and control system that involves sensing and reasoning
about its environment, which
leads to a decision on how to act. The first three parts of this section are
organized in the order in which
they are used in a feedback and control system. The final part covers how robots
can be applied.
Sensors
There are many kinds of sensors that a robot can use [1] to sense its environment:
Range Finders
Laser Range Finder
Sonar
Global Positioning System
Tactile Sensors (whiskers)
Imaging Sensors
Camera
Infrared
Other
Temperature
Odor
Acoustic
Velocity
Light Intensity
The sensors that are chosen for the particular application become part of the
feedback loop described
above. This feedback loop is sometimes called perception. Perception is essentially
a filtering task, which
involves sensing, updating the internal belief state, and acting. For example,
the important and well-studied
localization problem involves sensing the robot’s location (e.g., with a range
finder), updating the robot’s
estimated location (e.g., with a Kalman filter), and moving in the direction that leads
to the goal.
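The update step of that loop can be sketched with a one-dimensional Kalman-style filter (all values below are made up for illustration):

```python
# One sense-update cycle: fuse a range-finder reading into the current
# scalar position belief (mean and variance).

def kalman_update(mean, var, measurement, meas_var):
    k = var / (var + meas_var)                         # Kalman gain
    return mean + k * (measurement - mean), (1 - k) * var

# Belief: position 0.0, variance 4.0; sensor reads 2.0 with variance 4.0.
mean, var = kalman_update(0.0, 4.0, 2.0, 4.0)
# Equal uncertainties split the difference: mean 1.0, variance 2.0.
```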
Proprioceptive Sensors are another class of sensors that measure the internal
state of the robot. Some
examples of these are:
• Inertial (gyroscope)
• Force / torque sensors
• Odometer
While these measure the internal state of the robot, they can also be used to
improve the robot’s
interpretation of its environment. For example, an inertial sensor will reduce
the position uncertainty in a
localization task.
Reasoning
After sensing the environment, a robot must update its internal state and make
a decision about how to act.
This reasoning step can have a variety of objectives, for example:
• Localization
• Mapping
• Task Planning
• Resource Allocation
• Generic Search
• Learning (in general)
Acting (Types of Robots)
The manifestation of the acting step depends on the type of robot and how it
can affect its environment.
Robots can be grouped into 3 types: manipulators, mobile and hybrid robots.
Manipulators are able to
change their environment through grasping, pushing, lifting etcetera. An assembly
line robot is a popular
example of this type of robot.
A mobile robot is able to move about via wheels or legs. Some examples of mobile
robots are unmanned
land vehicles, unmanned air vehicles, autonomous underwater vehicles, and planetary
rovers.
A hybrid robot is a mobile robot equipped with manipulators, for example a humanoid
robot.
Applications
A robot’s ability to sense, reason and interact with its environment is application
dependent. Robots are
used in industry, exploration, health care and personal services. The following
are some possible
applications:
• Industry
o Mining
o Agriculture
o Material Handling
o Welding
o Painting
o Harvest
o Excavate Earth
• Transportation
o Of goods between warehouses
o Retrieving items from a warehouse
• Exploration
o 3-D map of a mine
o Mars
o Robotic arms assist astronaut in deploying and retrieving satellites
o Military “drones”
• Health care
o Assist surgeons with instrument placement (heart, eyes, brain)
o Aides for the elderly (robotic walkers, pill reminders)
• Personal Services (perform daily tasks)
o Vacuum cleaners
o Lawn mowers
o Golf caddies
o Snow removal
• Public Safety
o Search and Rescue
o Fire fighting
• Entertainment
o Robot dog
o Robotic soccer
• Human Augmentation
o Walking machines
o Robotic teleoperation (haptic interface)
The above section provides a survey of the relevant areas of Robotics. This
section serves to identify
where the current research areas are. In order to do this, I surveyed the research
accepted at the
Proceedings of the Fourth International Cognitive Robotics Workshop, Valencia,
Spain, August 23-24,
2004. I took the approach of summarizing every paper at the conference in order
to find out what is
important.
I review each paper in a necessarily short format, which includes a summary
of the main components of the
paper and a list of the application, type of reasoning and techniques used to
solve the problem.
Paper: Specifying Failure and Progress Conditions in a Behavior-Based
Robot Programming System,
Authors: F. Kabanza, K. B. Lamine
Uses Linear Temporal Logic to monitor the progress of a behavior-based robotic
system toward a global
goal.
Application: Navigation
Form of Reasoning: Progress Monitoring
Techniques: Linear Temporal Logic
Paper: Robel: Synthesizing and Controlling Complex Robust Robot Behaviors,
Authors: B. Morisset, G. Infante, M. Ghallab, F. Ingrand
Uses a Markov Decision Process to model / learn the relationship between the
supervision state and the
proper action. The objective is a high level task, thus the actions are modeled
using a Hierarchical Task
Network.
Application: Navigation
Form of Reasoning: Progress Monitoring and Policy Learning
Techniques: MDP
Paper: Patterns in Reactive Programs,
Author: P. Haslum
Discusses analogies between software design patterns and reactive task procedures.
Application: Unmanned Air Vehicle Navigation and target tracking
Form of Reasoning: Navigation and Tracking
Techniques: Object Oriented Programming and Design
Paper: Decision Theoretic Planning for Playing Table Soccer,
Authors: M. Tacke, T. Weigel, B. Nebel
Maximum expected utility action selection (decision theoretic planning) using
forward simulation.
Application: Robotic Table Soccer (Foosball)
Form of Reasoning: Planning
Techniques: Forward Simulation
Paper: On-line Decision Theoretic Golog for Unpredictable Domains,
Authors: A. Ferrein, C. Fritz, G. Lakemeyer
This paper integrates a decision theoretic planner with real-time updatable
location uncertainty.
Application: Robotic Soccer (RoboCup)
Form of Reasoning: Planning
Techniques: Decision Theoretic Planning
Paper: Learning Partially Observable Action Models,
Author: E. Amir
This paper presents principles and algorithms for learning and filtering a logical
action model.
Application: Turning on a light.
Form of Reasoning: Learning
Techniques: MDP, POMDP, HMM, EM, STRIPS
Paper: Building Polygonal Maps from Laser Range Data,
Authors: J. Latecki, R. Lakaemper, X. Sun, D. Wolter
This paper presents a heavily feature based localization and mapping approach
(to the exclusion of
odometer data).
Application: Generic Navigation
Form of Reasoning: Localization and Mapping
Techniques: Shape Based Object Recognition
Paper: Hierarchical Voronoi-based Route Graph Representations for Planning,
Spatial
Reasoning and Communication,
Authors: J. O. Wallgrün
This paper presents a new graph based path planning approach.
Application: Office Robot
Form of Reasoning: Path Planning
Techniques: Spatial Representation, Graph Model
Paper: Schematized Maps for Robot Guidance,
Authors: D. Wolter, K-F Richter
This paper presents a multi-robot central localization, mapping and path planning
system.
Application: Service Robots
Form of Reasoning: Navigation
Techniques: Shape Matching, Multi-robot learning
Paper: How can I, robot, pick up that object with my hand,
Authors: A. Morales, P.J. Sanz, A.P. del Pobil
This paper presents vision based feature selection learning for reliable grasp
comparison and prediction.
Application: Service robots
Form of Reasoning: Grasping
Techniques: Nearest Neighbor Classification
Paper: On Ability to Automatically Execute Agent Programs with Sensing,
Authors: S. Sardina, G. DeGiacomo, Y. Lesperance, H. Levesque
This paper addresses the incorporation of real-time sensing uncertainty into
an entailment- and consistency-based
paradigm.
Application: Lumberjacking
Form of Reasoning: Planning
Techniques: Situation Calculus (Golog)
Paper: Have Another Look: On Failures and Recovery in Perceptual Anchoring,
Authors: M. Broxvall, S. Coradeschi, L. Karlsson, A. Saffiotti
This paper addresses perceptual error recognition and recovery.
Application: Generic Navigation
Form of Reasoning: Knowledge Based Planning, Recovery from Perceptual Errors
Techniques: Anchoring
Paper: Flexible Interval Planning in Concurrent Temporal Golog,
Authors: A. Finzi, F. Pirri
This paper discusses resource allocation and the exploration / exploitation
tradeoff.
Application: Search and Rescue team
Form of Reasoning: Multi-Agent Planning and Scheduling of Localization and Mapping
Techniques: Situation Calculus (Golog)
Paper: Imitation and Social Learning for Synthetic Characters,
Authors: D. Buchsbaum, B. Blumberg, C. Breazeal
This paper discusses a method to identify actions and goals through imitation.
Application: Watching Cartoons
Form of Reasoning: Action and Goal Identification, Imitation Learning
Techniques: Simulation theory
Paper: On Reasoning and Planning in Real-Time: An LDS-Based Approach,
Authors: M. Asker, J. Malec
This paper addresses the complexity of Labeled Deduction Systems for planning.
Application: Theoretical
Form of Reasoning: Planning
Techniques: Labeled Deductive Systems
Paper: Exploiting Qualitative Spatial Neighborhoods in the Situation
Calculus,
Authors: F. Dylla, R. Moratz
This paper integrates a spatial theory into situation calculus (Golog) for robot
navigation.
Application: General robot navigation.
Form of Reasoning: Localization, Mapping and Navigation
Techniques: Dipole Calculus, Line Segment Calculus, and Situation Calculus
In this section I discuss three papers in the area of multi-agent planning.
The three papers are applicable to
the resource allocation reasoning capability described in Part B. Specifically,
the application is a robotic
search and rescue team. This team would be capable of collaboratively searching
a pre-defined area. The
search plan would need to balance multiple priorities, robot abilities and initial
conditions. The team would
be connected via wireless communication. One of the reasoning capabilities that
this would require is
resource allocation.
The three papers each provide a different perspective on how to solve the multi-agent
planning / resource
allocation problem in a distributed system. I selected these three papers primarily
because they represent
the 3 main schools of thought in this area, which are:
• Distributed Markov Decision Process
• Distributed Constraint Satisfaction
• Market Based Framework
Title: Using Cooperative Mediation to Solve Distributed Constraint Satisfaction
Problems
Authors: Roger Mailler and Victor Lesser
Publication: Proceedings of Third International Joint Conference on Autonomous
Agents and
MultiAgent Systems (AAMAS 2004)
Why I selected this paper:
I selected this paper because it is relevant to the problem (distributed resource
allocation), it used one of the
three primary ways of addressing the problem, it was in a respected publication
(AAMAS 2004) and I was
impressed by one of its authors, Victor Lesser, who had a mind-boggling 7 papers
accepted to the same
conference.
Major contribution:
This paper presents a new algorithm (Asynchronous Partial Overlay) to solve
the distributed constraint
satisfaction problem. The major contribution involves a mediator agent, who
takes over the process
(effectively centralizing it) when the distributed constraint satisfaction (DCSP)
fails to solve the problem.
Depending on the situation, the mediator agent may centralize only a portion
of the graph.
Computer simulations show that the new algorithm is faster than the current
best algorithm (Asynchronous
Weak-Commitment). A proof shows that the distributed portion knows if it is
locally unsolvable, that the local-centralized
protocol is deadlock-free, and that the worst-case fully centralized solution will
always find a
solution if one exists (or terminate if a solution does not exist).
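To give a flavor of the mediation idea (this is not the Asynchronous Partial Overlay algorithm itself, and all names and domains below are invented), agents can pick values greedily, with a mediator centralizing and exhaustively re-solving any pair of variables whose constraint is violated:

```python
import itertools

# Toy mediation sketch: agents choose values locally; when a constraint is
# violated, a "mediator" centralizes just that pair of variables and searches
# their domains exhaustively against the full constraint set.

def solve_with_mediation(domains, constraints):
    """domains: var -> candidate values; constraints: pairs that must differ."""
    assignment = {v: vals[0] for v, vals in domains.items()}  # local choices
    for a, b in constraints:
        if assignment[a] != assignment[b]:
            continue                          # no conflict, stay distributed
        for va, vb in itertools.product(domains[a], domains[b]):
            trial = dict(assignment, **{a: va, b: vb})
            if all(trial[x] != trial[y] for x, y in constraints):
                assignment = trial            # mediator commits the repair
                break
    return assignment

# A 3-variable "triangle" whose greedy local assignment needs mediation.
colors = solve_with_mediation(
    {"x": ["r", "g"], "y": ["r", "g"], "z": ["r", "g", "b"]},
    [("x", "y"), ("y", "z"), ("x", "z")])
```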
Strengths:
The strengths of this presentation are the inclusion of algorithm pseudocode,
the clarity of the example
problem, and the combined use of simulation and mathematical proofs.
Weaknesses:
I found no notable weaknesses.
Title: Hoplites: A Market-Based Framework for Planned Tight Coordination
in
Multi-robot Teams
Authors: N. Kalra, D. Ferguson, and A. Stentz
Publication: Proceedings of the International Conference on Robotics and Automation,
April, 2005.
Why I selected this paper:
I selected this paper because it is relevant to the problem (coordination in
multi-robot teams), it used one of
the three primary ways of addressing the problem (market based framework), it
is a fascinating subject and
has a cool name.
Major contribution:
The major contribution of this paper is a new market-based coordination method
that significantly improved the quality of the results (over
MVERT, P-MVERT, and PC-MVERT).
With that said, the way in which the auction based coordination was orchestrated
bears some explanation.
In most market-based resource allocation methods, each robot earns points, which
measure how much the
robot’s action moved the team toward its goal. Each task also has a team resource
cost. The robots can
trade tasks through auctions and negotiations in order to get the most profitable
tasks.
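A stripped-down sketch of this kind of market (not the Hoplites protocol itself; all names and numbers below are illustrative) might award each task to the bid with the highest profit, i.e. team reward minus the robot's cost:

```python
# Toy single-round auction: each robot's bid for a task is the team reward
# minus its own cost; tasks are awarded in order of reward, one per robot.

def auction(tasks, costs):
    """tasks: name -> team reward; costs: robot -> {task: cost}."""
    awards = {}                                          # task -> robot
    for task, reward in sorted(tasks.items(), key=lambda t: -t[1]):
        bids = {robot: reward - c[task]
                for robot, c in costs.items()
                if robot not in awards.values()}         # still available
        awards[task] = max(bids, key=bids.get)           # most profitable bid
    return awards

awards = auction({"survey": 10, "dig": 6},
                 {"r1": {"survey": 2, "dig": 5},
                  "r2": {"survey": 6, "dig": 1}})
# "r1" profits most from the survey, so "r2" is left to dig.
```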
In the Hoplites method there are 2 modes, active and passive. In the passive
mode, the agents formulate
and trade plans, which inform each other of their intentions. This allows team
members to make informed
decisions. In the active mode, teammates publish plans that require participation
to work, and they then pay
for that participation. These active plans do not always work; however, they
are initiated when tight
coordination is required to achieve a better solution.
Strengths:
The new algorithm significantly improved the quality of the coordination results.
I was particularly
impressed that the authors made an effort to optimize the other algorithms to the application
before comparing
them. The algorithm was also demonstrated on real robots, which is even more
interesting.
Weaknesses:
I found no notable weaknesses.
Title: Multi-agent Planning with Factored MDPs
Authors: Carlos Guestrin, Daphne Koller, and Ronald Parr
Publication: Advances in Neural Information Processing Systems NIPS-14, 2001
Invited Talk: Proceedings of Third International Joint Conference on Autonomous
Agents and Multi-
Agent Systems (AAMAS 2004)
Why I selected this paper:
I selected this paper because it is relevant (multi-agent planning) and it was
by an author (Daphne Koller)
whose book on Bayesian Networks I have read and liked. The paper was also an
invited talk in a respected
conference (Proceedings of Third International Joint Conference on Autonomous
Agents and Multi-Agent
Systems), which gave it added credibility.
Major contribution:
This paper explains a new way to decompose the value function of an MDP in order
to make the algorithm
tractable in a distributed framework. This is an approximate method, which achieves
close to the
theoretically optimal results.
Essentially, you can use an MDP to find the optimal policy for a resource allocation
problem. However, to
do this in a distributed sense, each agent needs to know its portion of the
overall system reward, in order for
it to make a decision. This decomposition method provides a new way to figure
out each agent’s portion.
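The decomposition idea can be illustrated in miniature: write the team Q-function as a sum of small local terms plus a coupling term, then choose the joint action that maximizes the sum. The paper does this efficiently with variable elimination; the toy below simply enumerates, and all numbers are invented:

```python
import itertools

# Miniature factored Q-function: one local term per agent plus one coupling
# term, with the team value equal to the sum of the terms.
q1 = {"a": 3, "b": 1}                  # agent 1's local term
q2 = {"a": 1, "b": 2}                  # agent 2's local term
q12 = {("a", "a"): 0, ("a", "b"): 0,   # coupling term rewards joint "b"
       ("b", "a"): 0, ("b", "b"): 4}

# Enumerate joint actions and pick the one maximizing the summed value.
best = max(itertools.product(q1, q2),
           key=lambda act: q1[act[0]] + q2[act[1]] + q12[act])
# Individually greedy choices would give ("a", "b") for a team value of 5,
# but the coupling term makes the coordinated choice ("b", "b") worth 7.
```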
Strengths:
The author explains some concepts well. The idea is very interesting.
Weaknesses:
This author explains things very well in her book. This algorithm is really
interesting; however, it requires
more than 7 pages to explain precisely.
Application: Search and rescue mission.
Reasoning: Collaborative Resource Allocation
Motivation
Resource allocation is an important form of reasoning because it is a step toward
high-level reasoning.
Essentially, resource allocation is deciding what to do next in the context
of a society. The robots are
deciding what they can do that will be of greatest benefit to the society as
a whole. Regardless of whether
this uses a cooperative or competitive motivation mechanism, the robots want
to do the task that is most
beneficial. For this project, I propose a decision domain that does not require
a complicated knowledge
base, which will allow us to do research in high-level decision-making.
Setup
My project involves the simulation of a robotic search and rescue
team. The team will consist of
wheeled ground robots. The search area will be represented as a grid.
The team will
collaboratively search the pre-defined area. The search plan would need to balance
differing robot abilities
and initial conditions. The team would be connected via wireless communication.
The reasoning
capabilities that this would require are collaborative learning and resource allocation.
Algorithms
I have reviewed 3 different types of algorithms for solving this type of problem.
Each of the algorithms is
better than the ones that it was compared to. My proposal is to implement all
3 and compare them to each
other, in our problem domain. We will then use the best of the 3 algorithms
as our new starting point and
improve upon it.
Note: The MDP algorithm cannot be implemented directly from the description
in the paper. I would need
to get clarification from the author on the details of the algorithm.
The teams will use the following collaboration schemes:
1. An auction based collaboration scheme similar to the one presented in “Hoplites:
A Market-
Based Framework for Planned Tight Coordination in Multi-robot Teams,” by N.
Kalra, D.
Ferguson, and A. Stentz.
2. The MDP based algorithm presented in “Multi-agent Planning with Factored
MDPs,” by
Carlos Guestrin, Daphne Koller, and Ronald Parr.
3. The constraint satisfaction based algorithm presented in “Using Cooperative
Mediation to
Solve Distributed Constraint Satisfaction Problems,” by Mailler, Roger; and
Lesser, Victor.
Note: The probability that I will pursue this project is .75.
A summary of last year’s projects:
This section provides a summary of the projects (on the web site) from last
year’s Cognitive Robotics class.
I wrote this section to help formulate the scope and range of topics for an
acceptable course project.
I review each paper in a necessarily short format, which includes a summary
of the main components of the
paper and a list of the application, type of reasoning and techniques used to
solve the problem.
Paper: Model-based Programming for Cooperating Vehicles
Seung Chung
Robert Effinger
Thomas Léauté
Steven D. Lovell
This project studied cooperative path planning for autonomous UAV forest fire-fighting.
Application: Forest Fire-Fighting with Autonomous Cooperative UAVs
Form of Reasoning: Path Planning, Goal (Mission) Planning
Techniques: D*-Lite, Dynamic Backtracking, Strategy-driven Kino-dynamic Cooperative
Path Planning, Receding Horizon Continuous Planner
Paper: Cooperative Q-Learning
Lars Blackmore
Steve Block
This project compared cooperative and independent Q-Learning for learning a
navigation policy about a
maze.
Application: Robot navigation
Form of Reasoning: Cooperative Navigation Policy Learning
Techniques: Cooperative Q-Learning, MDP
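The standard Q-learning backup underlying this project can be sketched as follows; in the cooperative variant, teammates would additionally share or merge their Q-tables (the state and action names here are invented):

```python
# One standard Q-learning backup on a dictionary-based Q-table.

def q_update(q, state, action, reward, next_state,
             actions=("L", "R"), alpha=0.5, gamma=0.9):
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q

q = q_update({}, "s0", "R", 1.0, "s1")
# With an empty table, the new estimate is 0.5 * (1.0 + 0.9 * 0.0) = 0.5.
```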
Paper: SLAM for Dummies
Morten Rufus Blas
Soren Riisgaard
This group implemented and wrote a tutorial for SLAM.
Application: Generic Localization and Mapping
Form of Reasoning: Localization and Mapping
Techniques: SLAM
Paper: Towards Visual SLAM in Dynamic Environments
Vikash Mansinghka
This project critiqued and developed some new concepts for Visual SLAM.
Application: Navigation (Localization and Mapping)
Form of Reasoning: Localization and Mapping
Techniques: Visual SLAM
Paper: Autonomous Visual Tracking Algorithms
Alexander Omelchenko
This project investigated the visual tracking of ground vehicles via the collaboration
between an
autonomous UAV and a rover (ground vehicle).
Application: Visual tracking of ground vehicles
Form of Reasoning: Tracking
Techniques: Image Processing: Contour Tracking and Center of Mass Tracking
An agent like the one described above will draw upon three core areas of research:
1. Knowledge Representation. Clearly the agent will require mechanisms for representing
the behavior rules it has learned,
but the main applications of well-developed representational schemes will lie
in effectively managing the input data from its
environment. A hierarchical ontology, likely containing several specialized
elements, will be required to abstract visual and
other sensory stimuli to a level providing appropriate inputs to decision algorithms.
Due to the complexity of these elements
of the system, I plan to avoid the complexities of real-world knowledge representation
and develop decision algorithms
based on data from "toy" examples.
2. Statistical Learning. The agent must be able to observe the behaviors of
a human operator in an environment and
develop decision strategies based on patterns extracted from this sample data.
The agent must also be able to dynamically
incorporate new sample data during its operations and incorporate new inferences
into its strategic model. The complexity
of the model should be optimally adjusted to the volume of training data available,
and should be independently controllable
by the method based on resource constraints.
3. Probabilistic Reasoning. While arguably a component of statistical learning,
I regard the agent's actual decision
procedures as a somewhat distinct element of the system, which may be handled
independently from the formulation of
those procedures based on learning. The execution of these decision procedures
will almost certainly require probabilistic
inference, although the specific methods involved will depend upon the strategies
produced by the learning framework.
The primary motivations for algorithm selection will come from flexibility (the
ability to recognize general patterns and
estimate general functions) and feasibility (the ability to perform classification
and decision-making using a compact
strategic representation). Hence the general thrust of this effort will lie in
searching for powerful generalizations of more
elementary schemes (e.g., parameter decision trees constructed by ID3). This
intuition motivates discussion of the following
three papers from the recent literature:
1. Weiss SM, Indurkhya N. 1995. Rule-based machine learning methods for
functional prediction. Journal of Artificial
Intelligence Research 3:383-403.
The authors initially show how to generalize traditional discrete decision trees
used for classification to regression trees used
for functional estimation. Like decision trees, regression trees perform partitioning
based on a disjunctive normal form
strategy, which has advantages for clarity of knowledge organization and traceability
to features. However, for pattern
recognition applications, they propose a more general rule-based approach to
regression, which eliminates the DNF
constraint and can potentially find much more compact representations. This
can be important for large spaces, and can
potentially find rules with substantially clearer interpretations. Through a
number of real-world examples, they show that
rule-based regression algorithms using numerical optimization techniques can
significantly outperform tree-based methods
both in performance and speed. They also show how this approach can be effectively
combined with partitioning and
nearest-neighbor methods (e.g., bounding pseudo-classes on the basis of a fixed
neighborhood population) in order to
improve performance still further. They also pursue sample storage compression
enhancements with some success.
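To make the contrast concrete, here is a minimal sketch (my own illustration, not the authors' algorithm) of the tree side of the comparison: a one-dimensional regression tree that greedily chooses splits to minimize squared error, producing exactly the kind of piecewise-constant, DNF-style partitioning that the rule-based approach generalizes away from.

```python
def fit_tree(xs, ys, depth=2, min_leaf=2):
    """Greedy 1-D regression tree: each split minimizes the summed squared
    error of the two children; leaves predict the mean of their targets."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return ("leaf", sum(ys) / len(ys))
    best = None
    for t in sorted(set(xs))[1:]:          # candidate split thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) + \
              sum((y - sum(right) / len(right)) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    if best is None:
        return ("leaf", sum(ys) / len(ys))
    t = best[1]
    lpairs = [(x, y) for x, y in zip(xs, ys) if x < t]
    rpairs = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return ("split", t,
            fit_tree([x for x, _ in lpairs], [y for _, y in lpairs],
                     depth - 1, min_leaf),
            fit_tree([x for x, _ in rpairs], [y for _, y in rpairs],
                     depth - 1, min_leaf))

def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, t, left, right = tree
    return predict(left if x < t else right, x)
```

Fitting this to a step function recovers the step exactly, but a smooth target would need many leaves; that rigidity is what motivates the more compact rule-based representations the authors propose.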
The methods presented here move in the desired direction in terms of generalizing
the approach to classification and
estimation, and provide concrete algorithms and examples demonstrating their
effectiveness in certain situations not unlike
exploration behavior. However, the fundamental methodology still relies upon
complete storage of training samples and a
somewhat discretized pseudo-classification approach to pattern decomposition.
That is, rule-based regression may not be
general enough to provide the kind of dynamic adaptability and scaling to training
data desirable in our agent. The other
papers in this collection introduce and develop a more general approach based
on "support vectors," which may provide a
better high-level framework for our applications.
2. Boser BE, Guyon IM, Vapnik VN. 1992. A training algorithm for optimal
margin classifiers. In: Proceedings of the
5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA. 144-152.
These authors take a step beyond both tree-based and rule-based parametric decision
methods by devising a method to
effectively reparametrize the space of observations according to the most useful
global indicators. That is, they construct a
dynamically generated basis for the input space using "support vectors"
optimally chosen to maximize the resolution of the
boundary between decision classes. This provides significant improvement on
traditional regression-based approaches,
which tend to smooth over any atypical patterns as represented in the original
input basis. Their training algorithm also
grows dynamically with new input data, while incorporating many other linear
and nonlinear methods as special cases.
Moreover, the authors show how to construct a dual space representation (of
reduced dimension) for the actual decision
kernel, which allows the underlying quadratic optimization problem to be solved
efficiently using standard numerical
techniques. The authors demonstrate empirical performance on many classical
pattern recognition problems (such as
handwritten digit recognition) significantly exceeding other leading algorithms,
in some cases even those with pre-defined
task-specific models for those problems.
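The flavor of the method can be illustrated with a toy sketch. The code below is not the authors' quadratic-programming algorithm; it minimizes the closely related soft-margin SVM objective by subgradient descent on 2-D points, then reports which training points lie on or inside the margin, i.e. the support vectors that alone determine the boundary.

```python
def train_svm(points, labels, lam=0.01, epochs=500, lr=0.01):
    """Subgradient descent on the soft-margin SVM objective:
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).
    Points are 2-D tuples; labels are +1 / -1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:
                # Inside the margin: the hinge loss is active.
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:
                # Outside the margin: only the regularizer acts.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def support_vectors(points, labels, w, b, tol=0.3):
    """Training points on or near the margin: y * (w . x + b) <= 1 + tol."""
    return [p for p, y in zip(points, labels)
            if y * (w[0] * p[0] + w[1] * p[1] + b) <= 1 + tol]
```

On separable data such as positives {(2, 2), (3, 3)} and negatives {(0, 0), (-1, -1)}, the learned boundary classifies all four points correctly, and the points nearest the boundary end up at functional margin near 1, surviving as support vectors.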
This approach demonstrates both generality for uninformed pattern recognition
and dynamic adaptation and scaling to
enlarging training data sets that we desire for our agent. By comparison to
the Weiss et al. paper, however, it still does not
demonstrate applications to functional estimation as well as pattern recognition,
which will also be a key element of the
agent's exploratory behavior. For this further development, we turn to the last
paper in this collection.
3. Vapnik V, Golowich S, Smola A. 1997. Support Vector Method for Function
Approximation, Regression Estimation,
and Signal Processing. In: Neural Information Processing Systems, Vol. 9. MIT
Press, Cambridge, MA.
In this paper, the authors extend the support vector method to three critical
classes of applications: function approximation,
regression estimation (the principal subject of Weiss et al.), and signal processing.
The primary content of the paper consists
of explicit mathematical algorithms for carrying out each of these tasks in
a generalized fashion, but relatively simple
example applications are demonstrated for each one, with sufficient realism
to provide indicators about performance. The
authors show in general how the reduced effective dimensionality of the support
vector basis translates into lower
complexity in all these areas; that is, complexity is driven by the desired
complexity of the result rather than the complexity
of the initial parametrization of the problem. Performance bounds are generally
impressive, demonstrating feasibility for a
wide range of problems.
Ultimately, it seems likely that our agent may be built upon a support vector
machine architecture, which may or may not
rely on some initial modeling knowledge, depending upon the simplicity of the
"toy" environment chosen for development
and testing. For explicitly discrete classes of problems, rule-based methods
may yet be of some utility, since they generally
afford simpler symbolic representations.
One potentially feasible project for developing this framework might pursue
a specific two-dimensional visual task, using
human training data captured by a PC-based input device (such as a mouse). To
truly demonstrate the success of the
architecture, the task should be sufficiently complex as to defy simple rule-based
description, as might be encoded directly
in software by a programmer well-versed in the activity. A plausible analogy
with exploration tasks (navigation and/or
feature recognition), such as locating a target, would be preferable though
not essential. It might be beneficial for
pre-defined agent models and sample training data to be produced by different
people, so as to enforce the uninformed
pattern recognition aspect of the problem.
When an autonomous agent is interacting with the real world, it is often the
case that the environment is non-deterministic. This could occur if the agent
is dealing with a dynamic environment, or if the agent needs to interact with
human beings. In such situations, the agent may need probabilistic reasoning
techniques to model the environment and make decisions. If the agent can make
observations over time, then techniques such as hidden Markov models, Kalman
filters, and dynamic Bayesian networks could be useful to make future predictions.
Probabilistic reasoning techniques could be useful in applications such as speech
recognition.
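As a concrete instance of one of these techniques, here is a minimal sketch (my own illustration, not drawn from the papers below) of a one-dimensional Kalman filter tracking a roughly constant quantity from noisy measurements.

```python
def kalman_1d(zs, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state model:
    x_k = x_{k-1} + process noise (variance q),
    z_k = x_k + measurement noise (variance r)."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        p = p + q                  # predict: uncertainty grows
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update with the measurement residual
        p = (1 - k) * p            # posterior uncertainty shrinks
        estimates.append(x)
    return estimates
```

Fed a stream of measurements scattered around 5.0, the estimate converges toward 5.0 even from a poor initial guess, which is exactly the prediction-from-observations capability described above.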
2. Machine learning with incomplete data
One possible attribute of an intelligent system is the capability to learn
to make correct decisions based on available and possibly incomplete training
data. It may require learning probability models such as Bayesian networks and
making maximum probable hypotheses. Algorithms such as Markov chain Monte Carlo
and expectation-maximization, and models such as Kernel models and neural networks
may be useful.
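To illustrate how expectation-maximization copes with missing information, here is a small self-contained sketch (illustrative only): EM for a one-dimensional mixture of two unit-variance Gaussians with equal mixing weights, where the hidden data is which component generated each point.

```python
import math

def em_two_gaussians(data, mu1, mu2, iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians with equal
    mixing weights; only the two means are estimated."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r1 = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            r1.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means.
        w1 = sum(r1)
        w2 = len(data) - w1
        mu1 = sum(r * x for r, x in zip(r1, data)) / w1
        mu2 = sum((1 - r) * x for r, x in zip(r1, data)) / w2
    return mu1, mu2
```

Run on data drawn from two well-separated clusters, the estimated means settle on the two cluster centers even though no point is labeled with its component.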
3. Communicating, perceiving, and acting
A robot that needs to interact with the environment must have a variety of methods of sensing and communication. Language recognition, tactile sensing, and visual image processing become critical for an interactive robot. In visual perception, object recognition, edge detection, contour mapping, and motion detection would be useful to study.
Part D: Researching a Critical Reasoning Method
On the topic of probabilistic reasoning, I have found a myriad of papers about
Bayesian networks and various aspects of Markov decision processes. Having little
initial knowledge of how relevant these papers are to my project goal, I chose
the following three papers somewhat at random from the Uncertainty in Artificial
Intelligence conference proceedings.
[Figure: gesture-processing pipeline: hand motion → signal detection → filtered signal → pattern matching → command generation]
1. Myers, J. W., Laskey, K. B., and Levitt, T. S., Learning Bayesian Networks from Incomplete Data with Stochastic Search Algorithms
Current research in Bayesian networks generally assumes complete data, so that
the Bayesian network problem can be reduced to learning the graph structure
and parameters. However, problems arise when there is incomplete data, because
there are no closed-form expressions for the scoring metric used to evaluate
network structures. Some researchers have taken parametric approaches such as
the expectation-maximization (EM) algorithm, but this method is prone to “getting
stuck” on local optima.
Myers et al. propose a stochastic search method to avoid the local optima problem.
They use a combination of evolutionary algorithms (EA) and Markov chain Monte
Carlo (MCMC) algorithms to create a new algorithm called the Evolutionary Markov
Chain Monte Carlo (EMCMC) in order to learn Bayesian networks from incomplete
data.
The evolutionary algorithm is similar to the evolution processes that occur
in nature. Evolutionary algorithms consist of “a population of individual solutions
that are selected and modified in order to discover overall better solutions
in the search space.”
Initially, a population of solutions is selected, from which individuals are
chosen based on how good a solution each provides. These individuals are genetically
modified using the common genetic operators crossover and mutation. Thus evolutionary
algorithms deal with incomplete data by “turning the incomplete data problem
into a complete data problem by evolving the missing data and imputing these
values into the data.”
The Markov chain Monte Carlo algorithm is often used to model systems that seek
a minimal free-energy state. The microstate of a system, or the state of a system’s
atomic structure, has an associated energy, E(s). For a given microstate of
a system, the equilibrium probability that minimizes free energy is given by
the Boltzmann distribution. Markov chains are useful because they can be designed
to converge to a stationary distribution (such as the Boltzmann distribution), and
from that point the convergent distribution is the only one necessary to predict
occurrences in the future.
Evolutionary algorithms “sample from modes of the distribution of interest.”
“Markov chain Monte Carlo algorithms evolve to sample from the target stationary
distribution.” Since EAs improve fitness through exchanging information, “the
same approach with MCMC will speed convergence by finding better fit solutions
faster.” Empirical results show that MCMC with adaptive mutation has a much
faster rate of convergence than the regular MCMC algorithm.
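The Boltzmann-distribution idea can be made concrete with a short sketch (a generic Metropolis sampler, my own illustration rather than the authors' EMCMC algorithm): proposals that lower the energy E are always accepted, uphill moves are accepted with probability exp(-ΔE/T), and the chain converges to p(x) ∝ exp(-E(x)/T).

```python
import math
import random

def metropolis(energy, x0, steps=50000, step_size=1.0, temp=1.0):
    """Metropolis MCMC: random-walk proposals accepted with probability
    min(1, exp(-(E(x') - E(x)) / T)); the stationary distribution is the
    Boltzmann distribution p(x) proportional to exp(-E(x) / T)."""
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + random.uniform(-step_size, step_size)
        delta = energy(proposal) - energy(x)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            x = proposal   # accept; otherwise keep the current state
        samples.append(x)
    return samples
```

For the quadratic energy E(x) = x²/2 at T = 1, the Boltzmann distribution is a standard normal, so the sample mean and variance approach 0 and 1 as the chain runs.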
2. Flores, M. J., Gámez, J. A., and Olesen, K. G., Incremental Compilation of Bayesian Networks
Bayesian networks are usually compiled into a junction tree or a join tree
(JT) before probability propagation is performed. The compilation process is
usually performed as one unit, where the entire join tree is created from the
Bayesian network. If the network gets modified, the whole tree needs to be recompiled.
The authors of this paper have devised a method for incrementally compiling
a Bayesian network so that it increases the efficiency and stability of the
join tree.
This paper was chosen because it may be useful to have the capability to deal
with a more dynamic Bayesian network in an efficient way. Since the states of
the environment that I will be dealing with in my project goal may not be completely
determinable at the beginning, the generated Bayesian network may need to be
modified through time, and thus it may be necessary to incrementally compile
the network to a join tree.
The paper provides algorithms for dealing with all the structural changes in
a Bayesian network, including modification of the probability distribution for
a variable, modification of the states of a variable, removing an arc, adding
an arc, removing a node, and adding a node. It starts off by decomposing the
Bayesian network into its maximum prime subgraphs (MPS), which contain minimal
triangulation. Then these MPSs can be triangulated independently.
The paper provides a clear set of algorithms for dealing with the steps for
incremental compilation. Modifying potentials and the states of a variable generally
leave the structure of the network the same, and thus can be simply updated
in the join tree. When removing or adding an arc to the network, large changes
could be made due to cycles in the network. Thus “it is beneficial to be able
to identify the minimal part of the join tree that could be affected by the
change and concentrate on a retriangulation of only that part.” The removal or
addition of a node can be performed by first removing or adding all the necessary
arcs around the node, and then simply removing or adding the node itself.
3. Dearden, R., Friedman, N., and Andre, D., Model Based Bayesian Exploration
Reinforcement learning deals with the learning process of an agent in a dynamic
environment, where the agent has the choice of exploring an untested set of
actions or choosing a previously used set of actions that are known to be good
(i.e. exploitation). Markov Decision Processes (MDPs) are useful frameworks
to support reinforcement learning by enabling us to determine the actions that
would maximize the expected future reward.
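A minimal sketch of the underlying MDP machinery may help here (illustrative only; the paper itself maintains a Bayesian belief state over the unknown model rather than assuming it known): value iteration computes the expected future reward of each state from given transition probabilities and rewards.

```python
def value_iteration(n_states, n_actions, trans, reward, gamma=0.9, tol=1e-6):
    """Value iteration for a small MDP.  trans[s][a] is a list of
    (probability, next_state) pairs and reward[s][a] the immediate reward;
    returns V* with V(s) = max_a [r(s,a) + gamma * sum_s' P(s'|s,a) V(s')]."""
    V = [0.0] * n_states
    while True:
        newV = [max(reward[s][a] +
                    gamma * sum(p * V[s2] for p, s2 in trans[s][a])
                    for a in range(n_actions))
                for s in range(n_states)]
        if max(abs(x - y) for x, y in zip(V, newV)) < tol:
            return newV
        V = newV
```

On a two-state example where action 1 in state 0 pays 1 and moves to an absorbing state 1 that pays 2 per step, the values converge to V(1) = 2/(1 - 0.9) = 20 and V(0) = 1 + 0.9 · 20 = 19.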
It may be useful to create a model of the dynamics of the environment so that
the expected values of actions in the environment can be easily determined.
Modeling the
environment can aid the efficiency of computation because it reduces repetition
of approximating value functions that measure the desirability of each environment
state.
The paper shows “how to maintain Bayesian belief states about MDPs” and “how
to use the Bayesian belief state to choose actions in a way that balances exploration
and exploitation.” The paper was chosen because it provides a possibly useful
approach to reinforcement learning systems, which may be handy for my project.
The complex system described in Part B can be toned down to a two-dimensional
pattern matching problem. A user can make a motion using a mouse on a computer,
and the goal of the agent is to learn this motion and correctly identify it
as a certain command. The agent would need to filter out random motions of the
mouse input to capture the important information. It then needs to build a library
of the meanings of these motions, and after a period of training, be able to
correctly identify the command.
This project is feasible and challenging enough that there is a relatively high
probability that I will pursue this project during this course.
The topic I am primarily interested in learning more about is computational
cognitive reasoning as it may be applied to game theory, specifically chess.
I will never forget the day back in 1997 when Kasparov was actually defeated
by Deep Blue. (Although I still contend that Kasparov was paid to throw the
match. The evidence for this is that in game 2, he played the computer to a
draw, but “didn’t realize it” and decided to concede rather than finish the
game. IBM NEEDED Deep Blue to win!) From the brief literature survey I conducted
on the subject, it appears that most of the mainstream literature has actually
tended away from chess-playing in particular over the last few years, though
there are a large number of papers from the 80’s and early 90’s describing the
evolution of the minimax algorithm and the many adaptations people have incorporated
into it. Aside from the basic search algorithms, more recently neural networks,
genetic algorithms, and numerous other machine learning techniques have seen
their application to this problem as well. It seems to me that a truly generic,
efficient, and novel algorithm for “combinatorial games,” as the literature
tends to call chess, checkers, and other games of this type, has not really
been attempted in many years, with many authors simply biding their time until
computers become powerful enough to minimax search the entire game tree. Hopefully
something I learn in this class will prove that statement wrong!
A second topic I am interested in is computer vision. Most specifically, I would
like to learn the state of the art in visual object detection and recognition
techniques. About four years ago as an undergraduate, I worked for two summers
at JPL on a project designed to make autonomous Martian rovers that would be
as skilled at classifying rocks and geological features as our staff Martian
geologist. Our techniques were pretty much state of the art at the time (Gabor
filtering for texture analysis, Bayesian classifiers, feature analysis, 3-D
stereo data for range and object segmentation), but I was really disheartened
by how poorly our methods actually performed in all but certain doctored situations.
In a machine learning class at Caltech, we learned about Lowe’s object detection
algorithm, and I was extremely excited. Then I learned that it too had many
of the same major limitations (for instance, changes in illumination direction),
and in general did not always work as well as its creators seemed to suggest
from their particular demonstrations. I would like to learn more about object
classification algorithms and hear if there have been any improvements in the
field.
Finally, I would be interested in learning about the use of reasoning techniques
as they are applied to the diagnosis of engineering systems. This is probably
one of the most useful and practical applications of autonomy at the present
time and its improvement and extension will be necessary and profitable in the
future. Deep Space 1 demonstrated that a spacecraft could autonomously repair
itself in flight, and such algorithms will be necessary on future deeper space
missions when communication lags would otherwise severely reduce the efficiency
and sustainability of the mission. We learned an algorithm for single-fault
detection in 16.413, but I would be interested now in learning the more complex
algorithms for multiple faults.
The first paper I chose to read was Playing Games with Algorithms: Algorithmic
Combinatorial Game Theory by Erik D. Demaine. In reading this paper, I hoped
to attain a general overview of the theory behind combinatorial games like chess,
and that is exactly what I was given.
Demaine began by giving the ubiquitous, but, at least in my practically
ignorant case, extremely helpful introduction to and definition of combinatorial
games. These games involve two players, named Left and Right, who alternate
turns using deterministic moves. One of the most important aspects of this class
of games is that they are by definition perfect-information games; that is, both
players know the full state of the game at every instant, both players know
what moves the other player has made, and all actions and consequences are completely
deterministic and observable. While these restrictions are sufficiently loose
to incorporate innumerable common games such as chess, checkers, and Go, they
are restrictive enough to allow game theoreticians a handle by which they can
manipulate this subclass of problems.
With the problem well-stated, the author then continued by introducing the most
important game theoretical technique used in analyzing the computational complexity
of problems associated with combinatorial games, Conway’s Surreal Numbers. A
surreal number is actually defined by a pair of sets {L|R} in which all numbers
in L are less than all numbers in R. The value of this surreal number is then
“the ‘simplest’ number larger than all Left options and smaller than all Right
options.” While these numbers may seem to be purely a demented mathematician’s
idea of a strange joke, it turns out that they can be very useful in making
analogies to combinatorial games. Demaine explained that games with a surreal
value of 0, a positive number, or a negative number have the predictable outcomes
that the second player to move wins, the player named Left wins, or the player
named Right wins, respectively. Those games, such as {1|0}, that fall outside the
set of surreal numbers have the predictable outcome that the first player to move wins.
The paper goes on to explain a number of other ways in which surreal numbers
have been used to prove numerous results regarding the predictability and complexity
of combinatorial games.
Finally the paper analyzed a few well-known combinatorial games, including chess,
with respect to their expected computational complexity. It was shown that given
an arbitrary chess board configuration, it is EXPTIME-complete to determine
the winner. This implies that there can be no polynomial time algorithm for
computing the winner of a general chess game no matter how hard we try to think
of one, a fact which is important to know so that we do not waste too much time
on our project!
Overall, this first paper was meant to be a general introduction to the basics
of game theory and some of its applications to chess and other games like it.
In this respect, it served its purpose well. However, with respect to the other
papers, I felt that for my purposes this one might have been slightly too theoretical,
with not enough practical applications to truly be of use to my present project.
The tidbit regarding the exponential complexity of determining a winner in chess
was helpful however, and I will remember that trying to look for a checkmate
even a few moves ahead is inherently an inefficient task.
The second paper I read was The Games Computers (and People) Play by Jonathan
Schaeffer. This was actually a 70-page “chapter” written as an update to Arthur
Samuel’s well-known 1960 chapter in the textbook Advances in Computers. I chose
to read it because in browsing it, I noticed that not only did it explore in-depth
various search methods and their extensions, but it also exhibited a fairly
detailed analysis of the Deep Blue chess program and its match against Kasparov,
making it a great basis for the present project.
The main focus of the first section of the paper was in describing the minimax
and alpha-beta game tree search algorithms. We briefly covered these algorithms
in 16.413 and I have already used alpha-beta searching to create my own checkers
program, so much of this section was review. What I found extremely interesting
and useful, however, was that the author not only described the basic algorithm
as Russell and Norvig did, but also detailed many of the improvements that had
been made in recent years to the simple application. In particular, Schaeffer
discussed four techniques that have been used to speed up or otherwise improve
alpha-beta search, describing these techniques as “the recipes for building
high-performance games.” The first of these techniques was caching. In games
where multiple sequences of moves may evolve the board into the exact same state,
storing the results of previous searches in a hash table can have dramatic effects.
Schaeffer quotes as much as a 75% reduction for chess searches and an 89% reduction
for checkers. A second method for reducing the cost of an alpha-beta search
is move ordering. The benefits of pruning are realized most dramatically when
the search is able to cut off a branch after examining only a very small number
of its leaves. Schaeffer describes a number of ways in which this ordering can
be performed, including the use of the cached transposition table and
application-dependent heuristics. The third method for improving alpha-beta search was guessing
a smaller search window. That is, at each iteration, alpha and beta are normally
initialized to be [-inf,inf]. However, these extreme values are rarely realized
and reducing the range of the window can greatly enhance the search speed. Schaeffer
described this as a “gamble” because reducing the window too much may require
another search to be performed, but he also described a number of heuristics
for intelligently choosing the search window which would minimize the risk of
this gamble. Schaeffer described the fourth and final method
in financial terms. If you are investing in stocks, you put more money in the
stocks you think will perform well and less in the stocks you believe will perform
badly. Similarly, when you search, you search deeper in branches that seem to
be getting somewhere and cut off early branches that seem to be going nowhere.
He mentioned a number of heuristics for doing this in chess in particular, many
of which may be directly applicable to our final project.
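The core of these techniques can be sketched in a few lines. The code below (my own illustration over a toy game tree given as nested lists, not Schaeffer's engine code) shows plain alpha-beta pruning plus the first trick, a transposition table that caches exact scores of already-searched positions; real engines also store bound information from cutoffs and use move ordering to trigger cutoffs sooner.

```python
def alphabeta(node, alpha, beta, maximizing, table):
    """Alpha-beta search over a tree of nested lists with integer leaves.
    `table` caches exact results so that a position reached by more than
    one move order (a 'transposition') is never searched twice."""
    key = (id(node), maximizing)
    if key in table:
        return table[key]
    if isinstance(node, int):          # leaf: static evaluation
        return node
    cutoff = False
    if maximizing:
        value = float("-inf")
        for child in node:             # better move ordering -> earlier cutoffs
            value = max(value, alphabeta(child, alpha, beta, False, table))
            alpha = max(alpha, value)
            if alpha >= beta:
                cutoff = True          # beta cutoff: opponent avoids this line
                break
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, table))
            beta = min(beta, value)
            if alpha >= beta:
                cutoff = True          # alpha cutoff
                break
    if not cutoff:
        table[key] = value             # only exact scores are safe to reuse here
    return value
```

On the textbook tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] with the maximizer to move, the search returns 3 while pruning several leaves that cannot affect the result.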
The other section of the chapter that I found very interesting was Schaeffer’s
description of Deep Blue’s algorithm and its performance against Kasparov. Deep
Blue was the result of a decade of IBM-funded research and was made up of parallel
processors, dedicated chess chips, and the consultation of a chess grandmaster,
Joel Benjamin. Schaeffer, I feel, accurately appraised the significance of the
match, saying that it is only one intriguing data point with no statistical
significance and that Deep Blue’s construction was, while definitely an improvement
in computer chess-playing, largely unproductive since it was immediately dismantled
and its funding completely cut. The analyses contained in this section highlight
some of the important pitfalls typical computer programs fall into that humans
are able to avoid and will be priceless when analyzing our own algorithm.
I am really glad I found this paper. It will be invaluable for our project.
We could even just compare the efficiencies and performances of the four types
of methods described in the paper just to give a quantitative understanding
of how each one affects the search. Compared to the other papers, it was broad
and basic enough to be useful and understandable, and practical enough to be
useful in an applied sense.
The third and final paper I chose to read was entitled An Evolutionary Approach for the Tuning of a Chess Evaluation Function Using Population Dynamics and was written by Kendall and Whitwell. After reading Schaeffer’s review of the Deep Blue algorithm, it became apparent that the most important part of a computer chess player’s performance is often its evaluation function, and these authors presented an interesting approach to developing such a function.
Computer chess programs have historically used a linear combination of weighted
parameters to heuristically evaluate the “goodness” of a particular board variant.
The general purpose of this paper was to create a method whereby a computer
chess evaluation function could be created without a human having to hand-tune
each of the individual weighting parameters in the function. This was done through
the use of an evolutionary algorithm. Basically, a particular evaluation function
was considered, namely a weighted sum of the player’s number of kings, number
of queens, number of rooks, number of bishops, number of knights, number of
pawns, and the number of legal moves available to the player. Fifty such functions
were initially randomly generated with randomized weight values. These functions
were then inserted into a chess program that played the two candidates as opponents.
The losing function was removed from the population. Whichever function won
the match was then slightly mutated and replaced in the population. It was also
allowed to “breed,” and another slightly mutated copy of it replaced the losing
function. In this way, new functions were evolved which tended to play better
and better chess. Of course, the paper described a number of details involving
the need to account for local minima and the algorithms used to mutate and compete
the various functions. Once the function population had reached quiescence,
their final values were averaged and taken to be the evolved evaluation function.
The authors tested their initial evaluation function against the final evolved
one using a commercial computer game that gave ranking information regarding
the player’s skills. The undeveloped function lost within 60 moves. The developed
function was actually able to win one game and lost the other game in almost
100 moves, showing quite vividly that the evolution process had created an improvement.
In addition, the developed function was rated 1750 on a scale on which Garry Kasparov
was rated 2805. The undeveloped function, meanwhile, rated a meager 650. Clearly
the evolutionary approach provides a fast, simple method for creating the weights
of a computer chess evaluation function which does not need human intervention
to optimize.
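The loop at the heart of the paper is compact enough to sketch. In the code below (my own illustration; `closer_to_target` is a hypothetical stand-in for playing a game of chess, here just comparing distance to a hidden target weight vector), the loser of each pairing is replaced by a mutated copy of the winner, and the converged population is averaged, as the authors do at quiescence.

```python
import random

def evolve_weights(match, n_weights, pop_size=20, generations=2000, sigma=0.2):
    """Population-based tuning in the spirit of Kendall and Whitwell:
    repeatedly pit two random candidate weight vectors against each other,
    replace the loser with a mutated copy of the winner, then average the
    final population."""
    pop = [[random.uniform(-1.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        i, j = random.sample(range(pop_size), 2)
        winner, loser = (i, j) if match(pop[i], pop[j]) else (j, i)
        # The loser is replaced by a mutated ("bred") copy of the winner.
        pop[loser] = [w + random.gauss(0.0, sigma) for w in pop[winner]]
    return [sum(ind[k] for ind in pop) / pop_size for k in range(n_weights)]

# Hypothetical "match": the weight vector closer to a hidden target wins
# (standing in for the outcome of a chess game between two evaluation
# functions, which is what the paper actually uses).
TARGET = [1.0, -0.5, 2.0]
def closer_to_target(a, b):
    da = sum((x - t) ** 2 for x, t in zip(a, TARGET))
    db = sum((x - t) ** 2 for x, t in zip(b, TARGET))
    return da < db
```

After a few thousand tournaments the averaged population ends up close to the hidden target weights, mirroring how the evolved evaluation function climbs toward stronger play without hand-tuning.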
I had always heard about evolutionary methods and understood more or less how
they worked. This is, however, the first paper I have actually read that detailed
the equations, the mutation probabilities, and such. I found it to be extremely
interesting, clearly written, and hopefully useful. I am even considering “evolving”
a checkers evaluation function in my spare time to beef up my own autonomous
player. I think this would definitely be an option in selecting an evaluation
function for our project, though I’m afraid five weeks may fly by faster than
I would like!
As I mentioned earlier, an autonomous general for the military is a bit out
of reach at the moment. Therefore, I would like to implement an autonomous chess
player that would demonstrate that computers could at least reason capably in
a “simulated war” and that tactics and strategy could be well within the reach
of future generations of machines.
We would probably create a simulation in Visual Basic, perhaps researching,
understanding, and implementing one or two of the various algorithms currently
available and maybe even finding some way to add our own personal tweak to them.
We could hopefully then play the algorithms against one another to benchmark
their relative performances. If this project is acceptable, I think it would
be pretty likely that I would choose to work on it.
Hoffmann, J., Porteous, J., and Sebastia, L. (2004) "Ordered Landmarks in Planning", JAIR, Volume 22, pages 215-278.
I selected this paper because, as I suggested above, breaking the building of a
structure down into logical steps seems very natural and would undoubtedly simplify
the planning of such a large-scale objective. The paper builds on Hoffmann's earlier
work on reasonable orders for top-level goals and extends and adapts it to the task
of reasonable orders for landmarks. The paper goes through all the steps of
identifying candidate landmarks and verifying them, assessing (obedient) reasonable
orders, and using them to break the search space down into sub-tasks for planners
to handle. The results are then compared between ordered and unordered versions of
each search.
The paper was easily readable with a limited amount of jargon, good step-by-step
definitions, and examples of challenging concepts. Not only does it give a good
background and theory for their addition to path planning, it goes through
implementation, effectiveness assessment, and future work to be done. The paper's
weaknesses include the substantial length needed to go through everything and
the fact that the method does not stand on its own: the reader needs other
advanced material to utilize it. Another topic only lightly treated in the paper
is how best to resolve cycles in the ordering graph. Tests were alluded to, but
none were detailed for this step in the algorithm. An almost excessive number of
tests are detailed, such that you really feel the authors surveyed a broad
cross-section of planning problems. While this is good for understanding the
improvements made, it can also be deceptive as to what types of planning are
left out, and can lead to overconfidence in the method's positive effects.
Hoffmann, J. and Nebel, B. (2001) "The FF Planning System: Fast Plan
Generation
Through Heuristic Search", JAIR, Volume 14, pages 253-302.
I selected this paper to complement the previous one and get a good basis in the
actual planning techniques needed to solve the proposed problem. The paper
introduces a method for planning that puts together known pieces in a novel way
and adds certain performance tweaks. FF is a variant of heuristic search planning
(HSP) that uses GraphPlan to generate a heuristic based on relaxed paths to the
goal, together with enforced hill-climbing search and a couple of specific pruning
methods. The GraphPlan algorithm was shown to run in polynomial time for the
relaxed search, using the heuristics of No-Ops first, minimal difficulty, and
action-set linearization for the backtracking phase.
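FF's enforced hill-climbing can be sketched in a few lines. This is a hedged illustration, not FF itself: the heuristic `h` here is a stand-in for FF's relaxed GraphPlan estimate, and states and successors are left abstract:

```python
from collections import deque

def enforced_hill_climbing(start, successors, h):
    """Enforced hill-climbing in the style of FF: from the current state,
    breadth-first search for any state with strictly smaller h, commit to
    it, and repeat. Returns the sequence of committed states, or None."""
    plan = [start]
    current = start
    while h(current) > 0:
        # BFS outward from `current` until a strictly better state appears
        frontier = deque([(current, [])])
        visited = {current}
        better = None
        while frontier and better is None:
            state, path = frontier.popleft()
            for nxt in successors(state):
                if nxt in visited:
                    continue
                visited.add(nxt)
                if h(nxt) < h(current):
                    better = path + [nxt]
                    break
                frontier.append((nxt, path + [nxt]))
        if better is None:
            return None  # FF falls back to a complete best-first search here
        plan.extend(better)
        current = plan[-1]
    return plan
```

On a toy number-line domain where the heuristic is the distance to zero, the search walks straight downhill.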
The paper has many of the strengths of the one above, since the main author is
the same, but it is a bit less theoretical and has more of an applications-motivated
feel. For each portion of the algorithm, the authors looked at efficient
implementation, and while this added many small details, the total was a very
effective search. One of the weaknesses in the FF method described is its
over-reliance on benchmarks to motivate design choices.
While this makes it effective at solving those sets of problems efficiently, in
the future there may arise an important set of problems for which FF performs
horribly. To address this concern, the authors say they are working on classifying
the underlying trait of problems on which FF works well. I also question the
authors' decision that, upon failure of enforced hill-climbing search, all progress
should be discarded rather than backtracking a bit and using another full heuristic
search. They also described some methods for finding and avoiding dead-end states
that they said they did not wish to use, but which could have been considered upon
failure.
Ahmadabadi, M., and M. Asadpour. "Expertness Based Cooperative Q-Learning."
IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 32, no.
1
(February 2002).
This paper would be useful for taking into account unexpected circumstances
arising in a real-world context. The system would have to learn and adjust for
individual and environmental differences.
environmental differences. The text was meant to continue previous work on weighted
strategy sharing (WSS) by the same authors. It reviewed the previous method
and added
more realistic conditions (communication uncertainty, false data) and more testing.
This
paper complements the other two by looking at the layer below action planning.
It deals
with effective action execution and learning. The robots' experiences need to
feed back into the way actions are carried out; for instance, if one robot is
working more slowly than the others, that should figure into our heuristic for
actions, which could be accomplished through expertness criteria.
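The core of weighted strategy sharing can be sketched as follows. This is a simplified illustration, not the authors' exact formulation: the paper defines several expertness measures and per-agent impressibility weights, which are collapsed here into a single normalized weight vector over accumulated reinforcement:

```python
import numpy as np

def weighted_strategy_sharing(q_tables, expertness):
    """Weighted Strategy Sharing (sketch): each learner replaces its
    Q-table with an expertness-weighted average of all robots' tables.
    q_tables: list of (states x actions) arrays.
    expertness: list of positive scalars, e.g. accumulated rewards."""
    w = np.asarray(expertness, dtype=float)
    w = w / w.sum()  # normalize weights (assumes positive expertness)
    blended = sum(wi * q for wi, q in zip(w, q_tables))
    # every robot adopts the blended table (the paper lets each robot
    # compute its own weights; that refinement is omitted here)
    return [blended.copy() for _ in q_tables]
```

With expertness 3 : 1, the more expert robot's table dominates the blend 75% to 25%.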
The subjects were described in an easy-to-understand fashion with a minimum of
implementation details. The paper assumed a background in such systems, including
knowledge of Q-tables and how a learning system would be implemented in the first
place. The methods described were additions to such a multiple-learning-robot
platform, meant to take better advantage of cooperative learning. I found that the
paper did not have much to offer in impressive results, and the method was tested
on only one benchmark. The authors could have presented their results more
understandably and put them in the abstract. The organization and explanation of
relative effectiveness could have been clearer, emphasizing the most striking cases
out of the large set of tests. Overall, the results could be useful in that, where
positive reinforcement dominated, weighting by the positive signals worked best,
and the same held for negative reinforcement and its signals.
For my project, I could design a system to extract actions and states from a full
blueprint, including all materials and positions, for a portion of a building that
a group of robots is to construct. The planner would break down sequences of events
into the necessary landmarks in the process and assign efficient tasks to sets of
robots. The system could use some kind of knowledge set to construct the state
network and then use the FF planning and reasonable orders detailed above. Then,
if time allows, another layer could be added to the planning to take account of
feedback and learning from the robots as the structure progresses. The robots'
actions would be simulated, with random discrepancies added.
1) I am primarily interested in SLAM. I plan to do my project on an
application of SLAM, and thus I am interested not only in the method we
learned about in class, but also the method outlined in the DP-SLAM
reading on the course webpage. I only want to implement one method, but I
need to know how both methods work so that I can figure out which is
easiest/best to implement. I will probably want to do my advanced lecture
on a SLAM algorithm.
2) I am also interested in probabilistic roadmaps. We covered basic
probabilistic roadmaps in class, which is something that I use in my own
research, to control two simulated robotic arms. I am also interested in
learning more about RRTs and MOPs, and would not mind doing an advanced
lecture on a specific algorithm related to them.
3) Finally, as a topic I am very interested in learning about in class
that hasn't been covered, I am particularly interested in MDPs and POMDPs.
I have learned about HMMs, but not about POMDPs, and the information in
Russell and Norvig is woefully inadequate.
Part D: Researching a Critical Reasoning Method
Paper 1: Thrun, Sebastian. "A probabilistic online mapping algorithm for teams
of mobile robots." International Journal of Robotics Research, 20(5):335-363,
2001.
The SLAM algorithm we learned about in class (Leonard's method of
using multiple overlapping submaps with Kalman filters) is useful when
errors in your robot's motion can be accurately represented using a
Gaussian. However, for robots with nonlinear, inaccurate motion, using a
Gaussian to represent the robot pose can be a poor approximation.
Instead, one can use particle filters, which represent the robot pose
essentially with a cloud of possible poses. Instead of a mean and
covariance matrix, the uncertainty in the robot pose can be captured by
recording a large set of 'particles' that each contains one possible robot
pose. Each particle is weighted by the probability that the robot is
actually in that pose based on the latest observation. At each time step,
the particles are sampled according to their probabilities, and the motion
model is used to extend them according to the new motion input (with
randomly sampled error included). Thus, the particle cloud expands as the
robot moves, but a good observation will assign high weight only to those
particles that are likely correct, causing the cloud to shrink again.
This is also a good method to represent branching hypotheses; for
instance, if the robot could likely be in one of two separate places, the
standard Kalman filter method cannot represent this fact with a Gaussian.
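The sample-propagate-weight loop described above can be sketched in a few lines. This is a minimal 1-D illustration of one filter step, not a full SLAM implementation, and the observation model is supplied as a function:

```python
import random

def particle_filter_step(particles, motion, observe_likelihood,
                         motion_noise=0.1):
    """One particle-filter step on the robot pose, following the text:
    weight each particle by the latest observation, resample in
    proportion to the weights, then push each survivor through the
    motion model with randomly sampled error."""
    # Weight each particle by the likelihood of the latest observation
    weights = [observe_likelihood(p) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample (with replacement) according to the weights
    resampled = random.choices(particles, weights=weights, k=len(particles))
    # Propagate through the motion model, adding sampled error
    return [p + motion + random.gauss(0.0, motion_noise) for p in resampled]
```

Starting from a widely scattered cloud, a sharp observation collapses the particles around the observed pose, as the text describes.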
The idea of particle filters is the basis for a whole set of major
SLAM algorithms, mostly developed by Sebastian Thrun. The reason I picked
this paper is because, unlike later papers that all assume that you know
all about particle filters and basic SLAM, it gives a coherent explanation
that starts from the beginning. This includes what particle filters are,
as well as how to do other low level details of the SLAM algorithm, such
as how to compute the most likely robot pose given a map estimate (using
gradient ascent), or how to represent the map as an occupancy grid whose
squares reflect the probability that there is an obstacle there.
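Occupancy-grid cells are commonly maintained in log-odds form so that each Bayesian update reduces to an addition; a minimal sketch, where the sensor-model probabilities are assumed values rather than numbers from the paper:

```python
import math

def update_cell(log_odds, hit, p_hit=0.7, p_miss=0.4):
    """Bayesian log-odds update for one occupancy-grid square.
    `hit` is True when the laser return suggests an obstacle in the
    square, False when the beam passed through it. p_hit and p_miss
    are assumed inverse-sensor-model probabilities."""
    p = p_hit if hit else p_miss
    return log_odds + math.log(p / (1.0 - p))

def probability(log_odds):
    """Recover the occupancy probability from its log-odds form."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))
```

Repeated hits drive a cell's probability toward 1, repeated pass-throughs toward 0, starting from the uninformed prior of 0.5 (log-odds 0).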
The actual system discussed in this paper is somewhat primitive
compared to later particle-filter-based SLAM algorithms. In summary: the
robot uses a laser rangefinder, and the map is represented using an
occupancy grid rather than landmarks. However, only the maximum
likelihood map is kept track of, rather than multiple map hypotheses, as
we will see in later papers. The system is able to close loops, but it
does so in one lump adjustment using a lot of gradient ascent once a
conflict is found. The paper also discusses mapping with multiple robots
that can communicate with each other constantly (one robot is the 'team
leader', and new robots must be able to localize themselves in the team
leader's map, at which point they start adding to the map just as the
first robot does) and 3-D structural and texture mapping (their robot gets
a panoramic camera; the 3-D info is not used in SLAM in any way, the 3-D
images are just built using the SLAM results).
The algorithm used in this paper will probably not be used for my
cognitive robot, but many of the low-level details from this paper will
probably be used, and knowing this algorithm helps one to understand the
algorithms in later papers. While it might be useful for multiple spy
robots to be able to make the map together, it is not necessarily safe to
assume that they can be in constant communication with each other.
Paper 2: M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. "FastSLAM:
A
factored solution to the simultaneous localization and mapping problem."
In Proceedings of the AAAI National Conference on Artificial Intelligence,
Edmonton, Canada, 2002. AAAI.
FastSLAM appears to be a well-known, much-used algorithm, which is
why I chose this paper. As before, a laser rangefinder is used to collect
data, and the algorithm is based on particle filters. Unlike the last
paper, landmarks are used rather than occupancy maps. Also unlike the
last paper, each particle contains not only the robot pose, but also a
Gaussian for each landmark in the map. Thus, each particle records one
hypothesis for the map, and these can diverge widely if need be, rather
than relying only on the one maximally likely map as in the last paper.
Despite estimating the full posterior distribution over both the robot
pose and all the landmark locations, the computation time scales
logarithmically with the number of landmarks in the map (O(M log K), where
M is the number of particles and K is the number of landmarks). This is
not as good as the O(1) time complexity of Leonard's submap algorithm,
but it appears to be sufficient for quite large maps (with 1 million
landmarks) even with only 100 particles, and it was just as accurate as
Leonard's method on the locations tried.
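The per-particle landmark bookkeeping can be sketched as follows. This is a deliberately simplified illustration: it assumes the sensor measures a landmark's world position directly (so the measurement Jacobian is the identity), whereas FastSLAM linearizes a range-bearing model, and it omits the balanced-tree landmark storage that yields the O(M log K) bound:

```python
import numpy as np

def update_landmark(particle, lm_id, z, R):
    """FastSLAM-style per-particle landmark update (sketch). A particle
    is (pose, landmarks), where landmarks maps an id to one small
    Gaussian (mean, covariance). z is assumed to be a direct, noisy
    measurement of the landmark's world position with noise covariance
    R. Returns the particle's importance weight for this measurement."""
    pose, landmarks = particle
    if lm_id not in landmarks:
        landmarks[lm_id] = (z.copy(), R.copy())  # initialize on first sighting
        return 1.0
    mu, sigma = landmarks[lm_id]
    S = sigma + R                   # innovation covariance (H = I here)
    K = sigma @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu + K @ (z - mu)
    sigma_new = (np.eye(len(mu)) - K) @ sigma
    landmarks[lm_id] = (mu_new, sigma_new)
    # Importance weight: likelihood of z under the innovation Gaussian
    diff = z - mu
    return float(np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff)
                 / np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(S)))
```

Each particle thus carries its own hypothesis for every landmark, which is what lets the candidate maps "diverge widely if need be."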
This paper also discusses the data association problem, which is
the problem of deciding whether a landmark is the same as one you've seen
before (and which one it is) or if it's a new one, which is another
low-level detail not often covered in these papers.
For my cognitive robot, I would use a particle-filter algorithm,
but may or may not want to use landmarks (vs. an occupancy map). If I
were to use landmarks, this is probably the algorithm to use.
Paper 3: "DP-SLAM: Fast, Robust Simultaneous Localization and Mapping
Without Predetermined Landmarks." Austin Eliazar and Ronald Parr, 18th
International Joint Conference on Artificial Intelligence (Acapulco,
August 2003).
This paper is on the reading list for the class, and thus it is
the one I started with before realizing that I had no idea what particle
filters were. However, after reading the first two papers (the algorithm
in this paper is an extension of the work in the first two papers), it
becomes much easier to understand.
The algorithm in this paper combines features of the first two
papers. Like FastSLAM, it represents the entire map with each particle,
and can thus keep track of hundreds of candidate maps. However, unlike
FastSLAM, and like the first paper, it uses an occupancy grid as its map
representation. As before, a laser rangefinder is used to collect data.
The main idea is that it finds a way to make copying of particles feasible
when each particle has to keep track of an entire occupancy grid. If each
new particle just copied over the old map and changed it, it would take
O(MP) time, where M is the size of the grid and P is the number of
particles. Instead, the algorithm presented keeps a particle ancestry
tree, so that each particle does not have to contain the entire map. Each
square of the occupancy grid keeps a separate tree that keeps track of the
IDs of every particle that has made changes to the occupancy of that
square, along with their observations. Each particle keeps track of its
ancestors, and thus when the particle wants to find the value of a grid
square, it checks each of its ancestors to find the most recent change to
that square. If nobody has changed that square, the occupancy of that
square is unknown.
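The ancestry lookup can be sketched as follows. Note that, for brevity, this inverts the paper's bookkeeping: DP-SLAM stores, per grid square, a tree of the particle IDs that edited it, while this sketch stores each particle's own edits and walks up the ancestry chain. The lookup logic (most recent ancestral change wins, otherwise unknown) is the same:

```python
class Particle:
    """One node in the particle ancestry tree: a pointer to its parent
    and only the grid squares this particle itself has edited."""
    def __init__(self, parent=None):
        self.parent = parent
        self.edits = {}  # grid square -> occupancy value

def grid_value(particle, square, default=None):
    """Read a grid square for a given particle by walking up the
    ancestry chain and returning the most recent edit. If no ancestor
    ever touched the square, its occupancy is unknown (`default`)."""
    while particle is not None:
        if square in particle.edits:
            return particle.edits[square]
        particle = particle.parent
    return default
```

Because a child particle stores only its own changes, spawning a new particle never copies the whole map, which is the feasibility trick the paper is after.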
The complexity of this algorithm can become somewhat bad (O(A D P lg P), where
A is the area that the laser sweeps out, D is the depth of the ancestry
tree, and P is the number of particles), but in their experiments the
complexity was closer to A·lg²(P), which is less than M, the size of the
occupancy grid.
If I wanted to use an occupancy grid rather than landmarks for my
cognitive robot, this is probably the algorithm I would use.
My cognitive robot was a tiny spy robot (think cockroach-bot), capable of
slipping in through a crack in a building and autonomously mapping the
place while detecting motion and trying to avoid being seen. Even though
I would do my project in simulation, the full capabilities of such a robot
would be too much to implement.
Thus, here is a manageable project that embodies the salient
features: a tiny simulated robot, driven by a human (to avoid having to
develop a solid exploration function) to map the inside of a simulated
building using either FastSLAM or DP-SLAM. Even ignoring the
motion-detection-and-people-avoidance problem, this is already a
significant project. This is because the tiny size of the robot requires
the robot to do things a bit differently than most SLAM robots. The robot
would have to be able to get over low thresholds and the like, and so a
real-life tiny spy robot would probably have to be legged, likely giving
it terrible odometry. (The simulated robot need not be legged; it just
needs to approximate the movements of a legged robot.)
While no tiny laser rangefinders are currently available, it is quite
conceivable that a tiny laser rangefinder could be developed. This is
because the laser itself is just a tiny diode; most of the mechanism of a
laser rangefinder is for precisely aiming the laser. If a MEMS device
with tiny mirrors could be built to do that job, a laser rangefinder could
be made that was tiny enough to fit on a tiny cockroach-bot. However, the
robot would have to creep along walls to avoid being seen, rather than
charging down the middle of large corridors, possibly complicating the
typical problem of exploring corridors and rooms.
Support Vector Machines – Kernel Methods
Support vector machines (SVMs) are a relatively new family of methods for pattern
analysis. They can be viewed as a particularized form of neural networks. SVMs
generally are composed of two parts: a portion that handles mapping of the data
into a feature space, and a learning algorithm designed to discover linear patterns
in that feature space. These can be used for a wide variety of data, from
simple types such as text and images through more complex objects such as
biosequences, graphs, and grammars. I am particularly interested in how this new
topic will contribute to machine vision.
Nello Cristianini and John Shawe-Taylor wrote An Introduction to Support Vector
Machines, which has since been used as a main research reference book for this
topic. This past year, they followed that work with Kernel Methods for
Pattern Analysis. This work provides a large toolset of algorithms, kernels, and
solutions to be implemented that will be very useful for the aforementioned
fields.
Path-Planning – MILP using Receding Horizon Control
I am particularly interested in this topic to see how it can be applied to the
Mars Airplane project. Path planning itself has (relatively) long been a research
focus in artificial intelligence. Russell and Norvig mention that STRIPS was
the first major planning system, but I am particularly interested in Mixed Integer
Linear Programming (MILP) applied to path planning for autonomous aerial
vehicles. Jonathan How and Eric Feron at MIT have contributed recent work in
this area that I want to investigate further.
Decision Making – POMDPs
Sequential decision making can be specified with a Markovian transition model
and additive rewards in what is called a Markov Decision Process (MDP). When
the environment is only partially observable, the problem becomes harder.
Partially Observable Markov Decision Processes (POMDPs) reflect a more accurate
description of the real world. The problem can be divided into belief states,
which can then be mapped to an optimal action. This provides the basic framework
for approaching the problem.
The Witness algorithm, as presented by Kaelbling, Littman, and Cassandra in
“Planning and acting in partially observable stochastic domains,” provides a
solution for problems of smaller size, and I am investigating current work that
delves further into this topic.
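The belief-state machinery underlying all of this reduces to a one-line Bayes update; a minimal sketch with assumed array conventions (not taken from any of the papers above):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief-state update for a POMDP (sketch).
    b: belief vector over states.
    T[a][s, s']: probability of moving from s to s' under action a.
    O[a][s', o]: probability of observing o after action a lands in s'."""
    predicted = b @ T[a]               # push the belief through the transition model
    unnorm = predicted * O[a][:, o]    # weight by the observation likelihood
    return unnorm / unnorm.sum()       # renormalize to a probability vector
```

A policy then maps the resulting belief vector, rather than a hidden state, to an action.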
Georgios Theocharous and Leslie Pack Kaelbling, "Approximate Planning in
POMDPs with Macro-Actions," Advances in Neural Information Processing Systems
16, Vancouver, 2004
This paper was selected since large-scale planning using macro-actions in
POMDPs will help us to effectively manage and solve for POMDP solutions in a
large belief space. This will directly apply to the Mars Airplane since we will
need to manage the belief space for the potential areas that the system can
travel.
This paper’s major contribution is to present an algorithm that will allow us
to learn POMDP policies faster and execute information gathering more efficiently.
The algorithm does this by using a dynamic grid approximation to stratify and
approximate the belief space, and it uses macro-actions to reduce the number
of belief states visited by the agent. In addition, it uses an effective
reward-shaping approach that applies Monte Carlo updates to the Q-values.
While the paper shows the benefits of using macro-actions which in effect stratifies
the belief space, it does not show how to dynamically generate good macro-actions
that will be effective in creating a good policy. The paper notes that this
is ongoing research and will be a further subject in my future literature search.
This paper relates to the next paper that I will be discussing since the trajectories
created by that algorithm can be viewed as macro-actions for the Mars Airplane.
The third paper relates to this paper by investigating planning rules that may
help us in choosing good macro-actions.
John Bellingham, Arthur Richards, and Jonathan How, “Receding Horizon Control
of Autonomous Aerial Vehicles, ” submitted to American Controls Conference 2002
This paper was selected since it presents a strategy for using Receding Horizon
Control to effectively generate trajectories that we will be able to use for
the Mars Airplane. Part of the concern for the airplane is its ability
to navigate the topography of Mars, as it may be directed to fly closer to the
terrain if its science missions require it for exploration. We will need to
be able to quickly calculate new trajectories if the higher-level mission planner
decides to modify its path goals and investigate a new area due to its sensors.
The major contribution in this paper is to demonstrate a new strategy for using
receding horizon control while solving a MILP problem that will allow us to
avoid entrapment. They do this by using a cost map to estimate the future cost
that is beyond the receding horizon. Receding horizon control allows us to solve
for trajectories while reducing the complexity of the task, as measured by computation
time. The trajectories generated are also still close to the time optimal trajectory.
This is a major strength of the paper and will allow us to change course and
redirect the system’s exploration.
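The receding-horizon idea can be sketched on a grid. This is a hedged stand-in, not the paper's method: plain enumeration of short move sequences replaces the MILP solve at each step, and the cost map estimating the cost beyond the horizon is supplied as a function:

```python
def receding_horizon_plan(start, goal, neighbors, cost_to_go,
                          horizon=3, max_steps=100):
    """Receding-horizon planning (sketch): look only `horizon` moves
    ahead, score each candidate by steps spent inside the horizon plus
    a cost-map estimate of the cost beyond it, commit to the first move
    of the best candidate, then replan from the new position."""
    path = [start]
    for _ in range(max_steps):
        if path[-1] == goal:
            return path
        # Enumerate every horizon-length move sequence from the current cell
        frontier = [[path[-1]]]
        for _ in range(horizon):
            extended = []
            for seq in frontier:
                if seq[-1] == goal:
                    extended.append(seq)  # reached the goal early; stop extending
                else:
                    extended.extend(seq + [n] for n in neighbors(seq[-1]))
            frontier = extended
        # In-horizon cost + cost map beyond the horizon
        best = min(frontier, key=lambda seq: (len(seq) - 1) + cost_to_go(seq[-1]))
        path.append(best[1])  # commit only to the first move, then replan
    return path
```

The quality of the cost map is what prevents entrapment, which is exactly the point the paper makes about estimating the cost beyond the horizon.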
The key aspect of this algorithm is the generation of the cost map. This is
one area where I wish the paper had more details, but I intend to pursue
other literature that will shed light on more of the process of creating the
cost map.
As I mentioned earlier, I see the relationship of this paper with the prior
paper as a component of the overall system. I hope to be able to integrate the
strategies presented to help create a successful design.
Hanna M. Pasula, Luke S. Zettlemoyer, and Leslie Pack Kaelbling, "Learning
Probabilistic Relational Planning Rules," Proceedings of the Fourteenth
International Conference on Automated Planning and Scheduling, 2004
This paper was selected to investigate the methods we can use to choose good
macro-actions for the Mars Airplane. We will need to be able to come up with
a system that can learn which areas will be useful for further investigation.
This paper demonstrates a new algorithm for the learning of rule-based operators
in a simple set of domains.
The paper does not investigate the partially observable cases, which will be
critical to our system. But that is not its intended goal. It instead presents
an algorithm that is demonstrated to be able to use relational rules to learn
more effectively than Dynamic Bayesian Networks (DBNs).
The major strength of this work is that it represents action effects using a
set of alternative outcomes. Previous work on learning deterministic action
models is unable to do this. This is important since actions can have an
unlikely outcome that should be modeled explicitly.
While this paper differs from the other two in that its algorithm targets a
static, discrete domain, it provides an effective method for learning relational
rules that is more robust than previous work in this area. I am hoping to use
its concepts to help build a system for discovering rules for effective
exploration or other mission goals for the Mars Airplane.
I am particularly interested in attempting to integrate a finite horizon POMDP
into a high level mission planner for system integration with a MILP trajectory
designer to be used on the Mars Airplane. I am fairly certain now that I will
choose to work on this high level mission planner. I plan to use a simulator
to feed in the topography of Mars and generate new mission goals with their
uncertainties in real-time. These new mission goals may take the form of water
erosion sites or ground rovers distributed across a plain. The simulator will
have a flight model of the Mars Airplane and be designed to receive inputs from
the MILP trajectory planner.
A potential expansion to this project would be image analysis using pattern
recognition, perhaps with support vector machines or other tools, to recognize
and score new mission goals. In particular, this could be used for scoring new
sites of potential water erosion, or to identify ground vehicles.
Another expressed interest for the Mars Airplane is model-based diagnosis for
fault tolerance of the system. This again would be an expansion to this project
and if others are interested, please contact me.
Another interest from Draper Laboratory is the use of path-planning to manage
turbulence for air drops. To integrate this with the Mars Airplane, we can use
the tools generated to manage the wind turbulence for the high level mission
planner and coordinate a new path-plan in the MILP trajectory planner that will
allow successful delivery of the equipment. We will have to then add to the
simulation, air drops of new scientific tools and equipment for rovers on the
ground.
Robotic articulated systems for mobility
Robotic articulated structures other than humanoid robots, for instance in
vehicle design.
There has been a lot of research done on humanoid robotic systems, given the
obvious challenge of replicating some or all of our human motor and perceptual
skills, largely driven by the interest in replacing or assisting humans in
dangerous or demanding tasks, from fighting wars to assisted living.
Sports often act as a catalyst in pushing forward technological development
related to human activity. Sports equipment also often acts as an extension of
the body, augmenting human movement and reach or interacting with the body.
Negotiating human control input and robotic autonomous control
How to translate human control input into motions of a robotic articulated
structure, resulting in "legal" moves that do not put the structure in danger
but still follow the human intention as closely as possible within the
constraints of the system.
Teaching robotic articulation beyond copying movements as recorded motion.
If a robotic structure can articulate itself over several degrees of freedom,
expression becomes a possibility. Just as an instrument offers a range of ways
of being played, some better than others, an articulated structure offers a range
of often redundant motion paths leading to the same result. Is there such a thing
as graceful robotic movement? Graceful, of course, can be defined on many levels:
for instance, from the perspective of energy efficiency, shortest path, continuity
of movement, acceleration, appearance of the movement path, registration with
some external cue, and so on. How to design and teach such graceful motion is a
challenge I would be interested in exploring.
Real-Time Randomized Path Planning for Robot Navigation
James Bruce and Manuela M. Veloso Computer Science Department Carnegie Mellon
University 5000 Forbes Avenue Pittsburgh PA 15213, USA {jbruce,mmv}@cs.cmu.edu
Reason to select the paper: The exploration of random trees for path planning
in highly
dynamic domains could serve as a relevant precedent for driving the articulated
vehicle.
Major contributions: Improved efficiency in rapidly exploring large state
spaces without tiling, and improved path-planning speed in combination with
kinodynamic path planners.
Strength: Clear illustration of the RRT path planning technique for the robo
soccer
domain.
Weaknesses: The paper lacks a detailed explanation of how the randomized paths
are generated in a generalized fashion from the robot's navigation constraints.
Relationship to the other two papers: The path planning addresses higher-level
navigational tasks that look into the future and are only partially influenced
by the constraints of the robot's embodiment. The randomly generated paths are
all valid paths, though, and therefore do not have to be translated into
robot-specific actions.
How is it applicable to the cognitive system I develop?
It is crucial for the vehicle to be able, at any time, to scan the possible
moves it can make next in reaction to possible user input and choose the one
that gets it closest to the goal. Since the vehicle is riding on spatially
highly constrained streets at relatively high speeds, it is important to
configure the RRT so the randomly generated paths are as close to continuous
as possible within the road boundaries.
The paper demonstrates the advantage of rapidly-exploring random trees (RRTs)
in highly dynamic domains, in this case robo soccer. The advantage of employing
RRTs is to turn a reactive system into a high-performance path-planning system.
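A bare-bones RRT can be sketched as follows. This is a generic illustration (a 1-D configuration space in the usage below, for brevity), not the real-time ERRT variant the paper describes; the sampling, extension, and goal-test functions are supplied by the caller:

```python
import random

def rrt(start, goal, sample, extend, close_enough, max_iters=2000):
    """Bare-bones RRT (sketch): repeatedly sample a configuration, find
    the nearest tree node, and extend it one step toward the sample.
    With some probability the goal itself is sampled, biasing growth
    toward it. Returns the node path from start to goal, or None."""
    nodes = [start]
    parents = {start: None}
    for _ in range(max_iters):
        target = goal if random.random() < 0.1 else sample()
        nearest = min(nodes, key=lambda n: abs(n - target))
        new = extend(nearest, target)
        if new in parents:
            continue  # already in the tree; nothing gained
        parents[new] = nearest
        nodes.append(new)
        if close_enough(new, goal):
            # walk back up the tree to recover the path
            path = [new]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return list(reversed(path))
    return None
```

The goal bias is what gives the tree its planning (rather than purely exploratory) character.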
A control structure for autonomous model helicopter navigation
Gilbert Lai, Kingsley Fregene, David Wang
Reason to select the paper: The paper addresses autonomous control systems in
a
model-sized helicopter, which is related to the articulated chassis problem
I am trying to
address. The paper lays out a hierarchical software architecture that structures
the
problem into different layers.
Major contributions: Three layer hierarchical functional control structure
Strength: Clear layout of the problem domain and focus on the lower level helicopter
dynamics specific to this particular type of robot.
Weaknesses: The paper deviates very little from the conventional control layer
hierarchy
in comparison to other papers I came across.
Relationship to the other two papers: Compared with the robo-soccer paper, this
paper addresses much more the lower-level, robot-specific issues in autonomous
control systems.
How is it applicable to the cognitive system I develop: The model helicopter
is a good example of a semi-autonomous scale model exhibiting a number of
degrees of freedom, and the paper illustrates the controller architecture used
in a straightforward fashion.
Human-Machine Cooperative Telemanipulation with Motion and Force
Scaling Using Task-Oriented Virtual Tool Dynamics
Tomotaka Itoh, Member, IEEE, Kazuhiro Kosuge, Member, IEEE, and Toshio Fukuda,
Fellow, IEEE
Major contributions: The realization of a new human-robot telemanipulation
system with motion and force scaling based on semi-autonomous virtual tool
dynamics.
Strength: The paper clearly addresses how movements at human scale and strength
can be mapped, scale-independently, onto a range of telemanipulated devices
using virtual tool dynamics.
Relationship to the other two papers: Where the first paper addresses
higher-level path-planning issues in relation to a dynamic environment, and
the helicopter paper demonstrates robot-specific dynamics details for the
control system, the telemanipulation paper addresses human-machine interaction
and the very important aspect of force scaling. Human interaction can have an
impact on all levels of the control architecture and is thereby a crucial
component of any autonomous system that will interact with humans.
How is it applicable to the cognitive system I develop: The articulated car
should be capable of processing human input to its chassis articulation as it
navigates the roads, in addition to the autonomous stability control. The
control movements of the driver would also have to be scaled up, both in range
of motion and in force, to allow the driver to experience driving as a
body-like extension. The introduction of a virtual chassis model to negotiate
the driver's interaction with the control mechanism and the articulation of the
car chassis seems like a promising technique for avoiding oscillation and
feedback problems.
Project for the cognitive robot
I am developing an articulated chassis for an ongoing group project that is
exploring the possibility of deploying an articulated chassis for a sporty
vehicle.
One of the challenges is mapping the body movement onto the articulated chassis
in a way that allows for the exploration of new steering movements but at the
same time does not endanger the stability of the vehicle. The articulated
vehicle should be capable of operating autonomously in response to high-level
user input, such as acceleration, deceleration, and steering, while keeping its
balance and safe driving dynamics. In addition, a user should be able to
override or augment the robotic control with human input, with the goal of
allowing individual variations of the autonomously controlled moves.
The project has a crude but functioning servomotor-based mockup for testing the
interplay of the range of degrees of freedom. The goal of the final project
would be to develop a simple software architecture that translates user steering
input into the appropriate chassis movement, using, for instance, an
acceleration sensor for feedback and a simple path-planning strategy to explore
the range of possible motions and match the user input to the robotic chassis.
I am fascinated with the following topics:
*Path Planning:
I am interested in learning answers to the following questions:
How do you plan the shortest / fastest / cheapest path in a known map to reach a goal? How do you include constraints such as mountains or threats? How do you deal with unpredictable threats? How do you make a smooth path from a coarse path (for instance, a draft drawn by a human)?
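For the first of these questions, a minimal grid-based A* search is a standard starting point. The sketch below is my own illustration: the grid encoding, 4-connected moves, and treating mountains as impassable cells are all assumptions, not taken from any of the papers discussed.

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 2D grid; grid[r][c] == 1 marks a mountain (impassable)."""
    rows, cols = len(grid), len(grid[0])

    def h(p):  # Manhattan-distance heuristic, admissible for 4-connected moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path so far)
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier,
                               (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable
```

Threat avoidance could be folded in by raising the step cost of cells near threats instead of blocking them outright.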
*SLAM
How does a robot build a map of its environment while localizing itself in it? Which strategies suit a team of robots performing SLAM (for example, half of them moving while the rest remain as landmarks)?
*Voice & Vision
I am fascinated with the idea of a robot being able to interpret its visual and auditory environment. I would like to learn more about how a robot can deduce, from a snapshot of its environment, the things that are actually surrounding it. I would like to learn more about strategies to identify a known object when it is hidden or seen from another point of view. I would also like to learn how voice recognition software works.
Paper 1
Title
Experimental Demonstration of Multiple Robot Cooperative Target Intercept,
T. W. McLain, R. W. Beard, J. M. Kelsey
Why I chose this paper
I chose this paper because it deals with path planning for a team of robots cooperating to attack several targets at the very same time. This topic is obviously close to my project in that it implements path planning for a team of robots. Moreover, the constraints in this paper are close to those in my project: threats here play the role of the mountains in my project, and pop-up threats are similar to mountains not present on the robots' map.
Major contributions
Its major contribution is an experimental demonstration of the concept. The robots manage to plan their paths cooperatively so as to reach the targets at the same time. The paper presents a method for formulating the goal in a way that makes decentralization easy. It also presents a way to generate kinematically acceptable paths.
Strengths
The paper presents a decentralized control strategy for the team to reach the targets together. It presents a method to generate trajectories that respect the kinematic constraints. It demonstrates the success of experiments based on these principles.
Weaknesses
The control strategy is not explained very well.
Relationship with the system I want to implement
In this paper, a team of robots plan their paths to targets they want to reach
at the very same time. In my project, I will also work on path planning, and,
if possible, also for a team of robots. The threats that the robots in this
paper want to avoid are similar to the mountains my robots will avoid.
Paper 2
Title
Probabilistic Path Planning for UAVs, A. Dogan
Why I chose this paper
This paper presents a probabilistic approach to path planning, which I find interesting. It also presents a smart way of trading off the length of the path against the probability of being shot down by threats.
Major contributions
This paper presents a new probabilistic approach to path planning for a team of UAVs.
It parametrizes the problem so that the user can easily trade off the desire for a short path against the fear of threats.
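This kind of trade-off can be expressed as a single weighted cost over candidate paths. The functional form below is my own illustrative sketch; in particular, the inverse-distance risk term is an assumption, not the paper's actual parametrization:

```python
import math

def path_cost(path, threats, w):
    """Weighted cost: w favors short paths, (1 - w) penalizes passing near threats.
    path is a list of (x, y) waypoints; threats is a list of (x, y, strength)
    tuples. Both the risk model and the weighting scheme are illustrative."""
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    # Risk grows as waypoints approach a threat (inverse-distance falloff).
    risk = sum(s / (1.0 + math.dist(p, (tx, ty)))
               for p in path for tx, ty, s in threats)
    return w * length + (1.0 - w) * risk
```

Sweeping `w` from 0 to 1 recovers the spectrum from maximally cautious to maximally direct paths, which is the user-tunable behavior the paper aims for.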
Strengths
Because the approach is probabilistic, it handles pop-up threats in exactly the same way, without any changes.
Weaknesses
The method presented does not avoid limit cycles in the paths; although it detects them, it does so too late.
Also, the local minimization method does not produce an optimal path.
Relationship with paper1
This paper covers another aspect of my project: it deals more with the representation of the constraints and threats. Also, it addresses a single robot rather than a team, as Paper 1 does.
Relationship with the system I want to implement
I am interested in the idea of using a probabilistic description of the mountains. In my project, the robots may or may not have a map. In any case, the map is not perfect, so a probabilistic description may fit my goal well.
Paper 3
Title
Applying Kinodynamic Randomized Motion Planning with a Dynamic Priority
System to Multi-Robot Space Systems, C M Clark, T Bretl, S Rock
Why I chose this paper
In this paper, a new dynamic priority system is presented. With it, the robot whose environment is most crowded has the highest priority to plan its path.
Major contributions
This paper shows a new method to avoid collisions: instead of the static priority systems used previously, it presents a dynamic priority system in which the robots whose environments are most crowded decide their paths before the others.
Strengths
Thanks to the algorithm presented in this paper, a large team of robots can perform path planning while avoiding collisions, by cleverly setting the priorities among the various robots.
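The dynamic-priority idea can be sketched as sequential prioritized planning, where the most crowded robot commits its path first and later robots plan around it. The crowding measure and interfaces below are my own assumptions for illustration, not the paper's actual algorithm:

```python
def plan_in_priority_order(robots, crowding, plan_path):
    """Sequential prioritized planning: the robot in the most crowded
    neighborhood plans first; later robots treat earlier paths as obstacles.
    crowding maps robot -> a crowdedness score (a hypothetical proxy for the
    paper's priority measure); plan_path(robot, reserved) returns a path
    avoiding the already-reserved paths."""
    reserved = []  # paths already committed, treated as moving obstacles
    plans = {}
    for robot in sorted(robots, key=lambda r: -crowding[r]):
        path = plan_path(robot, reserved)
        plans[robot] = path
        reserved.append(path)
    return plans
```

Recomputing the crowding scores as robots move is what makes the priority assignment dynamic rather than static.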
Weaknesses
No special weakness noted.
Relationship with papers 1 and 2
This paper is more interested in collision avoiding, while paper 1 deals with
cooperation to reach the goals at the same time. Paper 2 dealt with a probabilistic
representation of the constraints.
Relationship with the system I want to implement
In the system I want to implement, I need a strategy to set the priorities in
the team of robots. For instance, if the position of the people to rescue is
unknown, they have to cooperate, in a certain priority order, to scan the whole
area.
For my project, I would like to design intelligent software for UAVs trying to rescue people lost in the mountains, while avoiding obstacles.
There are various levels in this project, which I could do depending on how
hard they are:
-at a first level, one helicopter (here a point with basic kinematic restrictions) planning the fastest path to reach a known goal in a known mountainous environment.
-adding unknown threats.
-having the goal uncertain / completely unknown.
-adding other helicopters to cooperate (sharing map updates / people-position updates / threat-knowledge updates).
-adding helicopter-like motion behaviour / adding fuel constraints.
-adding altitude changes.
-multiple people to save.
-remaining as far as possible within radio reach of the base.
I guess I could do the first 3-4 points. However, I do not know how ambitious that is.
I will actually pursue this project, or maybe do something slightly different but still related to Aeronautics/Space (for instance, a team of UAVs attacking one or several targets in enemy territory). However, again, I do not know what is interesting or ambitious in this project.
i. Reward-based planning under uncertainty. What are effective ways to plan
given that state favorability is expressed with a known reward function but
the world is
not completely observable? I.e., how do we marginalize the PO in POMDP?
ii. Bayesian inference distributed over multiple agents. Given a collection
of
agents with partial knowledge of the world (derived from limited observations),
how do
we aggregate the information into a form suitable for statistical inference?
iii. Rule-based actions and logical description of environment. What are suitable ways of logically describing an environment and the robot's actions? What are robust
Friedman
et al., “Learning Probabilistic Relational Models”,
This paper provides an introduction to the concept of a Probabilistic Relational
Model
(PRM). As I mentioned in the previous part, PRMs are like Bayesian networks
except
that they admit a relational structure over the variables that are being described.
Thus,
we have objects with certain types and attributes, where the values a certain
object’s
attribute assumes are correlated with other objects to which it is related.
Again, the
conditional probability relationships can be expressed in a graphical structure
like a
Bayesian network. This paper describes how to find the parameters of the model given a schema and a corpus of relational data. To do so, it builds on existing methods for training Bayesian networks.
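Since the parameter-estimation step ultimately reduces to the same counting used to train Bayesian networks, a minimal maximum-likelihood conditional probability estimate over a flat table of records might look like this. This is my own toy illustration; the paper's full algorithm also handles relational slot chains and aggregation:

```python
from collections import Counter

def mle_cpd(records, child, parents):
    """Maximum-likelihood conditional probability estimate by counting:
    P(child = v | parents = u) = N(u, v) / N(u).
    records is a list of dicts mapping attribute names to values."""
    joint = Counter(tuple(r[p] for p in parents) + (r[child],) for r in records)
    marg = Counter(tuple(r[p] for p in parents) for r in records)
    return {key: n / marg[key[:-1]] for key, n in joint.items()}
```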
This paper is important because it is widely cited and seems to have started
the
field. It is important for our purposes because it is necessary to understand
PRMs before
one can move on to DPRMs, the structure that we will be using to model state
as it
evolves through time.
Sanghai
et al., “Dynamic Probabilistic Relational Models”,
In this paper, DPRMs are defined. Put simply, a DPRM is to a PRM as a DBN is to a
conventional Bayesian net. PRMs have schemas, descriptions of their relational
structure
(like in a database). An instantiation of a PRM is to fill in the schema with
particular
objects and values for its attributes. The idea of a DPRM is to describe, probabilistically,
how an instantiation at a given instant in time evolves to the next time-step.
Since an
instantiation is basically a set of objects (and associated values) and instantiations
change
over time, this setup permits objects to enter or leave the world (or, our model
of it).
The authors go on to describe a Monte-Carlo procedure for performing inference
in their model based on the application of simplifying assumptions. The technique
is
based on the observation that DPRM can be unrolled into DBN, albeit a very large
one.
Rao-Blackwellising the relational attributes reduces the size of the state space; without this reduction, the procedure would not work. The authors show, by experiment, that this method is able to perform inference accurately in domains consisting of over 1000 objects.
The significance of this paper is that it gives us a model that supports statistical inference over structured data in unbounded domains. For our purposes, this means that the number of friendly robots about which we have to reason is immaterial, so long as they all fit the same relational description.
Doucet, et al.
“Rao-Blackwellised Particle Filtering for Dynamic Bayesian
Networks”,
This article provides an account of the development of Particle Filtering (PF)
techniques
up to the use of Rao-Blackwellisation, an analytical technique used by Sanghai
et al. to
make sampling tractable for DPRMs. In its own right, PF is a Monte-Carlo technique for doing inference in DBNs. Particle filters work by propagating a collection of sample points from one time step to the next, sampling in accordance with the transition probabilities. The new points are then reweighted in accordance with the observations at the new time step. Finally, the points are resampled according to their weights, so that more samples lie in the region that most probably contains the correct value of the hidden variable. This process is repeated until convergence.
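The propagate / reweight / resample loop described above can be condensed into a few lines. The 1-D random-walk observation model below is my own toy assumption, purely for illustration:

```python
import math
import random

def particle_filter_step(particles, observation, transition, likelihood):
    """One bootstrap particle filter step: propagate each sample through the
    transition model, weight it by the observation likelihood, then resample
    in proportion to weight."""
    moved = [transition(p) for p in particles]                   # propagate
    weights = [likelihood(observation, p) for p in moved]        # reweight
    return random.choices(moved, weights=weights, k=len(moved))  # resample

# Toy 1-D random-walk model (illustrative assumptions):
def transition(x):
    return x + random.gauss(0.0, 0.5)

def likelihood(obs, x):
    return math.exp(-(obs - x) ** 2)  # unnormalized Gaussian observation model
```

Repeating the step concentrates the samples around the state most consistent with the observations, which is the convergence behavior described above.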
The Rao-Blackwellised PF (RBPF) is an advance in that it uses analytic methods
like the junction tree algorithm to marginalize away some of the hidden variables,
thus
reducing the size of the sample space and hence the number of particles necessary
to
perform accurate inference. This paper describes the RBPF algorithm, which I will use in implementing DPRM inference.
I propose to simulate a collection of mapping robots that explore some virtual
terrain. They will implement the reasoning capabilities described in Part C
in the sense
that they describe themselves, other robots, and the world around them using a DPRM.
I will
simulate noisy and unreliable sensors, thus forcing the robot to consult its
failure models.
To preserve salient features of the multi-agent model, the ability for two robots
to
communicate, thus exchanging map data, will be left intact. Further, each robot
will
maintain estimates of its peers’ regions of expertise.
Although a fair amount of intelligence can be imbued into the executive, at
present it is not the aim of this project. Instead, it will likely be hard coded
with what
seems to be desirable behavior, nonetheless relying heavily on the robot’s estimates
of
world state.
Vision-based object recognition
I am interested in exploring vision-based object recognition in greater detail.
The SIFT algorithm created by David Lowe was one example presented in class. I am interested in learning about other scale- and orientation-invariant algorithms and their inherent advantages and disadvantages. However, I don't want to limit my investigation to SIFT-like algorithms. While doing the literature investigation, I came across tree-structured belief networks (TSBNs) and dynamic tree-structured belief networks (DTSBNs) as other examples, and I am curious about the breadth of algorithms used in visual recognition and why one algorithm would be used over another. Also in my investigation, I came across a great
deal of research in facial detection and recognition. I would like to see how
much of this work is applicable to detecting and recognizing man-made objects
in natural environments. Finally, I would like to better understand what the
state of embedded visual object recognition is currently and what work is being
done to improve this state.
Multi-Agent Planning
Another area I would be interested to investigate further would be multi-agent
planning. More specifically, I am interested in learning about the dynamic reallocation
of resources based upon new information as it becomes available. I would like to learn the best way to communicate this new information amongst the agents. I also want to learn where the best place to compute a plan is for the system. Said differently, I am interested in the tradeoffs between each individual agent computing a plan and then comparing solutions, versus one centralized computation center computing the plan and then distributing it to the agents. Similarly, I would like to learn about the different methods under development regarding how decisions are best made in a multi-agent environment.
I have worked briefly with genetic algorithms but would like to know more.
Neural Nets
The final topic of fascination is neural nets. I honestly do not know much at
all about neural nets right now. As a result, I am primarily interested in learning topics that other, more advanced students might consider elementary. I would like
to know the context of the origins of neural nets and what work has been done
in the past. I am interested to know what the inherent advantages and disadvantages
of neural nets are and why they would be used and in what scenarios. I would
like to better understand what current research is being done and where participants
in the research feel that the work will lead them over the course of the next
5-10 years.
The first article that I read was entitled “Detection of Artificial Structures
in Natural-Scene Images Using Dynamic Trees” and was written by Sinisa Todorovic
and Michael Nechyba at the University of Florida. This article was accepted
to the International Conference on Pattern Recognition in August 2004 and can
be found online at http://www.mil.ufl.edu/mav/publications/papers/16.pdf.
I selected this article for two reasons. The first reason was that the detection of artificial structures in natural environments is a key component to the success of the UAV I described in Part B, which had a goal of maximizing the tracking of high priority targets while also maximizing the area surveyed. The first step towards achieving this goal is detecting man-made objects in a natural scene. Also, the authors are associated with the Center for MAV (Micro Air Vehicle) Research at the University of Florida, specifically the endeavor entitled Active Vision for Control of Agile Autonomous Flight. The University of Florida is a leader in MAV technology and research on the university level and has done work on vision-aided control of the MAVs they have developed, including horizon-based stabilization. It seems logical that the work discussed in this paper will eventually find its way into one of the MAVs.
The authors assume that the man-made objects are characterized “primarily by geometric regularities, and that artificial structures are rigid and composed of smaller, uniformly colored sub-parts.” They motivated the methods performed in the paper by giving a very brief reference to previous methods. The first method used an algorithm that extracted edges from images into larger geometric structures and the extracted edges served as nodes of a graph, where the geometric relations between the lines were links in the graph. The authors say that the graphs only accounted for nearest neighbor relations, limiting the complexity of the artificial structures. Next, the authors state that tree-structured belief networks (TSBN) have recently been used since they represent “pixel neighborhoods of varying size.” Though successful, the authors state that TSBNs give rise to “blocky segmentations.” The authors then propose to model man-made objects by dynamic tree-structured belief networks (DTSBNs) to eliminate this problem.
The DTSBN structures provide a way to detect both whole objects and parts of an object. The whole objects are modeled through the “root nodes,” while subcomponents are likened to the parent-child relationship on the tree. The authors thus believe that DTSBNs can lead to recognition of an object in the case of occlusion.
In addition to the choice of how to model objects, the authors also detail how they selected the image features. In looking for an efficient edge extraction method, they became convinced that traditional methods such as wavelets and Gabor filters were flawed and that a new method was required that took into account more than the “multiscale and localization properties” of wavelets. The authors suggest that geometric and color cues should be used to discriminate man-made objects from the surroundings. To do this, the authors proposed the use of multiscale linear-discriminant analysis (MLDA). The authors claim that MLDA “encodes both color and texture through a dynamic representation of image details” and that it “extracts edges over a finite range of locations, orientations and scales, decomposing an image into dyadic squares of uniform color.”
The authors then describe the method for creating the dynamic trees and applying the MLDA to the image. Due to space constraints, I will leave that discussion to the authors in the actual paper. The analysis was applied to 100 256 x 256 natural-scene images with both artificial and natural objects captured by a ground camera at varied distances. This was done as a comparison between TSBNs and DTSBNs. The experimentation showed that DTSBNs outperformed TSBNs and that trained DTSBNs could be used for Bayesian image classification, leading to man-made object recognition in the future. The authors also concluded that DTSBNs could provide a “unified framework for unsupervised unknown object registration.”
The strength of this paper is that it detected artificial objects in an unsupervised, natural environment with a method that demonstrated improvement over some existing research. Its weaknesses with regard to the UAV project discussed are that the method relies on color images, a luxury not afforded to the UAV during night operations, and that the camera was ground based.
The papers I reviewed were selected independently of each other, based instead upon needs created by the cognitive robot definition. As stated above, the need that this paper addressed was the ability to detect and recognize targets in a natural environment. While this method could be investigated for daytime application, it would not work at night or if a color camera could not be used. Thus, a different approach would need to be found.
The second article I reviewed was entitled “A Low Cost Embedded Color Vision
System” written by Anthony Rowe, Charles Rosenberg, and Illah Nourbakhsh at
Carnegie Mellon University. This article was accepted to the IROS 2002 conference
and can be found online at http://www-2.cs.cmu.edu/~cmucam/Publications/iros-2002.pdf.
The reason I selected this article is that a soldier-portable UAV requires an embedded image system restricted to payload-like size and weight requirements. When the entire vehicle weighs only 10 lbs, the vision system is necessarily restricted to at most about 1 lb. Thus, a typical desktop frame-grabber scenario is invalid. I wanted to see what the capabilities of an embedded image system are, and the article served as one such benchmark. Also, the CMU system allows for the tracking of a colored object. Tracking a detected object is another vital piece of the required functionality of the combat-zone UAV described in Part B.
The article describes a product called the CMUcam, an embedded vision system that can perform basic color blob tracking at 16.7 frames per second. The motivation for the product was to extend vision system capabilities to applications that are restricted by size, complexity, and budget. In these types of applications, the traditional desktop computer with separate frame grabber is not feasible. Instead, the development and increased availability of CMOS cameras and microcontrollers combined to make an embedded vision system possible. The authors state that a major advantage of CMOS cameras versus CCDs is the “ability to integrate additional circuitry on the same die as the sensor itself.” The result is a vision system that is 1.75” x 2.25” and less than 2” deep with the camera module and lens attached, with a power requirement of 5 V at 200 mA.
The camera used in the product is an Omnivision OV6620 CMOS camera with an image array of 101,376 pixels, a supported resolution of 352 x 288, and a maximum refresh rate of 60 frames per second. The camera interfaces with the microcontroller using a standard serial port.
The microcontroller that is used to process the video is a Ubicom SX28 operating at 75 MHz. It is a RISC processor and operates at 75 MIPS, with a 2048 word flash programmable EPROM and 136 bytes of SRAM. The microcontroller also has fast and deterministic interrupts and three multi-bit I/O ports that were used to implement a serial UART port, a standard hobby servo PWM output port, and to control a status LED on the system. All firmware for the vision board was written in C and compiled using the ByteCraft SXC v2.0 compiler.
The color blob tracking was done using a simplistic algorithm that requires the user to specify the color limits of the object to be tracked and the bounding box in which the tracking is to be done. Then, analysis was performed comparing the number of color-matching pixels within the boundary to those outside the box. More detailed analysis could be done, such as determining whether the box contained one compact object or multiple objects, as well as a slew of color statistics and windowing options, but the details are best left to the paper due to length considerations here.
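A minimal version of this kind of color thresholding inside a bounding box might look like the following. The pixel layout and return values here are my own assumptions, not the CMUcam firmware's actual interface:

```python
def track_blob(frame, lo, hi, box):
    """Count in-range pixels inside a user-given bounding box and return
    their centroid. frame is a 2D list of (r, g, b) pixels; lo and hi are
    the user-specified per-channel color limits; box = (r0, c0, r1, c1)."""
    r0, c0, r1, c1 = box
    inside = [(r, c)
              for r in range(r0, r1)
              for c in range(c0, c1)
              if all(lo[i] <= frame[r][c][i] <= hi[i] for i in range(3))]
    if not inside:
        return None  # blob lost
    n = len(inside)
    centroid = (sum(r for r, _ in inside) / n, sum(c for _, c in inside) / n)
    return centroid, n
```

Re-centering the bounding box on the returned centroid for the next frame is essentially how frame-to-frame tracking proceeds.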
The authors reported that once the camera’s color bounds were set, the CMUcam could track a blue 14” x 15” x 10” recycling bin confidently up to 35 feet away. With the appropriate IR-coated lens, the authors reported that the system “performs well in a wide range of lighting conditions, including direct sunlight outdoors,” with nearly identical results outdoors as inside.
Finally, the authors created a 4” x 3” x 3” robot containing a small differential drive mobile base and a PIC microprocessor to further test the camera’s tracking abilities. The demonstration robot could successfully track a “small brightly colored red doll” at distances up to 15 feet.
The strength of this article is that it details a system on the correct scale of size and weight for a vision system that would be employed on a soldier-portable UAV. However, the weakness was that the system was completely incapable of providing the necessary level of functionality for the task. The tracking algorithm relied on color, which would not be available during night operation. The objects being tracked were either monochromatic or very simply colored to facilitate the simplistic algorithm, rather than man-made objects in complex terrain. Even with these simplifications, the range, processing power, and memory fell drastically short of what the desired application requires.
As previously stated, the three articles were not selected to have any relation to each other, but were instead selected to address functional requirements of the proposed cognitive robot. This article addressed the need for an embedded vision system that weighs less than 1 lb but can still perform object tracking. The article gives an idea of the state of technology for embedded systems on the scale of interest for the desired project.
The last paper I reviewed was a bit different than the other two. Instead of
being directly related to the algorithms required to add the necessary intelligence
to the proposed UAV project, the paper discusses the creation of an image database
created for facial recognition. While the direct application of the topic does
not apply, the issues dealing with capturing an image database across different
poses, illumination patterns, and expressions can be considered analogous to
the problems that would be encountered if an image database were to be created
of military targets. The article was entitled “The CMU Pose, Illumination, and
Expression Database” and was written by Terence Sim, Simon Baker, and Maan Bsat.
It can be found online at http://www.ri.cmu.edu/pub_files/pub4/sim_terence_2003_1/sim_terence_2003_1.pdf.
The authors addressed the need for a database consisting of “a fairly large number of subjects, each imaged a large number of times, from several different poses, under significant illumination variation, and with a variety of expressions.” To accomplish this, the authors suggest that imaging something from multiple poses either requires multiple cameras or multiple shots taken consecutively. They propose that the multiple camera option was more advantageous, if possible, for reasons including less data collection time, the fact that if the cameras are fixed in space, the relative pose is the same for every subject and there is less difficulty in positioning the subject to obtain a particular pose, and the fact that the imaging conditions are the same if the pictures are taken simultaneously. The disadvantage to this method is that multiple cameras, digitizers, and computers are needed, the cameras need to be synchronized such that the shutters all open at the same time, and that the cameras will have different properties.
The authors decided to use the multiple camera, simultaneous data collection method due to the “3D Room” at CMU. The 3D Room has 49 cameras, 14 of which were high quality Sony DXC 9000s. To produce similar image quality across the database, 13 of the 14 Sony cameras were used to take the pictures.
To obtain illumination variation in the pictures, the authors installed a “flash system” in the 3D room that was similar to one used at Yale by Georghiades et al in 2000. The authors used an Advantech PCL-734, 32-channel digital output board to control 21 Minolta 220X flashes. Generating a pulse on one of the output channels then caused the flashes to go off. The flash control code was integrated into the image capture routine such that flash occurred while the shutter was open. The authors modified the image capture routine so that they were able to capture 21 images, each with different illumination, in approximately 0.7 s. The capture routine took about 10 minutes per subject, allowing the authors to capture and store over 600 images from 13 poses, with 43 different illuminations, and with 4 expressions. The images were color and had a size of 640 x 486. This required approximately 600MB per person for a total of around 40GB.
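The quoted storage figures are easy to sanity-check, assuming 3 bytes per pixel of uncompressed color (my assumption; the paper's exact image encoding may differ):

```python
# Back-of-the-envelope check of the database storage figures:
bytes_per_image = 640 * 486 * 3       # 933,120 bytes, roughly 0.93 MB raw
per_person = 600 * bytes_per_image    # about 560 MB, consistent with "~600 MB"
subjects_in_40gb = 40e9 / per_person  # roughly 70 subjects fit in ~40 GB
```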
While the topic of an image database of facial pictures does not have much direct application to a soldier-portable vehicle, the strength of the paper was that it provided background into some of the issues associated with creating an image database for objects. If such a database were created for military targets, some of the issues overcome by the authors of the article would need to be addressed. For instance, while the expression of a tank might not have any meaning, the illumination differences and the relative position of the viewer to the tank might make a difference, depending on the visual recognition algorithm. In order to obtain a full definition of all sides of an object, multiple pictures would have to be taken and analyzed, as was the case in this paper.
In addition to addressing some of the obstacles encountered in creating such a database, the paper also gives a first approximation of the time required to create such a database and the amount of storage required. While this certainly varies depending on the number of pictures taken per object and the number of objects imaged, the paper at least provides a baseline estimate.
I would like to pursue a project in either visual object recognition or in developing
the preliminary work towards an automated landing capability for a UAV using
information provided by a lidar system.
The first type of project mentioned above is as yet poorly defined in my mind, primarily due to my lack of substantive knowledge on the subject. I have never worked with visual recognition of objects before, so I do not have a good feel for the scope of a reasonable project given a 1-2 month timeframe. Based upon the requirements for the cognitive robot described in Part B, I would like to implement a visual object detection algorithm that can at the very least detect what objects are present in a given picture or video stream. Once these objects are detected, it would be ideal to perform at least some nominal comparison of features to those of known objects to see if a match could be found. No tracking would be necessary for the project, as object detection and recognition serve as the foundation for the UAV functionality.
I do not know at this time whether implementing such an algorithm is realistic in the given time frame and, if so, which algorithm best lends itself to the application. To mitigate my lack of knowledge and specificity on a vision-based object detection system, another possible project of interest is working towards an automated landing procedure for a UAV based upon information from a lidar system. As mentioned in Part C, such a system would be applicable to a broad range of UAV applications, would save in both operational cost and the efficiency of vehicle design, and is currently in demand by the military. A lidar system is a laser radar, containing a laser that sweeps across a designated swath and returns the distance to the closest object at a given angle. If this tool were aimed downward at an angle such that it projected ahead of the vehicle, the resulting distances could be used to define a topological map of the area flown over by the aircraft. Combined with the UAV's position, the lidar could be used to map the landing area.
As discussed in Parts B & C, two different methods could be used to land the UAV with the lidar. In the first method, the UAV would attempt to land with no prior knowledge of the surroundings whatsoever. The autoland system would attempt to avoid obstacles as they are detected, but there would be no effort to find an ideal landing location.
In the second method, the UAV would utilize two passes over the designated landing region. During the first pass over the region, the lidar would collect the information and the topological map would be created. Then, as the UAV is in transit for the second pass, the autoland system would use the topological map to determine where the ideal location is for landing. It would then determine the optimum trajectory to guide the UAV to the selected landing site, thereby maximizing the chance for survival. At a certain elevation above the ground, as determined by the lidar system, the autoland system would execute a pitch-and-stall procedure to bring the vehicle down as gently as possible.
My proposal would be to use the second method to land the UAV. I would of course
need to make several simplifying assumptions. The first is that this
would all be done in simulation, with no hardware development. Next, I would
assume that the simulated UAV would behave exactly as commanded in real time
without error. I would also assume that the first scan of the landing region
had already been performed and the lidar had properly detected and mapped the region.
From these assumptions, I would then create a 3D topological map and select
the best landing location. From this location, the surrounding terrain, and
the known position, attitude, and velocity of the aircraft, I would then attempt
to calculate a safe trajectory to the landing site. To be a bit more realistic
given the time constraints, I would say that a safe trajectory is more
than adequate (rather than an optimal one) and that the surroundings would
need to be “reasonable,” so that the UAV would not be attempting to land in
ridiculous environments in which no pilot would ever attempt to land.
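The landing-site selection step could be prototyped very simply. The sketch below (the grid values and the flatness criterion are invented for illustration) scores each patch of a lidar-derived height grid by its height variation and picks the flattest one:

```python
# Toy landing-site selection over a lidar-derived height grid.
def select_landing_site(height_map, window=3):
    """height_map: list of rows of terrain heights (meters).
    Returns (row, col) of the flattest window x window patch center."""
    rows, cols = len(height_map), len(height_map[0])
    half = window // 2
    best, best_score = None, float("inf")
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            patch = [height_map[i][j]
                     for i in range(r - half, r + half + 1)
                     for j in range(c - half, c + half + 1)]
            score = max(patch) - min(patch)  # roughness of the patch
            if score < best_score:
                best, best_score = (r, c), score
    return best

# Flat terrain except for a 2 m 'rock' at cell (1, 1):
terrain = [[0.0] * 6 for _ in range(6)]
terrain[1][1] = 2.0
print(select_landing_site(terrain))  # a patch center clear of the rock
```

A real system would add slope, obstacle clearance, and approach-path constraints on top of this flatness score.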
Fault detection and diagnosis: When operating in real-world environments, autonomous
systems
must be able to monitor themselves and react appropriately when a failure occurs.
In some
cases, the system may be able to work around the fault and continue. In others,
it is sufficient to
avoid making the problem worse and await further instructions. In either case,
the system must
be able to use limited numbers of sensors (which may themselves be noisy or
faulty) to
determine, first, that something is wrong and, second, identify the specific
fault that has
occurred. Both of these things can be difficult, especially given limited computational
resources
and real-time constraints. I would like to learn more about current techniques
used to solve these
problems in systems such as rovers and space probes.
Detection and tracking of moving objects: In the version of SLAM presented in
class, an
assumption was made that the world was static, i.e. that any changes perceived
by the sensors
were a result of the robot’s actions. I would like to learn more about techniques
that could
remove this restriction and allow a robot to deal with other moving objects
in the environment.
These objects might be obstacles (like other cars on the road for a driving
robot) or things to
interact with (like other robots or the ball for soccer-playing robots). The
first problem is to
separate moving objects from the static background, especially when the robot
(and its sensors)
may itself be moving. The second problem is to track these objects and estimate
their
trajectories for purposes of avoiding, following or intercepting them.
Face detection and recognition in computer vision: To interact naturally with
humans, it is
helpful for robots to be able to detect and recognize faces. The first step
is simply for the robot
to recognize that it is looking at some face. This information can be used to count
or track people in
its view since each person has exactly one face. In addition, a system such
as an intelligent kiosk
might want to know when a person is in front of it and looking in its direction.
The movement of
facial features can even be used to control aspects of a user interface. Even
better than detecting
that there is some person in a scene is the ability to recognize who that person
is. This is useful
for surveillance or access control but can also allow a robot to customize its
behavior to an
individual. Both of these problems are complicated by the fact that, in the
real world, people
tend to move around and may not be looking directly into the camera. I would
like to learn more
about current techniques to identify and recognize faces, even when they are
misaligned or
partially obscured. Some of these techniques operate on still images while others
use video to
aggregate data from a variety of angles.
R. Washington, “On-Board Real-Time State and Fault Identification for Rovers,”
In Proceedings
of the 2000 IEEE International Conference on Robotics and Automation.
This paper was selected because it describes a fault detection and identification
system for a
rover-type robot. My robot would have appendages similar to what might be found
on a rover
and could therefore suffer similar types of failures. This system is based on
the more traditional
(than the next paper) techniques of Markov models and Kalman filters but attempts
to fuse the
two to reduce the computational burden of state estimation. In my robot, this
work could be used
to detect faults in the appendages or drive system.
The system presented is a hybrid of a discrete-state Markov model and a continuous
state
Kalman filter. In this context, the state space includes both normal states
of the robot and states
where one or more faults have occurred. Thus, by estimating which state the
robot is in, we
determine both that a fault has occurred and identify which fault it is likely
to be. A Markov
model is used to represent different global modes of rover operation (e.g. idle,
driving, drilling).
Different modes imply different “normal” versus “fault” behavior. For example,
sensor values
indicating a high drive current may be normal in some modes but indicate a fault
in others. The
Markov model state is then used to set the parameters and state space for a
Kalman filter. The
Kalman filter uses sensor data to estimate the continuous state within the Markov
mode. In this
way the system prunes the state space that must be estimated by the Kalman filter.
The
estimated state from the Kalman filter is then propagated back to the Markov
model in the form
of a simulated “observation” matrix.
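To make the hybrid concrete, here is a minimal one-dimensional sketch, with invented modes and numbers rather than the paper's actual models: each mode's dynamics revert toward a different expected drive current, a scalar Kalman filter tracks the measurement, and each filter's likelihood reweights the Markov modes, mimicking the "observation" feedback described above.

```python
import math

SETPOINT = {"nominal": 1.0, "high_current_fault": 3.0}  # amps (invented)
TRANS = {"nominal": {"nominal": 0.95, "high_current_fault": 0.05},
         "high_current_fault": {"nominal": 0.05, "high_current_fault": 0.95}}
A = 0.5  # mean-reversion rate of the per-mode dynamics

def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def hybrid_step(belief, est, var, z, q=0.01, r=0.1):
    """Markov prediction, per-mode Kalman update, likelihood feedback."""
    pred = {m: sum(belief[p] * TRANS[p][m] for p in belief) for m in belief}
    nb, ne, nv = {}, {}, {}
    for m in pred:
        x = A * est[m] + (1 - A) * SETPOINT[m]  # KF predict (mean-reverting)
        v = A * A * var[m] + q
        s = v + r                               # innovation variance
        k = v / s                               # Kalman gain
        ne[m] = x + k * (z - x)                 # KF measurement update
        nv[m] = (1 - k) * v
        nb[m] = pred[m] * gaussian(z, x, s)     # feedback to the Markov model
    total = sum(nb.values())
    return {m: w / total for m, w in nb.items()}, ne, nv

belief = {"nominal": 0.99, "high_current_fault": 0.01}
est = dict(SETPOINT)
var = {m: 1.0 for m in SETPOINT}
for z in [1.1, 2.9, 3.1, 3.0]:  # the drive current jumps: a fault appears
    belief, est, var = hybrid_step(belief, est, var, z)
print(max(belief, key=belief.get))
```

The mean-reverting process model keeps the two filters anchored to their modes, so the mode posterior stays discriminative; the paper's actual models are richer, but the prediction/update/feedback loop has this shape.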
However, when the system changes from one state in the Markov model to the next,
it must carry
over the continuous state to form the new initial conditions for the Kalman
filter. Therefore,
different instances of the discrete states are actually made unique by the continuous
state fed into
them. This effectively increases the number of possible Markov model states
and can lead to an
explosion of model size. Therefore, the system limits the number of Markov states
to ensure that
the computational complexity stays bounded. This means that some improbable
states will be
lost and it is possible for the state update formula to return a null state.
They claim that this
problem has been addressed in previous work but do not explain it here.
Based on their results, this system does a decent job detecting faults on a
rover type robot and
runs in a reasonable amount of time with limited computational resources. However,
the system
requires a large number of Kalman filter parameters and seems to be very sensitive
to their
values. It would most likely require a large amount of data to be collected
from various trials of
the robot to properly establish these values. In addition, the limitation on
the size of the Markov
model limits the accuracy of the state estimations. While this is clearly a
limitation, it also
allows for a trade off between computation and accuracy. If more computational
resources are
available, the Markov model can be allowed to grow larger, increasing its accuracy.
V. Verma, G. Gordon, R. Simmons and S. Thrun, “Particle Filters for Rover
Fault Diagnosis,”
IEEE Robotics and Automation Magazine special issue on Human Centered Robotics
and
Dependability, June 2004.
This paper was selected because it presents an alternative solution to the fault-monitoring
and
identification problem on rover-type robots. It is based on a substantially
different idea than the
previous paper thereby broadening my view of the topic. Again, the algorithms
presented would
be applicable to monitoring faults in the appendages or drive system.
Most traditional techniques for fault detection (e.g. the first paper) attempt
to approximate the
problem that needs to be solved in order to reduce computational complexity.
In the previous
paper, this took the form of limiting the size of the state space. This paper
uses a technique
called particle filtering to find an approximate solution to the complete, complex
problem. The
basic idea behind particle filtering is to use a Monte Carlo technique to pull
samples (“particles”)
from the true solution. The distribution of the samples is then taken as an
approximation of the
true distribution. The main problem with particle filters is that a large number
of particles must
be drawn in order to accurately estimate the state when there are low probability
states (e.g. fault
states). This tends to negate the advantage of using particle filters in the
first place.
This paper presents three techniques which allow particle filters to accurately
model low
probability states using fewer particles. The first technique assigns a risk
factor to different
states and uses it to influence the choice of particles. Fault states, which
are usually low-probability
but high risk to the mission, get a higher risk factor and are therefore more
likely to
be sampled. This helps to ensure that potentially serious faults will not be
over-looked, even
when using relatively small numbers of particles. The second technique groups
together multiple
fault states that have similar symptoms (like wheel failures) and treats them
as a single state
which now has a higher probability. If this merged state begins to become more
probable (i.e. it
appears that one of the faults in the grouping has occurred) it can be decomposed
into individual
states to find the exact fault. The third technique incorporates sensor data
into the choice of
particles, thereby making it more likely that states which explain those data
will be sampled.
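The first technique can be illustrated with a toy discrete-state particle filter (the states, sensor model, and risk factor are all invented here): fault transitions are over-sampled in the proposal distribution and corrected with importance weights, so even a small particle set keeps watch on the fault state.

```python
import math
import random

P_FAULT = 0.01  # true prior probability of a wheel fault per step
RISK = 20.0     # risk factor inflating how often faults are sampled
SENSOR = {"ok": 1.0, "stuck": 0.0}  # expected wheel-speed reading

def likelihood(z, state, noise=0.2):
    return math.exp(-(z - SENSOR[state]) ** 2 / (2 * noise ** 2))

def update(particles, z):
    """Propagate, weight, and resample the particles given reading z."""
    q_fault = min(1.0, P_FAULT * RISK)  # biased proposal probability
    weighted = []
    for state, w in particles:
        if state == "ok":
            if random.random() < q_fault:
                new, iw = "stuck", P_FAULT / q_fault  # importance correction
            else:
                new, iw = "ok", (1 - P_FAULT) / (1 - q_fault)
        else:
            new, iw = "stuck", 1.0  # faults are absorbing in this toy model
        weighted.append((new, w * iw * likelihood(z, new)))
    total = sum(w for _, w in weighted)
    n = len(particles)
    states = random.choices([s for s, _ in weighted],
                            [w / total for _, w in weighted], k=n)
    return [(s, 1.0 / n) for s in states]

random.seed(0)
particles = [("ok", 1.0 / 200)] * 200
for z in [1.0, 1.1, 0.05, -0.1, 0.0]:  # wheel-speed readings drop to zero
    particles = update(particles, z)
frac_fault = sum(1 for s, _ in particles if s == "stuck") / len(particles)
print(frac_fault)
```

Without the risk bias, only about 1% of particles would ever visit the fault state per step, and 200 particles could easily miss it entirely before the evidence arrives.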
Using these techniques the authors demonstrate impressive accuracy using relatively
small
numbers of particles. However, they do not provide data on the actual amount
of computation
required to implement their system so it is difficult to compare it directly
to the results from the
previous paper.
J.L. Bresina and R. Washington, “Robustness via Run-time Adaptation of Contingent
Plans,”
Proceedings of the AAAI-2001 Spring Symposium: Robust Autonomy. Stanford, CA.
This paper was selected to help answer the question of what happens after a
fault is detected.
The authors describe a command language and execution environment which allows
for plans to
be dynamically altered in response to failures or new opportunities without
requiring a full-blown
re-planning. It is especially useful in environments like the current Mars rovers
where the
robot is not capable of doing its own planning to begin with.
In this system, it is assumed that the robot receives a plan from an external
source and then
operates autonomously until its next scheduled plan upload. With existing technology,
if the
robot experiences a failure and cannot complete its plan, it will sit idle until
it can receive
instructions. Using the system proposed here, the uploaded plan would include
alternate plans
that could be executed if the robot detects that conditions were not as expected.
This could allow
a rover to try alternate techniques to achieve its goals or even switch to different
goals if it
encounters a difficulty with the original plan.
Two core techniques are presented, plan step skipping and alternate plan merging.
Using plan
step skipping, the robot would be able to skip over portions of the plan which
are no longer
reasonable and continue on with the rest of the plan, thereby achieving at least
some of its
original goals. For example, if a plan included taking several types of measurements
but one of the
instruments had a failure, the other measurements could still be performed.
Using plan merging,
the robot would be able to splice in a plan fragment either by inserting it between
plan steps or
replacing existing ones. This could allow the robot to retry an action which
has failed, perform
alternate actions if it continues to fail or perform extra actions if resources
permit. The key to
making this work is to assign utilities to different portions of the plan and
the alternate plan
fragments. Each plan step also includes conditions which must be met for the
step to be
executed. If the robot finds that a step cannot be executed, it can consider
skipping steps or
merging alternate plans to find the new plan with the highest expected utility.
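The skip-and-merge repair can be sketched in a few lines (the step names, preconditions, and utilities below are invented for illustration):

```python
# Utility-driven plan repair: skip infeasible steps, or splice in an
# alternate fragment, and keep whichever variant scores highest.
def feasible(step, conditions):
    return all(c in conditions for c in step["needs"])

def plan_utility(plan, conditions):
    return sum(s["utility"] for s in plan if feasible(s, conditions))

def repair(plan, fragments, conditions):
    """Return the best plan among: the feasible steps alone, or the
    feasible steps plus each alternate fragment."""
    kept = [s for s in plan if feasible(s, conditions)]
    candidates = [kept]
    for frag in fragments:
        candidates.append(kept + [s for s in frag if feasible(s, conditions)])
    return max(candidates, key=lambda p: plan_utility(p, conditions))

plan = [
    {"name": "drive_to_rock", "needs": ["wheels_ok"], "utility": 2},
    {"name": "drill_sample", "needs": ["drill_ok"], "utility": 5},
    {"name": "take_images", "needs": ["camera_ok"], "utility": 3},
]
alternates = [[{"name": "spectrometer_scan", "needs": ["spect_ok"], "utility": 4}]]

# The drill has failed; the spectrometer fragment is spliced in instead.
conditions = {"wheels_ok", "camera_ok", "spect_ok"}
best = repair(plan, alternates, conditions)
print([s["name"] for s in best])
```

This is only the utility bookkeeping; the paper's system also handles step ordering and resource constraints, which this sketch ignores.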
This paper is an interesting extension to the status quo for rover control.
It allows rovers to
execute complex alternate plans by simply recalculating utility functions. While
the resulting
plans are more limited than you would get from a complete replanning, the reduced
computational complexity makes it a good solution where computational resources
on the robot
are extremely limited.
The robot proposed in part B would be a very ambitious project. I believe that
a manageable
alternative would be to design one or two appendages, model them in a simulation
and
implement a fault detection system for those appendages. Since I would not have
an actual
robot, I would have to simulate failures and sensor noise. I could create a
library of motion plans
for the appendages by hand rather than implementing a planner.
I’d say the chances of my pursuing this project are about 40% but I would still
like to pursue a
project having something to do with fault detection and replanning.
Dynamic Bayesian Networks and Continuous Time Bayesian Networks
Dynamic Bayesian networks are nowadays the method of choice for supervised sequential
learning. One particular case of the DBN is the hidden Markov model, which is
widely used in domains such as speech recognition. One extension of DBNs is the
continuous time Bayesian network (CTBN), which describes structured stochastic
processes with finitely many states that evolve over continuous time. A CTBN is
a directed (possibly cyclic) dependency graph over a set of variables, each of
which represents a finite-state, continuous-time Markov process whose transition
model is a function of its parents. CTBNs are relatively new in the field of
machine learning; they provide a way to do inference over continuous time without
having to discretize time, as is usually done in dynamic Bayesian networks. I
would like to learn more about them and probably implement a reasoning system
using DBNs or CTBNs.
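As a concrete baseline for this kind of reasoning system, the forward (filtering) algorithm for a two-state hidden Markov model, the simplest DBN, looks like this (the states and probabilities are invented):

```python
# HMM forward filtering: predict through the transition model, weight
# by the emission likelihood, normalize.
STATES = ["rest", "active"]
TRANS = {"rest": {"rest": 0.8, "active": 0.2},
         "active": {"rest": 0.3, "active": 0.7}}
EMIT = {"rest": {"quiet": 0.9, "noisy": 0.1},
        "active": {"quiet": 0.2, "noisy": 0.8}}

def forward(observations, prior=None):
    belief = prior or {s: 1.0 / len(STATES) for s in STATES}
    for z in observations:
        belief = {s: EMIT[s][z] * sum(belief[p] * TRANS[p][s] for p in STATES)
                  for s in STATES}
        total = sum(belief.values())
        belief = {s: v / total for s, v in belief.items()}
    return belief

print(forward(["noisy", "noisy", "quiet"]))
```

A CTBN replaces the fixed per-tick transition table with per-variable transition rates, so the same filtering question is answered without choosing a time step.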
Filtering and state space estimation with particle filters
Particle filters are nowadays very popular and probably the method of choice
for doing inference in Bayesian networks and dynamic Bayesian networks. The
reason is that they are an approximate way of doing inference based on Monte
Carlo sampling techniques, and their accuracy depends on the number of particles
(samples of the system) used. Also, particle filters are widely used in position
estimation, since they can represent and solve non-linear problems that the
Kalman filter and the EKF either cannot solve or solve only at excessive cost.
I would like to gain practical experience by applying this technique to
inference on large Bayesian networks and state spaces.
Reinforcement learning
There is a lot of research on applying machine learning algorithms and pattern
recognition techniques so that smart homes and robots can infer human activity.
However, there is very little research on how to fix the models of human activity
in real time if they are not performing well. More importantly, most of these
ML techniques require extensive training data to perform with reasonable accuracy,
and most of the time, in the home setting, it is not feasible to assume that
the user will provide 50-100 examples of each activity to be recognized. I believe
that novel reinforcement learning techniques could provide a robust way to fix
the models in real time by just having the user provide answers such as ‘Yes’ or
‘No’ whenever the system correctly or incorrectly predicts their activities.
The three papers selected are the following:
Philipose, M., K. Fishkin, et al. (2003). "Guide: Towards Understanding
Daily Life via Auto-Identification and Statistical Analysis." UbiHealth
Workshop at Ubicomp 2003.
In this paper, a system for recognizing activities in the home setting is proposed
by using a combination of dynamic Bayesian networks and RFID tags attached to
everyday objects in the environment. I think that this is a very interesting
approach to the problem of figuring out what a person is doing in real-time.
The authors assume that a person using such a system in the home would wear
an RFID-reading glove that reads the tags every time
the person interacts with different everyday objects. The main idea is the following:
You wake up, touch the toothpaste, then the toothbrush and also open the cold
water faucet. If you have sensors in these different objects and an inference
algorithm, a computer or robot could infer that you are brushing your teeth.
Another novelty of the paper is that the authors automatically extract human activity models from the web by mining websites such as ehow.com that have “recipes” for carrying out different human activities. Thus, this approach does not require extensive examples of activities in order to train the machine learning algorithms. The authors compile the models of human activities from the web into dynamic Bayesian networks and use particle filters to infer the most likely model given the trace of RFID sensor activations. I basically have two criticisms. One is that we cannot assume that a person at home will always wear the glove on which the whole system depends. Furthermore, while the approach of compiling human activities mined from the web into DBNs is promising, I believe it would be important to provide a way to fix or retrain the system in case it is not recognizing the activities accurately enough; I think this is the main flaw of this particular approach. This paper is relevant to the cognitive robotic system I am planning to build because I will use a similar approach. Instead of making the person wear a glove, I will place the sensors in the environment: small sensors everywhere that could give me information about how objects are used.
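The core inference idea can be caricatured without DBNs or particle filters: score a trace of touched objects against each activity "recipe" with a naive per-object likelihood (the objects and probabilities below are invented, not the paper's models):

```python
# Naive activity scoring from touched objects.
ACTIVITIES = {
    "brushing_teeth": {"toothpaste", "toothbrush", "faucet"},
    "making_tea": {"kettle", "teabag", "cup", "faucet"},
    "making_coffee": {"kettle", "coffee", "cup", "sugar"},
}

def most_likely_activity(touched, p_hit=0.8, p_miss=0.1):
    """Objects in an activity's model are touched with probability
    p_hit, all other objects with probability p_miss."""
    all_objects = set().union(*ACTIVITIES.values())
    best, best_score = None, 0.0
    for name, model in ACTIVITIES.items():
        score = 1.0
        for obj in all_objects:
            p = p_hit if obj in model else p_miss
            score *= p if obj in touched else (1 - p)
        if score > best_score:
            best, best_score = name, score
    return best

print(most_likely_activity({"toothpaste", "toothbrush", "faucet"}))
```

The paper's DBN formulation additionally models the order and timing of the object touches, which this snapshot-style score throws away.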
Wilson, D. "Simultaneous Tracking & Activity Recognition (STAR) Using
Many Anonymous, Binary Sensors." Pervasive 2005.
In this paper, a system for recognizing activities in the home setting and for
tracking people’s location at the room level is proposed. The system uses
sensors commonly found in current security systems such as contact reed switches
located in doors and windows, as well as motion sensors. For the inference,
the author proposes an inference algorithm based on layers of dynamic Bayesian
networks in order to infer the inhabitant’s activities. The beauty of using
layers of DBNs is that the lower layers of the DBNs can be retrained to compensate
for changes in sensor readings due to noise or different home settings while
keeping the upper layers unchanged. Given the limited number of sensors
the author suggests, the granularity of the system, in terms of the different
activities it is able to recognize, is coarse. The author proposes
to recognize a subset of five everyday activities as well as to track
one person's location in the environment. This paper is relevant because it uses
sensors in the environment to track people’s location as well as people’s activities.
I like the idea of using different layers of dynamic Bayesian networks to perform
inference at different levels of granularity about human activities. I will
probably implement something similar for my final project and probably add another
layer for correcting the system whenever a misclassification occurs. Finally,
the author proposes to train the system by providing explicit training examples,
which I think might be impractical for some activities. I believe that the combination
of prior information about human activities (probably mined from the web) as
well as from explicit training examples could prove to be particularly powerful.
Patterson, D., D. Fox, et al. (2003). "Expressive, Tractable and Scalable
Techniques for Modeling Activities of Daily Living." UbiHealth 2003: The
2nd International Workshop on Ubiquitous Computing for Pervasive Healthcare
Applications.
This paper is relevant since it touches on different aspects of tractability
and scalability in methods for inferring and modeling human activity. The methods
proposed are similar to dynamic Bayesian networks and the main inference algorithm
proposed is again a particle filter. Prior information about human activities
is provided in the form of activity models. The author proposes some changes
to the conventional way probabilities are used in DBN systems and introduces the
concept of mutually exclusive probabilities. This paper is relevant because
it also uses similar reasoning techniques to infer people’s activities but touches
on issues of scalability and tractability.
I think that for my final project I will implement a stationary robot embedded
in the home environment that will be able to reason about its inhabitants' activities
and make intelligent decisions accordingly. One possible approach could be to
infer the right time to interrupt the user to provide some information or a reminder.
Another interesting approach could be to try to do some action execution monitoring
for elderly people. Some people suffering from dementia have problems performing
activities such as preparing coffee or tea because they forget whether they have
already put sugar in the coffee. A system providing assistance in such a
domain should be able to take readings from sensors placed in different objects,
infer the current step of the activity that the user is in, and provide a plan
of action towards what the next step in the activity sequence should be.
This problem combines two interesting aspects: a reasoning algorithm to infer the
current system state, and an action/planning algorithm to find what the next
step or state of the system should be.
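The execution-monitoring idea reduces, in its simplest form, to tracking which steps of an activity have been observed and proposing the next one (the step and sensor names below are invented):

```python
# Minimal activity-step monitor for a reminder system.
COFFEE_STEPS = ["boil_water", "add_coffee", "add_sugar", "pour_water", "stir"]

def next_step(observed_events, steps=COFFEE_STEPS):
    """Return the first step whose sensor event has not been seen."""
    done = set(observed_events)
    for step in steps:
        if step not in done:
            return step
    return None  # activity complete

# The sugar sensor already fired, so it is not suggested again.
print(next_step(["boil_water", "add_coffee", "add_sugar"]))
```

A probabilistic version would replace the set membership test with the state estimate from the reasoning algorithm, so noisy or missed sensor events do not derail the reminder.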
1.1 human-robot interaction
Part of the effort to integrate robots into society is the effort to improve
human-robot
interaction. Human-robot interaction can be likened to an intelligent user
interface, which might
need to modify its behavior based on human moods, attitudes, expertise, etc.
Some of the current
work in general human-robot interaction includes language processing, fuzzy
logic, decision trees,
emotion imitating, personality, gesture/facial recognition, and modeling human
behavior. I am
interested both in what qualities make for a robot that is easy to interact
with, and how a robot
can model human behavior.
1.2 Bayesian Belief Networks
These networks bring together probabilities and directed graphs, and are used
frequently in machine
learning. The graph carries information about the conditional dependencies between
nodes, and
about the probability distribution functions of the nodes (often assumed Gaussian).
Nodes can be
random variables, parameters, observations, or hypotheses. The problem with
using these networks
is that acquiring knowledge and updating them are computationally costly. Some problems,
such as inference,
are NP-hard. I would like to learn more about how these networks can be applied, and
what sorts of
approximation algorithms exist to help learn a Bayesian network.
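A tiny worked example makes the cost concrete: exact inference by enumeration in the classic rain/sprinkler/wet-grass network sums the joint distribution over every assignment, which is exactly what blows up as networks grow.

```python
from itertools import product

# P(Rain), P(Sprinkler | Rain), P(WetGrass | Sprinkler, Rain)
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},   # keyed by rain
               False: {True: 0.4, False: 0.6}}
P_WET = {(True, True): 0.99, (True, False): 0.9,  # keyed by (sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    p = P_RAIN[rain] * P_SPRINKLER[rain][sprinkler]
    p_wet = P_WET[(sprinkler, rain)]
    return p * (p_wet if wet else 1 - p_wet)

def prob_rain_given_wet():
    """P(Rain | WetGrass) by enumerating all hidden assignments."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den

print(round(prob_rain_given_wet(), 3))
```

With n binary nodes the enumeration visits up to 2^n assignments, which is why the approximation algorithms mentioned above matter.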
1.3 Multi-agent Task Solving in Robotics
Some of the problems of multi-agent robotics include cooperation, diversity,
communication, distributed learning, and distributed planning. Some of the test
beds for these principles
include robotic
soccer, foraging, and defending a home base. Some of the current research looks
at optimizing one's
individual task for the benefit of the group, collective locomotion, insect
behavior, and learning
team solutions. I would like to learn how a group of entities learns about a
problem, as opposed to
a single entity.
I chose three papers on work in virtual tutors, focusing on weakness identification
and student
models. Of the three papers, one paper provides a current implementation example,
the second
describes a higher level goal for what should be included in a virtual tutor,
and the last gives a
mathematical model for a student. Aaron D’Souza’s paper gives an example of
what sorts of work
are currently being done in commercial virtual tutoring, and provides concrete
implementation
details. The second paper, “Tactical Action Officer Intelligent Tutoring System
(TAO ITS)” by
Richard Stottler, describes a direction in which to move ITS, and describes
the effectiveness of
ITS in a high dimensional learning environment. The last paper, section 4 of
“An Intelligent
Distributed Environment for Active Learning” by Yi Shang, Hongchi Shi, and Su-Shing
Chen,
gives a mathematical description of a student model, and describes the strengths
and weaknesses
of this model. Because the student model is only roughly covered by the previous
two papers, it
fleshes out some of the important details that need to be understood before
implementing a virtual
tutor. Each of these papers describes a different architecture and focuses on
teaching a different
kind of knowledge to the student; however, all of them give information on student
models and
interactive tutoring.
2.1 Paper 1: An Automated Lab Instructor for Simulated Science Experiments
Source:
Aaron D’Souza, Jeff Rickel, Bruno Herreros, and W. Lewis Johnson. “An
automated lab
instructor for simulated science experiments”. In Proceedings of the International
Conference
on Artificial Intelligence in Education (AI-ED 2001), pp. 65-76, San Antonio,
TX, May 2001.
I selected this paper while searching for ideas on how to identify student weaknesses.
I found few
papers that mentioned student weaknesses identification at all, and most of
those that did referenced
this paper as a concrete implementation. This paper turned out to be about the
implementation
of an automated lab instructor (ALI). The ALI program allows students to modify
variables in a
few simulated experiments, and asks them multiple choice questions on what they
have learned.
ALI also suggests experimental variables to modify. ALI’s big successes are
asking appropriate
questions and suggesting good variables by noticing when an experiment demonstrated
a concept
well, and remembering if the student has already learned that concept.
This paper made several contributions towards implementing a virtual tutor.
It claims to be
one of the first fusions of previous work done in language, knowledge representation,
tutoring, and
physics simulation. Many commercial tutors used language or physical simulation,
but not both.
This paper provides an architecture of the program, which could be used to implement
similar
projects. It also provides evidence of the system’s effectiveness with a small
study of the program
on 6 high school students.
This paper’s biggest strength is in its descriptions of implementation details,
which are fairly
clear and explain enough to help someone else build a similar system. The study’s
methodology
also made a strong case; after using the program, the students gave more concrete
answers and
focused on the relationship between variables on the post test.
This paper’s greatest weakness was a lack of emphasis on big ideas, or higher
level concepts.
Its main idea is the architecture for organizing the important components, such
as a student model
and low level reasoner. However, each of these components is very simplified.
For instance, the
student model only stored if the student had seen concepts, had them explained,
and correctly
answered questions on them. It did not store the way in which the student learned
concepts, or
what problems the student could apply them to. The paper did not make clear
if the program
could easily be extended to include this information, or what other components
were similarly
incomplete. Also, while the methodology of the study seemed very good, the size
of the study (6
students) was too small to be significant.
After reading this paper, I went back and revised my cognitive robotic system’s
architecture.
This paper’s emphasis on symbolic representation also gave me the idea to embed
the higher level
concepts into the programming language, similar to the fault isolation lecture
in class.
Many of the low level implementation details in this paper would not apply to
a cognitive robot,
because of the differences in environment. While ALI could work in an environment
in which the
student behavior was restricted to modifying a small number of variables in
a controlled situation,
or choosing a multiple choice answer in response to a question, a robotic karate
teacher would need
to work in an environment where the student response was more varied, and the
student knowledge
for each concept could not be expressed well as ’done’ or ’not done’.
2.2 Paper 2: Tactical Action Officer Intelligent Tutoring System
source:
Stottler, R. and M. Vinkavich, “Tactical Action Officer Intelligent Tutoring System,” Proceedings of the Interservice/Industry Training, Simulation & Education Conference (I/ITSEC 2000), November 2000.
I found a reference to the TAO ITS while searching for a method of creating
a student model
when the simulation has more unknowns than how the student will modify two or
three experimental variables. The TAO ITS is a program owned by the U.S. Navy, used to help
train future officers
in tactics. The tutor observes the student playing a ship on a simulated battle
field, and provides
feedback after game play on how successful the student was. The program can
also design simulations, warn a human teacher about student difficulties, give reading material
on tactical concepts,
and replay student errors. This program interested me because the student has
a range of actions
that he can choose to solve a problem in which enemy goals are ambiguous. I
selected this report
on the system because it appeared to have the most details about the program and
the designers’ goals.
This paper’s biggest contribution was the description of the qualities of a
good student model.
A student model should include tasks the student has performed, and his performance
on those
tasks. A student model also needs a method to estimate mastery of skills, knowledge,
and the
ability to apply them when appropriate. The tutor must also keep track of the
circumstances in which the
student can apply the knowledge. This student model not only accounts for factual
knowledge, but
the ability to integrate that knowledge into experience. The high level ideas
in this paper were its
strength. The ideas in the paper suggested a direction in which ITS research
should move.
However, few implementation suggestions were made. After describing
what an ITS is and
its high-level goals, the paper mostly focused on the look of the user interface.
Additionally, while
the paper provided a promising looking study of the opinions of 12 students
on the program, the
study did no evaluation to see if the students actually learned by using the
program.
This paper describes several aspects of an ITS that I feel are necessary for
the karate tutor
cognitive robot. The student model and the weakness finding of the tutor model
share several key
features. For example, in both tactics and sports the ability to apply knowledge
at the appropriate
time is as important as the knowledge itself. Also, the student learning environment
in this system
has both environmental uncertainty, and many options for student behavior. The
karate tutor will
need to be able to function in an equally difficult environment. However, there
are some differences
between tactics teaching and karate teaching, such as the value of waiting until the end
of a simulation to
provide feedback, which would need to be modified for the karate tutor.
Source:
Yi Shang, Hongchi Shi, and Su-Shing Chen, “An Intelligent Distributed Environment for Active Learning,” Section 4, Department of Computer Engineering and Computer Science, University of Missouri-Columbia, May 1-5, 2001, Hong Kong.
While looking through summary papers on student models, I noticed that several
of them mentioned using a Bayesian model for the student. I chose this paper
because it provided a mathematical explanation of how to feasibly model a
student as a Bayesian model, and plainly listed the strengths and weaknesses of
the model.
The major contributions of this section are concisely explaining what methods of student modeling have already been attempted and what kinds of weaknesses are common in student models, and proposing a student model that is both computationally reasonable and models uncertainty in student behavior. The proposed model is explained in mathematical terms, and its strengths and weaknesses are listed.
Section 4 provides the information needed to understand many other papers on ITS. It also completely opens up the black-box student model and shows a great deal of the detail that would be needed. This section’s main weakness is that it doesn’t provide an example of the model in use, or show evidence of any testing.
This paper shows some of the internals of a student model that the karate tutor would need in order to model student behavior. This model is robust against students accidentally making errors or guessing, so it gives a better explanation of students than the boolean model described in the first paper. Also, it is a linear-time algorithm in the number of data items, so it should be able to run in real time. While this exact model does not meet the karate tutor’s needs, because it can only model one skill, it could be extended to meet them.
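The slip/guess robustness and linear-time updating described above can be sketched with a knowledge-tracing style Bayesian update. This is an illustrative reconstruction, not the paper's exact model; the slip, guess, and learning parameters below are assumed values.

```python
# A minimal Bayesian knowledge-tracing style update, in the spirit of the
# Bayesian student model described above. The parameter values are
# illustrative assumptions, not taken from the paper.

P_SLIP = 0.1    # P(wrong answer | student knows the skill)
P_GUESS = 0.2   # P(right answer | student does not know the skill)
P_LEARN = 0.15  # P(skill moves from unknown to known between opportunities)

def update_mastery(p_known, correct):
    """One Bayes update of P(student knows the skill) given an observation."""
    if correct:
        evidence = p_known * (1 - P_SLIP) + (1 - p_known) * P_GUESS
        posterior = p_known * (1 - P_SLIP) / evidence
    else:
        evidence = p_known * P_SLIP + (1 - p_known) * (1 - P_GUESS)
        posterior = p_known * P_SLIP / evidence
    # Account for learning between practice opportunities.
    return posterior + (1 - posterior) * P_LEARN

p = 0.3  # prior belief that the student knows the skill
for obs in [True, True, False, True]:  # linear in the number of data items
    p = update_mastery(p, obs)
```

Because a correct answer might be a lucky guess and a wrong one a slip, no single observation swings the mastery estimate to 0 or 1, which is exactly the robustness the boolean model lacks.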
While I would really like to give an advanced lecture on this sort of material, I am worried that its scope is too large to condense into a simple term project.
A simple project in this area would be to create and test a student model which could identify student weaknesses in addition to their strengths. To complete this goal, at least five modules would be needed: an environment, an abstract concept representation, a student model, and two reasoners. The simple project would not have a tutor model, to avoid the need for language processing.
The environment in which the student learns should be at least as complex as a Lode Runner-type game, where a variety of actions constitute the same higher-level concept, and there are several scenarios in which each concept could be applied. Ideally, the game should involve some uncertainty, such as other agents or two human players, so that the space of learnable concepts is richer.
In order for the student model to be tested, the system would also need two
reasoners. One
reasoner would identify situations in which a concept could be learned. The
second reasoner would
identify how the student behaved in that situation. Both of these reasoners
would be dependent
on a good concept representation model.
Rather than have a tutor, the student model could print out suggestions for
a human tutor
listing the concepts seen and the expected level of understanding. Creating
a tutor in addition to
a student model would probably be too large a project for the course of one
term, because of the
amount of labor that would need to go into language processing, user interface,
and creation of new
environments to practice the skills that are weak.
In order to test the success of the student model, several human students should interact with the program under the observation of a human teacher. Between sessions, the human teacher should provide tutoring in specific concepts, while not providing tutoring in others. The response of the model over time should indicate faster strengthening in the tutored concepts than in those that are not tutored.
This project would embody a few of the salient features of the karate tutor.
Like the karate
tutor, the learning environment of this project would be reasonably complicated,
and the student
would have a variety of possible actions to take.
I don’t think it is likely that I will do this project. I am worried about the
size of the project,
compared to the value. I believe that this project would be a rediscovery of
work that has already
been done with TAO ITS. In addition to the student model and reasoners which
I am interested
in working on, this project also requires creating the environment for the student
to learn in, and
learning enough about this environment to develop higher level concepts for
them to study. I worry
that the cost of this project is much greater than the value gained.
Another project that I have been thinking of is extending SLAM to include multiple
agents
that could communicate maps to each other. For this project, I would need to
create a software
environment, re-implement SLAM, and create some robotic agents for the environment.
I would
then need to extend SLAM to incorporate simplified maps from other agents, and
identify moving
objects. Agents would behave slightly differently, so that the map received
from another agent
would have an unknown level of trust. I feel that this project is a better term
project because there
are incremental levels of completion, and it is easy to add other interesting
features if I finish early.
Also, most of the programming would be focused on the reasoning skill, rather
than the test bed.
The three topics in reasoning applied to embedded systems that I would like to learn more about are the following. The first two are related to the project I would like to pursue, and the third to something I already partially study and which is a key issue for upcoming space exploration.
• Search algorithms: I don’t know much about this topic, which is the most important one for chess players. Today’s computers allow us to do amazing calculations and to explore huge numbers of possibilities in very short times, and it is easy to forget that even for fairly simple real-life problems, the pool of answers is so huge that we can’t just explore everything. The real issue when we want to solve a problem is not the computer’s theoretical capability to solve it, but rather the time it needs to solve it. Indeed, if it needs six months to find an answer, that answer can be useless, since the problem itself could change during those six months. This is even truer for potentially infinite problems such as what a chess player should play in a given board configuration. Most of the time there won’t be any move that leads to victory whatever the human player chooses to play (a 100% probability of victory). In such cases we have to decide when to stop the algorithm searching for the best move, knowing that a potentially better move may not have been found yet. As the limit is usually given by the calculation time – actually it is the depth of the search, but this is chosen as a function of the calculation time – the real challenge is to find faster search algorithms. Thus the level of a chess player is determined first by the search algorithm that it uses.
• Machine learning: by that I mean the ability of a computer program to evolve from its original state to improve its performance. This is also a real issue in robotics and any AI system. Theoretically, I believe that if we can develop a good enough learning process, we will be able to create human-like robots able to grow from a “childhood-like” state to an adult state, and that will really be as unique and autonomous as we are. As for current interest, I would like to learn more about this topic because it is the real essence of AI and robots. If we are to send robots to Mars, it would be a real improvement if they could act more by themselves and not wait all the time through the eight-minute communication delay with Earth. However, as we don’t really know the Martian environment, this could only succeed if the robots are able to learn on their own from what they see and touch. Closer to my project, it can be useful to develop a chess player that can improve itself by playing against the same group of humans, or even against humans in general (for I think humans in general – with the exception of world champions – tend to play according to a certain scheme). It could allow the AI chess player to define its own heuristic process that will lead it to victory.
• Communication with humans: Future space exploration should involve human/robot cooperation, since one day we will tire of sending only robots and will want to bring space exploration to another level, while at the same time it’s cheaper and safer to send robots than humans. If we want to use both robots and humans in cooperation, we must find a way for them to communicate while carrying out a mission, on Mars for example. The robots are currently seen as “extra hands” for the astronauts, so they must be able to understand the orders and requests the humans give them. This is the first part of communication, and I have already worked on this problem of voice recognition: I coded a Matlab program able to recognize a small dictionary based on voice characteristics. But at the same time robots should be able to address concerns, warnings, or simple answers to the astronauts: we would greatly improve the cooperation if communication could go both ways. I would be glad to design a robot able to converse with humans, play trivia, or tell jokes and stories in fairly fluent language, even if limited to certain fields.
I chose three papers on computer chess players dealing with both learning and search algorithms, as these are the foundation of the AI that I will use. My last paper is not directly related to what I will do during my project, but is rather an opening to show what we can do with an AI chess player besides simply playing normal chess.
“An Evolutionary Approach for the Tuning of a Chess Evaluation Function
using Population Dynamics”. Graham Kendall, Glenn Whitwell.
I chose this paper because it explains how to use a computer’s real power (the ability to do the same thing hundreds of times in a very short time with no error) to find the best chess evaluation function. The method has the advantage of giving an optimal evaluation function at a rather low cost. I also chose this article because it shows that even light modifications to the base of the chess player, like the weighting of the pieces, can really change the way the AI plays.
The paper presents how to use machine learning to find the optimal evaluation function simply by comparison among all the possibilities. Instead of imposing a particular weighting on the evaluation function, the idea of the article is to use machine learning techniques to allow the computer to choose the best estimate of the optimal evaluation function from within a random population. The algorithm's first step is to generate a random population of evaluation-function candidates. Then the members of the population compete against each other to find out which one is the most efficient. When a candidate loses a game, it is discarded from the pool and a clone of the winning candidate is added to the population instead, in order to accelerate convergence, until there is only one member left in the population. The authors point out that if we apply just a naïve method like that, we might end up with the best estimate of the optimal evaluation function within the population, but this may still be far from the optimum itself. To address this, they present another way to replace each discarded candidate: with a crossover evaluation function, a child of the two parents, whose weightings depend upon the result of the game and the standard deviation of the parents' parameters. This allows us to find a final function, toward which the population converges, that is closer to the optimal evaluation function. If the initial distribution of the population was fair enough, we will hopefully find the optimum itself. The final evaluation function that comes out of this algorithm is then tested successfully against a commercial chess program, which demonstrates the efficiency of the method.
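A toy sketch of this population-dynamics scheme might look like the following. Since playing real chess games is out of scope here, a match is replaced by a stand-in rule: the candidate whose piece weights lie closer to a hidden "optimal" vector wins. The crossover ratio and all other parameters are my own assumptions, not the paper's.

```python
import random

# Toy sketch of population-dynamics tuning of an evaluation function.
# A "game" is a stand-in: the candidate closer to a hidden optimal
# weight vector wins the match. All parameters are illustrative.

OPTIMAL = [1.0, 3.0, 3.25, 5.0, 9.0]  # pawn, knight, bishop, rook, queen

def distance(w):
    """Squared distance to the hidden optimum (lower is a stronger player)."""
    return sum((a - b) ** 2 for a, b in zip(w, OPTIMAL))

def crossover(winner, loser):
    """Replacement child biased toward the winner of the match."""
    return [0.75 * a + 0.25 * b for a, b in zip(winner, loser)]

def tune(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(0.5, 10.0) for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(range(len(pop)), 2)  # two candidates compete
        winner, loser = (a, b) if distance(pop[a]) < distance(pop[b]) else (b, a)
        pop[loser] = crossover(pop[winner], pop[loser])  # replace the loser
    return min(pop, key=distance)

best = tune()
```

Because the current best candidate never loses a stand-in match, the best distance in the population never increases, and the crossover step pulls the rest of the population toward it, mirroring the convergence argument in the paper.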
What I liked in this article is the idea that a simple algorithm can lead to a great improvement in the chess player. Indeed, both the original idea of optimizing the evaluation function and the way of doing it are at once simple and smart, and the results are great. On the other hand, some points may need more explanation, like the fact that the members of the population compete only against some other members and not all of them. One kind of weighting might be efficient against another one but worse on average. At the same time, the method is only validated for searches three moves ahead, which is pretty shallow for computer chess, and one can imagine that this could, for example, favor the bishops over the knights, since the latter need more moves to be effective.
Compared to the two other papers, this article is like a first step, as it chooses the coefficients that will be used in the search algorithm. It is therefore needed if we want the search to be efficient, and it must be done before trying to improve the search itself.
I will apply this method to my chess player because it leads straightforwardly to the optimal evaluation function for the search algorithm.
“New Advances in Alpha-Beta Searching”. Jonathan Schaeffer, Aske Plaat.
I chose this article because it describes the core of any computer chess player: alpha-beta searching. Indeed, most chess players use this fairly simple brute-force algorithm. I found the paper interesting because it states how this algorithm works, what its strengths and weaknesses are, and above all what can be done to improve it despite the years of research already spent on it.
After recalling the importance of the Alpha-Beta algorithm in game-tree search, the paper summarizes its evolution from the earliest version to the current one. It points out four improvements that have already been made and then presents three new enhancements. The first existing improvement is the use of transposition tables to reduce the complexity of the tree by recognizing previously visited nodes (cutoffs), trading memory resources for calculation time. Then, move ordering can maximize the effectiveness of cutoffs (using a transposition table) if we sort the moves at each node in best-to-worst order instead of depth order. A heuristic choice of the search window at the beginning of the search also helps eliminate many nodes that, with a certain level of confidence, shouldn't lead to the best solution. And finally, we can save resources using a variable search depth that allocates more resources to the promising nodes and leaves aside the weaker ones. The paper then proposes three new enhancements to the current version of the Alpha-Beta search used for computer chess. The first idea is useful if the search ends in a fail low (the value of the first move is less than the search window) because of an unlucky guess of the search window. The authors propose to restart the algorithm using the transposition table to seed a new search, which seems to improve the quality of the search even if the real value of the improvement is hard to evaluate. The second idea helps narrow the search window by guessing the values that should be obtained at the end of the algorithm, so that the calculation time is much smaller. It uses the evaluation-function score from previous iterations, according to the MTD algorithm, which improves the algorithm by 9%. Finally, the last enhancement is an attempt to maximize the benefits of the transposition table by doing additional lookups after cutoffs. This is called Enhanced Transposition Cutoffs, and it leads to a reduction of the search tree by 28%. The results are illustrated with experiments comparing current computer chess players with versions that include these new enhancements, and it appears that with the three new ideas presented in this article we may be able to reduce the search effort in chess by 35%.
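The interaction of the transposition table with alpha-beta cutoffs can be illustrated on an abstract game tree. The tree and leaf scores are invented for the example, and a production engine must also record whether a stored score is exact or only a bound, a subtlety this sketch omits.

```python
# Alpha-beta search with a transposition table on a tiny abstract game
# tree (a stand-in for chess positions). Tree and scores are invented.

TREE = {  # position -> list of successor positions
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}
LEAF_SCORES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

transposition = {}  # position -> score already computed for it

def alphabeta(pos, alpha, beta, maximizing):
    if pos in transposition:           # transposition cutoff: node seen before
        return transposition[pos]
    if pos in LEAF_SCORES:
        return LEAF_SCORES[pos]
    if maximizing:
        value = float("-inf")
        for child in TREE[pos]:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: opponent avoids this line
                break
    else:
        value = float("inf")
        for child in TREE[pos]:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:           # alpha cutoff
                break
    transposition[pos] = value
    return value

best = alphabeta("root", float("-inf"), float("inf"), True)
```

Here the subtree under "b" is abandoned after its first child, since its score can no longer beat the line already found under "a".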
The strengths of this paper are all the tricks it gives to improve the Alpha-Beta search, both the old and the new ones, but at the same time it reads a little like a laundry list. Indeed, there is no link between the tricks, and the theoretical justifications are often obscure. The authors seem to present empirical processes that the reader has to take on faith, rather than theoretical analysis.
This paper deals with the main aspect of the chess player, the search algorithm, and it is the continuation of the first paper in the sense that it finds tricks to improve the search algorithm using the evaluation function designed in the first paper.
In my project, the results presented in this paper will be very useful because they will allow me to improve the efficiency of the Alpha-Beta search in order to construct deeper trees with less computation time.
“Information-Theoretic Advisors in Invisible Chess”. Bud, Albrecht, Nicholson, Zukerman.
This article is a little different from the first two because it won't be directly useful for my project. However, it gives a good idea of how to use chess as a test for decision-making strategies in complicated real-life problems. I am really interested in chess players in themselves, because it is fun to apply my computer science knowledge to a game, but at the same time I am also interested in creating a pseudo-AI able to play chess because it is the first step toward developing more powerful AI for real-life robots. This paper is therefore really appropriate, since it shows how we could use chess as a test for real-world strategic situations.
While games have some properties of real-world situations, the main difference between chess and most real-life problems is the level of uncertainty. Indeed, whereas real life is full of uncertainty, chess is known to be one of the only games with no uncertainty at all. Therefore, in order to use chess as a test-bed for real-world problems, the article introduces the concept of invisible chess. This is basically a normal chess game except that each player has a certain number of pieces that his opponent can't see. This is interesting because, by choosing the number of pieces that are unknown, we can change the degree of uncertainty of the game to match real-world situations, and as it is still chess we can keep a high strategic level. Since each player can't see some pieces of the other player, if we were to use only normal chess trees to find the best move, there would be a combinatorial explosion. The authors propose to counterbalance the new strategic complexity of invisible chess by adding an Information-Theoretic Advisor to the Strategic Advisor. Indeed, the early results (tests using only the strategic advisor) tend to show that the player with more information about the game will win more often, which seems pretty obvious. We can therefore increase the chances of success by using a weighted combination of the two kinds of advisors. According to the results presented in the article, the efficiency of the algorithm is better if we use a combination of the two advisors; however, if the weightings are too much in favor of one advisor, the results quickly become worse.
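The weighted combination of the two advisors can be sketched as follows. The move scores, the belief distributions over the hidden pieces, and the 0.7 weighting are illustrative assumptions, not values from the paper.

```python
import math

# Sketch of combining a strategic advisor with an information-theoretic
# advisor, as in the invisible-chess paper. Scores, beliefs, and the
# weighting are made-up stand-ins for the paper's actual quantities.

def entropy(distribution):
    """Shannon entropy of a belief over the hidden pieces' locations."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)

def combined_score(strategic, belief_before, belief_after, weight=0.7):
    """Weighted sum of strategic value and expected information gain."""
    info_gain = entropy(belief_before) - entropy(belief_after)
    return weight * strategic + (1 - weight) * info_gain

# Two hypothetical moves: move A is strategically better, while move B
# reveals more about the opponent's invisible pieces.
before = [0.25, 0.25, 0.25, 0.25]
move_a = combined_score(1.0, before, [0.25, 0.25, 0.25, 0.25])
move_b = combined_score(0.4, before, [0.7, 0.1, 0.1, 0.1])
```

With the weight tilted toward strategy, move A still wins here; pushing the weight toward either extreme instead rewards only one advisor, which is the imbalance effect the authors report.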
Thus this paper shows that we can use chess to demonstrate results about more complicated problems closer to real-world situations. I liked this idea of using chess as a demonstration field for something (uncertainty, in this article) that has nothing to do with the game itself, because it illustrates why chess has been such an important step in AI development. However, the paper misses some details about the strategic advisor they used and what the real-world applications of these results could be. Moreover, I think that more experimental data on the advisors, with perhaps more than only three ratios for the relative influence of the advisors, could have brought out interesting subtle differences in the results.
This paper is like an opening after the first two articles, showing that we can test more than just strategic issues with chess. This is the main reason why I chose it.
This paper won't really help me in my project, but it could help me after my project if I want to use what I did in another context.
During the last third of the class I would like to design a chess player applying improved search algorithms (based on my computer science knowledge) and heuristic processes (based on my chess experience). I would also like to include machine learning so that the AI improves its level as it plays. This project could be done by a group of students designing the core of the AI together, but with each student developing his own version of the chess player, so that we can have them play against each other at the end of the term to see the relative efficiency of the heuristic processes.
Planning with observations/sensor data
Traditional planning assumes that we have complete knowledge of the world in which the plan runs. However, this assumption is unrealistic for real-world scenarios where parts of the environment may be unknown or uncertain. There are several approaches to planning that involves sensor data which can only be obtained after the initiation of a plan: reactive planning, contingency planning, etc. But the one I'm most interested in is when a system interleaves planning and execution. This method requires the planner to make as complete a plan as possible prior to execution, and to modify incomplete parts of the plan as information becomes available. There are benefits and issues associated with this approach which will be covered more in section D.
Human Robot interactions
This is a complicated subject that involves various types of reasoning. The first thought that comes to mind when thinking of human-robot interaction is the method of communication between the two parties, which could involve techniques such as image mapping, sound recognition, body language/identification of gestures, etc. Another is the functionality of the robot and how it may be useful to humans. However, a question always in the back of my mind is the optimal yet safe level and method of control that a human should have over a robot with autonomous capabilities. This question originates from the aircraft example of who gets final authority between the human pilot and the onboard computer system. I'm not sure how much research has been done on this topic within robotics, or whether it's even an issue. But as robots get more advanced, with faster and better reasoning algorithms, it is possible to face situations where a robot's analysis of the state of a system differs from that of the human controlling or accompanying it. Then the question becomes: if a human issues a command that conflicts with the robot's internal reasoning system, should the robot be able to override it? If not, and a conflicting command is indeed issued, how would that affect the robot's logic and reasoning, and how should the robot deal with it? Hopefully, I can explore this question a little more through this course and other opportunities.
Learning Algorithms
For me, rather than a topic of interest, machine learning can probably be more accurately described as a topic of fear. Through some basic knowledge, state-of-the-art algorithms, and recognition of patterns from real-world experience, a system can build on its knowledge and effectively become “smarter.” Now, combine that with the possibility of a robot with the authority to override human commands (as described above). This is probably where all sci-fi nightmares began.
1. “An Analysis of Sensor-based task planning.” By Olawsky, Krebsbach, Gini 1995
This is an older paper (not published within the last 5 years). However, I am presenting it here because it gives a good, comprehensive overview of interleaving planning and execution along with major drawbacks and issues to consider. More recent papers, on the other hand, tend to dive deeply into specific methods and details and fail to address the big picture. (Or it is possible that more recent yet comprehensive papers exist, and I have simply failed to uncover them within the time available to research this topic.)
Traditionally, the planning process is completed before execution begins; but this paper describes the approach of incorporating sensor data collected at execution time into the plan when it is beneficial to do so. In the case where the planner lacks sufficient information, it may either assume a default value or defer that portion of the plan to a later time when sensor data can be obtained. The former increases the uncertainty in the plan, but it does allow the planner to generate a more complete plan prior to execution. Deferring, on the other hand, can greatly reduce uncertainty, but it presents problems of its own: 1) deferring may be prohibitively expensive, 2) early execution of actions without certain required information may interfere with goals not yet considered, and 3) data collected later may still be incorrect or uncertain.
Given these considerations the authors identified three key factors for consideration
when selecting a strategy for making such a plan:
1. Goal ordering – what goals the planner should try to satisfy first, not necessarily
the order of execution, but the order in which a particular portion of plan
is completed. The key strategy here is to sense as much of the information as
possible in the early parts of the plan.
2. When to sense – when to switch between planning and execution. There are
various methods, but the authors choose to defer goals where further planning
requires sensing and only begin execution when plans for all goals are either
completed or deferred.
3. What to sense – what to sense and what to default. These decisions are made
based on default reliability, sensor reliability, planning costs, execution
costs, and the cost of human intervention if applicable.
The authors also provided two metrics for assessing the quality of a generated plan: execution cost (the cost of actions during all phases of execution) and success rate (the percentage of problems in which no action needs to be undone due to premature execution of a plan).
Given these factors of consideration and plan-quality measurements, the authors then dive into their test world, called the Tool Box, and provide detailed cost and success analysis on tests performed in this world. The major flaw of this paper is that the Tool Box world is much more simplified than the real world, and their planner exploits domain-dependent properties of the Tool Box world that are not useful for a generic world; there are also various simplifying assumptions made throughout the analysis. Therefore, although the Tool Box does demonstrate the functionality of interleaving planning and execution, it does not serve as convincing evidence that this method would be feasible for more general situations.
Finally, the paper concludes on a particularly helpful note by briefly covering other methods of dealing with missing information, such as reactive and contingency planning, emphasizing the difference between uncertainty and missing information, and defining the open-world and closed-world assumptions used.
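The "what to sense" tradeoff from factor 3 above can be made concrete with a small expected-cost comparison. The cost model below (a single replanning cost, independent errors) is a deliberate simplification of my own, not the paper's formulation.

```python
# Sketch of the sense-vs-default decision: compare the expected cost of
# assuming a default value against the cost of deferring and sensing.
# All costs and probabilities below are assumed, illustrative values.

def expected_default_cost(p_default_correct, replan_cost):
    """Cost of assuming a default: pay to undo/replan when the guess is wrong."""
    return (1 - p_default_correct) * replan_cost

def expected_sense_cost(sense_cost, p_sensor_correct, replan_cost):
    """Cost of deferring: pay to sense, plus replanning when the sensor errs."""
    return sense_cost + (1 - p_sensor_correct) * replan_cost

def should_sense(p_default_correct, sense_cost, p_sensor_correct, replan_cost):
    return (expected_sense_cost(sense_cost, p_sensor_correct, replan_cost)
            < expected_default_cost(p_default_correct, replan_cost))

# A reliable default makes sensing not worth its price:
cheap_default = should_sense(0.95, sense_cost=2.0,
                             p_sensor_correct=0.99, replan_cost=10.0)
# An unreliable default tips the balance toward sensing:
risky_default = should_sense(0.40, sense_cost=2.0,
                             p_sensor_correct=0.99, replan_cost=10.0)
```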
2. “Interleaving Execution and Planning via Symbolic Model Checking.” Bertoli, Cimatti, Traverso. 2003
This is practically the only recently presented paper I found that considers interleaving execution and planning. Most of the other research in the area seems to focus on dealing with uncertainty, conformant planning, and how to reduce the large, impractical conditional tree plans so that one may hope to generate the entire plan off-line, prior to execution. But although this paper focuses on interleaving planning and execution, the method presented is very different from that described in the previous one.
As the title suggests, this paper presents an interleaving algorithm that uses symbolic model checking to plan over nondeterministic, partially observable domains. To achieve this, it uses a state-of-the-art off-line conditional plan generator between executions, and uses observations made during executions to further extend the plan. An oversimplified version of the algorithm follows:
1. Start with a given belief state.
2. Generate a partial conditional plan up to the next observation point. (A conditional plan is a tree-structured plan which branches over nondeterministic domains and noisy sensor inputs. Each branch in the tree traverses a series of belief states.)
3. The system executes the plan and is guaranteed to follow one of the branches in the partial conditional plan.
4. The system takes in the new current belief state and generates another conditional plan.
5. The algorithm continues until termination.
One key point about this algorithm is that it is guaranteed to terminate, because a condition in the algorithm states that every run must traverse a belief state that has not been previously visited. However, the major weakness of this algorithm is that it does not guarantee a plan which reaches the goal state. That is to say, one of the termination points could be a state from which the goal cannot be reached; so the algorithm is guaranteed to terminate at the goal state only within safely explorable domains, where the algorithm cannot get trapped in a state from which no strong path to the goal exists.
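The plan-execute-observe loop, with the visited-belief-state termination check, can be sketched as below. The planner and executor are trivial stand-ins (the real system generates conditional plans by symbolic model checking); only the loop structure follows the paper.

```python
# Schematic interleaving loop: plan to the next observation point, execute,
# fold the observation into the belief state, and stop on the goal or when
# no new belief state is reached. The domain is a made-up stand-in.

def plan_to_next_observation(belief):
    """Stand-in for the conditional plan generator between executions."""
    return "advance"

def execute_and_observe(belief, action):
    """Stand-in executor: observations rule out states, shrinking the belief."""
    if len(belief) > 1:
        states = sorted(belief)
        return frozenset(states[:-1])
    return frozenset({"goal"})

def interleave(initial_belief, goal_belief, max_steps=50):
    belief = initial_belief
    visited = {belief}              # belief states traversed so far
    for step in range(max_steps):
        if belief == goal_belief:
            return True, step
        action = plan_to_next_observation(belief)
        belief = execute_and_observe(belief, action)
        if belief in visited:       # the paper's termination condition
            return False, step
        visited.add(belief)
    return False, max_steps

success, steps = interleave(frozenset({"s1", "s2", "s3"}), frozenset({"goal"}))
```

Note that termination is enforced by the visited set, but nothing in the loop guarantees the final belief state is the goal, which is exactly the weakness discussed above.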
3. “Complexity of Planning with Partial Observability.” Rintanen. 2003
Perhaps the greatest weakness of this paper (for the purpose of this assignment) is that it contains nothing directly applicable to the design of a planner or my potential final project for this class. While the first two papers discussed above each present an algorithm or a planner capable of performing planning interleaved with execution, this third paper is much more theoretical in nature and focuses on the complexity of the planning problem. However, I feel that the complexity bound is what gives us the bottom-line evaluation metric for the difficulty of a problem. Thus, although not suited for practical use, the material presented in this paper is no less important than the other two, and so deserves to be reviewed.
The main finding presented in this paper is that, “for non-probabilistic (success probability 1) partially observable planning the plan existence problem is 2-EXP-complete.” The rest of the paper is mostly spent proving this finding, as well as providing more direct proofs of the EXP-hardness of conditional planning with full observability and the EXPSPACE-hardness of conditional planning without observability. The paper also points out that, “plan existence for classical planning (deterministic, one initial state) is [only] PSPACE-complete.” At least for me, this really put into perspective just how difficult planning for real or realistic scenarios is, and the need for better, faster algorithms if we wish to accomplish reasonable planning outside the controlled lab environment.
The paper also leaves open the question of complexity of plans with success probability less than 1 and the difficulty level of optimal planning.
The somewhat obvious project that derives from parts B and C would be to implement a planner which interleaves planning and execution and uses observations to revise or complete plans. The exact type of planner and algorithm that I would use is still unknown at this point. Depending on its complexity, other features can be added or removed, such as an algorithm to identify targets (or goals), a simulation (via on-screen display) where the generated plan is actually carried out, etc. Sensory data can be taken from a simple piece of hardware linked to the planner, or simulated using keyboard inputs or even data files.
The likelihood of my pursuing this project depends largely on what I learn about this type of planning through the course of the semester. I may also be swayed by what interesting projects other people come up with. (I would very much like to do something involving human interaction with robots!) The direction of my research will also play a role in project selection.
1. Reinforcement Learning
I am interested in reinforcement learning primarily because I think it is one
of the better models for how humans act. Take for instance the problem
of the inverse kinematics of a robot. With very good measurements of the
system, the kinematics can be painstakingly computed. But if you let a person play
with the controls for a while, they can make the robot do what they want,
without measurement or calculation. Furthermore, the human control is
more robust.
2. Exploration vs Exploitation
I think this is a fascinating problem. I am regularly faced with the decision
of how to drive from Cambridge to Lexington. Let me tell you that this is
a difficult problem. I have some estimates of the means and variances of
the time certain routes will take. But I don't really have much data on the
true distributions, especially when conditioned on the other information I
have at my disposal in the morning. Luckily, topology and speed limits
bound the number of possible paths to a manageable number. Even so,
when Route 2 is a parking lot, it brings into play new potential routes for
which I naturally have less information.
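The tradeoff described above can be sketched as a simple bandit problem: a minimal epsilon-greedy route chooser, where the route names and travel-time distributions are invented stand-ins for real commute data.

```python
import random

random.seed(0)

# Hypothetical travel-time distributions (mean, std in minutes); the numbers
# are invented for illustration, not real commute data.
ROUTES = {"Rt2": (25, 8), "Memorial Dr": (30, 3), "Back roads": (35, 2)}

def drive(route):
    """Sample one morning's travel time (Gaussian noise, floored at 5 min)."""
    mean, std = ROUTES[route]
    return max(5.0, random.gauss(mean, std))

def epsilon_greedy(days=2000, eps=0.1):
    """Epsilon-greedy: usually take the route with the best running average,
    occasionally explore a random one."""
    counts = {r: 0 for r in ROUTES}
    means = {r: 0.0 for r in ROUTES}
    for _ in range(days):
        if random.random() < eps or min(counts.values()) == 0:
            route = random.choice(list(ROUTES))           # explore
        else:
            route = min(means, key=means.get)             # exploit
        t = drive(route)
        counts[route] += 1
        means[route] += (t - means[route]) / counts[route]  # incremental mean
    return means, counts

means, counts = epsilon_greedy()
```

With a small epsilon the driver mostly exploits the current best estimate but keeps sampling the other routes often enough to notice if conditions change, which is exactly the dilemma when Route 2 turns into a parking lot.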
3. Dynamic Bayesian Networks
I find DBNs fascinating because they are a powerful tool for representing
temporal information. They are what I would consider the usable version
of the HMM. As fascinating and useful as the Kalman filter is, its restriction
to Gaussian distributions is potentially over-limiting. Let's say, for
instance, that I have a sensor that is perfectly correct except that once
in a while it just says 7, regardless of the truth. An astute person will quickly learn to
trust the 7's far less than the 8's. I think that a computer should be able
to do so also. The DBN is able to capture a much larger swath of the real
world. I am excited to learn more about making optimal decisions
under more general probability distributions.
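The faulty-sensor example can be made concrete with a discrete Bayes update whose observation model mixes Gaussian noise with a "stuck at 7" failure mode. The failure probability and noise level below are invented for illustration.

```python
import math

STUCK_PROB = 0.05   # assumed chance the sensor just says 7 (invented)
NOISE_STD = 0.5

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(reading, true_value):
    """P(reading | true_value) for a sensor that is Gaussian-noisy but
    sometimes gets 'stuck' and reports 7 regardless of the truth."""
    normal = (1 - STUCK_PROB) * gauss_pdf(reading, true_value, NOISE_STD)
    stuck = STUCK_PROB if reading == 7 else 0.0
    return normal + stuck

def posterior(reading, prior):
    """Discrete Bayes update over candidate true values."""
    unnorm = {v: p * likelihood(reading, v) for v, p in prior.items()}
    z = sum(unnorm.values())
    return {v: w / z for v, w in unnorm.items()}

prior = {v: 1 / 11 for v in range(11)}      # uniform over 0..10
p7 = posterior(7, prior)                    # after a reading of 7
p8 = posterior(8, prior)                    # after a reading of 8
```

After seeing 8 the posterior concentrates sharply on 8; after seeing 7 it stays noticeably more spread out, because the 7 might be a stuck reading. That is precisely the distrust the astute person learns.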
Inverted Helicopter
Autonomous inverted helicopter flight via reinforcement learning. Andrew
Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben
Tse, Eric Berger, and Eric Liang. In International Symposium on Experimental
Robotics, 2004.
I selected this paper because AIMA includes a picture of a helicopter whose
controller was developed by Andrew Ng. His webpage has a number of
interesting articles; this was among the most relevant.
The important feature of this paper is that reinforcement learning was
able to hover an autonomous helicopter inverted. The methodology consisted of
having a human pilot the craft inverted while collecting data. They then used
this data to construct a model and ran simulations with the model in order
to find a policy that could hold the helicopter inverted. Finally, they successfully
implemented the controller on the real helicopter.
A feature of the simulations was that each potential policy was tested on
the same set of pre-determined random trials. This prevented variation in the
trials from unduly affecting which policies were considered the best. Ng described
this approach previously (PEGASUS; Ng and Jordan, 2000). The important idea is that
one can eliminate the non-determinism of an MDP by generating a set of random
trial outcomes and evaluating each policy on the same trials, treating the
transitions as deterministic. As long as the trials are representative, the results
will be representative of the true, non-deterministic case.
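The PEGASUS idea, pre-drawing all the randomness so every candidate policy is scored on identical trials, can be sketched on a toy one-dimensional stabilization task. The dynamics and the feedback-gain "policies" below are invented for illustration.

```python
import random

def make_trials(n_trials, horizon, seed=0):
    """Pre-draw all random numbers the simulator will consume, so every
    policy is scored on exactly the same scenarios (the PEGASUS idea)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(horizon)] for _ in range(n_trials)]

def simulate(gain, noises):
    """Toy 1-D 'hover' task: keep the state near 0 despite disturbances,
    using proportional feedback; lower accumulated cost is better."""
    state, cost = 0.0, 0.0
    for u in noises:
        disturbance = u - 0.5              # deterministic given the pre-drawn u
        state += disturbance + gain * (-state)
        cost += state ** 2
    return cost

def evaluate(gain, trials):
    """Deterministic score: the same trials are reused for every policy."""
    return sum(simulate(gain, t) for t in trials) / len(trials)

trials = make_trials(n_trials=50, horizon=100)
scores = {gain: evaluate(gain, trials) for gain in (0.0, 0.5, 1.0)}
```

With the disturbances fixed, comparing gains becomes an ordinary deterministic optimization: re-evaluating a policy always returns the same score, so differences between policies cannot be explained away by trial-to-trial luck.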
While this was a very exciting result, the methodology was nothing new.
Furthermore, the reward function had a number of magic parameters that are
not described; I can only suppose they had to be hand-tuned. I would also
like to limit the amount of learning that is done in simulation. I think that if
the controller starts off very slowly (and with a little guidance), it could do all
of the learning online without any crashes. People manage to do precisely that.
Relational Reinforcement Learning
Relational Reinforcement Learning. Saso Dzeroski. 2003
I found this paper through a citation by Forbes, who was cited in AIMA.
This paper presents the RRL algorithm, in which the Q-function is approximately
represented by a regression tree. The regression tree is maintained so
that similar examples are combined (or one is rejected) and sufficiently different
examples yield an additional branching of the tree. Thus, a size constraint
can be maintained. In cases where states are best described relationally, this
data structure is intuitive and compact. For instance, it was effective in finding
optimal solutions to the block-stacking problem. The tree is populated by generalizing
a set of manually provided optimal solutions.
A problem with this algorithm is that it scales poorly. The number of potential
relationships grows very quickly. The algorithm over-generalized when
it received too many training cases and was pretty bad at Tetris. More importantly
for the task in question, since the relationships were propositional, it
doesn't generalize into the continuous case very well.
Effective Reinforcement Learning
William D. Smart and Leslie Pack Kaelbling, "Effective Reinforcement Learning
for Mobile Robots," International Conference on Robotics and Automation, May
11-15, 2002.
I found this one on LPK's webpage because I had her as a professor and knew
she was into this stuff.
This paper deals with the problem of Q-learning with sparse rewards. Since
the initial distribution of rewards is so sparse, the agent ends up having to pick
actions arbitrarily, which means it might take a very long time for the agent to
find a good policy. She fixes the problem manually, much as in Dzeroski, with
manual information. In one case, she provides a simple starting learning policy
that ensures that the goal is found. This allows the Q-function to start being
filled in. Then she drives the agent manually towards the goal. In this case,
unlike in Dzeroski, it doesn't matter whether this initial input is optimal;
it serves only to fill out the Q-values. In all three papers, a human operator is
used to home in on the region of the state space that is of interest.
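The bootstrapping trick, using a supplied (possibly suboptimal) controller to seed the Q-function before the agent acts on its own, can be sketched with tabular Q-learning on a toy corridor with a single sparse reward. The task and all parameters here are invented for illustration.

```python
import random

random.seed(1)
N = 10                  # 1-D corridor; goal at state N-1
ACTIONS = (-1, 1)
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    r = 1.0 if s2 == N - 1 else 0.0   # sparse: reward only at the goal
    return s2, r

def update(s, a, r, s2):
    """Standard Q-learning backup."""
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

def episode(control):
    s = 0
    for _ in range(4 * N):
        a = control(s)
        s2, r = step(s, a)
        update(s, a, r, s2)
        s = s2
        if r > 0:
            break

# Phase 1: a supplied (not necessarily optimal) controller that reliably
# reaches the goal, so nonzero rewards start propagating into Q.
for _ in range(20):
    episode(lambda s: 1)

# Phase 2: the agent acts on its own, epsilon-greedy on the bootstrapped Q.
def greedy(s):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for _ in range(200):
    episode(greedy)
```

Without phase 1, the epsilon-greedy agent would wander the corridor until it stumbled on the goal; with the supplied controller, useful Q-values are already in place to guide it, even though that controller need not be optimal.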
I thought this paper was fantastic. It was interesting and well written. I
guess if I had to complain about something, it would be that there is no discussion
of choosing the discount factor or learning rate. Also, some details of how the
Q-values are updated are not specified.
I propose implementing a controller for the Quanser helicopter that learns how
to fly on its own and does not rely on prior knowledge of the system model. I
plan on using Q-learning or policy search. I would start by providing a dense
reward surface for hovering and let the controller determine the control. Then
I will try to extend that to general trajectory following. If that goes well, I
will try a much sparser set of rewards. If possible, I would like to have it learn
from a human operator as in the Kaelbling paper. I estimate the probability
of pursuing this project at 3/4.
The first topic I would like to explore is probabilistic reasoning with Bayesian
nets. I see that reasoning under situations of uncertainty is a very important
area of AI, and it is apparent that these techniques are the standard approach
for effectively modeling this uncertainty. I am particularly interested in learning
how to model complex relationships between variables in spite of redundant links
in Bayesian Nets. Also, I would like to study the automatic generation of
Bayesian Nets from large training sets.
Secondly, I would like to investigate reasoning about opponents using game theory
and Minimax search. Beyond the typical examples using a single opponent (e.g.,
chess), I would like to explore further the idea of reasoning in a world with
potentially more than one cooperating and/or adversarial agent.
Finally, I would like to explore reasoning using first order logic. As described
in Russell and Norvig, these concepts and methods seem to be very refined for
basic objects, their properties, and the relationships between them. This is
very useful for implementation of these properties in object-oriented data structures
in any artificial intelligence application.
i.) Approximating Game-Theoretic Optimal Strategies in Full-scale Poker by Billings
et al.
This paper’s main contribution is a model for reducing the search space of poker
minimax through the use of “bucketing”. The concept is very basic and is used
by most human players: rather than describing a given situation using every
detail, situations can be grouped into categories. You will often hear poker
players describe these situations, such as “I had top pair top kicker” or “She
turned a set on me”. These situations frequently have their own notations (Top
Pair Top Kicker = TPTK) in poker literature. I find that Billings et al. have
performed an appropriate grouping of situations based on poker expert and
writer David Sklansky’s ranking of hands. However, they have oversimplified
the problem to poker involving only one opponent. In real poker, you are very
rarely up against one opponent, and so in some sense they’ve cheated by reducing
the minimax search space beyond what is useful. I intend to use their bucketing
system for my search without the two-agent reduction.
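A toy version of bucketing can be sketched as a function that collapses the 1,326 distinct Texas Hold'em starting hands into a handful of coarse categories. The category names and boundaries below are invented for illustration; a real system would use something much finer, such as Sklansky's hand groups.

```python
from itertools import combinations

RANKS = "23456789TJQKA"

def bucket(hole):
    """Map a two-card starting hand to a coarse bucket (invented categories)."""
    (r1, s1), (r2, s2) = hole
    hi = max(r1, r2, key=RANKS.index)
    lo = min(r1, r2, key=RANKS.index)
    suited = s1 == s2
    if r1 == r2:
        return "big-pair" if RANKS.index(r1) >= RANKS.index("T") else "small-pair"
    if RANKS.index(hi) >= RANKS.index("Q") and RANKS.index(lo) >= RANKS.index("T"):
        return "big-cards"
    if suited and abs(RANKS.index(r1) - RANKS.index(r2)) == 1:
        return "suited-connector"
    return "trash"

# All 1326 distinct starting hands collapse into just a few buckets,
# shrinking the game tree the search has to consider.
deck = [(r, s) for r in RANKS for s in "cdhs"]
buckets = {bucket(h) for h in combinations(deck, 2)}
```

The point is the compression: the search reasons about a handful of strategically similar classes instead of every distinct pair of cards, which is exactly what a player means by "I had top pair top kicker."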
ii.) On Pruning Techniques for Multi-Player Games by Sturtevant and Korf.
The major contribution of this paper is that it shows that minimax search against
multiple opponents, known as MaxN, has very limited capacity to be pruned via
alpha-beta and branch-and-bound pruning. Using examples from the games of Sergeant
Major and Hearts, they have shown a way to turn multiplayer minimax search into
a “paranoid” two-player situation, in which every opponent has formed a coalition
against you. This is an oversimplification, probably induced by their bias toward
the game of Hearts. In most games, each opponent is usually out to beat the others
as much as they are out to beat you. In Hearts, however, when you are attempting
to shoot the moon, at some point your opponents may form an explicit coalition
against you. This situation is exclusive to Hearts, and I cannot think of another
game where it is relevant.
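The MaxN idea, each player maximizing their own component of a payoff vector rather than assuming a two-player zero-sum coalition, can be sketched on a tiny invented game tree:

```python
def maxn(node, player, n_players):
    """MaxN search: each node returns a tuple of payoffs, one per player,
    and the player to move picks the child maximizing *their own* entry.
    Internal nodes are lists of children; leaves are payoff tuples."""
    if isinstance(node, tuple):        # leaf: payoff vector
        return node
    children = [maxn(c, (player + 1) % n_players, n_players) for c in node]
    return max(children, key=lambda payoffs: payoffs[player])

# Three players; leaves give (p0, p1, p2) payoffs (invented example).
tree = [
    [(3, 2, 5), (1, 6, 3)],   # if player 0 goes left, player 1 chooses here
    [(4, 1, 1), (2, 2, 6)],   # if player 0 goes right
]
result = maxn(tree, player=0, n_players=3)
```

A node's value here is a full payoff vector rather than a single bounded scalar, which is why the ordinary alpha-beta bounds that prune two-player minimax so effectively have so little grip on MaxN, the limitation Sturtevant and Korf demonstrate.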
I think their point regarding the limited utility of pruning in Multi-Player
minimax is valid. However, I do not necessarily agree with the usefulness of
transforming a MaxN tree into a simple two-player model. Also, in their conclusion,
they allude to the fact that the incorporation of domain knowledge is key to
reducing the search space. I will work to reduce this multi-player search space
by incorporating deterministic and probabilistic knowledge about the game in
my robot’s strategies.
iii.) Deep Blue by Campbell et al.
This paper describes and evaluates some of the decisions that went into the
design of Deep Blue. I believe the major contribution of this
system was not necessarily the intelligence (or lack thereof) behind its reasoning,
but the positive press and attention it brought to the AI community.
I have mixed feelings regarding Deep Blue as an accomplishment in AI. In some
sense, the fact that this machine beat a world-class player is an accomplishment
in computing, but not necessarily in intelligence. I liken this to the way
our calculators evaluate complex mathematical formulas in microseconds.
Although this is impressive, most people do not consider it a marvel of intelligence.
The common understanding is that Deep Blue was a brute force attack on the search
space. As we begin to address more complex information spaces with search, it
becomes increasingly important to incorporate knowledge from the particular
domain to reduce that space. This is essentially the theme that I introduced
in the evaluation of the previous paper, and I hope to incorporate this notion
into my implementation of strategy for games.
Although the challenges of playing poker in a physical world are great, the
most difficult task for my robot is actually determining a winning strategy.
I would like to focus on this particular task and the multiple levels of reasoning it requires.
In particular, my goal is to be able to win at Small-Stakes/Low-Limit Texas Hold'em.
Although this game is characterized by poor opposition making numerous mistakes,
it is by no means an easy game. However, given the body of literature and the
availability of large sets of training data, I am optimistic that this can be done in the
last third of the course. The likelihood of me choosing this project is very
high.
References
[1] Billings, D., Burch, N., Davidson, A., Holte, R., Schaeffer, J., Schauenberg,
T., & Szafron, D. (2003). Approximating Game-Theoretic Optimal Strategies
in Full-scale Poker. In Proceedings of the 2003 International Joint Conference
on Artificial Intelligence.
[2] Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial
Intelligence, 134, pp. 57–83.
[3] Russell, S., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach.
Pearson Education International.
[4] Sturtevant, N. R., & Korf, R. E. (2004). On Pruning Techniques for Multi-Player
Games. Computer Science Department, University of California, Los Angeles.