![]() |
|
Developing Mobile Robot Wall-Following Algorithms
Using Genetic Programming
This paper demonstrates the use of genetic programming
(GP) for the development of mobile robot wall-following behaviors. Algorithms are
developed for a simulated mobile robot that uses an array of range finders for navigation.
Navigation algorithms are tested in a variety of differently shaped environments to
encourage the development of robust solutions, and reduce the possibility of solutions
based on memorization of a fixed set of movements. A brief introduction to GP is
presented. A typical wall-following robot evolutionary cycle is analyzed, and results are
presented. GP is shown to be capable of producing robust wall-following navigation
algorithms that perform well in each of the test environments used.
Introduction
This work focuses on using genetic programming (GP) to develop
mobile robot navigation strategies that will perform well in the real world. A simulated
robot is used for learning, but the simulation is designed to closely approximate a real
robot. The algorithms developed with GP can then be tested in the real world, on a real
robot. Wall following was selected for the initial problem domain because it is a fairly
simple problem to set up and evaluate. It also lays the groundwork for more complex
problem domains, such as maze traversal, mapping, and full coverage navigation (i.e.,
vacuuming and lawn mowing). The development process for these behaviors is described, and
the results of the experiments are presented. (1)
Genetic programming is attractive for this work for several
reasons. First, it is a machine learning technology, so control strategies are emergent
rather than predetermined. The GP environment is set up for a particular problem domain
and it is left to "the unyielding weight of evolution," as Dr. John Koza puts it
[1], to drive out solutions. Also, GP is a black-box system in that a-priori knowledge of
the system under analysis is not required. Some method of analyzing the results obtained
from a set of input conditions is all that is necessary. Finally, the results of a GP
learning session can be arbitrarily complex. Complexity is bounded by imposing limits on
the maximum size of the computer programs that GP can create. GP will explore solutions
that range up to the largest allowable size, regardless of the magnitude of that limit.
Genetic programming is a methodology for automatically generating
computer programs. GP uses the concepts of natural selection and genetic recombination in
a manner very similar to that used with genetic algorithms [2]. Rather than evolving
fixed-length bit streams, however, GP evolves computer programs of dynamic size and shape.
GP is founded on the premise that computer programs can be represented as tree structures,
as shown in figure 1a. The functions (f) operate on terminals (T) to produce a result.
Functions are operations that take one or more arguments. They can be sequential (i.e., +,
AND), conditional (i.e., If, When), or iterative (i.e., DoUntil, WhileDo). Terminals are
operations that take no arguments but return a value (i.e., variables and constants).
Figure 1b shows a simple arithmetic function. If terminals T1, T2
and T3 evaluate to 5, 7 and 2, respectively, then the program will return the
value 6 when it is executed. (5 is added to 7 for an intermediate value of 12, which is
then divided by 2 for a final result of 6.) These diagrams represent simple Lisp-like
s-expressions that result in a single output at the top. In this simplest form the
functions are both sequential and deterministic, but functions and terminals can be
defined to perform iteration, recursion, side effecting, memory evaluation, and other
complex operations.
The genetic programmer defines a set of functions and terminals
appropriate for the problem at hand. These may include highly generic functions, such as
those mentioned above, or they may be specific to the problem domain. The set of functions
and terminals defined must satisfy two constraints: sufficiency and closure. Sufficiency
requires that a solution to the given problem must exist within the space of all possible
computer programs that can be created using the specified set of functions and terminals.
Obviously, if a solution does not exist GP will not be able to find it. Closure requires
that every function and terminal return a value of compatible type, and that every
function accepts arguments of this same type. This constraint guarantees that any
combination of functions and terminals that comprise a complete tree structure (i.e., a
terminal is placed at every leaf node) will be a syntactically valid computer program that
can be evaluated.
The genetic programmer also defines a fitness test that
quantifies the performance of a computer program. The performance of a computer program in
GP can be a function of the final result that it produces, or it can be based on an
analysis of some side effect that the computer program has produced. In the work presented
herein, the function and terminal sets contain operators that affect the contents of a
2-dimensional memory array. The final state of this array is evaluated to determine the
computer program's fitness.
The GP process initially creates a number of computer programs by
randomly combining functions and terminals into complete tree structures. This collection
of programs, or individuals, is called a population. The GP engine evaluates and assigns a
fitness value to each of the individuals in the population. Not surprisingly, the
performance of the individuals in this initial, randomly created population can be quite
poor. Nevertheless, there is variance in the performance levels: some individuals perform
less poorly than others. The population is ranked according to fitness. A new population
is generated by selecting individuals from the previous generation to participate in
creative operations. Individuals are selected for inclusion in the creative process based
on their fitness - the higher the fitness, the higher the likelihood of inclusion. Three
basic creative operations are used in GP: reproduction, crossover and mutation (2). Reproduction is simply the copying of an individual from the previous
generation into the next generation without any modification of its structure.
Figure 2 - The reproduction operation.
Mutation is performed by randomly selecting a node in an
individual tree structure, and removing that node along with any sub-tree that may exist
below it. A new sub-tree is then generated randomly and "grafted in" at the
position where the original node was removed.
Figure 3 - The mutation operation.
Crossover involves selecting two individuals and selecting a node
at random in each of them. The selected nodes, along with any sub-trees that exist below
them, are exchanged between the two individuals.
As successive populations are created by applying these genetic
operators, the average performance of the population tends to improve. The performance of
the best individual in the population can improve to optimal or near-optimal levels. There
is no guarantee, of course, that GP will find an optimal solution (in less than infinite
time), but a well thought out set of functions and terminals with a reasonable fitness
test will usually produce good results.
The Wall Following Robot
The goal of a wall-following robot is to navigate through open
spaces until it encounters a wall, and then navigate along the wall at some fairly
constant distance. The successful wall-follower should traverse the entire circumference
of its environment at least once without straying either too close, or too far from the
walls.
Figure 5 - The four room
configurations used to develop the wall-following behavior.
A two-dimensional simulated environment is used to test the navigation algorithms developed by GP. The environment consists of four completely enclosed rooms with wall contours of varying complexity. Each individual algorithm from the GP population is tested in each of the four rooms, and scored on how well it performs the wall-following behavior. By testing each individual in a number of different environments GP is encouraged to develop general-purpose solutions rather than simply memorize a set of movements, or capitalize on insignificant coincidental properties of the simulator [3].
A screen display of the simulation environment is shown
in figure 5. The dark shaded areas represent the walls of the environment. The lightly
shaded areas represent the desired path of the robot. The desired path is for display only
- the robot has no visibility of these areas. All sensory input to the robot comes through
the use of simulated range finders. The robot has eight range finders positioned at 45
increments about its center. The solid square in the center of each room designates the
starting point of the robot in that room.
Each room in the simulated environment consists of a 40 by 40
array of grid locations that are each 10 pixels square. The pixel size is used as the
basic unit of movement (3). The navigation algorithms are scored on how
many of the lightly shaded "corridor" grid locations they can pass through - the
more the better. The robot is allowed to wander between 10 and 20 units from the wall (the
width of a corridor grid) without any performance demerits. GP calculates fitness based on
the number of corridor grids that are not visited during the course of a run, and attempts
to minimize the following fitness equation:
NumberOfCorridorGridsNotVisited is the most significant
aspect of fitness. A weighting factor of 1000 ensures that GP will remain motivated to
navigate through corridor grids, even when there are only a few unvisited grid points
remaining.
DistanceOfEndingPointFromCenterOfCorridor is the distance
from the final resting point of the robot to the center of the nearest corridor grid
(i.e., 15 units from the nearest wall). This term is used to "kick-start" the GP
process. Without it, a robot that does not move at all (travels zero distance) will score
higher than a robot that moves towards a wall but does not quite reach the corridor grids,
and GP will often suffer premature convergence on these solutions. In other words, this
term encourages initial exploration by rewarding algorithms that move towards walls, even
if they do not reach those walls. This provides the necessary scoring gradient in the
early generations to encourage GP to develop individuals that leave their starting
location and move towards the walls. A weighting factor of 100 is applied to this term so
that it will eclipse TotalDistanceTravelled any time the robot does not reach the corridor
grids.(4)
TotalDistanceTravelled is the least significant term in
the fitness equation. It is included to discourage the navigation algorithms from
traversing the corridor grids by spiraling haphazardly through the open spaces of the
environment. A robot that navigates directly through the center of the corridor grids will
score higher than one that meanders back and forth across the grids.
As described above, genetic programming solves problems
by combining and organizing a predefined set of functions and terminals. This set of
functions and terminals provides the fundamental domain knowledge that allows GP to
discover solutions. The functions and terminals used for the wall-following experiments
presented herein are defined below.
Do2 (Arga, Argb) This function provides for sequential
operations. It evaluates argument Arga, then it evaluates argument Argb. The return value
is the result of argument Argb. Multiple sequential steps can be accomplished by nested
Do2's.
WhileInCoridorRange (Arga) This function checks each of
the distance values, selects the minimum value (i.e., the closest wall) and determines if
it is within the allowable wall-following range. If the robot is out of range of a wall,
this function returns the value 0.0. Otherwise, it evaluates argument Arga continuously
(so long as the robot is within acceptable range of the nearest wall) and returns the
value obtained from the last evaluation of argument Arga. The range of values that
satisfies the "InCorridorRange" condition is defined as 10 to 20 units,
inclusive.
WhileTooCloseToWall (Arga) This function is identical to
WhileInCoridorRange except that it evaluates the argument as long as the robot is too
close to a wall. The range of values that satisfy this function is less than 12 units. 12
is used rather than 10 to provide a hysteresis band between the in and out of range
conditions.
WhileTooFarFromWall (Arga) This function is also identical
to WhileInCoridorRange except that it evaluates the argument as long as the robot is too
far away from the closest wall. The range of values that satisfy this function is greater
than 18 units.
IfConvexCorner (Arga, Argb) This function tests for the
condition of the range sensor at angle 90 reading too far to the nearest wall, and the
range sensor at angle 135 reading an acceptable distance. This condition occurs when the
robot is following a wall that drops away in a convex corner. If this condition is
satisfied, the function evaluates argument Arga and returns its value. Otherwise it
evaluates argument Argb and returns its result.
MoveForward() This terminal causes the robot
to move forward five units. The return value is 1.0 if the robot completes its move
without colliding with a wall, and 0.0 if a collision occurs. If the robot collides with a
wall, movement stops at that point (i.e., the robot is not allowed to move into or through
the walls). These return values are arbitrary and were selected to coincide with the C/C++
integral values used to represent Boolean true and false.
TurnRight(), TurnLeft() These terminals cause the robot to
rotate 45 to the right or left, respectively. The return value from these terminals is the
angle that the robot ends up facing after the rotation.
TurnTowardsClosestWall() This terminal checks each of the
current sensor values and selects the lowest value (i.e., the closest wall). It then
rotates the robot such that the robot is pointed in the direction of the sensor that had
the lowest reading. Note that the robot turns towards the perceived closest wall, and not
necessarily towards the actual closest wall. The return value is the new direction of the
robot.
TurnAwayFromClosestWall() This terminal is similar to
TurnTowardsClosestWall, except that the robot rotates to face the opposite direction of
the perceived closest wall.
TurnParallelToClosestWall() This terminal is also similar
to TurnTowardsClosestWall, except that the robot rotates such that its 90 sensor is
pointed in the direction of the perceived closest wall.
Population size was set to 500 individuals, and experiments
typically required 200 to 300 generations to converge on a solution. This population size
was selected by trial and error to provide acceptable results in a reasonable amount of
time on the hardware available. All experiments were performed on Pentium-100 PC's, and
ran for six to eight hours. The probability values for reproduction, crossover and
mutation were set to 39%, 60% and 1%, respectively, for all experiments. These
values were selected somewhat arbitrarily, based on earlier work [4]. The reproduction and
crossover values set the balance between exploitation and exploration, and mutation allows
for recovery of any lost genetic materials. All experiments invoked an operation to force
the best performing individual to reproduce at least once into the next generation. This
prevents outstanding performers from being lost in a genetic roll of the dice (which
happened surprisingly often without this operator in place).
The genetic programming engine was, indeed, able to organize the
functions and terminals described above into fairly proficient wall-following algorithms.
This section describes a typical evolutionary cycle. The initial generations of each run,
as expected, performed quite poorly. The robot paths in figure 6 (shown as dotted lines)
are typical of early generations. The robot manages to maneuver into the wall-following
corridor and claim a few grid points, but has not yet figured out how to turn and follow
the corridor. The target corridor shading has been removed for clarity. The black dot at
the end of the robot path designates the stopping point of the robot.
Figure 6 - Typical result of early
generations in a GP run.
The GP-generated code that produced the behavior displayed in
figure 6 is shown below. The line numbers are for reference only, and the indentation
represents the calling hierarchy. This particular solution capitalizes on the hysteresis
band in the WhileTooFarFromWall function to skim the edge of a wall following corridor in
the upper right room.
It begins with a WhileTooFarFromWall loop, with a Do2 function as
the body. The robot starts in a too-far-from-wall position, so the Do2 is evaluated. The
first argument of the Do2 is a TurnAwayFromClosestWall terminal which sets the direction
of the robot to the left in all cases. The second argument of the Do2 is another
WhileTooFarFromWall loop with a MoveForward terminal as its body. This loop repeats until
the too-far-from-wall test fails (i.e., the robot ends up within allowable wall-following
range). The inner WhileTooFarFromWall loop terminates at that point. The Do2 is complete,
so the outer WhileTooFarFromWall loop evaluates the position of the robot. Finding that
the robot is not too-far-from-wall, this loop terminates and the program ends with a total
of 13 hits out of the possible 933.
- 0 WhileTooFarFromWall
- 1 Do2
- 2 TurnAwayFromClosestWall
- 3 WhileTooFarFromWall
- 4 MoveForward
GP soon begins to develop solutions that exhibit rudimentary
wall-following behavior, as shown in figure 7. These solutions typically perform
reasonably well in the square room, and claim a few corridor grids in the more complex
rooms by ricocheting around the environment.
The code for the individual displayed in figure 7 is shown below.
It begins with a WhileTooFarFromWall loop with a Do2 as its body. The arguments of the Do2
are a MoveForward and a Do2 with two WhileInCoridorRange loops as arguments. While the
robot is out of corridor range the WhileInCoridorRange tests will fail. The result is that
the robot will move forward until it finds itself within corridor range of a wall. When
the MoveForward on line 2 moves the robot into a wall-following corridor, the
WhileInCoridorRange functions are able to execute. The first (on line 4) turns the robot
away from the closest wall repeatedly until the maximum loop counter is exceeded.(5) The second (on line 6) employs a Do2 to turn the robot parallel to the
closest wall and move forward. These two steps are repeated until the robot moves out of
range, or the loop times out.(6) When either of those conditions occurs,
another iteration of WhileTooFarFromWall (on line 0) is made. If the robot is out of range
and the loop counter is not maximized, the process repeats. This algorithm passes through
a total of 275 corridor grids of the possible 933.
Figure 7 - Typical result of a GP run as it begins
to learn the wall-following behavior.
In almost every run GP is able to generate algorithms that
perform reasonably well. The individual shown in figure 8 makes a couple of errors, but
for the most part it is exhibiting the desired behavior. It passes through 894 of the 933
corridor grids.(7)
Figure 8 - Typical end result of a GP run.
The code for the individual displayed in figure 8 is shown above.
Analysis of this code (which is typical of the complexity of GP-generated algorithms) is
left as an exercise for the reader. (8)
Conclusions
These experiments show that GP is capable of generating
wall-following algorithms using the function set provided. Although a 100% solution
(passing through all possible corridor grids) was never attained, most learning cycles
produced algorithms that exhibited the desired behavior. The convex corners proved to be
particularly troublesome and warrant further analysis. This is probably a sufficiency
issue and resolution should enhance performance, possibly leading to 100% solutions.
The algorithms generated are robust in that they perform
well across the four environments used in these experiments. It seems reasonable to expect
that the algorithms will also perform well in additional environments of similar
complexity, but this hypothesis remains untested. The original intent was to test the
algorithms on a physical robot equipped with ultrasonic range finders, but such a robot
was unavailable during the required time interval. Follow-on work will focus on employing
GP-developed algorithms on physical robots.
This project demonstrates the feasibility of using
genetic programming to develop mobile robot navigation algorithms. It lays the foundation
for planned follow-on projects of maze traversal, map generation and full-coverage area
traversal. The results of this project show sufficient promise to warrant further research
into these more complex tasks.
J.R. Koza, Genetic programming: On the programming of
computers by means of natural selection, MIT Press, Cambridge, MA, 1992.
J.H. Holland, Adaptation In Natural And Artificial Systems,
University of Michigan Press, Ann Arbor, MI, 1975.
C.W. Reynolds, Evolution of Obstacle Avoidance Behavior: Using
Noise to Promote Robust Solutions, Advances in Genetic Programming, MIT Press,
Cambridge, MA, pp. 221-241, 1994.
R.A. Dain, An Overview of Genetic Programming with Machine Vision Examples, Northwest Artificial Intelligence Forum Journal, Volume 4, pp. 21-30, 1995.