Ben Schwartz's Mission 2006 Annotated References

December 2, 2002

At the request of the presentation team, I have proposed a specific set of variables to characterize a particular area of rainforest. Consultation with expert Michael Keller revealed that only an expert botanist could identify and determine the densities of various species in, for example, a one-hectare plot in one day. Therefore, after reviewing a preprint of a paper by Asner et al. on integrating field and remote sensing, I suggest that the set of variables be the proportion of the plot covered by various classes of vegetation, as well as soil and possibly other surfaces, such as felled trees. These proportions can be determined by spectroscopic analysis of satellite data combined with fast on-site measurements with field spectroradiometers like the ASD FR-Pro. This data is decomposed into relative cover using spectral mixture analysis, implemented by a program known as AutoMCU. Data of the correct type is already available from the existing Landsat 7 satellite. Combined with water and soil samples taken for analysis in dedicated off-site labs, this should be enough data to fully characterize any plot of land while keeping the time an investigator must spend on site to within one workday.
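The core of that decomposition can be sketched briefly. The code below is not AutoMCU itself (which, as I understand it, adds Monte Carlo sampling over variable endmember spectra); it is plain linear spectral unmixing on synthetic endmember spectra, solving for nonnegative cover fractions by least squares:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical endmember reflectance spectra (one column per cover class,
# e.g. live vegetation, dead vegetation, bare soil). In practice these
# would come from the field spectroradiometer measurements.
n_bands = 6  # e.g. the Landsat 7 ETM+ reflective bands
rng = np.random.default_rng(0)
endmembers = rng.uniform(0.05, 0.6, size=(n_bands, 3))

# Synthesize a pixel as a mixture of the endmembers, plus sensor noise.
true_fractions = np.array([0.5, 0.3, 0.2])
pixel = endmembers @ true_fractions + rng.normal(0, 0.005, n_bands)

# Solve pixel = endmembers @ f for f >= 0, then renormalize so sum(f) = 1.
f, _residual = nnls(endmembers, pixel)
f /= f.sum()
print("estimated cover fractions:", np.round(f, 3))
```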

December 1, 2002

Part I: Modelling

After much arduous fighting with OpenOffice.org, the ArcView PVA Table Tools Beta v1.03, and MATLAB, I have successfully executed a simulation that provides some insight into the population dynamics of animal species. I used OpenOffice to modify the life-table database that describes the life cycle of the hypothetical species. ArcView with the PVA extension was the tool of choice for modelling the population time series, and MATLAB allowed clear presentation of the results.

The model was specifically designed to demonstrate the sensitivity of the survival of a population to relatively subtle aspects of its life cycle. Specifically, the species compared were identical in every way except for the distribution of fecundity over age groups. Although they had the same total fecundity, one population was moderately fertile for much of its life cycle, while the other was twice as fertile for half as long. Both populations were initialized at six times the theoretical carrying capacity of the environment, simulating the population crash that would follow, for example, deforestation that reduces the carrying capacity of an animal's habitat. Here is a graph of the two populations over time. The blue line represents the population with distributed fecundity; the red line represents the population with concentrated fecundity.


[Figure: Logarithmic plot of population versus time]

The two populations behave identically for the first thirteen years.  After this, the population with distributed fecundity survives past fifty years (and in fact, further simulation shows that at 100 years it has stabilized).  By contrast, the population with the concentrated fecundity goes extinct within 35 years.  The following graphs provide the key to interpreting these results.
[Figure: Age structure of the surviving population over time]
[Figure: Age structure of the extinct population]

The simulation was run under a density-dependent assumption, valid for many species, that (virtually) no young survive while the population is in excess of the carrying capacity. Both populations remain in excess of the carrying capacity for twelve years, and the fact that no young are born during this time is manifested as a twelve-year-long hole in the age structure of the surviving population. After year twelve, however, that population continues to produce young and ultimately reaches an equilibrium age structure. By contrast, the extinct population never sees an increase in births even as it falls below the carrying capacity, because by the time it has done so, all of its members are too old to breed.
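To make the mechanism concrete, here is a minimal sketch of this kind of density-dependent, age-structured simulation in Python. Every number in it (twenty age classes, 0.9 annual survival, the fertile windows) is an illustrative stand-in rather than a value from the actual life table, so the exact timings differ from the ArcView run, but the qualitative outcome is the same: the distributed-fecundity population recovers while the concentrated one dies out.

```python
import numpy as np

AGES = 20                 # hypothetical number of age classes
K = 1000.0                # assumed carrying capacity
SURVIVAL = 0.9            # assumed annual survival, same for every age class

# Equal total fecundity, distributed differently (all values illustrative):
fec_distributed = np.zeros(AGES)
fec_distributed[4:16] = 0.5      # moderately fertile for twelve age classes
fec_concentrated = np.zeros(AGES)
fec_concentrated[4:10] = 1.0     # twice as fertile for half as long

def simulate(fecundity, years=50):
    n = np.full(AGES, 6 * K / AGES)   # start at six times carrying capacity
    history = []
    for _ in range(years):
        births = fecundity @ n
        if n.sum() > K:               # density dependence: no recruitment above K
            births = 0.0
        n = np.concatenate(([births], SURVIVAL * n[:-1]))  # everyone ages a year
        history.append(n.sum())
    return history

for label, fec in (("distributed", fec_distributed),
                   ("concentrated", fec_concentrated)):
    print(label, "population after 50 years:", round(simulate(fec)[-1], 1))
```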

Although this scenario is admittedly artificial, the important point remains that subtle aspects of a species' life cycle can have drastic effects on its long-term survival. Clearly, a great deal of information will need to be obtained through monitoring and research to establish the life cycles of species well enough to forecast their risk of extinction. If such data were available, conservationists could distinguish populations in danger of extinction from those that will naturally find a new equilibrium, and presumably combat extinction in the Amazon basin more effectively by focusing their efforts on the former category. A combination of Population Viability Analysis and monitoring can help conservationists make this distinction.

Part II: Indexation

After the resolution of several miscommunications, I propose a method for indexing forest health. Such a method must be applicable to very different areas within the rainforest, must be relatively simple to apply to any given parcel of land, and should ultimately yield a number from 0 to 100, where 100 represents pristine rainforest and 0 represents development that renders the area unusable for the foreseeable future. Under the current administrative plan, there would be five years to develop such a method before it would need to be applied. I propose the following method for generating an index:

1. Gather a group of experienced field biologists to list all of the likely indicators of forest health that can be measured in one day of field work. These variables are expected to include the densities of various tree species and possibly animal species, the amount of surface water per hectare, and any available satellite data. The list may also include human variables, such as the number of humans per hectare, and even the proportion of the land covered by human structures of various kinds. Whether a variable is a positive or a negative factor need not be known in advance, so long as it is deemed likely to have an effect large enough to justify the time needed to measure it.

2. Assemble a team to visit sites distributed throughout the Amazon, measuring each of the selected variables and additionally assigning each site an index from 0 to 100. The team's membership need not be constant, so long as it remains large enough. If 20 people were employed full time making measurements, and each were able to visit 100 sites per year for five years, the result would be 10,000 samples in the finished index. That appears to be a sufficient number at this time.

3. Once all of the raw data is available, normalize each variable as a percentile among the samples and (using a computer) plot each site as a point in n-dimensional space, where each of the n axes represents a normalized variable.  

To determine the index for a new site, investigators will measure each of the variables, which defines a point in the n-dimensional space described above. The index at that point must then be interpolated from the initially measured points. There are many computational methods in the literature for such approximations, but they are largely beyond the undergraduate level of mathematics. However, simple methods such as nearest-neighbor approximation, where a point is assigned the index of the nearest point with known index, may prove sufficient.
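To illustrate how little machinery the simple version requires, here is a sketch in Python of percentile normalization followed by nearest-neighbor lookup. All of the data below is randomly generated stand-in data, not survey results; dropping a column from both the calibration matrix and the query gives the reduced-dimension fallback discussed two paragraphs down.

```python
import numpy as np

# Hypothetical calibration data: one row per surveyed site, one column per
# measured variable; `index` holds the expert-assigned 0-100 scores.
rng = np.random.default_rng(1)
n_sites, n_vars = 1000, 5
calibration = rng.normal(size=(n_sites, n_vars))
index = rng.uniform(0, 100, size=n_sites)

# Normalize each variable to its percentile rank among the samples.
ranks = calibration.argsort(axis=0).argsort(axis=0)
calib_norm = 100.0 * ranks / (n_sites - 1)

def estimate_index(site):
    """Nearest-neighbor interpolation in the normalized n-dimensional space."""
    # Percentile of each new measurement among the calibration samples.
    p = np.array([100.0 * (calibration[:, j] < site[j]).mean()
                  for j in range(n_vars)])
    nearest = np.argmin(np.linalg.norm(calib_norm - p, axis=1))
    return index[nearest]

print(round(estimate_index(rng.normal(size=n_vars)), 1))
```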

This method is superior to having investigators assign indices directly, because it removes all potential for bias on the part of the investigator at any given plot. Since the relationship between the variables and the index may be complex, and since the investigator does not determine the index himself, there is little or no opportunity for the corruption and bribery that might otherwise compromise assessments. And because the index involves no judgment call, the minimal training needed to execute the required measurements is sufficient.

Theoretically, this method is also resilient: if one of the variables cannot be measured, the index can still be computed, albeit with some loss of accuracy, in an (n-1)-dimensional space, as if the missing variable had never been considered. In general, so long as the missing variable does not indicate something drastically different from the rest, the result should be about the same. This also allows quick unofficial assessments. However, if the authorities discern corruption in the form of investigators deliberately omitting certain measurements, complete data can be required for official results.

This method is not complete; a great deal of field work would be necessary before it could be. It is offered as a framework on which such a model may be constructed.

November 3, 2002

The GIS-focused subgroup of group 9 has had two meetings with Sarah E. Williams of the Rotch Library over the past two Wednesdays. Ms. Williams created the 12.000 GIS webpage and is MIT's GIS specialist. We have been learning how to manipulate views and themes, as well as how to load extensions, in ArcView. I tested the Population Viability Analysis extension on the Athena Linux version of ArcView and, after determining the values of ArcView's directory environment variables, found that it requested a DLL file, indicating that it will only ever run under Windows. In our hour-and-a-half meeting with Ms. Williams on Wednesday, October 30, we tested the extension on Rotch Library's Windows workstations, with the puzzling result that the extension seemed to have no effect. (In Ms. Williams's experience, extensions normally alter the menu bar to provide access to their functions.) The documentation link on the extension's website was dead, so we e-mailed the person responsible for the site, only to find that her e-mail address was no longer valid. We are still waiting to hear from other people involved in the creation of the PVA extension. In the meantime, we have requested licensed copies of ArcView for use in our dorms, which we will be able to pick up on November 4 at the MIT Computer Connection. We have also received licenses for online ArcView courses through MIT, so that once we have the software we can become familiar with it in the event that we figure out how to use ArcView for modelling. The focus of our group is no longer bibliographic research, since the only remaining research we can do depends on the availability of data for the parameters inside this ArcView extension, and we do not yet know what those are. If we can make the ArcView PVA extension function properly, we will shortly be doing research in the far more real sense of executing a model that has never been run before.

October 22, 2002

Although there is no MIT license for RAMAS GIS, there is a free extension to ArcView (for which we do have a license) called Animal Movement. This extension has not been thoroughly tested (and is therefore cutting-edge-ish), but it should allow us to analyze the effects of rainforest fragmentation on animal survival probabilities.

October 21, 2002

There seems to be a tentative agreement within the group that we will do a sensitivity analysis on an interaction-network model of a rainforest ecosystem. There is some support for a PVA using Vortex. I have found a more detailed description of the inner workings of Vortex. This page also admits that certain parameters, such as the degree to which inbreeding is dangerous, are unmeasured for most species, and suggests that the values published in Ralls et al. 1988, Conservation Biology, "Estimates of lethal equivalents and the cost of inbreeding in mammals," can be used instead. This paper pools experimental values from 40 mammalian species, with a median of 3.14 lethal equivalents per diploid genome (upper quartile 5.62, lower quartile 0.90).
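For concreteness, here is a worked example using the classic lethal-equivalents model of inbreeding depression (survival declining exponentially with the inbreeding coefficient), which I believe is essentially the form Vortex uses; the baseline survival of 0.9 is an arbitrary assumption.

```python
import math

def inbred_survival(s0, lethal_equivalents_diploid, f):
    """Juvenile survival under the lethal-equivalents model, S = S0 * exp(-B*F),
    where B is lethal equivalents per haploid gamete (half the per-diploid-genome
    figure reported by Ralls et al.) and F is the inbreeding coefficient."""
    b = lethal_equivalents_diploid / 2.0
    return s0 * math.exp(-b * f)

# Full-sib mating (F = 0.25) with an assumed baseline survival of 0.9,
# at Ralls et al.'s lower quartile, median, and upper quartile:
for le in (0.90, 3.14, 5.62):
    print(le, "lethal equivalents ->", round(inbred_survival(0.9, le, 0.25), 3))
```

At the median of 3.14 lethal equivalents, this gives roughly a one-third reduction in juvenile survival for full-sib matings, in line with the cost of inbreeding that Ralls et al. report.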

October 20, 2002

The group has decided to do two different kinds of models, one more qualitative and one more quantitative. It seems mostly agreed that at least one of the two will be a sensitivity analysis on a system with many parameters whose values remain to be measured. One specific type of model involving sensitivity analysis is Population Viability Analysis (PVA), a very general method for predicting extinction probabilities given certain initial conditions; sensitivity analysis is used to determine the uncertainty in the resulting probability. A free PVA software package called VORTEX is distributed by the Chicago Zoological Society. VORTEX is one of the best PVA programs, as shown by this comparison paper, because it incorporates the negative effects that random events can have on small populations. RAMAS GIS, while it tends to underestimate the extinction probability by an average of 16% unless only the limiting sex is considered, can incorporate spatial information from satellites and model the effects of population segregation on extinction probability. However, this software is not free, so we are unlikely to use it unless one of the MIT departments already has a license and access to the GIS database; this does not seem to be the case.

Given our limited resources, MATLAB Simulink may be our best modelling platform, since it is freely available over Athena.  In further narrowing what types of models we will consider, a reference to Environmental Information and Modelling Systems may be helpful:
Considering ecosystems either as "adimensional conceptual units" (ACU) or as geographically bounded "basic units of nature" (BUN) is a long lasting theoretical discussion within ecology (Marin, 1997). Although this problem may seem to be mostly of the interest of theoreticians, it has important implications when facing ecosystem management issues (Blew, 1996; Rowe & Barnes, 1996). If ecosystems are ACU, then the boundaries of these systems are left either to the scientists studying a specific process or to the stakeholders of the system under exploitation. This view, supported by the General System Theory (Marín, 1997), allows any geographic area, regardless of its size, to be defined as an ecosystem.
Using GIS data would implicitly define the basic unit of nature to be one pixel, while standard differential- or difference-equation models would endorse the ACU concept. For the sake of balance, it would probably be best if we could build one model of each type, although this might prove too difficult. Even if these methods are too difficult for us to execute ourselves, it is still important that we know about them so we can describe what methods an ecological modelling organization in the Amazon might employ.

October 6, 2002

Selecting Something to Model

We're going to need to pick a system to model if we are actually to do any modelling. This is a difficult choice, because the types of models we are considering do not capture spatial variation and are therefore only valid inside a moderately homogeneous region. We are not attempting a comprehensive model of the whole system; if we were, it might be useful to partition the Amazon basin into ecologically similar regions. Methods such as K-Means Clustering or Improved K-Means attempt to automate this sort of procedure by successively reassigning elements (in this case, small areas of rainforest) to groups until each group is as homogeneous as possible, as sketched below. This does, of course, leave open the question of defining similarity. The application of clustering to ecology is described further in Numerical Ecology by Legendre and Legendre.
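A bare-bones version of that procedure, on made-up two-feature data (the features standing in for, say, standardized canopy cover and annual rainfall), looks like this:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign each point to the nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy "areas of rainforest" drawn from two distinct clusters:
rng = np.random.default_rng(1)
areas = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
labels, _ = kmeans(areas, k=2)
print("group sizes:", np.bincount(labels))
```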

Seeing the Forest for the Trees

Given our limited resources, it is in our best interest to model the behavior of keystone species in the Amazon Basin's most prevalent ecological system. To see why, read Keystone Species by Rhett Butler. One World Wildlife Fund report on Peruvian rainforests, also by Rhett Butler, states:
"Although there are hundreds of tree species in tropical forests, a surprisingly small number are absolutely essential to the survival of many of the birds and mammals inhabiting them. These keystone species are not necessarily the most abundant trees but those which bear fruit in periods of food scarcity (Hartshorn 1992). In Manú there are 10 or 12 species of figs and palms which perform this function (Terborgh 1986)."
Rainforest Relationships also supports the fig tree as a keystone species, stating that figs provide up to 70% of herbivores' diets in certain locations, feeding birds, bats, monkeys, pigs, mouse deer, and occasionally fish. This is due to the fig's continuous fruiting, which provides a constant backup nutrient source for the rest of the food chain. The Tropical Plant Database has more information on many plants, but not fig trees. There appears to be little general agreement on which species are keystone species; in fact, on Brazil Exchange one author wrote that "it is not unreasonable to consider humans as a keystone species in much of the area and continued human intervention to be vital to its health."

September 23, 2002

Systems Theory

Although our group was originally titled "Systems Theory and Interaction," it now appears that systems theory itself is of little use to us. Systems theory is so abstract, resting almost entirely on the concept of information flow, that while insightful, it is not useful for making predictions about real-world systems, and certainly not about the rainforest. Were we to continue down this path, the following resources might be useful:

Principia Cybernetica  Applied systems theory is often called cybernetics, although this is not to be confused with the science-fiction term for the science of building cyborgs.

MIT's System Dynamics Group  A project of the Sloan School of Management, which is an indication of where systems theory is truly applicable.

John McPherson's Living Systems Theory Page  This website, by a professor at University of California San Diego, summarizes the work of his colleague Dr. James G. Miller, the inventor of so-called LST.

Population Modelling

Now that our goal seems to be heading toward population modelling, it is necessary to learn more about it.
One reasonably advanced modelling technique for a single species seems to be The Leslie Population Model. This models the time-varying age distribution by recording the probability that an individual in a given age group survives to reach the next age group, and the probability that an individual in a given age group reproduces. The model has both the advantage and the disadvantage of treating time as discrete. This allows better modelling of real populations whose instantaneous survival probabilities (will you survive another second? how about another millisecond?) are not known, but it also limits the detail of its predictions (no more resolution than the size of each age group). This technique might be useful to the group, if age-distribution data exists, because it can tell us, for instance, how long a population will take to recover and what its natural age distribution looks like.
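In matrix form, a Leslie projection is just repeated multiplication by a matrix whose top row holds the fecundities and whose subdiagonal holds the survival probabilities; the dominant eigenvalue then gives the long-run growth rate and its eigenvector the stable age distribution. The numbers below are illustrative, not data for any real species.

```python
import numpy as np

# A Leslie matrix for three age classes:
L = np.array([[0.0, 1.2, 0.8],   # offspring per individual in each age class
              [0.6, 0.0, 0.0],   # P(surviving from class 0 to class 1)
              [0.0, 0.7, 0.0]])  # P(surviving from class 1 to class 2)

n = np.array([100.0, 50.0, 20.0])   # initial age distribution
for _ in range(50):
    n = L @ n                       # project one time step

eigvals, eigvecs = np.linalg.eig(L)
lead = np.argmax(eigvals.real)
stable = eigvecs[:, lead].real
print("growth rate per step:", round(eigvals[lead].real, 3))
print("stable age structure:", np.round(stable / stable.sum(), 3))
```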

Variations on the Leslie model include Usher and Lefkovitch models, which take into account not only age but also other factors, such as size. This is useful for species with highly variable maturation rates. While primarily applied to animals, these models could theoretically be employed for plants as well.

For more than one species, the simplest relationship is predator-prey, of which the simplest model is the Lotka-Volterra Difference Model. This time-discrete model predicts an endless cycling of predator and prey populations. Such sustained cycling is rarely observed in nature, although the lynx and the hare are a famous example. Since plants can act as prey, this model could be useful for determining the effect of deforestation on herbivores. There is also a differential Continuous Lotka-Volterra model, as well as a more advanced model taking into account carrying capacities and maximum predation efficiencies. Another simple relationship between two species is competition for resources.
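The difference form takes only a few lines; the parameter values here are invented for illustration, not fitted to any data.

```python
# Discrete-time Lotka-Volterra predator-prey model:
#   H[t+1] = H[t] + r*H[t]      - a*H[t]*P[t]   (prey grows, then gets eaten)
#   P[t+1] = P[t] + b*H[t]*P[t] - d*P[t]        (predators eat, then die off)
r, a, b, d = 0.1, 0.01, 0.001, 0.05   # illustrative rates
H, P = 100.0, 10.0                    # illustrative initial populations
for t in range(200):
    H, P = H + r * H - a * H * P, P + b * H * P - d * P
    if t % 50 == 0:
        print(f"t={t:3d}  prey={H:8.1f}  predators={P:6.1f}")
```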

Ultimately, population models must become considerably more sophisticated before their parameters can be determined from experimental data. The Science magazine article "Noisy Clockwork: Time Series Analysis of Population Fluctuations in Animals" reviews this problem and the solutions proposed so far. One example is the problem of data representing a system affected both by environmental noise and by experimental measurement error. It is in fact possible to separate these errors, because environmental noise affects the system and is thus echoed later on, while experimental noise has no time correlation. The two types of noise also tend to show different distributions (e.g. Poisson vs. normal). The paper includes a review of analyses of the lynx-hare cycle, concluding that at least one additional factor is involved, such as the hares' food supply. Although the analyses presented there are probably too sophisticated for our group to attempt, the article does give an idea of what the common sources of error are, and of how reasonable our goal of modelling any given system might be.
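The "echo" argument is easy to demonstrate. In the sketch below (a Gompertz-style model of log abundance with invented parameters), a series driven only by environmental (process) noise stays strongly autocorrelated, because each shock feeds into the next time step, while a series contaminated only by measurement noise shows almost no autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
PHI = 0.8   # strength of density dependence in log abundance (illustrative)

def series(process_sd, obs_sd):
    """Gompertz-type log-abundance series with process and observation noise."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = PHI * x[t - 1] + rng.normal(0, process_sd)  # noise feeds back
    return x + rng.normal(0, obs_sd, T)                    # noise does not

def lag1_autocorr(y):
    y = y - y.mean()
    return (y[:-1] * y[1:]).sum() / (y * y).sum()

print("process noise only:    ", round(lag1_autocorr(series(0.5, 0.0)), 2))
print("measurement noise only:", round(lag1_autocorr(series(0.0, 0.5)), 2))
```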
