Debating Complexity in Modeling

Randy Hunt (rjhunt@usgs.gov)
U.S. Geological Survey, 8505 Research Way, Middleton, Wisconsin 53562, USA
Chunmiao Zheng (czheng@ua.edu)
Department of Geology, University of Alabama, Tuscaloosa, Alabama 35487, USA

Complexity in modeling would seem to be an issue of universal importance throughout the geosciences, perhaps throughout all science, if this year's debate among groundwater modelers is any indication. The following questions and observations made up the heart of that discussion.

As scientists trying to understand the natural world, how should our effort be apportioned? We know that the natural world is characterized by complex and interrelated processes. Yet do we need to incorporate these intricacies explicitly to perform the tasks we are charged with? In this era of expanding computer power and increasingly sophisticated preprocessors and postprocessors, are bigger machines making better models? Put another way, do we now understand the natural world better for all these advances in our ability to simulate it? The public's patience for long-term projects that produce indeterminate results is wearing thin, which increases the pressure on investigators to use appropriate technology efficiently. On the other hand, bringing scientific results into the legal arena adds a new dimension to the issue: to the layperson, a tool that includes more of the complexity known to exist in the real world is expected to provide the more scientifically valid answer.

These were among the issues addressed in a special session, "Groundwater models: How much complexity is warranted?" at the AGU Spring 1998 Meeting. Discussions ranged from philosophical to regulatory.

The large data requirements of appropriately constructed complex models were a recurring theme. Charles Andrews, in his paper "Complexity of groundwater models: A consultant's perspective," observed that although computing power has increased 100-fold, the average model run time -- the run length that still lets a modeler make enough runs to understand the system -- remains around an hour. In this light, complexity appears to be driven by the speed of the computer on the modeler's desk and by the project budget. However, Andrews noted, while the average run time has generally not increased, the time needed to assess a model's results has increased drastically. Complexity can thus manifest itself in new ways -- for example, not in the time it takes to solve the problem but in the time it takes to understand the solution. A commonly expressed fear was that this shift leaves less time for understanding the system and more time for constructing and managing data input and output.

Mary Anderson, revisiting a 1983 Ground Water editorial in her talk, "Model complexity: Does the Emperor have too many clothes?" observed that during the past 15 years some problem areas (for example, untrained modelers) have become less troublesome, but that other conclusions from 1983 still hold. Field data sets commensurate with the model objective are still needed, and those data needs have become more demanding as models have grown more complex. In that sense, it is not the amount but the quality of the clothes that matters. Moreover, the amount of "clothes" (or complexity) the Emperor wears is in many cases driven by the modeling objective.

Three examples were provided in Anderson's talk: a parsimonious model constructed to address water supply concerns, a research model developed to test a new lake package for MODFLOW, and a model constructed by a company seeking a permit to mine in an environmentally sensitive area. The mine model attracted the greatest regulatory and public involvement and was also the most complex. It was so complex, in fact, that the regulators relied on a simpler two-dimensional model, constructed after the complex model, to assess the appropriateness of the parameters used (for example, boundary conditions and recharge). These examples underscored a central problem with complex models: careful quality assurance and quality control are required when the volume of input data is large.

These concerns became even more apparent when the discussion moved from models of groundwater flow to models of contaminant transport. An interesting exchange of viewpoints between Michael Wallace and Chuck Byrum concerned the complexity of models developed for the Waste Isolation Pilot Plant (WIPP) in Carlsbad, New Mexico. From the contractor's point of view, two factors had contributed inordinately to the apparent complexity of many WIPP models: the rigorous quality assurance requirements and the extended peer review process. From the regulator's point of view, however, those same factors had brought a better balance to the modeling and data acquisition. The consensus was that contaminant transport models do not necessarily become better predictive tools as more complexity is included. Indeed, Don Siegel challenged the audience to provide an example of a contaminant transport model that could adequately predict movement (within certain requirements). The take-home message from his presentation was that we should take models for what they are: powerful heuristic tools with limited predictive capabilities.

This is not to say that the session offered no signs of hope amid the morass of complexity. An overview by Fred Molz of recent developments in fractal and multifractal scaling of hydraulic conductivity distributions indicated an increasing degree of heterogeneity with decreasing measurement scale, and suggested that fractal-type scaling may help determine the appropriate measurement scale and level of complexity needed for transport modeling. An example provided by Ken Bradbury showed that a complex fractured-rock setting could be simulated with the simple Wellhead Protection Area (WHPA) code if a well's zone of contribution was all that was desired; if the time of travel to the well was also needed, however, only a much more complex, multilayer model would suffice. In addition, Glen Champion presented a case in which a coupled lake-groundwater model simulated the drought and recovery of a lake district in northern Wisconsin, something not feasible with simpler techniques.
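
To make the notion of fractal-type scaling concrete, the sketch below is a minimal, hypothetical illustration rather than anything drawn from Molz's presentation or data: it assumes a fractional-Brownian-motion model for a vertical ln K profile with a Hurst coefficient of 0.7 (chosen only for illustration), synthesizes the profile spectrally, and then recovers the power-law growth of increment variance with separation distance that is the signature of such scaling.

```python
import numpy as np

# Generic, hypothetical sketch of fractal-type scaling in a log-conductivity
# profile (not Molz's data or analysis). A fractional-Brownian-motion-like
# ln K record is synthesized by spectral filtering, and the power-law growth
# of increment variance with separation h, Var[lnK(x+h) - lnK(x)] ~ h^(2H),
# is then recovered from the synthetic record.
rng = np.random.default_rng(0)
H, n = 0.7, 2**15                         # assumed Hurst coefficient, record length
f = np.fft.rfftfreq(n, d=1.0)
amp = np.zeros_like(f)
amp[1:] = f[1:] ** -(H + 0.5)             # power spectrum ~ f^-(2H + 1)
phase = np.exp(2j * np.pi * rng.random(f.size))
ln_k = np.fft.irfft(amp * phase, n=n)     # synthetic ln K profile

lags = np.array([1, 2, 4, 8, 16, 32, 64, 128])
sf = np.array([np.mean((ln_k[h:] - ln_k[:-h]) ** 2) for h in lags])
slope, _ = np.polyfit(np.log(lags), np.log(sf), 1)
print(f"fitted scaling exponent 2H = {slope:.2f} (synthesized with H = {H})")
```

The fitted exponent condenses into one number how variability between two measurements grows with their separation, which is the sense in which scaling relations of this kind could inform the choice of measurement scale and model resolution.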

Others suggested ways to tame the complexity beast: regression methods to determine supportable and desirable model complexity, presented by Mary Hill; stochastic approaches to accommodate randomly occurring contaminant sources within deterministic models, proposed by Chunmiao Zheng; and alternative modeling techniques that allow detailed complexity in the near field while keeping the far-field flow system simple, discussed by Otto Strack. It was apparent that the issue is not complexity per se, but the appropriate application of complexity when and where it is needed.
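
As a toy illustration of the general idea of letting regression statistics indicate how much complexity the data can support (a generic sketch under assumed values, not the specific methodology Hill presented), the code below fits nested linear models to a handful of noisy synthetic observations and reports each parameter estimate alongside its approximate standard error; an added parameter whose uncertainty is comparable to or larger than its estimate is a sign of complexity the data cannot support.

```python
import numpy as np

# Toy, hypothetical example: judge how much model complexity sparse data can
# support by comparing parameter estimates with their standard errors.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 8)                       # a handful of observation points
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.4, x.size)    # synthetic "observed" data (truth is linear)

candidates = {
    "simple (intercept + slope)":    np.column_stack([np.ones_like(x), x]),
    "complex (adds quadratic term)": np.column_stack([np.ones_like(x), x, x**2]),
}
for name, X in candidates.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)            # least-squares fit
    resid = y - X @ beta
    sigma2 = resid @ resid / (x.size - X.shape[1])           # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # approx. standard errors
    print(name)
    print("  estimates :", np.round(beta, 3))
    print("  std errors:", np.round(se, 3))
```

The underlying question carries over to groundwater models, which are nonlinear in their parameters: is each added parameter resolved well enough by the available observations to be worth carrying?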

Henk Haitjema succinctly summarized the issue in "Step-wise groundwater flow modeling: Keep it simple" by demonstrating that in many cases a simple answer can convey a large part of the complex answer. In doing so, he drew a distinction between "modeling" (constructing a model to answer a specific objective) and "simulation" (constructing a model to mimic as many aspects of a portion of the natural world as possible). If models are kept in the context of their objective, we should feel comfortable resisting the siren song of complexity and constructing simpler, less encompassing models. It is the responsibility of the scientist, not the lawyer, to construct the appropriate model, which is by definition a simplification of reality.

The special session, "Groundwater models: How much complexity is warranted?" was held at the AGU Spring Meeting in Boston, Massachusetts, May 27, 1998.

EOS, Transactions, American Geophysical Union
Vol. 80, No. 3, p. 29, January 19, 1999


© 1999 AGU