THE CREATION OF STANDARD DATA SETS FOR THE EVALUATION OF NEW SPATIAL ANALYSIS METHODS
Gregg Petrie, Senior Research Scientist
National Security Directorate
P.O
509 372 6057
gregg.petrie@pnl.gov
Ian Anderson, Senior Product Manager
Leica Geosystems GIS and Mapping
ian.anderson@gis.leica-geosystems.com
(404) 248-9000 ext 2306
Scott A. Bennett, Vice President - Sales
ImageLinks, Inc.
Western Regional Office:
Tel: 303-215-1700
cell 303 517 4329
sbennett@imagelinks.com
Haans Wesley Fisk, Remote Sensing Analyst
Remote Sensing
2222 West 2300 South
hfisk@fs.fed.us
(801) 975-3760
Center for Precision Agriculture Systems
Irrigated Ag.
509-786-9257 voice; 786-9370 fax
eileen_perry@wsu.edu
Thomas K. Windholz, Associate GIS Director
208 282 5606
ABSTRACT
Remote sensing systems and supporting technologies have recently improved in several ways that promote the development of new algorithms. However, a set of standard imagery is needed to promote the objective evaluation and comparison of the new methods that will be developed to exploit these opportunities. Standard images and the corresponding ground truth maps should have a number of useful characteristics: they should be complete, widely recognized, easily accessible, environmentally representative, and well understood. To develop a definitive collection of data sets with these characteristics, it will be necessary to identify a variety of important and representative environment types, such as urban, industrial, agricultural, forest, and rangeland, and to generate ground truth maps for each. For each environmental category, it will also be necessary to identify representative classes that are both complete in an academic sense and useful in a practical, commercial sense. Once this is done, a set of images must be found and acquired that captures the range of spectral (e.g., the number and type of bands), spatial (pixel size), and quality (e.g., 8-bit versus 11-bit numeric precision) characteristics available to the remote sensing community. Ideally, these image data sets would include both the processed (e.g., registered and atmospherically corrected) imagery and the raw source data. They must also include complete documentation and metadata explaining the image processing that was performed, as well as complete ground truth registered to within 1/10 pixel.
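As a rough illustration of the characteristics listed above, the following Python sketch records the spectral, spatial, and quality characteristics of one data set and checks whether its ground truth registration meets the 1/10-pixel target. The class and function names are hypothetical, and using the radial root-mean-square error (RMSE) of control-point residuals as the registration measure is an assumption; the requirement above does not specify which statistic should be used.

```python
import math
from dataclasses import dataclass


@dataclass
class StandardImageRecord:
    """Hypothetical catalog entry for one SI data set (field names are illustrative)."""
    name: str
    environment: str         # e.g., "urban", "agricultural", "forest"
    band_count: int          # spectral characteristic
    pixel_size_m: float      # spatial characteristic, in meters
    bit_depth: int           # quality characteristic, e.g., 8 or 11
    processing: tuple        # steps applied, e.g., ("registered", "atmospherically corrected")
    raw_data_included: bool  # whether the raw source data ship with the set


def registration_rmse(residuals_px):
    """Radial RMSE of (dx, dy) control-point residuals, in pixels."""
    if not residuals_px:
        raise ValueError("at least one control point is required")
    total = sum(dx * dx + dy * dy for dx, dy in residuals_px)
    return math.sqrt(total / len(residuals_px))


def meets_tenth_pixel(residuals_px, tolerance_px=0.1):
    """True if the registration RMSE is within the (assumed) 1/10-pixel tolerance."""
    return registration_rmse(residuals_px) <= tolerance_px
```

For example, with three hypothetical control points:

```python
record = StandardImageRecord("example_site", "agricultural", 4, 2.4, 11, ("registered",), True)
print(meets_tenth_pixel([(0.05, 0.03), (-0.04, 0.06), (0.02, -0.05)]))  # prints True (RMSE about 0.062 pixel)
```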
Remote sensing systems and supporting technology have recently improved in several ways that promote the development of new algorithms. For example, the QuickBird system offers both submeter panchromatic and 2.4-meter multispectral imagery. Hyperion offers new hyperspectral opportunities, and the Moderate Resolution Imaging Spectroradiometer (MODIS), a National Aeronautics and Space Administration (NASA) Earth Observing System (EOS) instrument, offers high temporal sampling.
In addition, a collection of standard imagery (SI) data sets would provide a useful tool for education that would (1) help reduce cost for individual instructors; (2) help ensure a consistently high level of instruction across institutions; and (3) provide a common experience among students that can help them both communicate with their peers and move between different software packages in their work. For example, if the same training examples were used by both Imagine and ENVI, students would find it easier to move between the two systems.
Although the value of using SI to promote the objective evaluation of new methods has been well demonstrated in other disciplines, the selection and presentation of SI is not a trivial task. The purpose of this short paper is to present an initial discussion of some of these issues as a starting point for further discussion and action within the remote sensing community.
We suggest that SI and the corresponding auxiliary
data and ground truth information would ideally have a number of useful
characteristics that include the following:
IMPLEMENTATION STRATEGY
The above SI characteristics imply that generating a set of SI will require considerable effort. Reaching agreement on representative sites for which to generate imagery and auxiliary data could be an expensive and time-consuming process. However, there are possible strategies to reduce cost. Instead of simply collecting raw image data, it may be more appropriate to provide complex models that allow users to create their own data sets to meet their specific needs. An important advantage of this strategy would be that, because users would fully understand the data, there would be less ambiguity in interpreting the results. Investigators could create a wide range of test cases to fully exercise a new methodology. A major disadvantage would be that model imagery perhaps offers less chance for informative surprises. Real data have no hidden model basis and can therefore sometimes provide a more satisfying test. A compromise may be to modify an SI data set with model results or imagery from other sites. For instance, to provide a compact test set, it may be efficient to cut out features from several data sets and combine them into one small image. Such a strategy may work well for testing new classification methods that do not use information from adjacent pixels. In any case, the parameters that a model uses would have to be calculated from real imagery. For example, if a model required the mean and standard deviation of the spectral signature of oak trees to generate simulated Landsat images, users would be required to use a standard set of parameters with a well-understood history.
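The model-based strategy can be sketched in Python under simplifying assumptions: each band of a class signature is drawn from an independent Gaussian with a documented mean and standard deviation (real spectral signatures have correlated bands, which this sketch ignores), and a fixed random seed makes the synthetic set reproducible so that different investigators can regenerate identical test data. `ClassSignature`, `simulate_pixels`, and the oak statistics in the usage example are hypothetical placeholders, not values from any real Landsat scene.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSignature:
    """Hypothetical documented per-band statistics for one land-cover class."""
    label: str
    means: tuple     # per-band mean digital number
    std_devs: tuple  # per-band standard deviation
    source: str      # provenance: where the statistics were derived from


def simulate_pixels(sig, n, seed, bit_depth=8):
    """Draw n synthetic pixels from independent per-band Gaussians.

    The same signature and seed always yield the same pixels. Values are
    rounded to integers and clipped to the valid range for bit_depth.
    """
    if len(sig.means) != len(sig.std_devs):
        raise ValueError("means and std_devs must have one entry per band")
    rng = random.Random(seed)  # private generator so the global state is untouched
    max_dn = 2 ** bit_depth - 1
    pixels = []
    for _ in range(n):
        pixel = []
        for mu, sd in zip(sig.means, sig.std_devs):
            value = round(rng.gauss(mu, sd))
            pixel.append(min(max(value, 0), max_dn))
        pixels.append(tuple(pixel))
    return pixels
```

A possible usage, with made-up statistics:

```python
oak = ClassSignature("oak", means=(60.0, 45.0, 110.0), std_devs=(4.0, 3.5, 9.0),
                     source="hypothetical: training polygons from one documented scene")
first = simulate_pixels(oak, 1000, seed=42)
second = simulate_pixels(oak, 1000, seed=42)
print(first == second)  # True: same parameters and seed give identical data
```

Storing the provenance of the statistics alongside the values is one way to keep the "well-understood history" of the parameters attached to the data they generate.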
One way to make the generation of SI more practical would be to use existing distribution channels. For instance, both the commercial and open source communities have expressed a clear interest in supporting the exposure, maintenance, and distribution of SI sets to the remote sensing community. The open source community has an extensive Internet infrastructure that can naturally support the efficient distribution of standardized data sets. Commercial software vendors could use the SI sets as training examples, thereby supporting both their distribution and their understanding.
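When SI sets are mirrored across many sites, users also need a way to confirm that their copy is identical to the standard. One common safeguard, not proposed here by the authors, is to publish a checksum manifest alongside each set. The Python sketch below builds and verifies such a manifest with SHA-256 from the standard library; the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the listed paths that are missing or whose contents have changed."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

A distributor would publish the output of `build_manifest` (for example, as JSON) with each set, and a user would run `verify_manifest` on the downloaded copy; an empty list means every listed file matches.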
CONCLUSION
In this short paper, we have suggested that one or more sets of standard imagery, documentation, and supporting auxiliary data could be important in promoting the objective evaluation and comparison of the new methods that will be developed in response to new capabilities now being offered to the remote sensing community. Metrics and methodology for developing these data sets were also discussed. However, these ideas were offered only as a starting point to promote further discussion of the complex issues involved in creating standard data sets. More work is clearly needed to further develop the initial concepts presented in this paper.