Thursday, September 11, 2014

Lab 1: Removing Data Redundancy

Goal and Background
The goal of this lab is to learn skills vital for extracting statistical data from satellite images, analyzing image correlation models, and interpreting the information from the correlation analysis. These skills are necessary to remove data redundancy when preparing data for an image processing project. Redundant data is unfavorable because it adds no new information, yet it still lengthens computer processing time when running complex models. Because these models often take a long time to run, it is important to eliminate any unnecessary processing and thus speed up the work.

Methods
Because not all remotely sensed images have data redundancy, it is important to first create a visual model that quickly reveals any redundancy between bands in the image. This visual model is called a feature space plot. A feature space plot graphs the brightness values of one band against the brightness values of another band for every pixel in the image. Once the plots are complete, the relationships between bands are easy to see. If the plot has a broad spread, then the two bands have a low correlation and don’t share redundant data (Figure 1). If the plot has a narrow spread, then the two bands have a higher correlation that may be a source of data redundancy (Figure 2).

Figure 1: A feature space plot with a broad spread, signifying unique information.
Figure 2: A feature space plot with a narrow spread, signifying redundant information.
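Numerically, a feature space plot is a two-dimensional histogram: for each pair of brightness values (one from each band), it counts how many pixels have that combination. The sketch below assumes 8-bit bands stored as NumPy arrays and uses synthetic data, not the Eau Claire image. `feature_space` and `occupied_fraction` are hypothetical helpers; the second is only a rough numeric stand-in for judging "spread" by eye.

```python
import numpy as np


def feature_space(band_a, band_b, levels=256):
    """Count pixels at each (band_a, band_b) brightness pair.

    Assumes 8-bit bands (values 0-255). The returned 2D array is the data a
    feature space plot draws: band_a brightness on one axis, band_b on the other.
    """
    counts, _, _ = np.histogram2d(
        band_a.ravel(), band_b.ravel(),
        bins=levels, range=[[0, levels], [0, levels]],
    )
    return counts


def occupied_fraction(counts):
    """Share of brightness-pair cells holding at least one pixel.

    A rough numeric stand-in for 'spread': narrow plots occupy few cells.
    """
    return np.count_nonzero(counts) / counts.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    band_1 = rng.integers(0, 256, size=(100, 100))  # synthetic band
    # Nearly a copy of band_1, so its plot should hug the diagonal.
    band_2 = np.clip(band_1 + rng.integers(-3, 4, size=(100, 100)), 0, 255)
    band_3 = rng.integers(0, 256, size=(100, 100))  # unrelated band
    print("similar bands:  ", occupied_fraction(feature_space(band_1, band_2)))
    print("unrelated bands:", occupied_fraction(feature_space(band_1, band_3)))
```

The similar pair fills far fewer cells than the unrelated pair, which is the narrow versus broad spread described above.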
To see this process in action, we used ERDAS Imagine to create feature space plots for every pair of the six bands in a remotely sensed image of Eau Claire (Figure 3). The process produced 15 plots in total, one for each pair of bands, which can all be seen in Figure 4. This process is an easy way to visualize relationships between bands in an image, but it doesn’t reveal any precise information about the correlation of the two bands. To get that, we conducted a correlation analysis. This process calculates the degree of interrelationship between two bands by measuring how the brightness values of the two bands vary together from pixel to pixel. The result of the correlation calculation is a coefficient between negative one and positive one, where values far from zero (close to -1 or 1) indicate high association while values close to zero indicate low association. High association between two bands creates data redundancy, while low association ensures that each band contributes unique information. Generally, if two bands have a correlation value above 0.95, their data is considered redundant, and one of the bands should be removed.
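The coefficient described here is Pearson's r: the covariance of two bands' brightness values divided by the product of their standard deviations. The sketch below is a minimal Python version, assuming each band is a NumPy array of equal shape. `pearson_r` and `redundant_pairs` are hypothetical helpers, not ERDAS functions, and comparing the absolute value of r against 0.95 (so strong negative correlation also counts) is an assumption based on the text's note that values near -1 also indicate high association.

```python
from itertools import combinations

import numpy as np

REDUNDANCY_THRESHOLD = 0.95  # rule of thumb quoted in the lab


def pearson_r(band_a, band_b):
    """Pearson correlation between two bands, computed pixel by pixel."""
    a = np.asarray(band_a, dtype=float).ravel()
    b = np.asarray(band_b, dtype=float).ravel()
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    # Covariance divided by the product of the standard deviations
    # (the 1/n factors cancel, so plain sums are enough).
    return float((a_dev * b_dev).sum()
                 / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum()))


def redundant_pairs(bands, threshold=REDUNDANCY_THRESHOLD):
    """Return (band_i, band_j, r) for every pair with |r| above the threshold.

    Bands are numbered from 1, as in the lab write-up.
    """
    flagged = []
    for i, j in combinations(range(len(bands)), 2):
        r = pearson_r(bands[i], bands[j])
        if abs(r) > threshold:
            flagged.append((i + 1, j + 1, r))
    return flagged


# Six bands give 6 * 5 / 2 = 15 pairs, matching the 15 feature space plots.
assert len(list(combinations(range(6), 2))) == 15
```

Looping over `combinations` visits each pair once, so the same 15 comparisons shown as plots become 15 numbers that can be checked against the threshold.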


Figure 3: The remotely sensed image of the Eau Claire area that we checked for redundancy.

Figure 4: The feature space plots for each pair of the six bands in the image.

To practice this, we used ERDAS Imagine to build models that extract a correlation matrix from raster images. Figure 5 shows the model built in the ERDAS Model Maker. For this exercise, we used the image of Eau Claire from before (Figure 3), a high resolution image of the Florida Keys (Figure 6), and a high resolution image of the Bengal Province of Bangladesh (Figure 7). In the Model Maker, each raster image file was connected to the raster object, and a function was assigned to the function object. The function we used was ‘CORRELATION (<raster>, IGNORE 0)’, replacing <raster> with the raster file being analyzed; the IGNORE 0 option excludes pixels with a value of zero from the calculation. The function then exported a matrix of the correlation values between each pair of bands.

Figure 5: The model used to create a correlation matrix. The top object is the input file (the image), the middle object is the function, and the bottom object is the output file (the matrix).

Figure 6: The remotely sensed image of the Florida Keys.
Figure 7: The remotely sensed image of the Bengal Province in Bangladesh.
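Outside ERDAS, a comparable correlation matrix can be sketched with NumPy. This is not the ERDAS implementation: it assumes the image is an array shaped (bands, rows, cols), and it assumes that IGNORE 0 should drop a pixel only when every band is zero (such as the black border around a scene); ERDAS's exact rule may differ. `correlation_matrix` is a hypothetical helper.

```python
import numpy as np


def correlation_matrix(image, ignore=0):
    """Band-to-band correlation matrix for an image shaped (bands, rows, cols).

    Assumption: a pixel counts as no-data, and is dropped, when every band
    equals `ignore`, which loosely mimics the IGNORE 0 option.
    """
    n_bands = image.shape[0]
    pixels = image.reshape(n_bands, -1).astype(float)  # one row per band
    keep = ~np.all(pixels == ignore, axis=0)            # columns are pixels
    return np.corrcoef(pixels[:, keep])                 # rows treated as variables
```

`np.corrcoef` treats each row as a variable, so the result is a bands-by-bands matrix with ones on the diagonal, the same layout as the matrices the model exports. Dropping the background first matters because a large block of shared zeros would otherwise inflate the correlation between every pair of bands.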


Results
From the feature space plots in Figure 4, it is clear that there is redundant data (note the plots with a narrow spread). This becomes even clearer after completing the correlation analysis of the Eau Claire image and examining its correlation matrix (Figure 8). Bands 2 and 3 have a very high correlation value of 0.9427, just below the 0.95 rule of thumb, which is a strong sign of data redundancy. From here, a decision must be made as to which band should be removed. The Florida Keys image and the Bangladesh image both have significant data redundancy as well, as seen in Figures 9 and 10. Each matrix displays a correlation value well over 0.95. In each of these images, a band would have to be removed to eliminate data redundancy.

Figure 8: The Eau Claire correlation matrix, displaying the correlation value for each pair of bands. Highlighted is the correlation between bands 2 and 3, the highest in the matrix (redundant data).

Figure 9: The Florida Keys correlation matrix. Highlighted is the highest correlation value, showing that the data from bands 1 and 2 are very highly correlated.

Figure 10: The Bengal, Bangladesh correlation matrix with the highest correlation value highlighted.
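Finding the highlighted value in each matrix amounts to searching the off-diagonal entries for the largest absolute correlation. A minimal sketch, assuming the exported matrix has been loaded as a square NumPy array; `most_correlated_pair` is a hypothetical helper.

```python
import numpy as np


def most_correlated_pair(corr):
    """Return (band_i, band_j, r) for the off-diagonal entry with the largest |r|.

    Only the upper triangle is searched, since a correlation matrix is
    symmetric and its diagonal is always 1. Bands are numbered from 1.
    """
    corr = np.asarray(corr, dtype=float)
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    values = corr[rows, cols]
    best = int(np.argmax(np.abs(values)))
    return int(rows[best]) + 1, int(cols[best]) + 1, float(values[best])
```

Comparing the returned r with 0.95 shows whether the pair is a candidate for removal; deciding which of the two bands to drop still takes judgment, as the results note.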

Sources
The Eau Claire image is from the Earth Resources Observation and Science Center, USGS.
The Florida Keys and Bangladesh images are from the Global Land Cover Facility, at www.landcover.org.
