What do we mean by grayscale and color image enhancement?
Gamma correction
Contrast enhancement
Unsharp masking (edge enhancement)
Image smoothing
I've put a few operations here that are relatively simple and in wide use for improving the appearance of grayscale and color images. Each one is described below. The first two, gamma correction and contrast enhancement, are trivial to implement because they only require a lookup table that maps each source pixel value to a dest pixel value. The last two, unsharp masking and image smoothing, are a little more complicated because they use a set of neighboring pixels in the source to determine each dest pixel value. All four operations work separately on each color sample, so the color implementation of each one simply applies its grayscale implementation three times (once for each color sample). There are many other image processing operations that could be included here, and they will be added as occasion permits. If you have a favorite that's not here, ask me.
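To make the lookup-table point concrete, here is a minimal standalone sketch (not Leptonica's actual code) showing how one 256-entry table serves both the grayscale and color cases. The interleaved 3-byte RGB layout is an assumption of the sketch; Leptonica actually packs color pixels into 32-bit words.

    #include <stddef.h>
    #include <stdint.h>

    /* Apply a 256-entry lookup table to an 8 bpp grayscale image.
     * Gamma correction and contrast enhancement both reduce to a
     * single pass like this once the table has been built. */
    void applyLutGray(uint8_t *im, size_t npixels, const uint8_t lut[256])
    {
        for (size_t i = 0; i < npixels; i++)
            im[i] = lut[im[i]];
    }

    /* Color: run the same table independently on each sample. */
    void applyLutRgb(uint8_t *im, size_t npixels, const uint8_t lut[256])
    {
        for (size_t i = 0; i < 3 * npixels; i++)
            im[i] = lut[im[i]];
    }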
What other types of enhancement operations exist?
Image restoration takes an image that has been degraded by some known (typically statistical) process and attempts to regenerate something closer to the original image. These methods are often Bayesian, selecting destination pixels using a maximum a posteriori (MAP) procedure. The basic idea is very simple. You have a statistical model for the degradation process, given as a set of conditional probabilities for observing a specific degraded pixel when you started with some original pixel. You also have an estimate of the prior probabilities for the original image pixels (or groups of them). Bayes' law then lets you estimate the posterior conditional probabilities, which are the probabilities that, given an observed (degraded) pixel, you started with some original one; and you select the maximum of these over the set of all possible original ones. By doing this, you select the original image pixel that is most likely to have produced the observed pixel.
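In symbols, with y the observed pixel and x a candidate original value, the MAP rule is (a standard formulation, not tied to any particular degradation model):

    \hat{x} = \arg\max_{x} P(x \mid y)
            = \arg\max_{x} \frac{P(y \mid x)\, P(x)}{P(y)}
            = \arg\max_{x} P(y \mid x)\, P(x)

The last step holds because P(y) does not depend on x, so it can be dropped from the maximization.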
If the noise is known to be sparse additive gaussian noise ("sparse" meaning that only a small fraction of the pixels are affected), an obvious operation to apply is a median filter, which is very good at removing outliers without seriously affecting the other pixels. It is nonlinear, has some smoothing effect, and tends to alter pixels at sharp edges.
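Here is a minimal standalone 3x3 median filter to make this concrete; the function name and raw-array image layout are just for this sketch:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmpu8(const void *a, const void *b)
    {
        return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
    }

    /* 3x3 median filter on an 8 bpp grayscale image.  Border
     * pixels are copied unchanged to keep the sketch short. */
    void medianFilter3x3(const uint8_t *src, uint8_t *dst, int w, int h)
    {
        memcpy(dst, src, (size_t)w * h);
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                uint8_t win[9];
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        win[k++] = src[(y + dy) * w + (x + dx)];
                qsort(win, 9, 1, cmpu8);
                dst[y * w + x] = win[4];   /* median of the 9 values */
            }
        }
    }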
For binary images, enhancement involves a number of operations, such as removal of pepper noise. If the image has been scanned, cleanup can involve deskewing, removal of black pixels near the edges, and special operations on binarized pictorial regions. The latter can involve conversion to gray, followed by grayscale enhancement and halftone screening back to binary. Many scanners give you binary output, and if they threshold the pictorial regions without halftoning or dithering, the result is a high-contrast image in which much of the gray information is lost.
If the image is a binary scan that is composed of connected components, many of which are similar (such as text characters), it can be enhanced on a component basis. Use the jbig2 clustering algorithms in Leptonica to put all instances of connected components that are sufficiently similar into the same class. Then, from that set of instances, generate a template with less edge noise. This template can be either binary or grayscale; the latter gives an improved appearance because the edge pixels will be gray, causing the edge to appear smoother. Then build the reconstructed (enhanced) image by substituting the template image for each of the instances that were used in deriving it. See the page on jbig2 for more details.
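Here is a sketch of that flow, patterned after prog/jbcorrelation.c in the Leptonica distribution. The threshold (0.8) and weight factor (0.6) are illustrative values, and a binary (not grayscale) template is used:

    #include "allheaders.h"

    /* Rebuild a cleaner page by substituting, for each connected
     * component, the template of the similarity class it fell into. */
    PIX *reconstructByClass(PIX *pixs)
    {
        JBCLASSER *classer;
        JBDATA    *data;
        PIXA      *pixa;
        PIX       *pixd;

        classer = jbCorrelationInit(JB_CONN_COMPS, 0, 0, 0.8f, 0.6f);
        jbAddPage(classer, pixs);          /* classify this page's components */
        data = jbDataSave(classer);        /* templates plus placements */
        pixa = jbDataRender(data, FALSE);  /* re-render page from templates */
        pixd = pixaGetPix(pixa, 0, L_COPY);

        pixaDestroy(&pixa);
        jbDataDestroy(&data);
        jbClasserDestroy(&classer);
        return pixd;
    }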
Why is gamma correction needed?

Display devices are a major reason. A CRT has a built-in physical gamma: the light emitted by the phosphor is proportional to the electron beam current hitting it, but the beam current rises nonlinearly with the input signal, which darkens the image. Flat panel displays are different: they use white illumination with light subtracted by absorption in the dyes, and they have no such built-in physical gamma to darken the image. Consequently, images look much brighter on them than on CRTs.
There is also a calibration for displays that has to do with the relative amounts of the different colors used to produce white. This is expressed as the temperature of the black body radiator that would produce that color distribution. The standard is 6500 K, but manufacturers have found that if they use higher equivalent temperatures, the displays are brighter, and the extra blue is not too noticeable because the eye is relatively insensitive to blue. So, for example, an inexpensive CRT may be calibrated to 12000 K.
What about printing? If you print a digital camera image without applying a gamma correction, the image will appear light and washed out, so printing software typically compensates for the high gamma of the camera.
From a psychophysical viewpoint, people experience intensity logarithmically: each doubling of the intensity is perceived as an additive constant in apparent brightness. Our eyes have a dynamic range of about 10^6, whereas a camera has a mere 8 bits (256 levels) in each color. If the camera were linear, a scene with both light and shadowed regions could have the shadowed region too dark and the light region washed out. The gamma (greater than 1.0) built into cameras helps somewhat in this respect: the apparent dynamic range in the dark parts of the image is increased, because more of the actual range is assigned to these darker pixels. This gives the camera a roughly logarithmic response. Of course, the lighter part gets even lighter, but not by as much.
So gamma mapping is used both to compensate for nonlinearities in the display devices and to increase the apparent dynamic range. The output is related to the input by raising it to the power (1/gamma), scaled so that the endpoints are not moved: out = 255 * (in/255)^(1/gamma), which maps 0 to 0 and 255 to 255 for any gamma. The mapping function looks like this:
The plot was made using prog/gammatest.c. If gamma < 1.0, the image is darkened, with the biggest effect happening for the dark (low input) pixel values. If gamma > 1.0, the image is brightened overall, with the largest changes happening again for the dark shadows.
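Concretely, the lookup table can be built as follows; this is a standalone sketch of the formula above, not the code in enhance.c:

    #include <math.h>
    #include <stdint.h>

    /* Build the gamma lookup table: out = 255 * (in/255)^(1/gamma).
     * The endpoints 0 and 255 map to themselves for any gamma > 0. */
    void makeGammaLut(float gamma, uint8_t lut[256])
    {
        for (int i = 0; i < 256; i++) {
            float v = 255.0f * powf(i / 255.0f, 1.0f / gamma);
            lut[i] = (uint8_t)(v + 0.5f);   /* round to nearest */
        }
    }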
We provide a top-level interface pixGammaCorrect() in enhance.c. For display on a CRT, depending on the source of the image, you may want to apply a gamma in the range 1.5 to 2.0. For printing, again depending on the image source, you may want to apply a gamma that is less than 1.0 to darken the image.
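For example, to darken an image for printing; the three-argument signature used here is an assumption for the sketch, so check the declaration in enhance.c before relying on it:

    #include "allheaders.h"

    int main(void)
    {
        PIX *pixs = pixRead("image.png");
        /* gamma < 1.0 darkens; 1.5 - 2.0 would brighten for CRT display */
        PIX *pixd = pixGammaCorrect(NULL, pixs, 0.7);  /* assumed signature */
        pixWrite("image-dark.png", pixd, IFF_PNG);
        pixDestroy(&pixs);
        pixDestroy(&pixd);
        return 0;
    }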
How do we implement contrast enhancement?

Contrast enhancement uses a sigmoid-shaped mapping that expands the middle of the intensity range while compressing the two ends. How does one generate a sigmoid curve? One obvious way is to integrate under a gaussian; this gives a one-parameter family of curves. Unfortunately, that integral (the error function) is not elementary, so you would have to use a table. As an alternative, consider integrating under a lorentzian. The lorentzian goes as 1/(a^2 + x^2), and consequently has large tails compared to the gaussian. But the lorentzian integrates simply to the arctan function, which makes a transition between -pi/2 and +pi/2 as its argument goes from large negative to large positive values. Using a single parameter to scale the argument amounts to taking a slice of the arctan centered about 0; the parameter is just the width of the slice, in appropriate units. As the parameter approaches 0, the slice gets narrow, so we are using the arctan near 0, where it is linear, and the output equals the input. As the parameter increases, the contrast increases. The output is scaled and translated so that the min and max values of input and output coincide (here, at 0 and 255). The mapping function looks like this:
The plot was made using prog/contrasttest.c. See that file or the implementation in pixEnhanceContrastGray() for details.
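Here is a standalone sketch of such a table. The normalization below is my own way of pinning the endpoints and is not necessarily the exact scaling used in pixEnhanceContrastGray():

    #include <math.h>
    #include <stdint.h>

    /* Build an arctan-based sigmoid contrast table.  factor > 0.0
     * increases contrast; as factor -> 0 the table approaches the
     * identity.  The normalization pins 0 -> 0 and 255 -> 255. */
    void makeContrastLut(float factor, uint8_t lut[256])
    {
        if (factor <= 0.0f) {                  /* identity table */
            for (int i = 0; i < 256; i++)
                lut[i] = (uint8_t)i;
            return;
        }
        float norm = 2.0f * atanf(0.5f * factor);
        for (int i = 0; i < 256; i++) {
            float t = (i - 127.5f) / 255.0f;   /* in [-0.5, 0.5] */
            float v = 255.0f * (atanf(factor * t) / norm + 0.5f);
            lut[i] = (uint8_t)(v + 0.5f);
        }
    }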
Values of the input parameter greater than 0.0 increase the contrast. (Values less than 0.0 should decrease it; this is not implemented, however.)
The top-level interface, which takes both 8 bpp grayscale and full color images, is pixEnhanceContrast() in enhance.c.
How do we implement edge enhancement?

The method implemented here is called unsharp masking. The high-pass "edge" image is generated by convolving the image with an approximation to a laplacian filter. In such a filter, the center has a value of 1.0 and some set of N surrounding pixels each has a value of -1.0/N. For a 3x3 filter, this looks like

    -1/8  -1/8  -1/8
    -1/8    1   -1/8
    -1/8  -1/8  -1/8

We implement this 3x3 high-pass filter by first generating a low-pass image using a 3x3 smoothing filter,

    1/9  1/9  1/9
    1/9  1/9  1/9
    1/9  1/9  1/9

and then subtracting it from the original image. The result is

    -1/9  -1/9  -1/9
    -1/9   8/9  -1/9
    -1/9  -1/9  -1/9

which is identical to the laplacian given above except for an overall scaling factor of 8/9.
Once the edge image has been generated, some fraction of it is added to the original image. Thus, there are two parameters: the size of the smoothing filter used to generate the edge image, and the fraction of the edge image that is added back to the original.
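A standalone sketch of the whole operation with the 3x3 filter above; the function name and the raw uint8 image layout are just for illustration (Leptonica's own version lives in enhance.c):

    #include <stdint.h>
    #include <string.h>

    /* Unsharp masking: dst = src + fract * (src - smooth(src)).
     * Border pixels are copied unchanged to keep the sketch short. */
    void unsharpMask3x3(const uint8_t *src, uint8_t *dst,
                        int w, int h, float fract)
    {
        memcpy(dst, src, (size_t)w * h);
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += src[(y + dy) * w + (x + dx)];
                float smooth = sum / 9.0f;             /* low-pass value */
                float edge = src[y * w + x] - smooth;  /* high-pass value */
                float v = src[y * w + x] + fract * edge;
                if (v < 0.0f)   v = 0.0f;              /* clip to [0, 255] */
                if (v > 255.0f) v = 255.0f;
                dst[y * w + x] = (uint8_t)(v + 0.5f);
            }
        }
    }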